Nvidia has unveiled new open artificial intelligence models aimed at equipping robots and autonomous vehicles with human-like “common sense,” accelerating the development of physical AI systems capable of interacting safely with the real world.
The centerpiece of these innovations is Alpamayo-R1 (AR1), described as the first industrial-scale, open Vision-Language-Action (VLA) model for mobility. It was presented during the NeurIPS 2025 conference and the Nemotron Summit.
AR1 integrates complex “chain-of-thought” reasoning with path planning to enhance safety in challenging environments. This allows autonomous systems to mimic human judgment, such as stopping for pedestrians or diverting from a bike lane to avoid collisions.
The model works by breaking complex situations down into logical steps, evaluating candidate trajectories, and selecting the best route. This capability is crucial for making self-driving cars and robots more reliable and safe.
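The "evaluate candidates, pick the best" step can be sketched in miniature as cost-based ranking over candidate paths. This is a didactic toy, not AR1's actual planner: the cost terms, weights, and geometry below are invented for illustration.

```python
# Toy sketch of trajectory selection (illustrative only, not AR1's logic):
# score candidate paths on invented cost terms and pick the lowest-cost one.

def trajectory_cost(traj, obstacles, weight_clearance=1.0, weight_length=0.1):
    """Lower is better: penalize path length and proximity to obstacles."""
    length = sum(
        ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
        for (x1, y1), (x2, y2) in zip(traj, traj[1:])
    )
    min_clearance = min(
        ((px - ox) ** 2 + (py - oy) ** 2) ** 0.5
        for (px, py) in traj
        for (ox, oy) in obstacles
    ) if obstacles else float("inf")
    # Inverse clearance: paths that hug obstacles cost more.
    clearance_penalty = weight_clearance / (min_clearance + 1e-6)
    return weight_length * length + clearance_penalty

def select_trajectory(candidates, obstacles):
    return min(candidates, key=lambda t: trajectory_cost(t, obstacles))

# Two candidate paths to (2, 0); an obstacle sits on the straight route.
straight = [(0, 0), (1, 0), (2, 0)]
detour = [(0, 0), (1, 1), (2, 0)]
obstacles = [(1, 0)]
best = select_trajectory([straight, detour], obstacles)
print(best)  # the detour wins despite being longer
```

A real planner trades off many more terms (comfort, traffic rules, predicted motion of other agents), but the structure is the same: enumerate, score, select.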
AR1 builds upon Cosmos-Reason, a foundational multimodal reasoning model launched earlier and updated in August. Its decision-making abilities are further refined through post-training with reinforcement learning, utilizing synthetic data to simulate real-world scenarios.
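Reinforcement-learning post-training can be illustrated with a deliberately minimal toy, unrelated to Nvidia's actual pipeline: a discrete policy over two maneuvers is nudged toward the action that a synthetic scenario rewards more highly.

```python
import random

# Didactic RL post-training sketch (not Nvidia's pipeline): reinforce
# the maneuver that a synthetic scenario rewards. All values invented.
random.seed(0)

# Synthetic scenario: braking for a pedestrian earns the higher reward.
reward = {"brake": 1.0, "swerve": 0.2}

# Policy represented as unnormalized action preferences.
prefs = {"brake": 1.0, "swerve": 1.0}

def sample(prefs):
    """Sample an action with probability proportional to its preference."""
    total = sum(prefs.values())
    r = random.uniform(0, total)
    for action, p in prefs.items():
        r -= p
        if r <= 0:
            return action
    return action

lr = 0.1
for _ in range(500):
    a = sample(prefs)
    prefs[a] += lr * reward[a]  # reinforce rewarded actions

probs = {a: p / sum(prefs.values()) for a, p in prefs.items()}
print(probs)  # "brake" ends up far more likely than "swerve"
```

The point of the toy: rewarded behavior is sampled more often, which reinforces it further, so the policy drifts toward safer decisions without hand-written rules.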
For developers and researchers, AR1 is available for non-commercial use on platforms like GitHub and Hugging Face. It includes a subset of training and evaluation data within the NVIDIA Physical AI Open Datasets.
Accompanying AR1, Nvidia released AlpaSim, an open framework on GitHub designed to evaluate AR1’s performance in autonomous vehicle simulations. This provides a standardized way to test the model’s effectiveness.
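A standardized evaluation loop of this kind can be imagined as repeated rollouts through scripted scenarios with an aggregate safety metric. The sketch below is a stand-in with invented scenarios and names, not AlpaSim's API.

```python
# Toy stand-in for simulator-based evaluation (not AlpaSim's API):
# run a policy through scripted scenarios and aggregate a pass rate.

def rollout(policy, scenario):
    """Return True if the policy picks the scenario's safe maneuver."""
    return policy(scenario["hazard"]) == scenario["safe_action"]

def evaluate(policy, scenarios):
    passes = sum(rollout(policy, s) for s in scenarios)
    return passes / len(scenarios)

# Scripted scenarios, each with one safe maneuver (invented for the sketch).
scenarios = [
    {"hazard": "pedestrian", "safe_action": "brake"},
    {"hazard": "cyclist", "safe_action": "yield"},
    {"hazard": "debris", "safe_action": "swerve"},
]

cautious_policy = {"pedestrian": "brake", "cyclist": "yield", "debris": "swerve"}.get
print(evaluate(cautious_policy, scenarios))  # 1.0, i.e. a perfect pass rate
```

Real evaluation frameworks replace the lookup table with a learned model and the pass/fail check with physics-based metrics (collisions, rule violations, comfort), but the loop structure is analogous.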
Nvidia also introduced the Cosmos Cookbook, an open-source guide available on GitHub. It offers step-by-step recipes, rapid inference examples, and advanced workflows, covering data curation, synthetic generation, and model evaluation.
These resources are designed to democratize physical AI development, making it easier for a global community of developers to adapt and build upon Nvidia’s advancements. The company emphasized fostering international collaboration.
Jensen Huang, Nvidia’s CEO, underscored the company’s vision, stating, “The next wave of AI is physical AI.” This philosophy drives the development of technology that bridges digital and real-world interactions.
Nvidia’s chief scientist, Bill Dally, noted in an interview this past summer that robotics is poised to become a major global industry. He emphasized the company’s intent to build the “brains of all the robots” by developing the key underlying technologies.
Bryan Catanzaro, Nvidia’s Vice President of Applied Deep Learning Research, highlighted the company’s leadership in the Artificial Analysis Openness Index. This is attributed to its permissive licenses, data transparency, and detailed technical disclosures.
The company showcased its commitment to open innovation with over 70 publications, talks, and workshops at NeurIPS. These initiatives aim to accelerate breakthroughs in AI reasoning, medical applications, and autonomous vehicles.
Beyond AR1, other specific applications highlighted include LidarGen, which generates LiDAR data for driving simulations, and Omniverse NuRec Fixer, which corrects artifacts in neurally reconstructed data.
Cosmos Policy helps convert video models into robust policies for robots, while ProtoMotions3 is a GPU-accelerated framework that trains digital humans and humanoid robots using realistic scenes from Cosmos World Foundation Models (WFMs).
These tools integrate with platforms such as Isaac Lab and Isaac Sim, enabling the training of advanced robotics models like GR00T N.
Partners like Voxel51, 1X, Figure AI, Gatik, Oxa, and X-Humanoid, along with researchers from ETH Zurich, are already leveraging Cosmos WFMs. They use these models to create intricate 3D scenes for diverse physical AI applications, from logistics to humanoids.
The immediate availability of these resources positions Nvidia as a key enabler in the shift toward autonomous mobility and robotics. The goal is to create a future where vehicles and robots can “think” with human-like intelligence, reducing risks and expanding capabilities.
