Nvidia’s Cosmos Reason 2 aims to bring reasoning VLMs into the physical world



Nvidia CEO Jensen Huang said last year that we are now entering the age of physical AI. While the company continues to offer LLMs for software use cases, Nvidia is increasingly positioning itself as a provider of AI models for fully AI-powered systems, including agentic AI in the physical world.

At CES 2026, Nvidia announced a slate of new models designed to push AI agents beyond chat interfaces and into physical environments.

Nvidia launched Cosmos Reason 2, the latest version of its vision-language model designed for embodied reasoning. Cosmos Reason 1, released last year, introduced a two-dimensional ontology for embodied reasoning and currently leads Hugging Face’s physical reasoning for video leaderboard.

Cosmos Reason 2 builds on the same ontology while giving enterprises more flexibility to customize applications and enabling physical agents to plan their next actions, similar to how software-based agents reason through digital workflows.

Nvidia also released a new version of Cosmos Transfer, a model that lets developers generate training simulations for robots.

Other vision-language models, such as Google’s PaliGemma and Mistral’s Pixtral Large, can process visual inputs, but not all commercially available VLMs support reasoning.

“Robotics is at an inflection point. We are moving from specialist robots limited to single tasks to generalist specialist systems,” said Kari Briski, Nvidia vice president for generative AI software, in a briefing with reporters. She was referring to robots that combine broad foundational knowledge with deep task-specific skills. “These new robots combine broad fundamental knowledge with deep proficiency in complex tasks.”

She added that Cosmos Reason 2 “enhances the reasoning capabilities that robots need to navigate the unpredictable physical world.”

Moving to physical agents

Briski noted that Nvidia’s roadmap follows “the same pattern of assets across all of our open models.”

“In building specialized AI agents, a digital workforce, or the physical embodiment of AI in robots and autonomous vehicles, more than just the model is needed,” Briski said. “First, the AI needs the compute resources to train and to simulate the world around it. Data is the fuel for AI to learn and improve, and we contribute to the world’s largest collection of open and diverse datasets, going beyond just opening the weights of the models. The open libraries and training scripts give developers the tools to purpose-build AI for their applications, and we publish blueprints and examples to help deploy AI as systems of models.”

The company now has open models for physical AI in Cosmos, for robotics in the open reasoning vision-language-action (VLA) model Gr00t, and for agentic AI in its Nemotron family.

Nvidia is making the case that open models across different branches of AI form a shared enterprise ecosystem that feeds data, training, and reasoning to agents in both the digital and physical worlds.

Additions to the Nemotron family

Briski said Nvidia plans to continue expanding its open models, including its Nemotron family, beyond reasoning to include a new RAG and embeddings model that makes data more readily available to agents. The company released Nemotron 3, the latest version of its agentic reasoning models, in December.

Nvidia announced three new additions to the Nemotron family: Nemotron Speech, Nemotron RAG and Nemotron Safety.

In a blog post, Nvidia said Nemotron Speech delivers “real-time low-latency speech recognition for live captions and speech AI applications” and that it is 10 times faster than other speech models.

Nemotron RAG technically consists of two models, an embedding model and a reranking model, both of which can understand images to provide richer multimodal insights for agents to draw on.

“Nemotron RAG is on top of what we call the MMTEB, or the Massive Multilingual Text Embedding Benchmark, with strong multilingual performance while using less computing power and memory, so they are a fit for systems that must handle a lot of requests very quickly and with low delay,” Briski said.

Nemotron Safety detects sensitive data so AI agents don’t accidentally leak personally identifiable information.




Disclaimer: This article is sourced from external platforms. OverBeta has not independently verified the information. Readers are advised to verify details before relying on them.
