NVIDIA Cosmos

NVIDIA | OpenMDW-1.1

Multimodal | 9.1K Stars | 582 Forks | 62 views

NVIDIA Cosmos is an open platform of world models, datasets, and tooling for building Physical AI. The June 2026 Cosmos 3 launch brought the project to 9,100+ GitHub stars and 580+ forks under the OpenMDW-1.1 license. Unlike a single model release, Cosmos is structured as two complementary runtime surfaces, the Reasoner and the Generator, which share a common dataset and deployment stack aimed at robots, autonomous vehicles, and smart-infrastructure workloads.

## Two Surfaces, One Platform

The Reasoner takes text and vision inputs and produces text outputs. Its job is world understanding, physical reasoning, and task planning: the cognitive layer of an embodied system. The Generator takes text, vision, sound, and action inputs and produces images, videos, synchronized audio, and action sequences: the perception and motor layer.

The split mirrors how a real physical agent works: a planner that decides what should happen next, and a generator that imagines what that would look like and what action to take. Splitting them lets each side scale independently and lets teams adopt only the part they need.

## Model Family: Nano and Super

Cosmos 3 ships two main sizes, Cosmos3-Nano at 16B parameters and Cosmos3-Super at 64B parameters, with specialized variants for text-to-image, image-to-video, and robot policy. The Nano size is the practical entry point for research and prototyping, while Super is aimed at higher-fidelity world generation and longer-horizon planning. Resolution ranges from 256p to 720p and frame rates from 10 to 30 FPS, an operating envelope that suits both training-data generation and on-device perception use cases.

## Datasets Treated as a First-Class Output

Cosmos places unusual emphasis on datasets. NVIDIA ships curated multimodal datasets covering video, audio, and action trajectories, so researchers can train their own physical AI models rather than only consume the Cosmos models. This matters because the bottleneck in robotics and AV today is data quality, not model architecture, and permissively licensed action-trajectory datasets are genuinely scarce in open source.

## Production Deployment Path

Cosmos is engineered for production, not just for reproducing paper results. The repository documents two deployment paths: research and development via Diffusers and Transformers for fast iteration, and production via vLLM-Omni, vLLM, and NIM containers for serving at scale. The vLLM-Omni path is particularly interesting because it extends vLLM's text-serving primitives to multimodal generation, which is the right direction for serving a Reasoner-plus-Generator stack with consistent latency and batching behavior.

## OpenMDW License

The OpenMDW-1.1 license used by Cosmos is unusual but deliberate. It is designed to balance permissive research use with some downstream protections, sitting between Apache-2.0 and the more restrictive responsible-use licenses that some labs have shipped. Teams considering Cosmos for commercial robotics or AV deployments should read the license carefully, but in practice it has been treated as workable for most production scenarios.

## Why Open World Models Matter

Proprietary world models exist at most major labs, but very few are open. Cosmos changes that by shipping not just the weights but also the training datasets, the deployment containers, and a coherent Reasoner-plus-Generator architecture. For research groups working on robotics, simulation, or autonomous driving, this collapses a year of infrastructure work into a working baseline.

## Limitations

World models are compute-hungry. Cosmos3-Super at 64B parameters is not a workstation workload, and even Cosmos3-Nano benefits substantially from a recent NVIDIA datacenter GPU. The notebook-heavy presentation in the repo is friendly for evaluation, but production teams will end up extracting code from Jupyter notebooks into proper services.

As with any large physical-AI model, generated video and action sequences should be treated as plausible hypotheses rather than ground truth, and safety-critical use (robotic control, autonomous driving decisions) requires the usual layered safeguards on top of the model output.
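
To put the compute point in the Limitations section into rough numbers, the sketch below estimates the memory needed just to hold the weights of the two sizes named above. It is back-of-envelope arithmetic, not a measurement: the byte widths per data type are standard, but which precisions the Cosmos checkpoints actually ship in is an assumption here, and activations, caches, and framework overhead are ignored entirely.

```python
# Back-of-envelope weight memory for the two model sizes named in the article.
# Assumption: weights only. Activations, caches, and runtime overhead are ignored,
# so real requirements will be higher than these figures.

PARAMS = {
    "Cosmos3-Nano": 16e9,   # 16B parameters
    "Cosmos3-Super": 64e9,  # 64B parameters
}

# Standard byte widths; which of these the released checkpoints use is not
# stated in the article and is assumed here for illustration.
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}


def weight_gb(num_params: float, dtype: str) -> float:
    """Return the memory needed to hold the weights, in decimal gigabytes."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


def weight_table() -> dict:
    """Return {model name: {dtype: GB}} for every model and data type."""
    return {
        name: {dtype: weight_gb(n, dtype) for dtype in BYTES_PER_PARAM}
        for name, n in PARAMS.items()
    }


if __name__ == "__main__":
    for name, row in weight_table().items():
        cells = ", ".join(f"{dtype}: {gb:.0f} GB" for dtype, gb in row.items())
        print(f"{name}: {cells}")
```

At 2 bytes per parameter the 64B model needs about 128 GB for weights alone, which exceeds the memory of any single consumer GPU, while the 16B model needs about 32 GB.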

Key Features

Reasoner surface for world understanding and task planning (text and vision in, text out)
Generator surface producing images, video, synchronized audio, and action sequences
Cosmos3-Nano (16B) and Cosmos3-Super (64B) model family
Specialized variants for text-to-image, image-to-video, and robot policy
Multi-resolution output: 256p to 720p, 10 to 30 FPS
Curated multimodal datasets including action trajectories
Research deployment via Diffusers and Transformers
Production deployment via vLLM-Omni, vLLM, and NIM containers
Designed for robotics, autonomous vehicles, and smart infrastructure
Open weights and datasets under OpenMDW-1.1
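
As a rough illustration of the Reasoner and Generator surfaces listed above, the following sketch models their input and output modalities as plain Python types and chains them in a plan-then-imagine step. Every class, field, and function name here is hypothetical and chosen only for illustration; none of it is taken from the Cosmos repository or its APIs, and the stub classes stand in for real models.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional, Protocol


@dataclass
class ReasonerInput:
    """Text and vision in (hypothetical container)."""
    text: str
    frames: List[bytes] = field(default_factory=list)


@dataclass
class GeneratorInput:
    """Text, vision, sound, and action in (hypothetical container)."""
    text: str = ""
    frames: List[bytes] = field(default_factory=list)
    audio: Optional[bytes] = None
    actions: List[List[float]] = field(default_factory=list)


@dataclass
class GeneratorOutput:
    """Images or video frames, synchronized audio, and an action sequence."""
    frames: List[bytes]
    audio: Optional[bytes]
    actions: List[List[float]]


class Reasoner(Protocol):
    def plan(self, observation: ReasonerInput) -> str: ...


class Generator(Protocol):
    def generate(self, request: GeneratorInput) -> GeneratorOutput: ...


class StubReasoner:
    """Placeholder that returns a canned plan instead of calling a model."""

    def plan(self, observation: ReasonerInput) -> str:
        return f"plan for: {observation.text}"


class StubGenerator:
    """Placeholder that echoes frames and emits one dummy action per frame."""

    def generate(self, request: GeneratorInput) -> GeneratorOutput:
        return GeneratorOutput(
            frames=list(request.frames),
            audio=None,
            actions=[[0.0] for _ in request.frames],
        )


def plan_then_imagine(reasoner: Reasoner, generator: Generator,
                      observation: ReasonerInput):
    """Ask the planner what should happen, then ask the generator to imagine it."""
    plan = reasoner.plan(observation)
    request = GeneratorInput(text=plan, frames=observation.frames)
    return plan, generator.generate(request)
```

Because the two sides only meet through the text plan and the shared frames, either stub could be swapped for a real model without touching the other, which is the independence the two-surface split is meant to provide.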