According to an April 28 announcement on the official NVIDIA blog (author Kari Briski), NVIDIA released Nemotron 3 Nano Omni, an open-source multimodal model that unifies vision, speech, and language capabilities in a single model, aiming to provide a “perception layer” for AI agent systems at lower latency and lower cost.
Key specifications: 30B-A3B MoE, 256K context, 9x throughput, first place on six benchmark leaderboards
Key architecture:
30B-A3B hybrid mixture-of-experts (total parameters 30B, activated 3B)
Integrates Conv3D and EVS encoding
256K context length
Inputs: text, images, audio, video, documents, charts, GUI screens
Outputs: text
Performance signals: 9x the throughput of other open-source omni models at a comparable level of interaction; first place on six benchmark leaderboards across three categories: document intelligence, video understanding, and audio understanding (NVIDIA’s announcement does not list specific scores and directs readers to the developer blog for details).
NVIDIA positions Nemotron 3 Nano Omni as the “eyes and ears” of agent systems. It can divide work with other models in the family, such as Nemotron 3 Super (high-frequency execution) and Nemotron 3 Ultra (complex planning), and it can also interoperate with third-party cloud models. Three typical agent application scenarios:
Computer-use agent: native visual reasoning at 1920×1080 resolution (see the sketch after this list)
Document intelligence: reasoning over mixed inputs spanning layouts, tables, screenshots, and other media
Audio/video understanding: integrates speech, scenes, and recordings into a single reasoning chain
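For the computer-use scenario, a minimal sketch of what sending a full-HD screenshot to such a model through an OpenAI-compatible endpoint could look like. The base URL, model ID (nvidia/nemotron-3-nano-omni), API key placeholder, screenshot path, and message schema are illustrative assumptions; NVIDIA’s announcement does not document the actual API.

# Hypothetical sketch: asking the model to reason over a 1920x1080 screenshot
# via an OpenAI-compatible chat endpoint. Model ID and endpoint are assumptions,
# not taken from the announcement; check build.nvidia.com for the real details.
import base64
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # assumed hosted endpoint
    api_key="YOUR_NVIDIA_API_KEY",
)

# Encode a full-HD screenshot as a base64 data URL
with open("screenshot_1920x1080.png", "rb") as f:
    image_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="nvidia/nemotron-3-nano-omni",  # hypothetical model ID
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Which button should the agent click to open Settings?"},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }],
    max_tokens=256,
)
print(response.choices[0].message.content)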
Adoption lineup: Hon Hai and Palantir among production users; H Company’s CEO gives a named statement
NVIDIA’s announcement clearly distinguishes “production adoption” from “under evaluation”:
Production adoption: Aible, Applied Scientific Intelligence (ASI), Eka Care, Hon Hai (Foxconn), H Company, Palantir, Pyler
Under evaluation: Amdocs, Dell, Docusign, Infosys, IQVIA, Lila, Oracle, Quantiphi, TCS, Zefr, etc.
H Company CEO Gautier Cloix is quoted by name in the announcement: “To build useful agents, you can’t wait seconds for a model to interpret a screen. By building on Nemotron 3 Nano Omni, our agents can rapidly interpret full HD screen recordings — something that wasn’t practical before.”
Open-source strategy and deployment: weights / datasets / training methods all made public
At the time of release, NVIDIA also published:
Model weights
Training datasets
Training techniques/methodology
Deployment pipeline covers three layers:
Local workstation: NVIDIA DGX Spark, DGX Station
NIM microservices: build.nvidia.com
Third-party platforms: Hugging Face, OpenRouter, and through 25+ NVIDIA Cloud Partners, inference platforms, and cloud service providers
Customization is handled with NVIDIA NeMo. Over the past year, the Nemotron 3 family (Nano/Super/Ultra) has accumulated more than 50 million downloads on Hugging Face; Omni now extends the family’s capabilities into the multimodal and agentic domains.
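As a starting point for local deployment, a minimal sketch of pulling the open weights from Hugging Face with huggingface_hub. The repo ID below is an assumption for illustration only; check NVIDIA’s Hugging Face organization for the actual name.

# Hypothetical sketch: downloading the released weights from Hugging Face.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="nvidia/Nemotron-3-Nano-Omni",  # hypothetical repo ID
    # token="hf_...",  # uncomment if the repo requires authentication
)
print(f"Weights downloaded to: {local_dir}")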