NVIDIA Nemotron 3 Ultra 550B A55B BF16
nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16
Open Source · chat · open-weights
Context
1M
Max output
—
Input $/1M
$0.50
Output $/1M
$2.20
Modalities
text
Released
03 Jun 2026
License: other · nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16
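The listed prices translate into per-request cost with simple arithmetic. The sketch below is a minimal illustration, assuming flat per-token billing at the listed rates, with no caching discounts or context-length tiers (the card states neither). The helper name `estimate_cost_usd` is hypothetical and not part of any provider API.

```python
from decimal import Decimal

# Listed prices from the card, in USD per 1M tokens.
INPUT_USD_PER_M = Decimal("0.50")
OUTPUT_USD_PER_M = Decimal("2.20")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request.

    Hypothetical helper: assumes flat billing at the listed rates.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (Decimal(input_tokens) * INPUT_USD_PER_M
             + Decimal(output_tokens) * OUTPUT_USD_PER_M)
    return total / TOKENS_PER_M


if __name__ == "__main__":
    # A long-context request: 200k tokens in, 4k tokens out.
    print(estimate_cost_usd(200_000, 4_000))
```

Under these assumptions, input tokens dominate the bill for long-context agent runs even though output tokens cost more per token.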
AI summary
● machine-written
NVIDIA Nemotron 3 Ultra 550B: Open-weights MoE reasoning model
NVIDIA Nemotron 3 Ultra is an open-weights frontier reasoning model with 550B total parameters and 55B active parameters, built on a hybrid Transformer-Mamba mixture-of-experts (MoE) architecture. It supports a 1M-token context window and is designed for long-running agentic workflows, including agent orchestration, coding agents, deep research, and complex enterprise tasks. It targets multi-step reasoning and planning, and is positioned for high-throughput inference in agent pipelines.
What's new
- 550B total parameters with 55B active (MoE architecture)
- 1M token context window
- Hybrid Transformer-Mamba architecture
- Suited for long-running agentic workflows and multi-step reasoning
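The parameter counts above imply two back-of-the-envelope figures: the share of parameters active at once, and the storage needed for the BF16 weights. The sketch below is a rough estimate only, assuming the nominal counts (550B and 55B) are exact and that every parameter is stored in BF16 at 2 bytes. It ignores the KV cache, Mamba state, activations, and runtime overhead, so real memory needs will be higher. The function names are hypothetical.

```python
# Nominal counts from the card; treated as exact for this estimate.
TOTAL_PARAMS = 550_000_000_000
ACTIVE_PARAMS = 55_000_000_000
BYTES_PER_BF16 = 2  # bfloat16 is a 16-bit format


def active_fraction(total: int = TOTAL_PARAMS,
                    active: int = ACTIVE_PARAMS) -> float:
    """Share of parameters that are active, per the nominal counts."""
    return active / total


def weight_bytes(params: int = TOTAL_PARAMS,
                 bytes_per_param: int = BYTES_PER_BF16) -> int:
    """Bytes needed to hold the weights alone at the given precision."""
    return params * bytes_per_param


if __name__ == "__main__":
    print(f"active fraction: {active_fraction():.0%}")
    print(f"BF16 weights: {weight_bytes() / 1e12:.1f} TB")
```

On these assumptions, about 10% of the parameters are active, while the full BF16 weights occupy roughly 1.1 TB, so the whole model must still be resident (or sharded across devices) even though only a fraction is used at a time.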
Best for
- Agent orchestration and agentic workflows
- Coding agents and deep research
- Multi-step reasoning and planning
- Complex enterprise tasks
Source: https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16