NVIDIA Nemotron 3 Ultra 550B A55B BF16

nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16
Open Source · chat · open-weights
GA
Context: 1M tokens
Max output: (not listed)
Input price (per 1M tokens): $0.50
Output price (per 1M tokens): $2.20
Modalities: text
Released: 03 Jun 2026
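
To make the listed per-token prices concrete, here is a minimal sketch that estimates the cost of a single request. The rates are the ones listed above; the helper name `estimate_cost_usd` and the token counts are hypothetical and chosen only for illustration. Actual billing may differ by provider.

```python
# Listed prices in USD per 1M tokens (taken from the listing above).
INPUT_USD_PER_M = 0.50
OUTPUT_USD_PER_M = 2.20


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: estimated USD cost of one request at the listed rates."""
    total = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return total / 1_000_000


# Made-up example: a 200k-token prompt with a 4k-token reply.
cost = estimate_cost_usd(200_000, 4_000)
print(f"Estimated cost: ${cost:.4f}")
```

At these rates, filling the full 1M-token context once costs $0.50 in input tokens before any output is generated.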
AI summary (machine-written)

NVIDIA Nemotron 3 Ultra 550B: Open-weights MoE reasoning model

NVIDIA Nemotron 3 Ultra is an open-weights frontier reasoning model with 550B total parameters, of which 55B are active per token, built on a hybrid Transformer-Mamba mixture-of-experts (MoE) architecture. It supports a 1M-token context window and is designed for long-running agentic workflows, including agent orchestration, coding agents, deep research, and complex enterprise tasks. The model is described as particularly strong at multi-step reasoning and planning, and as offering high-throughput inference for agent pipelines.

What's new
  • 550B total parameters with 55B active (MoE architecture)
  • 1M token context window
  • Hybrid Transformer-Mamba architecture
  • Suited for long-running agentic workflows and multi-step reasoning
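
The parameter counts above imply some simple back-of-the-envelope figures. The sketch below works them out, assuming only that BF16 stores each parameter in 2 bytes. It covers weights only and ignores activations, caches, and runtime overhead, so it is a lower bound rather than a deployment sizing. All variable names are illustrative.

```python
# Figures from the listing: 550B total parameters, 55B active.
TOTAL_PARAMS = 550e9
ACTIVE_PARAMS = 55e9
BYTES_PER_BF16_PARAM = 2  # BF16 is a 16-bit format

# Share of the parameters that are active in the MoE.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS

# Weights-only storage for the full BF16 checkpoint, in terabytes (10^12 bytes).
weights_tb = TOTAL_PARAMS * BYTES_PER_BF16_PARAM / 1e12

print(f"Active fraction: {active_fraction:.0%}")
print(f"BF16 weights (approx.): {weights_tb:.1f} TB")
```

In other words, about a tenth of the parameters are active, while the full set of weights still has to be stored, which comes to roughly 1.1 TB in BF16.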
Best for
  • Agent orchestration and agentic workflows
  • Coding agents and deep research
  • Multi-step reasoning and planning
  • Complex enterprise tasks
Sources

Source: https://huggingface.co/nvidia/NVIDIA-Nemotron-3-Ultra-550B-A55B-BF16