
Phi-4 Multimodal Instruct

microsoft/Phi-4-multimodal-instruct
Open Source · chat · open-weights
Status: GA
Context: 131.1K tokens
Max output: not listed
Input price: $0.05 per 1M tokens
Output price: $0.10 per 1M tokens
Modalities: text
Released: 24 Feb 2025
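At the listed rates, the cost of one request is (input tokens × $0.05 + output tokens × $0.10) / 1,000,000. Below is a minimal Python sketch of that arithmetic. The function name `estimate_cost` is hypothetical. The sketch also assumes that "131.1K" means 131,072 tokens and that the prompt and the completion share this context window. Actual billing on any given provider may round or meter differently.

```python
# Listed per-million-token prices from this page (USD).
INPUT_PRICE_PER_M = 0.05
OUTPUT_PRICE_PER_M = 0.10

# Assumption: "131.1K" is read as 131,072 tokens (128 * 1024), and the
# prompt plus the completion must fit inside it together.
CONTEXT_TOKENS = 131_072


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request at the listed rates.

    Hypothetical helper for illustration; not a provider API.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    if input_tokens + output_tokens > CONTEXT_TOKENS:
        raise ValueError("request exceeds the assumed context window")
    # Prices are quoted per million tokens, so divide the weighted sum by 1e6.
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


if __name__ == "__main__":
    # A 100,000-token prompt with a 2,000-token reply.
    print(f"${estimate_cost(100_000, 2_000):.4f}")
```

For example, a 100,000-token prompt with a 2,000-token reply comes to about $0.0052 at these rates. The prompt accounts for $0.005 of that total and the reply for $0.0002.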
AI summary (machine-written)

Microsoft releases Phi-4 Multimodal Instruct, a 5.6B-parameter open model

Phi-4 Multimodal Instruct is a 5.6-billion-parameter open-weight model from Microsoft. It accepts text, image, and audio inputs and generates text outputs. The model supports a 131K-token context window and is released under the MIT license, which permits commercial use. It was released in February 2025 and is positioned as a lightweight multimodal foundation model.

What's new
  • Supports a 131K-token context window
  • Accepts text, image, and audio inputs
  • 5.6B parameters, trained on 5 trillion tokens
  • Open weights available under the MIT license
Best for
  • Multimodal reasoning tasks combining text and images
  • Resource-constrained deployment scenarios
  • Commercial applications requiring open-weight models
Sources
  • https://huggingface.co/microsoft/Phi-4-multimodal-instruct