Phi 4 multimodal instruct

Name: Phi 4 multimodal instruct
Brand: Open Source
Author: Open Source

microsoft/Phi-4-multimodal-instruct

Open Source · chat · open-weights

GA

Context

131.1K

Max output

—

Input $/1M

$0.05

Output $/1M

$0.10

Modalities

text

Released

24 Feb 2025

License: MIT · microsoft/Phi-4-multimodal-instruct
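
The per-million-token prices listed above can be turned into a rough per-request estimate. The sketch below is illustrative only: the helper name `estimate_cost_usd` and the constant names are hypothetical, and it assumes billing is strictly linear in token counts, with no caching, batching discounts, or minimum charges.

```python
# Prices copied from the listing above (USD per 1M tokens).
INPUT_USD_PER_MILLION = 0.05
OUTPUT_USD_PER_MILLION = 0.10


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: linear cost estimate from token counts.

    Assumes no caching, discounts, or minimum charges apply.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens / 1_000_000 * INPUT_USD_PER_MILLION
    output_cost = output_tokens / 1_000_000 * OUTPUT_USD_PER_MILLION
    return input_cost + output_cost
```

Under these assumptions, a request with 100,000 input tokens and 2,000 output tokens would cost roughly $0.0052.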

AI summary

● machine-written

Microsoft releases Phi-4 Multimodal Instruct, 5.6B parameter open model

Phi-4 Multimodal Instruct is a 5.6-billion-parameter open-weight model from Microsoft that accepts text, image, and audio inputs and generates text output. It supports a 131K-token context window and is released under the MIT license, which permits commercial use. Released in February 2025, it is positioned as a lightweight multimodal foundation model.

What's new

Supports a 131K-token context window
Accepts text, image, and audio inputs
5.6B parameters, trained on 5 trillion tokens
Open weights available under the MIT license

Best for

Multimodal reasoning tasks combining text and images
Resource-constrained deployment scenarios
Commercial applications requiring open-weight models

Sources

Source: https://huggingface.co/microsoft/Phi-4-multimodal-instruct