🧀 BigCheese.ai

Social

Pixtral 12B

🧀

Mistral AI announced the release of Pixtral 12B, the first-ever multimodal model that combines a 400M parameter vision encoder with a 12B parameter multimodal decoder. It's trained on interleaved image and text data, offering excellent performance on multimodal tasks without compromising on text benchmarks. Pixtral 12B surpasses larger models with 52.5% on the MMMU reasoning benchmark and shows robust abilities in chart understanding, document question answering, and instruction following. The model is licensed under Apache 2.0 and is available to try on La Plateforme or Le Chat.

  • Pixtral 12B excels in multimodal tasks.
  • 400M vision encoder trained from scratch.
  • 12B parameter multimodal decoder.
  • Supports variable image sizes.
  • Apache 2.0 licensed.