PaliGemma: Open-Source Multimodal Model by Google


Google has released PaliGemma, an open-source multimodal vision language model (VLM) that takes images and text as input and generates text as output. It stands out from other VLMs for its object detection and segmentation capabilities. PaliGemma is designed to be fine-tuned on custom datasets, so users can optimize its performance for specific tasks.

  • Launched at Google I/O 2024
  • 3 billion parameters
  • Multilingual support
  • Fine-tuning enabled
  • Commercial use permissible
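
Because PaliGemma generates text, the object detection results mentioned above come back as a string, not as numeric boxes. The sketch below shows one way such a string could be turned into pixel coordinates. It rests on an assumption that is not stated in this article: each box is written as four `<locNNNN>` tokens in the order y_min, x_min, y_max, x_max, each binned into the range 0 to 1023, followed by a label, with detections separated by `;`. The function name `parse_detections` and the sample string are hypothetical, chosen only for illustration.

```python
import re

# Assumption: one detection = four <locNNNN> tokens (y_min, x_min, y_max, x_max),
# each a bin index in 0..1023, followed by a free-text label.
LOC_BOX = re.compile(
    r"<loc(\d{4})><loc(\d{4})><loc(\d{4})><loc(\d{4})>\s*([^;<]+)"
)


def parse_detections(text, image_width, image_height):
    """Convert a detection string into labelled pixel boxes (x0, y0, x1, y1)."""
    detections = []
    for match in LOC_BOX.finditer(text):
        # Normalize each bin index to a fraction of the image size.
        y_min, x_min, y_max, x_max = (int(v) / 1024 for v in match.groups()[:4])
        detections.append(
            {
                "label": match.group(5).strip(),
                "box": (
                    x_min * image_width,
                    y_min * image_height,
                    x_max * image_width,
                    y_max * image_height,
                ),
            }
        )
    return detections


# Hypothetical model output for a 640x480 image.
sample = "<loc0256><loc0128><loc0768><loc0512> cat ; <loc0000><loc0512><loc0512><loc1023> dog"
detections = parse_detections(sample, image_width=640, image_height=480)
for detection in detections:
    print(detection["label"], detection["box"])
```

If the token order or bin count differs in the checkpoint you use, only the regular expression and the divisor need to change.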