PaliGemma: Open-Source Multimodal Model by Google


View Website Roboflow Blog PaliGemma GitHub PaliGemma Documentation Google Model Card

Google has released PaliGemma, an open-source multimodal vision language model (VLM) that takes images and text as input and generates text as output. It stands out from other VLMs for its object detection and segmentation capabilities. PaliGemma is designed for fine-tuning on custom datasets, so users can optimize its performance for specific tasks.

Launched at Google I/O 2024
3 billion parameters
Multilingual support
Fine-tuning enabled
Commercial use permissible
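
Because PaliGemma emits its detections as text, the object detection capability mentioned above usually needs a small decoding step. The sketch below is a minimal illustration, not the official API. It assumes the commonly documented output format: four `<locNNNN>` tokens per box in the order y_min, x_min, y_max, x_max, each a bin in the range 0 to 1023, followed by a label, with detections separated by `;`. The function name `parse_detections` and the sample string are hypothetical.

```python
import re

# Assumption: each detection is four <locNNNN> tokens (y_min, x_min, y_max, x_max,
# each a bin in 0..1023) followed by a label; detections are separated by ";".
LOC_BOX = re.compile(r"((?:<loc\d{4}>){4})\s*([^;<]+)")


def parse_detections(text, image_width, image_height):
    """Convert PaliGemma-style detection text into pixel-space boxes (hypothetical helper)."""
    detections = []
    for box_tokens, label in LOC_BOX.findall(text):
        # Pull the four integer bins out of the token run.
        y_min, x_min, y_max, x_max = (
            int(v) for v in re.findall(r"<loc(\d{4})>", box_tokens)
        )
        detections.append(
            {
                "label": label.strip(),
                # Scale the 1024-bin coordinates to pixels, as (x_min, y_min, x_max, y_max).
                "box_xyxy": (
                    x_min / 1024 * image_width,
                    y_min / 1024 * image_height,
                    x_max / 1024 * image_width,
                    y_max / 1024 * image_height,
                ),
            }
        )
    return detections


if __name__ == "__main__":
    # Hypothetical model output for a 1024x512 image.
    sample = "<loc0256><loc0128><loc0768><loc0896> cat ; <loc0000><loc0000><loc0512><loc0512> dog"
    for det in parse_detections(sample, image_width=1024, image_height=512):
        print(det["label"], det["box_xyxy"])
```

The parser deliberately ignores anything that does not match the four-token pattern, so segmentation tokens or free-form text in the output are skipped rather than raising an error.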
