VLM2Vec
VLM2Vec is an open-source research tool that trains Vision-Language Models for massive multimodal embedding tasks. It provides a unified framework for images, videos, and visual documents.
About
VLM2Vec is an open-source project from TIGER-AI-Lab that provides a unified framework for training and evaluating multimodal embedding models across images, videos, and visual documents. It introduces MMEB-V2, a benchmark of 78 tasks designed to evaluate embedding models systematically across these modalities. VLM2Vec-V2 is reported to set a new state of the art, outperforming strong baselines. Training and evaluation are configured through YAML files, and new datasets can be added to extend the framework. The models are built on Vision-Language Models such as Qwen2-VL and use instruction-guided contrastive training to produce fixed-dimensional embeddings for each supported input type.
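To illustrate what contrastive training over fixed-dimensional embeddings means, here is a minimal sketch of an InfoNCE-style loss with in-batch negatives, written in plain Python. It is not taken from the VLM2Vec codebase: the function names, the temperature value, and the choice of this particular loss are assumptions made for illustration. In the real pipeline the embeddings would come from the Vision-Language Model after it has read the task instruction together with the input.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce_loss(query_embs, target_embs, temperature=0.05):
    """Mean InfoNCE loss over a batch (hypothetical helper, not the project's API).

    target_embs[i] is treated as the positive for query_embs[i]; every other
    target in the batch acts as a negative. The temperature is an assumed value.
    """
    queries = [l2_normalize(q) for q in query_embs]
    targets = [l2_normalize(t) for t in target_embs]
    total = 0.0
    for i, q in enumerate(queries):
        # Cosine similarity of this query against every target, scaled by temperature.
        logits = [sum(a * b for a, b in zip(q, t)) / temperature for t in targets]
        # Numerically stable log-sum-exp for the softmax denominator.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(s - max_logit) for s in logits)
        )
        # Cross-entropy with the matching target as the correct class.
        total += log_denominator - logits[i]
    return total / len(queries)


# Toy batch: two queries whose positives point in the same direction.
queries = [[1.0, 0.0], [0.0, 1.0]]
loss_aligned = info_nce_loss(queries, [[2.0, 0.0], [0.0, 3.0]])
loss_swapped = info_nce_loss(queries, [[0.0, 3.0], [2.0, 0.0]])
```

The loss is small when each query sits closest to its own target and large when the pairs are swapped, which is the signal that pulls matching query and target embeddings together during training.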
Pricing & Plans
Open Source
Free