LaVIT
LaVIT is an open-source research and education tool that empowers large language models to understand and generate visual content. It provides a unified framework for visual understanding and generation.
About
LaVIT and Video-LaVIT are multi-modal large language models that can both understand and generate visual content. The project introduces a unified framework for visual understanding and generation, built on the pre-training strategy it proposes. The core design pairs a visual tokenizer, which translates non-linguistic visual content (images and videos) into discrete tokens that an LLM can read, with a detokenizer, which recovers continuous visual signals from the tokens the model generates. After pre-training, LaVIT and Video-LaVIT can read image and video content, generate captions, answer questions, and perform text-to-image, text-to-video, and image-to-video generation, including generation from multi-modal prompts.
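To make the tokenizer and detokenizer idea concrete, here is a toy sketch in Python. It is not LaVIT's implementation: it assumes a plain nearest-neighbor codebook lookup (vector quantization) as a stand-in for the visual tokenizer, and a simple table lookup as a stand-in for the detokenizer. The codebook values, the patch features, and the function names `tokenize` and `detokenize` are all hypothetical and chosen only for illustration.

```python
# Toy illustration only: a hand-made 2-D codebook stands in for a learned one.
# Each entry's position in the list is the discrete token id an LLM would see.
CODEBOOK = [
    [0.0, 0.0],  # token 0
    [1.0, 0.0],  # token 1
    [0.0, 1.0],  # token 2
    [1.0, 1.0],  # token 3
]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def tokenize(features, codebook):
    """Map each continuous feature vector to the id of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda i: squared_distance(f, codebook[i]))
        for f in features
    ]


def detokenize(token_ids, codebook):
    """Map token ids back to continuous vectors (a lossy reconstruction)."""
    return [list(codebook[i]) for i in token_ids]


# Hypothetical continuous features for four image patches.
patches = [[0.1, -0.05], [0.9, 0.2], [0.2, 0.8], [0.95, 1.1]]

token_ids = tokenize(patches, CODEBOOK)
reconstructed = detokenize(token_ids, CODEBOOK)

print(token_ids)      # [0, 1, 2, 3]
print(reconstructed)  # the codebook vectors nearest to each patch
```

The round trip is lossy: each patch comes back as its nearest codebook vector rather than its original value. That is the trade-off any discrete tokenization makes in exchange for giving the language model a finite vocabulary to read and generate.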
Pricing & Plans
Open Source
Free