Macaw-LLM
Macaw-LLM is an open-source research tool for multi-modal language modeling. It integrates image, video, audio, and text data, built upon CLIP, Whisper, and LLaMA foundations.
About
Macaw-LLM is an exploratory open-source project for multi-modal language modeling that combines image, video, audio, and text data in a single model. It builds on CLIP, Whisper, and LLaMA to integrate these data types. Key features include a simple and fast alignment of multi-modal features to the LLM's embeddings, one-stage instruction fine-tuning, and a newly created multi-modal instruction dataset covering the image and video modalities. The architecture uses CLIP to encode images and video, Whisper to encode audio, and LLaMA (or Vicuna or BLOOM) as the core language model. The tool is aimed at researchers and developers who want to explore and advance multi-modal AI.
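The listing describes the architecture only at a high level: modality encoders (CLIP for images and video, Whisper for audio) produce features that are aligned to the language model's embeddings before the LLM sees them. The toy, pure-Python sketch that follows illustrates that data flow under stated assumptions; it is not Macaw-LLM's code. It assumes each encoder feature is linearly projected to the LLM width, then used as a query that attends over the LLM's token-embedding table, and that the aligned multi-modal tokens are simply prepended to the text embeddings. The function names (`align_to_embeddings`, `build_llm_input`) and all dimensions are hypothetical.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def matvec(weight: Matrix, x: Vector) -> Vector:
    """Multiply a (rows x cols) weight matrix by a length-cols vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def align_to_embeddings(features: List[Vector], proj: Matrix,
                        vocab_emb: Matrix) -> List[Vector]:
    """Map encoder features (e.g. from CLIP or Whisper) into the LLM embedding space.

    Assumption (not taken from Macaw-LLM's source): each feature is linearly
    projected to the LLM width, then attends over the LLM token-embedding
    table, so every aligned vector is a weighted average of word embeddings.
    """
    d = len(vocab_emb[0])
    aligned = []
    for f in features:
        q = matvec(proj, f)  # encoder dim -> LLM embedding dim
        # Scaled dot-product scores against every token embedding.
        scores = [sum(qi * ei for qi, ei in zip(q, e)) / math.sqrt(d)
                  for e in vocab_emb]
        weights = softmax(scores)
        # Weighted sum of token embeddings, one output component at a time.
        aligned.append([sum(w * e[j] for w, e in zip(weights, vocab_emb))
                        for j in range(d)])
    return aligned


def build_llm_input(modal_tokens: List[List[Vector]],
                    text_emb: List[Vector]) -> List[Vector]:
    """Prepend aligned multi-modal tokens (in the given order) to text embeddings."""
    sequence: List[Vector] = []
    for tokens in modal_tokens:
        sequence.extend(tokens)
    sequence.extend(text_emb)
    return sequence
```

For example, one image feature and two audio features, each aligned against a two-token vocabulary, plus one text token yield a four-vector input sequence for the LLM. Attending over existing word embeddings keeps aligned vectors inside the region of embedding space the frozen LLM already understands; a plain learned linear projection is a simpler alternative that drops that constraint.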
Pricing & Plans
Open Source
Free