Mini-Omni2
Mini-Omni2 is an open-source AI model that provides omni-interactive capabilities, understanding image, audio, and text inputs. It enables end-to-end voice conversations with real-time voice output.
About
Mini-Omni2 is an open-source, omni-interactive AI model designed to offer capabilities similar to GPT-4o, including vision, speech, and duplex interaction. It understands image, audio, and text inputs and holds end-to-end voice conversations with users. Its key features are real-time voice output and an interruption mechanism that lets the user cut in while the model is speaking, which makes the interaction more flexible.
To handle multiple modalities, the model concatenates image, audio, and text features so that a single model can perform tasks across all three, and it uses text-guided delayed parallel output to produce speech responses in real time. Training proceeds in multiple stages: encoder adaptation, modal alignment, and multimodal fine-tuning.
The model is currently trained only on English. Because it uses Whisper for audio encoding, it can understand other languages that Whisper supports, but its output is always in English.
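This page does not describe how text-guided delayed parallel output is implemented. The Python sketch below illustrates the general idea under stated assumptions: at each decoding step the model emits one text token plus one token for each of several audio-codebook streams, and each audio stream is shifted one step further to the right, so speech tokens trail the text that guides them. The number of audio streams, the one-step-per-layer offsets, the PAD id of -1, and the helper names `apply_delay` and `remove_delay` are all hypothetical and are not taken from Mini-Omni2's code.

```python
from typing import List, Tuple

PAD = -1  # hypothetical padding id for slots not yet (or no longer) filled


def apply_delay(text: List[int], audio: List[List[int]], pad: int = PAD) -> List[List[int]]:
    """Lay out one text stream and L audio streams for parallel decoding.

    Row 0 holds the text tokens with no delay. Row k (k >= 1) holds audio
    layer k-1 shifted right by k steps. Column t is what the decoder would
    emit at step t: one token per row, with audio trailing text.
    """
    num_layers = len(audio)
    # Total steps needed so the most-delayed audio layer fits completely.
    length = max([len(text)] + [len(layer) + k + 1 for k, layer in enumerate(audio)])
    grid = [[pad] * length for _ in range(num_layers + 1)]
    for t, tok in enumerate(text):
        grid[0][t] = tok
    for k, layer in enumerate(audio):
        delay = k + 1  # assumed offset: layer k starts k+1 steps after the text
        for t, tok in enumerate(layer):
            grid[k + 1][t + delay] = tok
    return grid


def remove_delay(grid: List[List[int]], num_text: int, num_audio: int) -> Tuple[List[int], List[List[int]]]:
    """Undo apply_delay: recover the text tokens and the aligned audio layers."""
    text = grid[0][:num_text]
    audio = [row[k + 1 : k + 1 + num_audio] for k, row in enumerate(grid[1:])]
    return text, audio


if __name__ == "__main__":
    # Placeholder token ids: 3 text tokens and 2 audio layers of 3 tokens each.
    grid = apply_delay([10, 11, 12], [[100, 101, 102], [200, 201, 202]])
    for step, column in enumerate(zip(*grid)):
        print(step, column)  # step 2 prints: 2 (12, 101, 200)
```

Each column of the grid is one decoding step. Because the text row leads, the audio tokens for a position are produced only after the corresponding text token, which is one way to read "text-guided"; the delay also lets all streams be emitted in parallel within a single step instead of one after another.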
Pricing & Plans
Open Source
Free