MMaDA
MMaDA is an open-source tool in the AI Frameworks & Infra category that provides multimodal large diffusion language models. It targets textual reasoning, multimodal understanding, and text-to-image generation.
About
MMaDA is an open-source family of multimodal diffusion foundation models built for textual reasoning, multimodal understanding, and text-to-image generation. It uses a unified diffusion architecture with a shared probabilistic formulation and a modality-agnostic design, so it needs no modality-specific components. It is fine-tuned with a mixed long chain-of-thought (CoT) strategy that applies a single CoT format across modalities. It also introduces UniGRPO, a unified policy-gradient-based RL algorithm aimed at consistent improvements across both reasoning and generation tasks. The project publishes checkpoints such as MMaDA-8B-Base and MMaDA-8B-MixCoT, which cover capabilities ranging from basic text and image generation to complex textual and multimodal reasoning.
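The description above does not spell out how UniGRPO works. As a rough illustration of the general idea behind group-relative policy-gradient methods (the family the name suggests), the sketch below normalizes each sampled response's reward against the other responses drawn for the same prompt. The function name, the reward values, and the choice of population standard deviation are assumptions made for illustration only; this is not MMaDA's implementation.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each reward relative to its own sampling group.

    Hypothetical helper: subtracts the group mean and divides by the
    group (population) standard deviation, with eps guarding against
    division by zero when all rewards are identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four responses sampled from the same prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Responses that score above their group's average receive a positive advantage and those below receive a negative one, so a policy-gradient update can favor the better samples without a separately learned value model.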
Pricing & Plans
Open Source
Free