What is the primary goal of the minimind project?
The minimind project aims to let individuals train a small-parameter GPT model from scratch at minimal cost and in minimal time. It provides a complete, understandable, and reproducible open-source framework for LLM development, so that the full training process is within reach of people without large compute budgets.
What kind of models can be trained using minimind?
minimind supports training several kinds of small language models: 64M-parameter GPTs, MoE models, and experimental variants such as MiniMind-V (vision multimodal), MiniMind-dLM (diffusion language model), and MiniMind-Linear (linear attention model).
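To give a feel for what a parameter budget in the tens of millions means, here is a rough sketch that counts the weights of a generic decoder-only transformer. The layer layout (four attention projections, a gated three-matrix feed-forward block, two norm vectors per layer, no biases) and the function name `estimate_decoder_params` are assumptions made for illustration. They are not taken from the minimind code, and the example configuration is hypothetical.

```python
def estimate_decoder_params(vocab_size, hidden_size, num_layers,
                            ffn_hidden_size, tie_embeddings=True):
    """Rough parameter count for a generic decoder-only transformer (no biases)."""
    embedding = vocab_size * hidden_size
    attention = 4 * hidden_size * hidden_size          # Q, K, V and output projections
    feed_forward = 3 * hidden_size * ffn_hidden_size   # gated FFN: gate, up, down
    norms = 2 * hidden_size                            # two norm weight vectors per layer
    per_layer = attention + feed_forward + norms
    total = embedding + num_layers * per_layer + hidden_size  # plus the final norm
    if not tie_embeddings:
        total += vocab_size * hidden_size              # separate output head
    return total


# Hypothetical configuration, chosen only to illustrate the order of magnitude.
params = estimate_decoder_params(vocab_size=6400, hidden_size=768,
                                 num_layers=8, ffn_hidden_size=2048)
print(f"approx. {params / 1e6:.1f}M parameters")
```

With a small vocabulary, most of the budget goes to the transformer layers rather than the embedding table, which is one reason small models often pair a compact tokenizer with tied embeddings.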
Does minimind support different training stages for LLMs?
Yes. minimind covers the full LLM development pipeline: pre-training, supervised fine-tuning (SFT), LoRA, RLHF (DPO), RLAIF (PPO/GRPO/CISPO), Tool Use, Agentic RL, adaptive thinking, and model distillation.
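As an illustration of one of the stages listed above, the following is a minimal sketch of the standard DPO objective for a single preference pair, written in plain Python. It is not minimind's implementation. The function name `dpo_loss`, its argument names, and the default `beta=0.1` are assumptions chosen for the example. The inputs are summed log-probabilities of the chosen and rejected responses under the policy being trained and under a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """DPO loss for one preference pair: -log(sigmoid(beta * margin_difference))."""
    # How much more the policy likes each response than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp
    logits = beta * (chosen_margin - rejected_margin)
    # -log(sigmoid(x)) == log(1 + exp(-x)), computed in a numerically stable form.
    return max(-logits, 0.0) + math.log1p(math.exp(-abs(logits)))


# When the policy equals the reference, the loss is log(2).
baseline = dpo_loss(-5.0, -5.0, -5.0, -5.0)
# When the policy favors the chosen response more than the reference does, the loss drops.
improved = dpo_loss(-1.0, -3.0, -2.0, -2.0)
print(baseline, improved)
```

The loss only depends on how the policy's preference between the two responses differs from the reference model's, which is why DPO needs no separate reward model.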
What are the hardware requirements for training with minimind?
The project emphasizes low-cost training: the SFT stage takes about 2 hours on a single NVIDIA 3090 GPU and costs around 3 RMB. Single-GPU and multi-GPU setups are both supported, but a single 24GB GPU such as the NVIDIA 3090 is sufficient for rapid reproduction.
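The figures quoted above (about 2 hours, around 3 RMB) imply a GPU rental rate that can be used to budget longer runs. The sketch below assumes cost scales linearly with GPU-hours, which is a simplification: real rental prices vary by provider and over time. The helper names are hypothetical.

```python
def implied_hourly_rate(total_cost_rmb, hours):
    """Rental rate implied by a total cost and a run duration."""
    return total_cost_rmb / hours


def estimate_cost(hours, hourly_rate_rmb, num_gpus=1):
    """Estimated cost of a run, assuming linear scaling with GPU-hours."""
    return hours * hourly_rate_rmb * num_gpus


rate = implied_hourly_rate(3.0, 2.0)   # 1.5 RMB per GPU-hour
ten_hour_run = estimate_cost(10, rate)  # 15.0 RMB on a single GPU
print(rate, ten_hour_run)
```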
Is minimind compatible with existing LLM frameworks and tools?
Yes, minimind is designed for compatibility. It works with mainstream frameworks such as transformers, trl, and peft, as well as popular inference engines like llama.cpp, vllm, and ollama. It also supports visualization tools like wandb and swanlab.