XTuner

XTuner is a next-generation training engine built for ultra-large Mixture of Experts (MoE) models. It offers scalable, efficient training with long-sequence support, and works with dense, MoE, and multimodal models.

At a glance

Pricing: Open Source
Free tier: Yes
API:
Skill level: Technical

About

What is XTuner?

XTuner is a next-generation LLM training engine designed for ultra-large-scale MoE models. Unlike traditional 3D parallel training architectures, XTuner V1 is optimized for mainstream MoE training scenarios: it can train 200B-scale MoE models without expert parallelism, and 600B models with only intra-node expert parallelism. Its memory-efficient design supports long sequences, allowing 200B MoE models to train at 64k sequence length. It supports MoE training up to 1T parameters and reports FSDP training throughput that exceeds traditional 3D parallel schemes for MoE models above 200B scale. It also integrates with inference frameworks such as LMDeploy, vLLM, and SGLang.
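
For a rough sense of why fully sharded (FSDP-style) training matters at this scale, the sketch below estimates the per-GPU memory needed for model state when weights, gradients, and optimizer state are split evenly across devices. The 16 bytes per parameter (bf16 weights and gradients plus fp32 master weights and two Adam moments) and the GPU counts are illustrative assumptions, not XTuner's actual configuration or measured figures. Activations are ignored.

```python
def sharded_state_gib(num_params: float, num_gpus: int,
                      bytes_per_param: int = 2 + 2 + 12) -> float:
    """Per-GPU GiB of model state when it is sharded evenly across num_gpus.

    bytes_per_param is an assumed mixed-precision Adam budget:
    2 (bf16 weight) + 2 (bf16 gradient) + 12 (fp32 master weight and two moments).
    """
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    total_bytes = num_params * bytes_per_param
    return total_bytes / num_gpus / 2**30


if __name__ == "__main__":
    # Hypothetical cluster sizes for a 200B-parameter model.
    for gpus in (64, 128, 256):
        print(f"{gpus} GPUs: {sharded_state_gib(200e9, gpus):.1f} GiB per GPU")
```

Doubling the number of GPUs halves the per-GPU model state in this idealized model. Real runs also need memory for activations and communication buffers.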

Best used for

Ideal for developers and data scientists who need to train ultra-large Mixture of Experts (MoE) models, handle long sequences efficiently, and maximize training throughput. It is especially useful for researchers and engineers working with cutting-edge LLM architectures.

Common actions

train large language models

fine-tune MoE models

optimize model training

scale AI algorithms

Capabilities

Key features

Dropless Training
Long Sequence Support
Superior Efficiency
Multimodal Pre-training
Multimodal Supervised Fine-tuning
GRPO algorithm

Target Audience

developer, data scientist

Integrations

LMDeploy, vLLM, SGLang

Pricing & Plans

Open Source

Free

FAQs

What kinds of models does XTuner primarily support?

XTuner is designed for ultra-large-scale Mixture of Experts (MoE) models and is optimized for mainstream MoE training scenarios. Supported models include Intern S1, Intern VL, Qwen3 Dense, Qwen3 MoE, GPT OSS, DeepSeek V3, and KIMI K2.

How does XTuner handle long sequence training?

XTuner's memory-efficient design allows 200B MoE models to train at 64k sequence length without sequence parallelism. It also fully supports DeepSpeed Ulysses sequence parallelism, which lets the maximum sequence length scale linearly.
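
As a back-of-the-envelope illustration of linear scaling, the sketch below multiplies a per-rank token budget by the sequence-parallel degree. It assumes the scaling is linear in the number of sequence-parallel ranks, and it applies the Ulysses-style requirement that the attention head count be divisible by that degree. The function name, the 64-head model, and the resulting lengths are hypothetical, not measured XTuner results.

```python
def max_seq_len(per_rank_tokens: int, sp_size: int, num_heads: int) -> int:
    """Idealized maximum sequence length under Ulysses-style sequence parallelism.

    Each of the sp_size ranks holds per_rank_tokens tokens of the sequence.
    Attention heads are split across ranks, so num_heads must divide evenly.
    """
    if sp_size < 1:
        raise ValueError("sp_size must be at least 1")
    if num_heads % sp_size != 0:
        raise ValueError("num_heads must be divisible by sp_size")
    return per_rank_tokens * sp_size


if __name__ == "__main__":
    # 64k tokens per rank mirrors the no-sequence-parallelism figure above;
    # 64 attention heads is an assumed value for illustration only.
    for sp in (1, 2, 4, 8):
        print(f"sp_size={sp}: {max_seq_len(64 * 1024, sp, num_heads=64)} tokens")
```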

What are the key performance advantages of XTuner?

XTuner supports MoE training up to 1T parameters and is the first to achieve FSDP training throughput that surpasses traditional 3D parallel schemes for MoE models above 200B scale. It is also optimized for the Ascend A3 Supernode.
