VoxCPM


VoxCPM is an Audio & Music tool that offers tokenizer-free Text-to-Speech for multilingual speech generation, creative voice design, and true-to-life cloning. It supports 30 languages and outputs 48kHz studio-quality audio.


At a glance

Pricing: Open Source
Free tier: Yes
API: Yes
Skill level: Technical

About

What is VoxCPM?

VoxCPM is a tokenizer-free Text-to-Speech (TTS) system developed by OpenBMB for natural, expressive speech synthesis. Instead of converting speech into discrete tokens, it generates continuous speech representations directly through an end-to-end diffusion autoregressive architecture. The latest version, VoxCPM2, is a 2B-parameter model trained on more than 2 million hours of multilingual speech data and supports 30 languages. Key features include Voice Design, which creates a new voice from a natural-language description, and Controllable Voice Cloning, which clones a voice from a short reference clip with optional style guidance. Ultimate Cloning is designed to reproduce every vocal nuance of the reference, and the model outputs 48kHz studio-quality audio. VoxCPM2 is fully open-source under the Apache-2.0 license, so it can be used commercially at no cost, and it supports real-time streaming with a low real-time factor (RTF).

Best used for

Ideal for content creators and podcasters who need to generate realistic multilingual speech, design unique voices from descriptions, and clone voices with precise control over style and emotion. Especially valuable for those requiring high-fidelity 48kHz audio output and real-time streaming capabilities for diverse audio projects.

Common actions

generate speech

design voices

clone voices

synthesize multilingual audio

stream audio

collaboration, open-source, workflows, low-code/no-code, AI Agents, github copilot, deepfake, face swapping, automated workflow

Capabilities

Key features

30-language multilingual TTS
Voice design from description
Controllable voice cloning
Ultimate voice cloning
48kHz studio-quality audio
Real-time streaming
Context-aware synthesis

Target Audience

content creator, podcaster

Integrations

Not yet documented

Pricing & Plans

Open Source

Free

FAQs

What are the key differences between VoxCPM2, VoxCPM1.5, and VoxCPM-0.5B?

VoxCPM2 is the latest stable release: 2B parameters, 30 languages, 48kHz audio, and advanced voice design and cloning. VoxCPM1.5 is a legacy 0.6B-parameter model that supports 2 languages and outputs 44.1kHz audio. VoxCPM-0.5B is an older 0.5B-parameter model, also supporting 2 languages, with 16kHz output. VoxCPM2 offers the most complete feature set and the highest audio quality.
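The sample rates above translate directly into audio bandwidth: by the Nyquist theorem, audio sampled at rate fs can represent frequencies up to fs/2. The short sketch below compares the three releases using only the figures quoted in this FAQ; the dictionary and helper names are illustrative and are not part of any VoxCPM API.

```python
# Specs as quoted in the FAQ (parameter count, languages, output sample rate in Hz).
# This dict is a hypothetical illustration, not a VoxCPM data structure.
VOXCPM_RELEASES = {
    "VoxCPM2": {"params": "2B", "languages": 30, "sample_rate_hz": 48_000},
    "VoxCPM1.5": {"params": "0.6B", "languages": 2, "sample_rate_hz": 44_100},
    "VoxCPM-0.5B": {"params": "0.5B", "languages": 2, "sample_rate_hz": 16_000},
}


def nyquist_bandwidth_hz(sample_rate_hz: int) -> float:
    """Highest frequency representable at a given sample rate (fs / 2)."""
    return sample_rate_hz / 2


def samples_for(duration_s: float, sample_rate_hz: int) -> int:
    """Number of samples per channel needed for a clip of the given length."""
    return round(duration_s * sample_rate_hz)


if __name__ == "__main__":
    for name, spec in VOXCPM_RELEASES.items():
        sr = spec["sample_rate_hz"]
        print(f"{name}: up to {nyquist_bandwidth_hz(sr):.0f} Hz, "
              f"{samples_for(10, sr)} samples per 10 s clip")
```

In other words, 16kHz output cannot carry anything above 8kHz, whereas 44.1kHz and 48kHz output retain content up to roughly 22kHz and 24kHz respectively.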

Can VoxCPM2 be used for commercial projects?

Yes, VoxCPM2 is fully open-source and released under the Apache-2.0 license. This license permits free use for commercial purposes, making it suitable for integration into commercial applications and services without licensing fees.

What are the system requirements for running VoxCPM?

To run VoxCPM, you need Python ≥ 3.10 and < 3.13, PyTorch ≥ 2.5.0, and CUDA ≥ 12.0. For the best performance and real-time streaming, an NVIDIA RTX 4090 GPU is recommended, particularly for production deployments that use Nano-vLLM or vLLM-Omni.
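The version bounds above can be checked before installing. The following is a minimal, standard-library-only sketch; `parse_version` and `check_requirements` are hypothetical helpers, and they only compare the version strings you pass in. They do not probe the GPU or verify the RTX 4090 recommendation.

```python
from __future__ import annotations

import sys


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '2.5.1+cu121' or '12.4' into a comparable tuple such as (2, 5, 1)."""
    core = text.split("+")[0]  # drop a local build suffix such as '+cu121'
    parts: list[int] = []
    for piece in core.split("."):
        digits = ""
        for ch in piece:  # keep leading digits only, e.g. '0a0' -> '0'
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    while len(parts) < 3:  # pad so '12.0' compares equal to (12, 0, 0)
        parts.append(0)
    return tuple(parts)


def check_requirements(python: tuple[int, int], torch_version: str,
                       cuda_version: str | None) -> list[str]:
    """Return a list of problems; an empty list means the stated bounds are met."""
    problems = []
    if not ((3, 10) <= tuple(python) < (3, 13)):
        problems.append(f"Python {python[0]}.{python[1]} is outside >=3.10,<3.13")
    if parse_version(torch_version) < (2, 5, 0):
        problems.append(f"PyTorch {torch_version} is older than 2.5.0")
    if cuda_version is None:
        problems.append("no CUDA version reported (CPU-only build?)")
    elif parse_version(cuda_version) < (12, 0, 0):
        problems.append(f"CUDA {cuda_version} is older than 12.0")
    return problems


if __name__ == "__main__":
    # Example values; in a real environment you could pass torch.__version__
    # and torch.version.cuda (None on CPU-only builds) instead.
    issues = check_requirements(sys.version_info[:2], "2.5.1+cu121", "12.1")
    print("OK" if not issues else "\n".join(issues))
```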

How does VoxCPM2 achieve voice design from a natural language description?

VoxCPM2 allows users to create a new voice by providing a natural-language description (e.g., "A young woman, gentle and sweet voice"). The model then synthesizes speech based on this description, eliminating the need for reference audio to generate a unique voice profile.

Does VoxCPM2 support real-time streaming for speech generation?

Yes, VoxCPM2 supports real-time streaming with a Real-Time Factor (RTF) as low as ~0.3 on an NVIDIA RTX 4090. This can be further accelerated to ~0.13 using Nano-vLLM or vLLM-Omni, enabling efficient and responsive speech synthesis for live applications.
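Real-time factor is the ratio of synthesis time to the duration of the audio produced, so any value below 1.0 means speech is generated faster than it plays back. The sketch below turns the RTF figures quoted above into wall-clock estimates; the helper names are illustrative, and measured RTF will vary with hardware, text length, and settings.

```python
def real_time_factor(synthesis_seconds: float, audio_seconds: float) -> float:
    """RTF = time spent synthesizing / duration of audio produced (< 1.0 is faster than real time)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return synthesis_seconds / audio_seconds


def synthesis_time(audio_seconds: float, rtf: float) -> float:
    """Expected wall-clock time to generate `audio_seconds` of speech at a given RTF."""
    return audio_seconds * rtf


if __name__ == "__main__":
    clip = 60.0  # one minute of speech
    # RTF values as quoted in the FAQ for an NVIDIA RTX 4090.
    for label, rtf in [("baseline", 0.30), ("Nano-vLLM / vLLM-Omni", 0.13)]:
        print(f"{label}: ~{synthesis_time(clip, rtf):.1f} s for {clip:.0f} s of audio")
    print(f"speedup: ~{0.30 / 0.13:.1f}x")
```

At an RTF of ~0.3, one minute of speech takes roughly 18 seconds to generate; at ~0.13 it takes under 8 seconds, about a 2.3x speedup.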


Also listed in

This tool also appears in

Research & Education › Academic Research
AI Agents & Automation › AI Frameworks & Infra
Coding & Development › Open Source & Models
AI Agents & Automation › Voice Agents
