Zonos
Zonos is an open-weight text-to-speech model that delivers expressive, high-quality speech synthesis. It generates natural speech from text prompts and supports voice cloning.
About
Zonos-v0.1 is an open-weight text-to-speech model trained on more than 200,000 hours of varied multilingual speech. Its expressiveness and quality are described as on par with, or even surpassing, top TTS providers. Given a speaker embedding or an audio prefix, the model generates highly natural speech from text prompts, and it can clone a voice accurately from just a few seconds of reference audio. Zonos offers fine-grained control over speaking rate, pitch variation, audio quality, and emotions such as happiness, fear, sadness, and anger. It supports English, Japanese, Chinese, French, and German, and outputs speech natively at 44 kHz. The model runs with a real-time factor of about 2x on an RTX 4090 and includes a Gradio WebUI for easy use.
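To make the throughput figures concrete, here is a minimal arithmetic sketch. It assumes that a real-time factor of about 2x means audio is produced roughly twice as fast as it plays back, and it uses the nominal 44 kHz output rate quoted in this listing. The function name `estimate_generation` and its parameters are hypothetical and are not part of the Zonos codebase.

```python
def estimate_generation(audio_seconds: float,
                        speedup: float = 2.0,
                        sample_rate_hz: int = 44_000) -> dict:
    """Rough estimate of output size and wall-clock time for a clip.

    Assumptions (not confirmed by the Zonos project):
    - `speedup` is the ratio of audio duration to generation time,
      so 2.0 means 10 s of speech takes about 5 s to generate.
    - `sample_rate_hz` is the nominal 44 kHz figure from the listing.
    """
    if audio_seconds < 0:
        raise ValueError("audio_seconds must be non-negative")
    if speedup <= 0:
        raise ValueError("speedup must be positive")

    return {
        # Number of mono audio samples in the generated clip.
        "samples": round(audio_seconds * sample_rate_hz),
        # Estimated wall-clock time needed to generate the clip.
        "generation_seconds": audio_seconds / speedup,
    }


if __name__ == "__main__":
    estimate = estimate_generation(10.0)
    print(estimate["samples"], estimate["generation_seconds"])
```

Under these assumptions, a 10-second clip corresponds to 440,000 samples and takes about 5 seconds to generate. Actual speed depends on hardware, text length, and model variant.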
Pricing & Plans
Open Source
Free