WhisperSpeech
WhisperSpeech is an open-source text-to-speech (TTS) system that inverts OpenAI's Whisper model. It aims to be a powerful, hackable, and commercially safe speech generation tool.
About
WhisperSpeech is an open-source text-to-speech (TTS) system built by inverting OpenAI's Whisper speech recognition model. The project aims to be for speech what Stable Diffusion is for images: powerful, hackable, and commercially safe. Speech generation is fast, with project updates reporting 12x faster-than-real-time performance on consumer hardware, and one-click voice cloning is included. The code is released under an Apache-2.0 / MIT license, and the models are trained exclusively on properly licensed data. Current releases cover English (LibreLight); multilingual output is planned for future releases. The system follows a two-stage, token-based architecture similar to AudioLM and MusicGen, using Whisper for semantic tokens and transcription, EnCodec for waveform tokenization, and Vocos for high-fidelity audio.
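The token-based data flow described above can be pictured with a small sketch. It does not call the WhisperSpeech library. The stage names, the `Stage` class, and the helpers `trace` and `speedup_over_real_time` are hypothetical labels chosen for illustration, and the way the components are chained here (text to Whisper-derived semantic tokens, semantic tokens to EnCodec acoustic tokens, then Vocos producing the waveform) is an assumption drawn from the description rather than a statement of the project's actual code.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Stage:
    """One step of the pipeline: what it takes in and what it hands on."""
    name: str
    consumes: str
    produces: str


# Hypothetical stage layout; names are labels for this sketch only.
PIPELINE: List[Stage] = [
    Stage("text-to-semantic", "text", "semantic tokens"),                 # Whisper-derived tokens
    Stage("semantic-to-acoustic", "semantic tokens", "acoustic tokens"),  # EnCodec tokens
    Stage("vocoder", "acoustic tokens", "waveform"),                      # Vocos decoding
]


def trace(pipeline: List[Stage], start: str = "text") -> List[str]:
    """Return the chain of representations, checking that each stage
    consumes exactly what the previous one produced."""
    chain = [start]
    for stage in pipeline:
        if stage.consumes != chain[-1]:
            raise ValueError(f"{stage.name} expects {stage.consumes}, got {chain[-1]}")
        chain.append(stage.produces)
    return chain


def speedup_over_real_time(audio_seconds: float, generation_seconds: float) -> float:
    """Seconds of audio produced per second of compute."""
    if generation_seconds <= 0:
        raise ValueError("generation_seconds must be positive")
    return audio_seconds / generation_seconds


if __name__ == "__main__":
    print(" -> ".join(trace(PIPELINE)))
    # A 12x figure would mean, for example, 60 s of speech generated in 5 s.
    print(speedup_over_real_time(60.0, 5.0))
```

Read this way, a "12x faster-than-real-time" figure means that one second of compute yields roughly twelve seconds of audio; the 60-second example is arithmetic only, not a measured benchmark.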
Pricing & Plans
Open Source
Free