whisper.cpp is an open-source AI tool that provides high-performance automatic speech recognition (ASR) using OpenAI's Whisper model. It offers a plain C/C++ implementation optimized for various platforms, including Apple Silicon, Windows, and Linux.
whisper.cpp is a high-performance, open-source C/C++ port of OpenAI's Whisper automatic speech recognition (ASR) model. It is a plain C/C++ implementation with minimal dependencies, which makes it highly portable. It is optimized for several architectures: Apple Silicon (ARM NEON, the Accelerate framework, Metal, and Core ML), x86 (AVX intrinsics), and POWER (VSX intrinsics). It supports mixed F16/F32 precision and integer quantization, and it performs zero memory allocations at runtime. Hardware acceleration is available for NVIDIA GPUs, Vulkan-capable GPUs, Intel hardware via OpenVINO, Ascend NPUs, and Moore Threads GPUs. It also includes Voice Activity Detection (VAD) and a C-style API, which eases integration into applications on Mac OS, iOS, Android, Linux, WebAssembly, Windows, and Raspberry Pi, as well as from Java.
Best used for
Ideal for developers and engineers who need to add robust, high-performance automatic speech recognition to their applications, process audio offline on a range of devices, or use GPU acceleration for faster transcription. It is especially valuable for building custom voice assistants and integrating ASR into embedded systems.
developers, machine learning engineers, embedded systems engineers
Integrations
Not yet documented
Pricing & Plans
Open Source
Free
FAQs
What platforms does whisper.cpp support?
whisper.cpp runs on Mac OS (Intel and Arm), iOS, Android, Linux, FreeBSD, WebAssembly, Windows (MSVC and MinGW), and Raspberry Pi, and it can also be used from Java.
Can whisper.cpp utilize GPU acceleration?
Yes. whisper.cpp supports GPU acceleration on NVIDIA GPUs (cuBLAS/CUDA), through Vulkan, and on Moore Threads GPUs (muBLAS/MUSA). It can also offload work to Intel hardware via OpenVINO (CPUs and GPUs) and to Ascend NPUs. These backends can make inference considerably faster than CPU-only execution.
How can I reduce memory usage with whisper.cpp?
whisper.cpp supports integer quantization of Whisper ggml models. Quantized models require less memory and disk space and, depending on the hardware, can be processed more efficiently.
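To give a rough idea of why quantized weights take less space, here is a minimal sketch of block-wise 8-bit quantization in C. It is an illustration only and not whisper.cpp's or ggml's actual code: the names `block_q8`, `quantize_q8`, and `dequantize_q8`, and the block size of 32, are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 32 /* values per block; an assumption for this sketch */

/* One quantized block: a shared float scale plus BLOCK_SIZE signed bytes. */
typedef struct {
    float  scale;
    int8_t q[BLOCK_SIZE]; /* quantized values in [-127, 127] */
} block_q8;

/* Quantize n floats into n / BLOCK_SIZE blocks (n must be a multiple of BLOCK_SIZE). */
void quantize_q8(const float *x, block_q8 *out, size_t n) {
    assert(n % BLOCK_SIZE == 0);
    for (size_t b = 0; b < n / BLOCK_SIZE; ++b) {
        const float *src = x + b * BLOCK_SIZE;

        /* Find the largest magnitude in the block. */
        float amax = 0.0f;
        for (size_t i = 0; i < BLOCK_SIZE; ++i) {
            float a = src[i] < 0.0f ? -src[i] : src[i];
            if (a > amax) amax = a;
        }

        /* Map [-amax, amax] onto [-127, 127]. */
        float scale = amax / 127.0f;
        float inv = scale > 0.0f ? 1.0f / scale : 0.0f;
        out[b].scale = scale;

        for (size_t i = 0; i < BLOCK_SIZE; ++i) {
            float v = src[i] * inv;
            if (v > 127.0f) v = 127.0f;   /* clamp against rounding drift */
            if (v < -127.0f) v = -127.0f;
            /* Round to nearest, ties away from zero. */
            out[b].q[i] = (int8_t)(v >= 0.0f ? v + 0.5f : v - 0.5f);
        }
    }
}

/* Reconstruct approximate floats from the quantized blocks. */
void dequantize_q8(const block_q8 *in, float *y, size_t n) {
    assert(n % BLOCK_SIZE == 0);
    for (size_t b = 0; b < n / BLOCK_SIZE; ++b) {
        for (size_t i = 0; i < BLOCK_SIZE; ++i) {
            y[b * BLOCK_SIZE + i] = (float)in[b].q[i] * in[b].scale;
        }
    }
}
```

With 4-byte floats, each block in this sketch occupies 36 bytes instead of the 128 bytes needed for the 32 original values. The trade-off is a rounding error of up to about half the block's scale per value.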