What metadata does the Whissle Speech-to-Text API extract?
Beyond transcription, Whissle's Speech-to-Text API extracts rich metadata in a single pass: intent detection, emotion recognition, named entity recognition (NER), speaker diarization, age and gender estimation, and punctuation. The same metadata is also available for text input through Whissle's Text Intelligence API.
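To make the single-pass idea concrete, here is a minimal Python sketch of how one result record could carry both the transcript and every metadata type listed above. The class and field names (`TranscriptionResult`, `Segment`, `Entity`, `intent`, `emotion`, and so on) are hypothetical. They are not Whissle's documented response schema, so check the API reference for the real format.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set

# Hypothetical shapes for illustration only; NOT Whissle's actual response schema.

@dataclass
class Entity:
    text: str
    label: str  # e.g. "PERSON", "DATE" (hypothetical label set)

@dataclass
class Segment:
    speaker: str                   # diarization label, e.g. "spk_0"
    text: str                      # punctuated transcript text
    intent: Optional[str] = None
    emotion: Optional[str] = None
    age: Optional[str] = None      # estimated age bracket
    gender: Optional[str] = None
    entities: List[Entity] = field(default_factory=list)

@dataclass
class TranscriptionResult:
    segments: List[Segment]

    def full_text(self) -> str:
        """Join all segment texts into one transcript string."""
        return " ".join(s.text for s in self.segments)

    def speakers(self) -> Set[str]:
        """Distinct speaker labels found by diarization."""
        return {s.speaker for s in self.segments}

# Usage: one record holds transcript, intent, emotion, entities, and speakers.
result = TranscriptionResult(segments=[
    Segment(speaker="spk_0", text="Book a table for Friday.",
            intent="make_reservation", emotion="neutral",
            entities=[Entity("Friday", "DATE")]),
    Segment(speaker="spk_1", text="Sure, for how many?", intent="ask_question"),
])
print(result.full_text())         # Book a table for Friday. Sure, for how many?
print(sorted(result.speakers()))  # ['spk_0', 'spk_1']
```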
Is Whissle free to use, and what are the pricing options?
Yes, the personal AI assistant at lulu.whissle.ai is free to use. The Speech-to-Text and Intelligence APIs use usage-based pricing starting at $0.003 per minute. Whissle can also be self-hosted with Docker at no cost, which gives you a full local setup.
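Because API usage is priced per minute, a lower-bound cost estimate is a single multiplication. The sketch below assumes linear billing at the $0.003 starting rate and whole-minute usage. Actual rates may be higher depending on plan or features, and the function name `estimate_min_cost` is hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

# "Starting at" rate from the FAQ, in USD per minute (a lower bound).
STARTING_RATE = Decimal("0.003")

def estimate_min_cost(minutes: int, rate: Decimal = STARTING_RATE) -> Decimal:
    """Lower-bound cost in USD for `minutes` of audio, rounded to cents.

    Assumes billing is linear in whole minutes.
    """
    if minutes < 0:
        raise ValueError("minutes must be non-negative")
    return (Decimal(minutes) * rate).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)

# 100 hours of audio = 6,000 minutes -> at least $18.00
print(estimate_min_cost(6_000))  # 18.00
```

Using `Decimal` rather than `float` keeps the per-minute arithmetic exact before rounding to cents.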
Can I self-host Whissle, and what are the requirements?
Absolutely. Whissle provides a full Docker Compose setup for running the frontend, the gateway (ASR + agent + proxy), and the backend locally. It replaces cloud dependencies with SQLite and local storage, and requires only 16 GB of RAM and a Gemini API key.
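Before starting the Docker Compose setup, a small pre-flight check can confirm the two stated requirements. This is a hypothetical helper, not part of Whissle. The environment variable name `GEMINI_API_KEY` is an assumption, so check which variable Whissle's compose file actually reads.

```python
import os

MIN_RAM_GB = 16  # requirement stated in the FAQ
# Assumption: the Gemini key is read from this variable; Whissle's compose file may use another name.
API_KEY_VAR = "GEMINI_API_KEY"

def preflight(total_ram_gb: float, env: dict) -> list:
    """Return a list of unmet requirements; an empty list means ready."""
    problems = []
    if total_ram_gb < MIN_RAM_GB:
        problems.append(f"need at least {MIN_RAM_GB} GB RAM, found {total_ram_gb:.1f} GB")
    if not env.get(API_KEY_VAR):
        problems.append(f"{API_KEY_VAR} is not set")
    return problems

def detect_total_ram_gb() -> float:
    """Total physical memory in decimal GB (10**9 bytes); POSIX systems only."""
    # Decimal GB is used so a nominal 16 GB machine, which typically reports
    # slightly under 16 GiB of usable memory, still passes the check.
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") / 10**9

if __name__ == "__main__":
    issues = preflight(detect_total_ram_gb(), dict(os.environ))
    if issues:
        for issue in issues:
            print("preflight:", issue)
    else:
        print("preflight: ready to run docker compose")
```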
What is Live Assist / call coaching in Whissle?
Live Assist provides real-time AI coaching during phone calls or meetings. It listens to the conversation, detects intent and emotion, and surfaces contextual suggestions, key points, and action items as the conversation unfolds. Processing runs with low latency, so suggestions arrive while they are still relevant to the discussion.
How does Whissle compare to other speech-to-text APIs?
Whissle's META-1 model performs transcription and metadata extraction simultaneously in a single pass. Traditional pipelines instead chain separate models for each task; by avoiding that, the single-pass approach delivers lower latency, lower cost, and richer output from a single API call.