AI Agents & Automation

Browsing page 23 of AI tools for Voice Agents in AI Agents & Automation. Sorted by confidence score — our independent quality rating.

Screen By AI

62%

Screen By AI is an AI-powered video interviewing platform designed to help companies screen, evaluate, and hire candidates faster and more efficiently. The platform features Sia, an AI hiring assistant that conducts real-time, human-like interviews, dynamically adapting to candidate responses. It offers unbiased candidate evaluation through AI-driven scoring across technical, communication, and behavioral metrics, providing in-depth reports highlighting strengths and red flags. Screen By AI also includes robust anti-cheating mechanisms like facial recognition and tab switching detection, ensuring interview integrity. All interviews are recorded for later review and detailed analysis, enabling data-driven hiring decisions and reducing the risk of mismatched hires.

Collaborative Conveyancing

62%

Collaborative Conveyancing offers unique AI-driven solutions specifically designed for conveyancers to enhance efficiency and reduce friction in property transactions. The platform features an Enquiry Manager for easy management of case enquiries, allowing users to track, respond, and generate documents. It also includes a Voice AI tool to transform calls, aiming to speed up transactions, drive down enquiries, and create overall efficiencies. The solutions are compliant and risk-aligned, helping conveyancers manage their workload more effectively, reduce tension between parties, and improve client service. The platform offers both a free basic plan and a premium paid plan for its Enquiry Manager.

Roojoom

62%

Roojoom provides AI-powered solutions designed to optimize customer journeys for both SMBs and large enterprises. For SMBs, it offers PickMyCall, an AI voice assistant that acts as a 24/7 receptionist, booking appointments and qualifying leads. Enterprise clients benefit from Roojoom's platform for orchestrating individualized customer journeys across all touchpoints, from onboarding to re-engagement. The platform uses AI-based journey logic to achieve business goals, enabling marketers to set objectives while the AI handles execution. It supports continuous, omni-channel journeys with memory functionality and auto-generates user experiences for web-based channels without requiring coding. Roojoom aims to transform customer service with intelligent service journeys, guiding customers through self-service options and seamlessly escalating to agents when necessary.

Openai Realtime Voice

62%

Openai Realtime Voice is a Hugging Face Space that facilitates real-time voice conversations with OpenAI's assistant. This tool allows users to interact by speaking into their microphone, with the AI assistant providing immediate responses. It requires an OpenAI API key to initiate conversations, making it suitable for those who wish to test and explore the capabilities of OpenAI's voice API. The platform is designed for direct, interactive voice communication, offering a straightforward way to experience AI-powered voice interactions.

ScribVet

62%

ScribVet is an AI-powered veterinary scribe designed to streamline documentation for veterinary professionals. Vets speak naturally during exams, and the tool automatically generates detailed medical records, including SOAP notes and client communications. It supports over 50 languages for transcription and offers customizable templates, enabling users to create notes in any required format, including specific PIMS formats like Cornerstone's "Physical Exam/Report Card." ScribVet aims to reduce late nights spent on paperwork, improve work-life balance, and enhance practice efficiency for both solo practitioners and teams. Users record themselves, select a template, and then review and export the content into their PIMS.

Scoopika

62%

Scoopika is an open-source platform designed for developers to build fast, reliable multimodal LLM-powered web applications. It provides a toolkit for creating AI agents that can work with various data types, including text, images, audio, and URLs, and integrate with external APIs. Key features include built-in error recovery, response streaming, multimodal input handling, and LLM output validation. Scoopika also offers serverless encrypted memory stores for managing conversation history, and knowledge stores for expanding an agent's knowledge by uploading files or websites. The platform is optimized for performance and real-time interactive applications, supporting global scalability and offering SDKs for server-side, client-side, and React development.

TaDiCodec TTS AR Qwen2.5 0.5B

62%

TaDiCodec TTS AR Qwen2.5 0.5B is an AI-powered text-to-speech (TTS) tool available as a Hugging Face Space. It enables users to convert written text into spoken audio. A key feature is its ability to perform voice cloning, allowing users to match the voice of a reference audio by providing both the audio sample and its corresponding text. This makes it suitable for generating custom voiceovers or personalized audio content. The tool leverages the Qwen2.5 0.5B model for its synthesis capabilities, offering an accessible solution for various audio generation needs.

Talk to OpenAI

62%

Talk to OpenAI is an innovative AI tool hosted on Hugging Face Spaces by fastrtc, designed to facilitate voice-based interaction with OpenAI's advanced GPT-4 model. Users can speak into a microphone, and the application will transcribe their voice input, process it using GPT-4, and then generate an audio response. This provides a hands-on and intuitive way to explore and experiment with AI-driven conversations, making the multimodal API accessible through a natural language interface. It's a practical demonstration of real-time voice-to-text and text-to-speech capabilities powered by OpenAI's technology.

Tortoise Tts

62%

Tortoise Tts is an AI-powered text-to-speech tool available as a Hugging Face Space. It allows users to convert written text into lifelike speech with a selection of voice options. Users can either provide text directly or upload a text file to generate audio. The tool focuses on creating expressive speech, making it suitable for various applications requiring natural-sounding voiceovers or audio content. While the live website currently shows a runtime error, its core functionality is designed for high-quality speech synthesis.

VibeVoice-Realtime-0.5B

62%

VibeVoice-Realtime-0.5B is an AI-powered tool hosted on Hugging Face that specializes in real-time text-to-speech conversion. Users can input English text and select a speaker voice to generate spoken audio. A key feature is the ability to fine-tune the voice fidelity using a slider, allowing for customization of the output quality. The application provides the generated audio as a downloadable WAV file, making it suitable for various applications requiring spoken content. This tool is designed for quick and efficient audio generation from text.

Vevo for Zero-shot VC, TTS, and More

62%

Vevo is an AI-powered tool hosted on Hugging Face Spaces, designed for controllable zero-shot voice imitation. It enables users to transform the style and timbre of an audio file by providing a reference audio file. This functionality is useful for voice cloning and text-to-speech applications, allowing for a high degree of control over the output audio. The tool requires users to upload two audio files: one for the content and another for the desired style or timbre. While the Space showed a runtime error at the time of review, its core offering focuses on advanced audio manipulation for creative and practical purposes.

VibeVoice ASR

62%

VibeVoice ASR is an official playground for Microsoft's VibeVoice-ASR, an advanced AI tool designed for automatic speech recognition. Hosted on Hugging Face Spaces, this application enables users to easily convert spoken language into written text. Users can input either pre-recorded audio files or utilize live speech, and the system will generate precise text transcriptions. This tool is ideal for anyone needing to quickly and accurately transcribe audio, making it a valuable resource for various applications ranging from content creation to documentation.

LilyFM: AI Text to Podcast

62%

LilyFM is an iOS mobile application designed to convert various forms of written content into AI-generated podcasts. Users can transform articles, PDFs, and even scanned documents into personalized audio, making it suited to learning and consuming information on the go. The app uses AI voice models that deliver natural, human-like narration in more than six languages, moving beyond robotic text-to-speech. Each podcast is tailored to the user's context and interests, providing AI-powered insights, summaries, and key takeaways. With deep iOS integration, including Live Activities and CarPlay support, LilyFM supports playback while multitasking, driving, or offline. Privacy is a priority, with all uploaded documents stored exclusively in iCloud.

Streamer-Sales

62%

Streamer-Sales is an AI sales assistant designed to generate compelling product descriptions and sales pitches. It uses a large language model fine-tuned from InternLM2 to create engaging explanations of product features that inspire purchase intent. The tool integrates LMDeploy for accelerated inference, RAG for enhanced generation, TTS for natural text-to-speech, and digital human generation to create virtual presenters. It also includes agent capabilities for real-time information retrieval, ASR for speech-to-text, and a backend built on FastAPI and PostgreSQL, all deployable via Docker Compose. The project aims to boost sales efficiency and enhance user experience for online and offline sales.

wukong-robot

62%

wukong-robot is an open-source project designed for makers and hackers to build personalized Chinese voice dialogue robots and smart speakers. It offers a modular architecture, allowing for flexible integration of various speech recognition, speech synthesis, and dialogue robot technologies. The tool supports multiple Chinese speech recognition and synthesis providers, including Baidu, iFlytek, Alibaba, Tencent, OpenAI Whisper, Apple, Microsoft Edge, and VITS voice cloning TTS. It also integrates with online dialogue robots like ChatGPT and local AnyQ-based bots. Key features include global listening, offline wake-up with Porcupine and Snowboy engines, Muse brain-computer interaction, and shake-to-wake functionality. It supports smart home integration with devices like Xiaomi AI Speaker, Siri, MQTT, and HomeAssistant, and provides a backend for remote control, configuration, and log viewing.

WebAssembly English TTS (sherpa-onnx)

62%

WebAssembly English TTS (sherpa-onnx) is a text-to-speech tool hosted on Hugging Face Spaces that allows users to convert English text into spoken audio. The unique aspect of this tool is that it runs the speech-synthesis model entirely locally within your browser using WebAssembly. This means all processing happens on your device, ensuring privacy and instant audio generation. Users can type the desired text, adjust parameters like speaker ID and speech speed, and then generate an audio clip that can be played immediately. It's an efficient solution for generating speech without relying on external servers for processing.

Voice Clone convete 2 voz

62%

Voice Clone convete 2 voz is an AI-powered tool designed for voice cloning and conversion. Users can upload an existing audio file or record their own voice as the source, and then provide a target voice to mimic. The system processes these inputs to convert the source voice, adopting the tone and characteristics of the target voice. The output is an audio file containing the newly converted voice. This tool is suitable for various applications requiring personalized audio content, such as content creation or educational materials, offering a straightforward way to achieve voice transformation.

Voice Agent WebRTC + LangGraph

62%

Voice Agent WebRTC + LangGraph is a powerful AI tool developed by NVIDIA, designed for creating interactive voice agents. It leverages WebRTC for real-time communication, LangGraph for agent orchestration, Automatic Speech Recognition (ASR) to convert spoken language into text, and Text-to-Speech (TTS) to vocalize translated text. Users can speak into the application, and it processes their voice by converting it to text, translating it, and then speaking the translated text back. This eliminates the need for manual typing, offering a seamless and intuitive voice interaction experience. It's hosted on Hugging Face Spaces, making it accessible for developers and researchers to experiment with and build advanced voice applications.

Voice Chat AI

62%

Voice Chat AI is an innovative AI chatbot that provides a seamless voice-based interaction experience. Users can speak their queries, and the application converts their voice input into text for processing. The AI then generates a relevant response, offering an intuitive and hands-free way to engage with artificial intelligence. A key feature is the ability to integrate web search results into the AI's responses, ensuring that users receive up-to-date and comprehensive information. This tool is hosted on Hugging Face, making it accessible for anyone looking for a conversational AI with web access capabilities.

🎤SpeakUp🗣️ - ASR Speech 2 Text 2 Voice Generator

62%

🎤SpeakUp🗣️ - ASR Speech 2 Text 2 Voice Generator is a tool hosted on Hugging Face Spaces that facilitates the conversion of speech to text and text to voice. This application is designed to provide a seamless experience for users looking to transcribe audio and synthesize spoken content. While the live website indicates a build error, the tool's core functionality aims to support various applications, including content creation and educational purposes, by offering robust speech-to-text and text-to-speech capabilities. Its presence on Hugging Face suggests an accessible platform for those interested in leveraging AI for audio processing.

Whisp

62%

Whisp is an intelligent voice dictation platform designed to transform spoken words into polished text across all your applications. It enables users to write up to five times faster than traditional typing by speaking naturally, with AI automatically correcting grammar, removing filler words, and adapting to personal style. Whisp learns unique vocabulary and common phrases, storing them in a personal memory and context library for consistent and efficient transcription. The tool also adjusts tone based on the application being used and supports over 150 languages. Available on Windows, with Mac and iPhone versions coming soon, Whisp aims to provide a seamless voice interface for professionals, students, creators, and anyone looking to enhance their productivity or accessibility.

sherpa-onnx

62%

sherpa-onnx is a comprehensive open-source AI toolkit designed for offline speech and audio processing. It offers a wide array of functionalities including speech-to-text (ASR), text-to-speech (TTS), speaker diarization, speaker identification, speaker verification, spoken language identification, audio tagging, voice activity detection (VAD), speech enhancement, keyword spotting, and source separation. The tool is highly versatile, supporting numerous platforms such as Android, iOS, Windows, macOS, Linux, and HarmonyOS, across various architectures including x64, x86, ARM, and RISC-V. It also integrates with several NPUs like Rockchip, Qualcomm, Ascend, and Axera, and provides APIs for 12 programming languages, including C++, Python, Java, and Swift, along with WebAssembly support. This makes it ideal for developers building AI-powered audio applications for embedded systems and diverse environments.

MGM Omni

62%

MGM Omni is a Hugging Face Space designed to scale Omni LLMs for personalized, long-horizon speech generation. This application enables users to create voice responses that accurately match a provided reference voice. Users can either input text directly or upload existing audio to generate the desired personalized speech. The tool supports bot integration, making it suitable for various applications requiring custom voice output. It is intended for research and development in speech technology, offering a platform to explore advanced voice synthesis and personalization.

LoveLive-ShojoKageki VITS

62%

LoveLive-ShojoKageki VITS is an AI-powered voice generation tool designed for creating audio from text. It supports both Chinese and Japanese. The tool provides options to select different speakers, allowing for varied vocal outputs. Users can also fine-tune parameters such as noise and duration to achieve the desired audio characteristics. While the live website currently reports a runtime error and an exceeded storage limit, the tool's core functionality is customizable text-to-speech generation, making it suitable for fans of LoveLive and those interested in AI voice technology.
