AI Agents & Automation
llama-assistant
Llama-assistant is an AI-powered assistant designed to help users with daily tasks while prioritizing privacy. Powered by models such as Llama 3.2 and DeepSeek R1, it operates locally on your machine, ensuring no data is sent to external servers. The assistant can recognize voice commands, process natural language, and perform a variety of actions including text summarization, sentence rephrasing, question answering, and email writing. It supports both text-only and multimodal models like Moondream2 and LLaVA. Key features include voice recognition, natural language processing, customizable UI, and custom actions. The project is actively being developed with plans for wake word detection, offline STT, knowledge database integration, and multi-language support.
Terrakotta
Terrakotta is an AI-powered platform designed to streamline commercial real estate prospecting. It offers an all-in-one solution for sourcing leads, making calls, and delivering personalized AI voicemails. Key features include an automated dialing system, a comprehensive property database with AI skip tracing, and a commercial real estate GPT for enhanced research. The platform integrates seamlessly with major CRM systems like Salesforce, HubSpot, and RealNex, enriching data with property details, owner history, and market insights. Terrakotta aims to boost outreach efficiency by allowing users to create lightning-fast voice clones and send customized AI voicemails, ensuring every connection is meaningful and saving hours of research time for commercial real estate professionals.
LocalAIVoiceChat
LocalAIVoiceChat provides a fully local voice chat experience on your PC, combining the Zephyr 7B language model with real-time speech-to-text and text-to-speech libraries. It uses RealtimeSTT with faster_whisper for transcription and RealtimeTTS with Coqui XTTS for synthesis, and supports customizable AI personalities and voices. This experimental alpha software requires a GPU with around 8 GB of VRAM and a specific NVIDIA CUDA or AMD ROCm installation. While not production-ready, it offers a fast voice-based local chatbot, with ongoing updates to improve stability and model performance.
10x.Team
10x.Team is an AI-powered talent execution engine designed to supercharge recruiting efforts by automating key stages of the hiring process. It allows users to generate job descriptions, select from over 50 AI recruiters in 70+ languages to conduct screening and first-round interviews, and reclaim significant recruiter time. The platform offers features like RoleCraft for AI briefing and knock-out questions, TrueTalk for configuring AI recruiter avatars, voices, and interview tones, and HireRank for unbiased candidate ranking with video replays. 10x.Team aims to provide a fast, fair, and compliant AI recruiting experience, reducing bias and allowing hiring managers to focus on more strategic tasks. It offers flexible pricing, including a free tier for initial testing.
Python-ai-assistant
Python-ai-assistant, also known as Jarvis, is an open-source voice-controlled AI assistant built with Python 3.8. It offers speech recognition, text-to-speech interaction, and the execution of various commands. Users can interact with Jarvis via voice or text to perform tasks such as opening web pages, playing music, checking weather, setting alarms, and performing basic calculations. The assistant supports asynchronous command execution and lets users customize the voice commands and the assistant's name. It also keeps a history of commands and learned skills in MongoDB, making it a versatile tool for personal automation.
HappyRobot
HappyRobot is an AI-native operating system designed to power autonomous operations by deploying AI workers that understand your business, make decisions, and act in real time. The platform allows users to build custom AI workers with access to various systems and tools, integrating via API, webhook, or AI browser agents. These AI workers can execute tasks across all channels, including conversation and document parsing, with features like smart escalation, collaboration, data extraction, and analysis. HappyRobot emphasizes robust auditing, performance reporting, and AI auditor supervision, and promises guaranteed uptime and scalability for enterprise-level deployments. It is built for complex environments, offering rapid implementation and optimization through embedded engineers.
Acrely
Acrely specializes in developing enterprise-grade voice agents tailored for the specific needs of innovative companies. The platform provides flexible deployment options, including both cloud-based and on-premise solutions, to accommodate diverse organizational requirements. Acrely empowers businesses to leverage advanced AI capabilities across various functions, such as enhancing customer service interactions, streamlining sales processes, and optimizing operational workflows. This allows organizations to integrate sophisticated voice AI into their existing infrastructure, driving efficiency and improving engagement in critical business areas.
Supertone
Supertone is a comprehensive voice intelligence platform offering advanced AI voice technology for both individual creators and businesses. It provides a suite of tools including 'Play' for AI voice generation via text-to-speech, 'Shift' for real-time voice changing with various character options, and 'Clear' for de-noising and de-reverbing audio. Additionally, 'Air' helps match reverb and EQ for ADR, ensuring natural-sounding dialogue. Supertone also offers a natural and expressive speech synthesis API for integration into various projects, empowering users to bring their services and content to life with high-quality AI voices. Trusted by major brands like Netflix, Disney, and HYBE, Supertone aims to push the boundaries of creativity in audio production.
SpeakType
SpeakType is a macOS application offering privacy-first, offline voice dictation. Leveraging WhisperKit AI, all processing occurs entirely on your Mac, ensuring that audio and transcripts remain local without any cloud uploads. This design prioritizes user privacy and data security. The tool is optimized for Apple Silicon, providing efficient and real-time speech-to-text transcription. It integrates seamlessly across various applications via a customizable keyboard shortcut, making it suitable for dictating emails, documents, code, and web forms. SpeakType aims to provide a reliable and secure dictation solution for Mac users.
pi-card
pi-card is an open-source project for building an AI-powered voice assistant that runs locally on a Raspberry Pi. In a conversational setting it behaves much like chat assistants such as ChatGPT, but it operates completely offline. Users can interact with the assistant using a customizable wake word or a physical button connected via GPIO. The system supports configurable conversation memory and can be extended with a camera to describe images and answer questions about them. It relies on C++ implementations such as whisper.cpp for audio transcription and llama.cpp for vision capabilities, aiming for efficiency on Raspberry Pi hardware. Docker support is provided for easier setup, making it accessible for developers and hobbyists interested in local AI projects.
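The "configurable conversation memory" mentioned above can be pictured as a bounded buffer of recent turns. The following is only a minimal sketch under stated assumptions: the class name `ConversationMemory`, the `max_turns` parameter, and the prompt layout are hypothetical and are not taken from the pi-card codebase.

```python
from collections import deque


class ConversationMemory:
    """Hypothetical helper: keep only the most recent exchanges."""

    def __init__(self, max_turns=3):
        # Each turn is one (user, assistant) pair; once max_turns is
        # reached, appending a new turn silently drops the oldest one.
        self.turns = deque(maxlen=max_turns)

    def add_turn(self, user_text, assistant_text):
        self.turns.append((user_text, assistant_text))

    def build_prompt(self, new_user_text):
        """Render remembered turns plus the new question as one prompt."""
        lines = []
        for user_text, assistant_text in self.turns:
            lines.append(f"User: {user_text}")
            lines.append(f"Assistant: {assistant_text}")
        lines.append(f"User: {new_user_text}")
        lines.append("Assistant:")
        return "\n".join(lines)
```

A smaller `max_turns` keeps the prompt short, which plausibly matters on constrained hardware such as a Raspberry Pi, at the cost of the assistant forgetting earlier context sooner.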
Pipes.AI
Pipes.AI is an AI-powered platform designed for businesses that purchase and generate a high volume of leads, aiming to get them on the phone efficiently. It orchestrates and simplifies lead acquisition by intelligently intaking data and routing high-quality prospects to revenue teams. The platform offers powerful AI-powered voice and SMS solutions for lead engagement and optimization, leading to more calls and conversions. Key features include instant SMS & call outreach, DNC & compliance filtering, automated follow-ups, dynamic call routing, and real-time analytics. Pipes.AI also supports AI-powered SMS drip campaigns, call & text personalization, automated appointment scheduling, and seamless CRM integration, making it ideal for sales and customer acquisition, customer service, and specific industries like moving companies and home services.
Lmao
Lmao AI is the world's first real-time AI prank calling app, designed to deliver unhinged and hilarious prank calls. Unlike traditional apps that use tired recordings, Lmao AI leverages cutting-edge real-time AI voices that sound indistinguishable from real people, making every prank feel hilariously human. Users can choose from a rich library of voices, including various accents and iconic celebrity impressions like P Diddy, Donald Trump, and Joe Biden. The AI adapts on the fly, enabling dynamic conversations that never break character. Users can type what they want the AI to say, and recordings of the calls are saved for sharing. The app uses a spoofed number to keep the user's real number hidden, ensuring privacy.
Twilio
Twilio offers a comprehensive Customer Engagement Platform (CEP) that integrates communication APIs with AI and first-party data. Developers can leverage Twilio's APIs for various communication channels including SMS, WhatsApp, voice, and email, alongside features like conversational AI, customer data platforms, and authentication tools. The platform supports use cases such as fraud prevention, alerts, marketing, and customer support, allowing businesses to create personalized customer experiences. Twilio emphasizes its builder-centric approach, providing tools and support for developers to quickly integrate and scale communication solutions, backed by transparent pricing and a free trial option.
VoiceCalc
VoiceCalc is the #1 free AI voice calculator for iPhone that allows users to instantly solve math problems through natural speech. This innovative app eliminates the need for typing, enabling users to simply speak their math questions and receive immediate answers. Key functionalities include comparing prices, splitting bills with tips, converting units and currencies, and solving complex equations. It also features a time zone calculator and supports 17 languages. VoiceCalc prioritizes user privacy by processing basic calculations offline and on-device, ensuring voice data never leaves the device. For advanced AI features, only text is sent for processing, with no storage of user data or voice recordings. It's ideal for everyday math, shopping, and homework, acting as a smart, hands-free math assistant.
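As an illustration of the "splitting bills with tips" arithmetic described above, here is a small sketch. The function name `split_bill` and its parameters are hypothetical; nothing here reflects how VoiceCalc itself is implemented.

```python
from decimal import Decimal, ROUND_HALF_UP


def split_bill(total, tip_percent, people):
    """Return each person's share of a bill after adding a percentage tip."""
    if people < 1:
        raise ValueError("people must be at least 1")
    # Convert through str so float inputs do not carry binary rounding noise.
    total = Decimal(str(total))
    tip = total * Decimal(str(tip_percent)) / Decimal("100")
    share = (total + tip) / people
    # Round to whole cents, with halves rounded up.
    return share.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
```

For example, a bill of 84 with a 20 percent tip comes to 100.80 in total, so `split_bill(84, 20, 4)` gives 25.20 per person.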
Xtreme Gen Ai
Xtreme Gen AI empowers brands to rapidly build and deploy 24/7, language-neutral, CRM-integrated voice AI agents. These agents are powered by advanced speech models, delivering natural pacing, clarity, and tone for human-like conversations. The platform seamlessly integrates with existing CRM systems to log conversations, capture leads, and automatically update appointment outcomes. It also offers real-time calendar integration for scheduling, availability checks, smart slot suggestions, and instant confirmations. Xtreme Gen AI serves businesses in the US, UK, and Canada, providing an AI front desk that answers every call around the clock, turning missed calls into appointments.
Dr. Lambda
Dr. Lambda, operating as ChatSlide AI, is an AI-powered platform designed to streamline the creation of presentations, videos, and social media content. Users can generate professional slides in seconds by uploading documents, pasting URLs, or describing a topic. The AI handles layout, design, and content organization, supporting various input formats like PDF, DOCX, PPTX, TXT, and images, as well as content from YouTube and research databases. It offers output in standard PPTX, PDF, and AI-generated video formats. ChatSlide AI utilizes GPT-4o as its default model, with premium plans offering access to GPT-5.3 and 29 AI models for image generation, including Imagen 4 and Stable Diffusion. The platform also supports multi-language content creation and translation, AI voiceovers, voice cloning, and smart chart generation, making it a versatile tool for content creators, educators, and business professionals.
Greetai.co
GreetAI is an AI-powered platform designed to automate the initial screening process for hiring, admissions, and team evaluation. It enables users to set up structured AI voice interviews with custom questions, scorecards, and scenarios. Candidates interact with an AI interviewer at their convenience, and the system generates detailed reports including match scores, transcripts, summaries, and recommendations. This helps hiring managers, founders, and CEOs quickly identify top talent, reduce manual screening time, and ensure consistent evaluation across all applicants. GreetAI supports various workflows, from recruitment to academic admissions and internal team assessments, offering features like CSV export for applicant data and custom logic for rate calculation.
A1Base (YC W25)
A1Base (YC W25) offers an API designed to give AI agents a trusted identity, complete with phone numbers and email capabilities. This platform aims to free AI agents from traditional chat interfaces, enabling them to interact with the real world more autonomously and securely. By providing these essential communication tools, A1Base helps unlock the full potential of AI, allowing developers to build more sophisticated and independent AI applications. The service emphasizes a secure platform for integrating these real-world communication features into AI agents.
ten-framework
TEN is an open-source framework designed for creating real-time multimodal conversational AI agents. It provides a comprehensive ecosystem including the TEN Framework itself, Agent Examples, VAD (Voice Activity Detector), Turn Detection, and a Portal. Developers can leverage TEN to build various voice AI applications, from low-latency multi-purpose voice assistants to specialized tools like Doodler for sketch generation, Speaker Diarization, Lip Sync Avatars, and SIP Call integration. The framework supports deployment via Docker or other cloud services, offering flexibility for self-hosting and customization. It also includes resources for quick starts, documentation, and community support through Discord, LinkedIn, and Hugging Face.
bntr - AI-Powered Virtual Agents
bntr is an AI-powered virtual agent platform designed to automate and enhance customer interactions through both voice and chat AI solutions. The platform is engineered for easy setup, leveraging customer data to quickly train its AI models. It aims to provide comprehensive 24/7 customer support, helping businesses manage high volumes of inquiries efficiently. By deploying bntr, organizations can reduce call handling times, improve customer satisfaction, and ensure consistent service quality across all touchpoints. The tool is ideal for businesses looking to scale their customer service operations without significantly increasing their human resource overhead.
VibeVoice-ComfyUI
VibeVoice-ComfyUI provides a comprehensive integration for Microsoft's VibeVoice text-to-speech model directly within ComfyUI workflows. This tool allows users to generate natural speech with single or multiple speakers, supporting up to four distinct voices in a conversation. Key features include optional voice cloning from audio samples, fine-tuning voices with custom LoRA adapters, and adjustable voice speed control. It also handles long texts seamlessly with automatic chunking and custom pause tags. The integration is self-contained, cross-platform, and supports various backends like CUDA, CPU, and Apple Silicon's MPS, offering flexible configuration for attention mechanisms, diffusion steps, and memory management, including 4-bit and 8-bit quantization for VRAM savings.
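The automatic chunking of long texts mentioned above can be sketched as grouping whole sentences under a length limit. This is an assumption-laden illustration, not the project's actual algorithm: the function name `chunk_text`, the `max_chars` parameter, and the sentence-boundary rule are all hypothetical.

```python
import re


def chunk_text(text, max_chars=250):
    """Group whole sentences into chunks of at most max_chars where possible.

    A single sentence longer than max_chars is kept whole rather than cut.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Splitting on sentence boundaries rather than at a fixed character count is a common choice for speech synthesis, since each chunk then ends at a natural pause.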
vosk-server
Vosk-server is an open-source speech recognition server designed for accurate offline transcription. It builds on the Kaldi and Vosk-API libraries to provide speech-to-text without requiring an internet connection. The server supports multiple communication protocols, including MQTT, gRPC, WebRTC, and WebSocket, making it adaptable to various application environments. It can be deployed locally to provide speech recognition for smart home systems or PBX solutions like FreeSWITCH and Asterisk. Additionally, vosk-server can function as a backend for streaming speech recognition on the web, powering chatbots, websites, and telephony applications. Its focus on offline processing makes it a useful tool for developers and organizations that need reliable speech recognition in diverse settings.
alloy-voice-assistant
alloy-voice-assistant is an open-source project available on GitHub designed for developers to create and experiment with AI voice assistants. The project provides a foundational framework for building a sample AI assistant, requiring both an OPENAI_API_KEY and a GOOGLE_API_KEY for its functionality. Users can store these keys in a .env file or set them as environment variables. The repository includes clear instructions for setting up a virtual environment, installing necessary packages, and running the assistant, with specific guidance for Apple Silicon users. This tool is ideal for those looking to understand the mechanics of AI voice assistants and build custom applications.
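One way the two keys could be resolved from either source is sketched below, using only the standard library. The helper names `load_env_file` and `resolve_api_keys`, and the rule that real environment variables win over the `.env` file, are assumptions for illustration; the repository may well use a different loader.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values = {}
    env_path = Path(path)
    if not env_path.is_file():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def resolve_api_keys(env_file=".env", environ=None):
    """Return both keys, preferring environment variables over the file."""
    environ = os.environ if environ is None else environ
    file_values = load_env_file(env_file)
    keys = {}
    for name in ("OPENAI_API_KEY", "GOOGLE_API_KEY"):
        value = environ.get(name) or file_values.get(name)
        if not value:
            raise RuntimeError(
                f"{name} is not set in the environment or in {env_file}"
            )
        keys[name] = value
    return keys
```

Failing early with a clear message when a key is missing tends to be easier to debug than an authentication error raised later by an API client.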
Moneypenny USA
Moneypenny USA offers a comprehensive customer communication service, blending the expertise of human agents with advanced AI technology to provide 24/7 support across voice and text channels. The platform helps businesses manage customer interactions, qualify leads, and scale operations efficiently. Key features include a human-sounding AI Receptionist for automated call handling, KnowledgeBase for instant information retrieval, and MessageMaker for auto-drafting call summaries. Moneypenny also provides outsourced switchboard services, managed live chat, and multichannel customer service, ensuring consistent and emotionally intelligent responses. It serves a wide range of industries, helping businesses enhance customer experience and improve marketing performance.