Coding & Development
AI tools for Testing & QA in Coding & Development, sorted by confidence score (our independent quality rating).
MuukTest
MuukTest is an AI-driven test automation platform designed to accelerate software testing processes and achieve high test coverage. It combines AI-powered automation with expert QA engineers to deliver a fully managed testing solution. MuukTest integrates industry-standard frameworks like Selenium, Playwright, and Appium, offering resilient, self-healing tests that adapt to Agile and DevOps workflows. The platform provides comprehensive automated testing services for web, mobile, and API applications, including end-to-end, regression, integration, functional, accessibility, and performance testing. It aims to reduce testing bottlenecks, lower QA costs, and prevent critical bugs from reaching production, allowing development teams to focus on product innovation.
Lintrule
Lintrule is a command-line interface (CLI) tool designed to leverage large language models (LLMs) for automated code reviews. It allows developers to define custom rules in plain language, written as Markdown files, to enforce coding policies, identify potential bugs, and ensure compliance standards like SOC2. The tool integrates with Git, running checks on diffs by default, and can be configured to run on specific branches or commit ranges. Lintrule supports parallel rule execution for speed and offers flexibility in configuring rules to apply only to certain file types. It aims to enhance code quality and efficiency by automating aspects of the code review process that traditional linters or manual reviews might miss.
Rippletide Eval CLI
Rippletide Eval CLI is an interactive terminal tool for evaluating AI agent endpoints. It is intended to make AI agents more trustworthy by delivering the right context at the right moment and enforcing decisions at runtime. The tool generates questions directly from an agent’s knowledge base and also accepts predefined questions, which make benchmarks reproducible and tests consistent. It reports progress in real time and delivers key performance indicators (KPIs) related to hallucination, helping developers identify and mitigate issues.
API-Monitor
API-Monitor is a specialized tool designed to provide instant alerts for changes in third-party APIs. It eliminates the need for constant dashboard monitoring, notifying users via email or webhooks when an API's structure or status code changes. The service checks API endpoints every 5, 15, or 60 minutes, tracking response structures and detecting modifications like missing fields, new fields, or type changes. This proactive monitoring helps prevent production failures and reduces debugging time, ensuring applications remain functional even when external APIs evolve. It offers a simple setup process, requiring only an API endpoint URL and optional headers or webhook configurations.
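The kind of structural check described above can be sketched in a few lines of Python. This is an illustrative sketch only, not API-Monitor's implementation: the helper names `shape_of` and `diff_shapes` are hypothetical, and it assumes the first element of a JSON array is representative of the rest.

```python
from typing import Any


def shape_of(value: Any) -> Any:
    """Reduce a JSON value to its structure: field names and type names only."""
    if isinstance(value, dict):
        return {key: shape_of(item) for key, item in value.items()}
    if isinstance(value, list):
        # Assumption: the first element is representative of the whole list.
        return [shape_of(value[0])] if value else []
    return type(value).__name__


def diff_shapes(old: Any, new: Any, path: str = "$") -> list[str]:
    """List missing fields, new fields, and type changes between two shapes."""
    if isinstance(old, dict) and isinstance(new, dict):
        changes = []
        for key in sorted(old.keys() | new.keys()):
            child = f"{path}.{key}"
            if key not in new:
                changes.append(f"missing field: {child}")
            elif key not in old:
                changes.append(f"new field: {child}")
            else:
                changes.extend(diff_shapes(old[key], new[key], child))
        return changes
    if isinstance(old, list) and isinstance(new, list):
        # Compare element shapes only when both lists have a sample element.
        if old and new:
            return diff_shapes(old[0], new[0], f"{path}[]")
        return []
    if old != new:
        return [f"type change: {path} ({old} -> {new})"]
    return []


# Hypothetical responses captured by two consecutive checks.
baseline = shape_of({"id": 1, "name": "Ada", "tags": ["a"]})
latest = shape_of({"id": "1", "tags": ["a"], "email": "ada@example.com"})
changes = diff_shapes(baseline, latest)
for change in changes:
    print(change)
```

Comparing shapes rather than values keeps the check quiet when only the data changes, and flags it when a field disappears, appears, or changes type.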
model_analyzer
Triton Model Analyzer is a command-line interface (CLI) tool designed to help users better understand the compute and memory requirements of models running on the Triton Inference Server. It assists in finding optimal configurations for various model types, including single, multiple, ensemble, and BLS models, on a given piece of hardware. The tool offers several search modes, such as Optuna Search for hyperparameter optimization, Quick Search for sparse exploration of batch size and instance group parameters, and Automatic/Manual Brute Search for exhaustive parameter sweeps. Model Analyzer also supports profiling Large Language Models (LLMs) and generates detailed and summary reports to highlight trade-offs between different model configurations. Users can apply QoS constraints to filter results based on specific latency or other performance requirements.
ollama-grid-search
ollama-grid-search is a multi-platform desktop application designed to evaluate and compare Large Language Models (LLMs). Written in Rust and React, it automates the process of selecting optimal models, prompts, or inference parameters for a given use case. Users can iterate over various combinations and visually inspect the results, making it an invaluable tool for prompt engineering and model selection. The application assumes Ollama is installed and serving endpoints, either locally or on a remote server. Key features include automatic fetching of models from Ollama servers, A/B testing of prompts, a fully functional prompt database, and the ability to list, inspect, and re-run past experiments.
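Iterating over combinations of models, prompts, and inference parameters amounts to a Cartesian product. The following is a minimal sketch of that idea, not the application's code: `build_grid`, the model names, and the parameter names are placeholders, and nothing here calls an Ollama server.

```python
from itertools import product


def build_grid(models, prompts, params):
    """Return one experiment per combination of model, prompt, and parameter values."""
    names = sorted(params)
    value_lists = [params[name] for name in names]
    experiments = []
    for model, prompt, *values in product(models, prompts, *value_lists):
        experiments.append(
            {"model": model, "prompt": prompt, "options": dict(zip(names, values))}
        )
    return experiments


# Placeholder inputs: 2 models x 1 prompt x 2 temperatures x 2 top_k values.
grid = build_grid(
    models=["model-a", "model-b"],
    prompts=["Summarize: {text}"],
    params={"temperature": [0.2, 0.8], "top_k": [20, 40]},
)
print(len(grid))  # 8 experiments
```

Each entry could then be sent to an inference endpoint and the responses compared side by side. The grid grows multiplicatively, so even small value lists produce many runs.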
strix
Strix is an open-source AI security tool designed to identify and remediate application vulnerabilities. It employs autonomous AI agents that mimic real hackers, dynamically running code to find and validate vulnerabilities with proof-of-concepts. Built for developers and security teams, Strix offers fast, accurate security testing without the overhead of manual penetration testing or the false positives common with static analysis tools. Key capabilities include a full hacker toolkit, collaborative agent teams, real validation with PoCs, a developer-first CLI with actionable reports, and auto-fix and reporting features to accelerate remediation. It integrates seamlessly with GitHub Actions and CI/CD pipelines, allowing for automatic vulnerability scanning on every pull request.
agent-device
agent-device is a command-line interface (CLI) designed for AI agents to control and observe iOS, tvOS, macOS, Android, and AndroidTV devices. It facilitates UI automation by providing structured snapshots of the accessibility tree, allowing agents to understand and interact with mobile UIs efficiently. The tool supports deterministic interactions, session-aware workflows, and replayable flows, making it suitable for repeated automation runs and debugging. Key features include inspecting UI states, collecting logs, network inspection, and performance snapshots. It also integrates with React DevTools for deeper component-level insights, making it a comprehensive solution for agent-driven mobile app testing and automation.
web-codegen-scorer
Web Codegen Scorer is a robust tool designed for evaluating the quality of web code generated by Large Language Models (LLMs). It enables developers to make evidence-based decisions regarding AI-generated code, offering features to iterate on system prompts, compare code quality across various models, and monitor generated code quality over time. The tool focuses specifically on web code and utilizes well-established measures of code quality, including built-in checks for build success, runtime errors, accessibility, security, LLM rating, and coding best practices. It also supports automatic repair attempts for detected issues and provides an intuitive report viewer UI to compare results.
AI Notes - Voice & Notepad
AI Notes - Voice & Notepad is an Android mobile application focused on enhancing productivity through efficient note-taking and task management. The app allows users to create notes, record voice memos, and manage to-do lists. It incorporates AI capabilities for smart note generation, aiming to simplify the process of capturing and organizing information on the go. The tool is designed to help users keep track of their thoughts and tasks, making it easier to stay organized and productive.
Modelwise
Modelwise offers Paitron, an AI-driven solution designed to automate functional safety analysis for critical hardware systems. It integrates into existing engineering workflows and reduces the time required for safety analysis from weeks to hours. Paitron automates at least 80% of FMEA tasks, including Design-, System-, and Piece-part-FMEA, by using model-based failure propagation and qualitative models. This allows for early identification of design flaws, leading to cost savings and increased product quality. The software supports various modeling tools like Xpedition, MATLAB Simulink, and LTspice, and is being qualified to industry standards such as IEC 61508 and ISO 26262.
Neurolabs
Neurolabs offers a powerful image recognition platform, ZIA, specifically designed for the CPG (Consumer Packaged Goods) space. It leverages synthetic data and digital twins to provide highly accurate inventory detection, even in challenging retail environments with poor lighting or damaged packaging. The platform aims to streamline retail execution by offering complete visibility across all operations, from sales and category management to field operations. Key capabilities include 8x faster store audits, 10x faster catalogue onboarding, and 4x faster deployment at scale. Neurolabs integrates seamlessly with existing Sales Force Automation technologies and offers real-time insights through its ChatCPG feature, helping businesses maximize revenue and optimize promotional activities.
QA.tech
QA.tech is an AI testing platform designed to automate end-to-end (E2E), regression, exploratory, and PR testing for web and mobile applications. It employs AI agents that act like real users to test full user journeys, including interactions with third-party apps and email verification, across various platforms. The tool provides instant feedback, integrating into modern development workflows without requiring extensive infrastructure. QA.tech aims to shorten the Dev-QA feedback loop, reduce manual testing hours, and catch bugs early. It offers actionable feedback, detailed logs for debugging, and the ability to ask the AI what to test next in plain English, covering new cases and exploring products like a user would. It also supports integration with tools like GitHub, Slack, and Linear.
Aspen
Aspen is a free, native API testing application designed for macOS, focusing specifically on REST APIs. It operates with a zero-trust policy, meaning all operations are performed locally on your machine without requiring a login, ensuring high data security and privacy. The tool integrates an AI assistant, named Alfred, to significantly speed up API integrations and development by generating data models, OpenAPI Specifications, and integration code. Aspen also features Collections, allowing users to organize, import, export, and share API requests, facilitating teamwork and reuse. It supports importing from tools like Postman, making it a versatile option for developers seeking an efficient and secure API testing solution.
jiwer
JiWER is a simple and fast Python package designed for evaluating automatic speech recognition (ASR) systems. It supports several key similarity measures, including word error rate (WER), match error rate (MER), word information lost (WIL), word information preserved (WIP), and character error rate (CER). These measures are computed efficiently using the minimum-edit distance algorithm, powered by the high-performance RapidFuzz library, which leverages C++ for speed. The package also defines specific behaviors for empty reference and hypothesis pairs, addressing potential division-by-zero issues and allowing for testing models on silent audio. JiWER is released under the Apache License, Version 2.0.
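To make the WER and CER definitions concrete, here is a dependency-free sketch of the minimum-edit distance computation behind them. It is not JiWER's implementation (which relies on RapidFuzz): the function names are illustrative, and, as a simplification, an empty reference raises an error rather than following JiWER's defined behavior.

```python
def edit_distance(ref, hyp):
    """Minimum number of substitutions, deletions, and insertions turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance divided by the number of reference words."""
    ref_words = reference.split()
    if not ref_words:
        # Simplification: this sketch does not handle empty references.
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.split()) / len(ref_words)


def char_error_rate(reference: str, hypothesis: str) -> float:
    """CER = character-level edit distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must contain at least one character")
    return edit_distance(list(reference), list(hypothesis)) / len(reference)


# One substitution (fox -> dog) plus one insertion (jumps) over 4 reference words.
print(word_error_rate("the quick brown fox", "the quick brown dog jumps"))  # 0.5
```

Because insertions count as errors while the denominator is the reference length, WER can exceed 1.0 when the hypothesis is much longer than the reference.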
Alfred AI
Alfred AI is an intelligent API assistant designed to transform developer portals by automating workflows and accelerating API operations. It can generate integration code and data models in any language and framework, simplifying the integration process for customers and speeding up onboarding. Users can ask Alfred anything about their API using natural language, and it will instantly provide answers, discover endpoints, and understand API structures. This tool aims to reduce integration support requests by 15x and accelerate API integrations, discovery, and adoption by 10x. Alfred AI can be easily embedded into any developer portal with a single line of code and an OpenAPI Specification, making it a powerful addition for enhancing developer experience and boosting revenue.
Lightrun
Lightrun is an AI SRE platform designed to enhance production reliability by providing live runtime context for incident investigation and resolution. It enables developers, SREs, and AI agents to autonomously prevent and remediate software issues from code to production. The platform offers features like sandboxed instrumentation for logs, traces, metrics, and snapshots, allowing for deep code research and real-time end-to-end remediation. Lightrun helps reduce Mean Time To Resolution (MTTR) by triaging alerts, inspecting live execution, generating runtime evidence, and correlating it with code and infrastructure changes to prove root causes. It also facilitates autonomous remediation, offering fix recommendations and postmortems, and allows for validation of changes before release and testing on production traffic. The platform integrates with various tools and supports multiple programming languages and IDEs, ensuring security and compliance with standards like ISO 27001 and SOC 2 Type II.
evidential-deep-learning
evidential-deep-learning is an open-source Python package designed to help neural networks learn their own measures of uncertainty directly from data. It provides the code needed to reproduce the Deep Evidential Regression paper published at NeurIPS 2020 and offers a general framework for evidential learning. The tool allows users to integrate evidential layers and loss functions into existing `tf.keras` model pipelines, supporting both fully connected and convolutional layers. This enables the development of models that provide fast, scalable, and calibrated measures of uncertainty. The package is compatible with Python (>=3.7) and TensorFlow (>=2.0), with PyTorch support planned.
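As background, the Deep Evidential Regression formulation has the network output four Normal-Inverse-Gamma parameters (γ, ν, α, β) per target, from which a prediction and two kinds of uncertainty follow in closed form. The sketch below only illustrates that arithmetic in plain Python. It does not use the package's API, and `evidential_summary` is a hypothetical helper name.

```python
def evidential_summary(gamma: float, nu: float, alpha: float, beta: float) -> dict:
    """Turn Normal-Inverse-Gamma parameters into a prediction and uncertainties.

    prediction = gamma
    aleatoric  = E[sigma^2] = beta / (alpha - 1)
    epistemic  = Var[mu]    = beta / (nu * (alpha - 1))
    """
    if nu <= 0 or alpha <= 1 or beta <= 0:
        raise ValueError("requires nu > 0, alpha > 1, beta > 0")
    aleatoric = beta / (alpha - 1)
    epistemic = beta / (nu * (alpha - 1))
    return {"prediction": gamma, "aleatoric": aleatoric, "epistemic": epistemic}


# Example values chosen for easy arithmetic, not taken from a trained model.
summary = evidential_summary(gamma=2.0, nu=4.0, alpha=3.0, beta=1.0)
print(summary)  # {'prediction': 2.0, 'aleatoric': 0.5, 'epistemic': 0.125}
```

A larger ν (more virtual evidence for the mean) shrinks the epistemic term while leaving the aleatoric term unchanged.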
AgentBench
AgentBench is a comprehensive benchmark designed to evaluate Large Language Models (LLMs) as agents across a diverse spectrum of environments. It encompasses 8 distinct environments: 5 newly created domains (Operating System (OS), Database (DB), Knowledge Graph (KG), Digital Card Game (DCG), and Lateral Thinking Puzzles (LTP)) and 3 recompiled from published datasets (House-Holding, Web Shopping, Web Browsing). The benchmark offers both Dev and Test splits for each dataset, requiring LLMs to generate responses thousands of times for thorough evaluation. AgentBench also introduces VisualAgentBench for evaluating and training visual foundation agents based on large multimodal models (LMMs), covering embodied, GUI, and visual design environments. It supports quick setup using Docker Compose and provides benchmarking results via a leaderboard.
Bugzy AI
Bugzy AI acts as an autonomous QA agent, running comprehensive quality assurance on every pull request and deployment. It connects to development tools like repositories, ticketing systems, and documentation to build a knowledge graph of your project. Using this context, Bugzy generates and executes end-to-end tests, triages failures, and reports bugs with full context, including reproduction steps, severity, and suggested fixes. The tool integrates with modern tech stacks, supporting popular languages and frameworks, and hooks into GitHub, GitLab, and CI services. Bugzy focuses on outcome-based pricing, charging for triages and test case creations rather than compute time, making it a cost-effective solution for maintaining high coverage without increasing headcount.
eli5
eli5 is a Python package designed to help debug and inspect machine learning classifiers, providing explanations for their predictions. It supports a wide range of machine learning frameworks, including scikit-learn, Keras (for Grad-CAM visualizations), xgboost, LightGBM, CatBoost, and lightning. The library can explain weights and predictions of linear classifiers, print decision trees, show feature importances, and debug scikit-learn pipelines. Additionally, eli5 implements algorithms for inspecting black-box models, such as TextExplainer for LIME-based explanations and permutation importance for feature importances. Explanations can be formatted for console display, HTML embedding, pandas DataFrames, or JSON for custom rendering.
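Permutation importance, mentioned above as a black-box inspection method, can be illustrated without any library: shuffle one feature column, re-score the model, and record how much the score drops. This is a sketch of the general technique, not eli5's implementation, and the toy model and data are made up for illustration.

```python
import random


def permutation_importance(predict, rows, targets, score, n_repeats=5, seed=0):
    """Mean drop in score when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = score(targets, [predict(row) for row in rows])
    importances = []
    for col in range(len(rows[0])):
        drops = []
        for _ in range(n_repeats):
            shuffled = [row[col] for row in rows]
            rng.shuffle(shuffled)
            # Rebuild each row with only the chosen column replaced.
            permuted = [
                row[:col] + [value] + row[col + 1:]
                for row, value in zip(rows, shuffled)
            ]
            drops.append(baseline - score(targets, [predict(row) for row in permuted]))
        importances.append(sum(drops) / n_repeats)
    return importances


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the targets."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def model(row):
    """Toy black-box model that only looks at feature 0."""
    return int(row[0] > 0.5)


rows = [[0.1, 5.0], [0.9, 3.0], [0.2, 8.0], [0.8, 1.0], [0.3, 7.0], [0.7, 2.0]]
targets = [model(row) for row in rows]
importances = permutation_importance(model, rows, targets, accuracy)
print(importances)  # feature 1 scores exactly 0.0 because the model ignores it
```

The method needs only a prediction function and a scoring function, which is why it applies to models whose internals are not accessible.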
Review-Gate
Review-Gate is a specialized tool designed to integrate with the Cursor IDE, significantly enhancing the code review process. It provides interactive AI assistance, allowing developers to engage with the AI through various modalities including text, voice, and image uploads. This multi-modal interaction facilitates a more dynamic and efficient review cycle. The tool is particularly adept at supporting iterative work within a single request, which streamlines the coding process and helps developers refine their code more effectively. By offering these advanced AI-powered features, Review-Gate aims to improve the overall quality and speed of code development and review.
Gentrace
Gentrace was a platform designed for AI agent tracing, evaluation, and error analysis, providing tools for developers working with intelligent applications. It offered features to debug agent traces, create smart monitoring columns, and build tailored evaluations. The platform supported various integrations including AI SDK, LangChain, LangGraph, Mastra, Next.js, OpenAI Agents, OpenAI (JS), OpenAI (Python), and Pydantic AI. Gentrace also provided functionalities for error analysis, experiments, datasets, unit tests, and dataset tests, aiming to enhance the development and reliability of AI agents. The code for Gentrace has been released on GitHub under the MIT license following its shutdown.
PACT | Free Compliance Audit
PACT | Free Compliance Audit is an AI-powered tool designed to help website owners ensure their online presence meets critical legal and accessibility standards. It provides instant compliance audits, checking against ADA accessibility guidelines, GDPR privacy regulations, and WCAG 2.1 standards. Users can quickly assess their website's compliance status without the need for registration, making it a convenient solution for identifying potential issues. The tool aims to simplify the complex process of regulatory adherence, offering a free and accessible way to maintain a compliant and inclusive online environment.