AI Agents & Automation
The JADA Squad
The JADA Squad specializes in designing, building, deploying, and managing custom AI agents for businesses. They focus on creating Agentic AI solutions with strong governance, observability, and operational controls to help organizations work smarter and faster. JADA offers a rapid development process, promising a working prototype in just 3 days and full deployment of a custom AI agent in 10 days. Their agents learn business rules, integrate with existing workflows, and always include human oversight for monitoring usage, reviewing edge cases, and continuous tuning. They serve various functions including HR, sales, procurement, marketing, and due diligence, ensuring secure and governed operations with top AI talent.
tt-metal
tt-metal offers a comprehensive platform for developing and optimizing neural networks on Tenstorrent hardware. It includes TT-NN, a Python & C++ Neural Network OP library, and TT-Metalium, a low-level programming model for kernel development. The platform provides tools like TT-NN Visualizer for analyzing model execution, TT-Exalens for low-level debugging, and TT-SMI for device management. It supports various models including Llama 3.3, Qwen 2.5, Whisper, and Mixtral, with detailed performance metrics. tt-metal is designed for AI developers and hardware engineers looking to leverage Tenstorrent's specialized accelerators for high-performance AI applications, offering extensive documentation and programming examples.
TinyZero
TinyZero offers a minimal reproduction of DeepSeek R1-Zero, focusing on reinforcement learning tasks. Built upon the veRL library, this tool allows 3B base Large Language Models (LLMs) to independently develop self-verification and search capabilities. The project provides scripts and instructions for data preparation and training, including configurations for single GPU and multi-GPU setups, and supports instruct ablation experiments. While the repository is no longer actively maintained, it serves as a valuable resource for understanding and replicating the core concepts of DeepSeek R1-Zero, particularly for researchers and developers exploring advanced RL techniques for LLMs.
TNN
TNN is a high-performance, lightweight neural network inference framework developed by Tencent Youtu Lab and Guangying Lab. It provides a uniform deep learning inference solution for mobile, desktop, and server environments. Key features include cross-platform compatibility, high performance, model compression, and code pruning. Building upon the foundations of ncnn and Rapidnet, TNN enhances support and optimizes performance specifically for mobile devices, while also incorporating the extensibility and high-performance characteristics of other open-source frameworks. It has been deployed in various Tencent applications like Mobile QQ, Weishi, and Pitu, and serves as a core acceleration framework for Tencent Cloud AI. TNN supports models from TensorFlow, PyTorch, MXNet, and Caffe via ONNX, and runs on Android, iOS, embedded Linux, Windows, and Linux, with support for ARM CPU, x86, GPU, and NPU hardware.
tiny-llm
tiny-llm provides a comprehensive course for system engineers focused on learning LLM inference serving, specifically tailored for Apple Silicon. The curriculum guides users through building a tiny vLLM using MLX and Qwen, with a codebase primarily utilizing MLX array/matrix APIs. This approach allows participants to construct model serving infrastructure from scratch, gaining deep insights into optimizations. The course covers essential components like attention, RoPE, KV cache, and continuous batching, with a roadmap extending to advanced topics such as Paged Attention and Speculative Decoding. It's designed for those who want to understand the underlying techniques for efficiently serving large language models.
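Of the components the course covers, RoPE (rotary position embedding) is compact enough to sketch in a few lines. The following is a minimal pure-Python illustration, not code from tiny-llm or MLX: it assumes the interleaved-pair form of RoPE, a base of 10000, and hypothetical helper names `rope` and `dot`.

```python
import math

def rope(vec, position, base=10000.0):
    """Rotate each pair (vec[i], vec[i+1]) by the angle position * base**(-i/d)."""
    d = len(vec)
    assert d % 2 == 0, "RoPE needs an even head dimension"
    out = []
    for i in range(0, d, 2):
        theta = position * base ** (-i / d)  # per-pair rotation frequency
        cos_t, sin_t = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * cos_t - y * sin_t, x * sin_t + y * cos_t])
    return out

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Illustrative query and key vectors (made-up values).
q = [1.0, 0.5, -0.3, 2.0]
k = [0.2, -1.0, 0.7, 0.1]

# Both pairs of positions are 2 apart, so the attention scores should match.
score_a = dot(rope(q, 5), rope(k, 3))
score_b = dot(rope(q, 12), rope(k, 10))
```

The point of the rotation is that the query-key dot product depends only on the relative offset between positions, which is why `score_a` and `score_b` agree up to floating-point error.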
Video-MME
Video-MME is the first-ever comprehensive evaluation benchmark designed to assess the capabilities of Multi-modal Large Language Models (MLLMs) in video analysis. It covers a wide range of visual domains, temporal durations, and data modalities, including short, medium, and long-term videos (from 11 seconds to 1 hour). The benchmark comprises 900 videos totaling 254 hours and 2,700 human-annotated question-answer pairs. It integrates multi-modal inputs beyond video frames, such as subtitles and audio, to provide a full-spectrum evaluation. Video-MME is suitable for both image MLLMs and video MLLMs, offering a robust framework for evaluating model performance in understanding and processing sequential visual data.
web-llm
WebLLM is a high-performance, in-browser LLM inference engine designed to bring language model inference directly onto web browsers with hardware acceleration. It operates entirely within the browser, eliminating the need for server support and leveraging WebGPU for enhanced performance. The engine is fully compatible with the OpenAI API, allowing users to apply the same API functionalities, including streaming, JSON-mode, and function-calling, to open-source models locally. WebLLM supports a wide range of models like Llama 3, Phi 3, Gemma, and Mistral, and allows for custom model integration in MLC format. It offers structured JSON generation, real-time interactions, and supports Web Worker and Service Worker for optimized performance and offline capabilities.
WeightWatcher
WeightWatcher (WW) is an open-source diagnostic tool designed to analyze Deep Neural Networks (DNNs) and predict their accuracy. It operates without requiring access to training or even test data, leveraging theoretical research into Heavy-Tailed Self-Regularization (HT-SR), Random Matrix Theory (RMT), Statistical Mechanics, and Strongly Correlated Systems. Users can analyze pretrained PyTorch, Keras, and other DNN models (Conv2D and Dense layers), monitor model layers for over-training or over-parameterization, and predict test accuracies across different models. The tool also helps detect potential problems when compressing or fine-tuning pretrained models and provides layer warning labels like 'over-trained' or 'under-trained'. It offers various generalization metrics and advanced diagnostics like correlation trap analysis and experimental early stopping detection.
AgentKit
AgentKit offers a unified framework for explicitly constructing complex human "thought processes" from simple natural language prompts. It utilizes a graph-based approach, where users connect nodes like LEGO pieces to design and enforce structured thought processes. This allows for the integration of various functionalities to build multifunctional agents. A basic agent can be implemented as a list of prompts for subtasks, making it accessible for users without programming experience. The framework supports dynamic modification of the Directed Acyclic Graph (DAG) at inference time, enabling advanced capabilities like branching based on LLM responses. AgentKit provides built-in LLM API support for OpenAI, Anthropic, and Ollama models, with options for token usage tracking and error handling.
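The graph-of-prompts idea can be sketched without the library. The example below is a hypothetical illustration, not AgentKit's API: the node names, the `run_graph` helper, and the `fake_llm` stand-in are all assumptions. Each node is a prompt template plus the nodes it depends on, and the graph is evaluated in topological order so every prompt can see its dependencies' answers.

```python
from graphlib import TopologicalSorter

def fake_llm(prompt):
    # Stand-in for a real model call; returns a deterministic string.
    return f"<answer to: {prompt}>"

# Hypothetical graph: node name -> (prompt template, dependency names).
nodes = {
    "observe": ("Summarize the current state.", []),
    "plan": ("Given {observe}, list the next steps.", ["observe"]),
    "act": ("Given {plan}, choose one action.", ["plan"]),
}

def run_graph(nodes, llm):
    # Map each node to its predecessors so dependencies are evaluated first.
    graph = {name: set(deps) for name, (_, deps) in nodes.items()}
    results = {}
    for name in TopologicalSorter(graph).static_order():
        template, deps = nodes[name]
        # Fill the template with the answers of the nodes it depends on.
        prompt = template.format(**{dep: results[dep] for dep in deps})
        results[name] = llm(prompt)
    return results

results = run_graph(nodes, fake_llm)
```

Swapping `fake_llm` for a real client is the only change needed to run the same graph against a model. Modifying the graph at inference time, as the description mentions, would require re-sorting after each change, which this sketch does not attempt.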
auto-evaluator
Auto-evaluator is a lightweight, open-source evaluation tool designed for question-answering systems built with LangChain. It streamlines the process of assessing LLM QA chains by allowing users to input documents, then automatically generating question-answer pairs using GPT-3.5-turbo. The tool then uses a specified QA chain to generate responses to these questions and employs GPT-3.5-turbo again to score the responses against the generated answers. This enables users to explore and compare scoring across various chain configurations, which is useful for developers and researchers working on improving the accuracy and performance of their LLM-powered QA applications. It can be run as a Streamlit app and offers configurable inputs for evaluation parameters.
ai-hedge-fund-crypto
AI-Hedge-Fund for Crypto is a sophisticated, open-source algorithmic trading framework designed for cryptocurrency markets. It utilizes a graph-based workflow architecture, ensemble technical analysis, and AI language models to make data-driven trading decisions. The system employs a directed acyclic graph (DAG) of specialized nodes for multi-timeframe analysis, enabling sophisticated signal generation through weighted combinations of diverse trading strategies. Key features include AI-enhanced decision-making via LLMs for portfolio management, a compositional architecture with distinct nodes for data fetching, strategy execution, risk management, and portfolio management, and a signal ensemble approach aggregating multiple technical strategies. It also offers comprehensive backtesting with detailed metrics and visualizations, dynamic strategy visualization, and configurable timeframes and strategies, allowing users to customize their analysis without modifying core code.
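The "weighted combination" of strategy signals can be illustrated with a short sketch. This is an assumption about the general technique, not the project's code: the strategy names, the signal range of -1 to 1, the weights, and the 0.2 decision threshold are all made up for the example.

```python
def combine_signals(signals, weights):
    """Weighted average of per-strategy signals, each assumed to lie in [-1, 1]."""
    total = sum(weights[name] for name in signals)
    if total == 0:
        return 0.0  # no strategy has any weight, so stay neutral
    return sum(signals[name] * weights[name] for name in signals) / total

def decide(score, threshold=0.2):
    # Map the ensemble score to a coarse action; the threshold is illustrative.
    if score > threshold:
        return "buy"
    if score < -threshold:
        return "sell"
    return "hold"

# Hypothetical outputs of three technical strategies for one timeframe.
signals = {"trend": 1.0, "mean_reversion": -0.5, "momentum": 0.5}
weights = {"trend": 0.5, "mean_reversion": 0.2, "momentum": 0.3}

score = combine_signals(signals, weights)
action = decide(score)
```

With these numbers the bullish trend and momentum signals outweigh the bearish mean-reversion signal, so the ensemble score is positive and the action is "buy".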
llm.c
llm.c is an open-source project designed for training Large Language Models (LLMs) using simple, raw C/CUDA, aiming to provide a lightweight and efficient alternative to frameworks like PyTorch. The project's primary focus is on pretraining, specifically reproducing the GPT-2 and GPT-3 miniseries. It includes a parallel PyTorch reference implementation (a tweaked nanoGPT) for comparison, and currently boasts a performance edge over PyTorch Nightly. The repository offers a clean, ~1,000-line CPU fp32 implementation in C, alongside bleeding-edge CUDA code. It supports single and multi-GPU training, multi-node training, and integrates with libraries like cuBLAS, cuBLASLt, CUTLASS, and cuDNN for optimized performance. The project also serves an educational purpose, providing documented kernels and tutorials for understanding LLM layer implementations.
Automated Continual Learning from New Data
Automated Continual Learning from New Data is an AI system designed to continuously learn from new data inputs, enabling the development of adaptive AI models. This tool facilitates real-time data analysis and dynamic model training, making it suitable for applications requiring continuous adaptation and improvement. Built using the AutoGen framework, it supports multi-agent AI applications, allowing for complex interactions and sophisticated learning processes. The system is particularly valuable for scenarios where AI models need to evolve with new information without manual retraining, ensuring up-to-date performance and relevance. Its foundation in AutoGen suggests capabilities for orchestrating multiple AI agents to achieve complex tasks.
awesome-claude-skills
awesome-claude-skills is a comprehensive, curated list of Claude Skills, resources, and tools designed to customize and enhance Claude AI workflows, with a particular focus on Claude Code. Claude Skills are specialized folders containing instructions, scripts, and resources that Claude dynamically discovers and loads when relevant to tasks. This open-source GitHub repository details how Skills work, their progressive disclosure architecture for efficiency, and provides guides for getting started via the Claude.ai web interface, Claude Code CLI, or Claude API. It features official skills for document processing (docx, pdf, pptx, xlsx), design (algorithmic-art, canvas-design), development (frontend-design, web-artifacts-builder), communication, and skill creation. The repository also highlights community-contributed skills, tools for skill creation, best practices, and security guidelines, emphasizing the importance of vetting skills due to arbitrary code execution capabilities.
MiniGPT-4
MiniGPT-4 is an open-source initiative dedicated to advancing vision-language understanding by integrating advanced large language models. The project offers open-sourced code for both MiniGPT-4 and its successor, MiniGPT-v2, enabling researchers and developers to explore and build upon state-of-the-art vision-language capabilities. It functions as a unified interface, facilitating multi-task learning across various vision and language domains. The project provides detailed instructions for installation, preparation of pretrained LLM weights (including Llama2 Chat and Vicuna), and model checkpoints. Users can launch local demos for both MiniGPT-v2 and MiniGPT-4, with options to optimize GPU memory usage. Training and finetuning details are also provided, making it a comprehensive resource for those working with vision-language models.
meltingpot
Melting Pot is an open-source suite of test scenarios specifically designed for multi-agent reinforcement learning (MARL). Developed by Google DeepMind, it offers researchers a robust platform to train and evaluate AI agents in complex social situations. The tool includes over 50 multi-agent games (substrates) and more than 256 unique test scenarios, allowing for the assessment of generalization to novel social interactions like cooperation, competition, and trust. It is built on DeepMind Lab2D and provides tools for interactive play, evaluation of trained models, and example training scripts using frameworks like RLlib. Melting Pot aims to become a standard benchmark for MARL research, with ongoing development to expand its coverage of social interactions and generalization scenarios.
CloudApper AI
CloudApper AI is an enterprise-ready platform designed to help organizations build, deploy, and manage AI agents and solutions without requiring extensive coding. It aims to close the 'AI Gap' in enterprise software by layering AI onto existing systems, addressing challenges like aging software, lack of AI expertise, integration issues, and programming needs. The platform offers a no-code/low-code environment for creating custom AI agents for various functions, industries, and initiatives, including HR, sales, marketing, and IT. CloudApper AI emphasizes security, scalability, and ease of maintenance, allowing businesses to automate workflows, optimize operations, and boost efficiency across departments. It also highlights seamless integration with thousands of third-party systems and a commitment to data privacy.
Business Brio
Business Brio specializes in delivering custom AI solutions designed to provide measurable business impact. They leverage AI, machine learning, and advanced analytics to help organizations transform complex data into actionable insights and high-impact business decisions. With over a decade of experience, Business Brio embeds data science and AI into real business processes to unlock smarter outcomes. They serve various industries including Financial Services, Insurance, Telecom, Manufacturing, Consumer Goods, and Utility, offering solutions that drive innovation, improve decision-making, optimize operations, and boost customer value. Business Brio is recognized for its innovation in Analytics and AI by NASSCOM and contributes to global ISO standards for responsible AI.
mergoo
Mergoo is an open-source Python library designed to simplify the process of merging multiple Large Language Model (LLM) experts and then efficiently training the resulting merged LLM. It enables users to integrate knowledge from different generic or domain-specific LLM experts, supporting methods such as Mixture-of-Experts (MoE) and Mixture-of-Adapters (MoA). The library offers flexible merging for each layer and supports popular base models like Llama (including Llama 3), Mistral, Phi3, and BERT. It is compatible with various trainers including Hugging Face Trainer, SFTTrainer, and PEFT, and can run on CPU, MPS, and GPU devices. Mergoo allows for training choices ranging from only the Router of MoE layers to fully fine-tuning the merged LLM.
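The router mentioned above is the part of an MoE layer that decides which experts handle a token. The sketch below shows generic top-k gating in pure Python; it is not Mergoo's implementation, and the logits, expert outputs, and the `route` helper are hypothetical values chosen for illustration.

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route(router_logits, expert_outputs, top_k=2):
    """Pick the top_k experts by logit and mix their outputs by softmax weight."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:top_k]
    # Renormalize over the selected experts only.
    gate = softmax([router_logits[i] for i in top])
    dim = len(expert_outputs[0])
    mixed = [sum(g * expert_outputs[i][d] for g, i in zip(gate, top))
             for d in range(dim)]
    return top, gate, mixed

# Hypothetical router logits for one token and the four experts' outputs.
logits = [0.0, 2.0, 2.0, -1.0]
expert_outputs = [[1.0, 0.0], [0.0, 2.0], [4.0, 0.0], [9.0, 9.0]]

top, gate, mixed = route(logits, expert_outputs)
```

Training "only the Router" corresponds to learning the function that produces `logits` while the experts stay frozen. In this sketch the logits are simply given.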
Function Inception: Update/Remove Functions During Conversations
Function Inception is a key feature within the AutoGen framework, designed to empower AI agents with the ability to dynamically modify their functional capabilities during ongoing conversations. This advanced feature allows agents to update existing functions or remove them as needed, significantly enhancing their adaptability and interaction within complex conversational contexts. It facilitates the creation of sophisticated multi-agent AI applications that can operate autonomously or collaborate effectively with human users. By enabling agents to evolve their toolset in real-time, Function Inception supports more flexible and responsive AI systems, making it an essential component for developers building dynamic AI solutions.
neuronika
Neuronika is a machine learning framework built entirely in Rust, emphasizing ease of use, rapid prototyping, and performance. At its core, Neuronika utilizes reverse-mode automatic differentiation, enabling the creation of dynamically changing neural networks with minimal effort and overhead through a lean, imperative, and define-by-run API. The framework leverages the power of the Rust language to offer an intuitive and efficient interface without the need for Foreign Function Interfaces (FFI). It supports GPU-accelerated primitives via CUDA, serialization with Serde, and transparent BLAS support for optimized matrix multiplication. Neuronika is currently in active development, with breaking changes expected as it evolves.
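Reverse-mode automatic differentiation with a define-by-run API means the computation graph is recorded as ordinary code executes and gradients are then propagated backwards through it. The sketch below illustrates the idea in Python rather than Rust and is not related to Neuronika's actual types; the `Var` class is a hypothetical scalar-only toy.

```python
class Var:
    """Scalar value that records how it was computed (define-by-run)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.grad = 0.0
        self.parents = parents  # tuple of (parent Var, local derivative)

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))

    def backward(self):
        # Build a topological order so each node is processed after its users.
        order, seen = [], set()

        def visit(node):
            if id(node) in seen:
                return
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, local in node.parents:
                parent.grad += node.grad * local  # chain rule, accumulated

x = Var(3.0)
y = Var(4.0)
z = x * y + x * x  # the graph is built as this line runs
z.backward()
```

For `z = x*y + x*x` the analytic gradients are `dz/dx = y + 2x` and `dz/dy = x`, which gives a direct check on the accumulated `grad` fields.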
Microsoft AutoGen
Microsoft AutoGen is a programming framework designed for creating multi-agent AI applications. It allows developers to build AI systems that can operate independently or in conjunction with human users. The framework supports automated workflows and facilitates collaboration between multiple AI agents. AutoGen features a layered and extensible design, offering a Core API for message passing and event-driven agents, an AgentChat API for rapid prototyping of multi-agent patterns, and an Extensions API for integrating LLM clients and capabilities like code execution. While AutoGen is now in maintenance mode, existing users can continue to leverage its architecture. For new projects, Microsoft recommends its successor, Microsoft Agent Framework, which offers enterprise-grade support. AutoGen also provides developer tools like AutoGen Studio for no-code GUI development and AutoGen Bench for evaluating agent performance.
mlx-lm
mlx-lm is a Python package designed for generating text and fine-tuning large language models (LLMs) specifically on Apple silicon using the MLX framework. It offers seamless integration with the Hugging Face Hub, allowing users to easily access and utilize a vast array of LLMs with simple commands. Key features include support for quantizing models, uploading them to the Hugging Face Hub, and performing both low-rank and full model fine-tuning, even with quantized models. The package also provides distributed inference and fine-tuning capabilities with `mx.distributed`, and tools for efficient handling of long prompts and generations through a rotating fixed-size key-value cache and prompt caching.
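The idea behind a rotating fixed-size cache is that memory stays bounded during long generations because new entries overwrite old ones once the cache is full. The sketch below is a simplified illustration, not mlx-lm's implementation: the `RotatingCache` class, its `keep` parameter (a number of initial entries that are never overwritten), and the use of integers in place of key-value tensors are all assumptions.

```python
class RotatingCache:
    """Fixed-size cache that overwrites its oldest rotating slot when full."""

    def __init__(self, max_size, keep=0):
        assert 0 <= keep < max_size
        self.max_size = max_size
        self.keep = keep          # leading slots that are never overwritten
        self.slots = []
        self.next_slot = keep     # next slot to overwrite once the cache is full

    def append(self, entry):
        if len(self.slots) < self.max_size:
            self.slots.append(entry)  # still filling up
            return
        self.slots[self.next_slot] = entry
        self.next_slot += 1
        if self.next_slot == self.max_size:
            self.next_slot = self.keep  # wrap around, skipping the kept prefix

cache = RotatingCache(max_size=4, keep=1)
for token in range(7):  # integers stand in for per-token key/value entries
    cache.append(token)
```

After seven appends the cache still holds four entries: the first token plus the three most recent ones. Physical slot order stops matching temporal order after the first wrap-around, which a real attention implementation has to account for.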
neurojs
neurojs is an open-source JavaScript framework designed for deep learning and reinforcement learning applications within the browser environment. While it mainly focuses on reinforcement learning, it is versatile enough for various neural network-based tasks. The library includes practical examples and demos, such as a 2D self-driving car visualization, to showcase its capabilities. It supports advanced features like uniform and prioritized replay buffers, advantage learning, and models such as deep Q-networks and actor-critic (via deep deterministic policy gradients). neurojs also allows for binary import and export of network configurations, including weights, and is built for high performance. However, neurojs is no longer actively maintained, and its author recommends using more general frameworks like TensorFlow.js instead.
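A prioritized replay buffer differs from a uniform one only in how transitions are sampled: each stored transition is drawn with probability proportional to its priority. The sketch below is written in Python for illustration and is not neurojs code; the `PrioritizedReplay` class, the string transitions, and the priority values are hypothetical.

```python
import random

class PrioritizedReplay:
    """Bounded buffer that samples transitions in proportion to their priority."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.items = []
        self.priorities = []
        self.rng = random.Random(seed)  # seeded for reproducible sampling

    def add(self, transition, priority):
        if len(self.items) == self.capacity:
            # Evict the oldest transition when the buffer is full.
            self.items.pop(0)
            self.priorities.pop(0)
        self.items.append(transition)
        self.priorities.append(priority)

    def sample(self, k):
        # Sampling with replacement, weighted by priority.
        return self.rng.choices(self.items, weights=self.priorities, k=k)

buffer = PrioritizedReplay(capacity=2)
buffer.add("a", 1.0)
buffer.add("b", 0.0)   # zero priority: never sampled
buffer.add("c", 3.0)   # buffer is full, so "a" is evicted
batch = buffer.sample(5)
```

Setting every priority to the same value recovers uniform replay. In practice priorities are usually derived from the temporal-difference error, which this sketch leaves out.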