ShypdShypd.ai
🤖

AI Agents & Automation

Browsing page 105 of AI Frameworks & Infra in AI Agents & Automation. Sorted by confidence score — our independent quality rating.

LightCompress

LightCompress

60%

LightCompress is an open-source toolkit designed for compressing large AI models such as Large Language Models (LLMs), Vision-Language Models (VLMs), and video generative models. It offers a comprehensive suite of state-of-the-art compression algorithms, including various quantization methods (integer, floating-point, mixed-precision) and sparsity techniques (structured, unstructured). The tool supports a wide array of popular models like LLaMA, Mistral, and DeepSeekv2, and ensures compatibility with multiple inference backends such as VLLM, Sglang, and AutoAWQ. LightCompress aims to significantly reduce model size and improve inference efficiency while maintaining high accuracy, making it ideal for deploying large models on resource-constrained hardware.

RebelsAI

RebelsAI

60%

RebelsAI is a consulting company dedicated to helping businesses integrate and leverage Generative AI solutions. They focus on establishing internal Centers of AI Excellence within organizations, empowering employees and fostering a culture of innovation. RebelsAI provides custom solutions, meticulously tailored to meet specific organizational needs and seamlessly integrate with existing systems such as ERP and CRM. Their approach is designed to significantly improve process efficiency, enhance decision-making through data-driven insights, and ultimately help businesses thrive in an AI-driven landscape. They aim to transform operations and unlock new potential by strategically implementing AI technologies.

llama-swap

llama-swap

60%

llama-swap is a robust AI Agents & Automation tool designed for reliable model swapping across local OpenAI and Anthropic compatible servers, including llama.cpp and vllm. It allows users to run multiple generative AI models on their machine and hot-swap between them on demand. Built in Go for performance and simplicity, llama-swap boasts zero dependencies and is incredibly easy to set up with just one binary and one configuration file. It supports a wide range of OpenAI and Anthropic API endpoints, as well as specific endpoints for llama-server and SDAPI. The tool also includes a real-time web UI with a playground for testing models, viewing token metrics, and monitoring logs, making it a comprehensive solution for managing local AI workflows.

MachineLearningNotebooks

MachineLearningNotebooks

60%

MachineLearningNotebooks is a GitHub repository offering Python notebooks filled with machine learning and deep learning examples, specifically designed for use with the Azure Machine Learning Python SDK. This resource provides practical demonstrations for various tasks, including building, training, and deploying machine learning models within the Azure ecosystem. While this repository focuses on the v1 SDK, it serves as a valuable historical reference for developers and data scientists working with Azure ML. Users are encouraged to explore the v2 SDK samples repository for the most current and enhanced examples, as this v1 repository is deprecated and no longer actively monitored or updated.

llm-ui

llm-ui

60%

llm-ui is an open-source React library specifically designed for integrating Large Language Models (LLMs) into user interfaces. It simplifies the process of displaying and interacting with LLM outputs, offering features like the ability to add custom components to the LLM's streamed responses. The library also includes throttling to smooth out pauses in streamed output, ensuring a native frame rate rendering experience. It provides robust support for code blocks across various programming languages using Shiki, and its headless nature allows developers to bring their own styles for complete UI customization. This makes llm-ui a flexible solution for developers looking to build dynamic and responsive AI-powered applications.

llm.pdf

llm.pdf

60%

llm.pdf is a proof-of-concept project showcasing the ability to run an entire Large Language Model (LLM) within a PDF file. This innovative approach leverages Emscripten to compile llama.cpp into asm.js, enabling the LLM to execute directly within the PDF environment through an old PDF JS injection method. The entire LLM file is embedded into the PDF using base64 encoding, allowing for self-contained LLM inference. While currently a proof-of-concept, it highlights the potential for highly portable and self-sufficient AI applications. Users can generate custom PDFs with compatible GGUF quantized models, with 135M parameter models taking approximately 5 seconds per token for input/output.

long-context-attention

long-context-attention

60%

long-context-attention, also known as Unified Sequence Parallelism (USP) or Hybrid Sequence Parallelism, offers a novel approach to training and inference for long context Large Language Models (LLMs). This open-source project synergizes the strengths of DeepSpeed-Ulysses-Attention and Ring-Attention, addressing their individual limitations. Ulysses-Attention is sensitive to the number of attention heads and less suitable for GQA/MQA scenarios, while Ring-Attention can be less efficient in computation and communication. LongContextAttention provides a more general, versatile, and performant solution. It supports various FlashAttention versions (v2, v3) and can even run without FlashAttention for NPUs. The tool includes functionalities for setting process groups, extracting local tensors, and offers different ring implementation types like 'zigzag' and 'basic'. It has been verified in Megatron-LM and applied in several other projects, providing a robust solution for researchers and developers working with long context generative AI.

LookaheadDecoding

LookaheadDecoding

60%

LookaheadDecoding is an open-source project designed to significantly accelerate Large Language Model (LLM) inference by breaking the traditional sequential dependency of token generation. This innovative approach utilizes a parallel decoding algorithm, eliminating the need for a draft model or a separate data store. Motivated by Jacobi decoding, LookaheadDecoding collects and caches n-grams from Jacobi iteration trajectories, enabling simultaneous processing of future tokens. The process is divided into a lookahead branch, which generates new n-grams within a defined window, and a verification branch, which validates promising candidates. This method has demonstrated substantial latency reductions, achieving speedups ranging from 1.5x to 2.3x on various datasets and models. The tool supports sampling and FlashAttention, and is implemented with an attention mask to maximize GPU parallel computing power, making it a valuable resource for optimizing LLM performance.

Matterport3DSimulator

Matterport3DSimulator

60%

Matterport3DSimulator is an AI research platform designed for deep reinforcement learning, computer vision, natural language processing, and robotics. It allows AI agents to interact with real 3D environments using visual information derived from panoramic RGB-D images. The simulator is based on the Matterport3D dataset, featuring 90 diverse indoor environments. Key capabilities include outputting real RGB and depth images, customizable image resolution and camera parameters, and support for off-screen rendering. It offers both C++ and Python APIs and is highly efficient, capable of around 1000 fps RGB-D off-screen rendering. The platform also includes the Room-to-Room (R2R) navigation dataset and task for training agents to follow natural language instructions.

Deep-Learning-in-Production

Deep-Learning-in-Production

60%

Deep-Learning-in-Production is a comprehensive GitHub repository curated by ahkarami, designed to serve as a valuable resource for deploying deep learning-based models in production environments. The repository compiles useful notes and references across various deep learning frameworks, including PyTorch, TensorFlow, Keras, and MXNet. It covers essential topics such as model conversion (e.g., PyTorch to C++, Keras to C++), model serving with tools like Flask, TorchServe, and TensorFlow Serving, and deployment on platforms like AWS Lambda and Kubernetes. Additionally, it provides insights into model quantization, speed optimization, and general deep learning deployment toolkits like OpenVINO and NVIDIA Triton Inference Server. The repository also includes resources for front-end and back-end development, mobile/embedded device deployment, and MLOps, making it a holistic guide for machine learning engineers and data scientists looking to operationalize their models.

DeepSeek-V3

DeepSeek-V3

60%

DeepSeek-V3 is a powerful Mixture-of-Experts (MoE) language model featuring 671B total parameters, with 37B activated for each token, ensuring efficient inference and cost-effective training. Building on the DeepSeek-V2 architecture, it introduces an innovative auxiliary-loss-free strategy for load balancing and a multi-token prediction training objective for enhanced performance. The model was pre-trained on 14.8 trillion diverse tokens and further refined through Supervised Fine-Tuning and Reinforcement Learning. DeepSeek-V3 demonstrates superior performance against other open-source models and rivals top closed-source alternatives, particularly excelling in math and code tasks. It supports local deployment on various hardware and open-source community software, including SGLang, LMDeploy, and TensorRT-LLM, with options for FP8 and BF16 inference.

micro_diffusion

micro_diffusion

60%

micro_diffusion is an open-source repository from Sony Research that provides a minimalistic implementation for training large-scale diffusion models from scratch with an extremely low budget. Utilizing only 37 million publicly available real and synthetic images, it can train a 1.16 billion parameter sparse transformer for approximately $1,890, achieving a strong FID score on the COCO dataset. The repository includes training code, dataset code, and pre-trained model checkpoints for off-the-shelf generation. It supports progressive training from low to high resolution and incorporates patch masking for performance optimization and reduced training time.

deepdrive

deepdrive

60%

Deepdrive is an open-source simulator designed to facilitate experimentation and advancement in self-driving AI. It enables anyone with a PC to develop and test state-of-the-art autonomous driving systems within a realistic simulated environment. The simulator supports various AI agent types, including forward-agents, remote agents, and baseline agents like Mnet2 and C++ FSM/PID. Users can record training data for imitation learning, convert data to TFRecords, and train models using provided datasets or their own. Deepdrive offers detailed observation data, including vehicle dynamics, camera feeds (image, depth), and environmental information, all adhering to Unreal Engine conventions for units and rotations. It requires Linux, Python 3.6+, 10GB disk space, and 8GB RAM, with optional GPU requirements for baseline agents.

meshed-memory-transformer

meshed-memory-transformer

60%

Meshed-Memory Transformer (M²) is an open-source project that provides the reference code for the paper "Meshed-Memory Transformer for Image Captioning" presented at CVPR 2020. This tool is designed for researchers and developers working in computer vision and natural language processing. It allows users to set up a conda environment, download necessary data like COCO annotations and detection features, and then evaluate or train their own image captioning models. The repository includes scripts for both testing and training, with configurable arguments for batch size, number of memory vectors, and learning rate scheduling. It requires Python 3.6 and specific data preparation steps to function correctly.

DeepResearcher

DeepResearcher

60%

DeepResearcher is an open-source framework designed to scale deep research by training LLM-based agents using reinforcement learning in real-world web environments. This comprehensive tool facilitates end-to-end training, allowing agents to engage in authentic web search interactions. Qualitative analysis of the framework reveals emergent cognitive behaviors, including the ability to formulate plans, cross-validate information from multiple sources, self-reflect to redirect research, and maintain honesty when definitive answers are unavailable. DeepResearcher demonstrates significant performance improvements over prompt engineering and RAG-based baselines, emphasizing the critical role of end-to-end training in real-world settings for developing robust research capabilities.

EMPRESS

EMPRESS

60%

EMPRESS is an observability platform specifically designed for AI agents, enabling users to track every action an AI agent takes in xAPI format. This comprehensive tracking helps prove compliance with regulations like the EU AI Act, optimize agent performance, and scale AI operations with confidence. The platform records what agents do, why they do it, and the resulting outcomes, providing a full decision history and audit-ready logs. It allows users to search and filter decisions instantly, understand the reasoning behind each action, and export complete audit trails for compliance reports. EMPRESS also offers hundreds of pre-built skills to help users build and deploy agents for various tasks, from account management to content moderation, ensuring explainable decisions and improved agent behavior.

DeepLearningFrameworks

DeepLearningFrameworks

60%

DeepLearningFrameworks is an open-source GitHub repository designed to be a "Rosetta Stone" for deep learning frameworks. Its primary goal is to enable data scientists to easily transfer their expertise from one framework to another by providing common setups and comparisons across different GPUs, CUDA versions, precision levels, and languages (Python, Julia, R). The project includes notebooks demonstrating CNN, DenseNet-121, ResNet-50, and RNN models, along with detailed performance metrics like training times and feature extraction speeds across frameworks such as Caffe2, Chainer, CNTK, MXNet, Keras (with various backends), Tensorflow, Lasagne, PyTorch, and Julia-Knet. It also offers valuable lessons learned regarding API usage, data handling, and performance optimization for various frameworks.

DLTK

DLTK

60%

DLTK (Deep Learning Toolkit) is an open-source Python library designed for medical image analysis, leveraging the TensorFlow framework. It aims to facilitate rapid prototyping of deep learning models and ensure reproducibility in research applications within the medical imaging field. The toolkit provides state-of-the-art methods and models, accelerating research and development. It includes example applications and tutorial notebooks to help users understand its interface with TensorFlow, write custom read functions, and develop their own model functions. DLTK also features a Model Zoo with implementations of current research methodologies.

nmt-keras

nmt-keras

60%

NMT-Keras is an open-source library designed for Neural Machine Translation (NMT) using the Keras framework. It provides implementations of both attentional recurrent neural network NMT models and Transformer NMT models. Key features include multi-GPU training for TensorFlow, Tensorboard integration, and online learning capabilities. The library supports various attention mechanisms like Bahdanau and Luong, along with double stochastic attention. Users can leverage beam search decoding, ensemble decoding, and model averaging for improved translation quality. It also offers support for GRU/LSTM networks, label smoothing, N-best list generation, and unknown words replacement. NMT-Keras facilitates the use of pretrained word embeddings and includes a client-server architecture for web demos, making it suitable for researchers and developers in the machine translation domain.

DiffusionDPO

DiffusionDPO

60%

DiffusionDPO is a code repository from SalesforceAIResearch, offering the training code for "Diffusion Model Alignment Using Direct Preference Optimization." This tool is designed for researchers and developers working with diffusion models, providing scripts adapted from the diffusers library. It supports the alignment of models such as StableDiffusion1.5 and StableDiffusion-XL-1.0, with examples for running training on these models. The repository includes utilities for scoring models using various AI feedback mechanisms like PickScore, HPS, Aesthetics, and CLIP, along with notebooks for visualizing results and comparing generations. It's a valuable resource for those looking to fine-tune and evaluate diffusion models for specific preferences.

DiffusionDrive

DiffusionDrive

60%

DiffusionDrive is a cutting-edge AI agent tool that introduces a novel truncated diffusion model specifically designed for real-time end-to-end autonomous driving. This innovative approach significantly enhances performance, achieving a 10x reduction in diffusion denoising steps, 3.5 times higher PDMS on NAVSIM, and 64% higher mode diversity compared to traditional diffusion policies. Accepted as a CVPR 2025 Highlight, DiffusionDrive demonstrates record-breaking 88.1 PDMS on the NAVSIM benchmark with a ResNet-34 backbone, all while operating at a real-time speed of 45 FPS. It is highly flexible, allowing integration with onboard sensor data and existing perception modules, making it a robust solution for developing advanced autonomous driving systems.

DONNAJAMES B.V.

DONNAJAMES B.V.

60%

DONNAJAMES B.V. offers AI solutions specifically tailored for the notary and financial services sectors, aiming to make AI practical and personal for organizations. The platform emphasizes security and reliability, helping offices accelerate processes, minimize errors, and manage capacity shortages. By supporting employees with AI technology, DONNAJAMES creates more room for quality, assurance, and personalized client contact. The solutions are designed to integrate safely within existing workflows, offering benefits like compliance, training, and up to 25% faster work. It caters to both larger teams and individual professionals, providing direct deployment and user-based payment options for smaller operations.

octnet

octnet

60%

OctNet is an open-source framework designed for deep learning with sparse 3D data, utilizing efficient space partitioning structures known as octrees. This approach significantly reduces the memory and compute requirements of 3D convolutional neural networks, allowing for the development of deep networks at high resolutions. By hierarchically partitioning space and storing pooled feature representations in leaf nodes, OctNet focuses memory allocation and computation on relevant dense regions. This enables deeper networks without sacrificing resolution, making it suitable for tasks such as 3D object classification, orientation estimation, and point cloud labeling. The framework includes core CPU and GPU code for network operations, data pre-processing tools, and a Torch wrapper for full network integration.

encodec

encodec

60%

EnCodec is a state-of-the-art deep learning-based audio codec developed by Facebook Research. It offers high-fidelity neural audio compression for both mono 24 kHz audio and stereo 48 kHz audio. The tool provides two multi-bandwidth models: a causal model for 24 kHz monophonic audio and a non-causal model for 48 kHz stereophonic audio, trained on music-only data. Users can compress audio to various bitrates, ranging from 1.5 kbps to 24 kbps, depending on the model. EnCodec also includes pre-trained language models for further compression without quality loss and can be integrated with Hugging Face Transformers for scalable use. It supports direct command-line usage for compression, decompression, and extracting discrete audio representations.