Coding & Development
Browsing page 127 of AI tools for Open Source & Models in Coding & Development. Sorted by confidence score — our independent quality rating.
tacotron
Tacotron is a TensorFlow-based open-source implementation of the Tacotron text-to-speech synthesis model. It lets developers and researchers train and experiment with fully end-to-end speech synthesis. The tool supports multiple speech datasets, including the LJ Speech Dataset, Nick Offerman's Audiobooks, and the World English Bible, offering flexibility for different training needs. Its documentation covers requirements, data preparation steps, training procedures, and sample synthesis. Key features include gradient clipping, Noam-style warmup and decay, and bucketed training batches, which make it a practical base for speech synthesis research and development.
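"Noam-style warmup and decay" usually refers to a learning-rate schedule that rises linearly for a fixed number of warmup steps and then decays with the inverse square root of the step count. A minimal sketch in plain Python follows. The function name and the default values for `init_lr` and `warmup_steps` are illustrative assumptions, not values taken from the tacotron repository.

```python
def noam_learning_rate(step, init_lr=0.001, warmup_steps=4000):
    """Noam-style schedule: linear warmup, then inverse-square-root decay.

    init_lr and warmup_steps are illustrative defaults (assumptions),
    not settings read from the tacotron project.
    """
    step = max(step, 1)  # guard against step 0
    # The scale factor grows linearly up to 1.0 at step == warmup_steps,
    # then shrinks proportionally to 1 / sqrt(step).
    scale = warmup_steps ** 0.5 * min(step * warmup_steps ** -1.5, step ** -0.5)
    return init_lr * scale
```

With these defaults the rate peaks at `init_lr` exactly when `step` equals `warmup_steps`, and it is half of `init_lr` both halfway through warmup and at four times the warmup length.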
susi_gassistantbot
susi_gassistantbot is an open-source project designed to integrate SUSI AI with Google Assistant, enabling developers to create custom voice-controlled applications and AI agents. The project provides a framework for building functionalities on Google Assistant using the SUSI AI platform. It requires setting up a project on Google's Actions console, configuring API.AI (now Dialogflow) with intents and webhooks, and deploying the application to a platform like Heroku. This tool is ideal for developers looking to extend Google Assistant's capabilities with custom AI logic from SUSI, offering a flexible way to build interactive voice experiences.
text-summarization-tensorflow
text-summarization-tensorflow is an open-source project providing a TensorFlow implementation of text summarization. It utilizes a seq2seq library with an encoder-decoder model, incorporating an attention mechanism for improved performance. The tool initializes word embeddings using Glove pre-trained vectors and employs LSTM cells for both encoding and decoding processes. It supports training with custom datasets and offers options for configuring hyperparameters such as network size, depth, beam width, and learning rate. Users can also test the model with pre-trained weights and evaluate performance using ROUGE metrics. This tool is ideal for researchers and students looking to understand and experiment with text summarization techniques.
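ROUGE metrics score a generated summary by its n-gram overlap with a reference summary. The sketch below computes ROUGE-1 (unigram overlap) in plain Python. It assumes lowercase whitespace tokenization and is a simplification for illustration, not the scorer used by this project or the official ROUGE toolkit; the function name `rouge_1` is hypothetical.

```python
from collections import Counter


def rouge_1(candidate, reference):
    """Simplified ROUGE-1: clipped unigram overlap between two strings."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    # Counter intersection keeps the minimum count per token (clipped overlap).
    overlap = sum((cand & ref).values())
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    if overlap == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}
```

For example, scoring the candidate "the cat sat" against the reference "the cat sat on the mat" matches three of the six reference tokens, so recall is 0.5 while precision is 1.0.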
tensorforce
Tensorforce is an open-source deep reinforcement learning framework built on TensorFlow, designed for both research and practical applications. It stands out for its modular, component-based design, allowing for highly configurable feature implementations. A key differentiator is the separation of the RL algorithm from the application, making algorithms agnostic to input and output structures. The entire reinforcement learning logic, including control flow, is implemented in TensorFlow, enabling portable computation graphs. It supports a wide range of features including various network layers, memory types, policy distributions, reward estimation, training objectives, and optimization algorithms. Tensorforce also offers extensive exploration techniques, preprocessing options, and regularization methods, making it a versatile tool for developing and training reinforcement learning agents.
TransNetV2
TransNetV2 is an open-source neural network designed for fast and effective shot boundary detection in videos. This repository provides the code for TransNet V2, an advanced deep network architecture that significantly improves upon previous methods for identifying shot transitions. It is particularly useful for tasks like video editing and content analysis, enabling automated segmentation of video content. The project includes resources for both inference and training, with a PyTorch version available for inference. While training datasets can be large, users can leverage pre-trained models and instructions in the inference folder to detect shots in their own videos without needing to retrain the network.
trfl
TRFL (pronounced "truffle") is an open-source library developed by Google DeepMind, designed to simplify the implementation of Reinforcement Learning (RL) agents using TensorFlow. It offers a collection of essential building blocks and loss functions, such as Q-learning, that are crucial for developing and experimenting with various RL algorithms. The library integrates seamlessly with existing TensorFlow environments, allowing developers to leverage its powerful computational graph capabilities. TRFL does not list TensorFlow as a direct requirement, giving users flexibility to install specific CPU or GPU versions, along with TensorFlow Probability, separately. This modular approach makes it a valuable resource for researchers and practitioners in the field of AI and machine learning.
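To make the Q-learning loss mentioned above concrete, the sketch below computes the one-step temporal-difference error and its squared loss for a single transition in plain Python. It does not call TRFL or TensorFlow; the function name and argument names are illustrative assumptions, not the library's API.

```python
def q_learning_loss(q_tm1, a_tm1, r_t, discount_t, q_t):
    """One-step Q-learning loss for a single transition (illustrative).

    q_tm1:      action values in the previous state (list of floats)
    a_tm1:      index of the action that was taken
    r_t:        reward received
    discount_t: discount factor applied to the next state's value
    q_t:        action values in the next state (list of floats)
    """
    # Bootstrapped target uses the greedy (maximum) next-state value.
    target = r_t + discount_t * max(q_t)
    td_error = target - q_tm1[a_tm1]
    loss = 0.5 * td_error ** 2
    return loss, td_error
```

In a real agent the target would be treated as a constant during backpropagation, so that gradients flow only through the value of the action that was taken.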
UniAnimate
UniAnimate is an open-source framework designed to enable efficient and long-term human video generation using unified video diffusion models. It addresses limitations in existing techniques by mapping reference images, posture guidance, and noise video into a common feature space, reducing optimization burden and ensuring temporal coherence. The tool supports a unified noise input for random or first-frame conditioned input, enhancing long-term video generation capabilities. UniAnimate also explores an alternative temporal modeling architecture based on state-space models to replace computation-consuming temporal Transformers, allowing for the generation of highly consistent videos up to one minute in length by iteratively employing a first-frame conditioning strategy. It provides code and models for human image animation, including features for pose alignment and generating video clips at various resolutions.
tkDNN
tkDNN is a specialized Deep Neural Network library engineered for high-performance inference on NVIDIA Jetson boards, including TK1, TX1, TX2, AGX Xavier, and Nano. Built upon cuDNN and TensorRT primitives, its core objective is to maximize inference speed on NVIDIA hardware. The library supports various deep learning tasks such as 2D/3D object detection, tracking, semantic segmentation, and monocular depth estimation. tkDNN is inference-only and does not support model training. It provides detailed FPS and mAP results for popular models like YOLOv3/v4 and MobileNetV2 SSD across different NVIDIA platforms, showcasing its optimization capabilities for embedded systems.
TinyChatEngine
TinyChatEngine is an open-source library designed for efficient on-device inference of Large Language Models (LLMs) and Visual Language Models (VLMs). It allows users to run these advanced AI models directly on edge devices such as laptops, cars, and robots, ensuring instant responses and enhanced data privacy by keeping processing local. The engine leverages sophisticated LLM model compression techniques, including SmoothQuant and AWQ (Activation-aware Weight Quantization), to optimize performance for low-precision models. It boasts universal compatibility across x86, ARM, and CUDA platforms, featuring a from-scratch C/C++ implementation with no external library dependencies. TinyChatEngine is recognized for its high performance, achieving real-time inference on various devices, and is designed for ease of use, requiring only download, compilation, and deployment.
voxtral.c
voxtral.c is a pure C implementation of the inference pipeline for Mistral AI's Voxtral Realtime 4B speech-to-text model, designed for real-time speech recognition. It has zero external dependencies beyond the C standard library, making it highly portable and efficient. The tool supports various input methods, including WAV files, live microphone input (macOS), and streaming audio from stdin, allowing for transcription of virtually any audio format via ffmpeg. Key features include Metal GPU acceleration for Apple Silicon, streaming output of tokens as they are generated, a streaming C API for incremental audio processing, and memory-mapped BF16 weights for near-instant loading. It also incorporates a chunked encoder and rolling KV cache to manage memory usage efficiently, enabling unlimited-length audio transcription.
vjepa2
vjepa2 is an open-source project from Facebook AI Research (FAIR) providing PyTorch code and models for V-JEPA 2 and V-JEPA 2.1, self-supervised learning approaches for video. These models are pre-trained on internet-scale video data to achieve state-of-the-art performance in motion understanding and human action anticipation tasks. V-JEPA 2.1 further refines the training recipe to learn high-quality and temporally consistent dense features, leveraging dense predictive loss, deep self-supervision, and multi-modal tokenizers. The project also includes V-JEPA 2-AC, a latent action-conditioned world model for robot manipulation tasks, demonstrating capabilities like reaching, grasping, and pick-and-place without extensive environment-specific data. It offers pretrained checkpoints and easy integration via PyTorch Hub and HuggingFace.
Appsmith
Appsmith is an open-source low-code platform designed for rapidly building custom internal tools and AI agents. It empowers developers to connect to diverse data sources and APIs, enabling the creation of dashboards, admin panels, and operational apps with minimal coding. The platform allows users to build, deploy, and manage AI agents quickly and securely, offering agent templates for various functions like HR, sales, and support. These agents can operate across different tools and systems, automating repetitive tasks and providing accurate, cited responses by connecting to real-time company data. Appsmith emphasizes enterprise-grade security, offering self-hosting options, robust access controls, and integration with SSO providers, making it suitable for secure AI deployments.
MARLlib
MARLlib is a comprehensive, open-source library designed for Multi-agent Reinforcement Learning (MARL), leveraging Ray and its RLlib toolkit. It offers a unified platform for researchers and developers to create, train, and evaluate MARL algorithms across a wide array of tasks and environments. Key features include support for all task modes (cooperative, collaborative, competitive, mixed), a Gym-like interface for multi-agent environments, and flexible parameter-sharing strategies. MARLlib provides 18 pre-built algorithms with an intuitive API, making it accessible even for those new to MARL. Users can customize model architectures, policy sharing, and access over a thousand released experiments. It is compatible with Linux operating systems and offers step-by-step installation or Docker-based usage.
cnn-text-classification-pytorch
cnn-text-classification-pytorch is an open-source implementation of Convolutional Neural Networks (CNNs) for sentence classification, built using PyTorch. This tool is based on the model described in Kim's influential paper on CNNs for Sentence Classification. It offers a practical framework for developers to perform text classification tasks, providing consistent results with the original research. The implementation has been updated to be compatible with modern PyTorch versions (2.0+), removing deprecated dependencies like `torchtext` and fixing various runtime errors. It supports datasets like MR and SST, includes options for different optimizers (Adam, Adadelta), and allows for easy training, testing, and prediction of text sentiment.
MM-EUREKA
MM-EUREKA is a cutting-edge project exploring the frontiers of multimodal reasoning through rule-based reinforcement learning. It introduces powerful models such as MM-Eureka-Qwen-7B and MM-Eureka-Qwen-32B, which significantly advance performance in multidisciplinary K12 and mathematical reasoning tasks. The project has iterated on model architecture, algorithms, and data, moving from InternVL to the more robust Qwen2.5-VL base models. Key improvements include enhanced online filtering, adaptive online rollout adjustment (ADORA), and novel RL algorithms like Clipped Policy Gradient Optimization with Policy Drift (CPGD). MM-EUREKA also open-sources a comprehensive pipeline, including self-collected MMK12 datasets, to foster further research and development in multimodal AI.
deepgaze
Deepgaze is an open-source computer vision library designed for human-computer interaction, providing advanced capabilities for analyzing human behavior through visual data. It leverages Convolutional Neural Networks (CNNs) for precise head pose and gaze direction estimation, which is crucial for understanding a person's focus of attention, even when eyes are obscured or far from the camera. Beyond CNN-based estimation, Deepgaze incorporates features like skin detection via backprojection, robust motion detection and tracking, and saliency map generation using the FASA algorithm. Built on OpenCV and TensorFlow, it offers optimized, state-of-the-art algorithms, making complex implementations accessible with just a few lines of code for both beginners and advanced users in computer vision and machine learning.
ms-swift
ms-swift is a comprehensive, open-source framework developed by the ModelScope community, designed for fine-tuning and deploying large language models (LLMs) and multimodal large models (MLLMs). It supports over 600 text-only LLMs and 400 MLLMs, offering full-pipeline capabilities from training to inference, evaluation, quantization, and deployment. The framework integrates advanced training technologies, including Megatron parallelism (TP, PP, CP, EP) for acceleration and a rich family of GRPO reinforcement learning algorithms. ms-swift also supports various fine-tuning methods like LoRA, QLoRA, and DoRA, and provides memory optimization techniques such as Flash-Attention 2/3. It offers a Web-UI interface for simplified training, inference, evaluation, and quantization workflows, making it accessible for a wide range of users.
FireRedASR
FireRedASR is a family of open-source, industrial-grade automatic speech recognition (ASR) models developed by FireRedTeam. It provides robust support for Mandarin, various Chinese dialects, and English, setting new state-of-the-art benchmarks for Mandarin ASR. A key differentiator is its outstanding capability in recognizing singing lyrics. The tool offers two main variants: FireRedASR-LLM, designed for SOTA performance and seamless end-to-end speech interaction using an Encoder-Adapter-LLM framework, and FireRedASR-AED, which balances high performance with computational efficiency through an Attention-based Encoder-Decoder architecture. It also includes modules for VAD, LID, and Punc, making it a comprehensive ASR system.
lit-llama
Lit-LLaMA offers an independent and open-source implementation of the LLaMA language model, building upon nanoGPT. It is designed to be simple, numerically correct, optimized for various hardware, and fully open-source under the Apache 2.0 license. The tool supports advanced features like flash attention, Int8 and GPTQ 4-bit quantization for efficient memory usage, and LoRA and LLaMA-Adapter fine-tuning for adapting models to specific datasets. This repository is no longer actively maintained; its successor is the Lit-GPT project. It enables users to generate text, fine-tune models on custom data, and pre-train on large datasets like RedPajama.
neural-compressor
Intel Neural Compressor is an open-source Python library developed by Intel, offering advanced model compression techniques for deep learning frameworks like PyTorch, TensorFlow, and JAX. It supports a wide range of low-bit quantization methods, including INT8, FP8, MXFP8, INT4, MXFP4, and NVFP4, as well as sparsity. The library is designed to optimize the performance of Large Language Models (LLMs) and Vision-Language Models (VLMs) on Intel hardware such as Xeon Scalable Processors, Core Ultra Processors, and Gaudi AI Accelerators, with limited support for AMD and ARM CPUs, and NVIDIA GPUs. Key features include Static Quantization, Dynamic Quantization, SmoothQuant, Weight-Only Quantization, and Quantization-Aware Training, making it a comprehensive solution for deploying efficient AI models.
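As a rough illustration of what INT8 quantization does to a tensor, the sketch below applies symmetric per-tensor quantization to a list of floats in plain Python. It is a conceptual example under simplifying assumptions (one scale per tensor, range clipped to [-127, 127]) and does not use or mirror the Intel Neural Compressor API; both function names are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor INT8 quantization (illustrative sketch)."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # One scale for the whole tensor; fall back to 1.0 for all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map INT8 codes back to approximate float values."""
    return [q * scale for q in quantized]
```

Each dequantized value differs from the original by at most half a quantization step (`scale / 2`), which is the precision traded away for the smaller storage format.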
OpenML
OpenML is a collaborative online machine learning platform designed to facilitate the sharing and organization of data, machine learning algorithms, and experimental results. It aims to create a frictionless, networked ecosystem where scientists and practitioners can easily integrate their existing processes and tools to collaborate globally. The platform provides significant benefits for science by enabling rapid building upon others' results, answering complex questions quickly through prior experiments, and making larger studies feasible. For scientists, it saves time on routine duties, compares new experiments to the state of the art, and offers potential for new discoveries and publications. OpenML also serves as a valuable learning environment for students and citizen scientists, allowing them to explore state-of-the-art methods and contribute their own work.
PyTorch-BayesianCNN
PyTorch-BayesianCNN provides an implementation of Bayesian Convolutional Neural Networks (CNNs) with variational inference, specifically utilizing Bayes by Backprop, within the PyTorch framework. This tool allows researchers and developers to build CNNs that can infer intractable posterior probability distributions over weights, offering a significant advantage over traditional frequentist approaches by providing uncertainty estimations. It includes two types of Bayesian layer implementations: BBB (Bayes by Backprop) and BBB_LRT (Bayes by Backprop with Local Reparametrization Trick), which enhances sampling efficiency. The repository supports standard datasets like MNIST, CIFAR10, and CIFAR100, and includes implementations of common models such as AlexNet and LeNet, making it a valuable resource for experimenting with Bayesian deep learning and understanding model uncertainty.
pytorch_active_learning
pytorch_active_learning is an open-source PyTorch library designed for active learning, accompanying the "Human-in-the-Loop Machine Learning" book. It offers a range of active learning methods, including Least Confidence, Margin of Confidence, Ratio of Confidence, and Entropy sampling. The library also supports more advanced techniques like Model-based Outlier sampling, Cluster-based sampling, and various forms of Active Transfer Learning. It is suitable for researchers and practitioners looking to experiment with and apply active learning strategies in computer vision and natural language processing, with a focus on real-world diversity to avoid bias. The code is stand-alone and can be easily integrated with existing PyTorch installations.
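The four uncertainty sampling methods named above can each be expressed as a score computed from a model's predicted class probabilities, where a higher score marks an item as more worth labeling. The sketch below shows one common formulation, normalized to the range 0 to 1, in plain Python. It is an illustration and is not copied from the library; the function name is hypothetical and the input is assumed to be a probability distribution over at least two classes.

```python
import math


def uncertainty_scores(probs):
    """Uncertainty scores for one prediction (higher means more uncertain)."""
    ranked = sorted(probs, reverse=True)
    n = len(probs)
    top, second = ranked[0], ranked[1]
    return {
        # Gap between full confidence and the top prediction, rescaled to 0..1.
        "least_confidence": (1 - top) * n / (n - 1),
        # One minus the difference between the two most likely classes.
        "margin_of_confidence": 1 - (top - second),
        # Ratio of the second most likely class to the most likely class.
        "ratio_of_confidence": second / top,
        # Shannon entropy divided by its maximum, log2(n).
        "entropy": -sum(p * math.log2(p) for p in probs if p > 0) / math.log2(n),
    }
```

A uniform prediction such as `[0.25, 0.25, 0.25, 0.25]` scores 1.0 on all four measures, while a fully confident prediction such as `[1.0, 0.0]` scores 0 on all four.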
SimGNN
SimGNN is a PyTorch implementation of a novel neural network approach designed for fast graph similarity computation, as detailed in the WSDM 2019 paper. It addresses the computational burden of traditional methods like Graph Edit Distance (GED) and Maximum Common Subgraph (MCS) while maintaining high performance. The tool employs a learnable embedding function to map graphs into embedding vectors, providing a global summary. A key feature is its attention mechanism, which emphasizes important nodes for specific similarity metrics. Additionally, SimGNN includes a pairwise node comparison method to supplement graph-level embeddings with fine-grained node-level information. This approach leads to better generalization on unseen graphs and offers quadratic time complexity in the worst case. Experimental results demonstrate its effectiveness and efficiency, achieving smaller error rates and significant time reductions compared to existing baselines.