Coding & Development
Browsing page 125 of AI tools for Open Source & Models in Coding & Development. Sorted by confidence score — our independent quality rating.
DeepXi
DeepXi is a deep learning framework implemented in TensorFlow 2/Keras for a priori signal-to-noise ratio (SNR) estimation. It is primarily used for speech enhancement, noise estimation, and mask estimation, and can also serve as a front end for robust automatic speech recognition (ASR). It supports several deep neural network architectures for modeling noisy speech, including MHANet, RDLNet, ResNet, ResLSTM, and ResBiLSTM, and offers both causal and non-causal versions of its models to suit different application requirements. It operates on mono (single-channel) audio sampled at 16 kHz, with configurable window duration and shift. The tool reads common audio file formats such as .wav, .mp3, and .flac, and provides pre-trained models and datasets for research and development.
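To make the quantity DeepXi estimates concrete, here is a minimal sketch of the a priori SNR and one gain function that can be derived from it. It assumes oracle per-bin clean-speech and noise power values (in practice these are unknown, which is why an estimator is needed), and it uses the Wiener gain as one common choice; the function names are illustrative and are not part of DeepXi.

```python
import math

def a_priori_snr(clean_power, noise_power):
    """Per-bin a priori SNR: xi = clean-speech power / noise power."""
    return [s / d for s, d in zip(clean_power, noise_power)]

def wiener_gain(xi):
    """Wiener filter gain per bin: G = xi / (1 + xi)."""
    return [x / (1.0 + x) for x in xi]

# Hypothetical oracle power values for three frequency bins.
clean_power = [4.0, 1.0, 0.25]
noise_power = [1.0, 1.0, 1.0]

xi = a_priori_snr(clean_power, noise_power)
xi_db = [10.0 * math.log10(x) for x in xi]  # same quantity in decibels
gain = wiener_gain(xi)                      # multiply the noisy magnitude by this
```

Bins dominated by speech get a gain close to 1, while bins dominated by noise are attenuated toward 0.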
mini-sglang
Mini-SGLang is a compact and high-performance inference framework specifically designed for Large Language Models (LLMs). It serves as a lightweight implementation of SGLang, aiming to simplify the complexities of modern LLM serving systems. With a codebase of approximately 5,000 lines of Python, it functions as both a capable inference engine and a transparent reference for researchers and developers. Key features include advanced optimizations such as Radix Cache for KV cache reuse, Chunked Prefill to reduce peak memory usage, Overlap Scheduling to hide CPU overhead, Tensor Parallelism for multi-GPU scaling, and optimized kernels like FlashAttention and FlashInfer for maximum efficiency. It supports online serving with an OpenAI-compatible API and an interactive shell mode for direct model interaction.
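The idea behind prefix-based KV cache reuse and chunked prefill can be illustrated with a toy sketch. It assumes a plain (uncompressed) trie over token IDs, whereas a real radix tree merges single-child chains and stores KV tensors; the class and function names are hypothetical and are not taken from the Mini-SGLang codebase.

```python
class PrefixCache:
    """Toy trie over token IDs; stands in for a radix cache of KV entries."""

    def __init__(self):
        self.root = {}

    def insert(self, tokens):
        # Record every prefix of `tokens` as cached.
        node = self.root
        for t in tokens:
            node = node.setdefault(t, {})

    def match_len(self, tokens):
        # Length of the longest cached prefix of `tokens`.
        node, n = self.root, 0
        for t in tokens:
            if t not in node:
                break
            node = node[t]
            n += 1
        return n

def chunks(tokens, size):
    """Split the uncached tokens into fixed-size prefill chunks."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

cache = PrefixCache()
cache.insert([1, 2, 3, 4])             # tokens of an earlier request
request = [1, 2, 3, 9, 9, 9, 9, 9]     # new request sharing a 3-token prefix
reused = cache.match_len(request)      # these tokens need no recomputation
prefill_plan = chunks(request[reused:], size=2)
```

Only the tokens after the shared prefix are prefilled, and processing them in small chunks bounds how much work a single step has to hold in memory.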
deepdow
deepdow is a Python package designed for portfolio optimization using deep learning techniques. It aims to bridge the gap between forecasting market evolution and solving optimization problems by constructing a pipeline of differentiable layers. The tool allows for the creation of networks where the final layer performs asset allocation, with preceding layers acting as feature extractors. The entire network is fully differentiable, enabling optimization via gradient descent algorithms. deepdow is not focused on active trading strategies but rather on finding allocations to be held over a specific horizon. It integrates differentiable convex optimization via `cvxpylayers`, offers various dataloading strategies, and supports integration with MLflow and TensorBoard. It also provides a range of loss functions, including Sharpe ratio and maximum drawdown, and is extensible for customization with both CPU and GPU support.
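As a rough illustration of an allocation layer followed by a Sharpe-ratio loss, the sketch below turns raw scores into portfolio weights with a softmax and scores the resulting returns. It is written in plain Python for readability, so it is not differentiable the way deepdow's tensor-based layers are, and the function names are assumptions rather than deepdow's API.

```python
import math
import statistics

def softmax(scores):
    """Map raw scores to non-negative weights that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def portfolio_returns(weights, asset_returns):
    """Weighted return of the portfolio in each period."""
    return [sum(w * r for w, r in zip(weights, period)) for period in asset_returns]

def negative_sharpe(returns, eps=1e-8):
    """Loss to minimize: negative mean return over its standard deviation."""
    mean = statistics.fmean(returns)
    std = statistics.pstdev(returns)
    return -mean / (std + eps)

# Hypothetical returns of two assets over three periods.
asset_returns = [[0.01, 0.02], [0.03, -0.01], [0.00, 0.01]]
weights = softmax([0.0, 0.0])  # equal scores give an equal-weight portfolio
loss = negative_sharpe(portfolio_returns(weights, asset_returns))
```

In a differentiable framework the scores would be produced by the feature-extraction layers, and gradient descent on this loss would adjust them.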
Superagent (YC W24)
Superagent offers red team testing for AI agents, designed to identify and prevent data leaks, harmful outputs, and unwanted actions in production systems. It employs specialized attack agents to probe for failures through black-box testing, providing findings, evidence, and remediation guidance. The platform includes Guardrail models to prevent failures at runtime, continuous tests to measure system safety, and a Safety Page to demonstrate compliance and test results to customers. This comprehensive approach helps teams building or deploying AI agents, especially those selling to enterprises or regulated industries, to prove safety and compliance.
DeepLearnToolbox
DeepLearnToolbox is a Matlab/Octave toolbox designed for deep learning research and development. It includes various deep learning models such as Deep Belief Nets (DBN), Stacked Autoencoders (SAE), Convolutional Neural Nets (CNN), Convolutional Autoencoders (CAE), and vanilla Neural Nets (NN). Each model comes with practical examples to guide users through implementation and experimentation. While the toolbox was a valuable resource, it is no longer maintained and is considered outdated. The creator recommends using more modern and actively developed deep learning frameworks like Theano, Torch, or TensorFlow for current projects.
dllm
dLLM is an open-source library designed to bring transparency and reproducibility to the development pipeline of diffusion language models. It offers scalable training pipelines, supporting advanced features like LoRA, DeepSpeed, and FSDP, based on the transformers Trainer. The library also provides unified evaluation pipelines built on lm-evaluation-harness, simplifying inference and customization. dLLM includes minimal training, inference, and evaluation recipes for open-weight models such as LLaDA and Dream, and implements various training algorithms like MDLM (masked diffusion) and BD3LM (block diffusion). It also supports accelerated inference and evaluation with Fast-dLLM, offering cache and confidence-threshold decoding.
DD-AIM
DD-AIM has developed a patented digital circuit (chip) designed to accelerate predictive AI inference at massive scale. This technology supports thousands of simultaneous models and millions of inference runs in real-time. The chip focuses on the inference side of deep learning, allowing for continuous monitoring, evaluation, and forecasting of real-world systems. It features self-learning, self-optimizing, and self-correcting capabilities for hardware and model errors, with dynamic memory management and computational techniques. The architecture is simplified for lower manufacturing and deployment costs, emphasizing low energy consumption, small size, and minimal heat generation (SWaP-C2). DD-AIM targets applications in defense, healthcare, finance, and retail sectors.
DiffusionCLIP
DiffusionCLIP is an official PyTorch implementation for text-guided image manipulation using diffusion models, as presented in the CVPR 2022 paper. It addresses limitations of GAN-inversion methods by leveraging the full inversion capability and high-quality image generation of diffusion models. The tool allows for zero-shot image manipulation guided by text prompts, even for diverse real images from datasets like ImageNet. Key features include novel sampling strategies for fine-tuning, accurate in- and out-of-domain manipulation, and a unique noise combination method for straightforward multi-attribute manipulation. It supports fine-tuning for various image types like human faces, churches, bedrooms, and dog faces, and provides a Colab notebook for inference and application.
nn_vis
nn_vis is an open-source project for processing and rendering neural networks to visualize their architecture and parameters. Developed as part of a master's thesis, it introduces a novel 3D visualization technique that declutters complex models. The tool estimates attributes for trained neural networks using established techniques such as batch normalization, fine-tuning, and feature extraction to determine the importance of different network parts. It combines these importance values with techniques such as edge bundling, ray tracing, 3D impostors, and special transparency to create a 3D model of the network. nn_vis supports both 2D and VR visualization, allowing users to gain insights into model behavior, especially regarding generalization based on edge proximity. It also provides a GUI for controlling shader parameters and processing settings, so the visualization can be customized.
NeuralNetwork.NET
NeuralNetwork.NET is a .NET Standard 2.0 library for building neural networks, inspired by TensorFlow and developed entirely in C# 7.3. It enables developers to create sequential and computation graph neural networks with customizable layers. The library offers simple APIs for rapid prototyping, allowing users to define and train models using stochastic gradient descent, as well as save and load network models. A key feature is its GPU support via cuDNN, which significantly enhances performance for training and using neural networks. While no longer actively maintained, it serves as a robust foundation for .NET developers looking to implement machine learning models and custom AI applications, particularly those familiar with C# and .NET environments.
FastPhotoStyle
FastPhotoStyle is an open-source photo editing tool developed by NVIDIA, designed for photorealistic image stylization. It allows users to transfer the artistic style from a 'style photo' to a 'content photo' using deep learning techniques. The underlying algorithm is detailed in an ECCV 2018 paper, offering a closed-form solution for image stylization. The tool is licensed under CC BY-NC-SA 4.0, making it suitable for research and development in computer vision and graphics. It provides various scripts for demonstration, model downloading, and processing stylization, including options for segmentation-aware stylization.
Federated-Learning-PyTorch
Federated-Learning-PyTorch provides an open-source implementation of the vanilla federated learning paradigm, as described in the paper 'Communication-Efficient Learning of Deep Networks from Decentralized Data'. This tool is built using PyTorch and allows researchers and developers to conduct experiments on popular datasets such as MNIST, Fashion MNIST, and CIFAR10. It supports both independent and identically distributed (IID) and non-IID data distributions, with options for equal or unequal data splits among users. The implementation focuses on simple models like MLP and CNN to illustrate the effectiveness of federated learning, making it a valuable resource for understanding and experimenting with this distributed machine learning approach.
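The aggregation step of federated averaging, as described in the cited paper, can be sketched as a sample-weighted mean of client parameters. The sketch below uses plain Python lists instead of PyTorch tensors, and the function name is illustrative; the repository's own aggregation may differ (for example, it may use an unweighted mean).

```python
def federated_average(client_weights, client_sizes):
    """Average client parameters, weighting each client by its sample count."""
    total = sum(client_sizes)
    averaged = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return averaged

# Two hypothetical clients, each holding one parameter vector "w".
clients = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
sizes = [1, 3]  # number of local training samples per client
global_weights = federated_average(clients, sizes)
```

The client with three times as much data pulls the global parameters three times as strongly toward its local update.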
facenet
facenet offers a TensorFlow-based implementation for face recognition, drawing inspiration from the "FaceNet: A Unified Embedding for Face Recognition and Clustering" paper and ideas from Oxford's "Deep Face Recognition." The project is open-source and available on GitHub, providing a robust framework for developers and researchers. It includes pre-trained models, supports various training datasets like CASIA-WebFace and VGGFace2, and incorporates face alignment using MTCNN for improved accuracy. The tool is compatible with TensorFlow r1.7 and Python 2.7/3.5, making it accessible for those working with these environments. It also features a flexible input pipeline and continuous integration for reliable development.
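The embedding approach from the FaceNet paper compares faces by the distance between L2-normalized vectors and trains with a triplet loss. The following sketch shows that computation on tiny hand-written vectors; the function names and the margin value are illustrative and are not taken from this repository.

```python
import math

def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def squared_distance(a, b):
    """Squared Euclidean distance between two embeddings."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Zero once the positive is closer than the negative by at least `margin`."""
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(gap + margin, 0.0)

# Hypothetical 2-D embeddings; real face embeddings have many more dimensions.
anchor = l2_normalize([1.0, 0.0])
positive = l2_normalize([0.9, 0.1])   # same identity, nearby
negative = l2_normalize([0.0, 1.0])   # different identity, far away

loss_good = triplet_loss(anchor, positive, negative)
loss_bad = triplet_loss(anchor, negative, positive)  # roles swapped
```

A well-separated triplet contributes no loss, while a triplet whose positive and negative are swapped is penalized.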
facial-expression-recognition-using-cnn
facial-expression-recognition-using-cnn is an open-source project designed for deep facial expression recognition using Convolutional Neural Networks (CNN) with OpenCV and TensorFlow. It can analyze facial expressions from both static images and real-time camera streams, categorizing them into emotions like Angry, Disgust, Fear, Happy, Sad, Surprise, and Neutral. The tool allows for training models on datasets like Fer2013, optimizing hyperparameters, and evaluating performance. It supports the integration of additional features such as face landmarks and HOG features to improve accuracy, providing a robust framework for researchers and developers interested in emotion detection and facial analysis.
Sciloop
Sciloop specializes in creating high-quality, expert-crafted evaluation and training data for advanced AI models, particularly focusing on reasoning capabilities. Their network comprises Olympiad medalists and top researchers across mathematics, physics, chemistry, biology, and computer science, ensuring the data is built to challenge frontier models. Sciloop offers various data types including benchmark sets, Supervised Fine-Tuning (SFT) data, Reinforcement Learning from Human Feedback (RLHF) data, and reward modeling data. The platform also allows domain experts to contribute by solving complex problems, earning compensation for their submissions, and provides a continuous data streaming service for AI labs.
garak
garak is an open-source LLM vulnerability scanner designed to identify and assess weaknesses in large language models. It probes for a wide range of issues including hallucination, data leakage, prompt injection, misinformation, toxicity generation, and jailbreaks. Inspired by tools like nmap and Metasploit Framework, garak focuses on making LLMs or dialog systems fail through static, dynamic, and adaptive probes. It supports various LLM platforms such as Hugging Face, OpenAI, AWS Bedrock, Replicate, Cohere, and Groq, and can be installed via pip or cloned from source. The tool provides detailed logging and reporting in JSONL format, helping developers and researchers understand and mitigate risks in their AI models.
Pose-Transfer
Pose-Transfer is an open-source project providing the code for person image generation, implementing the Progressive Pose Attention method detailed in a CVPR 2019 paper. The tool transfers poses from one image to another and also supports generating videos from a single input image. It offers functionality for data preparation, including dataset splitting and keypoint annotation for datasets such as Market1501 and DeepFashion. Users can train and test models and evaluate performance using metrics such as SSIM, IS, DS, and PCKh. The project is built on PyTorch and provides pre-trained models for convenience.
rq-vae-transformer
rq-vae-transformer is the official open-source implementation of "Autoregressive Image Generation using Residual Quantization" (CVPR 2022). This framework, consisting of RQ-VAE and RQ-Transformer, is designed for autoregressive modeling of high-resolution images. It precisely approximates feature maps and represents images as stacks of discrete codes, facilitating the generation of high-quality images. The tool supports image generation using both class and text conditions, with pretrained checkpoints available for various datasets including FFHQ, LSUN, ImageNet, and CC-3M. It also includes a large-scale RQ-Transformer for text-to-image generation, trained on millions of text-image pairs. The repository provides code for training and evaluation pipelines, as well as Jupyter notebooks for easy text-to-image generation.
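Representing a feature vector as a stack of discrete codes can be illustrated with a small residual quantization loop: at each depth, pick the codebook entry nearest to what is still unexplained, then subtract it. This is a minimal sketch on a hand-written 2-D codebook, assuming a single codebook shared across depths; the function names are hypothetical and do not come from the repository.

```python
def nearest_code(vector, codebook):
    """Index of the codebook entry closest to `vector` (squared distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((v - c) ** 2 for v, c in zip(vector, codebook[i])),
    )

def residual_quantize(vector, codebook, depth):
    """Return the code indices and the summed approximation after `depth` steps."""
    residual = list(vector)
    approx = [0.0] * len(vector)
    codes = []
    for _ in range(depth):
        idx = nearest_code(residual, codebook)
        codes.append(idx)
        approx = [a + c for a, c in zip(approx, codebook[idx])]
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes, approx

# Hypothetical codebook with four 2-D entries.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
codes, approx = residual_quantize([1.4, 0.6], codebook, depth=3)
```

Each additional code refines the approximation, which is why a stack of codes can describe a feature more precisely than a single code from the same codebook.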
self-attention-cv
Self-attention-cv is an open-source repository offering implementations of diverse self-attention mechanisms specifically tailored for computer vision applications. Built in PyTorch, it leverages `einsum` and `einops` for efficient and flexible module creation. The repository serves as an ongoing collection of building blocks, enabling developers to integrate advanced attention models into their projects. It supports a range of computer vision tasks, including image recognition and segmentation, with examples for Multi-head attention, Axial attention, Vision Transformers (ViT), and TransUnet. It also includes various positional embedding implementations.
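For readers new to the mechanism these modules build on, here is a minimal single-head scaled dot-product attention written with plain Python lists. The repository itself works with PyTorch tensors and `einsum`; this sketch only shows the arithmetic, and its function names are illustrative.

```python
import math

def softmax(row):
    """Normalize scores into weights that sum to 1."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """For each query, return a weighted average of the value vectors."""
    scale = math.sqrt(len(queries[0]))
    outputs = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        outputs.append(
            [sum(w * v[c] for w, v in zip(weights, values)) for c in range(len(values[0]))]
        )
    return outputs

# One query attending over two identical keys: the weights are equal,
# so the output is the mean of the two value vectors.
out = attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [1.0, 0.0]],
    values=[[0.0, 2.0], [4.0, 6.0]],
)
```

Multi-head attention repeats this computation over several learned projections of the same input and concatenates the results.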
StyleSwin
StyleSwin is an official implementation of a transformer-based Generative Adversarial Network (GAN) designed for high-resolution image generation, as presented at CVPR 2022. It leverages a Swin transformer within a style-based architecture, incorporating local and shifted window attention for computational efficiency and modeling capacity. A key innovation is the double attention mechanism, which combines local and shifted window contexts to enhance generation quality. StyleSwin also addresses the challenge of spatial coherency in high-resolution synthesis by employing a wavelet discriminator to suppress blocking artifacts. The tool demonstrates superior performance over prior transformer-based GANs, particularly at resolutions like 1024x1024, achieving competitive results with StyleGAN on datasets such as CelebA-HQ and FFHQ.
MergeFund
MergeFund is a platform for outcome-based work in the open-source and project-based work economy. It connects companies with a vetted network of developers, designers, and researchers, with payment made only upon validated completion of deliverables. The platform supports both open-source and closed-source projects, offering features such as bounty posting, project dashboards for management, and flexible payment options including fiat and cryptocurrency. MergeFund aims to address the open-source funding crisis by allowing communities to fund repositories and maintainers to create bounties, so that contributors are compensated for their work without hourly tracking.
tennis_analysis
tennis_analysis is an open-source project that analyzes tennis players and ball movement in video footage. It uses YOLOv8 for player detection and a fine-tuned YOLO model for tennis ball detection, and it uses Convolutional Neural Networks (CNNs) to extract court keypoints. From these detections it measures player speed, ball shot speed, and the total number of shots, which can be used for performance analysis. The project is well suited to individuals who want to build their machine learning and computer vision skills through a practical, hands-on application.
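One way such speed measurements can be derived from detections is to convert a pixel displacement into meters using a court dimension of known size, then divide by the elapsed time. The sketch below assumes a single uniform pixel-to-meter scale (ignoring camera perspective) and uses the 10.97 m doubles court width as the reference; the function names and sample numbers are hypothetical and do not describe the project's actual code.

```python
import math

def pixels_to_meters(pixel_distance, ref_pixels, ref_meters):
    """Scale a pixel distance using a reference length visible in the frame."""
    return pixel_distance * ref_meters / ref_pixels

def speed_kmh(p1, p2, frames_elapsed, fps, ref_pixels, ref_meters):
    """Average speed between two detections, in km/h."""
    meters = pixels_to_meters(math.dist(p1, p2), ref_pixels, ref_meters)
    seconds = frames_elapsed / fps
    return meters / seconds * 3.6  # m/s to km/h

# Hypothetical detections: 500 px apart, 12 frames apart at 24 fps,
# with the 10.97 m court width spanning 1097 px in the image.
speed = speed_kmh((0, 0), (300, 400), frames_elapsed=12, fps=24,
                  ref_pixels=1097, ref_meters=10.97)
```

With these numbers the displacement is about 5 m in 0.5 s, which corresponds to roughly 36 km/h.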
keras2cpp
keras2cpp is an open-source project for porting Keras neural network models into pure C++ code. It is particularly useful for developers who need to deploy Keras models in environments where C++ is the preferred or required language. It stores both the network's weights and its architecture in plain text files, which makes them easy to inspect. It was initially prepared to support simple convolutional networks, such as those found in MNIST examples, but its design allows for extension to more complex architectures. The current implementation includes ReLU and Softmax activations and is compatible with the Theano backend.
torchMoji
torchMoji is an open-source PyTorch implementation of the DeepMoji model, designed for advanced sentiment, emotion, and sarcasm analysis in text. Trained on 1.2 billion tweets with emojis, it excels at understanding nuanced emotional content. The tool provides capabilities for extracting emoji predictions, converting text into 2304-dimensional emotional feature vectors, and fine-tuning the model for transfer learning on new datasets. It's ideal for researchers and developers looking to integrate sophisticated emotional intelligence into their applications, offering a robust foundation for various text modeling tasks. The project includes examples and scripts to facilitate easy adoption and experimentation.