ShypdShypd.ai
💻

Coding & Development

Browsing page 78 of AI tools for Open Source & Models in Coding & Development. Sorted by confidence score — our independent quality rating.

SwinIR

SwinIR

60%

SwinIR is an official PyTorch implementation of the Swin Transformer model for image restoration. It excels in tasks such as classical, lightweight, and real-world image super-resolution, grayscale and color image denoising, and JPEG compression artifact reduction. The tool's deep feature extraction module, composed of residual Swin Transformer blocks, allows it to outperform state-of-the-art methods while potentially reducing the number of parameters. SwinIR provides interactive online demos, including a Colab demo for real-world image SR and a PlayTorch demo for mobile applications, making it accessible for both research and practical applications.

sygil-webui

sygil-webui

60%

sygil-webui is an open-source, web-based user interface designed for Stable Diffusion, created by Sygil.Dev. It offers a comprehensive platform for generating and enhancing images, featuring built-in image enhancers like GFPGAN and RealESRGAN, as well as various upscalers. Users can benefit from a generator preview, prompt weighting, negative prompts, and sequential seeds for batch generations. The tool also includes advanced functionalities such as an img2img editor with mask and crop capabilities, mask painting, and textual inversion for custom embeddings. It supports both Windows and Linux installations and provides a clean, easy-to-use UI with dynamic live previews and optimized VRAM usage.

swe-rl

swe-rl

60%

SWE-RL is an official codebase for "Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution," designed to scale reinforcement learning-based LLM reasoning for real-world software engineering tasks. It leverages open-source software evolution data and rule-based rewards to improve LLM performance. The codebase includes prompt templates and a flexible reward function API that supports various editing formats, including sequence similarity for search/replace changes and unified diffs. Additionally, SWE-RL features an Agentless Mini component for fast asynchronous inference, code refactoring, file-level localization, and repair, supporting OpenAI-compatible endpoints and Hugging Face models like Llama-3.3-70B-Instruct.

Falcondale

Falcondale

60%

Falcondale specializes in developing applied quantum machine learning and optimization solutions designed to deliver real-world impact. The company focuses on leveraging quantum intelligence to solve complex problems across various industries. Falcondale aims to provide a competitive edge through its advanced quantum technologies, offering solutions that go beyond traditional computational methods. Their expertise lies in translating cutting-edge quantum research into practical, deployable applications for businesses and organizations seeking innovative data analysis and optimization capabilities.

streaming-llm

streaming-llm

60%

StreamingLLM is an innovative open-source framework designed to address the challenges of deploying Large Language Models (LLMs) in streaming applications that require processing infinite-length inputs. It introduces the concept of "attention sinks" to efficiently manage Key and Value (KV) states, allowing LLMs to generalize to infinite sequence lengths without fine-tuning. This approach prevents the performance degradation seen in traditional window attention methods when text length exceeds cache size. StreamingLLM enables models like Llama-2, MPT, Falcon, and Pythia to perform stable and efficient language modeling with millions of tokens, offering up to a 22.2x speedup over sliding window recomputation baselines. It is particularly optimized for scenarios such as multi-round dialogues where continuous operation without extensive memory or dependency on past data is crucial.

streaming-vlm

streaming-vlm

60%

StreamingVLM is an innovative AI tool designed for real-time understanding of effectively infinite video streams. Developed by mit-han-lab, it addresses common challenges in long-video analysis by maintaining a compact KV cache and aligning training directly with streaming inference. This approach efficiently avoids the quadratic cost associated with traditional methods and mitigates the pitfalls of sliding-window techniques. The system is capable of running at up to 8 frames per second (FPS) on a single H100 GPU, offering stable and efficient video processing. It has demonstrated superior performance, winning 66.18% against GPT-4o mini on a new long-video benchmark and also enhances general Video Question Answering (VQA) capabilities without requiring task-specific fine-tuning. The project provides scripts for environment setup, inference, supervised fine-tuning (SFT), and various evaluations including OVOBench and VQA tasks.

synthetic-data-kit

synthetic-data-kit

60%

synthetic-data-kit is a powerful open-source tool developed by Meta Llama for generating high-quality synthetic datasets specifically designed to fine-tune Large Language Models. It streamlines the often complex process of data preparation, allowing users to create reasoning traces, QA pairs, and summaries from various input formats. The tool features a modular 4-command CLI flow: ingest, create, curate, and save-as, enabling users to process individual files or entire directories. It supports different LLM backends like vLLM or external API endpoints and can convert curated data into various fine-tuning formats such as Alpaca, OpenAI fine-tuning format, and ChatML. Additionally, it handles multimodal data, extracting both text and images, and offers intelligent chunking for large documents to maintain context and quality.

trae-agent

trae-agent

60%

Trae Agent is an LLM-based agent designed for general-purpose software engineering tasks, offering a transparent and modular architecture for researchers and developers. It provides a powerful command-line interface (CLI) that can interpret natural language instructions and execute intricate software engineering workflows using various tools and LLM providers. Key features include Lakeview for concise summarization of agent steps, multi-LLM support for providers like OpenAI, Anthropic, and Google Gemini, and a rich tool ecosystem for file editing, bash execution, and sequential thinking. The agent also offers an interactive mode for iterative development, detailed trajectory recording for debugging, and flexible YAML-based configuration. It is easily installed via pip and supports Docker for isolated task execution.

TheAgentCompany

TheAgentCompany

60%

TheAgentCompany is an open-source benchmark designed to evaluate the performance of LLM agents on consequential, real-world tasks within a simulated software company environment. It allows for assessing how well AI agents can accelerate or autonomously perform work-related tasks by interacting with the web, writing code, running programs, and communicating. The platform offers diverse task roles, data types, and a comprehensive scoring system with multiple evaluation methods, including deterministic and LLM-based evaluators. It features simple one-command operations for environment setup and quick system resets, making it an extensible framework for adding new tasks and evaluators. The benchmark is available on GitHub and supports integration with platforms like OpenHands.

textgenrnn

textgenrnn

60%

textgenrnn is a Python 3 module built on Keras/TensorFlow designed for creating character-level recurrent neural networks (char-RNNs). It enables users to easily train text-generating neural networks of any size and complexity on any text dataset. The tool incorporates modern neural network architectures, including attention-weighting and skip-embedding, to accelerate training and enhance model quality. Users can train and generate text at either the character or word level, configure RNN size, layer count, and use bidirectional RNNs. It supports training on generic input text files, including large ones, and allows for GPU-trained models to generate text on a CPU. Additionally, textgenrnn offers a powerful CuDNN implementation for faster GPU training and supports contextual labels for improved learning and results.

table-transformer

table-transformer

60%

Table Transformer (TATR) is a deep learning model developed by Microsoft for extracting tables from unstructured documents, including PDFs and images. Based on object detection, TATR can be trained to work across various document domains, with pre-trained model weights available for the PubTables-1M dataset. The repository also provides the official code for the PubTables-1M dataset, a large-scale dataset for table detection, structure recognition, and functional analysis, and the GriTS evaluation metric for table structure recognition. Researchers and developers can use TATR to detect and recognize tables, convert them to HTML or CSV, and train custom models for specific needs.

TileRT

TileRT

60%

TileRT is an open-source, tile-based runtime engineered for ultra-low-latency Large Language Model (LLM) inference. It aims to push the boundaries of LLM latency without compromising model size or quality, allowing models with hundreds of billions of parameters to achieve millisecond-level time per output token (TPOT). Unlike traditional inference systems optimized for high-throughput batch processing, TileRT prioritizes responsiveness, making it ideal for applications like high-frequency trading, interactive AI, real-time decision-making, and AI-assisted coding. It achieves this by decomposing LLM operators into fine-grained tile-level tasks and dynamically rescheduling computation, I/O, and communication across multiple devices to minimize idle time and improve hardware utilization. TileRT currently supports models like GLM-5 and DeepSeek-V3.2 and offers Multi-Token Prediction (MTP) for efficient longer output generation.

TimeCapsuleLLM

TimeCapsuleLLM

60%

TimeCapsuleLLM is an innovative open-source project focused on creating language models (LLMs) trained exclusively on data from specific historical periods and geographic locations. The primary goal is to mitigate modern biases inherent in contemporary LLMs and accurately emulate the linguistic style, vocabulary, and worldview of a chosen era. The project has developed several versions, including v0, v0.5, v1, and v2, with increasing dataset sizes and model parameters, built on architectures like nanoGPT, Phi 1.5, and llamaforcausallm. It emphasizes Selective Temporal Training (STT) where all training data is curated from a defined historical window, ensuring the model's knowledge and language reflect that period without modern influence. The project provides core training scripts, tokenizer building tools, and detailed documentation for researchers and developers interested in historical language modeling.

tokenizers

tokenizers

60%

tokenizers is an open-source library developed by Hugging Face, offering highly optimized and versatile tokenizers for natural language processing tasks. Implemented primarily in Rust, it boasts exceptional performance, capable of tokenizing a gigabyte of text on a server's CPU in less than 20 seconds. The library supports training new vocabularies and tokenizing text using popular models like Byte-Pair Encoding, WordPiece, and Unigram. It includes features such as alignment tracking during normalization, ensuring that the original sentence segments corresponding to tokens can always be retrieved. Additionally, it handles pre-processing steps like truncation, padding, and adding special tokens required by various models, making it suitable for both research and production environments.

trajectory-transformer

trajectory-transformer

60%

Trajectory Transformer is an open-source code release that implements offline reinforcement learning as a sequence modeling problem. Based on the paper "Offline Reinforcement Learning as One Big Sequence Modeling Problem," this tool provides a framework for training models to predict trajectories. It includes scripts for training transformers on various datasets and for planning with these models. The project also offers pretrained models for multiple datasets, allowing users to quickly experiment and reproduce results. It supports installation via conda or Docker, and provides utilities for running jobs on Azure, making it suitable for researchers and engineers in reinforcement learning and robotics.

texar

texar

60%

Texar is a comprehensive toolkit designed to support a broad range of machine learning tasks, with a particular focus on natural language processing and text generation. Built on TensorFlow, it offers a rich library of modular and easy-to-use ML components and functionalities, enabling both researchers and practitioners to rapidly prototype and experiment with models. Key features include support for pre-trained models like BERT, GPT2, and XLNet, and full customizability at multiple abstraction levels. Texar is versatile, supporting various tasks, models, algorithms, data processing, and evaluation methods, from encoder-decoder architectures to reinforcement learning and adversarial learning. It emphasizes modularity for maximum re-use and clean APIs, based on a principled decomposition of learning, inference, and model architecture. The toolkit also supports distributed model training with multiple GPUs and provides extensive documentation and examples.

torch-template-for-deep-learning

torch-template-for-deep-learning

60%

torch-template-for-deep-learning is an open-source project providing PyTorch implementations of a wide array of classical backbone Convolutional Neural Networks (CNNs), alongside essential tools for deep learning development. It includes various data enhancement techniques like Cutout and Mixup, a collection of torch loss functions such as Focal Loss and Dice Loss, and numerous attention mechanisms including SE Attention and Self Attention. The template also features deployment modes for PyTorch models, conversion utilities from TensorFlow to PyTorch, and Class Activation Mapping (CAM) methods. This comprehensive resource aims to simplify and accelerate the development of deep learning applications by offering readily available and well-structured components.

Vim

Vim

60%

Vim, or Vision Mamba, is an open-source AI tool developed by hustvl for efficient visual representation learning. It leverages a bidirectional state space model (Mamba) to process visual data, offering a novel approach to computer vision tasks. The tool addresses challenges in visual data representation for SSMs, particularly the position-sensitivity of visual data and the need for global context. Vim has demonstrated superior performance on tasks like ImageNet classification, COCO object detection, and ADE20k semantic segmentation, outperforming established vision transformers like DeiT. Notably, it achieves significant improvements in computation and memory efficiency, being 2.8x faster than DeiT and saving 86.8% GPU memory for high-resolution image feature extraction. This makes Vim a promising candidate for next-generation backbones in vision foundation models.

VideoLLaMA2

VideoLLaMA2

60%

VideoLLaMA2 is an open-source project designed to significantly advance spatial-temporal modeling and audio understanding within video-Large Language Models (LLMs). It offers a comprehensive framework for researchers and developers to explore and build upon state-of-the-art video analysis capabilities. The tool provides various pre-trained models, including vision-only and audio-visual checkpoints, supporting tasks such as multi-choice video QA, video captioning, open-ended video QA, and audio-visual QA. It includes detailed instructions for installation, running online and offline demos, and quick-start guides for training and evaluating custom VideoLLaMA2 models using datasets like VideoLLaVA. The project emphasizes its top performance on leaderboards like MLVU and VideoMME for ~7B-sized VideoLLMs.

videollm-online

videollm-online

60%

VideoLLM-online is the official implementation of an Online Video Large Language Model for Streaming Video, presented at CVPR 2024. Unlike traditional models that process full videos offline, VideoLLM-online enables real-time interaction within a video stream, allowing it to proactively update responses based on activity changes or assist with next steps. It features a cheap and scalable method for synthesizing streaming data by transforming offline annotations into dialogue data using open-source LLMs. The inference method is parallelized, combining video encoding, LLM forwarding, and response generation asynchronously, achieving high speeds of 10-15 FPS on an A100 GPU for long-form videos up to 10 minutes. The tool is designed for researchers and developers working with streaming video analysis and real-time multimodal AI.

X-MAS FLUX LORA

X-MAS FLUX LORA

60%

X-MAS FLUX LORA is an AI-powered image generator hosted on Hugging Face, specifically designed to create festive Christmas-themed images. Users can input text descriptions, and the tool will generate high-quality visuals. A notable feature is its ability to translate Korean prompts into English, making it accessible to a broader audience. The application also provides adjustable settings, allowing users to control aspects like image size and level of detail, ensuring more customized outputs. While the tool was previously available, the live website indicates it is currently paused, requiring users to request its restart from the author.

VideoMamba

VideoMamba

60%

VideoMamba is an innovative open-source state space model designed for efficient video understanding, specifically addressing the dual challenges of local redundancy and global dependencies in video data. It adapts the Mamba architecture to the video domain, overcoming limitations found in existing 3D convolution neural networks and video transformers. Its linear-complexity operator enables efficient long-term modeling, which is crucial for processing high-resolution and extended video content. The tool demonstrates scalability in the visual domain without requiring extensive dataset pretraining, thanks to a novel self-distillation technique. It also exhibits sensitivity for recognizing fine-grained short-term actions, superiority in long-term video understanding, and compatibility with multi-modal contexts, setting a new benchmark for comprehensive video analysis.

Thai Sentence Embedding Benchmark

Thai Sentence Embedding Benchmark

60%

Thai Sentence Embedding Benchmark is a specialized AI tool designed to evaluate and rank Thai sentence embedding models. It features a comprehensive leaderboard that showcases the performance of different models across a variety of datasets and tasks relevant to the Thai language. Users can access detailed scores for each model, enabling them to compare and select the most suitable embeddings for their specific natural language processing (NLP) applications. This tool is particularly valuable for AI researchers and NLP engineers who require robust benchmarks for developing and optimizing Thai language models.

tts Text To Speech

tts Text To Speech

60%

tts Text To Speech is a powerful text-to-speech (TTS) tool built on Next-gen Kaldi, available as a Hugging Face Space. It allows users to easily convert written text into spoken audio. The application provides options to select from various languages and TTS models, offering flexibility in voice output. Additionally, users can specify a speaker ID and adjust the speaking speed to customize the generated audio. The tool outputs the spoken text as a WAV audio file and also indicates the duration of the generated audio, making it suitable for a range of applications from content creation to research and development.