Coding & Development
TextRecognitionDataGenerator
TextRecognitionDataGenerator is an open-source synthetic data generator that creates text image samples for training Optical Character Recognition (OCR) software. Users can build custom datasets by varying parameters such as fonts and backgrounds and by applying text modifications like skewing, blurring, and distortion. The tool supports multiple languages, including non-Latin scripts such as Chinese and Japanese, and can also generate images of handwritten text (an experimental feature). It can be run from the CLI or imported as a Python module, which makes it easy to integrate into training pipelines. A Docker image is also available for users who prefer not to install it locally.
TheoremExplainAgent
TheoremExplainAgent (TEA) is an open-source AI system that generates video-based multimodal explanations of mathematical theorems as a way to study how well Large Language Models (LLMs) understand them. It produces long-form Manim videos that explain theorems visually, a format that can expose reasoning flaws that text-only explanations tend to hide. The codebase is aimed at researchers and includes both generation and evaluation scripts. It works with a range of LLMs for video generation and offers Retrieval Augmented Generation (RAG) to supply additional context. TheoremExplainAgent is intended for academic research in AI, natural language processing, and educational technology, with the goal of improving how LLMs explain complex mathematical concepts.
textgen
TextGen is an open-source project that implements numerous text generation models, including LLaMA, ChatGLM, BLOOM, GPT2, BART, T5, and SongNet, and supports both training and prediction, making it a versatile tool for developers and researchers in natural language processing. Key features include LoRA fine-tuning for GPT models, text augmentation with UDA/EDA, and Seq2Seq models for tasks like translation and summarization. It also uses T5 for creative text generation, GPT2 for article generation, and SongNet for structured text such as poetry. Pre-trained models are published on Hugging Face, and the repository includes detailed usage examples for integration and experimentation.
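EDA (Easy Data Augmentation) applies cheap token-level edits such as synonym replacement, random insertion, random swap, and random deletion. The sketch below illustrates two of those operations in plain Python. It is not TextGen's API: `random_swap` and `random_deletion` are hypothetical helper names, and whitespace tokenization is an assumption made for brevity.

```python
import random

def random_swap(tokens, n_swaps, rng):
    """Return a copy of tokens with n_swaps randomly chosen position pairs exchanged."""
    out = list(tokens)
    if len(out) < 2:
        return out  # nothing to swap
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(out)), 2)
        out[i], out[j] = out[j], out[i]
    return out

def random_deletion(tokens, p, rng):
    """Drop each token with probability p, but always keep at least one token."""
    if not tokens:
        return []
    kept = [t for t in tokens if rng.random() > p]
    return kept if kept else [rng.choice(tokens)]

rng = random.Random(0)
sentence = "the movie was surprisingly good".split()
print(random_swap(sentence, 1, rng))
print(random_deletion(sentence, 0.2, rng))
```

Both operations keep the label of the original sentence, which is what makes them usable for augmenting a classification dataset.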
Type Think AI
Type Think AI is a platform that brings several AI models together in one interface, including Claude and GPT-4o as well as models accessed through Amazon Bedrock. It aims to speed up content creation, in-depth research, and complex problem-solving by giving users a single point of access to these models, for both personal and business use.
Time-series-prediction
TFTS (TensorFlow Time Series) is an open-source Python package designed for time series deep learning models, built on TensorFlow and Keras. It offers a comprehensive suite of classical and state-of-the-art deep learning methods for time series tasks, including prediction, classification, and anomaly detection. The tool is highly flexible, allowing users to train models with their own 3D data inputs, whether as NumPy arrays, TensorFlow datasets, or Keras sequences. It supports a variety of models such as seq2seq, wavenet, transformer, rnn, tcn, bert, dlinear, nbeats, informer, and autoformer. TFTS also enables customization, allowing users to build custom models by adding embeddings for categorical variables or custom head layers for specific tasks. It has demonstrated strong performance in competitions, with TFTS-Bert winning 3rd place in KDD Cup 2022 wind power forecasting and TFTS-Seq2seq securing 4th place in Tianchi-ENSO index prediction 2021.
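TFTS expects 3D inputs, which for time series usually means a (samples, timesteps, features) layout. Below is a minimal pure-Python sketch of turning one multivariate series into that shape with a sliding window. It is not part of the TFTS API: `make_windows` is a hypothetical helper, and the assumption that feature 0 is the forecast target is ours. In practice the nested lists would be converted to a NumPy array or TensorFlow dataset before training.

```python
def make_windows(series, window, horizon):
    """Split a multivariate series (a list of timesteps, each a list of features)
    into supervised (X, y) pairs.

    X has shape (num_samples, window, num_features); each y entry holds the next
    `horizon` values of feature 0, assumed here to be the target.
    """
    X, y = [], []
    for start in range(len(series) - window - horizon + 1):
        X.append(series[start:start + window])
        future = series[start + window:start + window + horizon]
        y.append([step[0] for step in future])
    return X, y

# 10 timesteps with 2 features each: a target value and a toy covariate.
series = [[float(t), float(t % 3)] for t in range(10)]
X, y = make_windows(series, window=4, horizon=2)
print(len(X), len(X[0]), len(X[0][0]))  # 5 4 2
print(y[0])  # [4.0, 5.0]
```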
Text-Classification-Pytorch
Text-Classification-Pytorch is an open-source repository offering implementations of several deep learning models for text classification within the PyTorch framework. It covers popular architectures such as Recurrent Neural Networks (RNN), Long Short-Term Memory (LSTM), Attention mechanisms, Convolutional Neural Networks (CNN), and Recurrent Convolutional Neural Networks (RCNN). The project focuses on sentiment analysis as a primary text classification task and includes detailed documentation for each model, making it a valuable resource for both learning and practical application in natural language processing. Users can easily set up and run the models after cloning the repository.
twitter-sentiment-analysis
Twitter Sentiment Analysis is an open-source project hosted on GitHub, providing a framework for performing sentiment analysis on tweet data. It offers implementations of several machine learning and deep learning models, such as Naive Bayes, Support Vector Machines (SVM), Convolutional Neural Networks (CNN), and Long Short-Term Memory (LSTM) networks. The repository is designed for binary classification (positive or negative sentiment) and includes scripts for data preprocessing, statistical analysis, and model training/evaluation. While the original dataset is not releasable due to copyright, the project is easily adaptable for use with other datasets, making it a valuable resource for researchers and developers interested in sentiment analysis.
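Naive Bayes is the simplest of the models listed for this binary task. The sketch below shows a multinomial Naive Bayes classifier with add-one (Laplace) smoothing in plain Python. It is an illustration of the technique, not the repository's implementation; `train_nb`, `predict_nb`, the whitespace tokenizer, and the four toy tweets are all hypothetical.

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Fit a multinomial Naive Bayes model with add-one (Laplace) smoothing."""
    classes = sorted(set(labels))
    vocab = {w for doc in docs for w in doc.split()}
    priors, word_counts, totals = {}, {}, {}
    for c in classes:
        class_docs = [d for d, lab in zip(docs, labels) if lab == c]
        priors[c] = math.log(len(class_docs) / len(docs))  # log P(class)
        counts = Counter(w for d in class_docs for w in d.split())
        word_counts[c] = counts
        totals[c] = sum(counts.values())
    return classes, vocab, priors, word_counts, totals

def predict_nb(model, doc):
    """Return the class with the highest log posterior; unseen words are ignored."""
    classes, vocab, priors, word_counts, totals = model
    best, best_score = None, -math.inf
    for c in classes:
        score = priors[c]
        for w in doc.split():
            if w in vocab:
                # Smoothed log P(word | class)
                score += math.log((word_counts[c][w] + 1) / (totals[c] + len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best

tweets = ["love this phone", "great battery love it", "terrible screen", "hate the battery"]
labels = ["positive", "positive", "negative", "negative"]
model = train_nb(tweets, labels)
print(predict_nb(model, "love it"))          # positive
print(predict_nb(model, "hate the screen"))  # negative
```

Working in log space avoids the numeric underflow that multiplying many small probabilities would cause on longer tweets.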
Tune-A-Video
Tune-A-Video is an open-source tool designed for one-shot tuning of image diffusion models, specifically for text-to-video generation. Developed by showlab, it allows users to fine-tune pre-trained text-to-image diffusion models, such as Stable Diffusion or personalized DreamBooth models, to generate videos from text prompts. The tool is highly efficient, capable of tuning a 24-frame video in approximately 10-15 minutes using an A100 GPU. It supports personalized text-to-video generation by leveraging DreamBooth models, enabling users to create videos featuring specific subjects or styles. Tune-A-Video is ideal for researchers and developers in AI video research and development, offering a flexible and powerful platform for advanced video creation tasks.
Ultimate-Data-Science-Toolkit---From-Python-Basics-to-GenerativeAI
The Ultimate-Data-Science-Toolkit is an extensive open-source educational resource that guides users from the fundamentals of Python programming to advanced concepts in data science, machine learning, deep learning, and generative AI. Its modules cover Python basics, data structures, control statements, functions, object-oriented programming, and exception handling. For data analysis, it covers NumPy, Pandas, data visualization with Matplotlib and Seaborn, and statistical concepts such as hypothesis testing. The toolkit also includes practical applications of supervised and unsupervised machine learning algorithms, MLOps, and deep learning with TensorFlow/Keras. Finally, it offers case studies and an introduction to generative AI, including transformers, LLMs, LangChain, and retrieval-augmented generation (RAG), making it a comprehensive learning path for aspiring data scientists and AI engineers.
VoiceCraft
VoiceCraft is an advanced open-source tool designed for zero-shot speech editing and text-to-speech (TTS) generation. It leverages a token infilling neural codec language model to achieve state-of-the-art performance on diverse, real-world audio data, including audiobooks, internet videos, and podcasts. Users can clone or edit an unseen voice with just a few seconds of reference audio. The tool offers flexible inference options, including Google Colab, Docker, and standalone command-line scripts, making it accessible for various technical skill levels. It also supports model development, training, and finetuning, providing comprehensive capabilities for speech manipulation and synthesis.
Qwiet AI
Qwiet AI by Harness is an application security solution that uses AI-powered code analysis to identify and remediate vulnerabilities. A single scan replaces separate SAST, SCA, IaC, container, and secrets tools, giving unified visibility into application security. A key differentiator is its AI agents, which generate code fixes that the vendor describes as verified, production-ready, and unit-tested, reducing remediation time. Qwiet AI claims an industry-leading 97% true positive rate and says it cuts false positives by 90%, letting developers focus on critical issues. It integrates into CI/CD pipelines and IDEs so that security checks run from the start of development.
localGPT-Vision
localGPT-Vision is an end-to-end vision-based Retrieval-Augmented Generation (RAG) system designed to interact with documents using Vision Language Models (VLMs). Users can upload and index PDFs and images, then ask questions about their content, receiving responses along with relevant document snippets. The system leverages Colqwen or ColPali models for retrieval, which embed page images directly to understand visual cues like layout and figures, eliminating the need for complex text extraction. It supports various VLMs including Qwen2-VL-7B-Instruct, LLAMA-3.2-11B-Vision, Pixtral-12B-2409, Molmo-7B-O-0924, Google Gemini, and OpenAI GPT-4o. The tool also features session management, model selection, and persistent indexes, making it a comprehensive solution for visual document analysis.
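ColPali-family retrievers (including ColQwen) embed each page image as many patch vectors and, to our understanding, score a query against a page with ColBERT-style late interaction, often called MaxSim: each query-token vector takes its best dot product over the page's patch vectors, and those maxima are summed. The sketch below illustrates only that scoring rule. It is not localGPT-Vision's code; `maxsim_score`, `rank_pages`, and the 2-D toy vectors are hypothetical, and real embeddings are much higher-dimensional and normalized.

```python
def maxsim_score(query_vecs, page_vecs):
    """Late-interaction score: for each query vector, take its best dot product
    against any page patch vector, then sum over the query vectors."""
    return sum(
        max(sum(q_i * p_i for q_i, p_i in zip(q, p)) for p in page_vecs)
        for q in query_vecs
    )

def rank_pages(query_vecs, pages):
    """Return page ids sorted by descending MaxSim score."""
    return sorted(pages, key=lambda pid: maxsim_score(query_vecs, pages[pid]), reverse=True)

# Toy 2-D embeddings for two query tokens and two pages with two patches each.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_1": [[0.9, 0.1], [0.2, 0.1]],
    "page_2": [[0.1, 0.8], [0.7, 0.3]],
}
print(rank_pages(query, pages))  # ['page_2', 'page_1']
```

Because each query token is matched independently, a page scores well when different patches (a heading, a figure, a table cell) each cover a different part of the question.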
cheetah
Cheetah is an on-device streaming speech-to-text engine developed by Picovoice, leveraging deep learning for highly accurate and efficient transcription. Designed for privacy, all voice processing occurs locally on the device. It boasts a compact footprint and is computationally efficient, making it suitable for a wide range of platforms including Linux, macOS, Windows, Android, iOS, web browsers (Chrome, Safari, Firefox, Edge), and Raspberry Pi devices. Cheetah supports multiple languages, including English, French, German, Italian, Portuguese, and Spanish, with additional languages available for commercial customers. It provides SDKs for various programming languages and environments, enabling developers to integrate real-time speech-to-text capabilities into their applications.
cherry-studio
Cherry Studio is a desktop client designed for AI productivity, offering smart chat functionalities, autonomous agents, and access to over 300 pre-configured AI assistants. It provides unified access to a diverse range of Large Language Models (LLMs) including major cloud services like OpenAI, Gemini, and Anthropic, as well as web services like Claude, Perplexity, and Poe. The tool also supports local models via Ollama and LM Studio. Key features include multi-model simultaneous conversations, document processing for various formats, WebDAV file management, global search, topic management, and AI-powered translation. Cherry Studio is cross-platform, ready to use without environment setup, and offers customization options like themes.
MGM
MGM (Mini-Gemini) is the official repository for the paper "Mini-Gemini: Mining the Potential of Multi-modality Vision Language Models." This open-source framework supports a series of dense and Mixture-of-Experts (MoE) Large Language Models (LLMs) ranging from 2B to 34B parameters and is designed to handle image understanding, reasoning, and generation simultaneously. Built on the LLaVA framework, MGM also supports LLaMA3-based models. Key features include dual vision encoders that provide low- and high-resolution visual embeddings, patch info mining for detailed region-level analysis, and an LLM that integrates text and images for both comprehension and generation. The repository provides models, data, and scripts for training and evaluation, making it a comprehensive resource for researchers and developers in multimodal AI.
MegaParse
MegaParse is a file parser designed to prepare documents for ingestion by Large Language Models (LLMs). It handles a wide range of document types, including plain text, PDF, PowerPoint, Excel, CSV, and Word files, with a core focus on avoiding information loss during parsing. The tool is open source and built for speed and efficiency. MegaParse supports content elements such as tables, tables of contents (TOC), headers, footers, and images. Its MegaParse Vision component uses multimodal models such as GPT-4o and Claude 3.5 for more advanced document conversion. It installs via pip and can also run as an API for integration into existing workflows.
mirascope
Mirascope is an open-source LLM anti-framework designed to simplify interaction with various large language models (LLMs) through a unified interface. It empowers developers to integrate LLM capabilities into their applications using Python and TypeScript. Key features include the ability to call LLMs with simple decorators, retrieve structured output using Pydantic models, and build sophisticated AI agents equipped with tools. Mirascope supports advanced functionalities such as streaming, asynchronous operations, and multi-turn conversations, making it a versatile solution for developing complex AI-driven applications. The project is structured as a monorepo, providing clear separation for its Python and TypeScript implementations, as well as documentation and examples.
Lora Finetuning Guide
Lora Finetuning Guide is an educational resource hosted on Hugging Face Spaces, designed to help users understand and implement LoRA (Low-Rank Adaptation) finetuning. This guide enables individuals to fine-tune generative AI models, such as Stable Diffusion, to integrate specific concepts. Users can provide their own images and a corresponding dataset description to customize a model, resulting in a personalized AI model that has learned the desired concept. It serves as a practical educational tool for those interested in customizing AI models and exploring advanced machine learning techniques.
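The core idea behind LoRA is to freeze a pretrained weight matrix W and learn only a low-rank update B·A, so the effective weight becomes W + (α/r)·B·A. The α/r scaling is the convention from the original LoRA formulation and is an assumption here, since the Space does not describe it. This pure-Python sketch is illustrative only; `matmul`, `lora_weight`, `param_counts`, and all the numbers are hypothetical.

```python
def matmul(x, y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]

def lora_weight(w, a, b, alpha, r):
    """Effective weight W + (alpha / r) * B @ A.

    W is frozen (d_out x d_in); only B (d_out x r) and A (r x d_in) are trained.
    """
    delta = matmul(b, a)
    scale = alpha / r
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
            for i in range(len(w))]

def param_counts(d_out, d_in, r):
    """Trainable parameters: full fine-tuning vs. a rank-r LoRA update."""
    return d_out * d_in, r * (d_out + d_in)

d_out, d_in, r = 4, 4, 1
w = [[1.0 if i == j else 0.0 for j in range(d_in)] for i in range(d_out)]  # frozen base
b = [[0.5] for _ in range(d_out)]  # d_out x r
a = [[1.0, 0.0, 0.0, 0.0]]         # r x d_in
merged = lora_weight(w, a, b, alpha=2.0, r=r)
print(merged[0])                     # [2.0, 0.0, 0.0, 0.0]
print(param_counts(4096, 4096, 8))   # (16777216, 65536)
```

The parameter count shows why LoRA is practical for large models: a rank-8 update to a 4096 x 4096 layer trains well under 1% of that layer's weights.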
Brainjar
Brainjar is an AI solutions provider that focuses on integrating human and artificial intelligence to deliver tailored solutions for businesses across various industries. The company specializes in machine learning applications, offering end-to-end AI solutions. Key offerings include intelligent document processing, computer vision, and structured data analysis. Brainjar builds custom AI applications for sectors like medical, finance, government, telecom, and manufacturing, aiming to optimize human capital, drive organizational change, and improve business processes. They emphasize that AI complements human intelligence, freeing up individuals for value-creating tasks rather than replacing them.
deep-learning-illustrated
The deep-learning-illustrated repository on GitHub offers the complete code and Jupyter notebooks that complement the 'Deep Learning Illustrated' book by Jon Krohn, Grant Beyleveld, and Aglaé Bassens. This resource provides a visual and interactive approach to understanding artificial neural networks and deep learning. It covers a wide range of topics from biological and machine vision to natural language processing, generative adversarial networks, and deep reinforcement learning. Users can find step-by-step installation guides and all code examples, making it suitable for those seeking a practical introduction to AI and deep learning implementation. The notebooks are primarily in TensorFlow, with notes on converting to TensorFlow 2.x.
MusicGPT
MusicGPT is an innovative application designed for generating music from natural language prompts. It leverages Large Language Models (LLMs) that run locally, ensuring performant music creation across different platforms without the need for extensive dependencies like Python or complex machine learning frameworks. Currently, it supports MusicGen by Meta, with plans to integrate more music generation models. Users can interact with MusicGPT through a chat-like UI mode, which stores chat history, allows playing generated samples, and generates music in the background. Alternatively, a CLI mode enables direct music generation and playback in the terminal, with configurable sample lengths. It offers flexibility in model selection and GPU usage, though powerful hardware is recommended for larger models.
multi-agent-coding-system
The multi-agent-coding-system is an open-source AI coding system in which an orchestrator agent manages explorer and coder agents. It is designed around intelligent context sharing, so that agents build on previous discoveries instead of repeating work. The project reports a #13 ranking on Stanford's TerminalBench leaderboard, placing it above Claude Code. The orchestrator analyzes tasks, dispatches subagents, verifies changes, and maintains a context store. Explorer agents perform read-only investigation and verification, while coder agents handle implementation with full system access. Because each agent receives precise, relevant information from the context store, the system can approach complex tasks efficiently and strategically.
funNLP
funNLP is a comprehensive open-source repository dedicated to Chinese Natural Language Processing (NLP). It provides a vast collection of tools, datasets, and resources for various NLP tasks, including sensitive word detection, language identification, phone number and ID extraction, and sentiment analysis. The repository also features extensive linguistic databases such as Chinese and Japanese name libraries, synonym/antonym dictionaries, and various Chinese word vectors. Developers and researchers can leverage funNLP for tasks like text generation, summarization, named entity recognition, and building conversational AI systems. Its diverse offerings make it a valuable resource for advancing NLP research and developing applications in the Chinese language domain.
Mistral 7B Instruct GGUF Run On CPU Basic
Mistral 7B Instruct GGUF Run On CPU Basic is a Hugging Face Space that provides a user-friendly interface for the Mistral 7B Instruct model. It is designed for basic text generation on a CPU, which makes it accessible for experimentation and personal projects without a high-end GPU. Users type messages and receive AI-generated responses, and can adjust the output's randomness (temperature) and focus (top_p) with sliders. It works as a general-purpose assistant for a variety of conversational tasks.
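The temperature and top_p sliders typically control sampling as follows: logits are divided by the temperature before the softmax (lower values sharpen the distribution), and nucleus (top-p) filtering keeps only the smallest set of most likely tokens whose cumulative probability reaches top_p. The sketch below assumes that standard behavior; it is not the Space's code, and `sample_token` and the toy logits are hypothetical.

```python
import math
import random

def sample_token(logits, temperature, top_p, rng):
    """Sample a token index using temperature scaling, then nucleus (top-p) filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]  # subtract the max for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the smallest set of most likely tokens whose cumulative probability >= top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    nucleus, cumulative = [], 0.0
    for i in order:
        nucleus.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # random.choices renormalizes the weights over the surviving tokens.
    return rng.choices(nucleus, weights=[probs[i] for i in nucleus], k=1)[0]

rng = random.Random(42)
logits = [2.0, 1.0, 0.1, -1.0]
print([sample_token(logits, temperature=0.7, top_p=0.9, rng=rng) for _ in range(10)])
```

With these logits, a temperature of 0.7 gives the top two tokens about 94% of the probability mass, so a top_p of 0.9 restricts sampling to indices 0 and 1.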