Coding & Development
Browsing page 100 of AI tools for Open Source & Models in Coding & Development. Sorted by confidence score — our independent quality rating.
LAW-GPT
LAW-GPT is an open-source Chinese legal large language model designed to provide professional and reliable answers to legal questions. The project aims to make legal assistance accessible to everyone, much like search engines or express delivery services. It is built by instruction fine-tuning ChatGLM-6B with LoRA at 16-bit precision. The training data includes existing legal Q&A datasets and high-quality legal text Q&A constructed using self-instruct methods based on legal provisions and real case guidance. This approach improves the model's performance in the legal domain and makes its answers more reliable and professional by citing a legal basis for each response. The project also includes a retrieval function for more accurate answers.
Hephaestus
Hephaestus is a semi-structured agentic framework designed for building dynamic AI workflows. Unlike traditional frameworks that require predefined instructions for every scenario, Hephaestus allows agents to discover and create tasks based on their findings. It defines logical phase types (e.g., Plan, Implement, Test) and lets agents spawn new tasks in any phase, leading to self-branching workflows. This approach ensures adaptability, as the workflow evolves based on actual discoveries rather than anticipated scenarios. It includes features like real-time monitoring, Kanban board coordination, and dependency tracking, making it ideal for complex software development projects where agents can identify optimizations or bugs and create new work to address them.
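The self-branching idea can be illustrated with a small task queue. This is a minimal sketch and not Hephaestus's actual API: the `Task` class, the `handle` function, and the task descriptions are all hypothetical, and only the phase names come from the description above.

```python
from collections import deque
from dataclasses import dataclass

PHASES = ("Plan", "Implement", "Test")  # phase names taken from the description


@dataclass
class Task:
    phase: str
    description: str


def handle(task):
    """Hypothetical agent step: do the work and return any follow-up tasks."""
    if task.phase == "Plan":
        return [Task("Implement", "build login endpoint")]
    if task.phase == "Implement":
        return [Task("Test", f"test: {task.description}")]
    if task.phase == "Test" and "fix" not in task.description:
        # A test run discovers a bug and spawns new work in an earlier phase.
        return [Task("Implement", "fix token expiry bug")]
    return []


def run(seed):
    """Process tasks until no agent spawns further work; return them in order."""
    queue, done = deque([seed]), []
    while queue:
        task = queue.popleft()
        assert task.phase in PHASES
        done.append(task)
        queue.extend(handle(task))  # new tasks may target any phase
    return done


history = run(Task("Plan", "plan login feature"))
for t in history:
    print(f"[{t.phase}] {t.description}")
```

The workflow is not fixed in advance: the second Implement task exists only because the first Test task reported a finding.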
konlpy
konlpy is an open-source Python package specifically designed for Korean natural language processing (NLP). It provides essential functionalities for analyzing Korean text, including morphological analysis and part-of-speech tagging. This makes it a valuable tool for developers and researchers who need to process and understand the nuances of the Korean language in their applications or studies. The package is built to be user-friendly, facilitating the integration of advanced NLP capabilities into various projects. Its open-source nature encourages community contributions and ensures continuous development and improvement, making it a robust choice for Korean NLP tasks.
MacBERT
MacBERT is a sophisticated pre-trained language model specifically designed for Chinese Natural Language Processing (NLP). It builds upon the foundational BERT architecture by incorporating a novel Masked and Corrected (Mac) language model pre-training task. This innovative approach aims to mitigate the common 'pre-training-downstream task' inconsistency, a challenge where the [MASK] token used during pre-training is absent in real-world downstream applications. MacBERT addresses this by replacing [MASK] tokens with similar words, derived using a synonyms toolkit based on word2vec similarity. It also integrates Whole Word Masking (WWM) and N-gram masking techniques. The model maintains full compatibility with BERT, allowing for seamless integration into existing NLP workflows without code modification. MacBERT has demonstrated significant performance enhancements across various Chinese NLP tasks, including extractive question answering, natural language inference, sentiment classification, and sentence pair matching.
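The core of the Mac task, corrupting a token with a similar word instead of [MASK], might look roughly like the sketch below. It is a simplification under stated assumptions: the `SIMILAR` table is a hypothetical stand-in for the word2vec-based synonym lookup, the `mac_mask` function and its `mask_prob` parameter are illustrative names, and whole-word and N-gram masking are omitted.

```python
import random

# Hypothetical similar-word table; MacBERT derives candidates from word2vec similarity.
SIMILAR = {
    "quick": ["fast", "rapid"],
    "happy": ["glad", "joyful"],
    "big": ["large", "huge"],
}


def mac_mask(tokens, mask_prob=0.15, rng=None):
    """Return (corrupted, labels); labels hold the original word at changed positions."""
    rng = rng or random.Random()
    corrupted, labels = [], []
    for tok in tokens:
        if tok in SIMILAR and rng.random() < mask_prob:
            corrupted.append(rng.choice(SIMILAR[tok]))  # similar word, never [MASK]
            labels.append(tok)  # training target: recover the original token
        else:
            corrupted.append(tok)
            labels.append(None)  # position is not a prediction target
    return corrupted, labels


sentence = "the quick dog is happy".split()
print(mac_mask(sentence, mask_prob=1.0, rng=random.Random(0)))
```

Because the corrupted input contains only real words, the model sees the same kind of text during pre-training as it does in downstream tasks.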
PrimeAI
PrimeAI specializes in leveraging AI, Machine Learning, and data engineering to provide advanced data insights for businesses. The platform offers services in AI/ML & Data Science, Professional Services, Data & Infrastructure, and Managed Services. PrimeAI helps organizations transform complex datasets into actionable intelligence, build or rebuild cloud-based data warehouses, and supplement technology teams with industry experts. Their solutions are designed to improve efficiency and profitability, with a focus on industries such as Transportation & Logistics, Industrial Services, Travel & Hospitality, Healthcare & Life Sciences, and Retail.
Llama-Chinese
Llama-Chinese is a vibrant open-source community dedicated to advancing Llama large language models, with a strong emphasis on Chinese language optimization. The platform serves as a central hub for developers and enthusiasts, offering a wealth of learning materials, resources, and a collaborative environment to foster the best open-source Llama ecosystem. It supports the development and deployment of Llama models for various applications, including commercial use. The community provides access to pre-trained models like Atom, offers tools for fine-tuning and quantization, and facilitates deployment acceleration. Additionally, it hosts a forum for technical discussions, provides computing resources, and shares diverse datasets, making it an invaluable resource for anyone interested in Chinese AI models.
LLM-quickstart
LLM-quickstart is an open-source guide designed to help users quickly get started with large language models (LLMs). It offers comprehensive resources for both theoretical learning and practical fine-tuning of LLMs. The guide provides detailed instructions for setting up a development environment, including installing necessary software like CUDA Toolkit, Miniconda, and Jupyter Lab. It also outlines hardware requirements, specifically recommending a GPU with at least 16GB of VRAM, such as an NVIDIA Tesla T4. The project includes practical examples and configurations for working with various LLM components and frameworks, making it a valuable resource for those looking to dive into the world of large language models.
MLAPP_CN_CODE
MLAPP_CN_CODE is an open-source GitHub project dedicated to providing a comprehensive Chinese translation of Kevin P. Murphy's influential textbook, "Machine Learning: A Probabilistic Perspective." Beyond just translation, the project also includes Python implementations of the algorithms discussed in the book, making complex concepts more accessible. Users can find code files directly linked to the graphics within the translated articles, facilitating a deeper understanding of the theoretical material through practical application. The project is actively maintained, with recent updates covering topics like deep learning, decision theory, optimization, and information theory, ensuring its relevance and timeliness for students and researchers alike.
pinferencia
Pinferencia is a Python library that aims to be the simplest machine learning inference server. It allows users to deploy models with just a few lines of code, providing both a GUI and a REST API out of the box. The tool supports various deep learning frameworks like Hugging Face, PyTorch, and TensorFlow, making it versatile for different model types. Pinferencia emphasizes minimal code and transformation, fast deployment, and robust testing with 100% test coverage. It also offers automatic API documentation with an online try-out feature and compatibility with the KServe API, which eases integration with platforms like Kubeflow, TF Serving, Triton, and TorchServe.
RecSys
RecSys is a comprehensive open-source repository dedicated to recommendation systems, computational advertising, and machine learning, with a strong emphasis on Click-Through Rate (CTR) and Conversion Rate (CVR) prediction. It serves as a valuable resource for developers and data scientists interested in these fields, offering a curated collection of learning materials, classic research papers, and practical tools. The repository covers a wide range of topics, from foundational statistical learning models to advanced deep learning architectures, and includes insights from real-world applications at major tech companies like Google, Alibaba, and Facebook. It also features practical code examples and references to significant industry competitions, making it an excellent resource for both theoretical understanding and hands-on implementation.
stable-diffusion-webui-extensions
stable-diffusion-webui-extensions serves as the official extension index for the Stable Diffusion Web UI, providing a centralized repository for users to find and integrate additional functionalities. This open-source project allows developers to submit their extensions, which are then reviewed and added to the index, making them accessible to a wider user base. The platform facilitates the customization and enhancement of stable diffusion workflows by offering a variety of extensions, each tagged for appropriate categorization. It includes guidelines for submitting new extensions, ensuring they are functional and properly described. The index also provides important tags like 'online' for extensions requiring external server connections and 'ads' for those containing advertisements, promoting transparency for users.
Starter-Guide
Starter-Guide, developed by the PKU-DAIR team, is an open-source repository designed to provide a comprehensive guide for beginners in the fields of data management (DM) and artificial intelligence (AI). It consolidates core papers and shared experiences from the team to help newcomers quickly familiarize themselves with cutting-edge areas and build a solid technical foundation. The guide covers various research directions including AI systems, AutoML, Database, AI Agent, Data-Centric ML, Diffusion Models, AI for Science, and Graph. It aims to support users in their learning and research journeys, whether they are just starting out or looking to deepen their understanding.
Whisper-Finetune
Whisper-Finetune is an open-source project designed to fine-tune the Whisper speech recognition model. It offers flexible training options, including support for data with or without timestamps, and even training without speech data. The tool also accelerates inference and supports deployment across Web, Windows desktop, and Android platforms. It uses LoRA for fine-tuning and supports CTranslate2 and GGML for accelerated inference. The project includes detailed instructions for environment setup, data preparation, single- and multi-GPU training, model merging, evaluation, and various prediction interfaces, making it a comprehensive solution for customizing and deploying Whisper models.
TransGPT
TransGPT is the first open-source large language model specifically designed for the transportation industry in China. It aims to provide practical value by offering functionalities such as traffic condition prediction, intelligent consultation, public transportation services, traffic planning and design, traffic safety education, management assistance, and accident reporting and analysis. The model also supports autonomous driving assistance systems. TransGPT serves as a general knowledge base for various transportation sectors, including road, bridge, and tunnel engineering; highway and waterway transportation; and urban public transport. It is available in two main models: TransGPT-7B (text) and TransGPT-MM-6B (multimodal). The project provides training and inference code, along with commercial-use-approved datasets for pre-training and fine-tuning.
tinyflow
Tinyflow is a lightweight, open-source AI agent solution designed as a development component rather than a standalone product. It enables developers to integrate AI agent orchestration capabilities into existing applications. The frontend is built with Web Components, ensuring compatibility with popular frameworks like React, Vue, Angular, and Svelte, as well as native HTML, CSS, and JavaScript. For the backend, Tinyflow targets Java, Python, and Node.js: a Java implementation is available, and the Python and Node.js versions are currently under development. This flexibility makes Tinyflow a versatile tool for enhancing traditional applications with advanced AI agent functionalities.
ChatLM-mini-Chinese
ChatLM-mini-Chinese is an open-source project featuring a 0.2B parameter Chinese dialogue model (ChatLM-Chinese-0.2B). It provides comprehensive code for the entire model development lifecycle, including data cleaning, tokenizer training, model pre-training, SFT instruction fine-tuning, and RLHF optimization. The project is designed to be resource-efficient, capable of pre-training on machines with as little as 4GB VRAM and requiring only 512MB VRAM for float16 inference. It also supports downstream task fine-tuning, with an example provided for triplet information extraction. All dataset sources, data cleaning processes, and training procedures are openly shared, making it an excellent resource for researchers and developers working with small-scale Chinese language models.
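A back-of-the-envelope check shows why the float16 inference figure is plausible. The calculation below counts model weights only; activations, caches, and framework overhead are ignored, and reading "512MB" as 512 MiB is an assumption.

```python
PARAMS = 0.2e9        # 0.2B parameters, as stated in the description
BYTES_PER_PARAM = 2   # float16 stores each value in 2 bytes

weight_bytes = PARAMS * BYTES_PER_PARAM
weight_mib = weight_bytes / 2**20  # convert bytes to MiB

print(f"float16 weights: {weight_mib:.0f} MiB")
print(f"headroom under 512 MiB: {512 - weight_mib:.0f} MiB")
```

The weights alone come to roughly 381 MiB, which leaves some room for runtime buffers within the quoted 512MB.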
SAG
SAG, developed by Zleap.AI, is an open-source, SQL-driven RAG engine designed for automatically building knowledge graphs during querying. It transforms raw text into "semantic atomic events" and extracts multi-dimensional "natural language vectors" for each event. Unlike traditional methods, SAG dynamically constructs relationship networks at query time, rather than relying on pre-maintained knowledge graphs. Its core capabilities include automatic understanding of documents, intelligent association through dynamic graph building, precise recall via a three-stage search (Recall → Expand → Rerank), complete traceability of results, and flexible extensibility with custom entity types. SAG is production-ready and suitable for developers, enterprise tech teams, and researchers interested in GraphRAG/RAG+KG.
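The three-stage search (Recall → Expand → Rerank) can be pictured with a toy in-memory example. This is only an illustrative sketch, not SAG's implementation: SAG is SQL-driven and works on vectors, whereas the `EVENTS` data, the function names, and the entity-overlap scoring here are all hypothetical. What it does preserve is the idea that links between events are computed at query time rather than read from a stored graph.

```python
# Toy event store: each "event" has text and a set of entities (hypothetical data).
EVENTS = {
    1: {"text": "Alice joined Acme as CTO", "entities": {"alice", "acme"}},
    2: {"text": "Acme acquired Beta Labs", "entities": {"acme", "beta labs"}},
    3: {"text": "Beta Labs released a database", "entities": {"beta labs"}},
    4: {"text": "Carol won a chess title", "entities": {"carol"}},
}


def recall(query_entities):
    """Stage 1: events that mention any query entity directly."""
    return {eid for eid, e in EVENTS.items() if e["entities"] & query_entities}


def expand(seed_ids):
    """Stage 2: add events sharing an entity with a recalled event (one hop)."""
    linked = set()
    for eid in seed_ids:
        linked |= EVENTS[eid]["entities"]
    return seed_ids | {eid for eid, e in EVENTS.items() if e["entities"] & linked}


def rerank(event_ids, query_entities):
    """Stage 3: order by direct overlap with the query, then by id for stability."""
    return sorted(
        event_ids,
        key=lambda eid: (-len(EVENTS[eid]["entities"] & query_entities), eid),
    )


def search(query_entities):
    return rerank(expand(recall(query_entities)), query_entities)


for eid in search({"alice"}):
    print(eid, EVENTS[eid]["text"])
```

Each returned id maps back to its source event, which mirrors the traceability property described above.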
SakuraLLM
SakuraLLM is an open-source large language model designed for Japanese-to-Chinese translation, specifically optimized for light novels and Galgame content. It leverages SFT and RLHF models, incorporating knowledge of universal character and relationship attributes to deliver ACGN-style translations. The project emphasizes offline self-deployment and provides various model sizes, from 1.5B to 32B parameters, built upon the Qwen model series. Key features include improved translation accuracy, support for glossaries (GPT dictionaries) to maintain consistency in proper nouns and pronouns, and enhanced retention of control characters. SakuraLLM also offers API support in OpenAI format, making it compatible with various existing translation tools and platforms.
foolbox
Foolbox is a Python library designed to facilitate the creation of adversarial examples that can fool neural networks. Built on EagerPy, it offers native performance across PyTorch, TensorFlow, and JAX, allowing for a unified codebase without duplication. The toolbox provides a comprehensive collection of state-of-the-art gradient-based and decision-based adversarial attacks. It emphasizes type checking to catch bugs early and includes extensive documentation, guides, and tutorials for ease of use. Foolbox is ideal for machine learning researchers and security engineers focused on evaluating and improving the robustness of their models against adversarial attacks.
php-nlp-tools
php-nlp-tools is an open-source collection of Natural Language Processing (NLP) tools specifically designed for PHP 5.3+ environments. It enables developers to integrate advanced text analysis capabilities into their PHP applications. The library includes classification models like Multinomial Naive Bayes and Maximum Entropy, as well as experimental Topic Modeling with Latent Dirichlet Allocation. For text processing, it offers various tokenizers such as WhitespaceTokenizer and PennTreebankTokenizer, alongside stemmers like PorterStemmer and GreekStemmer. Additionally, it provides utilities for similarity calculations (Jaccard Index, Cosine similarity) and optimizers for MaxEnt models, including a fast, parallel gradient descent optimizer written in Go. This comprehensive toolkit is ideal for developers looking to implement NLP features directly within their PHP projects.
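For readers unfamiliar with the two similarity measures named above, the sketch below computes them over token lists. It is written in Python for illustration and does not reflect the php-nlp-tools API; the function names and sample sentences are assumptions.

```python
import math
from collections import Counter


def jaccard(a, b):
    """Jaccard index of two token lists: |A ∩ B| / |A ∪ B| over distinct tokens."""
    sa, sb = set(a), set(b)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def cosine(a, b):
    """Cosine similarity of two token lists using term-frequency vectors."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)  # Counter returns 0 for missing tokens
    norm_a = math.sqrt(sum(v * v for v in ca.values()))
    norm_b = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


x = "the cat sat on the mat".split()
y = "the cat lay on the rug".split()
print(round(jaccard(x, y), 3), round(cosine(x, y), 3))
```

Jaccard ignores how often a token repeats, while cosine similarity weights repeated tokens more heavily, so the two scores can differ noticeably on the same pair of texts.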
self-llm
self-llm is an open-source project by Datawhale China, offering a comprehensive guide for deploying and fine-tuning large language models (LLMs) and multimodal large language models (MLLMs) in Linux environments. Specifically tailored for Chinese users and beginners, it simplifies the process of working with open-source models like LLaMA, ChatGLM, and InternLM. The guide covers essential steps including detailed environment configuration, local deployment, and various fine-tuning methods such as full-parameter fine-tuning, LoRA, and P-Tuning. It also provides instructions for application deployment, including command-line invocation, online demo deployment, and integration with frameworks like LangChain. The project aims to make advanced LLM technology accessible to a broader audience of students and researchers.
Rightsify
Rightsify is at the forefront of developing AI music models by providing synthetic datasets and curated human-created music collections. The platform also specializes in intelligent licensing solutions, ensuring that developers and businesses can legally and effectively integrate AI-generated music into their applications. Rightsify supports the creation and deployment of AI-driven music solutions, helping users navigate the complexities of music rights and data acquisition. This comprehensive approach makes it a valuable resource for anyone looking to leverage AI in music production, background music, or other audio applications, while maintaining legal compliance.
ChatGPTAuthHelper
ChatGPTAuthHelper is a straightforward Chrome extension designed to assist users in logging into ChatGPT. This open-source tool, available on GitHub, streamlines the authentication process, making it easier to access ChatGPT services. Users can download the extension from the Release section, enable developer mode in Chrome, and load the unpacked extension. Once installed, it integrates with services like `token.oaifree.com/auth` to facilitate login. The tool is ideal for individuals who frequently use ChatGPT and are looking for a more convenient way to manage their login, bypassing potential authentication hurdles.
book_DeepLearning_in_PyTorch_Source
book_DeepLearning_in_PyTorch_Source is an open-source GitHub repository containing the source code for a book titled "Deep Learning Principles and PyTorch Practice." This resource is designed to help users understand deep learning concepts and their practical implementation using the PyTorch framework. It covers a wide range of topics, from introductory PyTorch concepts to advanced applications like generative models, transfer learning, and reinforcement learning. The repository includes code examples for tasks such as text classification, image style transfer, and neural machine translation, making it a valuable learning tool for students and developers looking to gain hands-on experience with deep learning in PyTorch.