Shypd.ai

Research & Education

AI tools for Academic Research in Research & Education, sorted by confidence score (our independent quality rating).

MARLlib

59%

MARLlib is a comprehensive, open-source library for Multi-agent Reinforcement Learning (MARL) built on Ray and its RLlib toolkit. It offers a unified platform for researchers and developers to create, train, and evaluate MARL algorithms across a wide array of tasks and environments. Key features include support for all task modes (cooperative, collaborative, competitive, and mixed), a Gym-like interface for multi-agent environments, and flexible parameter-sharing strategies. MARLlib provides 18 pre-built algorithms behind an intuitive API, making it accessible even to newcomers to MARL. Users can customize model architectures and policy sharing, and can draw on more than a thousand released experiments. It runs on Linux and can be installed step by step or used through Docker.
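The parameter-sharing strategies mentioned above decide which agents train a common policy. The sketch below illustrates the idea with a plain-Python mapping from agent IDs to policy IDs, in the spirit of the policy-mapping functions RLlib uses for multi-agent training. The function name `make_policy_mapping` and the strategy labels `"all"`, `"group"`, and `"individual"` are hypothetical choices for this example, not MARLlib's API.

```python
from typing import Callable, Dict, List

# Hypothetical helper for illustration; not part of MARLlib.
def make_policy_mapping(agent_groups: Dict[str, List[str]], strategy: str) -> Callable[[str], str]:
    """Return a function that maps an agent ID to the ID of the policy it trains.

    strategy (hypothetical labels):
      "all"        -> every agent shares a single policy
      "group"      -> agents in the same group share a policy
      "individual" -> every agent trains its own policy
    """
    group_of = {agent: group for group, agents in agent_groups.items() for agent in agents}
    if strategy == "all":
        return lambda agent_id: "shared_policy"
    if strategy == "group":
        return lambda agent_id: f"policy_{group_of[agent_id]}"
    if strategy == "individual":
        return lambda agent_id: f"policy_{agent_id}"
    raise ValueError(f"unknown sharing strategy: {strategy!r}")


groups = {"predator": ["predator_0", "predator_1"], "prey": ["prey_0"]}
for strategy in ("all", "group", "individual"):
    mapping = make_policy_mapping(groups, strategy)
    print(strategy, {agent: mapping(agent) for agents in groups.values() for agent in agents})
```

Full sharing trains the fewest parameters; individual policies allow the most specialized behavior; group sharing sits between the two.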

MM-EUREKA

59%

MM-EUREKA is a research project exploring multimodal reasoning through rule-based reinforcement learning. It releases models such as MM-Eureka-Qwen-7B and MM-Eureka-Qwen-32B, which advance performance on multidisciplinary K12 and mathematical reasoning tasks. The project has iterated on model architecture, algorithms, and data, moving its base models from InternVL to the more robust Qwen2.5-VL. Key improvements include enhanced online filtering, adaptive online rollout adjustment (ADORA), and new RL algorithms such as Clipped Policy Gradient Optimization with Policy Drift (CPGD). MM-EUREKA also open-sources its full pipeline, including the self-collected MMK12 dataset, to support further research in multimodal AI.
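One common reading of online filtering in rule-based RL is to drop prompts whose sampled responses are all correct or all wrong, since under a group-relative advantage they carry no learning signal. The sketch below assumes binary rule-based rewards and that criterion; `online_filter` is a hypothetical helper, and MM-EUREKA's actual filtering rules may differ.

```python
from typing import Dict, List

# Hypothetical helper for illustration; not MM-EUREKA's code.
def online_filter(rollout_rewards: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Keep only prompts whose sampled rollouts disagree.

    With binary rewards, a prompt whose rollouts are all correct (or all wrong)
    gives every rollout the same reward, so a group-relative advantage is zero
    and the prompt contributes no gradient.
    """
    kept = {}
    for prompt_id, rewards in rollout_rewards.items():
        accuracy = sum(rewards) / len(rewards)
        if 0.0 < accuracy < 1.0:
            kept[prompt_id] = rewards
    return kept


batch = {
    "q1": [1.0, 1.0, 1.0, 1.0],  # always solved
    "q2": [0.0, 1.0, 0.0, 1.0],  # mixed outcomes
    "q3": [0.0, 0.0, 0.0, 0.0],  # never solved
}
print(sorted(online_filter(batch)))  # → ['q2']
```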

deepmd-kit

59%

DeePMD-kit is a Python/C++ package designed to facilitate the creation of deep learning-based models for interatomic potential energy and force fields, and to perform molecular dynamics simulations. It addresses the accuracy-versus-efficiency dilemma in molecular simulations by leveraging deep learning. The package is highly modularized and interfaces with popular deep learning frameworks like TensorFlow, PyTorch, JAX, and Paddle, as well as high-performance classical and quantum MD packages such as LAMMPS, i-PI, and GROMACS. It implements the Deep Potential series models, which have been successfully applied to various systems, including organic molecules, metals, and semiconductors. DeePMD-kit also supports MPI and GPU for efficient parallel and distributed computing, making it suitable for complex scientific research.

ms-swift

59%

ms-swift is a comprehensive, open-source framework developed by the ModelScope community, designed for fine-tuning and deploying large language models (LLMs) and multimodal large models (MLLMs). It supports over 600 text-only LLMs and 400 MLLMs, offering full-pipeline capabilities from training to inference, evaluation, quantization, and deployment. The framework integrates advanced training technologies, including Megatron parallelism (TP, PP, CP, EP) for acceleration and a rich family of GRPO reinforcement learning algorithms. ms-swift also supports various fine-tuning methods like LoRA, QLoRA, and DoRA, and provides memory optimization techniques such as Flash-Attention 2/3. It offers a Web-UI interface for simplified training, inference, evaluation, and quantization workflows, making it accessible for a wide range of users.
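LoRA, one of the fine-tuning methods listed above, freezes a pretrained weight matrix W and learns a low-rank update, so the layer computes W x + (alpha / r) B A x. The following plain-Python sketch illustrates that computation; the class `LoRALinear`, the zero initialization of `B`, and the small Gaussian initialization of `A` are assumptions made for illustration, not ms-swift code.

```python
import random
from typing import List

Matrix = List[List[float]]

def matvec(m: Matrix, v: List[float]) -> List[float]:
    return [sum(w * x for w, x in zip(row, v)) for row in m]

# Hypothetical class for illustration; not ms-swift's implementation.
class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight: Matrix, rank: int, alpha: float, seed: int = 0):
        out_dim, in_dim = len(weight), len(weight[0])
        rng = random.Random(seed)
        self.weight = weight  # frozen pretrained weight, shape (out, in)
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(in_dim)] for _ in range(rank)]  # (r, in)
        self.B = [[0.0] * rank for _ in range(out_dim)]  # (out, r), zero-init: update starts at 0
        self.scale = alpha / rank

    def forward(self, x: List[float]) -> List[float]:
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]


W = [[1.0, 2.0, 3.0], [0.0, -1.0, 4.0]]
layer = LoRALinear(W, rank=2, alpha=4.0)
x = [1.0, 0.5, -1.0]
print(layer.forward(x))  # → [-1.0, -4.5], identical to W @ x because B is all zeros
```

Because B starts at zero, the adapted layer initially reproduces the frozen layer exactly; only A and B, r x (in + out) values in total, would be trained.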

lmm-r1

59%

LMM-R1 is an open-source project that extends the OpenRLHF framework to strengthen the reasoning capabilities of 3B-parameter Large Multimodal Models (LMMs). It addresses their limited parameter capacity and the scarcity of high-quality multimodal reasoning data with a two-stage, rule-based RL approach. The first stage, Foundational Reasoning Enhancement (FRE), builds strong reasoning foundations using text-only data; the second, Multimodal Generalization Training (MGT), extends these capabilities to multimodal domains. LMM-R1 supports LMMs such as Qwen2.5-VL, Phi3.5-V, and Phi4-Multimodal, and provides distributed PPO and REINFORCE++/RLOO implementations built on Ray for faster training. It also integrates vLLM for accelerated generation and FlashAttention2, and supports QLoRA/LoRA for efficient fine-tuning.
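RLOO (REINFORCE Leave-One-Out), one of the algorithms listed above, samples k responses per prompt and uses the mean reward of the other k - 1 responses as each response's baseline. A minimal sketch, assuming a toy exact-match rule as the reward; `rloo_advantages` and `exact_match_reward` are hypothetical helpers, not LMM-R1's implementation.

```python
from typing import List

# Hypothetical helpers for illustration; not LMM-R1's code.
def rloo_advantages(rewards: List[float]) -> List[float]:
    """Advantage of sample i = its reward minus the mean reward of the other k - 1 samples."""
    k = len(rewards)
    if k < 2:
        raise ValueError("RLOO needs at least two samples per prompt")
    total = sum(rewards)
    return [r - (total - r) / (k - 1) for r in rewards]

def exact_match_reward(answer: str, reference: str) -> float:
    """Toy rule-based reward: 1 if the normalized answer matches the reference, else 0."""
    return 1.0 if answer.strip().lower() == reference.strip().lower() else 0.0


samples = ["42", " 42 ", "41", "forty-two"]
rewards = [exact_match_reward(s, "42") for s in samples]  # [1.0, 1.0, 0.0, 0.0]
print(rloo_advantages(rewards))
```

Within each prompt's group the leave-one-out advantages sum to zero, so no separate value network is needed to center the policy-gradient signal.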

OpenML

59%

OpenML is a collaborative online machine learning platform for sharing and organizing data, machine learning algorithms, and experimental results. It aims to create a frictionless, networked ecosystem in which scientists and practitioners can plug in their existing processes and tools and collaborate globally. For science, this means researchers can build quickly on others' results, answer complex questions by reusing prior experiments, and run larger studies that would otherwise be infeasible. For individual scientists, it saves time on routine work, lets them compare new experiments against the state of the art, and opens opportunities for new discoveries and publications. OpenML also serves as a learning environment for students and citizen scientists, who can explore state-of-the-art methods and contribute their own work.

Osprey

59%

Osprey is a computer vision tool that enhances multimodal large language models (MLLMs) by incorporating pixel-wise mask regions into language instructions. This enables fine-grained visual understanding: given an input mask region, Osprey generates semantic descriptions ranging from short to detailed. It integrates with the Segment Anything Model (SAM) in point-prompt, box-prompt, and segment-everything modes to extract and describe the semantics of specific parts or objects in an image. Osprey is built on the LLaVA-v1.5 codebase and targets researchers and developers working on visual instruction tuning and pixel-level image understanding.

PyTorch-BayesianCNN

59%

PyTorch-BayesianCNN implements Bayesian Convolutional Neural Networks (CNNs) in PyTorch, trained with variational inference via Bayes by Backprop. Instead of learning a single point estimate per weight, these CNNs learn an approximation to the otherwise intractable posterior distribution over weights, which, unlike traditional frequentist training, yields uncertainty estimates for their predictions. The repository includes two Bayesian layer implementations: BBB (Bayes by Backprop) and BBB_LRT (Bayes by Backprop with the Local Reparametrization Trick), which makes sampling more efficient. It supports standard datasets such as MNIST, CIFAR10, and CIFAR100 and includes implementations of common models such as AlexNet and LeNet, making it a useful resource for experimenting with Bayesian deep learning and studying model uncertainty.
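The local reparametrization trick behind BBB_LRT can be shown for a single output unit: with independent Gaussian weights w_i ~ N(mu_i, sigma_i^2), the pre-activation b = sum_i x_i w_i is itself Gaussian with mean sum_i x_i mu_i and variance sum_i x_i^2 sigma_i^2, so one sample of b replaces sampling every weight. The sketch below assumes sigma = softplus(rho), a common parameterization in Bayes by Backprop; `lrt_linear_unit` is a hypothetical helper, not the repository's layer class.

```python
import math
import random
from typing import List

def softplus(x: float) -> float:
    # sigma = softplus(rho) keeps each weight's standard deviation positive.
    return math.log1p(math.exp(x))

def lrt_linear_unit(x: List[float], w_mu: List[float], w_rho: List[float],
                    rng: random.Random) -> float:
    """Sample one output unit of a Bayesian linear layer (hypothetical helper).

    Weights are independent Gaussians w_i ~ N(mu_i, sigma_i^2). Rather than
    sampling each weight, sample the pre-activation directly:
        b ~ N(sum_i x_i * mu_i, sum_i x_i^2 * sigma_i^2)
    """
    mean = sum(xi * mi for xi, mi in zip(x, w_mu))
    var = sum(xi ** 2 * softplus(ri) ** 2 for xi, ri in zip(x, w_rho))
    return mean + math.sqrt(var) * rng.gauss(0.0, 1.0)


rng = random.Random(0)
x = [1.0, 2.0]
w_mu = [0.5, -0.25]                   # posterior means of the two weights
w_rho = [math.log(math.e - 1.0)] * 2  # softplus(rho) == 1, so each weight has std 1
samples = [lrt_linear_unit(x, w_mu, w_rho, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / (len(samples) - 1))
print(f"predictive mean {mean:.3f}, predictive std {std:.3f}")
```

Repeated forward passes give a spread of outputs; here the sample standard deviation should land close to the analytical value sqrt(1^2 + 2^2) = sqrt(5), which is the kind of uncertainty estimate the description refers to.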

pytorch_active_learning

59%

pytorch_active_learning is an open-source PyTorch library designed for active learning, accompanying the "Human-in-the-Loop Machine Learning" book. It offers a range of active learning methods, including Least Confidence, Margin of Confidence, Ratio of Confidence, and Entropy sampling. The library also supports more advanced techniques like Model-based Outlier sampling, Cluster-based sampling, and various forms of Active Transfer Learning. It is suitable for researchers and practitioners looking to experiment with and apply active learning strategies in computer vision and natural language processing, with a focus on real-world diversity to avoid bias. The code is stand-alone and can be easily integrated with existing PyTorch installations.
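The four uncertainty-sampling scores named above can each be computed from a model's softmax output. The sketch below follows the normalized forms described in "Human-in-the-Loop Machine Learning" as we understand them, where each score lies in [0, 1] and higher means more uncertain; the function names and the `rank_for_labeling` helper are hypothetical, not the repository's own functions.

```python
import math
from typing import Callable, Dict, List

Probs = List[float]

# Hypothetical re-implementations for illustration; not the library's functions.
def least_confidence(probs: Probs) -> float:
    # 1 - P(top class), rescaled so a uniform distribution scores 1.
    n = len(probs)
    return (1.0 - max(probs)) * n / (n - 1)

def margin_of_confidence(probs: Probs) -> float:
    # Small gap between the top two classes means high uncertainty.
    top, second = sorted(probs, reverse=True)[:2]
    return 1.0 - (top - second)

def ratio_of_confidence(probs: Probs) -> float:
    # Second-best probability relative to the best one.
    top, second = sorted(probs, reverse=True)[:2]
    return second / top

def entropy_score(probs: Probs) -> float:
    # Shannon entropy divided by its maximum, log2(n).
    n = len(probs)
    return -sum(p * math.log2(p) for p in probs if p > 0) / math.log2(n)

def rank_for_labeling(pool: Dict[str, Probs], score_fn: Callable[[Probs], float],
                      budget: int) -> List[str]:
    """IDs of the `budget` most uncertain unlabeled items."""
    ranked = sorted(pool, key=lambda item_id: score_fn(pool[item_id]), reverse=True)
    return ranked[:budget]


pool = {
    "img_a": [0.98, 0.01, 0.01],  # confident prediction
    "img_b": [0.40, 0.35, 0.25],  # very uncertain
    "img_c": [0.70, 0.20, 0.10],
}
for fn in (least_confidence, margin_of_confidence, ratio_of_confidence, entropy_score):
    print(fn.__name__, rank_for_labeling(pool, fn, budget=2))
```

On this small pool all four scores agree; on real data they can rank items differently, which is why libraries expose them separately.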

RoseTTAFold

59%

RoseTTAFold is a deep learning model and script package designed for the accurate prediction of protein structures and interactions. This tool is an official implementation of the RoseTTAFold architecture, which employs a 3-track neural network to achieve its predictions. It is primarily intended for research in computational biology, enabling scientists to model complex protein structures and protein-protein interactions (PPIs). The package includes scripts for installation, dependency management, and running predictions for both monomer structures and complex modeling. It also features a faster 2-track version for PPI screening, making it a versatile tool for advanced biological research.

SimGNN

59%

SimGNN is a PyTorch implementation of a neural network approach to fast graph similarity computation, as described in the WSDM 2019 paper. It avoids the computational cost of traditional measures such as Graph Edit Distance (GED) and Maximum Common Subgraph (MCS) while maintaining high accuracy. A learnable embedding function maps each graph to an embedding vector that summarizes it globally, and an attention mechanism emphasizes the nodes that matter most for a given similarity metric. SimGNN also adds a pairwise node comparison step that supplements the graph-level embeddings with fine-grained node-level information. The approach generalizes better to unseen graphs and has quadratic time complexity in the worst case. Experiments show smaller errors and large reductions in running time compared with existing baselines.
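SimGNN's attention step can be sketched as follows: a global context vector c = tanh(mean_n(u_n) W) summarizes the graph, each node gets a weight a_n = sigmoid(u_n · c), and the graph embedding is h = sum_n a_n u_n. The plain-Python version below assumes the node embeddings u_n are already computed (SimGNN obtains them with graph convolutional layers) and uses a hand-picked W; `attention_graph_embedding` is a hypothetical name, not the repository's code.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical helper for illustration; not SimGNN's code.
def attention_graph_embedding(node_embs: List[Vector], w: Matrix) -> Vector:
    """Attention-pool N node embeddings (each of size D) into one graph embedding."""
    n, d = len(node_embs), len(node_embs[0])
    mean = [sum(u[j] for u in node_embs) / n for j in range(d)]
    # Global context: c_j = tanh(sum_i mean_i * W[i][j]).
    context = [math.tanh(sum(mean[i] * w[i][j] for i in range(d))) for j in range(d)]
    # Each node's weight grows with its similarity to the global context.
    weights = [sigmoid(sum(u[j] * context[j] for j in range(d))) for u in node_embs]
    return [sum(a * u[j] for a, u in zip(weights, node_embs)) for j in range(d)]


identity = [[1.0, 0.0], [0.0, 1.0]]
graph = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # three node embeddings, D = 2
print(attention_graph_embedding(graph, identity))
```

In SimGNN the embeddings of the two graphs are then compared by further learned layers (a neural tensor network followed by fully connected layers) to produce the similarity score; that stage is omitted here.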

Self-Driving-Car-in-Video-Games

59%

Self-Driving-Car-in-Video-Games is an open-source project featuring a supervised deep neural network that learns to drive autonomously in video games, specifically Grand Theft Auto V. The model, T.E.D.D. 1104, is trained on a large set of human gameplay recordings paired with the player's key inputs, covering various vehicles and weather conditions. It treats driving as a classification problem: given a sequence of five images, it predicts the correct keyboard or Xbox controller input. The project provides pretrained models in several sizes (XXL, M, S) and includes everything needed for data generation, training, and real-time inference; gameplay interaction primarily supports Windows 10/11.
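Framing driving as classification over five-frame sequences means each training example is a sliding window of consecutive frames paired with a recorded input. A minimal sketch, assuming the target is the input recorded at the last frame of each window and that key combinations are already encoded as integer class IDs; `make_sequence_samples` and the sample labels are hypothetical, not the project's data pipeline.

```python
from typing import List, Sequence, Tuple

# Hypothetical helper for illustration; not the project's dataset code.
def make_sequence_samples(frames: Sequence[str], labels: Sequence[int],
                          seq_len: int = 5) -> List[Tuple[List[str], int]]:
    """Turn one recorded session into (clip, label) classification samples.

    Each clip is `seq_len` consecutive frames; its target is the input recorded
    with the clip's last frame (an assumption made for illustration).
    """
    if len(frames) != len(labels):
        raise ValueError("expected exactly one recorded input per frame")
    return [
        (list(frames[end - seq_len + 1:end + 1]), labels[end])
        for end in range(seq_len - 1, len(frames))
    ]


frames = [f"frame_{i}" for i in range(7)]  # stand-ins for captured screenshots
labels = [0, 1, 1, 2, 0, 3, 1]             # hypothetical class IDs for key combinations
samples = make_sequence_samples(frames, labels)
print(len(samples))                        # → 3
print(samples[0])
```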

TPVFormer

59%

TPVFormer is an academic project offering a Tri-Perspective View (TPV) representation for vision-based 3D semantic occupancy prediction, serving as an alternative to Tesla's Occupancy Network for autonomous driving research. It addresses the limitations of traditional bird's-eye-view (BEV) representations by incorporating two additional perpendicular planes, allowing for a more fine-grained description of 3D scenes. The tool features a transformer-based TPV encoder (TPVFormer) to effectively obtain TPV features by aggregating image features. It demonstrates that camera inputs alone can achieve performance comparable to LiDAR-based methods on LiDAR segmentation tasks. The project also includes resources for semantic scene completion and comparisons with Tesla's Occupancy Network.
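In a tri-perspective view, a 3D location is described by projecting it onto the three orthogonal planes and summing the features found there. The sketch below assumes integer grid indices and a nearest-cell lookup (as we understand it, TPVFormer interpolates features at continuous projected coordinates and learns the plane features with its transformer encoder); the plane layouts `hw`, `dh`, `wd` and the helper names are hypothetical.

```python
from typing import List

Vector = List[float]
Plane = List[List[Vector]]  # plane[i][j] is a feature vector

# Hypothetical helpers for illustration; not TPVFormer's code.
def tpv_point_feature(hw: Plane, dh: Plane, wd: Plane, h: int, w: int, d: int) -> Vector:
    """Feature of 3D cell (h, w, d): sum of its top (HxW), side (DxH) and front (WxD) plane features."""
    top, side, front = hw[h][w], dh[d][h], wd[w][d]
    return [a + b + c for a, b, c in zip(top, side, front)]

def tpv_cell_count(H: int, W: int, D: int) -> int:
    """Number of feature vectors stored by the three planes."""
    return H * W + D * H + W * D


H, W, D = 2, 3, 4
# Encode each cell's indices in its (1-dimensional) feature so the sum is easy to check.
hw = [[[float(10 * h + w)] for w in range(W)] for h in range(H)]
dh = [[[float(100 * d + h)] for h in range(H)] for d in range(D)]
wd = [[[float(1000 * w + d)] for d in range(D)] for w in range(W)]
print(tpv_point_feature(hw, dh, wd, h=1, w=2, d=3))  # → [2316.0]
print(tpv_cell_count(200, 200, 16), 200 * 200 * 16)  # → 46400 640000
```

The second line shows the storage argument: for an illustrative 200 x 200 x 16 grid, the three planes hold 46,400 feature vectors versus 640,000 for a dense voxel grid, while still assigning a feature to every voxel.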

VividTalk

59%

VividTalk is an open-source project for one-shot, audio-driven talking head generation. It uses a 3D hybrid prior to produce realistic facial animation directly from audio input. It suits researchers and developers working on AI-driven video synthesis and deepfake creation, and offers a foundation for exploring advanced animation techniques. Its GitHub repository provides the code and resources needed to implement and experiment with the method, making it useful for anyone interested in the technical side of generating dynamic talking head videos.

WeDLM

59%

WeDLM is an open-source diffusion language model developed by Tencent, designed for high-speed inference. It uniquely reconciles diffusion language models with standard causal attention, enabling native KV cache compatibility with technologies like FlashAttention and PagedAttention. This approach allows for direct initialization from pre-trained autoregressive models such as Qwen2.5 and Qwen3, delivering significant real speedups compared to vLLM-optimized baselines. WeDLM achieves 3-6x speedup on tasks like math reasoning and up to 10x on sequential/counting tasks, while maintaining competitive accuracy. It includes an inference engine, evaluation suite, and a fine-tuning framework, making it a powerful tool for developers and researchers focused on efficient language model deployment.

EduLink AI

59%

EduLink AI is dedicated to transforming education with advanced AI solutions, focusing on elevating teaching, learning, and academic integrity. Its core offerings include The Checker AI, designed to safeguard academic integrity by detecting AI-generated content and ensuring the authenticity of student work, and The Tutor AI, an enhanced digital assistant for educators and students that provides AI-powered summaries and tailored lesson plans. EduLink AI's solutions are built for a wide range of educational institutions, from K-12 schools to universities, and are compliant with data privacy regulations like GDPR. The platform also aims to provide inclusive solutions for neurodiversity, adapting to individual learning styles.

aTrain

59%

aTrain is a GUI tool for offline transcription of speech recordings that uses state-of-the-art machine learning models for high accuracy and speed. Developed by researchers at the University of Graz, it includes speaker diarization to identify the different speakers in a recording. A key differentiator is privacy: all data is processed locally on your device and never uploaded, ensuring GDPR compliance. It transcribes 99 languages and is compatible with popular qualitative analysis tools such as MAXQDA, ATLAS.ti, and NVivo. It runs on CPUs as well as NVIDIA GPUs, with GPU support significantly reducing transcription times.

WenetSpeech

59%

WenetSpeech is a multi-domain Chinese speech recognition corpus of more than 10,000 hours. It was collected from YouTube and podcasts and labeled using Optical Character Recognition (OCR) and Automatic Speech Recognition (ASR). To ensure quality, the labels are validated and filtered with a novel end-to-end label error detection method. The data is divided into High Label, Weak Label, and Unlabel sets, suitable for supervised, semi-supervised, or unsupervised training. The corpus also provides training subsets (S, M, L) and evaluation sets (DEV, TEST_NET, TEST_MEETING) to support ASR system development and benchmarking. To access the dataset, users must visit the official website, agree to the license, and obtain a password.

AgentBench

59%

AgentBench is a comprehensive benchmark for evaluating Large Language Models (LLMs) as agents across diverse environments. It covers 8 environments: 5 newly created domains, namely Operating System (OS), Database (DB), Knowledge Graph (KG), Digital Card Game (DCG), and Lateral Thinking Puzzles (LTP), plus 3 recompiled from published datasets (House-Holding, Web Shopping, and Web Browsing). Each dataset has Dev and Test splits, and a full evaluation requires LLMs to generate responses thousands of times. The project also introduces VisualAgentBench for evaluating and training visual foundation agents built on large multimodal models (LMMs), covering embodied, GUI, and visual design environments. AgentBench can be set up quickly with Docker Compose and reports benchmark results on a leaderboard.

camel_tools

59%

camel_tools is a comprehensive, open-source Python toolkit developed by the CAMeL Lab at New York University Abu Dhabi, specifically designed for Arabic natural language processing. It offers a wide array of functionalities including text pre-processing, advanced morphological modeling, and specialized components for Dialect Identification, Named Entity Recognition, and Sentiment Analysis. The tool is built to be accessible for researchers and developers, with clear installation instructions for various operating systems like Linux, macOS, and Windows. It also provides options for installing necessary data packages, making it a robust solution for anyone working with the complexities of the Arabic language in NLP tasks.

LLM-Pruner

59%

LLM-Pruner is a tool for the structural pruning of large language models (LLMs), presented at NeurIPS 2023. It compresses LLMs to a chosen size while retaining their original multi-task solving abilities. Compression is task-agnostic and needs only a small training corpus (e.g., 50k Alpaca samples for post-training); pruning takes about 3 minutes and post-training about 3 hours. LLM-Pruner supports popular LLMs including Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baichuan, and TinyLlama. Pruning is automatic and requires minimal human effort, and the repository documents the discovery, estimation, and recovery stages of pruning, along with evaluation using lm-evaluation-harness.
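The estimation stage ranks coupled groups of parameters by how much removing them would change the loss. A common first-order Taylor estimate of that change for a group is |g · w|, the absolute dot product of the group's gradient and weights. The sketch below applies this criterion to flat, hypothetical weight groups; in LLM-Pruner the groups come from the discovery stage's dependency analysis and its importance estimators are richer, so treat this only as an illustration of the idea.

```python
from typing import Dict, List

# Hypothetical helpers for illustration; not LLM-Pruner's API.
def vector_importance(weights: List[float], grads: List[float]) -> float:
    """First-order Taylor estimate of the loss change from zeroing a group: |g . w|."""
    return abs(sum(g * w for g, w in zip(grads, weights)))

def select_groups_to_prune(groups: Dict[str, Dict[str, List[float]]], ratio: float) -> List[str]:
    """Names of the least important groups, pruning a `ratio` fraction of them."""
    scores = {name: vector_importance(g["w"], g["grad"]) for name, g in groups.items()}
    n_prune = int(len(scores) * ratio)
    return sorted(scores, key=scores.get)[:n_prune]


groups = {  # hypothetical weights and gradients for four attention heads
    "head_0": {"w": [0.5, -0.2], "grad": [0.10, 0.30]},   # |0.05 - 0.06| = 0.01
    "head_1": {"w": [0.9, 0.4], "grad": [0.20, 0.10]},    # |0.18 + 0.04| = 0.22
    "head_2": {"w": [-0.3, 0.8], "grad": [-0.50, 0.25]},  # |0.15 + 0.20| = 0.35
    "head_3": {"w": [0.1, 0.1], "grad": [0.05, -0.02]},   # |0.005 - 0.002| = 0.003
}
print(select_groups_to_prune(groups, ratio=0.5))  # → ['head_3', 'head_0']
```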

unet.cu

59%

unet.cu is an open-source project that provides a UNet diffusion model implemented entirely in C++/CUDA. Inspired by Andrej Karpathy's llm.c, the goal is to achieve performance comparable to PyTorch implementations, specifically for training unconditional diffusion models. The repository includes benchmarks showing its training speed relative to PyTorch and PyTorch with `torch.compile`. It supports training with sample images from ImageNet 64x64 and allows users to train with their own data. The project emphasizes learning CUDA concepts and provides a detailed breakdown of its architecture, including custom convolution kernels and optimizations to avoid inefficient data transposes.

pytorch-bert-crf-ner

59%

Pytorch-bert-crf-ner is a PyTorch implementation of Korean Named Entity Recognition (NER) tagging that combines BERT with a Conditional Random Field (CRF) layer. This open-source tool targets Korean Natural Language Processing (NLP) tasks and research. It identifies and classifies named entities such as persons, locations, organizations, and dates in Korean text. The repository includes examples, data utilities, and training scripts, making it suitable for developers and researchers working with Korean language data who need to implement or experiment with NER models.
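The CRF layer on top of BERT chooses the best tag sequence jointly rather than token by token, using Viterbi decoding over per-token emission scores and tag-to-tag transition scores. The sketch below uses hypothetical scores with a strong penalty on the invalid BIO transition O → I-PER; `viterbi_decode` is an illustrative function, not the repository's code.

```python
from typing import Dict, List, Tuple

# Hypothetical helper for illustration; not the repository's CRF implementation.
def viterbi_decode(emissions: List[Dict[str, float]],
                   transitions: Dict[Tuple[str, str], float],
                   tags: List[str]) -> List[str]:
    """Highest-scoring tag sequence under a linear-chain CRF.

    emissions[t][tag]         score of `tag` at position t (e.g. from BERT's output layer)
    transitions[(prev, cur)]  score of moving from tag `prev` to tag `cur`
    """
    best = [{tag: emissions[0][tag] for tag in tags}]  # best path score ending in each tag
    back = []                                          # back-pointers per position
    for t in range(1, len(emissions)):
        scores, pointers = {}, {}
        for cur in tags:
            prev = max(tags, key=lambda p: best[-1][p] + transitions[(p, cur)])
            scores[cur] = best[-1][prev] + transitions[(prev, cur)] + emissions[t][cur]
            pointers[cur] = prev
        best.append(scores)
        back.append(pointers)
    # Trace back from the best final tag.
    path = [max(tags, key=lambda tag: best[-1][tag])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return list(reversed(path))


tags = ["O", "B-PER", "I-PER"]
transitions = {(p, c): 0.0 for p in tags for c in tags}
transitions[("O", "I-PER")] = -10.0  # an entity cannot start with I-
emissions = [
    {"O": 1.0, "B-PER": 0.9, "I-PER": 0.0},
    {"O": 0.5, "B-PER": 0.3, "I-PER": 2.0},
    {"O": 1.0, "B-PER": 0.0, "I-PER": 0.2},
]
print(viterbi_decode(emissions, transitions, tags))  # → ['B-PER', 'I-PER', 'O']
```

A greedy per-token argmax would yield O, I-PER, O, an invalid BIO sequence; the transition penalty steers the decoder to B-PER, I-PER, O.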

pytorch-maddpg

59%

Pytorch-maddpg offers a PyTorch implementation of the Multi-Agent Deep Deterministic Policy Gradient (MADDPG) algorithm, a key approach in multi-agent reinforcement learning. This open-source project is hosted on GitHub and is designed for researchers and developers working on complex multi-agent systems. The implementation includes a modified Waterworld environment, where agents (evaders, pursuers, poisons) interact under specific physical rules, allowing for experimentation with cooperative behaviors. It supports features like agents bouncing off walls and requiring exact cooperation for rewards, making it a valuable tool for studying multi-agent coordination and policy learning.
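MADDPG trains each agent's actor on its own observation while a centralized critic sees the joint observations and actions of all agents; like DDPG, it also updates target networks by Polyak averaging. The sketch below shows both pieces on plain lists; the observation-then-action ordering of the critic input, the agent names, and the helper names are assumptions for illustration, not pytorch-maddpg's code.

```python
from typing import Dict, List

Vector = List[float]

# Hypothetical helpers for illustration; not pytorch-maddpg's code.
def critic_input(observations: Dict[str, Vector], actions: Dict[str, Vector],
                 agent_order: List[str]) -> Vector:
    """Centralized critic input: every agent's observation, then every agent's action.

    During training each critic Q_i(o_1..o_N, a_1..a_N) sees the joint observations
    and actions, while actor mu_i(o_i) only sees agent i's own observation.
    """
    joint: Vector = []
    for agent in agent_order:
        joint.extend(observations[agent])
    for agent in agent_order:
        joint.extend(actions[agent])
    return joint

def soft_update(target: Vector, source: Vector, tau: float) -> Vector:
    """Polyak averaging for target networks: theta' <- tau * theta + (1 - tau) * theta'."""
    return [tau * s + (1.0 - tau) * t for t, s in zip(target, source)]


agents = ["pursuer_0", "pursuer_1"]
obs = {"pursuer_0": [0.1, 0.2], "pursuer_1": [0.3, 0.4]}
acts = {"pursuer_0": [1.0, 0.0], "pursuer_1": [0.0, -1.0]}
print(critic_input(obs, acts, agents))  # → [0.1, 0.2, 0.3, 0.4, 1.0, 0.0, 0.0, -1.0]
print(soft_update([0.0, 1.0], [1.0, 1.0], tau=0.01))
```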