Research & Education
parameter_efficient_instruction_tuning
parameter_efficient_instruction_tuning is an open-source repository for systematically comparing parameter-efficient fine-tuning (PEFT) methods on instruction tuning tasks. It uses the SuperNI dataset as its primary benchmark for training and evaluation, and adapts its PEFT implementations from the adapter-transformers and peft libraries. The repository includes bash scripts for running experiments on the hfai HPC platform, with support for experiment configuration, checkpoint management, and training state validation. It also documents platform-specific considerations such as PyTorch and CUDA compatibility.
Point-BERT
Point-BERT is a PyTorch implementation of a pre-training paradigm for 3D point cloud Transformers, presented at CVPR 2022. Inspired by BERT, it uses a Masked Point Modeling (MPM) task: each point cloud is divided into local patches, and a discrete Variational AutoEncoder (dVAE) tokenizes these patches into discrete point tokens. Some patches are then masked, and the pre-training objective is to recover the original point tokens at the masked locations, with the dVAE's tokens serving as supervision. The repository covers downstream tasks including classification on ModelNet40 and ScanObjectNN, few-shot learning, and part segmentation on ShapeNetPart.
rome
ROME (Rank-One Model Editing) is an open-source tool designed for researchers and developers to precisely locate and modify factual associations within large language models, specifically GPT-2 XL and GPT-J. This GPU-only implementation allows for targeted editing of model knowledge without extensive retraining. It provides functionalities for causal tracing to understand model behavior and a straightforward API for specifying rewrite requests. The repository includes evaluation suites for benchmarking editing methods against CounterFact, making it a valuable resource for advancing research in model interpretability and editability. Users can also integrate new editing methods for comparative analysis.
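The idea behind a rank-one edit can be illustrated with a small, simplified sketch. It treats a weight matrix W as a key-value store and applies the update W' = W + (v - Wk)kᵀ / (kᵀk), so that the edited matrix maps the key k to the new value v while leaving directions orthogonal to k unchanged. This is an assumption-laden simplification: ROME's actual update also weights the key by an estimated key covariance, which is omitted here, and the function names below are hypothetical and not part of the repository's API.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rank_one_edit(W, k, v):
    """Return W' = W + (v - W k) k^T / (k^T k), so that W' k == v.

    Simplified illustration only: no key-covariance weighting.
    """
    norm_sq = sum(ki * ki for ki in k)
    if norm_sq == 0:
        raise ValueError("key vector must be non-zero")
    # How far the current output for key k is from the desired value v.
    residual = [vi - yi for vi, yi in zip(v, matvec(W, k))]
    # Add the outer product residual * k^T, scaled by 1 / (k^T k).
    return [
        [w + r * kj / norm_sq for w, kj in zip(row, k)]
        for row, r in zip(W, residual)
    ]


W = [[1.0, 0.0], [0.0, 1.0]]   # toy "memory" matrix
k = [1.0, 2.0]                 # key standing in for a subject
v = [3.0, -1.0]                # value standing in for the new fact
W_new = rank_one_edit(W, k, v)
```

After the edit, `matvec(W_new, k)` equals `v` up to floating-point error, while a vector orthogonal to `k`, such as `[2.0, -1.0]`, is mapped exactly as before.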
SEAL
SEAL (learning from Subgraphs, Embeddings, and Attributes for Link prediction) is a novel framework designed for link prediction. It systematically transforms the link prediction task into a subgraph classification problem. For each target link, SEAL extracts its h-hop enclosing subgraph and constructs a node information matrix, which can include structural node labels, latent embeddings, and explicit attributes. This data is then fed into a graph neural network (GNN) to classify the existence of the link, allowing the model to learn from both graph structure features and latent/explicit node features simultaneously. The framework is implemented in both MATLAB and Python, with a PyTorch Geometric version available for testing on OGB, Planetoid, and custom datasets. Notably, SEAL can achieve strong performance even without node embeddings or attributes, leveraging purely graph structures, and can function as an inductive link prediction model.
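The subgraph extraction step described above can be sketched in plain Python. This is a minimal illustration, not SEAL's own code: the function names are hypothetical, the graph is assumed to be undirected, and the node labeling and GNN stages are left out. The sketch also drops the target link from the extracted subgraph, on the assumption that the classifier should not be able to read the answer directly from the input.

```python
from collections import deque


def k_hop_nodes(adj, source, h):
    """Return all nodes within h hops of source, using breadth-first search."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == h:
            continue  # do not expand beyond h hops
        for nbr in adj.get(node, ()):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return set(dist)


def enclosing_subgraph(edges, x, y, h):
    """Nodes and edges of the h-hop enclosing subgraph around the link (x, y)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Union of the h-hop neighborhoods of both endpoints.
    nodes = k_hop_nodes(adj, x, h) | k_hop_nodes(adj, y, h)
    # Induced edges, excluding the target link itself.
    sub_edges = {
        (min(u, v), max(u, v))
        for u, v in edges
        if u in nodes and v in nodes and {u, v} != {x, y}
    }
    return nodes, sub_edges


edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 2), (4, 5)]
nodes, sub_edges = enclosing_subgraph(edges, 0, 3, h=1)
```

For the toy graph above, the 1-hop enclosing subgraph of the candidate link (0, 3) contains nodes 0 through 4 and excludes node 5, which is two hops away from node 3.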
Self-Driving Delivery Agent
Self-Driving Delivery Agent, also known as DriVLMe, is an open-source project providing the official implementation of the IROS 2024 paper: "Enhancing LLM-based Autonomous Driving Agents with Embodied and Social Experience." This tool is designed for researchers and developers working on autonomous driving systems, particularly those interested in integrating large language models (LLMs) with real-world driving experiences. It offers a framework for setting up a conda environment, preparing LLaVA weights, and training/finetuning models on datasets like bddx and SDN. The project includes scripts for pretraining, finetuning, and evaluating autonomous driving agents, making it a valuable resource for advancing the field of AI-driven autonomous vehicles.
Top2Vec
Top2Vec is an open-source Python library designed for advanced topic modeling and semantic search. It automatically detects topics within text data and generates jointly embedded topic, document, and word vectors. The library offers a 'classic' version for general topic modeling and a newer 'contextual' version that leverages contextual token embeddings to identify multiple topics per document and even detect topic segments within documents. This contextual approach provides a more nuanced understanding of complex texts. Key features include automatic topic number detection, hierarchical topic generation, keyword-based topic search, and document search by topic or keywords. Top2Vec eliminates the need for stop word lists, stemming, or lemmatization, and works effectively on short texts. It also supports various embedding models like Doc2Vec, Universal Sentence Encoder, and BERT Sentence Transformer for flexible deployment.
VLA-Adapter
VLA-Adapter is an open-source implementation offering an effective paradigm for tiny-scale Vision-Language-Action (VLA) models. It provides a robust framework for training and deploying VLA models, particularly for robotic control and real-world system integration. The tool supports various GPU configurations, from extremely limited VRAM (10-12GB) to professional-grade GPUs (80GB+), making it accessible for diverse research and development environments. Key features include support for LIBERO and CALVIN benchmarks, an enhanced Pro version for improved performance, and compatibility with various foundation models and real-world robotic systems like ALOHA and Franka. It also offers detailed guidance on data preparation and training configurations.
beir
BEIR (Benchmarking Information Retrieval) is an open-source, heterogeneous benchmark for evaluating NLP-based retrieval models across a wide range of information retrieval tasks. It offers a common framework for researchers and developers to assess their models against more than 15 diverse IR datasets. The tool supports various retrieval architectures, including lexical, dense, sparse, and reranking-based models, and allows custom models to be evaluated with standard metrics such as NDCG@k, MAP@k, Recall@k, Precision@k, and MRR. BEIR simplifies the preprocessing of IR datasets and integrates with popular platforms like Hugging Face.
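To make the ranking metrics concrete, the following sketch computes NDCG@k, Recall@k, and reciprocal rank for a single query in plain Python. It does not use BEIR's own evaluation code; the function names are hypothetical, and NDCG is computed with linear gain (relevance divided by log2 of rank + 1), which is one common convention among several.

```python
import math


def ndcg_at_k(ranked_ids, relevance, k):
    """NDCG@k with linear gain; relevance maps doc id -> graded label."""
    dcg = sum(
        relevance.get(doc_id, 0) / math.log2(rank + 2)
        for rank, doc_id in enumerate(ranked_ids[:k])
    )
    # Ideal ordering: highest relevance labels first.
    ideal = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0


def recall_at_k(ranked_ids, relevance, k):
    """Fraction of relevant documents that appear in the top k."""
    relevant = {d for d, rel in relevance.items() if rel > 0}
    if not relevant:
        return 0.0
    return len(relevant & set(ranked_ids[:k])) / len(relevant)


def reciprocal_rank(ranked_ids, relevance):
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if relevance.get(doc_id, 0) > 0:
            return 1.0 / rank
    return 0.0


ranked = ["d3", "d1", "d7", "d2"]      # system output, best first
qrels = {"d1": 1, "d2": 1}             # judged relevant documents
ndcg = ndcg_at_k(ranked, qrels, k=3)
recall = recall_at_k(ranked, qrels, k=3)
rr = reciprocal_rank(ranked, qrels)
```

In this example only one of the two relevant documents appears in the top 3, so Recall@3 is 0.5, and the first relevant document sits at rank 2, so the reciprocal rank is 0.5. MRR is the mean of the reciprocal rank over all queries.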
MAgent
MAgent is a research platform specifically engineered for many-agent reinforcement learning, distinguishing itself from other platforms that typically focus on single or few-agent scenarios. It enables researchers to scale up their reinforcement learning experiments from hundreds to millions of agents, facilitating the study of artificial collective intelligence. The platform supports both Linux and OS X and allows for the implementation of various algorithms, including rule-based systems and deep learning frameworks. While the original project is no longer maintained, a community-maintained fork, MAgent2, is available for continued development and use. It offers examples for training and playing with agents in scenarios like pursuit, gathering, and battle, along with baseline algorithms like DQN, DRQN, and A2C.
Mocha.jl
Mocha.jl is a deep learning framework for the Julia programming language, inspired by the C++ framework Caffe. Although now deprecated, it was designed for efficient training of deep and shallow convolutional neural networks, with optional unsupervised pre-training via stacked auto-encoders. The framework has a modular architecture with isolated components for layers, activation functions, solvers, and more, which makes it easy to extend. Written in Julia, it offers a high-level interface for experimenting with deep neural networks. Mocha.jl provides multiple backends: a portable pure Julia backend, a faster native extension backend, and a GPU backend that uses NVIDIA cuDNN and CUDA kernels. It also supports HDF5 for data and model storage, which keeps it compatible with other computational tools, and it can import Caffe model snapshots.
PyTorch-BYOL
PyTorch-BYOL offers a robust PyTorch implementation of the Bootstrap Your Own Latent (BYOL) self-supervised learning approach. This tool is designed for researchers and developers to experiment with and apply BYOL algorithms for representation learning. It includes configurable parameters for network architecture (ResNet-18 or ResNet-50), projection and prediction heads, data transformations, and trainer settings such as batch size, momentum update, and epochs. The repository provides clear installation instructions and configuration options, making it accessible for those looking to delve into self-supervised learning without starting from scratch. It also details feature evaluation methods, including linear separability using logistic regression and KNN on datasets like STL10.
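The momentum update mentioned among the trainer settings can be shown with a short sketch. In BYOL the target network is not trained by gradient descent; its weights track the online network through an exponential moving average. The code below illustrates that rule on flat lists of numbers. The function name and the example value of the momentum coefficient are assumptions for illustration, not taken from the repository's configuration.

```python
def momentum_update(target_params, online_params, tau):
    """Exponential moving average: target <- tau * target + (1 - tau) * online."""
    if not 0.0 <= tau <= 1.0:
        raise ValueError("tau must lie in [0, 1]")
    return [
        tau * t + (1.0 - tau) * o
        for t, o in zip(target_params, online_params)
    ]


online = [1.0, 2.0]    # weights being trained by gradient descent
target = [0.0, 0.0]    # slowly moving copy used to produce regression targets

# One training step moves the target a small fraction toward the online weights.
target = momentum_update(target, online, tau=0.9)
```

A value of `tau` close to 1 makes the target network change slowly, which keeps the regression targets stable; with `tau=0.9`, a single update moves each target weight 10% of the way toward the corresponding online weight.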
SuperGluePretrainedNetwork
SuperGluePretrainedNetwork is a research project from Magic Leap, presented at CVPR 2020, focusing on learning feature matching using Graph Neural Networks. The core of the project is the SuperGlue network, which integrates a Graph Neural Network with an Optimal Matching layer. This architecture is specifically designed to perform matching tasks on two distinct sets of sparse image features. The repository offers both the PyTorch code implementation and pretrained weights, making it accessible for researchers and developers interested in computer vision and feature matching applications. It serves as a valuable resource for those looking to implement or build upon advanced feature matching techniques.
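The matching step can be illustrated with a simplified Sinkhorn normalization, the kind of iterative procedure used to turn a matrix of pairwise scores into a soft assignment between two feature sets. The sketch below is an assumption-laden simplification rather than SuperGlue's layer: it handles only equally sized sets, omits the extra "dustbin" row and column used for unmatched features, works in plain Python instead of log-space tensors, and uses a hypothetical function name.

```python
import math


def sinkhorn(scores, iterations=50):
    """Alternately normalize rows and columns of exp(scores).

    For a square score matrix the result approaches a doubly stochastic
    matrix, which can be read as a soft assignment between the two sets.
    """
    P = [[math.exp(s) for s in row] for row in scores]
    for _ in range(iterations):
        # Row normalization: each feature in set A distributes mass 1.
        row_sums = [sum(row) for row in P]
        P = [[p / rs for p in row] for row, rs in zip(P, row_sums)]
        # Column normalization: each feature in set B receives mass 1.
        col_sums = [sum(col) for col in zip(*P)]
        P = [[p / cs for p, cs in zip(row, col_sums)] for row in P]
    return P


# Toy similarity scores between 3 features in image A and 3 in image B.
scores = [
    [3.0, 0.5, 0.1],
    [0.2, 2.5, 0.3],
    [0.1, 0.4, 2.8],
]
P = sinkhorn(scores)
matches = [max(range(len(row)), key=row.__getitem__) for row in P]
```

Here `matches[i]` is the index in the second set with the highest assignment weight for feature `i` of the first set. A practical matcher would additionally threshold these weights and require mutual agreement before accepting a match.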
Glass.AI
Glass.AI is an AI technology company that offers what it describes as transparent, evidence-led AI for researching companies, sectors, and themes globally. The company positions its approach as an alternative to Large Language Models, stating that its results are grounded in evidence and free from hallucinations because it continuously tracks millions of data sources across the internet, including websites, social media, and news. It extracts meaning, patterns, and insights using natural language understanding, and monitors companies, sectors, and themes over time to keep data fresh. According to the company, the platform is used by governments, consultancies, and corporates to map key sectors, track clients and competitors, and support research processes.
text_renderer
text_renderer is an open-source tool designed to generate synthetic text line images, primarily for training deep learning Optical Character Recognition (OCR) models like CRNN. It features a modular design, allowing users to easily add different components such as Corpus, Effect, and Layout. A key capability is its integration with Albumentations, providing a wide range of image augmentation effects to enhance dataset diversity. The tool supports rendering multiple corpora on a single image with varying effects, generating vertical text, and creating LMDB datasets compatible with PaddleOCR. It also includes a web-based font viewer and corpus sampler for character balance.
Uni-ControlNet
Uni-ControlNet is a tool for controllable image synthesis with text-to-image diffusion models. It provides an all-in-one method that unifies multiple kinds of control in a single framework, so users can guide the generation process without combining separate control models. It is based on research presented at NeurIPS 2023 and is aimed at researchers and developers working with diffusion models.
UER-py
UER-py (Universal Encoder Representations) is an open-source framework designed for pre-training on general-domain corpora and fine-tuning on downstream NLP tasks using PyTorch. It emphasizes model modularity, allowing users to combine various embedding, encoder, decoder, and target modules to construct custom pre-training models. The toolkit supports CPU, single GPU, and distributed training modes, making it versatile for different computational environments. UER-py also provides a comprehensive model zoo with pre-trained models of diverse properties, facilitating their direct use in various applications. It has been tested for reproducibility against original implementations of models like BERT, GPT-2, ELMo, and T5, and offers solutions for numerous NLP competitions.
Voice-Cloning-App
Voice-Cloning-App is an open-source Python/PyTorch application for synthesizing human voices. Its features include automatic dataset generation, with support for subtitles and audiobooks, and support for additional languages. The tool handles both local and remote training with start/stop functionality, supports data import and export, and works with multi-GPU setups. It is built on a reworked version of Tacotron2 and integrates other technologies such as DSAlign, Silero, DeepSpeech, and HiFi-GAN. The application runs on Windows 10 or Ubuntu 20.04+ and requires at least 5GB of disk space; an NVIDIA GPU with 4GB+ of memory is optional and improves performance.
Frontdoor
Otio is an AI research assistant and writing partner designed to streamline research and writing processes for professionals and students. It allows users to upload PDFs, articles, videos, and transcripts, or connect cloud storage services like Google Drive, Zotero, Mendeley, Dropbox, OneDrive, and Box. Otio then enables users to chat across their sources with any AI model, providing verified citations for every answer. The tool helps users synthesize information, take notes, conduct deep research, visualize data, and build slides, aiming to reduce time spent on manual synthesis and improve research efficiency.
Hippo AI Foundation
The Hippo AI Foundation is a non-profit organization dedicated to open-sourcing medical knowledge to advance AI-based healthcare. Its core mission is to establish AI healthcare as a common good, moving away from the current trend of privatizing medical knowledge. The foundation actively supports the development and deployment of AI technologies that benefit all, ensuring that advancements in medical AI are accessible and contribute to public welfare rather than being confined to proprietary systems. This initiative seeks to democratize access to cutting-edge healthcare solutions powered by artificial intelligence.
3D-convolutional-speaker-recognition
3D-convolutional-speaker-recognition is an open-source project providing a TensorFlow implementation of 3D Convolutional Neural Networks for text-independent speaker verification. The 3D convolutional architecture captures speech-related and temporal information from speaker utterances simultaneously, with the aim of producing more robust speaker models. The project outlines a three-phase Speaker Verification Protocol (SVP) consisting of development, enrollment, and evaluation stages. Its main differentiator is direct speaker model creation, which the authors report outperforms traditional d-vector verification systems. The code uses MFECs (Mel-Frequency Energy Coefficients) as input features; these are MFCCs without the final DCT operation, which preserves the locality that convolutional operations rely on. The repository also provides implementation details for the 3D convolutional operations using TensorFlow Slim.
memit
memit is a tool for mass-editing thousands of facts into a transformer's memory, based on work presented at ICLR 2023. It provides a method for updating many stored facts in a transformer model simultaneously rather than one at a time. The tool offers an API for specifying rewrite requests, in which users define a prompt, a subject, and the new target information. It also includes functionality for running full evaluation suites and generating scaling curves to analyze how performance changes with the number of edits.
mini-sglang
Mini-SGLang is a compact and high-performance inference framework specifically designed for Large Language Models (LLMs). It serves as a lightweight implementation of SGLang, aiming to simplify the complexities of modern LLM serving systems. With a codebase of approximately 5,000 lines of Python, it functions as both a capable inference engine and a transparent reference for researchers and developers. Key features include advanced optimizations such as Radix Cache for KV cache reuse, Chunked Prefill to reduce peak memory usage, Overlap Scheduling to hide CPU overhead, Tensor Parallelism for multi-GPU scaling, and optimized kernels like FlashAttention and FlashInfer for maximum efficiency. It supports online serving with an OpenAI-compatible API and an interactive shell mode for direct model interaction.
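Two of the optimizations listed above, KV cache reuse through prefix matching and chunked prefill, can be sketched together in plain Python. This is an illustrative simplification and not Mini-SGLang's code: the class and function names are hypothetical, the cache stores only token IDs rather than key/value tensors, and it uses a token-level trie where a real radix tree would compress chains of single-child nodes.

```python
class PrefixCache:
    """Token-level trie recording which token prefixes have been processed."""

    def __init__(self):
        self.root = {}

    def insert(self, tokens):
        """Record a processed token sequence."""
        node = self.root
        for tok in tokens:
            node = node.setdefault(tok, {})

    def match_prefix(self, tokens):
        """Return the length of the longest cached prefix of tokens."""
        node = self.root
        matched = 0
        for tok in tokens:
            if tok not in node:
                break
            node = node[tok]
            matched += 1
        return matched


def chunked(tokens, chunk_size):
    """Split the tokens that still need prefill into fixed-size chunks."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


cache = PrefixCache()
cache.insert([1, 2, 3, 4])             # an earlier request's prompt tokens

request = [1, 2, 3, 9, 10, 11, 12]     # new request sharing a 3-token prefix
reused = cache.match_prefix(request)   # tokens whose KV entries can be reused
to_prefill = chunked(request[reused:], chunk_size=2)
```

In this example the first three tokens are found in the cache, so only the remaining four need a prefill pass, and they are processed in two chunks of two tokens each. Processing the prompt in chunks bounds how many tokens are handled in one forward pass, which is how chunked prefill limits peak memory.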
DeepLearnToolbox
DeepLearnToolbox is a MATLAB/Octave toolbox for deep learning research and development. It includes several deep learning models: Deep Belief Nets (DBN), Stacked Autoencoders (SAE), Convolutional Neural Nets (CNN), Convolutional Autoencoders (CAE), and vanilla Neural Nets (NN). Each model comes with examples that walk users through implementation and experimentation. The toolbox is no longer maintained and is considered outdated. Its creator recommends using a more actively developed deep learning framework such as Theano, Torch, or TensorFlow instead.
dm_control
dm_control is Google DeepMind's comprehensive software stack designed for physics-based simulation and Reinforcement Learning (RL) environments, built upon the MuJoCo physics engine. It offers Python bindings to the MuJoCo engine, a suite of RL environments, and an interactive viewer for real-time interaction. The package also includes libraries for composing and modifying MuJoCo MJCF models in Python, defining rich RL environments from reusable components, and additional libraries for custom tasks like multi-agent soccer. This open-source tool is ideal for researchers and developers working on advanced AI and robotics applications, providing a robust infrastructure for developing and testing continuous control algorithms.