Research & Education
PoseFormer
PoseFormer is an open-source project that provides the official implementation of the paper "3D Human Pose Estimation with Spatial and Temporal Transformers," accepted at ICCV 2021. It is aimed at researchers and developers working in computer vision and human pose estimation. The code is built on VideoPose3D and lets users evaluate pre-trained models with either CPN-detected or ground-truth 2D poses as input. PoseFormer also supports training new models from scratch, with a configurable number of input frames that trades off against accuracy. The repository links to related works, Context-Aware PoseFormer (NeurIPS 2023) and PoseFormerV2 (CVPR 2023), indicating ongoing research and development in this area.
Reproducible-Deep-Compressive-Sensing
Reproducible-Deep-Compressive-Sensing is a comprehensive collection of source code dedicated to deep learning-based compressive sensing (DCS). This repository categorizes and provides access to numerous research works, offering links to their respective source code, PDF papers, and DOIs. The collection is organized based on key characteristics such as sampling matrix type (frame-based/block-based), sampling scale (single scale, multi-scale), and the deep learning platform used. It also includes code for image and video reconstruction, as well as other related applications. This resource is invaluable for researchers and developers looking to explore, reproduce, or build upon existing deep learning models in compressive sensing.
Deep-RL-Notes
Deep-RL-Notes offers a comprehensive collection of notes on Deep Reinforcement Learning, specifically tailored for UC Berkeley's CS 285 (formerly CS 294-112) course, taught by Professor Sergey Levine. This resource serves as a textbook, covering foundational concepts like Markov decision processes and value functions, as well as advanced techniques such as Deep Q-Networks (DQN) and Proximal Policy Optimization (PPO). It integrates deep learning with reinforcement learning, discussing function approximation and representation learning. Users can compile the LaTeX source into a PDF locally or edit it online via Overleaf. The repository is regularly updated. The notes aim to balance theoretical clarity with practical relevance, providing examples, case studies, and programming exercises for hands-on experience.
Texygen
Texygen is an open-source benchmarking platform designed to support research in open-domain text generation models. It offers a suite of implemented text generation models, alongside a set of metrics for evaluating the diversity, quality, and consistency of generated texts. The platform aims to standardize research in the field of text generation, fostering reproducibility and reliability in future work. By making it easier for researchers to share fine-tuned open-source implementations, Texygen helps advance the development and understanding of text generation technologies. It supports Python 3.6+ and depends on popular libraries such as TensorFlow, NumPy, SciPy, and NLTK.
trafilatura
Trafilatura is a powerful Python package and command-line tool designed for comprehensive web data extraction. It simplifies the process of converting raw HTML into structured, meaningful data, offering capabilities for web crawling, scraping, and extraction of main texts, metadata, and comments. The tool is highly configurable and robust, balancing precision in limiting noise with recall for including all valid content. It supports sitemaps and feeds for advanced text discovery, efficient processing of online and offline input, and offers multiple output formats including TXT, Markdown, CSV, JSON, HTML, XML, and XML-TEI. Trafilatura is widely adopted by major companies and institutions, and consistently outperforms other open-source libraries in text extraction benchmarks.
Transformer-SSL
Transformer-SSL is an open-source project offering the official implementation of "Self-Supervised Learning with Swin Transformers." The codebase includes Swin Transformer as one of its backbones, which makes it possible to evaluate how well the learned representations transfer to downstream tasks such as object detection and semantic segmentation. It features MoBY, a self-supervised learning approach that combines MoCo v2 and BYOL and achieves high accuracy on ImageNet-1K linear evaluation with significantly fewer tricks than previous works. The project provides models and code for self-supervised learning and linear evaluation, and reports strong performance when transferring to object detection and semantic segmentation.
EDGE AI FOUNDATION
The EDGE AI FOUNDATION, formerly the tinyML Foundation, is a global non-profit organization dedicated to advancing Edge AI through innovation, collaboration, advocacy, and education. It connects researchers, developers, business leaders, and policymakers to foster breakthroughs in AI technologies at the edge. The foundation offers various resources, including an Edge AI Certification Catalog, events, livestreams, and publications like technology reports and articles. It actively partners with academia and industry through working groups to drive cross-industry initiatives and best practices, and promotes responsible AI development. The foundation also curates industry news, highlighting advancements and trends in Edge AI.
Industrial Engineering & Innovation Sciences at TU/e
Eindhoven University of Technology (TU/e) is a leading research university dedicated to engineering science and technology. The Industrial Engineering & Innovation Sciences department focuses on effective and value-driven innovation, researching the responsible implementation of advanced technologies like AI and robotics. The program uniquely combines social sciences, humanities, and technical sciences to address complex challenges. Key research themes include the interaction between humans and technology, supply chain management, sustainability, and data-driven intelligence. TU/e offers bachelor's and master's programs, conducts extensive research, and fosters cooperation with industry, providing a comprehensive environment for academic and professional growth.
timm Attention Visualization
timm Attention Visualization is an AI tool designed to help users understand how deep learning models, specifically those from the timm (PyTorch Image Models) library, process visual information. By uploading an image and selecting a timm model, users can generate detailed attention maps and rollout visualizations. These visualizations highlight the specific parts of an image that the model focuses on when making predictions, offering insights into its decision-making process. This tool is invaluable for researchers, developers, and data scientists working with computer vision models, aiding in debugging, improving model interpretability, and enhancing overall model performance. It is hosted on Hugging Face Spaces, making it easily accessible for experimentation.
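The rollout visualizations mentioned above can be illustrated with a minimal sketch of attention rollout in plain Python. It assumes the per-layer attention matrices have already been averaged over heads and that each row sums to 1. Whether the hosted Space uses exactly this variant is an assumption, and the function names `matmul` and `attention_rollout` are hypothetical.

```python
def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def attention_rollout(layers):
    """Combine per-layer attention matrices into one token-to-token map.

    layers: list of n x n matrices (heads already averaged), ordered from
    the first layer to the last.
    """
    n = len(layers[0])
    # Start from the identity: before any layer, each token attends to itself.
    rollout = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for attn in layers:
        # Account for the residual connection by mixing in the identity,
        # then renormalise each row so it still sums to 1.
        mixed = [[0.5 * attn[i][j] + (0.5 if i == j else 0.0) for j in range(n)]
                 for i in range(n)]
        mixed = [[value / sum(row) for value in row] for row in mixed]
        # Later layers are applied on the left of the running product.
        rollout = matmul(mixed, rollout)
    return rollout
```

In a vision transformer, the row for the class token (where one exists) would typically be reshaped to the patch grid and overlaid on the image as a heat map.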
Uformer
Uformer is an open-source implementation of a general U-shaped Transformer designed for various image restoration tasks. Based on research presented at CVPR 2022, it employs a hierarchical encoder-decoder network with a locally-enhanced window Transformer block to efficiently capture both local context and global dependencies. Its core designs include non-overlapping window-based self-attention, which reduces computational cost, and depth-wise convolution in the feed-forward network. Uformer also explores three skip-connection schemes for passing information from the encoder to the decoder. The authors report strong results on image denoising (SIDD, DND), motion deblurring (GoPro, HIDE, RealBlur-J/-R), defocus deblurring (DPDD), deraining, and demoiréing. The project is built with PyTorch 1.9.0, Python 3.7, and CUDA 11.1.
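To make the cost argument for non-overlapping window attention concrete, here is a small sketch in plain Python. It is not taken from the Uformer repository: `window_partition` and `attention_pairs` are hypothetical helpers, and the feature map is a nested list standing in for a tensor.

```python
def window_partition(feature_map, window):
    """Split an H x W grid into non-overlapping window x window tiles.

    Returns a list of tiles, each flattened to a list of window*window tokens.
    """
    height, width = len(feature_map), len(feature_map[0])
    if height % window or width % window:
        raise ValueError("height and width must be divisible by the window size")
    tiles = []
    for top in range(0, height, window):
        for left in range(0, width, window):
            tiles.append([feature_map[r][c]
                          for r in range(top, top + window)
                          for c in range(left, left + window)])
    return tiles


def attention_pairs(height, width, window=None):
    """Count query-key pairs for global attention or windowed attention."""
    tokens = height * width
    if window is None:
        return tokens ** 2                   # every token attends to every token
    tokens_per_window = window * window
    num_windows = tokens // tokens_per_window
    return num_windows * tokens_per_window ** 2
```

For a 4 x 4 map, global attention scores 256 pairs, while 2 x 2 windows score 64. The gap widens with resolution because the windowed cost grows linearly in the number of tokens rather than quadratically.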
VLM-R1
VLM-R1 is an open-source project from om-ai-lab that introduces a stable and generalizable R1-style Large Vision-Language Model. It is designed to solve complex visual understanding tasks, demonstrating state-of-the-art performance in areas such as Open-Vocabulary Detection (OVD) and multimodal math reasoning. The project supports various fine-tuning methods, including full fine-tuning for GRPO, LoRA fine-tuning, and multi-node training. VLM-R1 also offers multi-image input capabilities and supports different VLMs like QwenVL and InternVL. Recent updates have optimized its performance on Huawei Ascend Atlas series hardware, significantly reducing Time to First Token (TTFT) and increasing throughput. The repository provides comprehensive scripts for training, evaluation, and deployment, making it a valuable resource for researchers and developers working with advanced vision-language models.
transfuser
TransFuser is an open-source project that focuses on advancing autonomous driving technology through transformer-based sensor fusion. This tool implements imitation learning for the control of autonomous vehicles, leveraging multi-modal fusion transformers for end-to-end autonomous driving. The project is a journal extension of previous work, offering researchers and developers a robust codebase for experimentation and development in the field. It includes detailed setup instructions for CARLA, dataset generation scripts, and training and evaluation procedures. The repository also provides pre-trained agents and tools for submitting to the CARLA leaderboard, making it a comprehensive resource for those working on autonomous driving systems.
VM-UNet
VM-UNet is an open-source code repository for 'Vision Mamba UNet for Medical Image Segmentation,' a novel U-shape architecture model designed for medical image segmentation. It addresses the limitations of CNNs in long-range modeling and the quadratic computational complexity of Transformers by utilizing State Space Models (SSMs), specifically Mamba. The tool introduces the Visual State Space (VSS) block as its foundation to capture extensive contextual information and employs an asymmetrical encoder-decoder structure. VM-UNet has demonstrated competitive performance on datasets like ISIC17, ISIC18, and Synapse, aiming to establish a baseline for efficient and effective SSM-based segmentation systems in medical imaging.
W2NER
W2NER offers the source code for a novel approach to Unified Named Entity Recognition (NER), as presented in an AAAI 2022 paper. Unlike traditional methods that study flat, overlapped, and discontinuous NER individually, W2NER unifies these tasks by modeling them as word-word relation classification. The architecture effectively captures neighboring relations between entity words using Next-Neighboring-Word (NNW) and Tail-Head-Word-* (THW-*) relations. It employs a neural framework that treats unified NER as a 2D grid of word pairs, enhanced by multi-granularity 2D convolutions for refining grid representations. A co-predictor then reasons about word-word relations. The model has demonstrated state-of-the-art performance across 14 benchmark datasets, including both English and Chinese, for all three types of NER.
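The word-word formulation can be sketched as a small decoder that turns predicted relations back into entities. This is a simplified illustration under stated assumptions, not the repository's decoding code: NNW relations are given as `(word, next_word)` index pairs, THW relations as `(tail, head, label)` triples, and the function name `decode_entities` and the label `Symptom` are hypothetical.

```python
def decode_entities(nnw, thw):
    """Recover entities from word-word relations.

    nnw: iterable of (i, j) pairs meaning word j follows word i inside an entity.
    thw: iterable of (tail, head, label) triples marking entity boundaries and type.
    Returns a sorted list of (word_index_tuple, label).
    """
    successors = {}
    for src, dst in nnw:
        successors.setdefault(src, []).append(dst)

    entities = []
    for tail, head, label in thw:
        # Depth-first search over NNW edges from the head word to the tail word.
        stack = [(head,)]
        while stack:
            path = stack.pop()
            if path[-1] == tail:
                entities.append((path, label))
                continue
            for nxt in successors.get(path[-1], []):
                # Only move forward in the sentence and never past the tail.
                if path[-1] < nxt <= tail:
                    stack.append(path + (nxt,))
    return sorted(entities)


# Illustrative sentence: "I am having aching in legs and shoulders"
# word indices:           0 1  2      3      4  5    6   7
nnw = [(3, 4), (4, 5), (4, 7)]
thw = [(5, 3, "Symptom"), (7, 3, "Symptom")]
entities = decode_entities(nnw, thw)
# entities holds (3, 4, 5) "aching in legs" and (3, 4, 7) "aching in shoulders"
```

The second entity skips words 5 and 6, which is how a single grid can represent discontinuous mentions alongside flat and overlapped ones.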
xplique
Xplique is a comprehensive Python toolkit designed to bring clarity to complex neural network models through state-of-the-art Explainable AI (XAI) techniques. Originally developed for TensorFlow models, it also offers partial compatibility with PyTorch. The library features modules for Attribution Methods, allowing users to compute explanations like Grad-CAM and Integrated Gradients across various tasks such as classification, regression, object detection, and semantic segmentation. It also includes Feature Visualization to understand how networks build their understanding, Concept Extraction to identify human concepts, and Metrics to evaluate the faithfulness and robustness of explanations. Xplique supports diverse data types including images, time series, and tabular data, making it a versatile tool for AI model analysis and debugging.
wer_are_we
wer_are_we is an open-source project dedicated to tracking the state-of-the-art and recent research results in speech recognition. It functions as a dynamic bibliography, compiling and presenting performance metrics (such as Word Error Rate or WER) for various models across different datasets like LibriSpeech, WSJ, Hub5'00, TED-LIUM, and CHiME. The project details the architectures, training methodologies, and published papers associated with each result, offering a valuable resource for researchers and practitioners to compare and understand advancements in the field. Users are encouraged to contribute corrections and updates, fostering a collaborative environment for maintaining an accurate and up-to-date overview of speech recognition progress.
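For readers unfamiliar with the metric, Word Error Rate is the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The sketch below is a minimal plain-Python version. The function name `word_error_rate` is hypothetical, and it splits on whitespace only, whereas published results usually apply dataset-specific text normalization first.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "the cat sat on mat" against the reference "the cat sat on the mat" gives one deletion over six reference words, a WER of 1/6. Because insertions count as errors, WER can exceed 1.0.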
Pixstart
Pixstart offers innovative solutions for public and private actors to better manage and monitor the ecology of territories using satellite data and AI. The tool helps track the evolution of environments, providing insights into water quality, forest health, and complex environmental zones. It enables users to monitor natural resources and exploitation infrastructures, conduct comprehensive environmental diagnostics, and receive advice on actions to take. Pixstart's tools assist in identifying and adjusting best practices to support and improve ecosystems, addressing challenges posed by climate change and human activities with significant economic and health repercussions.
🐍💨 Data Contamination Database
The 🐍💨 Data Contamination Database is a Hugging Face Space designed to help users identify and manage data contamination within datasets and models. This application provides functionalities to filter and view data specifically related to contamination. Users can input particular evaluation datasets and contaminated sources, and then select various options to exclude or analyze these issues. It serves as a crucial resource for AI researchers and data scientists aiming to ensure the integrity and reliability of their data, ultimately leading to more robust and accurate AI models. The tool is hosted on Hugging Face Spaces, making it accessible for a wide range of users.
SciSpace by Typeset
SciSpace by Typeset is an advanced AI research agent designed to significantly accelerate academic workflows. It integrates with over 150 research tools, allowing users to efficiently search through a vast database of 280 million papers. The platform supports systematic reviews, assists in drafting manuscripts, and even helps match research to suitable journals. Key features include a Biomedical Agent, AI Writer, Chat with PDF, Literature Review tools, and a Citation Generator. SciSpace aims to reduce research time by up to 90% by automating many common research tasks and providing citation-backed results, making it an invaluable tool for researchers and students alike.
Savantic AI Lab
Savantic AI Lab operates as a full-stack AI lab, combining deep scientific expertise with real-world application to develop scalable, sustainable, and transformative AI solutions. With over two decades of innovation, they focus on "Meaningful AI" to drive sustainable growth, measurable impact, and long-term value across various industries. Their services range from research to real-world implementation, helping organizations turn AI potential into business impact. Savantic emphasizes ethical and responsible AI, ensuring solutions prioritize sustainability and deliver tangible results. They work with diverse sectors including Retail & Logistics, Medtech & Life Sciences, Industry & Energy, and Public Transportation & Municipalities.
AutoDL-Projects
AutoDL-Projects is an open-source, lightweight project offering automated deep learning algorithms implemented in PyTorch. It provides various neural architecture search (NAS) and hyper-parameter optimization (HPO) algorithms, making it suitable for beginners, engineers, and researchers. The project features simple library dependencies, a unified codebase for all algorithms, and active maintenance. Key capabilities include implementations of NAS algorithms such as TAS, DARTS, GDAS, and SETN, support for the NAS-Bench-201 and NATS-Bench benchmarks, and HPO-CG. It requires Python >= 3.6 and PyTorch >= 1.5.0, with options for knowledge distillation and pre-trained models.
ddpo
ddpo offers the training code for the Denoising Diffusion Policy Optimization (DDPO) paper, focusing on training diffusion models using reinforcement learning. The codebase has been rigorously tested on Google Cloud TPUs (v3 for RWR and v4 for DDPO) and includes a PyTorch implementation that extends support to GPUs and LoRA for efficient, low-memory training. Researchers can leverage this tool to experiment with different prompt distributions and reward functions, as defined in its configurable pipeline. It also supports RWR (Reward Weighted Regression) for various training strategies, including sparse RWR. The project provides detailed instructions for installation and running DDPO and RWR, making it a valuable resource for advanced AI research in diffusion models.
Shodhganga Thesis
Shodhganga Thesis provides comprehensive guidance for researchers on how to leverage India's national repository of theses and dissertations, Shodhganga. It offers practical advice on efficient searching, extracting value for literature reviews and methodology, and ensuring correct citation to prevent plagiarism. The tool helps users understand how to select topics, identify research gaps, clarify methodologies, build research tools, and structure chapters. It also provides strategies for evaluating thesis quality, adapting tools ethically, and organizing downloaded PDFs for an effective research workflow. The resource emphasizes learning from theses as a map rather than a source for direct copying, promoting academic integrity.
Gemma3n Visual (Audio) Question Answering
Gemma3n Visual (Audio) Question Answering is an AI tool that enables users to interact with images using audio queries. By uploading an image and speaking a question, users receive a text-based answer. This functionality makes it a valuable resource for multimodal AI research, allowing for exploration into how AI can process and respond to combined visual and auditory inputs. The tool is built as a Hugging Face Space, indicating its accessibility and potential for community-driven development and experimentation in the field of AI agents and automation.