Data & Analytics
Browsing page 34 of AI tools for Data Labeling & Annotation in Data & Analytics. Sorted by confidence score — our independent quality rating.
Budgerigar Gender Determination
Budgerigar Gender Determination is an AI tool hosted on Hugging Face designed to automatically identify the gender of budgerigars. Users can upload photos or videos of their birds, and the application will analyze the cere color to determine gender. The tool then draws labeled boxes around each detected bird, indicating its gender. It offers adjustable confidence and detection settings, allowing users to fine-tune the analysis. This free tool provides a quick and easy method for budgerigar owners, bird enthusiasts, and researchers to determine the gender of their birds without manual inspection.
Grounding-DINO-1.5-API
Grounding DINO 1.5 API provides access to a suite of open-set object detection models developed by IDEA Research. The suite includes Grounding DINO 1.5 Pro, designed for stronger generalization across a wide range of scenarios, and Grounding DINO 1.5 Edge, optimized for speed in edge computing applications. The project provides usage examples for these models, which are hosted on DeepDataSpace. First-time users need to apply for an API token through the DeepDataSpace website and can purchase additional API calls. The models report state-of-the-art zero-shot and few-shot transfer results on benchmarks including COCO, LVIS, and ODinW.
Marigold Depth Completion
Marigold Depth Completion is an AI tool designed to generate detailed depth maps by combining an input image with sparse depth data. Users provide an image and a corresponding sparse depth map file, typically in a numpy format, to produce a comprehensive depth map. This application is particularly useful for tasks requiring accurate 3D scene understanding, such as in computer vision, robotics, and graphics processing. Developed by the Photogrammetry and Remote Sensing Lab of ETH Zurich, it offers a robust solution for enhancing depth information from incomplete datasets, making it a valuable resource for researchers and developers working with 3D data.
MONAILabel
MONAI Label is an intelligent open-source image labeling and learning tool designed to reduce the time and effort of annotating new datasets, particularly for medical imaging. It allows users to create annotated datasets and build AI annotation models for clinical evaluation. The tool operates as a server-client system, facilitating interactive medical image annotation through AI, and can run locally on a machine with single or multiple GPUs. It supports various medical imaging modalities and integrates with popular viewers like 3D Slicer, OHIF, QuPath, and CVAT. MONAI Label also provides a framework for developing and deploying custom labeling apps, offering compositional and portable APIs for easy integration into existing workflows.
ROLO
ROLO is an open-source recurrent YOLO (You Only Look Once) model designed for simultaneous object detection and tracking. It uses the regression capabilities of Long Short-Term Memory (LSTM) networks to interpret visual features and translate them into object coordinates. This lets ROLO both detect objects within a frame and track their movement over time, making it suitable for applications requiring continuous object monitoring. The source code is available on GitHub.
SINet
SINet is an open-source project for Camouflaged Object Detection (COD), a challenging computer vision task focused on detecting objects that blend into their natural habitat. Developed by Deng-Ping Fan and colleagues, SINet was presented at CVPR 2020 (Oral) and offers a robust baseline for COD research. The repository includes detailed introductions, the Search & Identification Net (SINet) model, and one-key evaluation codes. It also features the COD10K dataset, which provides diverse and meticulously annotated samples for training and testing. SINet is implemented in PyTorch and supports both training and testing, with an enhanced version (SINet-V2) accepted at IEEE TPAMI 2022. The project also highlights potential applications in medical imaging, agriculture, art, and computer vision.
SUSTechPOINTS
SUSTechPOINTS is an open-source tool, hosted on GitHub, for annotating 3D bounding boxes in point clouds, aimed at autonomous driving data.
mmdetection3d
MMDetection3D is an open-source object detection toolbox built on PyTorch, designed as OpenMMLab's next-generation platform for general 3D detection. It supports a wide range of multi-modality and single-modality detectors, including MVXNet, VoteNet, and PointPillars. The platform handles popular indoor and outdoor 3D detection datasets like ScanNet, SUNRGB-D, Waymo, nuScenes, Lyft, and KITTI. A key feature is its natural integration with MMDetection, allowing users to leverage over 300 models and methods from 40+ papers in 2D detection. MMDetection3D is known for its high efficiency, offering faster training compared to other codebases, making it a robust library for various 3D detection projects.
UniDet
UniDet is an open-source object detection tool designed to operate across multiple large-scale datasets with an automatically learned unified label space. It was the winning solution of the ECCV 2020 Robust Vision Challenges. The tool offers state-of-the-art performance on datasets such as COCO, Objects365, OpenImages, and Mapillary. A key feature is its ability to predict class labels within this unified space, allowing it to be directly used for testing on novel datasets not included in its training. The repository also provides state-of-the-art baselines for Objects365 and OpenImages. UniDet is built on detectron2, making its inference API familiar to users of that framework.
WaifuDiffusion v1.4 Tags
WaifuDiffusion v1.4 Tags is an AI tool designed to analyze and tag images, specifically optimized for WaifuDiffusion v1.4. Users can upload an image to receive detailed tags, ratings, and character labels, making it highly suitable for booru websites and similar image-sharing platforms. The tool offers flexibility by allowing users to adjust thresholds and select different models to achieve more accurate and customized results. This capability ensures that the tagging process can be fine-tuned to meet specific requirements, providing a robust solution for image annotation and categorization.
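The threshold adjustment described above can be pictured as a simple cut-off on per-tag scores. The following is a minimal sketch, not the tool's actual code: the function name `select_tags` and the example scores are hypothetical and chosen only for illustration.

```python
def select_tags(scores, threshold):
    """Keep tags whose score is at least `threshold`, highest score first."""
    kept = [(tag, score) for tag, score in scores.items() if score >= threshold]
    return sorted(kept, key=lambda item: item[1], reverse=True)


# Made-up scores for illustration; a real tagger would produce these per image.
example_scores = {"sky": 0.64, "outdoors": 0.91, "tree": 0.33}

strict = select_tags(example_scores, 0.5)   # fewer, more confident tags
loose = select_tags(example_scores, 0.3)    # more tags, lower confidence allowed
print(strict)
print(loose)
```

Raising the threshold trades recall for precision: fewer tags are returned, but each is one the model is more confident about.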
Awesome-Referring-Image-Segmentation
Awesome-Referring-Image-Segmentation is a curated GitHub repository that compiles a vast collection of academic papers and datasets related to referring image segmentation. This resource is invaluable for researchers and practitioners in the computer vision domain, offering insights into traditional and interactive methods, as well as current challenges in the field. The repository is organized into sections covering datasets, challenges, traditional referring image segmentation, interactive referring image segmentation, referring video object segmentation, 3D referring segmentation, and referring image segmentation in specific domains. It is actively maintained and encourages contributions via pull requests or issue submissions, fostering a collaborative environment for advancing research in this specialized area.
DeepDanbooru
DeepDanbooru is an AI-based multi-label image classification system specifically designed for anime-style girl images. Built with TensorFlow, it provides a robust solution for estimating tags on visual content. The system is open-source and available on GitHub, allowing developers and researchers to access and modify its codebase. Users can prepare their own datasets or utilize tools like DanbooruDownloader to acquire data. It supports creating training projects, downloading tags from Danbooru, filtering datasets, and training custom models. The tool is ideal for those looking to categorize and analyze large collections of anime imagery with AI-driven tagging.
DeepEMD
DeepEMD offers a PyTorch implementation for few-shot image classification, based on the research paper "DeepEMD: Few-Shot Image Classification with Differentiable Earth Mover's Distance and Structured Classifiers." This tool is designed to address the challenge of learning from limited labeled data by employing the Earth Mover's Distance (EMD) as a metric for structural matching between image regions. It includes a cross-reference mechanism to mitigate issues from cluttered backgrounds and intra-class variations, and supports k-shot classification through a structured fully connected layer. DeepEMD has demonstrated significant performance improvements on benchmarks like miniImageNet, tieredImageNet, FC100, and CUB, without requiring extra training or testing data. The repository provides code for model pre-training, meta-training, and evaluation, along with options for different EMD solvers and model configurations.
Emotion-LLaMA
Emotion-LLaMA is an open-source AI model designed for multimodal emotion recognition and reasoning, leveraging instruction tuning. It addresses the limitations of traditional single-modality approaches by integrating audio, visual, and textual inputs through emotion-specific encoders. The model aligns features into a shared space and employs a modified LLaMA model, enhancing both emotional recognition and reasoning capabilities. It was accepted at NeurIPS 2024 and has achieved top scores in various challenges, including the MER2024 Challenge. The project also includes the MERR dataset, which contains a large number of coarse-grained and fine-grained annotated samples across diverse emotional categories, enabling models to learn from varied scenarios and generalize to real-world applications.
image-text-localization-recognition
image-text-localization-recognition is an open-source GitHub repository that serves as a comprehensive resource list for scene text localization and recognition. It compiles a wide array of research papers and their corresponding code implementations, making it an invaluable tool for researchers and developers in the field. The repository is meticulously organized, allowing users to browse resources by institute or year, and includes tags for Scene Text Localization (STL) and Text Recognition (TR). It features contributions from prominent universities and technology companies, covering various advancements in detecting and recognizing text within images. The resource supports both English and Chinese content, broadening its accessibility and utility for a global audience.
tf-image-segmentation
tf-image-segmentation is an open-source image segmentation framework built on TensorFlow and the TF-Slim library. Its core purpose is to streamline the conversion of image segmentation datasets, including general and medical ones, into a unified .tfrecords format for training. The framework includes a training routine that supports on-the-fly data augmentation, such as scaling and color distortion. It also provides functionality for evaluating model accuracy using common metrics: mean IoU, mean pixel accuracy, and pixel accuracy. The framework offers pre-trained model files and definitions for FCN-32s, FCN-16s, and FCN-8s, initialized with weights from image classification models such as VGG, making it a useful resource for researchers and developers working on image segmentation tasks.
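For readers unfamiliar with the three metrics named above, the sketch below computes them from a confusion matrix using their standard definitions. It is written in plain Python and does not use the framework's own code. The function name `segmentation_metrics` and the choice to skip classes that never occur are assumptions made for this illustration.

```python
def segmentation_metrics(confusion):
    """Return (pixel accuracy, mean pixel accuracy, mean IoU).

    confusion[i][j] is the number of pixels whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(n))
    pixel_acc = correct / total

    per_class_acc, per_class_iou = [], []
    for i in range(n):
        true_i = sum(confusion[i])                       # pixels labelled i
        pred_i = sum(confusion[r][i] for r in range(n))  # pixels predicted i
        tp = confusion[i][i]
        union = true_i + pred_i - tp
        if true_i > 0:      # assumption: classes absent from the labels are skipped
            per_class_acc.append(tp / true_i)
        if union > 0:       # assumption: classes never labelled or predicted are skipped
            per_class_iou.append(tp / union)

    mean_acc = sum(per_class_acc) / len(per_class_acc)
    mean_iou = sum(per_class_iou) / len(per_class_iou)
    return pixel_acc, mean_acc, mean_iou


# Toy two-class confusion matrix, made up for illustration.
toy_confusion = [[8, 2],
                 [1, 9]]
pixel_acc, mean_acc, mean_iou = segmentation_metrics(toy_confusion)
print(pixel_acc, mean_acc, mean_iou)
```

Pixel accuracy weights every pixel equally, so large classes dominate it. The two mean metrics average per class, which makes them more sensitive to errors on small classes.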
MagFace
MagFace is an open-source AI tool presented at CVPR 2021, designed for universal face representation in recognition and quality assessment tasks. It provides pre-trained models for various backbones and datasets, including iResNet100 and iResNet50 on MS1MV2 and CASIA-WebFace. Users can evaluate models on datasets like LFW, CFP, AgeDB, IJB-B, and IJB-C, and calculate face qualities by extracting features and magnitudes. The tool also supports basic and parallel training, with instructions for finetuning existing models. It's implemented in Python and Jupyter Notebook, making it accessible for developers and researchers in the field.
Prithvi 100M Multi Temporal Crop Classification Demo
Prithvi 100M Multi Temporal Crop Classification Demo is an AI tool hosted on Hugging Face Spaces, designed for multi-temporal crop classification. Users can upload a multi-temporal HLS GeoTIFF file containing 18 bands, which represents three dates with specific spectral channels. The application then processes this data using the Prithvi model to perform crop classification. It provides outputs in the form of three time-step images, showcasing the classification at different points in time, along with a combined land-use classification map. This demo is particularly useful for researchers and professionals in agricultural science and remote sensing who need to analyze land use and crop types over time.
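Eighteen bands covering three dates works out to six spectral channels per date. The sketch below shows that grouping on band indices only. It assumes the bands are stored date by date (all channels of the first date, then the second, and so on), which the description does not state, and `split_by_date` is a hypothetical helper name.

```python
def split_by_date(bands, num_dates=3):
    """Split a flat list of bands into equal consecutive groups, one per date."""
    per_date, remainder = divmod(len(bands), num_dates)
    if remainder:
        raise ValueError("band count is not divisible by the number of dates")
    return [bands[i * per_date:(i + 1) * per_date] for i in range(num_dates)]


# 1-based band indices standing in for the 18 bands of the GeoTIFF.
band_ids = list(range(1, 19))
groups = split_by_date(band_ids)   # assumed date-major band order
for date_index, group in enumerate(groups):
    print(date_index, group)
```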
SegMamba
SegMamba is an open-source project designed for 3D medical image segmentation, leveraging long-range sequential modeling with Mamba. The tool provides comprehensive code for the entire workflow, including pre-processing, training, inference, and metrics computation. It is particularly advantageous for its speed and memory efficiency in handling large medical imaging datasets. Researchers and developers can utilize SegMamba to analyze and segment medical images, contributing to advancements in medical diagnostics and treatment planning. The project also references related research in vision language models, indicating its potential for broader applications in AI for healthcare.
CCTV_YOLO
CCTV_YOLO is an open-source project designed for fast, real-time object detection with high-resolution output. It leverages the YOLOv5n6 model to efficiently process live video streams. The tool performs inference on low-resolution frames (320x180) to achieve high-speed processing, then maps and draws the detection results, including bounding boxes, onto the corresponding high-resolution frames. This approach ensures both rapid detection and detailed visual output. The project includes a Gradio interface for interactive real-time video stream processing and supports CUDA for optimized performance on compatible GPUs. It's ideal for applications requiring quick and accurate object detection from live camera feeds, such as surveillance or traffic monitoring.
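The mapping from low-resolution detections to the high-resolution frame amounts to scaling box coordinates by the ratio of the two frame sizes. Here is a minimal sketch of that step, independent of the project's actual code. The function `scale_box`, the example box, and the 1920x1080 output size are assumptions for illustration; only the 320x180 inference size comes from the description.

```python
def scale_box(box, src_size, dst_size):
    """Map an (x1, y1, x2, y2) box between frames given as (width, height)."""
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    x1, y1, x2, y2 = box
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)


low_res = (320, 180)       # inference resolution from the description
high_res = (1920, 1080)    # assumed display resolution
box_low = (32, 18, 160, 90)            # made-up detection on the small frame
box_high = scale_box(box_low, low_res, high_res)
print(box_high)
```

Because only a handful of box coordinates are rescaled rather than the image itself, the mapping step costs almost nothing compared with running the detector on the full-resolution frame.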
Face Mesh Workflow
Face Mesh Workflow is a tool hosted on Hugging Face Spaces that allows users to upload an image, detect faces within it, and generate a 3D mesh. It offers the flexibility to adjust depth sources and customize the generated mesh using various sliders. The primary output is an OBJ file, which can then be downloaded for further use in other 3D modeling or animation software. This tool is particularly useful for those working with facial recognition, 3D modeling, or anyone needing to create 3D representations of faces from 2D images.
stardist
StarDist is an open-source Python implementation for object detection and segmentation using star-convex shapes in 2D and 3D images. It is particularly well-suited for applications in microscopy and histopathology, enabling precise cell and nuclei instance segmentation. The tool trains models to predict distances to object boundaries and probabilities, generating candidate polygons that are refined via non-maximum suppression. StarDist supports multi-class prediction, allowing objects to be classified into discrete categories. It also includes a submodule for computing common instance segmentation metrics, facilitating performance evaluation. Installation is straightforward with pip, and pretrained models are available for various image types.
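To make the star-convex idea concrete, the sketch below turns a center point and a list of boundary distances into polygon vertices. It is not StarDist's API. The function `star_polygon`, the evenly spaced rays starting at angle zero, and the example values are assumptions for illustration.

```python
import math


def star_polygon(center, distances):
    """Vertices of a star-convex polygon from distances along evenly spaced rays."""
    cx, cy = center
    n = len(distances)
    return [
        (cx + d * math.cos(2 * math.pi * k / n),
         cy + d * math.sin(2 * math.pi * k / n))
        for k, d in enumerate(distances)
    ]


# Four rays (0, 90, 180, 270 degrees) with made-up boundary distances.
polygon = star_polygon((10.0, 10.0), [2.0, 3.0, 2.0, 3.0])
for vertex in polygon:
    print(vertex)
```

With more rays the polygon follows the object outline more closely. Overlapping candidate polygons from neighboring pixels are what the non-maximum suppression step then prunes.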
FaceX-Zoo
FaceX-Zoo is a comprehensive PyTorch toolbox designed for state-of-the-art face recognition. It provides a robust training module that supports various supervisory heads and backbones, enabling users to develop and fine-tune advanced face recognition models. The toolbox also features a standardized evaluation module, allowing for consistent model assessment across popular benchmarks by simply editing a configuration. Additionally, FaceX-Zoo includes a simple yet fully functional face SDK for validating trained models and primary application development. The project emphasizes extensibility, allowing for easy upgrades and integration of new techniques in face-related domains beyond just recognition, such as face parsing and face lightning.
CLIP-RSICD Demo
The CLIP-RSICD Demo is a tool designed for exploring Contrastive Language-Image Pre-training (CLIP) models specifically applied to remote sensing image datasets. It provides a platform for users to analyze and gain insights into how CLIP models process and interpret satellite imagery. This tool is particularly useful for educational purposes, allowing students and practitioners to understand the practical application of CLIP in the remote sensing domain. Additionally, it serves as a valuable resource for researchers working with satellite data, offering a demonstration of CLIP's capabilities in this specialized field. The demo aims to bridge the gap between advanced AI models and their real-world applications in geospatial analysis.