Shypd.ai
Data & Analytics

Browsing page 28 of AI tools for Data Cleaning & Prep in Data & Analytics. Sorted by confidence score — our independent quality rating.

Tumult

58%

Tumult Analytics is a robust Python library designed for computing aggregate queries on tabular data with differential privacy. Its interface is familiar to users of SQL or PySpark, making it easy to adopt. The tool is feature-rich, supporting a growing list of aggregation functions, data transformation operators, and privacy definitions. Built and maintained by differential privacy experts, Tumult Analytics is scalable, leveraging Spark to handle very large datasets. It is used in production at institutions like the U.S. Census Bureau. The library provides comprehensive documentation, tutorials, and deployment guides for various environments, including AWS Glue and Databricks.
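
To illustrate what a differentially private aggregate query involves, here is a minimal sketch of a noisy count using the Laplace mechanism. It uses only the Python standard library and is not Tumult Analytics' API; the function names `laplace_noise` and `dp_count`, and the toy table, are assumptions made for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) noise as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def dp_count(rows, predicate, epsilon, rng=None):
    """Return a noisy count of the rows matching predicate.

    Adding or removing one row changes a count by at most 1, so the
    sensitivity is 1 and the Laplace scale is 1 / epsilon.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for row in rows if predicate(row))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Hypothetical toy table; the column names are illustrative only.
rows = [
    {"age": 34, "state": "NC"},
    {"age": 51, "state": "NC"},
    {"age": 47, "state": "VA"},
    {"age": 29, "state": "VA"},
]

noisy = dp_count(rows, lambda r: r["age"] >= 40, epsilon=1.0)
print(f"noisy count of rows with age >= 40: {noisy:.2f}")
```

A smaller `epsilon` adds more noise and gives stronger privacy. A production library also handles details this sketch ignores, such as privacy budget accounting across queries and noise sampling that is safe under floating-point arithmetic.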

Tonic's GOT OCR

58%

Tonic's GOT OCR is an Optical Character Recognition (OCR) tool available as a Hugging Face Space, developed by UCAS, Beijing. This application allows users to upload images and extract text in multiple formats. Users can choose to receive the extracted text as simple plain text, formatted HTML, or perform more precise region-specific extraction using bounding boxes or color-based selection. The tool is designed to provide flexibility in how text is read and presented, catering to different needs for text retrieval from visual sources.

The Synthetic Data Vault

57%

The Synthetic Data Vault (SDV) offers a comprehensive, source-available software ecosystem designed for generating high-quality synthetic data. It leverages AI models to learn the statistical properties and patterns from real datasets, then produces synthetic data that mirrors these characteristics without revealing any sensitive original information. This ensures privacy and compliance while providing data suitable for development, testing, and analysis. SDV includes tools for developing generative models, assessing the quality and utility of synthetic data, and benchmarking different synthetic data generation techniques. It's an invaluable resource for data scientists and developers working with sensitive information.

TIMi Suite

57%

The TIMi Suite is a comprehensive data analytics solution designed to cover all data analysis needs, from ETL and business intelligence to predictive modeling and process automation. It integrates four core tools: Anatella for data integration, Stardust for 3D VR segmentation and visualization, Modeler for real-time AUTO-ML, and Kibella for unlimited self-service Business Intelligence. The suite allows users to manipulate large datasets, compute and display complex KPIs, and create accurate predictive models efficiently. It supports various solutions including customer experience management, strategic decision-making, risk management, and industrial process optimization. TIMi is known for its speed, claiming to be 10 to 1000 times faster than competing software on equivalent hardware, and offers both cloud and on-premise deployment options.

SFrame

57%

SFrame is an open-source library that offers scalable tabular (SFrame, SArray) and graph (SGraph) data structures, specifically designed for out-of-core data analysis and machine learning tasks. It provides a robust solution for handling large datasets that exceed available memory. Key features include a scalable, column-compressed, disk-backed dataframe, support for strictly typed and weakly typed columns, as well as specialized types like Image. It also offers uniform support for missing data, query optimization, and lazy evaluation. SFrame includes both a C++ API (gl_sarray, gl_sframe, gl_sgraph) for direct native access and a Python API (SArray, SFrame, SGraph) for indirect access via an interprocess layer. While the repository is deprecated, its functionality has been integrated into Turi Create.

Volatile AI

57%

Volatile AI specializes in molecule and volatile organic compound analytics, offering field instrumentation and software for end-users. The company focuses on volatilomics fingerprinting, enabling rapid chemical analysis without complex sample preparation. Their technology includes accessible gas chromatography and custom portable instruments designed for various applications such as biopharmaceutical fermentation monitoring, asphalt variant profiling, and whiskey maturation monitoring. Volatile AI positions itself as a leading research company in this field, providing solutions for understanding molecular composition outside of traditional lab settings.

Foundational

57%

Foundational is a proactive data governance platform designed for the AI era, analyzing source code to prevent data incidents before they reach production. It offers cross-platform data lineage derived from source code analysis, pre-merge impact analysis within GitHub pull requests, and automated data contracts with quality enforcement. The platform also supports AI governance and model traceability, analyzing source code across SQL, Spark, Python, dbt, and BI tools. Foundational provides complete, always up-to-date lineage with column-level precision and transformation tracking, integrating with over 120 data tools. It aims to simplify cross-platform complexity and accelerate delivery by embedding governance directly into developer workflows.

Anote

57%

Anote is an applied AI research company specializing in human-centered AI for data solutions. The platform offers comprehensive services for data labeling, training, prediction, and evaluation, aiming to provide high-quality datasets and evaluations. Anote caters to enterprises, federal clients, and model providers, helping them build and refine their AI models. By focusing on these core aspects, Anote ensures that AI systems are developed with precision and accuracy, supporting a wide range of data-driven initiatives. The company's approach emphasizes the critical role of human intelligence in enhancing AI performance and reliability.

Groupt

57%

Groupt simplifies data categorization and analytics, turning qualitative data into grouped, actionable insights. Users upload CSV files containing qualitative data, such as survey responses or user feedback, and receive visualizations of response groupings, categories, and insights. The platform claims high accuracy for its AI categorization. Groupt offers a free trial with a 100-row CSV limit, a Pro plan with lifted row limits and full AI analysis, and an Enterprise option for high-volume, long-form CSVs with advanced AI analysis. It currently supports only CSV files, and users can start for free without a credit card by logging in with Google.

TerrOïko

57%

TerrOïko is an innovative company specializing in ecological engineering and data science, founded in 2012 by two doctors in ecology. They develop new digital technologies applied to biodiversity, leveraging the latest scientific advancements in ecology, data processing, and computer science. TerrOïko provides data-driven solutions for the study and management of biodiversity to clients involved in territorial planning, local authorities, natural space managers, and consulting firms. Their services are applicable across various sectors, including infrastructure, industrial and urban projects, construction, territorial planning, renewable energies, nature conservation programs, public policy evaluation, and international scientific cooperation.

Claro AI

57%

Claro AI is an execution layer for product and supplier data, designed to continuously improve operational data quality. It consolidates product and supplier data, resolves duplicates, fills missing attributes, and continuously detects and fixes catalog issues as new data arrives. The platform helps retail and marketplaces prepare product data for launch and discovery at scale, and assists industrial distributors in turning fragmented supplier data into operational catalog systems. Claro automatically resolves duplicates, standardizes supplier feeds, and validates product data, ensuring catalogs remain consistent across systems. It integrates with ERP, PIM, APIs, and cloud sources, structuring complex inputs into clean, usable records. The solution is customizable to specific taxonomies, workflows, validation rules, and integrations, providing enterprise-grade product data without manual work.

DeepMiner

57%

DeepMiner provides AI-driven solutions to help organizations make better decisions by leveraging all their data. The platform integrates with existing infrastructure to handle complex, multi-dimensional data, supporting strategic decision-making. It focuses on reducing inefficiency by cleaning and organizing data, delivering contextual insights by analyzing multiple datasets, and enhancing search and discovery. DeepMiner is heavily invested in Research and Development, constantly innovating to provide novel solutions. It offers services like Data Spine for government economic policy and Golden Record for economic development agencies, centralizing and structuring data for informed decisions and sustainable growth.

Twine AI

57%

Twine AI offers comprehensive services for building and improving AI models through trusted audio, image, and video datasets. They provide global data collection, annotation, and labeling for speech, vision, and beyond, leveraging a network of over 1 million global experts. The platform supports custom dataset creation, expert annotation, and human evaluation, ensuring high-quality training data for various AI applications. Twine AI also offers model evaluation services with human experts in the loop, off-the-shelf datasets through their marketplace, and AI/ML consulting. Their services are designed to help adapt any model to specific use cases, with a strong focus on ethical data collection, bias reduction, informed consent, and data provenance.

akunah

57%

Akunah is a global medical technology and software company that leverages AI to create personalized solutions for healthcare. The platform specializes in Patient Reported Outcome Measures (PROMs), clinical management software, and AI applications in healthcare. It also focuses on shoulder replacement and surgical pre-operative planning, alongside medical education. Akunah's products are powered by 'arya', an AI engine designed to deliver smarter, more intuitive experiences. With over 100,000 patients onboarded across 14 countries and managing over 10 million secure health data points, Akunah emphasizes security and compliance, being GDPR & HIPAA compliant, ISO 13485 & MDSAP certified, and FDA & TGA aligned. The company aims to serve humanity’s long-term well-being through its AI innovations.

ynnov

57%

Ynnov specializes in transforming ideas and visions into reality through customized AI solutions and expert support, particularly for Africa and developing regions. The platform offers AI-driven solutions for challenges in economics, health, education, and agriculture, aiming to bridge the digital divide and reduce inequalities. Key services include R&D for adaptive innovations, transformative AI onboarding for businesses, data intelligence leveraging collection, analytics, and visualization, and custom app development. Ynnov also focuses on social equity and inclusion through initiatives like Impactis, supporting skill development and entrepreneurship. Their data intelligence capabilities include assessing children through AI-driven education surveys, mapping households with geolocation, and maintaining a continuously growing data repository for actionable insights.

aster

57%

ASTER is an open-source attentional scene text recognizer designed to accurately recognize cropped text within natural images. It incorporates a flexible rectification mechanism to enhance recognition accuracy, particularly for challenging text orientations. The tool is implemented using TensorFlow r1.4 and reuses code from the TensorFlow Object Detection API, with a PyTorch port also available. ASTER provides scripts for data preparation, training, and on-the-fly evaluation, making it suitable for researchers and developers working on scene text recognition tasks. It includes a demo program with pretrained models for easy experimentation and offers state-of-the-art results in text recognition benchmarks.

GemmaStat

57%

GemmaStat offers a streamlined solution for statistical analysis, allowing users to upload datasets and receive instant insights. The platform generates stunning visualizations and actionable reports, making complex data understandable. It eliminates the need for specialized software, simplifying the process of data interpretation for a broad audience. GemmaStat focuses on transforming raw data into clear answers, providing a user-friendly interface for quick analysis and reporting.

Liva AI

57%

Liva AI is a tool designed for processing audio and video data, enabling users to analyze and manipulate various audio and video files. While specific functionalities are not detailed on the website, its core purpose revolves around data processing within these media types. This suggests capabilities that could range from basic file management and conversion to more advanced analytical tasks, potentially leveraging AI for pattern recognition, content indexing, or enhancement. The tool aims to provide a platform for users to interact with and derive insights from their audio and video assets.

Data-Centric AI Community

57%

The Pokies Net Casino is an online gaming platform specifically designed for the Australian market, offering an extensive library of online pokies, classic pokies, table games, and live dealer games. The platform emphasizes a user-friendly interface with straightforward navigation and mobile optimization, ensuring seamless gameplay across various devices. Key features include immediate-play capability, streamlined profile management, and rapid transaction methods such as The Pokies Net PayID. It provides appealing bonuses, free spins, and loyalty rewards to enhance player engagement. While operating as an international platform outside local regulatory frameworks, it promotes responsible gaming practices and employs certified random number generators for fair play.

torchio

57%

TorchIO is a Python package designed for medical imaging processing within deep learning applications, particularly those built with PyTorch. It offers a comprehensive set of tools for efficiently handling 3D medical images, covering tasks such as reading, preprocessing, sampling, and augmentation. A key differentiator is its inclusion of both typical computer vision operations, like random affine transformations, and domain-specific transformations. These specialized transforms simulate real-world artifacts such as intensity inhomogeneities in MRI or k-space motion artifacts, which are crucial for robust AI model training in medical contexts. The package aims to streamline the development of AI solutions in healthcare by providing robust data handling and augmentation capabilities.

Global Data Excellence

57%

Global Data Excellence (GDE) offers a Swiss AI platform called DEMS (Data Excellence Management System), designed for sustainable corporate governance. Users can converse with the platform in natural language, and it provides a transversal intelligence that combines linguistics and computer engineering. DEMS is positioned as ethical AI that harmonizes an entire business ecosystem while complying with various regulations. Key features include multilingual support, prescriptive and dynamic capabilities, a 360° view of the business without requiring data integration, and automated decision-making. The platform keeps data at its source, offers automated synchronization, and provides predictive solutions with prescriptive actions based on contextual data, aiming to maximize revenues and reduce costs.

Perle

57%

Perle is an AI training data platform that leverages a global network of 15,000+ vetted experts across 70 countries and 27 languages to provide high-quality data annotation, evaluation, and red-teaming services. It focuses on bringing human expertise into AI training loops, offering solutions for complex AI models. Perle's services include programs for training data, evaluation, labeling, and red-team coverage, as well as augmenting staff with domain specialists. They also provide embodied data for robotics, capturing real-world manipulation for training. Perle addresses challenges like teaching robots in human environments, stress-testing models, and auditing for bias and safety.

subword-nmt

57%

subword-nmt is a powerful open-source tool designed for unsupervised word segmentation, a crucial step in neural machine translation (NMT) and text generation. It offers preprocessing scripts that enable users to segment text into subword units, facilitating the reproduction of experiments in NMT. Key features include the ability to learn byte pair encoding (BPE) from training data, apply BPE to new text, and segment rare words into character n-grams. The tool also supports advanced functionalities like BPE dropout for improved system quality, glossaries to prevent segmentation of specific terms, and byte-level BPE for more granular control. It is particularly useful for researchers and developers working with multilingual systems, offering best practice advice for consistent segmentation across languages.
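
The byte pair encoding step mentioned above can be sketched in a few lines of plain Python. This is a simplified illustration of the general BPE technique, not subword-nmt's own code or command-line interface; the function names `merge_pair`, `learn_bpe`, and `apply_bpe`, the `</w>` end-of-word marker, and the toy word counts are assumptions made for this example.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of pair in symbols with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_bpe(word_counts, num_merges):
    """Learn an ordered list of merges from a {word: frequency} dict."""
    # Start from characters, plus an end-of-word marker on each word.
    vocab = {tuple(word) + ("</w>",): count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = Counter()
        for symbols, count in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += count
        if not pairs:
            break
        # Most frequent pair wins; ties are broken by the pair itself
        # so that the result is deterministic.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        vocab = {merge_pair(symbols, best): count for symbols, count in vocab.items()}
    return merges


def apply_bpe(word, merges):
    """Segment a new word by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


# Toy corpus statistics, made up for illustration.
word_counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
merges = learn_bpe(word_counts, num_merges=5)
print(merges)
print(apply_bpe("lowest", merges))
```

Because merges are learned from frequent character sequences, a word that never appeared in training, such as `lowest` here, still breaks down into known subword units instead of becoming an unknown token.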

Chisquares

57%

Chisquares is an all-in-one platform that simplifies the entire research workflow, from survey design and data collection to analysis and manuscript writing. It addresses common research challenges by providing an integrated suite of tools, eliminating the need for multiple software packages. The platform features automated writing and formatting, automated data analysis that returns results immediately, and real-time collaboration for co-authors. It also supports offline data collection, making it suitable for remote areas without internet access. Chisquares caters to a diverse audience including Master's and PhD students, corporations, professors, public health practitioners, educators, and human resources professionals, offering tailored solutions like quiz modes for teaching and automated training coordination.