Shypd.ai

Data & Analytics

Browsing page 23 of AI tools for Data Pipelines & Integration in Data & Analytics, sorted by confidence score (our independent quality rating).

NVTabular

58%

NVTabular is a feature engineering and preprocessing library for tabular data that can manipulate terabyte-scale datasets. It runs on the GPU using the RAPIDS Dask-cuDF library, which makes it well suited to preparing data for deep learning-based recommender systems. As a core component of NVIDIA Merlin, it works with other Merlin tools, such as Merlin Models, HugeCTR, and Merlin Systems, to accelerate recommender systems end to end on the GPU. NVTabular addresses the challenges of processing very large datasets, managing complex data pipelines, and overcoming input bottlenecks, so data scientists and ML engineers can focus on data transformation rather than scaling. It substantially shortens feature engineering and preprocessing: reported completion times for the Criteo 1TB Click Logs dataset are 13 minutes on a single V100 GPU and 3 minutes on a DGX-1 cluster.

Streaming

58%

Streaming is a data streaming library built by MosaicML to make training on large datasets stored in the cloud as fast, cheap, and scalable as possible. It is optimized for multi-node, distributed training of large models, with an emphasis on correctness, performance, and ease of use. The library supports images, text, video, and multimodal data, and works with major cloud storage providers, including AWS, OCI, GCS, and Azure, as well as any S3-compatible object store. It fits into existing training workflows as a drop-in replacement for PyTorch's IterableDataset. Key features include data mixing, true determinism for reproducible training runs, instant mid-epoch resumption, high throughput, and convergence equal to training from local disk.

Valo Health

58%

Valo Health is a technology company that aims to transform drug discovery and development by integrating human and machine intelligence. Its approach combines real-world data, AI, advanced causal inference techniques, and predictive chemistry to accelerate the development of life-changing cures. Valo uses AI to find patterns in large-scale human data, identify novel disease targets, and rapidly engineer novel small molecules through human causal biology and closed-loop chemistry. This integration of biology, chemistry, and engineering lets the company explore vast chemical spaces and advance promising lead series into candidates, with the goal of reducing costs and failure rates in drug development.

Brisa (formerly Principal)

58%

Brisa, previously known as Principal, is a personal wealth operating system that gives users a holistic view of their financial health. It can securely connect to more than 12,000 banks, brokerages, and other financial institutions and import transaction data automatically. Users can also add accounts manually for cash, investments, loans, credit cards, and real estate. Key features include net worth tracking, multi-currency support, home valuation using Zillow's Zestimate, and retirement planning. Brisa protects data with industry-standard encryption and strict access controls, and stores all data within North America. It is currently desktop-only and optimized for larger screens.

Abundant

58%

Abundant builds frontier reinforcement learning (RL) environments and datasets. It supports leading AI labs and large enterprises by providing the foundational data and simulation environments needed to train and develop advanced RL models. Based in San Francisco, Abundant reports handling billions of tokens per month, which reflects its capacity for large-scale data processing. The company focuses on robust, scalable solutions to complex RL challenges, helping organizations push the boundaries of AI research and its applications.

SkyFi

58%

SkyFi is an Earth intelligence platform offering instant access to satellite imagery and geospatial analytics. Through a single platform, users can order and task satellite imagery and SAR (synthetic aperture radar) data from top providers, search a large data archive, and download geospatial data and analytics. Key features include advanced analytics such as object detection and hyperspectral signature analysis, commercial imagery tasking, and access to open data. SkyFi also offers dedicated access to constellations such as Vantor and Planet SkySat, as well as maritime AIS data and ICEYE US Direct SAR intelligence, serving needs that range from agriculture to military and defense.

Buildify

58%

Buildify is Canada's leading data platform for new and pre-construction home listings, offering a Data Feed API designed for REALTORS® and brokers. The API lets them integrate live listings into real estate websites instantly, with access to more than 150 property attributes, including specifications, pricing, and availability. Buildify sources and verifies information directly from a large network of builders and agents and updates it daily to keep the data accurate. By consolidating fragmented pre-construction data into one comprehensive, reliable source, Buildify aims to make presale homes easier to sell, while leaving real estate professionals in full control of their websites' interface and user experience.

Tracardi

58%

Tracardi is a free, open-source Customer Data Platform (CDP) that helps businesses manage customer experiences and drive sales through automated digital interactions. It unifies customer data from multiple sources, breaking down data silos and building comprehensive customer profiles for personalized engagement. The platform offers real-time data processing and a wide range of plugins for customizing workflows. Tracardi supports multi-tenant setups and uses resources efficiently, making it a cost-effective option for organizations of all sizes. Because it is open source, it offers transparency, continuous improvement, and straightforward integration with existing company infrastructure, enabling flexible deployment and automation without requiring in-house developers.

Fatala Digital House

58%

Fatala Digital House specializes in data- and AI-focused digital transformation for small and medium-sized enterprises (SMEs) and mid-sized companies. It helps organizations improve performance and optimize costs by placing data at the core of their strategy. Services include data strategy consulting; development and deployment of custom solutions that draw on web, data engineering, and data science expertise; and training to build capable teams. Fatala also offers a data diagnostic to identify areas for improvement, with the aim of increasing revenue, achieving operational excellence through process automation and algorithm implementation, and strengthening business intelligence for better decision-making. Its teams are based in Africa and Europe.

YouData.ai

58%

YouData.ai is a developer-first AI data engineering platform that connects and prepares enterprise data for AI applications. It ingests messy databases, fixes schemas automatically, and syncs data to vector databases with sub-50 ms latency. Its self-healing pipelines adapt to upstream schema drift to avoid downtime. With more than 200 integrations, it connects natively to data sources such as Postgres, Snowflake, and MongoDB. YouData.ai provides an SDK that handles infrastructure, rate limiting, and schema validation, so users do not need to manage Kafka clusters or Airflow instances. The platform is SOC 2 Type II and HIPAA compliant, can be deployed on-premises or in a managed VPC, and offers granular observability through Datadog and New Relic.

Bitstrapped

58%

Bitstrapped, in collaboration with Google and TELUS Digital, uses AI and automation to help organizations modernize their infrastructure and improve customer experiences. Its services include Google Cloud migration and management, AI-driven contact center modernization, and implementation of specialized analytics tools. It offers Contact Center as a Service (CCaaS) expertise with Google Contact Center AI (CCAI) and Google Gemini Enterprise for Customer Experience (GECX), enabling advanced natural language understanding (NLU), multimodal customer experiences, and real-time AI assistance for agents. Bitstrapped focuses on scalable, secure solutions that streamline operations, empower teams, and open new opportunities across industries.

10h11

58%

10h11 is a data consulting company that builds custom data dashboards and analytics solutions for enterprise clients in France and across Europe. It provides end-to-end data services, including data collection (API integrations, ETL pipelines, database design), business intelligence and analytics (custom KPIs, predictive models, automated reporting), and interactive dashboard design and visualization. 10h11 also offers process automation and real-time data systems that turn raw business data into actionable insights. Its solutions are tailored to each client's needs, built with tools such as Looker Studio, Power BI, and Tableau, and integrated with existing systems such as ERPs, CRMs, and IoT devices. With more than 14 years of experience, 10h11 has completed over 400 projects for more than 120 clients across 14 industries.

YData

58%

YData Fabric is a platform that helps data scientists improve data quality and accelerate AI model development. It offers automated data profiling for fast exploratory data analysis, an interactive data catalog for tracking changes and data drift, and synthetic data generation for protecting sensitive information and augmenting datasets. The platform also provides scalable data preparation pipelines for cleaning, transforming, and orchestrating data flows, which shortens time to market for AI solutions. YData is trusted by a large community of data scientists and is recognized for the accuracy, scalability, and enterprise readiness of its synthetic data.

RealEstateAPI

58%

RealEstateAPI offers a property data API for developers and businesses in the proptech sector. It provides access to a wide range of real estate data for building new applications and services, and it is designed for frictionless integration, so developers can easily add comprehensive property information to their platforms. With a focus on supporting future unicorns, RealEstateAPI aims to be a foundational component of next-generation real estate technology and describes its APIs as the most expressive property data APIs in the world.

Banza App

58%

Banza App is a personal AI twin that learns your preferences, anticipates your needs, and prioritizes your privacy. It aims to deliver a highly personalized experience by understanding your tastes and behaviors and adapting to you as it evolves, while handling your data with respect and confidentiality.

Sahaj Software

58%

Sahaj Software is an artisanal technology services company that delivers purpose-built solutions through intelligent engineering. It specializes in AI, ML, data engineering, and platform engineering, helping organizations achieve data-led transformation. Its approach emphasizes simplicity, first-principles thinking, and lean, cohesive teams for solving complex problems. Sahaj also offers technology advisory services, including technical due diligence and assessments, to support informed decision-making and better risk management. The company is committed to full knowledge transfer so that clients do not depend on Sahaj after implementation. Its ethos is rooted in trust, respect, curiosity, and craftsmanship, with the stated aim of inspiring brilliance and reducing exploitation.

Trūata Calibrate

58%

Trūata Calibrate is a cloud-native software solution for building privacy-centric data pipelines. It helps businesses operationalize privacy-compliant data pipelines quickly so teams can work with data responsibly and confidently. The platform uses intelligent automation to measure and mitigate risk quickly and effectively from a centralized dashboard. It scans data assets to identify direct and indirect privacy risks, performs targeted de-identification for safe data sharing, and creates an auditable compliance trail. Trūata Calibrate also provides dynamic recommendations for data transformation and runs privacy-utility impact simulations, so data can be transformed for safe use across the business ecosystem.

EasyML

58%

EasyML is a general-purpose, dataflow-based system that simplifies applying machine learning algorithms to real-world tasks, especially on distributed platforms such as Hadoop and Spark. It formulates learning tasks as directed acyclic graphs (DAGs), in which each node represents an operation or algorithm. The system includes a distributed machine learning library, built primarily on Spark, with algorithms for pre- and post-processing, data transformation, feature generation, and performance evaluation. A GUI-based studio lets users create, configure, submit, monitor, and share machine learning processes through a drag-and-drop interface. EasyML also offers a cloud service for executing tasks, which automatically schedules each node on Linux, Spark, or MapReduce depending on how the node is implemented. Users can upload their own algorithm packages and datasets.

4iG Space and Defence Technologies

58%

4iG Space and Defence Technologies is Hungary’s first privately owned large enterprise specializing in cutting-edge solutions for the space and defence sectors. Drawing on the expertise of the 4iG Group and its portfolio companies, the company is building a connected ecosystem. Its offerings range from complete satellite systems and mission operations to advanced unmanned aerial vehicle (UAV) and counter-UAV (C-UAV) technology. It also provides geospatial data solutions, aiming to redefine the space and defence industry through integrated, innovative technology.

GreenM

58%

GreenM deploys private AI solutions for healthcare organizations, with a focus on HIPAA/GDPR compliance and data security. Its services include an AI Launchpad for developing a prototype within six weeks, a Private AI Foundation for secure infrastructure, and Unified Health Data solutions that create AI-ready data layers. GreenM integrates AI directly into existing clinical workflows, such as EHR systems, without replacing current platforms, and offers agentic AI systems for documentation, triage, and operational tasks. It serves a wide range of healthcare providers, from specialty clinics to hospitals, and runs AI within the client's private cloud or on-premises environment so the client keeps full control over sensitive data.

Neatables

58%

Neatables is an online platform for booking paddle courts. Its straightforward interface lets users pick a date and view the available time slots, and a court can be booked in a few clicks. Neatables also includes an admin section, which suggests support for managing bookings and court availability, and it can send booking confirmations or details via WhatsApp. The tool suits paddle clubs, sports centers, and individuals who want to manage court reservations efficiently.

Cloudera

58%

Cloudera offers a hybrid data and AI platform that brings AI to data wherever it resides, including public clouds, private data centers, and edge locations. The platform unifies all of an organization's data in an Open Data Lakehouse built on Apache Iceberg to deliver AI-ready data for real-time insights. Key features include Enterprise AI, Data in Motion, and a Unified Data Fabric, which together provide a consistent cloud experience, security, and governance across hybrid environments. Cloudera lets organizations deploy and scale any AI model, process diverse data types, and make faster decisions from real-time data, improving decision-making and operational efficiency.

Hylos

58%

Hylos is a distributor sales intelligence platform that automates sell-through data collection and delivers real-time regional insights. Instead of chasing distributor reports, manufacturers give distributors a secure, frictionless way to submit sales data. The platform reconciles diverse data formats and SKU codes through autonomous data mapping, creating a single source of truth for sales reps, directors, and leadership. Hylos offers live territory health monitoring, automated sell-through tracking, and AI-driven growth signals, so manufacturers can spot regional surges and dips, flag opportunities, and manage inventory risk proactively. This lets businesses move from manual, reactive reporting to strategic, data-driven decisions and helps ensure distributors have the right stock at the right time.

Additive Catchments

58%

Additive Catchments is dedicated to restoring river health by providing advanced infrastructure for water quality monitoring. Its sensor networks deliver real-time data, offering transparent insights and actionable intelligence for effective water management. The platform aims to give rivers a voice by combining environmental data, civic infrastructure, and pollution monitoring into a comprehensive river health index. It is designed to support water governance and catchment management, helping stakeholders make informed decisions and build a sustainable future in which rivers, communities, and society can thrive.