Shypd.ai

Data & Analytics

Browsing page 15 of AI tools for Statistical & Scientific in Data & Analytics. Sorted by confidence score — our independent quality rating.

Awesome-Hyperbolic-Representation-and-Deep-Learning

58%

Awesome-Hyperbolic-Representation-and-Deep-Learning is a curated repository of academic papers on hyperbolic embeddings, hyperbolic models, and their applications in deep learning. Papers are organized into categories such as core methods, domain applications, and task-oriented settings, so researchers can navigate the field by taxonomy. The repository highlights why hyperbolic spaces suit data with tree-like structure or power-law distributions: the volume of a hyperbolic ball grows exponentially with its radius, much as the number of nodes in a tree grows exponentially with depth. It is updated regularly with new work, including papers from major conferences such as NeurIPS, ICLR, CVPR, and ACL.
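To make the exponential-growth point concrete, here is a minimal sketch of the distance function in the Poincaré ball model, one of the standard models of hyperbolic space used in this literature. It is plain Python written for illustration, not code from the repository.

```python
import math

def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit Poincaré ball."""
    def sq_norm(x):
        return sum(c * c for c in x)

    nu, nv = sq_norm(u), sq_norm(v)
    if nu >= 1.0 or nv >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    gap = sq_norm([a - b for a, b in zip(u, v)])
    return math.acosh(1.0 + 2.0 * gap / ((1.0 - nu) * (1.0 - nv)))

# Both pairs are 0.1 apart in Euclidean terms, but the pair near the
# boundary is much farther apart in hyperbolic terms.
near_center = poincare_distance([0.0, 0.0], [0.1, 0.0])      # ≈ 0.20
near_boundary = poincare_distance([0.85, 0.0], [0.95, 0.0])  # ≈ 1.15
print(near_center, near_boundary)
```

Because distances blow up near the boundary, many points can sit close to the rim while staying far from one another, which is the room a tree needs for its exponentially many leaves.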

WhiteLab Genomics

58%

WhiteLab Genomics, founded in 2019 and backed by Y Combinator, is an AI-driven platform dedicated to accelerating the development of life-saving genomic medicines. It is built on ALFRED (AI-Led Framework for Rational Exploration in Drug Design), a proprietary framework that combines algorithms, curated databases, and computational biology to optimize therapeutic candidates for safety and efficacy across modalities such as AAV, lentivirus, and nanoparticles. Supported applications include gene and RNA-based therapies, target receptor identification, viral and non-viral vector design, payload design, and bioproduction optimization, with the stated aim of accelerating and de-risking drug development.

Datarock

58%

Datarock offers production-ready, explainable AI solutions for the mining industry. It standardizes geological characterization, speeds up data updates, and integrates with existing software and databases. According to Datarock, the platform reduces data generation and processing costs by up to 90% and has analyzed more than 27 million meters of drill core since 2020. It targets industry problems such as outdated models, fragmented workflows, and the need for efficient exploration in deeper search spaces, and it automates repeatable geological processes at scale so that expert geologists can spend their time on interpretation and critical decisions rather than low-value tasks. Its solutions span the mining cycle from exploration to rehabilitation, including prospectivity modeling, automated logging, property prediction, and mineralogy modeling.

Pluto Bio

58%

Pluto Bio offers a collaborative multi-omics platform for research and drug discovery. It provides a shared workspace for preclinical and translational strategy that supports real-time collaboration across sites and disciplines. A no-code canvas centralizes data visualization, letting users explore data and test scientific hypotheses quickly while keeping end-to-end traceability. Supported assays include scRNA-seq, RNA-seq, ChIP-seq, ATAC-seq, and spatial transcriptomics, and custom assays are handled through dedicated pipelines. Experiments, plots, data, and files are organized in a secure cloud environment to support target identification, biomarker discovery, and mechanism tracking.

lag-llama

58%

Lag-Llama is presented by its authors as the first open-source foundation model for probabilistic time series forecasting. It can forecast zero-shot on datasets of any frequency and prediction length, and it can be fine-tuned on a specific dataset for the best performance; the project gives recommendations for tuning hyperparameters such as context length and learning rate. Scripts are included to replicate the pretraining and fine-tuning experiments from the associated paper, so its results can be reproduced.
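According to the accompanying paper, Lag-Llama builds its input tokens from lagged values of the series taken at a fixed set of lag indices chosen to cover common frequencies. The NumPy snippet below only illustrates that lag construction with a made-up lag set for an hourly series; it is not the project's tokenizer and omits everything else the real model adds.

```python
import numpy as np

def lag_features(series, lags):
    """Return a matrix whose row for time t is [x[t - l] for l in lags].

    Rows start at t = max(lags), the first step with a full history.
    """
    x = np.asarray(series, dtype=float)
    start = max(lags)
    return np.array([[x[t - l] for l in lags] for t in range(start, len(x))])

# Hypothetical lag set for an hourly series: previous hour, same hour yesterday, same hour last week.
hourly_lags = [1, 24, 168]
series = np.arange(200, dtype=float)
X = lag_features(series, hourly_lags)
print(X.shape)  # (32, 3)
print(X[0])     # values at t-1, t-24 and t-168 for t = 168
```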

UADAMAGE

58%

UADAMAGE is an AI and GIS company specializing in geospatial analytics for automated damage monitoring. Its platform combines satellite and drone imagery with computer vision to assess damage caused by war or natural disasters, turning these inputs into actionable insights that help governments, organizations, and partners make data-driven decisions. Its core focus areas are infrastructure recovery, demining, and environmental monitoring, providing critical information for post-disaster assessment and planning.

EconML

58%

EconML is a Python package from Microsoft Research, developed as part of the ALICE (Automated Learning and Intelligence for Causation and Economics) project. It estimates heterogeneous treatment effects from observational data by combining machine learning with econometrics: given an outcome, a treatment, and a set of features, it measures the causal effect of the treatment on the outcome while controlling for those features, and it models how that effect varies across them. Supported methods include Double Machine Learning, Causal Forests, Orthogonal Random Forests, and Meta-Learners, which allow flexible modeling of effect heterogeneity while preserving a causal interpretation and providing confidence intervals. EconML is built on the standard Python packages for machine learning and data analysis, making it accessible to data scientists and researchers.
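The sketch below is not EconML code. It re-implements the core Double Machine Learning idea with NumPy only, assuming a single continuous treatment `T`, a scalar feature `x`, and an effect that is linear in `x`: residualize outcome and treatment on the features with cross-fitting, then regress outcome residuals on treatment residuals. Plain least squares stands in for the flexible ML models a real estimator would use; all function names are hypothetical.

```python
import numpy as np

def fit_linear(F, y):
    """Least-squares fit with intercept; returns a prediction function (stand-in for any ML model)."""
    A = np.column_stack([np.ones(len(F)), F])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Fn: np.column_stack([np.ones(len(Fn)), Fn]) @ coef

def dml_linear_effect(Y, T, x, n_folds=2, seed=0):
    """Cross-fitted Double ML estimate of theta(x) = a + b * x; returns [a, b]."""
    n = len(Y)
    F = np.column_stack([x, x ** 2])  # features for the nuisance models
    folds = np.random.default_rng(seed).permutation(n) % n_folds
    Y_res, T_res = np.empty(n), np.empty(n)
    for k in range(n_folds):
        train, test = folds != k, folds == k
        # Nuisance models E[Y|x] and E[T|x] are fit on the other folds only.
        Y_res[test] = Y[test] - fit_linear(F[train], Y[train])(F[test])
        T_res[test] = T[test] - fit_linear(F[train], T[train])(F[test])
    # Final stage: regress outcome residuals on treatment residuals times [1, x].
    Z = T_res[:, None] * np.column_stack([np.ones(n), x])
    coef, *_ = np.linalg.lstsq(Z, Y_res, rcond=None)
    return coef

# Simulated confounded data: x drives both T and Y; the true effect is theta(x) = 1 + 2x.
rng = np.random.default_rng(42)
n = 20_000
x = rng.normal(size=n)
T = 0.5 * x + rng.normal(size=n)
Y = (1.0 + 2.0 * x) * T + 3.0 * x + rng.normal(size=n)
a, b = dml_linear_effect(Y, T, x)
print(a, b)  # close to 1 and 2
```

Regressing `Y` on `T` directly would be biased here because `x` affects both; residualizing first removes that confounding path, and cross-fitting keeps the nuisance fits from overfitting the same rows they residualize.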

PyRCA

58%

PyRCA is a Python machine learning library for root cause analysis (RCA) in complex IT systems, particularly microservice architectures. It offers a suite of state-of-the-art RCA algorithms that currently focus on metric-based analysis: users can detect anomalous metrics with ε-Diagnosis, or locate root causes on a topology or causal graph with techniques such as Bayesian inference and random walks. A built-in tool helps build and refine causal graphs from time series data and domain knowledge, which simplifies developing graph-based RCA solutions. Supported methods include ε-Diagnosis, Bayesian inference-based RCA, random walk-based RCA, Root Cause Discovery, and hypothesis testing-based RCA, and trace- and log-based RCA are planned. The library also includes a benchmark for comparing RCA methods.
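PyRCA's random-walk method has its own API and scoring details. The following is an independent NumPy sketch of the general idea, assuming a known causal graph (`parents`) and hypothetical per-metric anomaly scores: walk from the alarming metric toward its causes, preferring more anomalous ones, restart at the alarm with a fixed probability, and rank metrics by the long-run probability mass the walk places on them.

```python
import numpy as np

def random_walk_rca(parents, scores, start, restart=0.15, iters=200):
    """Rank candidate root causes by the stationary mass of a walk with restarts.

    parents: dict metric -> list of its direct causes in the causal graph
    scores:  dict metric -> anomaly score (hypothetical, e.g. correlation with the alarm)
    start:   the metric where the anomaly was observed
    """
    nodes = sorted(scores)
    idx = {name: i for i, name in enumerate(nodes)}
    P = np.zeros((len(nodes), len(nodes)))
    for child in nodes:
        causes = parents.get(child, [])
        weights = np.array([scores[c] for c in causes], dtype=float)
        if weights.sum() > 0:
            # Step toward a cause, with probability proportional to its anomaly score.
            P[idx[child], [idx[c] for c in causes]] = weights / weights.sum()
        else:
            # No cause to move to: stay put, so probability mass accumulates here.
            P[idx[child], idx[child]] = 1.0
    r = np.zeros(len(nodes))
    r[idx[start]] = 1.0
    pi = r.copy()
    for _ in range(iters):
        # Power iteration: restart at the alarm with probability `restart`, otherwise take one step.
        pi = restart * r + (1.0 - restart) * (pi @ P)
    ranking = sorted((n for n in nodes if n != start), key=lambda n: -pi[idx[n]])
    return [(n, round(float(pi[idx[n]]), 3)) for n in ranking]

parents = {"frontend_latency": ["api_latency"], "api_latency": ["db_cpu", "cache_miss_rate"]}
scores = {"frontend_latency": 1.0, "api_latency": 0.8, "db_cpu": 0.9, "cache_miss_rate": 0.2}
print(random_walk_rca(parents, scores, start="frontend_latency"))  # db_cpu ranks first
```

In this simplified version, metrics with no causes act as sinks, which favors graph roots; published random-walk RCA variants add backward and self transitions to soften that bias.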

rep

58%

REP (Reproducible Experiment Platform) is an IPython-based environment for data-driven research with an emphasis on consistency and reproducibility. It wraps several machine learning libraries, including scikit-learn, XGBoost, and Theanets, behind one consistent Python interface. Key features include parallel training of classifiers on clusters, classification and regression reports with interactive plots, and smart grid-search algorithms with parallel execution. REP also supports versioning research with Git and provides pluggable quality metrics for classification. It extends scikit-learn with a better user experience and tools for designing meta-algorithms, making it a useful resource for data scientists and researchers.

TALENT

58%

TALENT is a comprehensive open-source toolkit and benchmark for improving model performance on tabular data. It integrates more than 35 deep learning models, more than 10 classical algorithms, and efficient hyperparameter tuning. Its benchmark collection contains 300 diverse tabular datasets spanning different task types, size distributions, and domains. TALENT offers preprocessing for normalization and encoding, supports a variety of evaluation metrics, and is easy to extend with new datasets and methods, serving both novice and expert data scientists who want to get more out of tabular data.

Geocalc MCP

58%

Geocalc MCP is an AI-powered geospatial tool developed during the Agents-MCP-Hackathon. It performs geo-calculations on its own, without relying on external third-party APIs: converting addresses into geographic coordinates, calculating distances between points, and planning optimal routes. Users can also visualize these calculations and routes on maps and find nearby points of interest. This self-contained design suits projects that need independent geo-processing.
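How the project itself computes distances is not described. One common self-contained approach is the haversine formula for great-circle distance, sketched here under the assumption of a spherical Earth with a mean radius of 6371 km (an approximation that ignores the ellipsoid).

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; spherical-Earth assumption

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (latitude, longitude) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

# Paris to London, roughly 344 km on a spherical Earth.
print(haversine_km(48.8566, 2.3522, 51.5074, -0.1278))
```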

Quickcount from photos

58%

QuickCount is an intuitive AI tool for counting objects in images. It counts quickly and accurately, processing hundreds of objects in as little as one second, and supports several object types, with more being added. It is designed to be easy to use for a wide range of users, and counting results can be saved for sharing and record-keeping. It suits anyone who needs to quickly quantify items in a photo.

USearch

58%

USearch is a fast, open-source search and clustering engine for vectors and arbitrary objects. At its core is an optimized HNSW implementation that its authors report is up to 10x faster than FAISS. It has bindings for C++, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, Go, and Wolfram. Key features include SIMD-optimized and user-defined metrics with JIT compilation, hardware-agnostic reduced-precision support (bf16, e5m2, i8), and the ability to view large indexes from disk without loading them into RAM. USearch also supports heterogeneous lookups, on-the-fly deletions, and binary Tanimoto and Sørensen coefficients for applications such as genomics. Its compact codebase and native bindings keep call latency low and make deployment faster.
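The two binary coefficients compare sets of "on" bits, such as molecular or genomic fingerprints. The functions below are not USearch calls (USearch evaluates these metrics with SIMD kernels over packed bit vectors); they are a plain-Python sketch of what each coefficient measures, treating fingerprints as Python ints used as bitsets. The value returned for two empty fingerprints is a hypothetical convention.

```python
def popcount(x: int) -> int:
    """Number of set bits."""
    return bin(x).count("1")

def tanimoto(a: int, b: int) -> float:
    """|A ∩ B| / |A ∪ B| for bit fingerprints stored as ints."""
    union = popcount(a | b)
    return popcount(a & b) / union if union else 1.0  # two empty sets treated as identical (assumption)

def sorensen(a: int, b: int) -> float:
    """2|A ∩ B| / (|A| + |B|), also known as the Dice coefficient."""
    total = popcount(a) + popcount(b)
    return 2 * popcount(a & b) / total if total else 1.0

a, b = 0b1011, 0b0011
print(tanimoto(a, b))  # 2 shared bits / 3 set bits overall = 0.666...
print(sorensen(a, b))  # 2 * 2 / (3 + 2) = 0.8
```

A search engine typically turns either similarity into a distance as `1 - similarity`.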

Text-Classification

58%

Text-Classification is an open-source project with TensorFlow implementations of several state-of-the-art text classification models: the Transformer from Attention Is All You Need, IndRNN, attention-based bidirectional LSTM, Hierarchical Attention Networks, adversarial training methods, convolutional neural networks, and RMDL. It is aimed at developers and researchers working on text classification, particularly on datasets such as DBpedia. It requires Python 3 and TensorFlow 1.4 or later, and the preprocessing code has been updated to use `tf.keras.preprocessing.text`. The repository also reports performance metrics for each implemented model, which makes it useful for comparing approaches.

Lottif

58%

Lottif is a platform for generating and analyzing lottery games. It uses historical draw data, patterns, and trends to generate statistical picks for upcoming draws and explains the combinations it generates. Plans differ in generation quotas and analysis modules, which include basic analysis, heatmaps, draw DNA, real probability calculations, and combo analysis. Users can set the historical range to analyze, combine signals from features such as Radar and Heatmap, validate or generate games, and save them in a digital wallet. Lottif aims to give lottery players clarity and control over their choices and to help them manage spending.
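Lottif's probability module is not documented here. As a reference point, in a fair draw every combination is equally likely, and the chance of matching exactly k numbers follows the hypergeometric distribution. A sketch for a hypothetical 6-of-49 game, assuming the ticket and the draw contain the same number of numbers:

```python
from math import comb

def match_probability(pool: int, picks: int, k: int) -> float:
    """Chance that exactly k of your `picks` numbers are drawn when `picks`
    numbers are drawn uniformly at random without replacement from 1..pool."""
    return comb(picks, k) * comb(pool - picks, picks - k) / comb(pool, picks)

# Hypothetical 6-of-49 game.
print(comb(49, 6))                  # 13983816 possible tickets
print(match_probability(49, 6, 6))  # 1 / 13983816, about 7.15e-08
print(match_probability(49, 6, 3))  # about 0.01765
```

Because the formula depends only on the game's parameters, past draws do not change these probabilities for any particular combination.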

dipy

58%

DIPY (Diffusion Imaging in Python) is a comprehensive open-source Python library for analyzing MR diffusion imaging and other 3D/4D+ medical images. It provides generic methods for spatial normalization, signal processing, machine learning, and statistical analysis, and it specializes in computational anatomy, with techniques for diffusion, perfusion, and structural imaging. The library is intended for research and carries a disclaimer regarding clinical use. It can be installed with pip or conda and follows Scientific Python SPEC 0 for its version support policy.
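Diffusion tensor imaging is one of the models in DIPY's scope. As a worked example of the kind of scalar map such models produce, here is fractional anisotropy (FA) computed from a single 3×3 diffusion tensor with NumPy. DIPY has its own vectorized implementation, so treat this as an illustrative sketch rather than DIPY code; the tensor values are illustrative.

```python
import numpy as np

def fractional_anisotropy(tensor):
    """FA of a symmetric 3x3 diffusion tensor: 0 = isotropic, 1 = diffusion along one axis only.

    FA = sqrt(1/2) * sqrt((l1-l2)^2 + (l2-l3)^2 + (l3-l1)^2) / sqrt(l1^2 + l2^2 + l3^2)
    """
    l1, l2, l3 = np.linalg.eigvalsh(tensor)
    num = (l1 - l2) ** 2 + (l2 - l3) ** 2 + (l3 - l1) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    return float(np.sqrt(0.5 * num / den)) if den > 0 else 0.0

# Illustrative tensor (mm^2/s) with one dominant direction, as in coherent white matter.
D = np.diag([1.7e-3, 0.3e-3, 0.3e-3])
print(fractional_anisotropy(D))                 # ≈ 0.80
print(fractional_anisotropy(np.eye(3) * 1e-3))  # ≈ 0 (isotropic)
```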

moa

58%

MOA (Massive Online Analysis) is a popular open-source framework for mining big data streams. It provides machine learning algorithms for classification, regression, clustering, outlier detection, concept drift detection, and recommender systems. MOA is written in Java and is related to the WEKA project, but it is built for larger-scale, real-time data stream processing. The framework is extensible with new mining algorithms, stream generators, and evaluation measures, and it serves as a benchmark suite for the stream mining community.
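MOA ships concept drift detectors as Java classes, among them DDM (the Drift Detection Method of Gama et al.). To show what such a detector monitors, here is a Python re-implementation of the DDM rule, a sketch whose defaults may differ from MOA's: track a classifier's running error rate p and its standard deviation s, remember the lowest p + s seen, and flag a warning or a drift when p + s rises two or three standard deviations above that level.

```python
import math

class DDM:
    """Drift Detection Method: monitors the error rate of an online classifier."""

    def __init__(self, min_samples=30, warning_level=2.0, drift_level=3.0):
        self.min_samples = min_samples
        self.warning_level = warning_level
        self.drift_level = drift_level
        self.reset()

    def reset(self):
        self.n = 0
        self.errors = 0
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error: bool) -> str:
        """Feed one prediction outcome (True = misclassified); returns the detector state."""
        self.n += 1
        self.errors += int(error)
        p = self.errors / self.n             # running error rate
        s = math.sqrt(p * (1 - p) / self.n)  # its binomial standard deviation
        if self.n < self.min_samples:
            return "stable"
        if p + s < self.p_min + self.s_min:  # remember the best level seen so far
            self.p_min, self.s_min = p, s
        if p + s > self.p_min + self.drift_level * self.s_min:
            self.reset()                     # start monitoring afresh (model would be retrained)
            return "drift"
        if p + s > self.p_min + self.warning_level * self.s_min:
            return "warning"
        return "stable"

# Synthetic error stream: 10% errors for 1000 steps, then 50% errors.
ddm = DDM()
stream = [i % 10 == 0 for i in range(1, 1001)] + [i % 2 == 0 for i in range(1001, 2001)]
states = [ddm.update(e) for e in stream]
first_drift = states.index("drift") + 1
print(first_drift)  # shortly after step 1000
```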

tf-gnn-samples

58%

tf-gnn-samples is a GitHub repository offering TensorFlow implementations of various Graph Neural Network (GNN) architectures. It serves as the code release for an article introducing GNNs with feature-wise linear modulation (GNN-FiLM). The repository includes implementations for Gated Graph Neural Networks (GGNN), Relational Graph Convolutional Networks (RGCN), Relational Graph Attention Networks (RGAT), Relational Graph Isomorphism Networks (RGIN), GNN-Edge-MLP, and Relational Graph Dynamic Convolution Networks (RGDCN). It provides scripts for training and evaluating models on tasks such as citation networks (Cora, Pubmed, Citeseer), protein-protein interaction (PPI), quantum chemistry prediction (QM9), and variable misuse detection (VarMisuse). The code allows users to reproduce experimental results presented in the accompanying research paper, making it a valuable resource for researchers and developers working with GNNs.
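The distinguishing idea of GNN-FiLM is that the receiving node's own state modulates every incoming message through an element-wise scale γ and shift β computed per edge type: h_v' = σ(Σ over edges u→v of type ℓ of γ_{ℓ,v} ⊙ W_ℓ h_u + β_{ℓ,v}), with (γ_{ℓ,v}, β_{ℓ,v}) computed from h_v. The NumPy sketch below is not the repository's TensorFlow code; it assumes the function producing γ and β is a single linear map without bias and loops over edges for readability.

```python
import numpy as np

def gnn_film_layer(h, edges, W, G, activation=np.tanh):
    """One GNN-FiLM message-passing step (illustrative sketch).

    h:     (num_nodes, d) node states
    edges: list of (source, edge_type, target) triples
    W:     (num_edge_types, d, d_out) message weights, one matrix per edge type
    G:     (num_edge_types, d, 2 * d_out) weights producing [gamma, beta] from the target state
    """
    d_out = W.shape[2]
    out = np.zeros((h.shape[0], d_out))
    for src, etype, dst in edges:
        # gamma and beta depend on the receiving node's state and the edge type.
        film = h[dst] @ G[etype]
        gamma, beta = film[:d_out], film[d_out:]
        # Feature-wise linear modulation of the message from src.
        out[dst] += gamma * (h[src] @ W[etype]) + beta
    return activation(out)

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))
edges = [(0, 0, 1), (2, 1, 1), (3, 0, 2)]
W = rng.normal(size=(2, 8, 8)) * 0.1
G = rng.normal(size=(2, 8, 16)) * 0.1
print(gnn_film_layer(h, edges, W, G).shape)  # (4, 8)
```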

linfa

58%

linfa is an open-source machine learning framework written in Rust that aims to provide a comprehensive toolkit for building ML applications. It is similar in spirit to Python's scikit-learn, covering common preprocessing tasks and classical machine learning algorithms, including Naive Bayes, K-Means, Gaussian mixture models, DBSCAN, OPTICS, ensemble methods such as random forests, linear and logistic regression, support vector machines, decision trees, and dimensionality reduction techniques such as PCA and t-SNE. linfa supports several BLAS/LAPACK backends for linear algebra, so developers can choose a pure-Rust implementation or an external library such as OpenBLAS, Netlib, or Intel MKL. This flexibility suits developers who want Rust's performance and safety in their ML projects.

pyRiemann

58%

pyRiemann is an open-source Python machine learning package for processing and classifying real- or complex-valued multivariate data using the Riemannian geometry of symmetric or Hermitian positive definite matrices. Its high-level interface mirrors the scikit-learn API. Although it applies to multivariate data in general, it is tailored to biosignals such as EEG, MEG, and EMG in brain-computer interface (BCI) applications, including motor imagery, event-related potentials, and steady-state visually evoked potentials. It also supports multisource transfer learning and remote sensing applications such as radar image processing. The package estimates covariance matrices and classifies them, and its components plug directly into scikit-learn pipelines.
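The core geometric step is comparing covariance matrices with a Riemannian metric rather than a Euclidean one. The NumPy sketch below is not pyRiemann's API; it shows the sample covariance of one multichannel trial and the affine-invariant distance δ(A, B) = ‖log(A^{-1/2} B A^{-1/2})‖_F, one of the standard metrics in this setting, computed from the eigenvalues of A^{-1/2} B A^{-1/2}.

```python
import numpy as np

def spd_power(C, p):
    """C**p for a symmetric positive definite matrix, via its eigendecomposition."""
    w, V = np.linalg.eigh(C)
    return (V * w ** p) @ V.T

def riemann_distance(A, B):
    """Affine-invariant distance ||log(A^{-1/2} B A^{-1/2})||_F between SPD matrices."""
    A_isqrt = spd_power(A, -0.5)
    eig = np.linalg.eigvalsh(A_isqrt @ B @ A_isqrt)
    return float(np.sqrt(np.sum(np.log(eig) ** 2)))

def trial_covariance(trial):
    """Sample covariance of one trial shaped (n_channels, n_times)."""
    X = trial - trial.mean(axis=1, keepdims=True)
    return X @ X.T / (X.shape[1] - 1)

rng = np.random.default_rng(0)
C1 = trial_covariance(rng.normal(size=(4, 500)))
C2 = trial_covariance(rng.normal(size=(4, 500)) * 2.0)
print(riemann_distance(C1, C2))
```

Affine invariance means the distance does not change if both matrices are transformed by the same invertible mixing P (A → P A Pᵀ), which is why the metric is robust to, for example, a fixed linear mixing of the recording channels.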

smartcore

58%

smartcore is a comprehensive, fast, and ergonomic open-source library for machine learning and numerical computing in Rust. It lets developers apply machine learning algorithms built from first principles, covering a broad range of methods including linear models, tree-based methods, ensembles, SVMs, nearest neighbors, clustering, decomposition, and preprocessing. The library emphasizes production-friendly APIs, strong typing, and good defaults while remaining flexible enough for research and experimentation. It offers strong linear algebra traits with optional ndarray integration, WASM-first defaults for portability, and practical utilities for model selection, evaluation, and data access, making it a good fit for developers who need robust, efficient ML in Rust.

braindecode

58%

Braindecode is an open-source Python toolbox specifically designed for decoding raw electrophysiological brain data using deep learning models. It offers a comprehensive suite of functionalities, including dataset fetchers, robust data preprocessing tools, and visualization capabilities. The toolbox also features implementations of various deep learning architectures and data augmentations, making it suitable for in-depth analysis of EEG, ECoG, and MEG signals. It caters to both neuroscientists interested in applying deep learning and deep learning researchers looking to work with neurophysiological data, providing a powerful platform for advanced brain signal analysis.

PythonNumericalDemos

58%

PythonNumericalDemos is an open-source repository of Python demonstrations for spatial data analytics, covering geostatistical and machine learning workflows for both students and educators. It is tailored to support courses in data analytics and geostatistics and to help learners get past conceptual hurdles in data science. Its practical, code-based examples build understanding of numerical methods and how they apply to real-world spatial data problems, and its open-source nature invites collaboration and continuous improvement.
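The experimental semivariogram is a typical first step in the geostatistical workflows these demos cover: it measures how dissimilar values become as the separation between sample locations grows. A NumPy sketch, not taken from the repository, assuming isotropic distances and lag bins of width `lag` centered on multiples of `lag`:

```python
import numpy as np

def experimental_variogram(coords, values, lag, n_lags):
    """Isotropic experimental semivariogram.

    gamma(h) = 1 / (2 N(h)) * sum of (z_i - z_j)^2 over the N(h) pairs whose
    separation falls in the bin (h - lag/2, h + lag/2], for h = lag, 2*lag, ...
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(values), k=1)  # every pair once
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diff = (values[i] - values[j]) ** 2
    lags, gammas, counts = [], [], []
    for k in range(1, n_lags + 1):
        h = k * lag
        in_bin = (dist > h - lag / 2) & (dist <= h + lag / 2)
        n_pairs = int(in_bin.sum())
        if n_pairs:  # skip empty bins
            lags.append(h)
            gammas.append(sq_diff[in_bin].sum() / (2 * n_pairs))
            counts.append(n_pairs)
    return np.array(lags), np.array(gammas), np.array(counts)

# A 1-D transect with a linear trend, where gamma(h) = h**2 / 2.
lag_centers, gamma, n_pairs = experimental_variogram([[0], [1], [2], [3]], [0, 1, 2, 3], lag=1.0, n_lags=3)
print(lag_centers, gamma, n_pairs)  # gammas 0.5, 2.0, 4.5 from 3, 2 and 1 pairs
```

In practice, a variogram model (for example spherical or exponential) is then fit to these points and used for kriging.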

cryptosense.com

58%

SandboxAQ transforms industries through the compound effects of AI and advanced computing. It develops large quantitative models (LQMs) for applications in drug discovery, material innovation, cybersecurity, and navigation. Key solutions include AQtive Guard for cryptography management, AQMed for medical diagnostics, and AQNav for enhanced autonomy. SandboxAQ's approach is rooted in science, engineered for impact, and provides real-world solutions, making it a leader in quantitative AI for global organizations. The platform also offers research and educational programs, including a residency and scholarship program, to further advance the field.