Shypd.ai

Data & Analytics

Browsing page 10 of AI tools in the Statistical & Scientific category of Data & Analytics, sorted by confidence score (our independent quality rating).

pysentimiento

60%

pysentimiento is an open-source Python toolkit for sentiment analysis and social NLP tasks, built on Transformer-based models. It supports sentiment analysis, hate speech detection, irony detection, and emotion analysis in multiple languages, including Spanish, English, Italian, and Portuguese. It also provides NER and POS tagging for Spanish and English, plus contextualized hate speech detection and targeted sentiment analysis for Spanish. The library includes a tweet preprocessor tuned for transformer-based models that handles user handles, URLs, repeated characters, laughter, hashtags, and emojis. Developers can install it with pip and call its `create_analyzer` function to build an analyzer for each task.
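
To make the preprocessing steps concrete, here is a rough, regex-based sketch of tweet normalization. It is not pysentimiento's own preprocessor: the function name `normalize_tweet`, the placeholder tokens `@user` and `<url>`, and the laughter rules are assumptions chosen for illustration, and emoji handling is left out.

```python
import re

# Hypothetical placeholder tokens; pysentimiento's own replacements may differ.
USER_TOKEN = "@user"
URL_TOKEN = "<url>"

def normalize_tweet(text: str, max_repeat: int = 3) -> str:
    """Rough tweet normalizer illustrating the kinds of rewrites such a preprocessor makes."""
    # Replace every @handle with one placeholder so the model sees a consistent token.
    text = re.sub(r"@\w+", USER_TOKEN, text)
    # Replace http(s):// and www. links with a placeholder.
    text = re.sub(r"https?://\S+|www\.\S+", URL_TOKEN, text)
    # Shorten laughter such as "jajajaja" or "hahahaha" to a canonical two-syllable form.
    text = re.sub(r"\b(?:ja){2,}j?\b", "jaja", text, flags=re.IGNORECASE)
    text = re.sub(r"\b(?:ha){2,}h?\b", "haha", text, flags=re.IGNORECASE)
    # Cut any run of one character longer than max_repeat down to max_repeat copies.
    text = re.sub(r"(.)\1{%d,}" % max_repeat, lambda m: m.group(1) * max_repeat, text)
    # Drop the '#' from hashtags so "#Futbol" becomes the plain word "Futbol".
    text = re.sub(r"#(\w+)", r"\1", text)
    return text

print(normalize_tweet("@juan mira esto https://t.co/xyz jajajaja #Futbol"))
# -> @user mira esto <url> jaja Futbol
```

Normalizing handles and URLs to fixed tokens keeps rare, uninformative strings from being split into many subword pieces by a transformer tokenizer.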

SATIM

60%

SATIM offers an AI-powered platform, OREC, designed for automated Synthetic Aperture Radar (SAR) analysis in ISR operations. It addresses the bottleneck in SAR data exploitation by automatically detecting and classifying vessels, vehicles, and aircraft from various SAR sensors, including space, airborne, and ground-based systems. The platform delivers standardized, operator-ready outputs directly into existing ISR workflows and is deployable on cloud, edge, or air-gapped infrastructure. OREC is expandable to new object types and threat profiles, ensuring a consistent analytical standard across the entire SAR stack. SATIM's solution is trusted by NATO partners and tier-1 defense contractors, with operational partnerships including Rheinmetall, Thales, and Airbus.

text-classification-surveys

60%

text-classification-surveys is an open-source GitHub repository that compiles resources on text classification in natural language processing (NLP). It surveys models ranging from deep learning approaches such as SpanBERT, ALBERT, and BERT to shallow learning techniques such as LightGBM, SVM, and Random Forest. The repository also covers a wide array of text classification datasets, including MR, SST, IMDB, and Yelp, along with common evaluation metrics such as accuracy, precision, recall, and F1. It also discusses technical challenges, including multi-label text classification. The content is primarily derived from the paper "A Survey on Text Classification: From Shallow to Deep Learning," which makes it a useful reference for researchers and students in the field.
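
The evaluation metrics named above follow directly from counts of true positives, false positives, and false negatives. The sketch below computes them for one positive class and averages per-class F1 into macro-F1; the function names are illustrative and not taken from the repository.

```python
def binary_metrics(y_true, y_pred, positive):
    """Accuracy, plus precision/recall/F1 treating `positive` as the positive class."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty label lists of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, a common summary for multi-class text classification."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(binary_metrics(y_true, y_pred, c)["f1"] for c in labels) / len(labels)

print(binary_metrics([1, 1, 1, 0], [1, 1, 0, 0], positive=1))
```

Macro averaging gives every class equal weight, which matters on imbalanced datasets where accuracy alone can look deceptively high.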

tab-ddpm

60%

tab-ddpm is the official open-source implementation of the paper "TabDDPM: Modelling Tabular Data with Diffusion Models," presented at ICML 2023. It provides the code to train, sample from, and evaluate TabDDPM for generating synthetic tabular data. It includes scripts for hyperparameter tuning, evaluation against baselines such as SMOTE and CTGAN, and privacy calculations. The repository also supplies pre-tuned hyperparameters for the evaluation models and access to the datasets used in the paper, making it a practical starting point for experiments with diffusion models on tabular data.
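
TabDDPM applies Gaussian diffusion to numerical features (categorical features use a separate multinomial diffusion). The sketch below shows only the standard DDPM forward (noising) step for a row of numeric features, $q(x_t \mid x_0) = \mathcal{N}(\sqrt{\bar\alpha_t}\,x_0,\,(1-\bar\alpha_t)I)$. The linear schedule and its constants are common defaults assumed for illustration, not values read from the repository.

```python
import math
import random

def linear_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linearly increasing noise schedule beta_t."""
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def q_sample(x0, t, alpha_bars, rng=random):
    """Sample x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) I) for one row of numeric features."""
    a = alpha_bars[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]

alpha_bars = linear_alpha_bars()
rng = random.Random(0)
row = [0.3, -1.2, 2.0]  # one standardized numeric row (toy values)
slightly_noisy = q_sample(row, 10, alpha_bars, rng)
almost_pure_noise = q_sample(row, 999, alpha_bars, rng)
```

A model trained to reverse this process can start from pure noise and denoise step by step to produce synthetic rows.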

AI For Texting

60%

AI For Texting is a free AI message generator that helps users write, analyze, and reply to text communication. It supports emails, SMS, and messages for social networks and messaging apps. Users can customize AI-generated responses by choosing the target platform, language, and tone, and can optionally add emojis. The 'Analyze' tool offers Summarize, Grammar Correction, Sentiment Analysis, and Text Analysis options for refining messages. Optimized for both desktop and mobile, AI For Texting requires no login, making it a convenient option for personal, professional, or creative messaging.

wmd

60%

wmd is an open-source implementation of the Word Mover's Distance (WMD) algorithm described in Matthew J. Kusner et al.'s paper "From Word Embeddings to Document Distances." It provides both Python and MATLAB code for researchers and practitioners in natural language processing. Given word embeddings, it computes the distance matrix between documents, where the distance between two documents is the minimum cumulative distance that the embedded words of one must travel to reach the embedded words of the other. The repository includes scripts for extracting word vectors, computing WMD, and running a k-nearest-neighbor (KNN) classifier. It also provides access to the datasets used in the original paper, facilitating replication and further research. Prerequisites include Python 2.7 with packages such as gensim, numpy, and scipy, along with pre-trained word2vec embeddings.
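
Exact WMD requires solving an optimal-transport problem. The same paper also describes a cheaper lower bound, the relaxed WMD, which drops one of the two flow constraints so that each word simply sends all of its mass to the nearest word in the other document. The sketch below implements that relaxed bound in plain Python (it does not compute exact WMD) and uses toy 2-D vectors as a hypothetical stand-in for word2vec embeddings.

```python
import math
from collections import Counter

def nbow(tokens):
    """Normalized bag-of-words: word -> relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def relaxed_wmd(doc_a, doc_b, vectors):
    """Lower bound on WMD: each word moves all its mass to its nearest word in the other doc."""
    da, db = nbow(doc_a), nbow(doc_b)

    def one_side(src, dst):
        return sum(weight * min(euclidean(vectors[w], vectors[v]) for v in dst)
                   for w, weight in src.items())

    # Each one-sided relaxation is a valid lower bound; their maximum is the tighter one.
    return max(one_side(da, db), one_side(db, da))

# Toy embeddings (hypothetical); real use would load pre-trained word2vec vectors.
vectors = {"obama": (0.0, 0.0), "president": (0.1, 0.0), "speaks": (1.0, 1.0),
           "talks": (1.0, 1.2), "media": (3.0, 0.0), "press": (3.0, 0.5)}
d = relaxed_wmd(["obama", "speaks", "media"], ["president", "talks", "press"], vectors)
```

Because the bound is cheap, it is mainly useful for pruning candidates in k-NN search before computing exact WMD on the few that remain.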

Atomwise

60%

Atomwise applies an AI "superplatform" to drug discovery, exploring the vast universe of chemical space. The platform is designed to identify novel, drug-like molecules that might otherwise remain undiscovered. Using machine learning, Atomwise aims to improve the drug discovery process, particularly for small-molecule drugs. The company develops programs with first- and best-in-class potential, especially in immune and inflammatory diseases, and its work is carried out by a team of scientists and engineers focused on changing how new medications are discovered.

Mallet

60%

Mallet is an open-source, Java-based package designed for statistical natural language processing and machine learning applications to text. It provides sophisticated tools for document classification, including efficient text-to-feature conversion, various algorithms like Naïve Bayes and Maximum Entropy, and performance evaluation metrics. Beyond classification, Mallet supports sequence tagging for tasks such as named-entity extraction using algorithms like Hidden Markov Models and Conditional Random Fields. Its topic modeling toolkit offers efficient, sampling-based implementations of Latent Dirichlet Allocation and Hierarchical LDA. The package also includes routines for transforming text documents into numerical representations through a flexible system of "pipes" for tokenizing, stopword removal, and count vector conversion. Mallet is ideal for researchers and practitioners working with large text datasets.
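
The "pipes" idea is a chain of small transformations, each consuming the previous stage's output. Mallet itself is Java, and its pipes are Java classes. The sketch below is only a Python analogue of the concept, with hypothetical function names and a tiny illustrative stopword list. It is not Mallet's API.

```python
import re
from collections import Counter
from functools import reduce

STOPWORDS = {"the", "a", "an", "of", "and", "to", "is"}  # tiny illustrative list

def tokenize(text):
    """Lowercase and split into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def remove_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]

def to_count_vector(tokens):
    """Sparse count vector as a token -> count mapping."""
    return Counter(tokens)

def serial_pipes(*pipes):
    """Compose pipes left to right, so each stage transforms the previous stage's output."""
    return lambda instance: reduce(lambda data, pipe: pipe(data), pipes, instance)

pipeline = serial_pipes(tokenize, remove_stopwords, to_count_vector)
print(pipeline("The cat and the hat of the cat"))
```

Keeping each step as a separate, composable stage is what makes a pipe system flexible: swapping in a different tokenizer or stopword list changes one stage without touching the others.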

python-ml-course

60%

python-ml-course is an open-source educational resource designed to introduce individuals to Machine Learning using Python. The comprehensive course covers a wide range of topics, from basic Python installation and data preprocessing to advanced concepts like Deep Learning and Reinforcement Learning. It includes practical exercises, real-world datasets, and all source code on GitHub, making it suitable for hands-on learning. The course is taught by Juan Gabriel Gomila, a professional in Data Science, and aims to make complex mathematical theories and algorithms accessible. It caters to students, programmers, and data analysts looking to specialize or enhance their skills in the lucrative field of Data Science.

RemoteCLIP

60%

RemoteCLIP is the official repository for the paper "RemoteCLIP: A Vision Language Foundation Model for Remote Sensing." This tool addresses limitations in existing remote sensing models by learning robust visual features with rich semantics and aligned text embeddings, crucial for retrieval and zero-shot applications. It leverages data scaling and conversion of heterogeneous annotations, incorporating UAV imagery to create a significantly larger pre-training dataset. RemoteCLIP supports diverse downstream tasks including zero-shot image classification, linear probing, k-NN classification, few-shot classification, image-text retrieval, and object counting, consistently outperforming baseline foundation models across various scales and datasets.
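
Zero-shot classification with a CLIP-style model works by embedding one text prompt per class, then choosing the class whose text embedding has the highest cosine similarity to the image embedding. The sketch below shows only that scoring step. The 3-D vectors are hypothetical stand-ins for encoder outputs, and the temperature of 100 is an illustrative assumption.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def zero_shot_classify(image_emb, text_embs, temperature=100.0):
    """Return the best-matching label and a softmax distribution over all labels."""
    labels = list(text_embs)
    # Scale cosine similarities into logits; CLIP-style models learn this scale.
    logits = [temperature * cosine(image_emb, text_embs[label]) for label in labels]
    m = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    probs = {label: e / total for label, e in zip(labels, exps)}
    return max(probs, key=probs.get), probs

# Hypothetical embeddings of prompts such as "a satellite image of an airport".
text_embs = {"airport": [1.0, 0.0, 0.0], "forest": [0.0, 1.0, 0.0], "harbor": [0.0, 0.0, 1.0]}
label, probs = zero_shot_classify([0.9, 0.1, 0.2], text_embs)
```

Because classes are defined only by text prompts, new categories can be added at inference time without retraining, which is why aligned text embeddings matter for these tasks.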

SparkNet

60%

SparkNet is an open-source framework for building and training distributed neural networks on Apache Spark. It lets users scale model training to large datasets using Spark. The framework provides tooling for quick cluster setup on EC2, for training models on CIFAR and ImageNet, and for installing SparkNet on existing Spark clusters. It supports GPU acceleration with CUDA and offers pre-built JavaCPP binaries for several platforms, making it suitable for data scientists and machine learning engineers working in distributed computing environments.
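
One common scheme for data-parallel training on a cluster is periodic parameter averaging. Each worker runs a few SGD steps on its own data partition, then the driver averages the workers' weights and broadcasts the result. The sketch below simulates that loop in a single process for a one-parameter linear model. It illustrates the general scheme under that assumption and uses neither SparkNet's code nor Spark's API.

```python
def local_sgd(w, partition, steps, lr):
    """Run plain SGD on squared error for the model y = w * x over one worker's partition."""
    for _ in range(steps):
        for x, y in partition:
            grad = 2 * (w * x - y) * x
            w -= lr * grad
    return w

def train_with_averaging(partitions, rounds=20, local_steps=5, lr=0.01, w0=0.0):
    """Alternate local training on every partition with averaging of the resulting weights."""
    w = w0
    for _ in range(rounds):
        # On a cluster, each call would run on a separate executor, starting from the broadcast w.
        local_weights = [local_sgd(w, part, local_steps, lr) for part in partitions]
        # The driver averages the workers' parameters and broadcasts the result.
        w = sum(local_weights) / len(local_weights)
    return w

# Two toy partitions drawn from y = 3x.
partitions = [[(1.0, 3.0), (2.0, 6.0)], [(0.5, 1.5), (1.5, 4.5)]]
w = train_with_averaging(partitions)
```

Averaging only every few local steps cuts communication, which is usually the bottleneck when workers are connected by a commodity network.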

tabm

60%

tabm is the official open-source repository for the paper "TabM: Advancing Tabular Deep Learning With Parameter-Efficient Ensembling" (ICLR 2025). It offers a PyTorch-based Python package implementing the TabM model, along with layers and tools for building custom architectures that efficiently ensemble MLP-like models. The model is designed to perform well on challenging tabular benchmarks such as TabReD and has been applied successfully in Kaggle competitions. TabM is faster than prior tabular deep learning methods and can handle datasets with 100M+ objects. Its ensemble members are trained in parallel and share most of their weights, which improves runtime, memory efficiency, and task performance.
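
The weight sharing can be illustrated with a BatchEnsemble-style linear layer, the kind of adapter the TabM paper builds on. All k members share one weight matrix W, and member i owns only elementwise scaling vectors r_i and s_i plus a bias b_i, so y_i = s_i ⊙ (W (r_i ⊙ x_i)) + b_i. The sketch below treats that exact form as an assumption and avoids dependencies. The package's real layers are PyTorch modules, and the class name here is hypothetical.

```python
def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w_jk * v_k for w_jk, v_k in zip(row, v)) for row in W]

class SharedEnsembleLinear:
    """k members share W; member i owns only r[i], s[i] and b[i] (BatchEnsemble-style sketch)."""

    def __init__(self, W, r, s, b):
        # Parameter count: len(W) * len(W[0]) shared + k * (in + 2 * out) per-member,
        # versus k * out * in for k fully independent layers.
        self.W, self.r, self.s, self.b = W, r, s, b

    def forward(self, xs):
        """xs[i] is the input for member i; every member reuses the same W."""
        outputs = []
        for x, r_i, s_i, b_i in zip(xs, self.r, self.s, self.b):
            scaled_in = [x_k * r_k for x_k, r_k in zip(x, r_i)]
            hidden = matvec(self.W, scaled_in)
            outputs.append([h * s + bias for h, s, bias in zip(hidden, s_i, b_i)])
        return outputs

layer = SharedEnsembleLinear(
    W=[[1.0, 2.0], [3.0, 4.0]],
    r=[[1.0, 1.0], [2.0, 0.0]],
    s=[[1.0, 1.0], [1.0, -1.0]],
    b=[[0.0, 0.0], [1.0, 1.0]],
)
out = layer.forward([[1.0, 1.0], [1.0, 1.0]])  # same input row fed to both members
```

Because the expensive matrix is shared, the ensemble costs little more than a single MLP while still producing k different predictions.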

Chinese-Text-Classification-Pytorch

60%

Chinese-Text-Classification-Pytorch is an open-source toolkit for Chinese text classification built on PyTorch. It offers out-of-the-box implementations of several popular text classification models, including TextCNN, TextRNN, FastText, TextRCNN, BiLSTM_Attention, DPCNN, and Transformer. The toolkit is ready to use without further setup and supports both character-level input and pre-trained word vectors, specifically the Sogou News Word+Character 300d embeddings. It also includes a pre-processed Chinese dataset (THUCNews) for training and evaluation, making it a useful resource for researchers and developers working on Chinese NLP.
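
Character-level input sidesteps Chinese word segmentation: each character gets its own id. The sketch below builds such a vocabulary and encodes text to a fixed length. The `<UNK>`/`<PAD>` token names and the default length of 32 are assumptions for illustration, not settings confirmed from the repository.

```python
from collections import Counter

UNK, PAD = "<UNK>", "<PAD>"

def build_char_vocab(texts, min_freq=1, max_size=10000):
    """Map each character seen at least min_freq times to an id; reserve ids for UNK and PAD."""
    counts = Counter(ch for text in texts for ch in text)
    # most_common orders by frequency; ties keep first-seen order.
    chars = [ch for ch, c in counts.most_common(max_size) if c >= min_freq]
    vocab = {ch: i for i, ch in enumerate(chars)}
    vocab[UNK] = len(vocab)
    vocab[PAD] = len(vocab)
    return vocab

def encode(text, vocab, pad_size=32):
    """One id per character, truncated or right-padded to exactly pad_size ids."""
    ids = [vocab.get(ch, vocab[UNK]) for ch in text[:pad_size]]
    ids += [vocab[PAD]] * (pad_size - len(ids))
    return ids

vocab = build_char_vocab(["中国队赢了", "中国经济"])
ids = encode("中国足球", vocab, pad_size=6)  # unseen characters map to <UNK>
```

Fixed-length id sequences like these can then be looked up in an embedding table, randomly initialized or loaded from pre-trained vectors.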

SPSSAU

60%

SPSSAU is an intelligent online statistical analysis platform designed to make data analysis accessible and efficient. It provides over 500 analytical methods, such as t-tests, ANOVA, regression, correlation, clustering, and factor analysis. Its drag-and-drop interface lets users select analysis items and generate results with a single click. SPSSAU uses AI to analyze data, suggest appropriate analyses, and automatically generate standardized analysis reports, including textual interpretations and visualizations. It also offers data processing functions such as data labeling, encoding, and variable generation, along with automatic chart generation. The platform supports both English and Chinese, stores data on Alibaba Cloud servers with backups, and provides academic and enterprise-level research report services.

PreciseRoIPooling

60%

PreciseRoIPooling is an open-source implementation of the Precise RoI Pooling (PrRoI Pooling) method, as proposed in the ECCV 2018 paper "Acquisition of Localization Confidence for Accurate Object Detection." This tool is designed to improve object detection accuracy by providing an integration-based average pooling method for RoI Pooling, which avoids quantization and offers a continuous gradient on bounding box coordinates. Unlike traditional RoI Pooling or RoI Align, PrRoI Pooling allows for the optimization of RoI coordinates through continuous gradients. The repository provides implementations for PyTorch (versions 1.0+ and 0.4) and TensorFlow (2.2), primarily supporting CUDA. It is a valuable resource for researchers and developers working on advanced object detection models.

prml

60%

prml is an open-source GitHub repository dedicated to Christopher Bishop's seminal work, "Pattern Recognition and Machine Learning." It provides a comprehensive collection of Jupyter notebooks and Python code that implement many of the algorithms and replicate numerous graphs presented in the book. This resource is invaluable for students, professors, and researchers looking to understand and apply machine learning concepts through practical examples. The repository covers a wide range of topics, from basic probability distributions and linear models to more advanced subjects like neural networks, Gaussian processes, and hidden Markov models, making it a robust companion for academic study and practical implementation in the field of pattern recognition and machine learning.

Bert-Chinese-Text-Classification-Pytorch

60%

Bert-Chinese-Text-Classification-Pytorch is an open-source project for Chinese text classification that uses pre-trained language models such as BERT and ERNIE. Implemented in PyTorch, it offers an out-of-the-box solution for developers and researchers working with Chinese-language data. It includes pre-trained models and a dataset of 200,000 Chinese news titles across 10 categories, so it can be used immediately. The project also combines BERT with other neural network architectures such as CNN, RNN, RCNN, and DPCNN to compare classification performance. It provides clear instructions for setting up the environment, using custom datasets, and running the training and testing scripts.

Naru Healthcare

60%

Naru Healthcare specializes in innovative AI-powered solutions for the healthcare sector, particularly in oncology. The company's core offering, aiatech, leverages advanced AI and proprietary clinical outcome generation algorithms to transform Real-World Data (RWD) from hospitals and patients into Real-World Evidence (RWE). This technology aims to maximize treatment effectiveness, learn from each patient, and improve clinical practices and research. Naru's Step Oncology solution provides a comprehensive system for patient monitoring, predictive modeling, and analytical visualization, supporting clinical decision-making and optimizing resource allocation in oncology.

Phy Health

60%

Phy Health transforms musculoskeletal (MSK) care by offering a mobile-first solution for 3D posture scanning and AI-powered insights. Users can generate a 3D body model with a 60-second scan from their mobile device, mapping 90,000 data points without hardware, wearables, or radiation. The platform then assesses head-to-toe body alignment, pinpointing strengths, imbalances, and vulnerabilities. Based on this data, it creates personalized exercise plans that update and progress with each scan. Phy Health aims to make postural health measurable, repeatable, and mobile, enabling earlier triage, better targeted care, and tracking changes over time for individuals and enterprises.

RaceDayta

60%

RaceDayta is an app that analyzes past race information to help users make informed horse racing bets. It provides Lite and Pro rating systems, expert AI opinions, and full historical data to help identify best-priced winners. The app also offers racecards for the current and following day, so users can plan their bets in advance. Available on the App Store for iPhone and iPad, RaceDayta aims to improve betting outcomes through data-driven analysis and AI-powered insights.

raster-vision

60%

raster-vision is an open-source Python library and framework for deep learning on satellite, aerial, and other large imagery sets, including oblique drone imagery. It offers built-in support for chip classification, object detection, and semantic segmentation, using PyTorch backends. As a library, it provides utilities for every stage of a geospatial deep learning workflow, from reading geo-referenced data and training models to making predictions and writing results in geo-referenced formats. As a low-code framework, it lets users configure machine learning pipelines covering data analysis, chip creation, model training, prediction, evaluation, and deployment bundling. It also supports cloud execution via AWS Batch and AWS SageMaker.
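
Chip creation cuts a scene far too large for a GPU into fixed-size windows. The sketch below computes sliding-window pixel offsets with a given stride, snapping the last row and column to the raster edge so the whole image is covered. The function is hypothetical and only computes windows. raster-vision's own windowing is configured through its pipeline settings and also handles geo-referencing.

```python
def chip_windows(height, width, chip_size, stride):
    """Yield (row, col, h, w) pixel windows of square chips covering a height x width raster."""
    if chip_size > height or chip_size > width:
        raise ValueError("chip_size larger than raster")
    if stride <= 0:
        raise ValueError("stride must be positive")

    def starts(extent):
        positions = list(range(0, extent - chip_size + 1, stride))
        # Add a final window flush with the edge if the stride skipped past it.
        if positions[-1] != extent - chip_size:
            positions.append(extent - chip_size)
        return positions

    for row in starts(height):
        for col in starts(width):
            yield (row, col, chip_size, chip_size)

windows = list(chip_windows(10, 10, chip_size=4, stride=4))
```

Snapping the last window to the edge means border chips overlap their neighbours slightly, a common trade-off that avoids padding while still covering every pixel.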

UgenTec

60%

UgenTec, now part of Velsera, offers the FastFinder platform to help molecular labs automate sample flow, perform real-time quality control, and streamline result calling. The platform includes modules like FastFinder Workflow for orchestrating sample-to-result processes, FastFinder Analysis for automated assay result reporting, and FastFinder Genotyper which uses AI for genotyping. FastFinder Insights provides smart dashboards for lab operational intelligence, while FastFinder QC offers in-run stats and real-time alerts. The Studio module allows control over assay configuration and SOP automation, ensuring reproducible and documented interpretation rules. It supports various modalities including PCR, Mass Spec, NGS, NAAT/LAMP, Serology, and dPCR, catering to clinical diagnostics, veterinary, pharma, and AgBio labs.

tinyvector

60%

tinyvector is a lightweight nearest-neighbor embedding database for AI applications that do not need the complexity or overhead of a full-scale vector database. It uses SQLite for storage and PyTorch for embedding operations, which makes it easy to customize and to integrate into existing workflows. The project is in pre-release development, so features and behavior may still change. Its small codebase is quick to understand and modify, suiting developers who need a simple way to store and query embeddings. This makes tinyvector a good fit for rapid prototyping, specialized research projects, or settings where resource efficiency matters most.
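
The core idea of such a database fits in a few lines: persist vectors in SQLite and answer queries by brute-force cosine similarity. The sketch below uses only the standard library (pure Python math instead of PyTorch). The class name, table schema, and JSON encoding of vectors are hypothetical and are not tinyvector's actual design.

```python
import json
import math
import sqlite3

class TinyVectorStore:
    """Brute-force cosine nearest-neighbor search over embeddings stored in SQLite."""

    def __init__(self, path=":memory:"):
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS items (id TEXT PRIMARY KEY, embedding TEXT NOT NULL)"
        )

    def upsert(self, item_id, embedding):
        """Insert a vector, or overwrite the existing vector with the same id."""
        self.conn.execute(
            "INSERT OR REPLACE INTO items (id, embedding) VALUES (?, ?)",
            (item_id, json.dumps(list(embedding))),
        )
        self.conn.commit()

    def query(self, embedding, k=3):
        """Return up to k (id, cosine similarity) pairs, most similar first."""
        q_norm = math.sqrt(sum(x * x for x in embedding))
        scored = []
        for item_id, raw in self.conn.execute("SELECT id, embedding FROM items"):
            vec = json.loads(raw)
            denom = q_norm * math.sqrt(sum(x * x for x in vec))
            sim = sum(a * b for a, b in zip(embedding, vec)) / denom if denom else 0.0
            scored.append((item_id, sim))
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]

store = TinyVectorStore()
store.upsert("a", [1.0, 0.0])
store.upsert("b", [0.0, 1.0])
store.upsert("c", [0.7, 0.7])
top2 = store.query([1.0, 0.1], k=2)
```

A linear scan is O(n) per query, which is acceptable for prototypes and small collections. Larger collections are where a full vector database with approximate indexes pays off.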

NeuralRays AI

60%

NeuralRays AI is a professional services firm specializing in advanced artificial intelligence and data-driven software solutions. They offer a comprehensive suite of AI services including data strategy consulting, data science and AI consulting, AI solution development, and AI-driven automation. Additionally, they provide digital services such as product and platform engineering, cloud transformation, and digital assurance. NeuralRays AI focuses on ethical AI principles and agile delivery processes, aiming to empower clients with transformational digital products. With a global presence and extensive experience across various industries, they help businesses leverage AI to achieve their objectives and thrive in a competitive landscape. They offer flexible engagement models, including time-and-materials, fixed-price, value-based, and a unique build, operate, and transfer model.