MetaScreener is an AI Agents & Automation tool that automates systematic review screening using an ensemble of open-source large language models. It provides transparent, reproducible decisions with uncertainty quantification, saving significant time in research.
MetaScreener is an AI-powered tool designed to streamline the abstract and PDF screening phases of systematic reviews. It leverages an ensemble of 4+ open-source large language models (LLMs) to process search results from platforms like PubMed or Scopus. The tool aggregates LLM outputs through a calibrated confidence pipeline, offering transparent and reproducible decisions with quantified uncertainty. Users can define review criteria (PICO/PEO/SPIDER), upload search results, and receive include/exclude decisions with confidence scores. Uncertain cases are flagged for human review, and the system supports active learning by recalibrating model weights based on human feedback. It is cost-effective, with processing costs as low as $0.003 per paper.
Best used for
Ideal for researchers, professors, and students who need to accelerate systematic reviews, efficiently screen large volumes of abstracts and PDFs, and ensure reproducible decision-making. Especially valuable for those seeking a cost-effective, transparent, and AI-powered solution for literature screening.
MetaScreener supports 15 open-source LLMs via OpenRouter, including flagship models like DeepSeek V3, Qwen 3, and Kimi K2.5, as well as strong models like Llama 4 Maverick and lightweight options such as Gemma 3 27B.
What file formats can be uploaded for screening?
MetaScreener accepts various file formats for search results, including RIS (.ris), BibTeX (.bib), CSV (.csv), and Excel (.xlsx). For full-text screening and data extraction, it also supports PDF (.pdf) files.
How does MetaScreener ensure reproducibility?
MetaScreener ensures reproducibility by setting temperature=0.0 for all LLM calls, using a fixed seed=42 for stochastic operations, and maintaining a full audit trail of every decision, including model outputs, rule violations, and confidence scores.