Prometheus-Eval
Prometheus-Eval is an open-source AI tool for evaluating LLM responses in generation tasks. It uses Prometheus and GPT-4 to assess LLMs, supporting both absolute and relative grading.
About
Prometheus-Eval is an open-source repository for evaluating Large Language Models (LLMs) on generation tasks. It uses evaluator models such as Prometheus and GPT-4 as judges. Recent releases such as M-Prometheus outperform previous open LLM judges on the multilingual meta-evaluation benchmarks MM-Eval and M-RewardBench. It also performs strongly in English, surpassing Prometheus 2 7B and 8x7B on RewardBench. Prometheus-Eval supports two grading modes: absolute grading, which assigns a response a score from 1 to 5, and relative grading, which compares two responses. Evaluator models can run locally through vllm or be reached through LLM APIs via litellm, which lets users employ evaluator LLMs such as GPT-4.
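The two grading modes can be illustrated with a small, library-independent sketch. It is not the Prometheus-Eval API: the function names, the prompt wording, the "[RESULT]" output marker, and the "A"/"B" verdict for relative grading are all assumptions made for illustration, and the judge is a stub standing in for a real evaluator model served locally or through an API.

```python
import re
from typing import Callable, Tuple

# A judge is any callable that maps a prompt to raw text. In practice it would
# wrap a local model or an API client. Hypothetical interface for illustration.
Judge = Callable[[str], str]

# Assumed output convention: free-text feedback followed by "[RESULT] <verdict>".
_RESULT = re.compile(r"\[RESULT\]\s*(\S+)")


def _split_verdict(output: str) -> Tuple[str, str]:
    """Separate the feedback text from the verdict token after [RESULT]."""
    match = _RESULT.search(output)
    if match is None:
        raise ValueError("judge output has no [RESULT] marker")
    return output[: match.start()].strip(), match.group(1)


def absolute_grade(judge: Judge, instruction: str, response: str,
                   rubric: str) -> Tuple[str, int]:
    """Absolute grading: score one response on a 1 to 5 scale."""
    prompt = (
        f"Instruction: {instruction}\n"
        f"Response: {response}\n"
        f"Rubric: {rubric}\n"
        "Give feedback, then an integer score from 1 to 5 after [RESULT]."
    )
    feedback, verdict = _split_verdict(judge(prompt))
    score = int(verdict)  # raises ValueError if the verdict is not an integer
    if not 1 <= score <= 5:
        raise ValueError(f"score {score} is outside the 1 to 5 range")
    return feedback, score


def relative_grade(judge: Judge, instruction: str, response_a: str,
                   response_b: str, rubric: str) -> Tuple[str, str]:
    """Relative grading: compare two responses and return 'A' or 'B'."""
    prompt = (
        f"Instruction: {instruction}\n"
        f"Response A: {response_a}\n"
        f"Response B: {response_b}\n"
        f"Rubric: {rubric}\n"
        "Give feedback, then A or B after [RESULT]."
    )
    feedback, verdict = _split_verdict(judge(prompt))
    winner = verdict.upper()
    if winner not in {"A", "B"}:
        raise ValueError(f"unexpected verdict {verdict!r}")
    return feedback, winner


def stub_judge(prompt: str) -> str:
    """Canned judge used only to make the sketch runnable without a model."""
    if "Response A:" in prompt:
        return "Feedback: B is more complete. [RESULT] B"
    return "Feedback: Accurate but terse. [RESULT] 4"


if __name__ == "__main__":
    rubric = "Is the answer accurate and complete?"
    print(absolute_grade(stub_judge, "Explain TCP.", "TCP is reliable.", rubric))
    print(relative_grade(stub_judge, "Explain TCP.", "TCP is reliable.",
                         "TCP is a reliable, ordered transport protocol.", rubric))
```

Keeping the judge as a plain callable means the same grading functions could sit in front of either a local model or an API-backed one, which mirrors the local and API options described above.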
Pricing & Plans
Open Source
Free