PromptBench
PromptBench is an open-source framework for evaluating large language models. It provides a unified library for assessing LLM performance, robustness, and prompt engineering techniques.
About
PromptBench is a PyTorch-based Python package that provides a unified evaluation framework for large language models (LLMs). Its user-friendly APIs let researchers and developers run quick performance assessments, test prompt engineering methods such as Chain-of-Thought, Emotion Prompt, and Expert Prompting, and analyze robustness to adversarial prompts. The framework also integrates DyVal, a dynamic evaluation technique that mitigates test-data contamination, and PromptEval, which makes evaluation across many prompts efficient. It supports a wide range of language and multimodal datasets, as well as both open-source and proprietary models, which makes it a versatile tool for understanding and benchmarking LLM capabilities.
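To make the multi-prompt and adversarial-robustness ideas concrete, here is a minimal, library-free Python sketch. It does not use PromptBench's API and is not how PromptBench implements these features. `toy_llm` is a hypothetical keyword-based stand-in for a real model, `typo_attack` is a hypothetical character-level perturbation (it swaps two letters in the longest word of a prompt template), and the relative drop computed by `drop_rate` is one simple way, assumed here for illustration, to summarize how much accuracy an attacked prompt loses. Each template is scored on the same tiny made-up dataset before and after the attack.

```python
from typing import Callable, Dict, List, Tuple

# Tiny made-up sentiment dataset of (input text, gold label) pairs.
DATA: List[Tuple[str, str]] = [
    ("the acting was good", "positive"),
    ("the plot was dull", "negative"),
    ("a good soundtrack", "positive"),
    ("i want my money back", "negative"),
]

# Two hypothetical prompt templates for the same task.
TEMPLATES: List[str] = [
    "Classify the sentiment of this review: {text}",
    "Determine whether the sentiment of this review is positive or negative: {text}",
]


def toy_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM: it only 'understands' the task when the
    word 'sentiment' appears intact, then labels the input with a keyword rule."""
    if "sentiment" not in prompt:
        return "unknown"
    return "positive" if "good" in prompt else "negative"


def accuracy(model: Callable[[str], str], template: str,
             data: List[Tuple[str, str]]) -> float:
    """Fraction of examples where the model's answer matches the gold label."""
    hits = sum(model(template.format(text=x)) == y for x, y in data)
    return hits / len(data)


def typo_attack(template: str) -> str:
    """Character-level perturbation: swap the 2nd and 3rd characters of the
    longest word (first one on ties), never touching the {text} placeholder."""
    words = template.split(" ")
    candidates = [i for i, w in enumerate(words) if "{" not in w and len(w) > 3]
    if not candidates:
        return template
    i = max(candidates, key=lambda j: len(words[j]))
    w = words[i]
    words[i] = w[0] + w[2] + w[1] + w[3:]
    return " ".join(words)


def drop_rate(clean: float, attacked: float) -> float:
    """Relative accuracy drop under attack; 0.0 if the clean score is already 0."""
    return 0.0 if clean == 0 else (clean - attacked) / clean


def evaluate(model: Callable[[str], str], templates: List[str],
             data: List[Tuple[str, str]]) -> Dict[str, Tuple[float, float, float]]:
    """Map each template to (clean accuracy, attacked accuracy, drop rate)."""
    report = {}
    for t in templates:
        clean = accuracy(model, t, data)
        attacked = accuracy(model, typo_attack(t), data)
        report[t] = (clean, attacked, drop_rate(clean, attacked))
    return report


if __name__ == "__main__":
    for template, (clean, attacked, pdr) in evaluate(toy_llm, TEMPLATES, DATA).items():
        print(f"{clean:.2f} -> {attacked:.2f} (drop {pdr:.2f}): {template}")
```

With these toy inputs, the first template loses all of its accuracy once "sentiment" is misspelled, while the second keeps it because the attack lands on a different word. Scoring several templates rather than one is what exposes this kind of prompt sensitivity.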
Pricing & Plans
Open Source
Free