Arena-Hard-Auto
Arena-Hard-Auto is an AI Testing & QA tool that provides an automatic LLM benchmark. It evaluates instruction-tuned LLMs using automatic judges such as GPT-4.1 and Gemini-2.5, and its results correlate highly with human preference.
At a glance