25 Apr 2024 • Melissa Ailem, Katerina Marazopoulou, Charlotte Siska, James Bono
The research community often evaluates a model by its average performance across the test prompts of a benchmark.