Search Results for author: Charlotte Siska

Found 1 paper, 0 papers with code

Examining the robustness of LLM evaluation to the distributional assumptions of benchmarks

no code implementations • 25 Apr 2024 • Melissa Ailem, Katerina Marazopoulou, Charlotte Siska, James Bono

The research community often relies on a model's average performance across the test prompts of a benchmark to evaluate the model's performance.

Semantic Similarity • Semantic Textual Similarity

