Search Results for author: Charlotte Siska

Found 1 paper, 0 papers with code

Examining the robustness of LLM evaluation to the distributional assumptions of benchmarks

no code implementations • 25 Apr 2024 • Melissa Ailem, Katerina Marazopoulou, Charlotte Siska, James Bono

The research community often evaluates a model by its average performance across the test prompts of a benchmark.

Semantic Similarity, Semantic Textual Similarity
