Search Results for author: Zhuohan Long

Found 1 paper, 1 paper with code

Benchmark Self-Evolving: A Multi-Agent Framework for Dynamic LLM Evaluation

1 code implementation • 18 Feb 2024 • Siyuan Wang, Zhuohan Long, Zhihao Fan, Zhongyu Wei, Xuanjing Huang

Towards a more scalable, robust and fine-grained evaluation, we implement six reframing operations to construct evolving instances testing LLMs against diverse queries, data noise and probing their problem-solving sub-abilities.

Model Selection
