Fusion-Eval: Integrating Evaluators with LLMs

15 Nov 2023 · Lei Shu, Nevan Wichers, Liangchen Luo, Yun Zhu, Yinxiao Liu, Jindong Chen, Lei Meng

Evaluating natural language systems poses significant challenges, particularly in natural language understanding and high-level reasoning. In this paper, we introduce "Fusion-Eval", an innovative approach that leverages Large Language Models (LLMs) to integrate insights from various assistant evaluators, each of which specializes in assessing a distinct aspect of responses. This strategy enables Fusion-Eval to operate across a diverse range of tasks and criteria, improving on existing evaluation methods. Fusion-Eval achieves a 0.962 system-level Kendall-Tau correlation with humans on SummEval and a 0.744 turn-level Spearman correlation on TopicalChat, significantly higher than baseline methods. These results highlight Fusion-Eval's potential for natural language system evaluation.
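To make the fusion idea concrete, here is a minimal sketch, not the authors' implementation: aspect-specific assistant evaluators each score a response, and their scores are embedded in a prompt so an LLM can produce one fused score. The names `call_llm` and `fusion_eval`, and any evaluators you wire in, are hypothetical placeholders.

```python
from typing import Callable, Dict

def call_llm(prompt: str) -> str:
    """Placeholder for an LLM API call; replace with a real client."""
    raise NotImplementedError

def fusion_eval(source: str,
                response: str,
                evaluators: Dict[str, Callable[[str, str], float]]) -> float:
    # 1. Run each assistant evaluator on the aspect it specializes in.
    aspect_scores = {name: fn(source, response)
                     for name, fn in evaluators.items()}

    # 2. Embed the aspect scores in the prompt and ask the LLM
    #    for a single fused quality score.
    score_lines = "\n".join(f"- {name}: {score:.2f}"
                            for name, score in aspect_scores.items())
    prompt = (
        "Given a source text, a candidate response, and scores from "
        "specialized assistant evaluators, output a single overall "
        "quality score from 1 to 5.\n\n"
        f"Source:\n{source}\n\n"
        f"Response:\n{response}\n\n"
        f"Assistant evaluator scores:\n{score_lines}\n\n"
        "Overall score:"
    )
    return float(call_llm(prompt).strip())
```

For the meta-evaluation numbers quoted above, system-level Kendall-Tau and turn-level Spearman correlations against human judgments can be computed with `scipy.stats.kendalltau` and `scipy.stats.spearmanr`, respectively.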
