A general multi-turn dialogue evaluation dataset covering nine topics. Each topic has five representative cases, for a total of 45 evaluation cases.