LREC 2020 • Shih-Hung Wu, Sheng-Lun Chien
We believe that it is easier to obtain consistent judgments when comparing two dialogues generated by two different systems than when assigning an absolute quality score to the output of a single system in isolation.