NoDeeLe: A Novel Deep Learning Schema for Evaluating Neural Machine Translation Systems

TRITON 2021 · Despoina Mouratidis, Maria Stasimioti, Vilelmini Sosoni, Katia Lida Kermanidis ·

Due to the wide-spread development of Machine Translation (MT) systems –especially Neural Machine Translation (NMT) systems– MT evaluation, both automatic and human, has become more and more important as it helps us establish how MT systems perform. Yet, automatic evaluation metrics have lagged behind, as the most popular choices (e.g., BLEU, METEOR and ROUGE) may correlate poorly with human judgments. This paper seeks to put to the test an evaluation model based on a novel deep learning schema (NoDeeLe) used to compare two NMT systems on four different text genres, i.e. medical, legal, marketing and literary in the English-Greek language pair. The model utilizes information from the source segments, the MT outputs and the reference translation, as well as the automatic metrics BLEU, METEOR and WER. The proposed schema achieves a strong correlation with human judgment (78% average accuracy for the four texts with the highest accuracy, i.e. 85%, observed in the case of the marketing text), while it outperforms classic machine learning algorithms and automatic metrics.

PDF Abstract