Sequence-Level Knowledge Distillation

EMNLP 2016 · Yoon Kim, Alexander M. Rush

Neural machine translation (NMT) offers a novel alternative formulation of translation that is potentially simpler than statistical approaches. However, to reach competitive performance, NMT models need to be exceedingly large...

TASK | DATASET | MODEL | METRIC NAME | METRIC VALUE | GLOBAL RANK
Machine Translation | IWSLT2015 Thai-English | Seq-KD + Seq-Inter + Word-KD | BLEU score | 14.2 | # 1
Machine Translation | WMT2014 English-German | Seq-KD + Seq-Inter + Word-KD | BLEU score | 18.5 | # 40
