End-to-end speech recognition using lattice-free MMI

Interspeech 2018. Hossein Hadian, Hossein Sameti, Daniel Povey, Sanjeev Khudanpur

We present our work on end-to-end training of acoustic models using the lattice-free maximum mutual information (LF-MMI) objective function in the context of hidden Markov models. By end-to-end training, we mean flat-start training of a single DNN in one stage without using any previously trained models, forced alignments, or building state-tying decision trees...


Results from the Paper


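| TASK | DATASET | MODEL | METRIC NAME | METRIC VALUE | GLOBAL RANK |
| --- | --- | --- | --- | --- | --- |
| Speech Recognition | Switchboard (300hr) | End-to-end LF-MMI | Word Error Rate (WER) | 9.3 | #1 |
| Speech Recognition | WSJ eval92 | End-to-end LF-MMI | Percentage error | 3.0 | #2 |
| Speech Recognition | WSJ eval92 | End-to-end LF-MMI | Word Error Rate (WER) | 3.0 | #1 |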
