Espresso: A Fast End-to-end Neural Speech Recognition Toolkit

18 Sep 2019
Yiming Wang, Tongfei Chen, Hainan Xu, Shuoyang Ding, Hang Lv, Yiwen Shao, Nanyun Peng, Lei Xie, Shinji Watanabe, Sanjeev Khudanpur

We present Espresso, an open-source, modular, extensible end-to-end neural automatic speech recognition (ASR) toolkit based on the deep learning library PyTorch and the popular neural machine translation toolkit fairseq. Espresso supports distributed training across GPUs and computing nodes, and features various decoding approaches commonly employed in ASR, including look-ahead word-based language model fusion, for which a fast, parallelized decoder is implemented...


Results from the Paper


 Ranked #1 on Speech Recognition on Hub5'00 SwitchBoard (Word Error Rate (WER) metric)

TASK                DATASET                 MODEL     METRIC NAME            METRIC VALUE (%)  GLOBAL RANK
Speech Recognition  Hub5'00 CallHome        Espresso  Word Error Rate (WER)  19.1              #1
Speech Recognition  Hub5'00 SwitchBoard     Espresso  Word Error Rate (WER)  9.2               #1
Speech Recognition  LibriSpeech test-clean  Espresso  Word Error Rate (WER)  2.8               #11
Speech Recognition  LibriSpeech test-other  Espresso  Word Error Rate (WER)  8.7               #12
Speech Recognition  WSJ eval92              Espresso  Word Error Rate (WER)  3.4               #2
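
For readers unfamiliar with the metric in the table, WER is conventionally the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words, and reported as a percentage. The sketch below is a minimal illustration of that standard definition. It is not taken from Espresso, and the function name `word_error_rate` is hypothetical. Official scoring pipelines for these benchmarks typically apply text normalization first, which this sketch omits.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a fraction: (substitutions + deletions + insertions) / reference words.

    Illustrative only; assumes whitespace tokenization and no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference words
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing from hypothesis)
                curr[j - 1] + 1,     # insertion (extra word in hypothesis)
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over six reference words.
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER: {100 * wer:.1f}%")
```

Multiplying the returned fraction by 100 gives a value on the same scale as the table entries above.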
