Jasper: An End-to-End Convolutional Neural Acoustic Model

5 Apr 2019
Jason Li, Vitaly Lavrukhin, Boris Ginsburg, Ryan Leary, Oleksii Kuchaiev, Jonathan M. Cohen, Huyen Nguyen, Ravi Teja Gadde

In this paper, we report state-of-the-art results on LibriSpeech among end-to-end speech recognition models without any external training data. Our model, Jasper, uses only 1D convolutions, batch normalization, ReLU, dropout, and residual connections...
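The abstract names the model's only building blocks. The following is a minimal pure-Python sketch of how they might compose into one residual block. It rests on several assumptions that are not taken from this page. Each sub-block is conv, then batch norm, then ReLU, then dropout. The residual path is a 1x1 convolution plus batch norm, and it is added after the last sub-block's batch norm and before its ReLU. That placement reflects our reading of the paper, not a confirmed detail. Batch norm here normalizes each channel over time for a single utterance, with no learned scale/shift or running statistics. All sizes (`C_in`, `C`, `K`, `T`, `R`) and helper names are hypothetical. This is not the authors' implementation.

```python
import math
import random


def conv1d(x, weight, bias):
    """'Same'-padded, stride-1 1D convolution (cross-correlation, as in DL frameworks).

    x: [in_channels][time]; weight: [out_channels][in_channels][kernel], odd kernel;
    bias: [out_channels].
    """
    kernel = len(weight[0][0])
    pad = kernel // 2
    steps = len(x[0])
    out = []
    for w_out, b_out in zip(weight, bias):
        row = []
        for t in range(steps):
            acc = b_out
            for w_in, x_in in zip(w_out, x):
                for j in range(kernel):
                    s = t + j - pad
                    if 0 <= s < steps:  # zero padding outside the signal
                        acc += w_in[j] * x_in[s]
            row.append(acc)
        out.append(row)
    return out


def batch_norm(x, eps=1e-5):
    """Normalize each channel over time (batch of one; no learned affine params)."""
    out = []
    for ch in x:
        mean = sum(ch) / len(ch)
        var = sum((v - mean) ** 2 for v in ch) / len(ch)
        out.append([(v - mean) / math.sqrt(var + eps) for v in ch])
    return out


def relu(x):
    return [[max(0.0, v) for v in ch] for ch in x]


def dropout(x, p, training, rng):
    """Inverted dropout: identity at inference, rescale kept values in training."""
    if not training or p == 0.0:
        return x
    keep = 1.0 - p
    return [[v / keep if rng.random() < keep else 0.0 for v in ch] for ch in x]


def random_weight(out_c, in_c, k, rng):
    return [[[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(in_c)]
            for _ in range(out_c)]


def jasper_block(x, convs, residual_conv, p=0.2, training=False, rng=None):
    """Hypothetical block: R sub-blocks of conv -> BN -> ReLU -> dropout.

    The residual (1x1 conv + BN of the block input) is added after the last
    sub-block's batch norm, before its ReLU and dropout (assumed placement).
    """
    rng = rng or random.Random(0)
    residual = batch_norm(conv1d(x, *residual_conv))
    h = x
    for i, (w, b) in enumerate(convs):
        h = batch_norm(conv1d(h, w, b))
        if i == len(convs) - 1:
            h = [[a + r for a, r in zip(hc, rc)] for hc, rc in zip(h, residual)]
        h = dropout(relu(h), p, training, rng)
    return h


# Usage with hypothetical sizes: 4 input channels, 8 block channels,
# kernel width 11, 20 time steps, 3 sub-blocks.
rng = random.Random(0)
C_in, C, K, T, R = 4, 8, 11, 20, 3
x = [[rng.gauss(0.0, 1.0) for _ in range(T)] for _ in range(C_in)]
convs = [(random_weight(C, C_in if i == 0 else C, K, rng), [0.0] * C) for i in range(R)]
residual_conv = (random_weight(C, C_in, 1, rng), [0.0] * C)
y = jasper_block(x, convs, residual_conv)
print(len(y), len(y[0]))  # 8 20
```

The 1x1 residual convolution lets the skip path change the channel count (4 to 8 here), so it can be summed with the sub-block output.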

| Task | Dataset | Model | Metric | Value (%) | Global rank |
|---|---|---|---|---|---|
| Speech Recognition | Hub5'00 SwitchBoard | Jasper DR 10x5 | WER, CallHome subset | 16.2 | #1 |
| Speech Recognition | Hub5'00 SwitchBoard | Jasper DR 10x5 | WER, SwitchBoard subset | 7.8 | #1 |
| Speech Recognition | LibriSpeech test-clean | Jasper DR 10x5 | Word Error Rate (WER) | 2.95 | #13 |
| Speech Recognition | LibriSpeech test-clean | Jasper DR 10x5 (+ Time/Freq Masks) | Word Error Rate (WER) | 2.84 | #12 |
| Speech Recognition | LibriSpeech test-other | Jasper DR 10x5 | Word Error Rate (WER) | 8.79 | #13 |
| Speech Recognition | LibriSpeech test-other | Jasper DR 10x5 (+ Time/Freq Masks) | Word Error Rate (WER) | 7.84 | #11 |
| Speech Recognition | WSJ eval92 | Jasper 10x3 | Word Error Rate (WER) | 6.9 | #4 |
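Every result above is a word error rate: WER = (S + D + I) / N. Here S, D, and I are the word substitutions, deletions, and insertions in a minimum-cost alignment of the hypothesis to the reference, and N is the number of reference words. The table reports it as a percentage. Below is a small sketch of that computation via word-level Levenshtein distance. It assumes plain whitespace tokenization with no text normalization; official scoring tools may normalize text differently. The function names are illustrative. The corpus-level variant sums edits and reference words over all utterances, which is a common convention, rather than averaging per-utterance WERs.

```python
def edit_distance(ref, hyp):
    """Minimum substitutions + deletions + insertions turning word list ref into hyp."""
    prev = list(range(len(hyp) + 1))  # empty ref prefix: j insertions
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)  # i deletions to reach empty hyp prefix
        for j in range(1, len(hyp) + 1):
            substitution = prev[j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = prev[j] + 1
            insertion = cur[j - 1] + 1
            cur[j] = min(substitution, deletion, insertion)
        prev = cur
    return prev[-1]


def word_error_rate(reference, hypothesis):
    """WER for one utterance; can exceed 1.0 when there are many insertions."""
    ref = reference.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref, hypothesis.split()) / len(ref)


def corpus_wer(pairs):
    """Total edits over total reference words for (reference, hypothesis) pairs."""
    total_edits = sum(edit_distance(r.split(), h.split()) for r, h in pairs)
    total_words = sum(len(r.split()) for r, _ in pairs)
    if total_words == 0:
        raise ValueError("references must contain at least one word")
    return total_edits / total_words


# One substitution (sat -> sit) and one deletion (the) over 6 reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"{100 * wer:.1f}%")  # 33.3%
```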

Methods used in the Paper