We show that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech, two vastly different languages.
Ranked #1 on Speech Recognition on WSJ eval93 (using extra training data)
On LibriSpeech, we achieve 6.8% WER on test-other without the use of a language model, and 5.8% WER with shallow fusion with a language model.
Ranked #2 on Speech Recognition on Hub5'00 SwitchBoard
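Shallow fusion, mentioned above, combines the acoustic model's score with a weighted external language-model score at each decoding step. The sketch below is a minimal illustration of that scoring rule for a single step; the toy distributions and the function name `shallow_fusion_step` are assumptions for illustration, not any paper's actual implementation.

```python
import math

def shallow_fusion_step(am_log_probs, lm_log_probs, lm_weight=0.3):
    """Pick the next token by fusing acoustic-model and LM scores.

    Shallow fusion adds a weighted LM log-probability to the acoustic
    score; `lm_weight` is typically tuned on a development set.
    """
    fused = {tok: am_log_probs[tok] + lm_weight * lm_log_probs[tok]
             for tok in am_log_probs}
    return max(fused, key=fused.get)

# Toy example: the AM slightly prefers "there", the LM strongly prefers "their".
am = {"there": math.log(0.5), "their": math.log(0.4), "they're": math.log(0.1)}
lm = {"there": math.log(0.1), "their": math.log(0.8), "they're": math.log(0.1)}
print(shallow_fusion_step(am, lm))  # LM evidence flips the AM's top choice
```

In a real system the same fused score would rank hypotheses inside beam search rather than a single greedy step.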
We present a state-of-the-art speech recognition system developed using end-to-end deep learning.
We study pseudo-labeling for the semi-supervised training of ResNet, Time-Depth Separable ConvNets, and Transformers for speech recognition, with either CTC or Seq2Seq loss functions.
Ranked #3 on Speech Recognition on LibriSpeech test-clean (using extra training data)
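The pseudo-labeling loop described above uses a seed model trained on labeled data to transcribe unlabeled audio, then retrains on the confident machine-generated transcripts. A minimal sketch of the labeling step, assuming a hypothetical `model` callable that returns a transcript with a confidence score; the threshold filter is one common heuristic, not the specific criterion used in the paper:

```python
def generate_pseudo_labels(model, unlabeled_batch, confidence_threshold=0.9):
    """Transcribe unlabeled audio and keep only confident pseudo-labels.

    `model` is any callable mapping an utterance to (transcript, confidence);
    the surviving pairs are added to the training set for the next round.
    """
    pseudo = []
    for audio in unlabeled_batch:
        transcript, confidence = model(audio)
        if confidence >= confidence_threshold:
            pseudo.append((audio, transcript))
    return pseudo

# Toy stand-in for a trained seed model.
def toy_model(audio):
    return ("hello world", 0.95) if audio == "clip_a" else ("noise", 0.4)

print(generate_pseudo_labels(toy_model, ["clip_a", "clip_b"]))
# keeps only the confident ("clip_a", "hello world") pair
```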
In this paper, we report state-of-the-art results on LibriSpeech among end-to-end speech recognition models without any external training data.
Ranked #1 on Speech Recognition on Hub5'00 SwitchBoard
The performance of automatic speech recognition (ASR) has improved tremendously due to the application of deep neural networks (DNNs).
Recently, there has been an increasing interest in end-to-end speech recognition that directly transcribes speech to text without any predefined alignments.
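One way such models avoid predefined alignments is CTC decoding: the network emits one label (or blank) per frame, and a deterministic collapse rule turns that frame sequence into a transcript. A minimal sketch of the collapse step, assuming greedy per-frame outputs and `-` as the blank symbol:

```python
def ctc_collapse(frame_labels, blank="-"):
    """Collapse per-frame CTC outputs into a transcript.

    The CTC rule merges consecutive repeats, then removes blanks, so the
    model never needs a frame-to-character alignment at training time.
    """
    out, prev = [], None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return "".join(out)

print(ctc_collapse(list("hh-e-ll-lo-")))  # 'hello'
```

Note the blank between the two `l` runs: without it, the repeated character would be merged away.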
To the best of the authors' knowledge, the results obtained when training on the full LibriSpeech training set are the best published to date, for both the hybrid DNN/HMM and the attention-based systems.
Ranked #5 on Speech Recognition on LibriSpeech test-other
Sequence-to-sequence attention-based models on subword units allow simple open-vocabulary end-to-end speech recognition.
Ranked #17 on Speech Recognition on LibriSpeech test-clean
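Subword units make the vocabulary open because any word can be spelled from smaller pieces, down to single characters, so no token is ever out-of-vocabulary. A minimal sketch of greedy longest-match segmentation over an assumed toy subword inventory (real systems learn the inventory with methods such as BPE and may segment differently):

```python
def segment_into_subwords(word, vocab):
    """Greedy longest-match segmentation of a word into subword units.

    Falls back to single characters, so every word is representable
    and the recognizer's output vocabulary stays open.
    """
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):
            piece = word[i:j]
            if piece in vocab or j == i + 1:  # single-char fallback
                pieces.append(piece)
                i = j
                break
    return pieces

vocab = {"speech", "recog", "ni", "tion", "s"}
print(segment_into_subwords("recognition", vocab))  # ['recog', 'ni', 'tion']
```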