Improving RNN Transducer Based ASR with Auxiliary Tasks

5 Nov 2020 · Chunxi Liu, Frank Zhang, Duc Le, Suyoun Kim, Yatharth Saraf, Geoffrey Zweig

End-to-end automatic speech recognition (ASR) models with a single neural network have recently demonstrated state-of-the-art results compared to conventional hybrid speech recognizers. Specifically, the recurrent neural network transducer (RNN-T) has shown competitive ASR performance on various benchmarks. In this work, we examine ways in which RNN-T can achieve better ASR accuracy by performing auxiliary tasks. We propose (i) using the same auxiliary task as the primary RNN-T ASR task, and (ii) performing context-dependent graphemic state prediction as in conventional hybrid modeling. In transcribing social media videos with varying training data sizes, we first evaluate streaming ASR performance on three languages: Romanian, Turkish and German. We find that both proposed methods provide consistent improvements. Next, we observe that both auxiliary tasks are effective in learning deep transformer encoders for the RNN-T criterion, achieving competitive results (2.0%/4.2% WER on LibriSpeech test-clean/test-other) compared with prior top-performing models.
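The abstract describes training an RNN-T with auxiliary losses. Purely as an illustration, the sketch below computes the RNN-T negative log-likelihood with the standard forward (alpha) recursion over the T x (U+1) output lattice, a frame-level cross-entropy that could play the role of a context-dependent graphemic state prediction loss, and a weighted sum of the primary and auxiliary losses. The function names, the idea that auxiliary losses are computed from intermediate encoder outputs, the frame-aligned state targets, and the loss weights are all hypothetical assumptions; this is not the authors' implementation.

```python
import math
from typing import List, Sequence

NEG_INF = float("-inf")


def log_add(a: float, b: float) -> float:
    """Numerically stable log(exp(a) + exp(b))."""
    if a == NEG_INF:
        return b
    if b == NEG_INF:
        return a
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))


def rnnt_loss(log_probs: List[List[List[float]]], labels: Sequence[int], blank: int = 0) -> float:
    """RNN-T negative log-likelihood via the forward (alpha) recursion.

    log_probs[t][u][k]: joint-network log-probability of symbol k at encoder
    frame t after the first u labels have been emitted (shape T x (U+1) x V).
    """
    T, U = len(log_probs), len(labels)
    alpha = [[NEG_INF] * (U + 1) for _ in range(T)]
    alpha[0][0] = 0.0
    for t in range(T):
        for u in range(U + 1):
            if t == 0 and u == 0:
                continue
            # Reach (t, u) by emitting blank at (t-1, u), i.e. advancing one frame.
            via_blank = alpha[t - 1][u] + log_probs[t - 1][u][blank] if t > 0 else NEG_INF
            # Reach (t, u) by emitting label u-1 at (t, u-1), staying on frame t.
            via_label = alpha[t][u - 1] + log_probs[t][u - 1][labels[u - 1]] if u > 0 else NEG_INF
            alpha[t][u] = log_add(via_blank, via_label)
    # Every complete path ends with a blank emitted from the final node (T-1, U).
    return -(alpha[T - 1][U] + log_probs[T - 1][U][blank])


def frame_cross_entropy(frame_log_probs: List[List[float]], targets: Sequence[int]) -> float:
    """Mean per-frame cross-entropy against frame-aligned class targets.

    Assumption (hypothetical): context-dependent graphemic state targets are
    available as one label per encoder frame, e.g. from a forced alignment.
    """
    return -sum(lp[y] for lp, y in zip(frame_log_probs, targets)) / len(targets)


def multitask_loss(primary: float, auxiliaries: Sequence[float], weights: Sequence[float]) -> float:
    """Primary RNN-T loss plus weighted auxiliary losses (weights are hypothetical)."""
    return primary + sum(w * a for w, a in zip(weights, auxiliaries))


def uniform_lattice(T: int, U: int, V: int) -> List[List[List[float]]]:
    """Toy T x (U+1) x V lattice where every symbol has probability 1/V."""
    lp = math.log(1.0 / V)
    return [[[lp] * V for _ in range(U + 1)] for _ in range(T)]


if __name__ == "__main__":
    primary = rnnt_loss(uniform_lattice(3, 2, 4), [1, 2])  # toy "final encoder layer" lattice
    aux_rnnt = rnnt_loss(uniform_lattice(3, 2, 4), [1, 2])  # hypothetical intermediate-layer lattice
    aux_ce = frame_cross_entropy([[math.log(0.5), math.log(0.5)]] * 3, [0, 1, 0])
    print(multitask_loss(primary, [aux_rnnt, aux_ce], [0.3, 0.3]))
```

With uniform probabilities, each path through the lattice has probability V^-(T+U) and there are C(T+U-1, U) paths (the final symbol must be blank), so for T=3, U=2, V=4 the RNN-T loss equals -log(6/1024), roughly 5.14. This makes a convenient sanity check for any real implementation.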

Code

upskyy/Transformer-Transducer

Tasks

Automatic Speech Recognition

Automatic Speech Recognition (ASR)

speech-recognition

Speech Recognition

Datasets

LibriSpeech

Results from the Paper

Ranked #15 on Speech Recognition on LibriSpeech test-clean

Task	Dataset	Model	Metric Name	Metric Value (%)	Global Rank
Speech Recognition	LibriSpeech test-clean	Transformer Transducer	Word Error Rate (WER)	2.0	#15
Speech Recognition	LibriSpeech test-other	Transformer Transducer	Word Error Rate (WER)	4.2	#17
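The metric values above are word error rates in percent. As a minimal sketch of how such a number is commonly computed, the code below takes the word-level edit distance (substitutions + deletions + insertions) for each utterance, sums it over the corpus, and divides by the total number of reference words. It assumes transcripts are already text-normalized and whitespace-tokenized; the function names are hypothetical, and this is not the scoring tool used by the authors or by the leaderboard.

```python
from typing import Sequence, Tuple


def word_errors(reference: Sequence[str], hypothesis: Sequence[str]) -> int:
    """Word-level Levenshtein distance: min substitutions + deletions + insertions."""
    prev = list(range(len(hypothesis) + 1))  # errors when the reference prefix is empty
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * len(hypothesis)
        for j, hyp_word in enumerate(hypothesis, start=1):
            curr[j] = min(
                prev[j] + 1,  # deletion: ref_word has no counterpart
                curr[j - 1] + 1,  # insertion: hyp_word has no counterpart
                prev[j - 1] + (ref_word != hyp_word),  # substitution or exact match
            )
        prev = curr
    return prev[-1]


def corpus_wer(pairs: Sequence[Tuple[str, str]]) -> float:
    """Corpus-level WER in percent: total word errors / total reference words * 100."""
    errors = sum(word_errors(ref.split(), hyp.split()) for ref, hyp in pairs)
    ref_words = sum(len(ref.split()) for ref, _ in pairs)
    return 100.0 * errors / ref_words


if __name__ == "__main__":
    pairs = [("the cat sat on the mat", "the bat sat on mat"), ("hello world", "hello world")]
    print(corpus_wer(pairs))  # 2 errors over 8 reference words -> 25.0
```

Pooling errors over the whole test set, rather than averaging per-utterance rates, keeps long utterances from being under-weighted; a WER of 2.0 thus means about two word errors per hundred reference words.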
