Single headed attention based sequence-to-sequence model for state-of-the-art results on Switchboard

20 Jan 2020 · Zoltán Tüske, George Saon, Kartik Audhkhasi, Brian Kingsbury

It is generally believed that direct sequence-to-sequence (seq2seq) speech recognition models are competitive with hybrid models only when a large amount of data, at least a thousand hours, is available for training. In this paper, we show that state-of-the-art recognition performance can be achieved on the Switchboard-300 database using a single headed attention, LSTM based model. Using a cross-utterance language model, our single-pass speaker independent system reaches 6.4% and 12.5% word error rate (WER) on the Switchboard and CallHome subsets of Hub5'00, without a pronunciation lexicon. While careful regularization and data augmentation are crucial in achieving this level of performance, experiments on Switchboard-2000 show that nothing is more useful than more data. Overall, the combination of various regularizations and a simple but fairly large model results in a new state of the art, 4.7% and 7.8% WER on the Switchboard and CallHome sets, using SWB-2000 without any external data resources.
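The results above are reported as word error rate (WER). As a minimal sketch of how that metric is commonly defined (word-level edit distance divided by the number of reference words, expressed as a percentage), the following may help. The function name `word_error_rate` and the example sentences are illustrative assumptions, not taken from the paper, and the sketch omits the text normalization that benchmark scoring typically applies.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: (substitutions + deletions + insertions) / reference words.

    Hypothetical helper for illustration; not the scoring tool used in the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion of a reference word
                curr[j - 1] + 1,                  # insertion of a hypothesis word
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr

    return 100.0 * prev[-1] / len(ref)


# Illustrative usage: one substitution ("sat" -> "sit") and one deletion ("the")
# against a six-word reference.
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Under this definition, a WER of 4.7% corresponds to roughly 47 word errors per 1,000 reference words.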

Tasks

Data Augmentation

Language Modelling

Speech Recognition

Results from the Paper

Ranked #2 on Speech Recognition on swb_hub_500 WER fullSWBCH

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Speech Recognition	swb_hub_500 WER fullSWBCH	IBM (LSTM encoder-decoder)	Percentage error	7.8	# 2
Speech Recognition	Switchboard + Hub500	IBM (LSTM encoder-decoder)	Percentage error	4.7	# 2

Methods

LSTM • Sigmoid Activation • Tanh Activation
