TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Speech Recognition	Hub5'00 SwitchBoard	LAS + SpecAugment (with LM, Switchboard mild policy)	CallHome	14.6	# 2
Speech Recognition	Hub5'00 SwitchBoard	LAS + SpecAugment (with LM, Switchboard mild policy)	SwitchBoard	6.8	# 1
Speech Recognition	Hub5'00 SwitchBoard	LAS + SpecAugment (with LM, Switchboard strong policy)	CallHome	14	# 1
Speech Recognition	Hub5'00 SwitchBoard	LAS + SpecAugment (with LM, Switchboard strong policy)	SwitchBoard	7.1	# 2
Speech Recognition	LibriSpeech test-clean	LAS (no LM)	Word Error Rate (WER)	2.7	# 34
Speech Recognition	LibriSpeech test-clean	LAS + SpecAugment	Word Error Rate (WER)	2.5	# 31
Speech Recognition	LibriSpeech test-other	LAS (no LM)	Word Error Rate (WER)	6.5	# 33
Speech Recognition	LibriSpeech test-other	LAS + SpecAugment	Word Error Rate (WER)	5.8	# 30

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/specaugment-a-simple-data-augmentation-method/speech-recognition-on-hub500-switchboard)](https://paperswithcode.com/sota/speech-recognition-on-hub500-switchboard?p=specaugment-a-simple-data-augmentation-method)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/specaugment-a-simple-data-augmentation-method/speech-recognition-on-librispeech-test-other)](https://paperswithcode.com/sota/speech-recognition-on-librispeech-test-other?p=specaugment-a-simple-data-augmentation-method)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/specaugment-a-simple-data-augmentation-method/speech-recognition-on-librispeech-test-clean)](https://paperswithcode.com/sota/speech-recognition-on-librispeech-test-clean?p=specaugment-a-simple-data-augmentation-method)`

SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

18 Apr 2019 · Daniel S. Park, William Chan, Yu Zhang, Chung-Cheng Chiu, Barret Zoph, Ekin D. Cubuk, Quoc V. Le ·

We present SpecAugment, a simple data augmentation method for speech recognition. SpecAugment is applied directly to the feature inputs of a neural network (i.e., filter bank coefficients). The augmentation policy consists of warping the features, masking blocks of frequency channels, and masking blocks of time steps. We apply SpecAugment on Listen, Attend and Spell networks for end-to-end speech recognition tasks. We achieve state-of-the-art performance on the LibriSpeech 960h and Switchboard 300h tasks, outperforming all prior work. On LibriSpeech, we achieve 6.8% WER on test-other without the use of a language model, and 5.8% WER with shallow fusion with a language model. This compares to the previous state-of-the-art hybrid system of 7.5% WER. For Switchboard, we achieve 7.2%/14.6% on the Switchboard/CallHome portion of the Hub5'00 test set without the use of a language model, and 6.8%/14.1% with shallow fusion, which compares to the previous state-of-the-art hybrid system at 8.3%/17.3% WER.
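The two masking steps named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names (`mask_blocks`, `spec_augment_masks`), the `(time, frequency)` array layout, the zero fill value, and the default mask widths and counts are all assumptions chosen for illustration, and the feature-warping step is omitted.

```python
import numpy as np


def mask_blocks(features, axis, max_width, num_masks, rng, fill_value=0.0):
    """Return a copy of `features` with random contiguous blocks along `axis` masked.

    Hypothetical helper: each mask picks a width in [0, max_width] and a random
    start position, then overwrites that block with `fill_value`.
    """
    out = features.copy()
    size = out.shape[axis]
    for _ in range(num_masks):
        width = min(int(rng.integers(0, max_width + 1)), size)
        start = int(rng.integers(0, size - width + 1))
        index = [slice(None)] * out.ndim
        index[axis] = slice(start, start + width)
        out[tuple(index)] = fill_value
    return out


def spec_augment_masks(features, max_freq_width=8, max_time_width=20,
                       num_freq_masks=1, num_time_masks=1, seed=None):
    """Apply frequency masking, then time masking, to a (time, frequency) array.

    Default widths and counts are illustrative only, not values from the paper.
    """
    rng = np.random.default_rng(seed)
    out = mask_blocks(features, axis=1, max_width=max_freq_width,
                      num_masks=num_freq_masks, rng=rng)
    out = mask_blocks(out, axis=0, max_width=max_time_width,
                      num_masks=num_time_masks, rng=rng)
    return out


if __name__ == "__main__":
    # Stand-in for 100 frames of 40 filter bank coefficients.
    log_mel = np.ones((100, 40))
    augmented = spec_augment_masks(log_mel, seed=0)
    print(augmented.shape, int((augmented == 0).sum()), "masked cells")
```

Because the masks are applied to a copy, the original features remain available, so a fresh augmentation can be drawn each time an utterance is revisited during training.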


Code


mozilla/DeepSpeech	24,279
PaddlePaddle/PaddleSpeech	10,131
makcedward/nlpaug	4,298
iver56/audiomentations	1,691
DemisEom/SpecAugment	630

See all 29 implementations

Tasks


Automatic Speech Recognition

Automatic Speech Recognition (ASR)

Data Augmentation

Language Modelling

Speech Recognition

Datasets

LibriSpeech

2000 HUB5 English

Results from the Paper


Ranked #1 on Speech Recognition on Hub5'00 SwitchBoard

