SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition

18 Apr 2019
Daniel S. Park, William Chan, Yu Zhang, Chung-Cheng Chiu, Barret Zoph, Ekin D. Cubuk, Quoc V. Le

We present SpecAugment, a simple data augmentation method for speech recognition. SpecAugment is applied directly to the feature inputs of a neural network (i.e., filter bank coefficients)...
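The abstract above is truncated, but the paper's augmentation policy includes masking blocks of frequency channels and blocks of time steps in the input features. The following is a minimal sketch of those two masking steps only, not the authors' implementation: time warping is omitted, the function name `mask_spectrogram` and the width limits are illustrative assumptions, and the features are plain Python lists rather than tensors.

```python
import random


def mask_spectrogram(features, max_freq_width=2, max_time_width=3, rng=None):
    """Return a masked copy of `features`.

    `features` is a list of time frames, each a list of filter bank values.
    One band of frequency bins and one span of time frames are set to zero.
    All names and default widths here are hypothetical, for illustration.
    """
    if rng is None:
        rng = random.Random()
    num_frames = len(features)
    num_bins = len(features[0])
    masked = [list(frame) for frame in features]  # leave the input untouched

    # Frequency mask: zero `f` consecutive bins, starting at `f0`, in every frame.
    f = rng.randint(0, min(max_freq_width, num_bins))
    f0 = rng.randint(0, num_bins - f)
    for frame in masked:
        for k in range(f0, f0 + f):
            frame[k] = 0.0

    # Time mask: zero `t` consecutive frames, starting at `t0`.
    t = rng.randint(0, min(max_time_width, num_frames))
    t0 = rng.randint(0, num_frames - t)
    for i in range(t0, t0 + t):
        masked[i] = [0.0] * num_bins

    return masked


# Toy input: 10 frames of 8 filter bank values each (made-up data).
features = [[1.0] * 8 for _ in range(10)]
augmented = mask_spectrogram(features, rng=random.Random(0))
```

Because the masks are drawn fresh on each call, applying the function once per training example per epoch would give a different masked view each time; passing a seeded `random.Random` makes a run repeatable.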

TASK                DATASET                 MODEL                                                   METRIC NAME            METRIC VALUE  GLOBAL RANK
Speech Recognition  Hub5'00 SwitchBoard     LAS + SpecAugment (with LM, Switchboard mild policy)    SwitchBoard            6.8           # 2
Speech Recognition  Hub5'00 SwitchBoard     LAS + SpecAugment (with LM, Switchboard strong policy)  CallHome               14            # 2
Speech Recognition  LibriSpeech test-clean  LAS (no LM)                                             Word Error Rate (WER)  2.7           # 10
Speech Recognition  LibriSpeech test-clean  LAS + SpecAugment                                       Word Error Rate (WER)  2.5           # 8
Speech Recognition  LibriSpeech test-other  LAS (no LM)                                             Word Error Rate (WER)  6.5           # 9
Speech Recognition  LibriSpeech test-other  LAS + SpecAugment                                       Word Error Rate (WER)  5.8           # 8
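Word error rate is conventionally the word-level edit distance between a hypothesis and its reference transcript, divided by the number of reference words. The sketch below illustrates that definition and the relative reduction implied by the LibriSpeech rows. It is not the scoring tool behind these benchmark numbers; the names `word_error_rate` and `relative_reduction` are hypothetical, and the sentences are made up.

```python
def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline, improved):
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


# One deleted word out of six reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")

# Relative WER reduction for the two LibriSpeech splits listed above.
clean_gain = relative_reduction(2.7, 2.5)
other_gain = relative_reduction(6.5, 5.8)
```

With the listed values, the relative reduction works out to roughly 7% on test-clean and roughly 11% on test-other.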
