TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE (%)	GLOBAL RANK
Speech Recognition	CHiME-6 dev_gss12	ConformerXXL-PS + G-Augment	Word Error Rate (WER)	26	# 1
Speech Recognition	CHiME-6 eval	ConformerXXL-PS + G-Augment	Word Error Rate (WER)	30.7	# 1

G-Augment: Searching for the Meta-Structure of Data Augmentation Policies for ASR

19 Oct 2022 · Gary Wang, Ekin D. Cubuk, Andrew Rosenberg, Shuyang Cheng, Ron J. Weiss, Bhuvana Ramabhadran, Pedro J. Moreno, Quoc V. Le, Daniel S. Park

Data augmentation is a ubiquitous technique used to provide robustness to automatic speech recognition (ASR) training. However, even as so much of the ASR training process has become automated and more "end-to-end", the data augmentation policy (what augmentation functions to use, and how to apply them) remains hand-crafted. We present Graph-Augment, a technique to define the augmentation space as directed acyclic graphs (DAGs) and search over this space to optimize the augmentation policy itself. We show that given the same computational budget, policies produced by G-Augment are able to perform better than SpecAugment policies obtained by random search on fine-tuning tasks on CHiME-6 and AMI. G-Augment is also able to establish a new state-of-the-art ASR performance on the CHiME-6 evaluation set (30.7% WER). We further demonstrate that G-Augment policies show better transfer properties across warm-start to cold-start training and model size compared to random-searched SpecAugment policies.
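The results above are reported as word error rate (WER). For readers unfamiliar with the metric, the following is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) between hypothesis and reference, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; this is not the scoring pipeline used in the paper, and benchmark scoring tools typically also apply text normalization that this sketch omits.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Illustrative only: tokenizes on whitespace and applies no text
    normalization (both are assumptions, not the paper's scoring setup).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far (before the current row) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors over 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Multiplying the returned fraction by 100 gives the percentage form used in the abstract. Note that WER can exceed 100% when the hypothesis contains many insertions.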

Tasks

Automatic Speech Recognition

Automatic Speech Recognition (ASR)

Data Augmentation

speech-recognition

Speech Recognition

Results from the Paper

Ranked #1 on Speech Recognition on CHiME-6 eval

Methods

Random Search
