TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Speech Recognition	Libri-Light test-clean	wav2vec 2.0 Large-10h-LV-60k	Word Error Rate (WER)	2.5	# 1
Speech Recognition	Libri-Light test-other	wav2vec 2.0 Large-10h-LV-60k	Word Error Rate (WER)	5.0	# 1
Speech Recognition	LibriSpeech test-clean	wav2vec 2.0 with Libri-Light	Word Error Rate (WER)	1.8	# 9
Speech Recognition	LibriSpeech test-other	wav2vec 2.0 with Libri-Light	Word Error Rate (WER)	3.3	# 6
Speech Recognition	LibriSpeech test-other	wav2vec 2.0	Word Error Rate (WER)	4.1	# 14
Speech Recognition	TIMIT	wav2vec 2.0	Percentage error	8.3	# 1

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/wav2vec-2-0-a-framework-for-self-supervised/speech-recognition-on-libri-light-test-clean)](https://paperswithcode.com/sota/speech-recognition-on-libri-light-test-clean?p=wav2vec-2-0-a-framework-for-self-supervised)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/wav2vec-2-0-a-framework-for-self-supervised/speech-recognition-on-libri-light-test-other)](https://paperswithcode.com/sota/speech-recognition-on-libri-light-test-other?p=wav2vec-2-0-a-framework-for-self-supervised)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/wav2vec-2-0-a-framework-for-self-supervised/speech-recognition-on-timit)](https://paperswithcode.com/sota/speech-recognition-on-timit?p=wav2vec-2-0-a-framework-for-self-supervised)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/wav2vec-2-0-a-framework-for-self-supervised/speech-recognition-on-librispeech-test-other)](https://paperswithcode.com/sota/speech-recognition-on-librispeech-test-other?p=wav2vec-2-0-a-framework-for-self-supervised)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/wav2vec-2-0-a-framework-for-self-supervised/speech-recognition-on-librispeech-test-clean)](https://paperswithcode.com/sota/speech-recognition-on-librispeech-test-clean?p=wav2vec-2-0-a-framework-for-self-supervised)`

wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations

NeurIPS 2020 · Alexei Baevski, Henry Zhou, Abdel-rahman Mohamed, Michael Auli

We show for the first time that learning powerful representations from speech audio alone followed by fine-tuning on transcribed speech can outperform the best semi-supervised methods while being conceptually simpler. wav2vec 2.0 masks the speech input in the latent space and solves a contrastive task defined over a quantization of the latent representations which are jointly learned. Experiments using all labeled data of Librispeech achieve 1.8/3.3 WER on the clean/other test sets. When lowering the amount of labeled data to one hour, wav2vec 2.0 outperforms the previous state of the art on the 100 hour subset while using 100 times less labeled data. Using just ten minutes of labeled data and pre-training on 53k hours of unlabeled data still achieves 4.8/8.2 WER. This demonstrates the feasibility of speech recognition with limited amounts of labeled data.
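
The contrastive task mentioned in the abstract can be illustrated with a small sketch. The abstract does not spell out the loss, so the details here are assumptions: candidates are scored by cosine similarity, scaled by a temperature, and the loss is the negative log-probability of the true quantized target among a set of distractors (an InfoNCE-style objective). The function names, the temperature value, and the toy vectors are hypothetical and chosen only for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def contrastive_loss(context, target, distractors, temperature=0.1):
    """Negative log-probability of `target` among `target` plus `distractors`.

    `context` stands in for the model output at a masked time step and
    `target` for the quantized latent at that step (assumed setup).
    """
    candidates = [target] + list(distractors)
    logits = [cosine(context, q) / temperature for q in candidates]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_norm - logits[0]

# Toy example: the context vector points almost at the true target.
context = [1.0, 0.0]
target = [0.9, 0.1]
distractors = [[0.0, 1.0], [-1.0, 0.0]]

loss_good = contrastive_loss(context, target, distractors)
# Swapping in a distractor as the "target" should give a larger loss.
loss_bad = contrastive_loss(context, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
```

In this toy setup the loss is small when the context vector is closest to the true target and grows when a distractor is a better match, which is the behavior a contrastive objective of this kind is meant to encourage.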

Code

pytorch/fairseq (official) · 29,233

huggingface/transformers · 124,889

wenet-e2e/wenet · 3,687

sh-lee-prml/hierspeechpp · 1,070

See all 22 implementations

Tasks

Quantization

Self-Supervised Learning

Speech Recognition

Datasets

LibriSpeech • Libri-Light • TIMIT

Results from the Paper

Ranked #1 on Speech Recognition on TIMIT (using extra training data)
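
Most results for this paper are reported as Word Error Rate (WER). As a reminder of what that number measures, here is a minimal sketch using the standard definition: the word-level edit distance (substitutions, deletions, insertions) between reference and hypothesis, divided by the number of reference words. It assumes whitespace tokenization and returns a percentage; the function name is hypothetical, and published scores typically also depend on text normalization that is not shown here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER in percent: word-level edit distance / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return 100.0 * prev[-1] / len(ref)

# One substituted word out of four reference words.
example_wer = word_error_rate("a b c d", "a x c d")
```

Because insertions count as errors, WER computed this way can exceed 100 when the hypothesis is much longer than the reference.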

Methods

Absolute Position Encodings • Adam • BPE • Dense Connections • Dropout • GELU • Gumbel Softmax • Label Smoothing • Layer Normalization • Linear Layer • Multi-Head Attention • Position-Wise Feed-Forward Layer • Residual Connection • Scaled Dot-Product Attention • Softmax • Transformer
