TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Automatic Speech Recognition	LibriSpeech test-clean	MonoBERT	Word Error Rate (WER)	3.2	# 1
Automatic Speech Recognition	LibriSpeech test-clean	PolyBERT	Word Error Rate (WER)	3.1	# 2
Automatic Speech Recognition	LibriSpeech test-other	MonoBERT	Word Error Rate (WER)	7.6	# 1
Automatic Speech Recognition	LibriSpeech test-other	PolyBERT	Word Error Rate (WER)	7.3	# 2

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/pushing-the-limits-of-unsupervised-unit/automatic-speech-recognition-on-librispeech-10)](https://paperswithcode.com/sota/automatic-speech-recognition-on-librispeech-10?p=pushing-the-limits-of-unsupervised-unit)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/pushing-the-limits-of-unsupervised-unit/automatic-speech-recognition-on-librispeech-11)](https://paperswithcode.com/sota/automatic-speech-recognition-on-librispeech-11?p=pushing-the-limits-of-unsupervised-unit)`

Pushing the Limits of Unsupervised Unit Discovery for SSL Speech Representation

15 Jun 2023 · Ziyang Ma, Zhisheng Zheng, Guanrou Yang, Yu Wang, Chao Zhang, Xie Chen

The strong generalization ability of self-supervised learning (SSL) for speech foundation models has attracted significant attention. HuBERT is a successful example that uses offline clustering to convert speech features into discrete units for a masked language modeling pretext task. However, simply using k-means clusters of features as targets does not fully unlock the model's performance. In this work, we present an unsupervised method to improve SSL targets. We propose two models, MonoBERT and PolyBERT, which leverage context-independent and context-dependent phoneme-based units, respectively, for pre-training. Our models significantly outperform other SSL models on the LibriSpeech benchmark without the need for iterative re-clustering and re-training. Furthermore, our models equipped with context-dependent units even outperform target-improvement models that use labeled data during pre-training. Experiments demonstrate how we progressively improve the unit discovery process.
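As a rough illustration of the HuBERT-style baseline the abstract refers to (offline k-means clustering that turns frame-level features into discrete target units), here is a minimal NumPy sketch. It is not the MonoBERT or PolyBERT method. The function name `kmeans_units`, the 13-dimensional random features, and the choice of 8 units are all assumptions made for illustration.

```python
import numpy as np


def kmeans_units(features, num_units, num_iters=20, seed=0):
    """Cluster frame-level features; return (unit_ids, centroids).

    Hypothetical helper: plain Lloyd's k-means, used only to show how
    continuous frames can be mapped to discrete unit IDs.
    """
    rng = np.random.default_rng(seed)
    # Initialise centroids from randomly chosen distinct frames.
    init_idx = rng.choice(len(features), size=num_units, replace=False)
    centroids = features[init_idx].astype(np.float64)

    for _ in range(num_iters):
        # Squared Euclidean distance from every frame to every centroid.
        dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
        unit_ids = dists.argmin(axis=1)
        for k in range(num_units):
            members = features[unit_ids == k]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[k] = members.mean(axis=0)

    # Final assignment against the updated centroids.
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1), centroids


# Toy stand-in for speech features: 200 frames x 13 dimensions (assumed sizes).
toy_rng = np.random.default_rng(0)
frames = toy_rng.normal(size=(200, 13))
units, centroids = kmeans_units(frames, num_units=8)
```

Each entry of `units` is the discrete label of one frame; in a masked-prediction setup such labels would serve as the targets for masked positions.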

Code

yanghaha0908/fasthubert

Tasks

Automatic Speech Recognition

Clustering

Language Modelling

Masked Language Modeling

Self-Supervised Learning

Speech Recognition

Datasets

LibriSpeech

Results from the Paper

Ranked #1 on Automatic Speech Recognition on LibriSpeech test-other
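For readers unfamiliar with the Word Error Rate (WER) metric reported for these results, the sketch below computes it as the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. It assumes the leaderboard values are percentages, and the function name `word_error_rate` and the example sentences are hypothetical. Real evaluation pipelines typically also normalize text before scoring, which is omitted here.

```python
def word_error_rate(reference, hypothesis):
    """Return WER in percent between two whitespace-tokenized strings."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
example_wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Lower values are better: a WER of 0 means the hypothesis matches the reference word for word.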
