Acoustic Unit Discovery

9 papers with code • 1 benchmark • 0 datasets

Most implemented papers

Unsupervised speech representation learning using WaveNet autoencoders

bshall/ZeroSpeech 25 Jan 2019

We consider the task of unsupervised extraction of meaningful latent representations of speech by applying autoencoding neural networks to speech waveforms.

Word Segmentation on Discovered Phone Units with Dynamic Programming and Self-Supervised Scoring

kamperh/vqwordseg 24 Feb 2022

This paper revisits an older approach to word segmentation: bottom-up phone-like unit discovery is performed first, and symbolic word segmentation is then performed on top of the discovered units (without influencing the lower level).
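
The second stage described above, segmenting a symbolic unit sequence with dynamic programming, can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function `segment`, the `max_len` limit, the `KNOWN` lexicon, and the scorer `toy_cost` are all hypothetical, and the toy scorer stands in for the self-supervised scoring named in the title.

```python
from typing import Callable, List, Sequence, Tuple

Word = Tuple[str, ...]


def segment(units: Sequence[str],
            cost: Callable[[Word], float],
            max_len: int = 4) -> List[Word]:
    """Split `units` into words with minimum total cost (Viterbi-style DP)."""
    n = len(units)
    best = [0.0] + [float("inf")] * n   # best[i]: cheapest cost of units[:i]
    back = [0] * (n + 1)                # back[i]: start of the last word ending at i
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            c = best[start] + cost(tuple(units[start:end]))
            if c < best[end]:
                best[end] = c
                back[end] = start
    # Follow the back-pointers from the end to recover the word boundaries.
    words: List[Word] = []
    end = n
    while end > 0:
        start = back[end]
        words.append(tuple(units[start:end]))
        end = start
    words.reverse()
    return words


# Hypothetical scorer: known "words" are cheap, anything else pays per unit.
KNOWN = {("a", "b"), ("c", "d", "e")}


def toy_cost(word: Word) -> float:
    return 1.0 if word in KNOWN else 2.0 * len(word)


result = segment(["a", "b", "c", "d", "e"], toy_cost)
```

Because the segmentation only reads the unit sequence, it cannot change the units themselves, which matches the "without influencing the lower level" remark.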

Vector-quantized neural networks for acoustic unit discovery in the ZeroSpeech 2020 challenge

bshall/ZeroSpeech 19 May 2020

The idea is to learn a representation of speech by predicting future acoustic units.
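
The vector-quantization step named in the title can be illustrated with a nearest-codebook lookup. This is only a sketch under assumptions: the function `quantize`, the toy `codebook`, and the toy `frames` are hypothetical, and the predictive training that the sentence above describes is not shown.

```python
from typing import List, Sequence


def quantize(frames: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each frame to the index of its nearest codebook vector."""
    def sqdist(a: Sequence[float], b: Sequence[float]) -> float:
        # Squared Euclidean distance between two vectors of equal length.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda k: sqdist(frame, codebook[k]))
        for frame in frames
    ]


# Toy values, chosen for illustration only.
codebook = [[0.0, 0.0], [1.0, 1.0]]
frames = [[0.1, -0.2], [0.9, 1.2], [0.4, 0.4]]
unit_ids = quantize(frames, codebook)
```

Each index in `unit_ids` plays the role of a discrete acoustic unit for the corresponding frame.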

Bayesian Subspace Hidden Markov Model for Acoustic Unit Discovery

beer-asr/beer 8 Apr 2019

This work tackles the problem of learning a set of language specific acoustic units from unlabeled speech recordings given a set of labeled recordings from other languages.

Unsupervised Acoustic Unit Discovery by Leveraging a Language-Independent Subword Discriminative Feature Representation

syfengcuhk/mboshi 2 Apr 2021

In the first stage, a recently proposed method in the task of unsupervised subword modeling is improved by replacing a monolingual out-of-domain (OOD) ASR system with a multilingual one to create a subword-discriminative representation that is more language-independent.

Analyzing Speaker Information in Self-Supervised Models to Improve Zero-Resource Speech Processing

bshall/cpc 2 Aug 2021

In this paper, we first show that the per-utterance mean of CPC features captures speaker information to a large extent.
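
One plausible use of that observation, assumed here rather than taken from the paper, is to subtract the per-utterance mean from every frame. The sketch below uses plain Python lists as stand-ins for CPC features; the names `utterance_mean` and `remove_utterance_mean` are hypothetical.

```python
from typing import List

Frame = List[float]


def utterance_mean(features: List[Frame]) -> Frame:
    """Average the frames of one utterance, dimension by dimension."""
    if not features:
        raise ValueError("utterance has no frames")
    n = len(features)
    return [sum(column) / n for column in zip(*features)]


def remove_utterance_mean(features: List[Frame]) -> List[Frame]:
    """Subtract the per-utterance mean from every frame."""
    mean = utterance_mean(features)
    return [[x - m for x, m in zip(frame, mean)] for frame in features]


# Toy utterance with two frames of two dimensions each.
feats = [[1.0, 4.0], [3.0, 8.0]]
centered = remove_utterance_mean(feats)
```

After centering, the mean of each dimension across the utterance is zero, so whatever the mean encoded is no longer present in the frames.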

Variable-rate hierarchical CPC leads to acoustic unit discovery in speech

chorowski-lab/hcpc 5 Jun 2022

The success of deep learning comes from its ability to capture the hierarchical structure of data by learning high-level representations defined in terms of low-level ones.

Bootstrapping meaning through listening: Unsupervised learning of spoken sentence embeddings

lingjzhu/spoken_sent_embedding 23 Oct 2022

Inducing semantic representations directly from speech signals is a highly challenging task but has many useful applications in speech mining and spoken language understanding.

Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering

vectominist/spin 18 May 2023

Self-supervised speech representation models have succeeded in various tasks, but improving them for content-related problems using unlabeled data is challenging.