Search Results for author: Heng-Jui Chang

Found 14 papers, 6 papers with code

R-Spin: Efficient Speaker and Noise-invariant Representation Learning with Acoustic Pieces

no code implementations • 15 Nov 2023 • Heng-Jui Chang, James Glass

This paper introduces Robust Spin (R-Spin), a data-efficient, domain-specific self-supervision method that learns speaker- and noise-invariant speech representations through discrete acoustic units obtained with speaker-invariant clustering (Spin).

Clustering • Representation Learning
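The R-Spin entry above (and the Spin paper it builds on, listed next) centers on learning discrete acoustic units with speaker-invariant clustering. Below is a minimal PyTorch sketch of that general idea only; the random "encoder features", the speaker perturbation, and the hard-assignment swapped-prediction targets are simplified stand-ins, not the papers' implementation.

```python
# Minimal sketch of speaker-invariant clustering. Random tensors stand in
# for frame-level features of an original and a speaker-perturbed view of
# the same utterance; only the swapped cluster-prediction signal is shown.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_codes, dim = 64, 256
codebook = torch.nn.Parameter(torch.randn(num_codes, dim))  # learnable code prototypes

def cluster_logits(feats):
    # Cosine similarity between L2-normalized frame features and prototypes.
    return F.normalize(feats, dim=-1) @ F.normalize(codebook, dim=-1).T

feats_orig = torch.randn(100, dim)        # (frames, dim), original audio
feats_perturbed = torch.randn(100, dim)   # same content, perturbed speaker

logits_a = cluster_logits(feats_orig)
logits_b = cluster_logits(feats_perturbed)

# Swapped prediction: each view must predict the other view's cluster
# assignment, pushing the discrete units to ignore speaker identity.
# (The real Spin objective uses smoothed, full-batch target distributions;
# hard argmax targets are a simplification.)
targets_a = logits_a.argmax(dim=-1)
targets_b = logits_b.argmax(dim=-1)
loss = F.cross_entropy(logits_a, targets_b) + F.cross_entropy(logits_b, targets_a)
loss.backward()
print(f"swapped-prediction loss: {loss.item():.3f}")
```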

Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering

1 code implementation • 18 May 2023 • Heng-Jui Chang, Alexander H. Liu, James Glass

Self-supervised speech representation models have succeeded in various tasks, but improving them for content-related problems using unlabeled data is challenging.

Acoustic Unit Discovery • Clustering +3

DinoSR: Self-Distillation and Online Clustering for Self-supervised Speech Representation Learning

1 code implementation • NeurIPS 2023 • Alexander H. Liu, Heng-Jui Chang, Michael Auli, Wei-Ning Hsu, James R. Glass

In this paper, we introduce self-distillation and online clustering for self-supervised speech representation learning (DinoSR), which combines masked language modeling, self-distillation, and online clustering.

Clustering • Language Modelling +3
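A hedged sketch of the training signal described in the DinoSR entry above: an EMA teacher encodes the unmasked input, its features are assigned to codewords maintained by online clustering, and the student predicts those codes at masked frames. The linear "encoders", the masking, and the EMA constants below are stand-ins of mine, not the paper's architecture.

```python
# Stand-in encoders and masking; only the DinoSR-style training signal
# (EMA teacher + online codebook + masked code prediction) is sketched.
import copy
import torch
import torch.nn.functional as F

torch.manual_seed(0)
dim, num_codes, frames = 128, 32, 50

student = torch.nn.Linear(dim, dim)          # stand-in for the student encoder
teacher = copy.deepcopy(student)             # EMA copy of the student, no gradients
for p in teacher.parameters():
    p.requires_grad_(False)
head = torch.nn.Linear(dim, num_codes)       # student's code-prediction head
codebook = F.normalize(torch.randn(num_codes, dim), dim=-1)

x = torch.randn(frames, dim)                 # one utterance's frame-level inputs
mask = torch.rand(frames) < 0.5              # frames hidden from the student

with torch.no_grad():
    t_feat = F.normalize(teacher(x), dim=-1)
    codes = (t_feat @ codebook.T).argmax(dim=-1)   # nearest-codeword targets
    # Online clustering: nudge each used codeword toward the mean teacher
    # feature assigned to it (a simplified EMA codebook update).
    for c in codes.unique():
        codebook[c] = F.normalize(
            0.9 * codebook[c] + 0.1 * t_feat[codes == c].mean(0), dim=0)

s_input = x.clone()
s_input[mask] = 0.0                           # crude masking stand-in
loss = F.cross_entropy(head(student(s_input))[mask], codes[mask])
loss.backward()
print(f"masked code-prediction loss: {loss.item():.3f}")

# Self-distillation: the teacher's weights slowly track the student's.
with torch.no_grad():
    for tp, sp in zip(teacher.parameters(), student.parameters()):
        tp.mul_(0.999).add_(sp, alpha=0.001)
```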

M-SpeechCLIP: Leveraging Large-Scale, Pre-Trained Models for Multilingual Speech to Image Retrieval

no code implementations • 2 Nov 2022 • Layne Berry, Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Hung-Yi Lee, David Harwath

This work investigates the use of large-scale, English-only pre-trained models (CLIP and HuBERT) for multilingual image-speech retrieval.

Image Retrieval • Retrieval +1

SpeechCLIP: Integrating Speech with Pre-Trained Vision and Language Model

1 code implementation • 3 Oct 2022 • Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Layne Berry, Hung-Yi Lee, David Harwath

Data-driven speech processing models usually perform well with a large amount of text supervision, but collecting transcribed speech data is costly.

Language Modelling • Retrieval +1
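Both SpeechCLIP entries above rely on placing speech embeddings in a frozen CLIP image-embedding space and ranking images by cosine similarity. The sketch below shows only that retrieval step; random tensors stand in for pooled HuBERT speech features and CLIP image features, and the linear projection is a hypothetical placeholder for the papers' learned speech branch.

```python
# Random stand-ins for pooled HuBERT speech embeddings and CLIP image
# embeddings; only the projection-and-ranking retrieval step is shown.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
speech_dim, clip_dim, n_images = 768, 512, 1000

speech_emb = torch.randn(4, speech_dim)                            # 4 query utterances
image_emb = F.normalize(torch.randn(n_images, clip_dim), dim=-1)   # candidate images

projector = torch.nn.Linear(speech_dim, clip_dim)   # hypothetical speech-to-CLIP projection
queries = F.normalize(projector(speech_emb), dim=-1)

scores = queries @ image_emb.T                      # cosine similarity, shape (4, n_images)
top5 = scores.topk(k=5, dim=-1).indices             # 5 best-matching images per utterance
print(top5)

# Retrieval is typically scored with recall@k against ground-truth pairs;
# with random stand-ins this is near chance and only shows the computation.
gold = torch.arange(4)                               # hypothetical true pairings
recall_at_1 = (scores.argmax(dim=-1) == gold).float().mean()
print(f"recall@1 = {recall_at_1:.2f}")
```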

DistilHuBERT: Speech Representation Learning by Layer-wise Distillation of Hidden-unit BERT

1 code implementation • 5 Oct 2021 • Heng-Jui Chang, Shu-wen Yang, Hung-Yi Lee

Self-supervised speech representation learning methods like wav2vec 2.0 and Hidden-unit BERT (HuBERT) leverage unlabeled speech data for pre-training and offer good representations for numerous speech processing tasks.

Multi-Task Learning • Representation Learning
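A minimal sketch of layer-wise distillation in the spirit of the DistilHuBERT entry above: a small student produces one shared representation, and separate prediction heads regress selected hidden layers of a frozen teacher. Random tensors stand in for the teacher's hidden states, and the loss is a simplified version of the paper's combined L1 and cosine objective.

```python
# Random tensors stand in for the frozen HuBERT teacher's hidden states;
# the student and its prediction heads are deliberately tiny.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
frames, dim = 100, 768
target_layers = [4, 8, 12]        # teacher layers the student learns to predict

student = torch.nn.Linear(dim, dim)                                    # stand-in student encoder
heads = torch.nn.ModuleList([torch.nn.Linear(dim, dim) for _ in target_layers])

x = torch.randn(frames, dim)                                           # input features
teacher_hidden = {l: torch.randn(frames, dim) for l in target_layers}  # frozen teacher outputs

shared = student(x)               # one shared student representation
loss = 0.0
for head, layer in zip(heads, target_layers):
    pred, target = head(shared), teacher_hidden[layer]
    # Distance plus similarity terms, simplified from the paper's objective.
    loss = loss + F.l1_loss(pred, target) - F.cosine_similarity(pred, target, dim=-1).mean()
loss.backward()
print(f"layer-wise distillation loss: {loss.item():.3f}")
```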

Towards Lifelong Learning of End-to-end ASR

no code implementations • 4 Apr 2021 • Heng-Jui Chang, Hung-Yi Lee, Lin-shan Lee

We can collect new data describing the new environment and fine-tune the system, but this naturally leads to higher error rates on the earlier datasets, a phenomenon referred to as catastrophic forgetting.

Automatic Speech Recognition (ASR) +1
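As a generic illustration of the catastrophic forgetting described in the entry above (a toy regression, not the paper's ASR models or its lifelong-learning approach), the sketch below shows how fitting a model to an "old environment" and then fine-tuning it only on a "new environment" sharply degrades its error on the old data.

```python
# Toy linear-regression illustration of catastrophic forgetting; not the
# paper's ASR setup or its proposed lifelong-learning method.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Linear(10, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

# Two "environments" governed by different underlying mappings.
w_old, w_new = torch.randn(10, 1), torch.randn(10, 1)
x_old, x_new = torch.randn(256, 10), torch.randn(256, 10)
y_old, y_new = x_old @ w_old, x_new @ w_new

def fit(x, y, steps=300):
    for _ in range(steps):
        opt.zero_grad()
        F.mse_loss(model(x), y).backward()
        opt.step()

fit(x_old, y_old)                                      # train on the old environment
loss_before = F.mse_loss(model(x_old), y_old).item()
fit(x_new, y_new)                                      # naive fine-tuning on new data only
loss_after = F.mse_loss(model(x_old), y_old).item()
print(f"old-environment loss: {loss_before:.4f} -> {loss_after:.4f}")  # rises sharply
```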

End-to-end Whispered Speech Recognition with Frequency-weighted Approaches and Pseudo Whisper Pre-training

no code implementations • 5 May 2020 • Heng-Jui Chang, Alexander H. Liu, Hung-Yi Lee, Lin-shan Lee

Whispering is an important mode of human speech, but no end-to-end recognition results for it have been reported yet, probably due to the scarcity of available whispered speech data.

Speech Recognition +1
