no code implementations • 15 Apr 2024 • Shu-wen Yang, Heng-Jui Chang, Zili Huang, Andy T. Liu, Cheng-I Lai, Haibin Wu, Jiatong Shi, Xuankai Chang, Hsiang-Sheng Tsai, Wen-Chin Huang, Tzu-hsun Feng, Po-Han Chi, Yist Y. Lin, Yung-Sung Chuang, Tzu-Hsien Huang, Wei-Cheng Tseng, Kushal Lakhotia, Shang-Wen Li, Abdelrahman Mohamed, Shinji Watanabe, Hung-Yi Lee
In this work, we establish the Speech processing Universal PERformance Benchmark (SUPERB) to study the effectiveness of the self-supervised learning (SSL) paradigm for speech.
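SUPERB evaluates a frozen upstream model by training only a lightweight downstream head, typically over a learned weighted sum of the upstream's hidden layers. Below is a minimal sketch of that probing protocol; the class name, dimensions, and mean-pooling head are illustrative, not prescribed by the benchmark.

```python
import torch
import torch.nn as nn

class WeightedSumProbe(nn.Module):
    """Lightweight downstream head over a frozen SSL encoder.

    SUPERB-style protocol: the upstream stays frozen, and only a learned
    weighted sum of its hidden layers plus a small prediction head are
    trained. Dimensions here are placeholders, not benchmark-fixed values.
    """

    def __init__(self, num_layers: int, hidden_dim: int, num_classes: int):
        super().__init__()
        self.layer_weights = nn.Parameter(torch.zeros(num_layers))
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, hidden_states: list) -> torch.Tensor:
        # hidden_states: one (batch, time, hidden_dim) tensor per layer.
        w = torch.softmax(self.layer_weights, dim=0)
        fused = sum(wi * h for wi, h in zip(w, hidden_states))
        return self.head(fused.mean(dim=1))  # utterance-level prediction
```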
1 code implementation • 10 Feb 2024 • Hsuan-Fu Wang, Yi-Jen Shih, Heng-Jui Chang, Layne Berry, Puyuan Peng, Hung-Yi Lee, Hsin-Min Wang, David Harwath
Second, we propose a new hybrid architecture that merges the cascaded and parallel architectures of SpeechCLIP into a multi-task learning framework.
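As a rough illustration of such a multi-task setup, the sketch below combines two contrastive objectives, one per branch, against shared image embeddings. The symmetric InfoNCE form, the `alpha` weighting, and all names are assumptions of this sketch, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(speech_emb, image_emb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired speech/image embeddings."""
    speech_emb = F.normalize(speech_emb, dim=-1)
    image_emb = F.normalize(image_emb, dim=-1)
    logits = speech_emb @ image_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

def hybrid_loss(parallel_emb, cascaded_emb, image_emb, alpha=0.5):
    # Multi-task objective: the parallel and cascaded branches share the
    # speech encoder and are optimized jointly against the same images.
    return (alpha * contrastive_loss(parallel_emb, image_emb)
            + (1 - alpha) * contrastive_loss(cascaded_emb, image_emb))
```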
no code implementations • 15 Nov 2023 • Heng-Jui Chang, James Glass
This paper introduces Robust Spin (R-Spin), a data-efficient, domain-specific self-supervision method that produces speaker- and noise-invariant speech representations by learning discrete acoustic units with speaker-invariant clustering (Spin).
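A minimal sketch of the speaker-invariant clustering idea: frames from an utterance and a speaker-perturbed copy of it are softly assigned to a shared codebook of discrete units, and each view's assignment supervises the other. The swapped-prediction form and temperature below are assumptions of this sketch, not the exact R-Spin objective.

```python
import torch
import torch.nn.functional as F

def spin_style_loss(z_orig: torch.Tensor, z_pert: torch.Tensor,
                    codebook: torch.Tensor, temp: float = 0.1) -> torch.Tensor:
    """Swapped cluster prediction over a shared codebook.

    z_orig / z_pert: (frames, dim) features from the original utterance
    and its speaker-perturbed copy; codebook: (num_units, dim).
    """
    def assign(z: torch.Tensor) -> torch.Tensor:
        logits = F.normalize(z, dim=-1) @ F.normalize(codebook, dim=-1).t()
        return F.softmax(logits / temp, dim=-1)

    p_orig, p_pert = assign(z_orig), assign(z_pert)
    # Each view's (detached) soft assignment supervises the other view,
    # pushing both toward the same speaker-invariant acoustic units.
    loss = -(p_orig.detach() * torch.log(p_pert + 1e-8)).sum(-1).mean()
    loss = loss - (p_pert.detach() * torch.log(p_orig + 1e-8)).sum(-1).mean()
    return 0.5 * loss
```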
no code implementations • 14 Sep 2023 • Heng-Jui Chang, Ning Dong, Ruslan Mavlyutov, Sravya Popuri, Yu-An Chung
Large-scale self-supervised pre-trained speech encoders outperform conventional approaches in speech recognition and translation tasks.
1 code implementation • 18 May 2023 • Heng-Jui Chang, Alexander H. Liu, James Glass
Self-supervised speech representation models have succeeded in various tasks, but improving them for content-related problems using unlabeled data is challenging.
1 code implementation • NeurIPS 2023 • Alexander H. Liu, Heng-Jui Chang, Michael Auli, Wei-Ning Hsu, James R. Glass
In this paper, we introduce self-distillation and online clustering for self-supervised speech representation learning (DinoSR), which combines masked language modeling, self-distillation, and online clustering.
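Two of the three ingredients are easy to sketch: the teacher is an exponential moving average (EMA) of the student (self-distillation), and its features are quantized against an online codebook to yield discrete targets for the student's masked predictions. The decay value and shapes below are illustrative.

```python
import torch

@torch.no_grad()
def ema_update(teacher: torch.nn.Module, student: torch.nn.Module,
               decay: float = 0.999) -> None:
    # Self-distillation: teacher weights track an EMA of the student.
    for t, s in zip(teacher.parameters(), student.parameters()):
        t.mul_(decay).add_(s, alpha=1.0 - decay)

@torch.no_grad()
def cluster_targets(teacher_feats: torch.Tensor,
                    codebook: torch.Tensor) -> torch.Tensor:
    # Online clustering: each teacher frame is assigned its nearest
    # codeword; the student predicts these indices at masked positions.
    dists = torch.cdist(teacher_feats, codebook)  # (frames, num_codes)
    return dists.argmin(dim=-1)
```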
no code implementations • 2 Nov 2022 • Layne Berry, Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Hung-Yi Lee, David Harwath
This work investigates the use of large-scale, English-only pre-trained models (CLIP and HuBERT) for multilingual image-speech retrieval.
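At inference time, cross-modal retrieval of this kind reduces to nearest-neighbor search in a shared embedding space. A minimal sketch, assuming speech and image embeddings have already been projected into that space:

```python
import torch
import torch.nn.functional as F

def retrieve(speech_emb: torch.Tensor, image_embs: torch.Tensor, k: int = 5):
    """Rank images for one spoken query by cosine similarity.

    speech_emb: (dim,) embedding from a HuBERT-based speech branch;
    image_embs: (num_images, dim) embeddings from the CLIP image encoder.
    """
    sims = F.cosine_similarity(speech_emb.unsqueeze(0), image_embs)
    return sims.topk(k).indices  # indices of the top-k candidate images
```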
1 code implementation • 3 Oct 2022 • Yi-Jen Shih, Hsuan-Fu Wang, Heng-Jui Chang, Layne Berry, Hung-Yi Lee, David Harwath
Data-driven speech processing models usually perform well with a large amount of text supervision, but collecting transcribed speech data is costly.
1 code implementation • ACL 2022 • Hsiang-Sheng Tsai, Heng-Jui Chang, Wen-Chin Huang, Zili Huang, Kushal Lakhotia, Shu-wen Yang, Shuyan Dong, Andy T. Liu, Cheng-I Jeff Lai, Jiatong Shi, Xuankai Chang, Phil Hall, Hsuan-Jui Chen, Shang-Wen Li, Shinji Watanabe, Abdelrahman Mohamed, Hung-Yi Lee
In this paper, we introduce SUPERB-SG, a new benchmark focused on evaluating the semantic and generative capabilities of pre-trained models by increasing task diversity and difficulty over SUPERB.
no code implementations • 7 Oct 2021 • Liang-Hsuan Tseng, Yu-Kuan Fu, Heng-Jui Chang, Hung-Yi Lee
Code-switching (CS), in which more than one language is used within a sentence, is common in daily conversation.
1 code implementation • 5 Oct 2021 • Heng-Jui Chang, Shu-wen Yang, Hung-Yi Lee
Self-supervised speech representation learning methods like wav2vec 2.0 and Hidden-unit BERT (HuBERT) leverage unlabeled speech data for pre-training and offer good representations for numerous speech processing tasks.
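A minimal sketch of pulling per-layer representations from a pre-trained HuBERT with the HuggingFace transformers API, e.g. as targets for distilling a smaller student; the checkpoint name is a public example, not necessarily the model used above.

```python
import torch
from transformers import HubertModel, Wav2Vec2FeatureExtractor

# Load a pre-trained HuBERT encoder (illustrative public checkpoint).
extractor = Wav2Vec2FeatureExtractor.from_pretrained("facebook/hubert-base-ls960")
model = HubertModel.from_pretrained("facebook/hubert-base-ls960").eval()

waveform = torch.randn(16000)  # stand-in for 1 s of 16 kHz audio
inputs = extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

# Per-layer features, e.g. for a distilled student to regress against.
hidden_states = outputs.hidden_states  # tuple of (1, frames, 768) tensors
```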
no code implementations • 6 Apr 2021 • Shun-Po Chuang, Heng-Jui Chang, Sung-Feng Huang, Hung-Yi Lee
Mandarin-English code-switching (CS) is frequently used among speakers in East and Southeast Asia.
no code implementations • 4 Apr 2021 • Heng-Jui Chang, Hung-Yi Lee, Lin-shan Lee
We can collect new data describing the new environment and fine-tune the system on it, but fine-tuning naturally degrades performance on the earlier datasets, a phenomenon referred to as catastrophic forgetting.
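One common mitigation, in the spirit of lifelong learning, is to regularize fine-tuning so the adapted model stays close to the previous one, e.g. by distilling from the frozen old model while learning the new domain. A sketch under that assumption (loss weight and temperature are illustrative hyperparameters):

```python
import torch
import torch.nn.functional as F

def lifelong_loss(new_logits, old_logits, labels, lam=0.5, T=2.0):
    """Fine-tuning loss with a distillation regularizer.

    Cross-entropy on the new-domain labels plus a KL term pulling the
    model toward the frozen old model's frame-level distributions,
    discouraging drift on the earlier domains.
    Shapes: logits (batch, time, classes), labels (batch, time).
    """
    ce = F.cross_entropy(new_logits.flatten(0, 1), labels.flatten())
    kd = F.kl_div(F.log_softmax(new_logits / T, dim=-1),
                  F.softmax(old_logits.detach() / T, dim=-1),
                  reduction="batchmean") * T * T
    return ce + lam * kd
```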
no code implementations • 5 May 2020 • Heng-Jui Chang, Alexander H. Liu, Hung-Yi Lee, Lin-shan Lee
Whispering is an important mode of human speech, but no end-to-end recognition results for it have been reported yet, probably due to the scarcity of available whispered speech data.