no code implementations • 20 Feb 2024 • Haibin Wu, Xuanjun Chen, Yi-Cheng Lin, Kai-Wei Chang, Ho-Lam Chung, Alexander H. Liu, Hung-Yi Lee
Neural audio codecs are initially introduced to compress audio data into compact codes to reduce transmission latency.
1 code implementation • 20 Feb 2024 • Haibin Wu, Ho-Lam Chung, Yi-Cheng Lin, Yuan-Kuei Wu, Xuanjun Chen, Yu-Chi Pai, Hsiu-Hsuan Wang, Kai-Wei Chang, Alexander H. Liu, Hung-Yi Lee
The sound codec's dual roles in minimizing data transmission latency and serving as tokenizers underscore its critical importance.
2 code implementations • 27 Oct 2022 • Xuanjun Chen, Haibin Wu, Chung-Che Wang, Hung-Yi Lee, Jyh-Shing Roger Jang
This paper proposed an MTDVocaLiST model, which is trained by our proposed multimodal Transformer distillation (MTD) loss.
no code implementations • 3 Oct 2022 • Xuanjun Chen, Haibin Wu, Helen Meng, Hung-Yi Lee, Jyh-Shing Roger Jang
Audio-visual active speaker detection (AVASD) is well-developed, and now is an indispensable front-end for several multi-modal applications.
Adversarial Robustness Audio-Visual Active Speaker Detection
no code implementations • 31 Mar 2022 • Yen-Lun Liao, Xuanjun Chen, Chung-Che Wang, Jyh-Shing Roger Jang
The countermeasure (CM) model is developed to protect ASV systems from spoof attacks and prevent resulting personal information leakage in Automatic Speaker Verification (ASV) system.