no code implementations • 9 Mar 2024 • Qu Yang, Qianhui Liu, Nan Li, Meng Ge, Zeyang Song, Haizhou Li
Spiking Neural Networks (SNNs) are known to be biologically plausible and power-efficient.
no code implementations • 18 Jan 2024 • Qiquan Zhang, Meng Ge, Hongxu Zhu, Eliathamby Ambikairajah, Qi Song, Zhaoheng Ni, Haizhou Li
Transformer architecture has enabled recent progress in speech enhancement.
no code implementations • 26 Dec 2023 • Meng Ge, Yizhou Peng, Yidi Jiang, Jingru Lin, Junyi Ao, Mehmet Sinan Yildirim, Shuai Wang, Haizhou Li, Mengling Feng
This paper summarizes our team's efforts in both tracks of the ICMC-ASR Challenge for in-car multi-channel automatic speech recognition.
1 code implementation • 18 Dec 2023 • Rui Cao, Tianrui Wang, Meng Ge, Longbiao Wang, Jianwu Dang
By bridging speech enhancement and the Information Bottleneck principle in this letter, we rethink a universal plug-and-play strategy and propose a Refining Underlying Information framework, called RUI, to address these challenges in both theory and practice.
no code implementations • 8 Nov 2023 • Jingru Lin, Meng Ge, Wupeng Wang, Haizhou Li, Mengling Feng
Self-supervised pre-trained speech models have been shown to be effective for various downstream speech processing tasks.
no code implementations • 18 May 2023 • Yanjie Fu, Meng Ge, Honglong Wang, Nan Li, Haoran Yin, Longbiao Wang, Gaoyan Zhang, Jianwu Dang, Chengyun Deng, Fei Wang
Recently, neural beamformers have achieved striking improvements in multi-channel speech separation when direction information is available.
no code implementations • 7 Dec 2022 • Yanjie Fu, Haoran Yin, Meng Ge, Longbiao Wang, Gaoyan Zhang, Jianwu Dang, Chengyun Deng, Fei Wang
Recently, many deep-learning-based beamformers have been proposed for multi-channel speech separation.
no code implementations • 9 Oct 2022 • Junjie Li, Meng Ge, Zexu Pan, Longbiao Wang, Jianwu Dang
In the first stage, we pre-extract the target speech using visual cues and estimate the underlying phonetic sequence.
1 code implementation • 15 Jul 2022 • Haoran Yin, Meng Ge, Yanjie Fu, Gaoyan Zhang, Longbiao Wang, Lei Zhang, Lin Qiu, Jianwu Dang
These algorithms are usually realized by mapping the multi-channel audio input to a single output (i.e., the overall spatial pseudo-spectrum (SPS) of all sources), which is called MISO.
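As a minimal illustration of such a MISO-style output, a classic (non-learned) way to produce an overall spatial pseudo-spectrum is steered-response power with PHAT weighting for a two-microphone pair; this is an assumed baseline sketch, not the paper's model, and the geometry (mic spacing, angle grid) is illustrative:

```python
import numpy as np

def spatial_pseudo_spectrum(x, mic_dist, fs, c=343.0):
    """SRP-PHAT for a 2-mic pair: map the multi-channel input to a single
    overall spatial pseudo-spectrum (SPS) whose peaks mark candidate
    source directions (illustrative baseline, not a learned model)."""
    angles = np.arange(0, 181, 2)                  # candidate DOAs in degrees
    n = x.shape[1]
    X = np.fft.rfft(x, axis=1)                     # (2, F) spectra
    f = np.fft.rfftfreq(n, 1.0 / fs)
    cross = X[0] * np.conj(X[1])
    cross /= np.abs(cross) + 1e-12                 # PHAT: keep phase only
    sps = np.empty(len(angles))
    for i, ang in enumerate(angles):
        tau = mic_dist * np.cos(np.radians(ang)) / c   # candidate TDOA
        steer = np.exp(-1j * 2 * np.pi * f * tau)
        sps[i] = np.real(np.sum(cross * steer))    # steered response power
    return angles, sps

# Toy demo: white-noise source, mic 1 lags mic 0 by 4 samples.
rng = np.random.default_rng(0)
x0 = rng.standard_normal(4096)
x1 = np.roll(x0, 4)                                # exact circular delay
angles, sps = spatial_pseudo_spectrum(np.stack([x0, x1]),
                                      mic_dist=0.1, fs=16000)
est_doa = angles[np.argmax(sps)]                   # peak of the SPS
```

A learned MISO model replaces this hand-crafted mapping with a network, but the input/output contract (multi-channel audio in, one SPS over candidate angles out) is the same.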
no code implementations • 29 Jun 2022 • Tongtong Song, Qiang Xu, Meng Ge, Longbiao Wang, Hao Shi, Yongjie Lv, Yuqin Lin, Jianwu Dang
The dual-encoder structure successfully utilizes two language-specific encoders (LSEs) for code-switching speech recognition.
no code implementations • 28 Jun 2022 • Di Jin, Rui Wang, Meng Ge, Dongxiao He, Xiang Li, Wei Lin, Weixiong Zhang
Due to the homophily assumption of the Graph Convolutional Networks (GCNs) that these methods use, they are not suitable for heterophilic graphs, where nodes with different labels or dissimilar attributes tend to be adjacent.
2 code implementations • 24 Jun 2022 • Yanjie Fu, Meng Ge, Haoran Yin, Xinyuan Qian, Longbiao Wang, Gaoyan Zhang, Jianwu Dang
Sound source localization aims to seek the direction of arrival (DOA) of all sound sources from the observed multi-channel audio.
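For a single source and a single microphone pair, the DOA can be recovered from the time difference of arrival (TDOA). A hedged sketch using the classic GCC-PHAT estimator follows; the mic spacing and sampling rate are illustrative assumptions, and this is a textbook baseline rather than the paper's approach:

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_shift=None):
    """Estimate the delay of `sig` relative to `ref` (in seconds) from the
    phase of the cross-spectrum (GCC-PHAT, circular correlation)."""
    n = len(sig)
    R = np.fft.rfft(sig) * np.conj(np.fft.rfft(ref))
    R /= np.abs(R) + 1e-12                          # PHAT: phase only
    cc = np.fft.irfft(R, n=n)
    if max_shift is None:
        max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs

def doa_from_tdoa(tau, mic_dist, c=343.0):
    """Convert a TDOA into a DOA angle (degrees) for a 2-mic pair."""
    return np.degrees(np.arcsin(np.clip(tau * c / mic_dist, -1.0, 1.0)))

# Toy demo: mic y receives the source 5 samples later than mic x.
fs = 16000
rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
y = np.roll(x, 5)                                   # exact circular delay
tau = gcc_phat(y, x, fs)                            # ~5 / 16000 seconds
angle = doa_from_tdoa(tau, mic_dist=0.2)
```

With multiple simultaneous sources, a single TDOA peak no longer suffices, which is what motivates spectrum-style outputs and learned localizers.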
1 code implementation • 31 Mar 2022 • Zexu Pan, Meng Ge, Haizhou Li
We propose a hybrid continuity loss function for time-domain speaker extraction algorithms to settle the over-suppression problem.
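The paper's exact loss is not reproduced here. As a hedged sketch of the general idea only, one can pair a negative SI-SDR fidelity term with a penalty on frames where the reference is active but the estimate is near-silent, which is the symptom of over-suppression; every name, threshold, and weight below is an illustrative assumption:

```python
import numpy as np

def si_sdr(est, ref, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB."""
    ref_zm = ref - ref.mean()
    est_zm = est - est.mean()
    proj = (est_zm @ ref_zm) / (ref_zm @ ref_zm + eps) * ref_zm
    noise = est_zm - proj
    return 10 * np.log10((proj @ proj) / (noise @ noise + eps) + eps)

def hybrid_loss(est, ref, frame=512, active_db=-40.0, alpha=0.1):
    """Illustrative hybrid loss: -SI-SDR plus a per-frame penalty where the
    reference is active but the estimate's energy falls below it
    (a stand-in for a continuity / anti-over-suppression term)."""
    loss = -si_sdr(est, ref)
    n = len(ref) // frame
    r = ref[:n * frame].reshape(n, frame)
    e = est[:n * frame].reshape(n, frame)
    r_db = 10 * np.log10((r ** 2).mean(axis=1) + 1e-12)
    e_db = 10 * np.log10((e ** 2).mean(axis=1) + 1e-12)
    active = r_db > active_db
    deficit = np.clip(r_db[active] - e_db[active], 0, None)   # dB shortfall
    penalty = deficit.mean() if active.any() else 0.0
    return loss + alpha * penalty

# Toy demo: zeroing a segment of the target simulates over-suppression.
fs = 16000
t = np.arange(16000) / fs
ref = np.sin(2 * np.pi * 440 * t)
supp = ref.copy()
supp[4000:8000] = 0.0
l_clean = hybrid_loss(ref, ref)
l_supp = hybrid_loss(supp, ref)     # worse: fidelity drops AND penalty fires
```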
1 code implementation • 21 Feb 2022 • Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang, Haizhou Li
Speaker extraction aims to extract the target speaker's voice from a multi-talker speech mixture given an auxiliary reference utterance.
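To illustrate how an auxiliary reference disambiguates the target among talkers, here is a toy sketch in which a crude spectral embedding (a stand-in for a learned speaker encoder, and not the paper's architecture) selects, among candidate sources, the one most similar to the reference utterance:

```python
import numpy as np

def spectral_embedding(x, n_fft=512):
    """Crude utterance-level embedding: normalized log mean magnitude
    spectrum (a stand-in for a learned speaker encoder)."""
    frames = x[:len(x) // n_fft * n_fft].reshape(-1, n_fft)
    mag = np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0)
    emb = np.log(mag + 1e-8)
    return emb / np.linalg.norm(emb)

def pick_target(candidates, reference):
    """Return the index of the candidate most similar to the reference."""
    ref_emb = spectral_embedding(reference)
    sims = [spectral_embedding(c) @ ref_emb for c in candidates]
    return int(np.argmax(sims))

# Toy demo: a low-pitched and a high-pitched "speaker"; the reference is a
# different utterance from the low-pitched one.
fs = 16000
t = np.arange(16000) / fs
rng = np.random.default_rng(1)
spk_a_est = np.sin(2 * np.pi * 220 * t + 0.4) + 0.01 * rng.standard_normal(len(t))
spk_b_est = np.sin(2 * np.pi * 2200 * t) + 0.01 * rng.standard_normal(len(t))
reference = np.sin(2 * np.pi * 220 * t + 1.1) + 0.01 * rng.standard_normal(len(t))
idx = pick_target([spk_b_est, spk_a_est], reference)   # selects spk_a_est
```

Real speaker extraction conditions the separation network itself on the reference embedding rather than selecting post hoc, but the role of the auxiliary utterance is the same.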
1 code implementation • 30 Sep 2021 • Zexu Pan, Meng Ge, Haizhou Li
The speaker extraction algorithm requires an auxiliary reference, such as a video recording or a pre-recorded speech, to form top-down auditory attention on the target speaker.
no code implementations • 19 Nov 2020 • Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang, Haizhou Li
Speaker extraction requires a sample speech from the target speaker as the reference.
no code implementations • 10 May 2020 • Meng Ge, Cheng-Lin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang, Haizhou Li
To eliminate such mismatch, we propose a complete time-domain speaker extraction solution called SpEx+.
Ranked #1 on Speech Extraction on WSJ0-2mix-extr