no code implementations • 9 Mar 2024 • Qu Yang, Qianhui Liu, Nan Li, Meng Ge, Zeyang Song, Haizhou Li
Spiking Neural Networks (SNNs) are known to be biologically plausible and power-efficient.
no code implementations • 18 Jan 2024 • Qiquan Zhang, Meng Ge, Hongxu Zhu, Eliathamby Ambikairajah, Qi Song, Zhaoheng Ni, Haizhou Li
Transformer architecture has enabled recent progress in speech enhancement.
no code implementations • 26 Dec 2023 • Meng Ge, Yizhou Peng, Yidi Jiang, Jingru Lin, Junyi Ao, Mehmet Sinan Yildirim, Shuai Wang, Haizhou Li, Mengling Feng
This paper summarizes our team's efforts in both tracks of the ICMC-ASR Challenge for in-car multi-channel automatic speech recognition.
1 code implementation • 18 Dec 2023 • Rui Cao, Tianrui Wang, Meng Ge, Longbiao Wang, Jianwu Dang
By bridging speech enhancement and the Information Bottleneck principle in this letter, we rethink a universal plug-and-play strategy and propose a Refining Underlying Information framework, called RUI, to address these challenges in both theory and practice.
no code implementations • 8 Nov 2023 • Jingru Lin, Meng Ge, Wupeng Wang, Haizhou Li, Mengling Feng
Self-supervised pre-trained speech models have been shown to be effective for various downstream speech processing tasks.
no code implementations • 18 May 2023 • Yanjie Fu, Meng Ge, Honglong Wang, Nan Li, Haoran Yin, Longbiao Wang, Gaoyan Zhang, Jianwu Dang, Chengyun Deng, Fei Wang
Recently, neural beamformers have achieved striking improvements in multi-channel speech separation when direction information is available.
no code implementations • 7 Dec 2022 • Yanjie Fu, Haoran Yin, Meng Ge, Longbiao Wang, Gaoyan Zhang, Jianwu Dang, Chengyun Deng, Fei Wang
Recently, many deep-learning-based beamformers have been proposed for multi-channel speech separation.
no code implementations • 9 Oct 2022 • Junjie Li, Meng Ge, Zexu Pan, Longbiao Wang, Jianwu Dang
In the first stage, we pre-extract the target speech using visual cues and estimate the underlying phonetic sequence.
1 code implementation • 15 Jul 2022 • Haoran Yin, Meng Ge, Yanjie Fu, Gaoyan Zhang, Longbiao Wang, Lei Zhang, Lin Qiu, Jianwu Dang
These algorithms are usually realized by mapping the multi-channel audio input to a single output (i.e., the overall spatial pseudo-spectrum (SPS) of all sources), which is called MISO.
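As a minimal illustration of such a MISO-style output, a classic (non-learned) way to produce an overall spatial pseudo-spectrum is steered-response power with PHAT weighting for a two-microphone pair; this is an assumed baseline sketch, not the paper's model, and the geometry (mic spacing, angle grid) is illustrative:

```python
import numpy as np

def spatial_pseudo_spectrum(x, mic_dist, fs, c=343.0):
    """SRP-PHAT for a 2-mic pair: map the multi-channel input to a single
    overall spatial pseudo-spectrum (SPS) whose peaks mark candidate
    source directions (illustrative baseline, not a learned model)."""
    angles = np.arange(0, 181, 2)                  # candidate DOAs in degrees
    n = x.shape[1]
    X = np.fft.rfft(x, axis=1)                     # (2, F) spectra
    f = np.fft.rfftfreq(n, 1.0 / fs)
    cross = X[0] * np.conj(X[1])
    cross /= np.abs(cross) + 1e-12                 # PHAT: keep phase only
    sps = np.empty(len(angles))
    for i, ang in enumerate(angles):
        tau = mic_dist * np.cos(np.radians(ang)) / c   # candidate TDOA
        steer = np.exp(-1j * 2 * np.pi * f * tau)
        sps[i] = np.real(np.sum(cross * steer))    # steered response power
    return angles, sps

# Toy demo: white-noise source, mic 1 lags mic 0 by 4 samples.
rng = np.random.default_rng(0)
x0 = rng.standard_normal(4096)
x1 = np.roll(x0, 4)                                # exact circular delay
angles, sps = spatial_pseudo_spectrum(np.stack([x0, x1]),
                                      mic_dist=0.1, fs=16000)
est_doa = angles[np.argmax(sps)]                   # peak of the SPS
```

A learned MISO model replaces this hand-crafted mapping with a network, but the input/output contract (multi-channel audio in, one SPS over candidate angles out) is the same.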
no code implementations • 29 Jun 2022 • Tongtong Song, Qiang Xu, Meng Ge, Longbiao Wang, Hao Shi, Yongjie Lv, Yuqin Lin, Jianwu Dang
The dual-encoder structure successfully utilizes two language-specific encoders (LSEs) for code-switching speech recognition.
no code implementations • 28 Jun 2022 • Di Jin, Rui Wang, Meng Ge, Dongxiao He, Xiang Li, Wei Lin, Weixiong Zhang
Due to the homophily assumption of the Graph Convolutional Networks (GCNs) that these methods use, they are not suitable for heterophilic graphs, where nodes with different labels or dissimilar attributes tend to be adjacent.
2 code implementations • 24 Jun 2022 • Yanjie Fu, Meng Ge, Haoran Yin, Xinyuan Qian, Longbiao Wang, Gaoyan Zhang, Jianwu Dang
Sound source localization aims to seek the direction of arrival (DOA) of all sound sources from the observed multi-channel audio.
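For a single source and a single microphone pair, the DOA can be recovered from the time difference of arrival (TDOA). A hedged sketch using the classic GCC-PHAT estimator follows; the mic spacing and sampling rate are illustrative assumptions, and this is a textbook baseline rather than the paper's approach:

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_shift=None):
    """Estimate the delay of `sig` relative to `ref` (in seconds) from the
    phase of the cross-spectrum (GCC-PHAT, circular correlation)."""
    n = len(sig)
    R = np.fft.rfft(sig) * np.conj(np.fft.rfft(ref))
    R /= np.abs(R) + 1e-12                          # PHAT: phase only
    cc = np.fft.irfft(R, n=n)
    if max_shift is None:
        max_shift = n // 2
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    shift = np.argmax(np.abs(cc)) - max_shift
    return shift / fs

def doa_from_tdoa(tau, mic_dist, c=343.0):
    """Convert a TDOA into a DOA angle (degrees) for a 2-mic pair."""
    return np.degrees(np.arcsin(np.clip(tau * c / mic_dist, -1.0, 1.0)))

# Toy demo: mic y receives the source 5 samples later than mic x.
fs = 16000
rng = np.random.default_rng(0)
x = rng.standard_normal(4096)
y = np.roll(x, 5)                                   # exact circular delay
tau = gcc_phat(y, x, fs)                            # ~5 / 16000 seconds
angle = doa_from_tdoa(tau, mic_dist=0.2)
```

With multiple simultaneous sources, a single TDOA peak no longer suffices, which is what motivates spectrum-style outputs and learned localizers.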
1 code implementation • 31 Mar 2022 • Zexu Pan, Meng Ge, Haizhou Li
We propose a hybrid continuity loss function for time-domain speaker extraction algorithms to settle the over-suppression problem.
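The paper's exact loss is not reproduced here. As a hedged sketch of the general idea only, one can pair a negative SI-SDR fidelity term with a penalty on frames where the reference is active but the estimate is near-silent, which is the symptom of over-suppression; every name, threshold, and weight below is an illustrative assumption:

```python
import numpy as np

def si_sdr(est, ref, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB."""
    ref_zm = ref - ref.mean()
    est_zm = est - est.mean()
    proj = (est_zm @ ref_zm) / (ref_zm @ ref_zm + eps) * ref_zm
    noise = est_zm - proj
    return 10 * np.log10((proj @ proj) / (noise @ noise + eps) + eps)

def hybrid_loss(est, ref, frame=512, active_db=-40.0, alpha=0.1):
    """Illustrative hybrid loss: -SI-SDR plus a per-frame penalty where the
    reference is active but the estimate's energy falls below it
    (a stand-in for a continuity / anti-over-suppression term)."""
    loss = -si_sdr(est, ref)
    n = len(ref) // frame
    r = ref[:n * frame].reshape(n, frame)
    e = est[:n * frame].reshape(n, frame)
    r_db = 10 * np.log10((r ** 2).mean(axis=1) + 1e-12)
    e_db = 10 * np.log10((e ** 2).mean(axis=1) + 1e-12)
    active = r_db > active_db
    deficit = np.clip(r_db[active] - e_db[active], 0, None)   # dB shortfall
    penalty = deficit.mean() if active.any() else 0.0
    return loss + alpha * penalty

# Toy demo: zeroing a segment of the target simulates over-suppression.
fs = 16000
t = np.arange(16000) / fs
ref = np.sin(2 * np.pi * 440 * t)
supp = ref.copy()
supp[4000:8000] = 0.0
l_clean = hybrid_loss(ref, ref)
l_supp = hybrid_loss(supp, ref)     # worse: fidelity drops AND penalty fires
```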
1 code implementation • 21 Feb 2022 • Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang, Haizhou Li
Speaker extraction aims to extract the target speaker's voice from a multi-talker speech mixture given an auxiliary reference utterance.
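To illustrate how an auxiliary reference disambiguates the target among talkers, here is a toy sketch in which a crude spectral embedding (a stand-in for a learned speaker encoder, and not the paper's architecture) selects, among candidate sources, the one most similar to the reference utterance:

```python
import numpy as np

def spectral_embedding(x, n_fft=512):
    """Crude utterance-level embedding: normalized log mean magnitude
    spectrum (a stand-in for a learned speaker encoder)."""
    frames = x[:len(x) // n_fft * n_fft].reshape(-1, n_fft)
    mag = np.abs(np.fft.rfft(frames, axis=1)).mean(axis=0)
    emb = np.log(mag + 1e-8)
    return emb / np.linalg.norm(emb)

def pick_target(candidates, reference):
    """Return the index of the candidate most similar to the reference."""
    ref_emb = spectral_embedding(reference)
    sims = [spectral_embedding(c) @ ref_emb for c in candidates]
    return int(np.argmax(sims))

# Toy demo: a low-pitched and a high-pitched "speaker"; the reference is a
# different utterance from the low-pitched one.
fs = 16000
t = np.arange(16000) / fs
rng = np.random.default_rng(1)
spk_a_est = np.sin(2 * np.pi * 220 * t + 0.4) + 0.01 * rng.standard_normal(len(t))
spk_b_est = np.sin(2 * np.pi * 2200 * t) + 0.01 * rng.standard_normal(len(t))
reference = np.sin(2 * np.pi * 220 * t + 1.1) + 0.01 * rng.standard_normal(len(t))
idx = pick_target([spk_b_est, spk_a_est], reference)   # selects spk_a_est
```

Real speaker extraction conditions the separation network itself on the reference embedding rather than selecting post hoc, but the role of the auxiliary utterance is the same.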
1 code implementation • 30 Sep 2021 • Zexu Pan, Meng Ge, Haizhou Li
The speaker extraction algorithm requires an auxiliary reference, such as a video recording or a pre-recorded speech, to form top-down auditory attention on the target speaker.
no code implementations • 19 Nov 2020 • Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang, Haizhou Li
Speaker extraction requires a sample speech from the target speaker as the reference.
no code implementations • 10 May 2020 • Meng Ge, Cheng-Lin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang, Haizhou Li
To eliminate such mismatch, we propose a complete time-domain speaker extraction solution called SpEx+.
Ranked #1 on Speech Extraction on WSJ0-2mix-extr