no code implementations • 11 Apr 2024 • Yufeng Yue, Meng Yu, Luojie Yang, Yi Yang
Image restoration is rather challenging in adverse weather conditions, especially when multiple degradations occur simultaneously.
no code implementations • 11 Apr 2024 • Meng Yu, Te Cui, Haoyang Lu, Yufeng Yue
Image dehazing poses significant challenges in environmental perception.
no code implementations • 22 Nov 2023 • Meng Yu, Dong Yu
Audio zooming, a signal processing technique, enables selective focusing and enhancement of sound signals from a specified region, attenuating others.
no code implementations • 27 Sep 2023 • Hao Zhang, Yixuan Zhang, Meng Yu, Dong Yu
In this paper, we introduce a novel training framework designed to comprehensively address the acoustic howling issue by examining its fundamental formation process.
no code implementations • 27 Sep 2023 • Yixuan Zhang, Hao Zhang, Meng Yu, Dong Yu
Acoustic howling suppression (AHS) is a critical challenge in audio communication systems.
no code implementations • 16 Sep 2023 • Heming Wang, Meng Yu, Hao Zhang, Chunlei Zhang, Zhongweiyang Xu, Muqiao Yang, Yixuan Zhang, Dong Yu
Enhancing speech signal quality in adverse acoustic environments is a persistent challenge in speech processing.
no code implementations • 4 May 2023 • Hao Zhang, Meng Yu, Yuzhong Wu, Tao Yu, Dong Yu
During offline training, a pre-processed signal obtained from the Kalman filter and an ideal microphone signal generated via teacher-forced training strategy are used to train the deep neural network (DNN).
no code implementations • 2 May 2023 • Hao Zhang, Meng Yu, Dong Yu
In particular, the interplay between acoustic echo and acoustic howling in a hybrid meeting makes the joint suppression of them difficult.
no code implementations • 18 Feb 2023 • Hao Zhang, Meng Yu, Dong Yu
In this paper, we formulate acoustic howling suppression (AHS) as a supervised learning problem and propose a deep learning approach, called Deep AHS, to address it.
no code implementations • 29 Jan 2023 • Yixuan Zhang, Meng Yu, Hao Zhang, Dong Yu, DeLiang Wang
The robustness of the Kalman filter to double talk and its rapid convergence make it a popular approach for addressing acoustic echo cancellation (AEC) challenges.
no code implementations • 22 Nov 2022 • Vinay Kothapally, Yong Xu, Meng Yu, Shi-Xiong Zhang, Dong Yu
While current deep learning (DL)-based beamforming techniques have been proved effective in speech separation, they are often designed to process narrow-band (NB) frequencies independently which results in higher computational costs and inference times, making them unsuitable for real-world use.
no code implementations • 13 Jun 2022 • Xizhe Zhang, Chunyu Pan, Xinru Wei, Meng Yu, Shuangjie Liu, Jun An, Jieping Yang, Baojun Wei, Wenjun Hao, Yang Yao, Yuyan Zhu, Weixiong Zhang
One of the recent approaches is based on network structural controllability that focuses on finding a control scheme and driver genes that can steer the cell from an arbitrary state to a designated state.
no code implementations • 20 May 2022 • Meng Yu, Yong Xu, Chunlei Zhang, Shi-Xiong Zhang, Dong Yu
Acoustic echo cancellation (AEC) plays an important role in the full-duplex speech communication as well as the front-end speech enhancement for recognition in the conditions when the loudspeaker plays back.
1 code implementation • 31 Mar 2022 • Soumi Maiti, Yushi Ueda, Shinji Watanabe, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Yong Xu
In this paper, we present a novel framework that jointly performs three tasks: speaker diarization, speech separation, and speaker counting.
no code implementations • 30 Mar 2022 • Ziang Long, Yunling Zheng, Meng Yu, Jack Xin
Variational auto-encoder (VAE) is an effective neural network architecture to disentangle a speech utterance into speaker identity and linguistic content latent embeddings, then generate an utterance for a target speaker from that of a source speaker.
no code implementations • 29 Nov 2021 • Brian Yan, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Siddharth Dalmia, Dan Berrebbi, Chao Weng, Shinji Watanabe, Dong Yu
Conversational bilingual speech encompasses three types of utterances: two purely monolingual types and one intra-sententially code-switched type.
no code implementations • 9 Nov 2021 • Vinay Kothapally, Yong Xu, Meng Yu, Shi-Xiong Zhang, Dong Yu
We train the proposed model in an end-to-end approach to eliminate background noise and echoes from far-end audio devices, which include nonlinear distortions.
2 code implementations • 7 Oct 2021 • Anton Ratnarajah, Shi-Xiong Zhang, Meng Yu, Zhenyu Tang, Dinesh Manocha, Dong Yu
We present a neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating room impulse responses (RIRs) for a given acoustic environment.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +2
no code implementations • 17 Apr 2021 • Xiyun Li, Yong Xu, Meng Yu, Shi-Xiong Zhang, Jiaming Xu, Bo Xu, Dong Yu
The spatial self-attention module is designed to attend on the cross-channel correlation in the covariance matrices.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +2
no code implementations • 2 Apr 2021 • Meng Yu, Chunlei Zhang, Yong Xu, ShiXiong Zhang, Dong Yu
The objective speech quality assessment is usually conducted by comparing received speech signal with its clean reference, while human beings are capable of evaluating the speech quality without any reference, such as in the mean opinion score (MOS) tests.
no code implementations • 31 Mar 2021 • Helin Wang, Bo Wu, LianWu Chen, Meng Yu, Jianwei Yu, Yong Xu, Shi-Xiong Zhang, Chao Weng, Dan Su, Dong Yu
In this paper, we exploit the effective way to leverage contextual information to improve the speech dereverberation performance in real-world reverberant environments.
no code implementations • 16 Mar 2021 • Chunlei Zhang, Meng Yu, Chao Weng, Dong Yu
This paper proposes the target speaker enhancement based speaker verification network (TASE-SVNet), an all neural model that couples target speaker enhancement and speaker embedding extraction for robust speaker verification (SV).
no code implementations • 16 Feb 2021 • Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, Dong Yu
In addition to using the prediction error as a metric for evaluating our localization model, we also establish its potency as a frontend with automatic speech recognition (ASR) as the downstream task.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +2
no code implementations • 24 Dec 2020 • Zhuohuang Zhang, Yong Xu, Meng Yu, Shi-Xiong Zhang, LianWu Chen, Donald S. Williamson, Dong Yu
Many purely neural network based speech separation approaches have been proposed to improve objective assessment scores, but they often introduce nonlinear distortions that are harmful to modern automatic speech recognition (ASR) systems.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +2
1 code implementation • 13 Dec 2020 • Wei Xia, Chunlei Zhang, Chao Weng, Meng Yu, Dong Yu
First, we examine a simple contrastive learning approach (SimCLR) with a momentum contrastive (MoCo) learning framework, where the MoCo speaker embedding system utilizes a queue to maintain a large set of negative examples.
no code implementations • 26 Nov 2020 • Jiatong Shi, Chunlei Zhang, Chao Weng, Shinji Watanabe, Meng Yu, Dong Yu
Target-speaker speech recognition aims to recognize target-speaker speech from noisy environments with background noise and interfering speakers.
Speech Enhancement Speech Extraction +1 Sound Audio and Speech Processing
no code implementations • 30 Oct 2020 • Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, Yong Xu, Shi-Xiong Zhang, Dong Yu
The advantages of D-ASR over existing methods are threefold: (1) it provides explicit speaker locations, (2) it improves the explainability factor, and (3) it achieves better ASR performance as the process is more streamlined.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
1 code implementation • 21 Aug 2020 • Daniel Michelsanti, Zheng-Hua Tan, Shi-Xiong Zhang, Yong Xu, Meng Yu, Dong Yu, Jesper Jensen
Speech enhancement and speech separation are two related tasks, whose purpose is to extract either one or more target speech signals, respectively, from a mixture of sounds generated by several sources.
1 code implementation • 16 Aug 2020 • Zhuohuang Zhang, Yong Xu, Meng Yu, Shi-Xiong Zhang, LianWu Chen, Dong Yu
Speech separation algorithms are often used to separate the target speech from other interfering sources.
1 code implementation • 8 May 2020 • Yong Xu, Meng Yu, Shi-Xiong Zhang, Lian-Wu Chen, Chao Weng, Jianming Liu, Dong Yu
Purely neural network (NN) based speech separation and enhancement methods, although can achieve good objective scores, inevitably cause nonlinear speech distortions that are harmful for the automatic speech recognition (ASR).
Audio and Speech Processing Sound
no code implementations • 9 Mar 2020 • Rongzhi Gu, Shi-Xiong Zhang, Lian-Wu Chen, Yong Xu, Meng Yu, Dan Su, Yuexian Zou, Dong Yu
Hand-crafted spatial features (e. g., inter-channel phase difference, IPD) play a fundamental role in recent deep learning based multi-channel speech separation (MCSS) methods.
no code implementations • 17 Dec 2019 • Fahimeh Bahmaninezhad, Shi-Xiong Zhang, Yong Xu, Meng Yu, John H. L. Hansen, Dong Yu
The initial solutions introduced for deep learning based speech separation analyzed the speech signals into time-frequency domain with STFT; and then encoded mixed signals were fed into a deep neural network based separator.
no code implementations • 16 Sep 2019 • Ke Tan, Yong Xu, Shi-Xiong Zhang, Meng Yu, Dong Yu
Background noise, interfering speech and room reverberation frequently distort target speech in real listening environments.
Audio and Speech Processing Sound Signal Processing
4 code implementations • 4 Sep 2019 • Chengzhu Yu, Heng Lu, Na Hu, Meng Yu, Chao Weng, Kun Xu, Peng Liu, Deyi Tuo, Shiyin Kang, Guangzhi Lei, Dan Su, Dong Yu
In this paper, we present a generic and robust multimodal synthesis system that produces highly natural speech and facial expression simultaneously.
no code implementations • 17 May 2019 • Fahimeh Bahmaninezhad, Jian Wu, Rongzhi Gu, Shi-Xiong Zhang, Yong Xu, Meng Yu, Dong Yu
We study the speech separation problem for far-field data (more similar to naturalistic audio streams) and develop multi-channel solutions for both frequency and time-domain separators with utilizing spectral, spatial and speaker location information.
no code implementations • 15 May 2019 • Rongzhi Gu, Jian Wu, Shi-Xiong Zhang, Lian-Wu Chen, Yong Xu, Meng Yu, Dan Su, Yuexian Zou, Dong Yu
This paper extended the previous approach and proposed a new end-to-end model for multi-channel speech separation.
no code implementations • 7 Apr 2019 • Jian Wu, Yong Xu, Shi-Xiong Zhang, Lian-Wu Chen, Meng Yu, Lei Xie, Dong Yu
Audio-visual multi-modal modeling has been demonstrated to be effective in many speech related tasks, such as speech recognition and speech enhancement.
Audio and Speech Processing Sound