Search Results for author: Shi-Xiong Zhang

Found 32 papers, 6 papers with code

RIR-SF: Room Impulse Response Based Spatial Feature for Multi-channel Multi-talker ASR

no code implementations • 31 Oct 2023 • Yiwen Shao, Shi-Xiong Zhang, Dong Yu

Multi-channel multi-talker automatic speech recognition (ASR) presents ongoing challenges within the speech community, particularly when confronted with significant reverberation effects.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +2

UniX-Encoder: A Universal $X$-Channel Speech Encoder for Ad-Hoc Microphone Array Speech Processing

no code implementations • 25 Oct 2023 • Zili Huang, Yiwen Shao, Shi-Xiong Zhang, Dong Yu

2) Multi-Task Capability: Beyond the single-task focus of previous systems, UniX-Encoder acts as a robust upstream model, adeptly extracting features for diverse tasks including ASR and speaker recognition.

speaker-diarization • Speaker Diarization +3

Deep Neural Mel-Subband Beamformer for In-car Speech Separation

no code implementations • 22 Nov 2022 • Vinay Kothapally, Yong Xu, Meng Yu, Shi-Xiong Zhang, Dong Yu

While current deep learning (DL)-based beamforming techniques have proven effective in speech separation, they are often designed to process narrow-band (NB) frequencies independently, which results in higher computational costs and inference times and makes them unsuitable for real-world use.

Speech Separation

NeuralEcho: A Self-Attentive Recurrent Neural Network For Unified Acoustic Echo Suppression And Speech Enhancement

no code implementations • 20 May 2022 • Meng Yu, Yong Xu, Chunlei Zhang, Shi-Xiong Zhang, Dong Yu

Acoustic echo cancellation (AEC) plays an important role in full-duplex speech communication, as well as in front-end speech enhancement for recognition when the loudspeaker plays back.

Acoustic echo cancellation • Speech Enhancement +2
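
Aside: NeuralEcho itself is a self-attentive recurrent network, but the classical baseline that neural AEC systems improve on is an adaptive linear filter. As a point of reference only, here is a minimal normalized-LMS echo canceller sketch (function name and parameter values are illustrative, not from the paper; inputs are equal-length float arrays):

```python
import numpy as np

def nlms_aec(far_end: np.ndarray, mic: np.ndarray, taps: int = 256, mu: float = 0.5) -> np.ndarray:
    """Classical NLMS echo cancellation: model the echo path as an FIR
    filter on the far-end signal and subtract the predicted echo."""
    w = np.zeros(taps)                      # FIR estimate of the echo path
    out = np.zeros(len(mic))
    for n in range(taps, len(mic)):
        x = far_end[n - taps:n][::-1]       # most recent far-end samples
        e = mic[n] - w @ x                  # near-end estimate = mic - predicted echo
        w += mu * e * x / (x @ x + 1e-8)    # normalized LMS weight update
        out[n] = e
    return out
```

A linear filter like this cannot model the nonlinear loudspeaker distortions the paper targets, which is the motivation for the neural approach.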

EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers

1 code implementation • 31 Mar 2022 • Soumi Maiti, Yushi Ueda, Shinji Watanabe, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Yong Xu

In this paper, we present a novel framework that jointly performs three tasks: speaker diarization, speech separation, and speaker counting.

speaker-diarization • Speaker Diarization +1

Joint Modeling of Code-Switched and Monolingual ASR via Conditional Factorization

no code implementations • 29 Nov 2021 • Brian Yan, Chunlei Zhang, Meng Yu, Shi-Xiong Zhang, Siddharth Dalmia, Dan Berrebbi, Chao Weng, Shinji Watanabe, Dong Yu

Conversational bilingual speech encompasses three types of utterances: two purely monolingual types and one intra-sententially code-switched type.

speech-recognition • Speech Recognition

Multi-Channel Multi-Speaker ASR Using 3D Spatial Feature

no code implementations • 22 Nov 2021 • Yiwen Shao, Shi-Xiong Zhang, Dong Yu

Experimental results show that 1) the proposed ALL-In-One model achieved a comparable error rate to the pipelined system while reducing the inference time by half; 2) the proposed 3D spatial feature significantly outperformed (31% CERR) all previous works using 1D directional information in both paradigms.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +2

Joint Neural AEC and Beamforming with Double-Talk Detection

no code implementations • 9 Nov 2021 • Vinay Kothapally, Yong Xu, Meng Yu, Shi-Xiong Zhang, Dong Yu

We train the proposed model in an end-to-end manner to eliminate background noise and echoes from far-end audio devices, including nonlinear distortions.

Acoustic echo cancellation • Denoising +2

FAST-RIR: Fast neural diffuse room impulse response generator

2 code implementations • 7 Oct 2021 • Anton Ratnarajah, Shi-Xiong Zhang, Meng Yu, Zhenyu Tang, Dinesh Manocha, Dong Yu

We present a neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating room impulse responses (RIRs) for a given acoustic environment.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +2
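
A generated RIR is typically consumed by convolving it with dry speech to synthesize reverberant training data. A minimal sketch of that downstream step (the convolution is standard; FAST-RIR itself is the neural generator that would produce `rir`, and the function name here is illustrative):

```python
import numpy as np
from scipy.signal import fftconvolve

def reverberate(dry: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Simulate a reverberant recording: convolve dry speech with a room
    impulse response and truncate to the dry signal's length."""
    wet = fftconvolve(dry, rir)[: len(dry)]
    peak = np.abs(wet).max()
    return wet / peak if peak > 0 else wet  # peak-normalize to avoid clipping
```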

TeCANet: Temporal-Contextual Attention Network for Environment-Aware Speech Dereverberation

no code implementations • 31 Mar 2021 • Helin Wang, Bo Wu, LianWu Chen, Meng Yu, Jianwei Yu, Yong Xu, Shi-Xiong Zhang, Chao Weng, Dan Su, Dong Yu

In this paper, we explore an effective way to leverage contextual information to improve speech dereverberation performance in real-world reverberant environments.

Room Impulse Response (RIR) • Speech Dereverberation

Multi-channel Multi-frame ADL-MVDR for Target Speech Separation

no code implementations • 24 Dec 2020 • Zhuohuang Zhang, Yong Xu, Meng Yu, Shi-Xiong Zhang, LianWu Chen, Donald S. Williamson, Dong Yu

Many purely neural network based speech separation approaches have been proposed to improve objective assessment scores, but they often introduce nonlinear distortions that are harmful to modern automatic speech recognition (ASR) systems.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +2

Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition with Source Localization

no code implementations • 30 Oct 2020 • Aswin Shanmugam Subramanian, Chao Weng, Shinji Watanabe, Meng Yu, Yong Xu, Shi-Xiong Zhang, Dong Yu

The advantages of D-ASR over existing methods are threefold: (1) it provides explicit speaker locations, (2) it improves the explainability factor, and (3) it achieves better ASR performance as the process is more streamlined.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +1

Multi-Channel Speaker Verification for Single and Multi-talker Speech

no code implementations • 23 Oct 2020 • Saurabh Kataria, Shi-Xiong Zhang, Dong Yu

We find that the improvements from speaker-dependent directional features are more consistent in multi-talker conditions than in clean conditions.

Action Detection • Activity Detection +2

An Overview of Deep-Learning-Based Audio-Visual Speech Enhancement and Separation

1 code implementation • 21 Aug 2020 • Daniel Michelsanti, Zheng-Hua Tan, Shi-Xiong Zhang, Yong Xu, Meng Yu, Dong Yu, Jesper Jensen

Speech enhancement and speech separation are two related tasks: the former extracts a single target speech signal, and the latter extracts multiple target speech signals, from a mixture of sounds generated by several sources.

Speech Enhancement • Speech Separation

ADL-MVDR: All deep learning MVDR beamformer for target speech separation

1 code implementation • 16 Aug 2020 • Zhuohuang Zhang, Yong Xu, Meng Yu, Shi-Xiong Zhang, LianWu Chen, Dong Yu

Speech separation algorithms are often used to separate the target speech from other interfering sources.

Speech Separation
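
For context, the conventional MVDR solution that ADL-MVDR replaces with recurrent networks can be written per frequency bin as w = (Φ_nn⁻¹ Φ_ss / tr(Φ_nn⁻¹ Φ_ss)) u. A minimal NumPy sketch under the assumption that the speech and noise spatial covariances have already been estimated (e.g., from network-predicted masks); the function name is illustrative:

```python
import numpy as np

def mvdr_weights(phi_ss: np.ndarray, phi_nn: np.ndarray, ref: int = 0) -> np.ndarray:
    """Conventional MVDR beamformer weights for one frequency bin.

    phi_ss, phi_nn: (mics, mics) complex speech/noise spatial covariance
    matrices. ADL-MVDR replaces the matrix inversion and steering-vector
    estimation below with learned recurrent modules.
    """
    num = np.linalg.solve(phi_nn, phi_ss)   # phi_nn^{-1} @ phi_ss
    return num[:, ref] / np.trace(num)      # reference-channel formulation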

Neural Spatio-Temporal Beamformer for Target Speech Separation

1 code implementation • 8 May 2020 • Yong Xu, Meng Yu, Shi-Xiong Zhang, Lian-Wu Chen, Chao Weng, Jianming Liu, Dong Yu

Purely neural network (NN) based speech separation and enhancement methods, although they can achieve good objective scores, inevitably cause nonlinear speech distortions that are harmful for automatic speech recognition (ASR).

Audio and Speech Processing • Sound

Multi-modal Multi-channel Target Speech Separation

no code implementations • 16 Mar 2020 • Rongzhi Gu, Shi-Xiong Zhang, Yong Xu, Lian-Wu Chen, Yuexian Zou, Dong Yu

Target speech separation refers to extracting a target speaker's voice from an overlapped audio of simultaneous talkers.

Speech Separation

Enhancing End-to-End Multi-channel Speech Separation via Spatial Feature Learning

no code implementations • 9 Mar 2020 • Rongzhi Gu, Shi-Xiong Zhang, Lian-Wu Chen, Yong Xu, Meng Yu, Dan Su, Yuexian Zou, Dong Yu

Hand-crafted spatial features (e.g., inter-channel phase difference, IPD) play a fundamental role in recent deep learning based multi-channel speech separation (MCSS) methods.

Speech Separation
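
The IPD feature mentioned in this entry is simple to compute. A minimal sketch for one two-channel pair (the cos/sin encoding and the (freq, time) shapes are common conventions, not details taken from this paper):

```python
import numpy as np

def ipd_features(spec_ch1: np.ndarray, spec_ch2: np.ndarray) -> np.ndarray:
    """Inter-channel phase difference between two complex STFTs.

    Inputs have shape (freq, time); the output stacks cos(IPD) and
    sin(IPD) to sidestep phase-wrapping discontinuities.
    """
    ipd = np.angle(spec_ch1) - np.angle(spec_ch2)
    return np.concatenate([np.cos(ipd), np.sin(ipd)], axis=0)
```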

Self-supervised learning for audio-visual speaker diarization

no code implementations • 13 Feb 2020 • Yifan Ding, Yong Xu, Shi-Xiong Zhang, Yahuan Cong, Liqiang Wang

Speaker diarization, which is to find the speech segments of specific speakers, has been widely used in human-centered applications such as video conferences or human-computer interaction systems.

Self-Supervised Learning • speaker-diarization +2

Audio-visual Recognition of Overlapped speech for the LRS2 dataset

no code implementations • 6 Jan 2020 • Jianwei Yu, Shi-Xiong Zhang, Jian Wu, Shahram Ghorbani, Bo Wu, Shiyin Kang, Shansong Liu, Xunying Liu, Helen Meng, Dong Yu

Experiments on overlapped speech simulated from the LRS2 dataset suggest the proposed AVSR system outperformed the audio-only baseline LF-MMI DNN system by up to 29.98% absolute in word error rate (WER) reduction, and produced recognition performance comparable to a more complex pipelined system.

Audio-Visual Speech Recognition • Automatic Speech Recognition (ASR) +4

Review of Single-cell RNA-seq Data Clustering for Cell Type Identification and Characterization

no code implementations • 3 Jan 2020 • Shi-Xiong Zhang, Xiangtao Li, Qiuzhen Lin, Ka-Chun Wong

In recent years, the advances in single-cell RNA-seq techniques have enabled us to perform large-scale transcriptomic profiling at single-cell resolution in a high-throughput manner.

Clustering • Dimensionality Reduction

A Unified Framework for Speech Separation

no code implementations • 17 Dec 2019 • Fahimeh Bahmaninezhad, Shi-Xiong Zhang, Yong Xu, Meng Yu, John H. L. Hansen, Dong Yu

The initial solutions introduced for deep learning based speech separation analyzed the speech signals in the time-frequency domain with the STFT; the encoded mixed signals were then fed into a deep neural network based separator.

Speech Separation
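
The STFT-domain pipeline described in this entry reduces to: transform, mask, inverse-transform. A minimal SciPy sketch of that skeleton (the mask would come from the neural separator; here it is assumed given, and the function name and STFT settings are illustrative):

```python
import numpy as np
from scipy.signal import stft, istft

def apply_tf_mask(mixture: np.ndarray, mask: np.ndarray, fs: int = 16000) -> np.ndarray:
    """Separate one source by masking the mixture's STFT.

    `mask` is a (freq, time) array matching the STFT's shape, e.g. a
    network-estimated ratio mask in [0, 1].
    """
    _, _, spec = stft(mixture, fs=fs, nperseg=512)   # analysis
    _, est = istft(spec * mask, fs=fs, nperseg=512)  # masking + synthesis
    return est
```

Time-domain separators, which the paper compares against, replace the fixed STFT/iSTFT pair with learned encoder/decoder convolutions.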

Audio-Visual Speech Separation and Dereverberation with a Two-Stage Multimodal Network

no code implementations • 16 Sep 2019 • Ke Tan, Yong Xu, Shi-Xiong Zhang, Meng Yu, Dong Yu

Background noise, interfering speech and room reverberation frequently distort target speech in real listening environments.

Audio and Speech Processing • Sound • Signal Processing

A comprehensive study of speech separation: spectrogram vs waveform separation

no code implementations • 17 May 2019 • Fahimeh Bahmaninezhad, Jian Wu, Rongzhi Gu, Shi-Xiong Zhang, Yong Xu, Meng Yu, Dong Yu

We study the speech separation problem for far-field data (more similar to naturalistic audio streams) and develop multi-channel solutions for both frequency- and time-domain separators, utilizing spectral, spatial, and speaker location information.

speech-recognition • Speech Recognition +1

End-to-End Multi-Channel Speech Separation

no code implementations • 15 May 2019 • Rongzhi Gu, Jian Wu, Shi-Xiong Zhang, Lian-Wu Chen, Yong Xu, Meng Yu, Dan Su, Yuexian Zou, Dong Yu

This paper extends the previous approach and proposes a new end-to-end model for multi-channel speech separation.

Speech Separation

Encrypted Speech Recognition using Deep Polynomial Networks

no code implementations • 11 May 2019 • Shi-Xiong Zhang, Yifan Gong, Dong Yu

One good property of the DPN is that it can be trained on unencrypted speech features in the traditional way.

speech-recognition • Speech Recognition
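
The reason a deep polynomial network (DPN) can run inference on homomorphically encrypted features is that every layer uses only additions and multiplications. A minimal sketch of one such layer, using a square activation as the polynomial nonlinearity (a common choice in encrypted-inference work, stated here as an assumption rather than the paper's exact construction; names are illustrative):

```python
import numpy as np

def dpn_layer(x: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Affine transform followed by a square (polynomial) activation.

    Because z * z is a polynomial, the layer maps directly onto the
    add/multiply operations homomorphic encryption schemes support,
    while the weights can still be trained on unencrypted features.
    Shapes: x (in,), w (out, in), b (out,).
    """
    z = w @ x + b
    return z * z  # polynomial substitute for ReLU/sigmoid
```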

Time Domain Audio Visual Speech Separation

no code implementations • 7 Apr 2019 • Jian Wu, Yong Xu, Shi-Xiong Zhang, Lian-Wu Chen, Meng Yu, Lei Xie, Dong Yu

Audio-visual multi-modal modeling has been demonstrated to be effective in many speech related tasks, such as speech recognition and speech enhancement.

Audio and Speech Processing • Sound

End-to-End Attention based Text-Dependent Speaker Verification

no code implementations • 3 Jan 2017 • Shi-Xiong Zhang, Zhuo Chen, Yong Zhao, Jinyu Li, Yifan Gong

A new type of End-to-End system for text-dependent speaker verification is presented in this paper.

Text-Dependent Speaker Verification
