Search Results for author: Pengcheng Guo

Found 20 papers, 3 papers with code

Enhancing Lip Reading with Multi-Scale Video and Multi-Encoder

no code implementations • 8 Apr 2024 • He Wang, Pengcheng Guo, Xucheng Wan, Huan Zhou, Lei Xie

Automatic lip-reading (ALR) aims to automatically transcribe spoken content from a speaker's silent lip motion captured in video.

Lipreading • Lip Reading • +1

ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge

no code implementations • 7 Jan 2024 • He Wang, Pengcheng Guo, Yue Li, Ao Zhang, Jiayao Sun, Lei Xie, Wei Chen, Pan Zhou, Hui Bu, Xin Xu, BinBin Zhang, Zhuo Chen, Jian Wu, Longbiao Wang, Eng Siong Chng, Sun Li

To promote speech processing and recognition research in driving scenarios, we build on the success of the Intelligent Cockpit Speech Recognition Challenge (ICSRC) held at ISCSLP 2022 and launch the ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +3

MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition

no code implementations • 7 Jan 2024 • He Wang, Pengcheng Guo, Pan Zhou, Lei Xie

While automatic speech recognition (ASR) systems degrade significantly in noisy environments, audio-visual speech recognition (AVSR) systems aim to complement the audio stream with noise-invariant visual cues and improve the system's robustness.

Audio-Visual Speech Recognition • Automatic Speech Recognition • +4
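
The title's multi-layer cross-attention fusion is not detailed in this snippet; as a rough, hypothetical illustration of cross-attention fusion in general (not MLCA-AVSR's actual architecture), the PyTorch module below lets audio and visual frame sequences attend to each other before projection.

# Hypothetical sketch of audio-visual fusion with cross-attention; layer
# names, dimensions, and the length-alignment shortcut are illustrative
# and not taken from the MLCA-AVSR paper.
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, dim=256, num_heads=4):
        super().__init__()
        self.a2v = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.v2a = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, audio, video):
        # audio: (B, Ta, D) acoustic frames; video: (B, Tv, D) lip-region frames.
        a_enriched, _ = self.a2v(audio, video, video)  # audio queries the video stream
        v_enriched, _ = self.v2a(video, audio, audio)  # video queries the audio stream
        # Crude length alignment: pool the visual stream and broadcast it over
        # audio frames (real systems resample the two streams to a shared rate).
        v_context = v_enriched.mean(dim=1, keepdim=True).expand_as(a_enriched)
        return self.proj(torch.cat([a_enriched, v_context], dim=-1))

fused = CrossAttentionFusion()(torch.randn(2, 100, 256), torch.randn(2, 25, 256))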

The NPU-ASLP-LiAuto System Description for Visual Speech Recognition in CNVSRC 2023

2 code implementations • 7 Jan 2024 • He Wang, Pengcheng Guo, Wei Chen, Pan Zhou, Lei Xie

This paper describes the visual speech recognition (VSR) system submitted by NPU-ASLP-LiAuto (Team 237) to the first Chinese Continuous Visual Speech Recognition Challenge (CNVSRC) 2023, which participated in the fixed and open tracks of the Single-Speaker VSR Task and the open track of the Multi-Speaker VSR Task.

Speech Recognition • Visual Speech Recognition

Adaptive Contextual Biasing for Transducer Based Streaming Speech Recognition

no code implementations • 1 Jun 2023 • Tianyi Xu, Zhanheng Yang, Kaixun Huang, Pengcheng Guo, Ao Zhang, Biao Li, Changru Chen, Chao Li, Lei Xie

By incorporating additional contextual information, deep biasing methods have emerged as a promising solution for speech recognition of personalized words.

Speech Recognition

TranUSR: Phoneme-to-word Transcoder Based Unified Speech Representation Learning for Cross-lingual Speech Recognition

no code implementations • 23 May 2023 • Hongfei Xue, Qijie Shao, Peikun Chen, Pengcheng Guo, Lei Xie, Jie Liu

Different from UniSpeech, UniData2vec replaces the quantized discrete representations with continuous and contextual representations from a teacher model for phonetically-aware pre-training.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +3
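
As a rough illustration of the data2vec-style ingredient mentioned above (continuous, contextual targets from a teacher instead of quantized units), the hypothetical PyTorch sketch below uses an EMA teacher and regresses its outputs at masked frames; it is only a sketch of the general recipe, not TranUSR's implementation.

# Hypothetical sketch of data2vec-style pre-training: an EMA teacher provides
# continuous, contextual targets that the student regresses at masked frames.
# The tiny MLP encoder and all names are illustrative, not from TranUSR.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

student = nn.Sequential(nn.Linear(80, 256), nn.GELU(), nn.Linear(256, 256))
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)

def ema_update(tau=0.999):
    # Teacher weights track the student as an exponential moving average.
    with torch.no_grad():
        for t_p, s_p in zip(teacher.parameters(), student.parameters()):
            t_p.mul_(tau).add_(s_p, alpha=1.0 - tau)

def pretrain_step(feats, mask):
    # feats: (B, T, 80) input features; mask: (B, T) True at masked frames.
    with torch.no_grad():
        targets = teacher(feats)                    # continuous targets (no gradient)
    preds = student(feats.masked_fill(mask.unsqueeze(-1), 0.0))
    return F.mse_loss(preds[mask], targets[mask])   # regress masked frames only

loss = pretrain_step(torch.randn(2, 50, 80), torch.rand(2, 50) > 0.5)
loss.backward()
ema_update()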

BA-SOT: Boundary-Aware Serialized Output Training for Multi-Talker ASR

no code implementations • 23 May 2023 • Yuhao Liang, Fan Yu, Yangze Li, Pengcheng Guo, Shiliang Zhang, Qian Chen, Lei Xie

The recently proposed serialized output training (SOT) simplifies multi-talker automatic speech recognition (ASR) by generating speaker transcriptions separated by a special token.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +2
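
To make the SOT target format described above concrete, here is a minimal, hypothetical Python sketch of how per-speaker transcripts of a mixture could be serialized into a single training target, joined in start-time order by a speaker-change token; the token name <sc> follows common SOT practice and is an assumption rather than taken from BA-SOT.

# Minimal sketch of serialized output training (SOT) targets: per-speaker
# transcripts are concatenated in start-time order, separated by a
# speaker-change token. The "<sc>" symbol is an assumption (common in SOT work).
def serialize_transcripts(utterances, sc_token="<sc>"):
    """utterances: list of (start_time_sec, transcript) pairs, one per speaker turn."""
    ordered = sorted(utterances, key=lambda u: u[0])
    return f" {sc_token} ".join(text for _, text in ordered)

mixture = [(1.2, "fine thanks and you"), (0.0, "hello how are you")]
print(serialize_transcripts(mixture))
# -> hello how are you <sc> fine thanks and you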

The NPU-ASLP System for Audio-Visual Speech Recognition in MISP 2022 Challenge

no code implementations • 11 Mar 2023 • Pengcheng Guo, He Wang, Bingshen Mu, Ao Zhang, Peikun Chen

This paper describes our NPU-ASLP system for the Audio-Visual Diarization and Recognition (AVDR) task in the Multi-modal Information based Speech Processing (MISP) 2022 Challenge.

Audio-Visual Speech Recognition • Speech Recognition • +1

Preserving background sound in noise-robust voice conversion via multi-task learning

no code implementations • 6 Nov 2022 • Jixun Yao, Yi Lei, Qing Wang, Pengcheng Guo, Ziqian Ning, Lei Xie, Hai Li, Junhui Liu, Danming Xie

Background sound is an informative element that helps provide a more immersive experience in real-application voice conversion (VC) scenarios.

Multi-Task Learning • Voice Conversion

Distinguishable Speaker Anonymization based on Formant and Fundamental Frequency Scaling

no code implementations • 6 Nov 2022 • Jixun Yao, Qing Wang, Yi Lei, Pengcheng Guo, Lei Xie, Namin Wang, Jie Liu

By directly scaling the formant and F0, the speaker distinguishability degradation of the anonymized speech caused by the introduction of other speakers is prevented.

Speaker Verification
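
For intuition about the formant and F0 scaling described above, here is a rough, hypothetical sketch using the WORLD vocoder via the pyworld package; the scaling factors and the simple frequency-axis warping of the spectral envelope are assumptions for illustration, not the paper's anonymization method.

# Hypothetical sketch of F0 and formant scaling with the WORLD vocoder
# (pyworld). The factors alpha/beta and the linear frequency warp are
# illustrative assumptions; the paper's anonymization differs.
import numpy as np
import pyworld as pw
import soundfile as sf

x, fs = sf.read("input.wav")                 # assumed mono recording
x = np.ascontiguousarray(x, dtype=np.float64)

f0, sp, ap = pw.wav2world(x, fs)             # F0 contour, spectral envelope, aperiodicity

alpha, beta = 1.2, 1.1                       # F0 and formant scaling factors (assumed)
f0_scaled = f0 * alpha

# Shift formants up by warping the spectral envelope along its frequency axis.
n_bins = sp.shape[1]
bins = np.arange(n_bins)
src = np.clip(bins / beta, 0, n_bins - 1)    # new bin k takes its value from bin k/beta
sp_scaled = np.ascontiguousarray(np.stack([np.interp(src, bins, frame) for frame in sp]))

y = pw.synthesize(f0_scaled, sp_scaled, ap, fs)
sf.write("anonymized.wav", y, fs)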

NWPU-ASLP System for the VoicePrivacy 2022 Challenge

no code implementations • 24 Sep 2022 • Jixun Yao, Qing Wang, Li Zhang, Pengcheng Guo, Yuhao Liang, Lei Xie

Our system consists of four modules: a feature extractor, an acoustic model, an anonymization module, and a neural vocoder.

Speaker Verification

Improving Transformer-based Conversational ASR by Inter-Sentential Attention Mechanism

no code implementations • 2 Jul 2022 • Kun Wei, Pengcheng Guo, Ning Jiang

Transformer-based models have demonstrated their effectiveness in automatic speech recognition (ASR) tasks and even shown superior performance over the conventional hybrid framework.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +1

WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition

1 code implementation • 7 Oct 2021 • BinBin Zhang, Hang Lv, Pengcheng Guo, Qijie Shao, Chao Yang, Lei Xie, Xin Xu, Hui Bu, Xiaoyu Chen, Chenchen Zeng, Di Wu, Zhendong Peng

In this paper, we present WenetSpeech, a multi-domain Mandarin corpus consisting of 10000+ hours of high-quality labeled speech, 2400+ hours of weakly labeled speech, and about 10000 hours of unlabeled speech, with 22400+ hours in total.

Label Error Detection • Optical Character Recognition • +4

Multi-Speaker ASR Combining Non-Autoregressive Conformer CTC and Conditional Speaker Chain

1 code implementation • 16 Jun 2021 • Pengcheng Guo, Xuankai Chang, Shinji Watanabe, Lei Xie

Moreover, by including data with variable numbers of speakers, our model even outperforms the PIT-Conformer AR model with only 1/7 of its latency, obtaining WERs of 19.9% and 34.3% on the WSJ0-2mix and WSJ0-3mix sets.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +1

Study of Semi-supervised Approaches to Improving English-Mandarin Code-Switching Speech Recognition

no code implementations • 16 Jun 2018 • Pengcheng Guo, Hai-Hua Xu, Lei Xie, Eng Siong Chng

In this paper, we present our overall efforts to improve the performance of a code-switching speech recognition system using semi-supervised training methods from lexicon learning to acoustic modeling, on the South East Asian Mandarin-English (SEAME) data.

Speech Recognition
