Search Results for author: Pengcheng Guo

Found 20 papers, 3 papers with code

Enhancing Lip Reading with Multi-Scale Video and Multi-Encoder

no code implementations • 8 Apr 2024 • He Wang, Pengcheng Guo, Xucheng Wan, Huan Zhou, Lei Xie

Automatic lip-reading (ALR) aims to automatically transcribe spoken content from a speaker's silent lip motion captured in video.

Lipreading • Lip Reading • +1

ICMC-ASR: The ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition Challenge

no code implementations • 7 Jan 2024 • He Wang, Pengcheng Guo, Yue Li, Ao Zhang, Jiayao Sun, Lei Xie, Wei Chen, Pan Zhou, Hui Bu, Xin Xu, BinBin Zhang, Zhuo Chen, Jian Wu, Longbiao Wang, Eng Siong Chng, Sun Li

To promote speech processing and recognition research in driving scenarios, we build on the success of the Intelligent Cockpit Speech Recognition Challenge (ICSRC) held at ISCSLP 2022 and launch the ICASSP 2024 In-Car Multi-Channel Automatic Speech Recognition (ICMC-ASR) Challenge.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +3

MLCA-AVSR: Multi-Layer Cross Attention Fusion based Audio-Visual Speech Recognition

no code implementations • 7 Jan 2024 • He Wang, Pengcheng Guo, Pan Zhou, Lei Xie

While automatic speech recognition (ASR) systems degrade significantly in noisy environments, audio-visual speech recognition (AVSR) systems aim to complement the audio stream with noise-invariant visual cues and improve the system's robustness.

Audio-Visual Speech Recognition • Automatic Speech Recognition • +4
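
The title's multi-layer cross-attention fusion is not detailed in this snippet; as a rough, hypothetical illustration of cross-attention fusion in general (not MLCA-AVSR's actual architecture), the PyTorch module below lets audio and visual frame sequences attend to each other before projection.

# Hypothetical sketch of audio-visual fusion with cross-attention; layer
# names, dimensions, and the length-alignment shortcut are illustrative
# and not taken from the MLCA-AVSR paper.
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, dim=256, num_heads=4):
        super().__init__()
        self.a2v = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.v2a = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, audio, video):
        # audio: (B, Ta, D) acoustic frames; video: (B, Tv, D) lip-region frames.
        a_enriched, _ = self.a2v(audio, video, video)  # audio queries the video stream
        v_enriched, _ = self.v2a(video, audio, audio)  # video queries the audio stream
        # Crude length alignment: pool the visual stream and broadcast it over
        # audio frames (real systems resample the two streams to a shared rate).
        v_context = v_enriched.mean(dim=1, keepdim=True).expand_as(a_enriched)
        return self.proj(torch.cat([a_enriched, v_context], dim=-1))

fused = CrossAttentionFusion()(torch.randn(2, 100, 256), torch.randn(2, 25, 256))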

The NPU-ASLP-LiAuto System Description for Visual Speech Recognition in CNVSRC 2023

2 code implementations • 7 Jan 2024 • He Wang, Pengcheng Guo, Wei Chen, Pan Zhou, Lei Xie

This paper describes the visual speech recognition (VSR) system submitted by NPU-ASLP-LiAuto (Team 237) to the first Chinese Continuous Visual Speech Recognition Challenge (CNVSRC) 2023, which participated in the fixed and open tracks of the Single-Speaker VSR Task and the open track of the Multi-Speaker VSR Task.

Speech Recognition • Visual Speech Recognition

Adaptive Contextual Biasing for Transducer Based Streaming Speech Recognition

no code implementations • 1 Jun 2023 • Tianyi Xu, Zhanheng Yang, Kaixun Huang, Pengcheng Guo, Ao Zhang, Biao Li, Changru Chen, Chao Li, Lei Xie

By incorporating additional contextual information, deep biasing methods have emerged as a promising solution for speech recognition of personalized words.

Speech Recognition

TranUSR: Phoneme-to-word Transcoder Based Unified Speech Representation Learning for Cross-lingual Speech Recognition

no code implementations • 23 May 2023 • Hongfei Xue, Qijie Shao, Peikun Chen, Pengcheng Guo, Lei Xie, Jie Liu

Different from UniSpeech, UniData2vec replaces the quantized discrete representations with continuous and contextual representations from a teacher model for phonetically-aware pre-training.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +3
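
As a rough illustration of the data2vec-style ingredient mentioned above (continuous, contextual targets from a teacher instead of quantized units), the hypothetical PyTorch sketch below uses an EMA teacher and regresses its outputs at masked frames; it is only a sketch of the general recipe, not TranUSR's implementation.

# Hypothetical sketch of data2vec-style pre-training: an EMA teacher provides
# continuous, contextual targets that the student regresses at masked frames.
# The tiny MLP encoder and all names are illustrative, not from TranUSR.
import copy
import torch
import torch.nn as nn
import torch.nn.functional as F

student = nn.Sequential(nn.Linear(80, 256), nn.GELU(), nn.Linear(256, 256))
teacher = copy.deepcopy(student)
for p in teacher.parameters():
    p.requires_grad_(False)

def ema_update(tau=0.999):
    # Teacher weights track the student as an exponential moving average.
    with torch.no_grad():
        for t_p, s_p in zip(teacher.parameters(), student.parameters()):
            t_p.mul_(tau).add_(s_p, alpha=1.0 - tau)

def pretrain_step(feats, mask):
    # feats: (B, T, 80) input features; mask: (B, T) True at masked frames.
    with torch.no_grad():
        targets = teacher(feats)                    # continuous targets (no gradient)
    preds = student(feats.masked_fill(mask.unsqueeze(-1), 0.0))
    return F.mse_loss(preds[mask], targets[mask])   # regress masked frames only

loss = pretrain_step(torch.randn(2, 50, 80), torch.rand(2, 50) > 0.5)
loss.backward()
ema_update()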

BA-SOT: Boundary-Aware Serialized Output Training for Multi-Talker ASR

no code implementations • 23 May 2023 • Yuhao Liang, Fan Yu, Yangze Li, Pengcheng Guo, Shiliang Zhang, Qian Chen, Lei Xie

The recently proposed serialized output training (SOT) simplifies multi-talker automatic speech recognition (ASR) by generating speaker transcriptions separated by a special token.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +2
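
To make the SOT target format described above concrete, here is a minimal, hypothetical Python sketch of how per-speaker transcripts of a mixture could be serialized into a single training target, joined in start-time order by a speaker-change token; the token name <sc> follows common SOT practice and is an assumption rather than taken from BA-SOT.

# Minimal sketch of serialized output training (SOT) targets: per-speaker
# transcripts are concatenated in start-time order, separated by a
# speaker-change token. The "<sc>" symbol is an assumption (common in SOT work).
def serialize_transcripts(utterances, sc_token="<sc>"):
    """utterances: list of (start_time_sec, transcript) pairs, one per speaker turn."""
    ordered = sorted(utterances, key=lambda u: u[0])
    return f" {sc_token} ".join(text for _, text in ordered)

mixture = [(1.2, "fine thanks and you"), (0.0, "hello how are you")]
print(serialize_transcripts(mixture))
# -> hello how are you <sc> fine thanks and you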

The NPU-ASLP System for Audio-Visual Speech Recognition in MISP 2022 Challenge

no code implementations • 11 Mar 2023 • Pengcheng Guo, He Wang, Bingshen Mu, Ao Zhang, Peikun Chen

This paper describes our NPU-ASLP system for the Audio-Visual Diarization and Recognition (AVDR) task in the Multi-modal Information based Speech Processing (MISP) 2022 Challenge.

Audio-Visual Speech Recognition • Speech Recognition • +1

Preserving background sound in noise-robust voice conversion via multi-task learning

no code implementations • 6 Nov 2022 • Jixun Yao, Yi Lei, Qing Wang, Pengcheng Guo, Ziqian Ning, Lei Xie, Hai Li, Junhui Liu, Danming Xie

Background sound is an informative element that helps provide a more immersive experience in real-application voice conversion (VC) scenarios.

Multi-Task Learning • Voice Conversion

Distinguishable Speaker Anonymization based on Formant and Fundamental Frequency Scaling

no code implementations • 6 Nov 2022 • Jixun Yao, Qing Wang, Yi Lei, Pengcheng Guo, Lei Xie, Namin Wang, Jie Liu

By directly scaling the formant and F0, the speaker distinguishability degradation of the anonymized speech caused by the introduction of other speakers is prevented.

Speaker Verification
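
For intuition about the formant and F0 scaling described above, here is a rough, hypothetical sketch using the WORLD vocoder via the pyworld package; the scaling factors and the simple frequency-axis warping of the spectral envelope are assumptions for illustration, not the paper's anonymization method.

# Hypothetical sketch of F0 and formant scaling with the WORLD vocoder
# (pyworld). The factors alpha/beta and the linear frequency warp are
# illustrative assumptions; the paper's anonymization differs.
import numpy as np
import pyworld as pw
import soundfile as sf

x, fs = sf.read("input.wav")                 # assumed mono recording
x = np.ascontiguousarray(x, dtype=np.float64)

f0, sp, ap = pw.wav2world(x, fs)             # F0 contour, spectral envelope, aperiodicity

alpha, beta = 1.2, 1.1                       # F0 and formant scaling factors (assumed)
f0_scaled = f0 * alpha

# Shift formants up by warping the spectral envelope along its frequency axis.
n_bins = sp.shape[1]
bins = np.arange(n_bins)
src = np.clip(bins / beta, 0, n_bins - 1)    # new bin k takes its value from bin k/beta
sp_scaled = np.ascontiguousarray(np.stack([np.interp(src, bins, frame) for frame in sp]))

y = pw.synthesize(f0_scaled, sp_scaled, ap, fs)
sf.write("anonymized.wav", y, fs)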

NWPU-ASLP System for the VoicePrivacy 2022 Challenge

no code implementations • 24 Sep 2022 • Jixun Yao, Qing Wang, Li Zhang, Pengcheng Guo, Yuhao Liang, Lei Xie

Our system consists of four modules: a feature extractor, an acoustic model, an anonymization module, and a neural vocoder.

Speaker Verification

Improving Transformer-based Conversational ASR by Inter-Sentential Attention Mechanism

no code implementations • 2 Jul 2022 • Kun Wei, Pengcheng Guo, Ning Jiang

Transformer-based models have demonstrated their effectiveness in automatic speech recognition (ASR) tasks and even shown superior performance over the conventional hybrid framework.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +1

WenetSpeech: A 10000+ Hours Multi-domain Mandarin Corpus for Speech Recognition

1 code implementation • 7 Oct 2021 • BinBin Zhang, Hang Lv, Pengcheng Guo, Qijie Shao, Chao Yang, Lei Xie, Xin Xu, Hui Bu, Xiaoyu Chen, Chenchen Zeng, Di Wu, Zhendong Peng

In this paper, we present WenetSpeech, a multi-domain Mandarin corpus consisting of 10000+ hours of high-quality labeled speech, 2400+ hours of weakly labeled speech, and about 10000 hours of unlabeled speech, with 22400+ hours in total.

Label Error Detection • Optical Character Recognition • +4

Multi-Speaker ASR Combining Non-Autoregressive Conformer CTC and Conditional Speaker Chain

1 code implementation • 16 Jun 2021 • Pengcheng Guo, Xuankai Chang, Shinji Watanabe, Lei Xie

Moreover, by including data with variable numbers of speakers, our model even outperforms the PIT-Conformer AR model with only 1/7 of its latency, obtaining WERs of 19.9% and 34.3% on the WSJ0-2mix and WSJ0-3mix sets.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +1

Study of Semi-supervised Approaches to Improving English-Mandarin Code-Switching Speech Recognition

no code implementations • 16 Jun 2018 • Pengcheng Guo, Hai-Hua Xu, Lei Xie, Eng Siong Chng

In this paper, we present our overall efforts to improve the performance of a code-switching speech recognition system using semi-supervised training methods from lexicon learning to acoustic modeling, on the South East Asian Mandarin-English (SEAME) data.

Speech Recognition
