Search Results for author: Gaofeng Cheng

Found 17 papers, 4 papers with code

Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition

no code implementations • 12 Aug 2023 • Han Zhu, Dongji Gao, Gaofeng Cheng, Daniel Povey, Pengyuan Zhang, Yonghong Yan

Firstly, a generalized CTC loss function is introduced to handle noisy pseudo-labels by accepting alternative tokens in the positions of incorrect tokens.

Automatic Speech Recognition speech-recognition +1

Online Hybrid CTC/Attention End-to-End Automatic Speech Recognition Architecture

no code implementations • 5 Jul 2023 • Haoran Miao, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan

However, how to deploy hybrid CTC/attention systems for online speech recognition is still a non-trivial problem.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Speech Corpora Divergence Based Unsupervised Data Selection for ASR

no code implementations • 26 Feb 2023 • Changfeng Gao, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan

This study proposes an unsupervised target-aware data selection method based on speech corpora divergence (SCD), which can measure the similarity between two speech corpora.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Summary on the ISCSLP 2022 Chinese-English Code-Switching ASR Challenge

no code implementations • 12 Oct 2022 • Shuhao Deng, Chengfei Li, Jinfeng Bai, Qingqing Zhang, Wei-Qiang Zhang, Runyan Yang, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan

Code-switching automatic speech recognition has become one of the most challenging and most valuable scenarios in automatic speech recognition, owing to switching between multiple languages and the frequent occurrence of code-switching in daily life.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

The Conversational Short-phrase Speaker Diarization (CSSD) Task: Dataset, Evaluation Metric and Baselines

1 code implementation • 17 Aug 2022 • Gaofeng Cheng, Yifan Chen, Runyan Yang, Qingxuan Li, Zehui Yang, Lingxuan Ye, Pengyuan Zhang, Qingqing Zhang, Lei Xie, Yanmin Qian, Kong Aik Lee, Yonghong Yan

On the metric side, we design a new conversational DER (CDER) evaluation metric, which calculates the SD accuracy at the utterance level.

Machine Translation speaker-diarization +1

Improving Streaming End-to-End ASR on Transformer-based Causal Models with Encoder States Revision Strategies

no code implementations • 6 Jul 2022 • Zehan Li, Haoran Miao, Keqi Deng, Gaofeng Cheng, Sanli Tian, Ta Li, Yonghong Yan

Firstly, we introduce a real-time encoder states revision strategy to modify previous states.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Interrelate Training and Searching: A Unified Online Clustering Framework for Speaker Diarization

no code implementations • 28 Jun 2022 • Yifan Chen, Yifan Guo, Qingxuan Li, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan

For online speaker diarization, samples arrive incrementally, and the overall distribution of the samples is invisible.

Clustering Online Clustering +2

Boosting Cross-Domain Speech Recognition with Self-Supervision

1 code implementation • 20 Jun 2022 • Han Zhu, Gaofeng Cheng, Jindong Wang, Wenxin Hou, Pengyuan Zhang, Yonghong Yan

The cross-domain performance of automatic speech recognition (ASR) could be severely hampered due to the mismatch between training and testing distributions.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Decoupled Federated Learning for ASR with Non-IID Data

no code implementations • 18 Jun 2022 • Han Zhu, Jindong Wang, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan

Secondly, to reduce the communication and computation costs, we propose decoupled federated learning (DecoupleFL).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational(RAMC) Speech Dataset

no code implementations • 31 Mar 2022 • Zehui Yang, Yifan Chen, Lei Luo, Runyan Yang, Lingxuan Ye, Gaofeng Cheng, Ji Xu, Yaohui Jin, Qingqing Zhang, Pengyuan Zhang, Lei Xie, Yonghong Yan

As a Mandarin speech dataset designed for dialog scenarios with high quality and rich annotations, MagicData-RAMC enriches the data diversity in the Mandarin speech community and allows extensive research on a series of speech-related tasks, including automatic speech recognition, speaker diarization, topic detection, keyword search, text-to-speech, etc.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Improving CTC-based speech recognition via knowledge transferring from pre-trained language models

1 code implementation • 22 Feb 2022 • Keqi Deng, Songjun Cao, Yike Zhang, Long Ma, Gaofeng Cheng, Ji Xu, Pengyuan Zhang

Recently, end-to-end automatic speech recognition models based on connectionist temporal classification (CTC) have achieved impressive results, especially when fine-tuned from wav2vec2.0 models.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Improving non-autoregressive end-to-end speech recognition with pre-trained acoustic and language models

no code implementations • 25 Jan 2022 • Keqi Deng, Zehui Yang, Shinji Watanabe, Yosuke Higuchi, Gaofeng Cheng, Pengyuan Zhang

The proposed NAR model significantly surpasses previous NAR systems on the AISHELL-1 benchmark and shows a potential for English tasks.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Multi-Variant Consistency based Self-supervised Learning for Robust Automatic Speech Recognition

no code implementations • 23 Dec 2021 • Changfeng Gao, Gaofeng Cheng, Pengyuan Zhang

Nevertheless, most of the previous SSL methods ignore the influence of the background noise or reverberation, which is crucial to deploying ASR systems in real-world speech applications.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Wav2vec-S: Semi-Supervised Pre-Training for Low-Resource ASR

no code implementations • 9 Oct 2021 • Han Zhu, Li Wang, Jindong Wang, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan

In this work, to build a better pre-trained model for low-resource ASR, we propose a pre-training approach called wav2vec-S, which uses task-specific semi-supervised pre-training to refine the self-supervised pre-trained model for the ASR task and thus more effectively utilizes the capacity of the pre-trained model to generate task-specific representations for ASR.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Transformer-based Online CTC/attention End-to-End Speech Recognition Architecture

no code implementations • 15 Jan 2020 • Haoran Miao, Gaofeng Cheng, Changfeng Gao, Pengyuan Zhang, Yonghong Yan

To support online recognition, we integrate the state reuse chunk-SAE and the MTA based SAD into the online CTC/attention architecture.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Utterance-level Permutation Invariant Training with Latency-controlled BLSTM for Single-channel Multi-talker Speech Separation

no code implementations • 25 Dec 2019 • Lu Huang, Gaofeng Cheng, Pengyuan Zhang, Yi Yang, Shumin Xu, Jiasong Sun

The experimental results show that uPIT outperforms cPIT when LC-BLSTM is used during inference.

Speech Separation

Semi-Orthogonal Low-Rank Matrix Factorization for Deep Neural Networks

1 code implementation • Interspeech 2018 • Daniel Povey, Gaofeng Cheng, Yiming Wang, Ke Li, Hainan Xu, Mahsa Yarmohammadi, Sanjeev Khudanpur

Time Delay Neural Networks (TDNNs), also known as one-dimensional Convolutional Neural Networks (1-d CNNs), are an efficient and well-performing neural network architecture for speech recognition.

speech-recognition Speech Recognition
