Search Results for author: Zhengkun Tian

Found 24 papers, 4 papers with code

CPPF: A contextual and post-processing-free model for automatic speech recognition

no code implementations14 Sep 2023 Lei Zhang, Zhengkun Tian, Xiang Chen, Jiaming Sun, Hongyu Xiang, Ke Ding, Guanglu Wan

To address this issue, we draw inspiration from the multifaceted capabilities of LLMs and Whisper, and focus on integrating multiple ASR text processing tasks related to speech recognition into the ASR model.

Automatic Speech Recognition speech-recognition +1

Peak-First CTC: Reducing the Peak Latency of CTC Models by Applying Peak-First Regularization

no code implementations7 Nov 2022 Zhengkun Tian, Hongyu Xiang, Min Li, Feifei Lin, Ke Ding, Guanglu Wan

To reduce the peak latency, we propose a simple and novel method named peak-first regularization, which utilizes a frame-wise knowledge distillation function to force the probability distribution of the CTC model to shift left along the time axis instead of directly modifying the calculation process of CTC loss and gradients.

Knowledge Distillation

Fully Automated End-to-End Fake Audio Detection

no code implementations20 Aug 2022 Chenglong Wang, Jiangyan Yi, JianHua Tao, Haiyang Sun, Xun Chen, Zhengkun Tian, Haoxin Ma, Cunhang Fan, Ruibo Fu

The existing fake audio detection systems often rely on expert experience to design the acoustic features or manually design the hyperparameters of the network structure.

Reducing language context confusion for end-to-end code-switching automatic speech recognition

no code implementations28 Jan 2022 Shuai Zhang, Jiangyan Yi, Zhengkun Tian, JianHua Tao, Yu Ting Yeung, Liqun Deng

We propose a language-related attention mechanism to reduce multilingual context confusion for the E2E code-switching ASR model based on the Equivalence Constraint (EC) Theory.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

A Large-Scale Chinese Multimodal NER Dataset with Speech Clues

1 code implementation ACL 2021 Dianbo Sui, Zhengkun Tian, Yubo Chen, Kang Liu, Jun Zhao

In this paper, we aim to explore an uncharted territory, which is Chinese multimodal named entity recognition (NER) with both textual and acoustic contents.

named-entity-recognition Named Entity Recognition +1

FSR: Accelerating the Inference Process of Transducer-Based Models by Applying Fast-Skip Regularization

no code implementations7 Apr 2021 Zhengkun Tian, Jiangyan Yi, Ye Bai, JianHua Tao, Shuai Zhang, Zhengqi Wen

It takes a lot of computation and time to predict the blank tokens, but only the non-blank tokens will appear in the final output sequence.

Position speech-recognition +1

TSNAT: Two-Step Non-Autoregressvie Transformer Models for Speech Recognition

1 code implementation4 Apr 2021 Zhengkun Tian, Jiangyan Yi, JianHua Tao, Ye Bai, Shuai Zhang, Zhengqi Wen, Xuefei Liu

To address these two problems, we propose a new model named the two-step non-autoregressive transformer(TSNAT), which improves the performance and accelerating the convergence of the NAR model by learning prior knowledge from a parameters-sharing AR model.

speech-recognition Speech Recognition +1

Gated Recurrent Fusion with Joint Training Framework for Robust End-to-End Speech Recognition

no code implementations9 Nov 2020 Cunhang Fan, Jiangyan Yi, JianHua Tao, Zhengkun Tian, Bin Liu, Zhengqi Wen

The joint training framework for speech enhancement and recognition methods have obtained quite good performances for robust end-to-end automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Decoupling Pronunciation and Language for End-to-end Code-switching Automatic Speech Recognition

no code implementations28 Oct 2020 Shuai Zhang, Jiangyan Yi, Zhengkun Tian, Ye Bai, JianHua Tao, Zhengqi Wen

In this paper, we propose a decoupled transformer model to use monolingual paired data and unpaired text data to alleviate the problem of code-switching data shortage.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

One In A Hundred: Select The Best Predicted Sequence from Numerous Candidates for Streaming Speech Recognition

no code implementations28 Oct 2020 Zhengkun Tian, Jiangyan Yi, Ye Bai, JianHua Tao, Shuai Zhang, Zhengqi Wen

Inspired by the success of two-pass end-to-end models, we introduce a transformer decoder and the two-stage inference method into the streaming CTC model.

Language Modelling speech-recognition +1

Deep imitator: Handwriting calligraphy imitation via deep attention networks

no code implementations Pattern Recognition 2020 Bocheng Zhao, JianHua Tao, Minghao Yang, Zhengkun Tian, Cunhang Fan, Ye Bai

Calligraphy imitation (CI) from a handful of target handwriting samples is such a challenging task that most of the existing writing style analysis or handwriting generation methods do not exhibit satisfactory performance.

Deep Attention Handwriting generation

Spike-Triggered Non-Autoregressive Transformer for End-to-End Speech Recognition

no code implementations16 May 2020 Zhengkun Tian, Jiangyan Yi, Jian-Hua Tao, Ye Bai, Shuai Zhang, Zhengqi Wen

To address this problem and improve the inference speed, we propose a spike-triggered non-autoregressive transformer model for end-to-end speech recognition, which introduces a CTC module to predict the length of the target sequence and accelerate the convergence.

Machine Translation speech-recognition +2

Rnn-transducer with language bias for end-to-end Mandarin-English code-switching speech recognition

no code implementations19 Feb 2020 Shuai Zhang, Jiangyan Yi, Zhengkun Tian, Jian-Hua Tao, Ye Bai

Recently, language identity information has been utilized to improve the performance of end-to-end code-switching (CS) speech recognition.

Language Identification speech-recognition +1

Synchronous Transformers for End-to-End Speech Recognition

no code implementations6 Dec 2019 Zhengkun Tian, Jiangyan Yi, Ye Bai, Jian-Hua Tao, Shuai Zhang, Zhengqi Wen

Once a fixed-length chunk of the input sequence is processed by the encoder, the decoder begins to predict symbols immediately.

speech-recognition Speech Recognition

Integrating Knowledge into End-to-End Speech Recognition from External Text-Only Data

no code implementations4 Dec 2019 Ye Bai, Jiangyan Yi, Jian-Hua Tao, Zhengqi Wen, Zhengkun Tian, Shuai Zhang

To alleviate the above two issues, we propose a unified method called LST (Learn Spelling from Teachers) to integrate knowledge into an AED model from the external text-only data and leverage the whole context in a sentence.

Language Modelling Sentence +2

Self-Attention Transducers for End-to-End Speech Recognition

no code implementations28 Sep 2019 Zhengkun Tian, Jiangyan Yi, Jian-Hua Tao, Ye Bai, Zhengqi Wen

Furthermore, a path-aware regularization is proposed to assist SA-T to learn alignments and improve the performance.

speech-recognition Speech Recognition

Cannot find the paper you are looking for? You can Submit a new open access paper.