no code implementations • 16 May 2020 • Zhengkun Tian, Jiangyan Yi, Jian-Hua Tao, Ye Bai, Shuai Zhang, Zhengqi Wen
To address this problem and improve the inference speed, we propose a spike-triggered non-autoregressive transformer model for end-to-end speech recognition, which introduces a CTC module to predict the length of the target sequence and accelerate the convergence.
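The core idea — using CTC spikes to predict the target length before non-autoregressive decoding — can be sketched as follows. This is an illustrative toy, assuming frame-level argmax labels from a CTC output; the function name and blank index are assumptions, not the paper's implementation.

```python
# Illustrative sketch: estimate the target-sequence length from frame-level
# CTC predictions by collapsing consecutive repeats and dropping blanks.
# The number of surviving "spikes" serves as the predicted output length.
BLANK = 0  # assumed blank label index

def predict_length(frame_labels):
    """Count CTC spikes: collapse consecutive repeats, drop blanks."""
    spikes = []
    prev = None
    for lab in frame_labels:
        if lab != BLANK and lab != prev:
            spikes.append(lab)
        prev = lab
    return len(spikes), spikes

# e.g. frame-wise argmax over a CTC posterior:
length, spikes = predict_length([0, 3, 3, 0, 0, 5, 5, 5, 0, 7, 0])
```

Here the three spikes (labels 3, 5, 7) would trigger three decoder positions at once, instead of one autoregressive step per token.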
no code implementations • 11 May 2020 • Ye Bai, Jiangyan Yi, Jian-Hua Tao, Zhengkun Tian, Zhengqi Wen, Shuai Zhang
Without beam search, the one-pass propagation greatly reduces the inference time cost of LASO.
no code implementations • 6 Apr 2020 • Cunhang Fan, Jian-Hua Tao, Bin Liu, Jiangyan Yi, Zhengqi Wen
In this paper, we propose a joint training method for simultaneous speech denoising and dereverberation using deep embedding features, which is based on deep clustering (DC).
no code implementations • 1 Apr 2020 • Jiangyan Yi, Jian-Hua Tao, Ye Bai, Zhengkun Tian, Cunhang Fan
The other is that POS tags are provided by an external POS tagger.
no code implementations • 17 Mar 2020 • Cunhang Fan, Jian-Hua Tao, Bin Liu, Jiangyan Yi, Zhengqi Wen, Xuefei Liu
Secondly, to pay more attention to the outputs of the pre-separation stage, an attention module is applied to acquire deep attention fusion features, which are extracted by computing the similarity between the mixture and the pre-separated speech.
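A hypothetical sketch of such an attention-fusion step, assuming frame-aligned feature matrices for the mixture and the pre-separated speech; the dot-product similarity, sigmoid gating, and concatenation here are illustrative choices, not the paper's exact formulation.

```python
import numpy as np

# Sketch of attention fusion: weight pre-separated frames by their
# similarity to the mixture, then concatenate as deep fusion features.
def attention_fusion(mixture, pre_sep):
    """mixture: (T, D); pre_sep: (T, D). Returns fused (T, 2D) features."""
    # frame-wise dot-product similarity between mixture and pre-separated speech
    scores = np.sum(mixture * pre_sep, axis=-1, keepdims=True)  # (T, 1)
    weights = 1.0 / (1.0 + np.exp(-scores))                     # sigmoid gate
    attended = weights * pre_sep                                # re-weighted frames
    return np.concatenate([mixture, attended], axis=-1)

T, D = 4, 8
fused = attention_fusion(np.random.randn(T, D), np.random.randn(T, D))
```

Frames of the pre-separation output that agree with the mixture receive higher weight, so the fused representation emphasizes the pre-separated speech where it is most reliable.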
no code implementations • 19 Feb 2020 • Shuai Zhang, Jiangyan Yi, Zhengkun Tian, Jian-Hua Tao, Ye Bai
Recently, language identity information has been utilized to improve the performance of end-to-end code-switching (CS) speech recognition.
no code implementations • 5 Feb 2020 • Cunhang Fan, Bin Liu, Jian-Hua Tao, Jiangyan Yi, Zhengqi Wen
Specifically, we apply the deep clustering network to extract deep embedding features.
no code implementations • 6 Dec 2019 • Zhengkun Tian, Jiangyan Yi, Ye Bai, Jian-Hua Tao, Shuai Zhang, Zhengqi Wen
Once a fixed-length chunk of the input sequence is processed by the encoder, the decoder begins to predict symbols immediately.
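The chunk-wise streaming loop can be sketched as below, with toy stand-ins for the encoder and decoder; the chunk size and the `encode`/`decode` callables are illustrative assumptions.

```python
# Minimal sketch of chunk-wise streaming recognition: the encoder consumes
# fixed-length chunks of the input, and the decoder emits symbols as soon
# as each chunk is processed, rather than waiting for the full utterance.
CHUNK = 4  # assumed fixed chunk length

def stream_decode(features, encode, decode):
    outputs = []
    for start in range(0, len(features), CHUNK):
        chunk = features[start:start + CHUNK]
        state = encode(chunk)          # encoder processes one chunk
        outputs.extend(decode(state))  # decoder predicts immediately
    return outputs

# toy encoder/decoder for illustration
out = stream_decode(list(range(10)), encode=sum, decode=lambda s: [s])
```

Latency is thus bounded by the chunk length rather than the utterance length, which is the point of the chunk-wise design.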
no code implementations • 4 Dec 2019 • Ye Bai, Jiangyan Yi, Jian-Hua Tao, Zhengqi Wen, Zhengkun Tian, Shuai Zhang
To alleviate the above two issues, we propose a unified method called LST (Learn Spelling from Teachers) to integrate knowledge into an AED model from the external text-only data and leverage the whole context in a sentence.
no code implementations • 24 Oct 2019 • Zheng Lian, Jian-Hua Tao, Bin Liu, Jian Huang
Prior works on speech emotion recognition utilize various unsupervised learning approaches to deal with low-resource samples.
no code implementations • 24 Oct 2019 • Zheng Lian, Jian-Hua Tao, Bin Liu, Jian Huang
Different from emotion recognition in individual utterances, we propose a multimodal learning framework that uses relations and dependencies among utterances for conversational emotion analysis.
no code implementations • 24 Oct 2019 • Zheng Lian, Jian-Hua Tao, Bin Liu, Jian Huang
The secondary task is to learn a common representation in which speaker identities cannot be distinguished.
no code implementations • 23 Oct 2019 • Zheng Lian, Ya Li, Jian-Hua Tao, Jian Huang, Ming-Yue Niu
To sum up, the contributions of this paper lie in two areas: 1) We visualize the facial areas attended to in emotion recognition; 2) We analyze the contribution of different facial areas to different emotions in real-world conditions through experimental analysis.
no code implementations • 23 Oct 2019 • Zheng Lian, Ya Li, Jian-Hua Tao, Jian Huang
It outperforms the baseline system, which is optimized without the contrastive loss function, by 1.14% and 2.55% in weighted accuracy and unweighted accuracy, respectively.
no code implementations • 28 Sep 2019 • Zhengkun Tian, Jiangyan Yi, Jian-Hua Tao, Ye Bai, Zhengqi Wen
Furthermore, a path-aware regularization is proposed to assist SA-T to learn alignments and improve the performance.
no code implementations • 23 Jul 2019 • Cunhang Fan, Bin Liu, Jian-Hua Tao, Jiangyan Yi, Zhengqi Wen
Firstly, a DC network is trained to extract deep embedding features, which contain each source's information and are advantageous for discriminating each target speaker.
1 code implementation • 18 Jul 2019 • Yibin Zheng, Xi Wang, Lei He, Shifeng Pan, Frank K. Soong, Zhengqi Wen, Jian-Hua Tao
Experimental results show that our proposed methods, especially the second one (bidirectional decoder regularization), lead to a significant improvement in both robustness and overall naturalness, outperforming the baseline (a revised version of Tacotron2) with a MOS gap of 0.14 in a challenging test and achieving close-to-human quality (4.42 vs. 4.49 in MOS) on a general test.
no code implementations • 13 Jul 2019 • Ye Bai, Jiangyan Yi, Jian-Hua Tao, Zhengkun Tian, Zhengqi Wen
Integrating an external language model into a sequence-to-sequence speech recognition system is non-trivial.
no code implementations • 17 Apr 2019 • Jia Li, Xiao Sun, Xing Wei, Changliang Li, Jian-Hua Tao
In recent years, the generation of conversation content based on deep neural networks has attracted considerable research attention.
no code implementations • 11 Nov 2018 • Zheng Lian, Ya Li, Jian-Hua Tao, Jian Huang
I have submitted a new version to arXiv:1910.13806.
1 code implementation • 13 Sep 2018 • Zheng Lian, Ya Li, Jian-Hua Tao, Jian Huang
We test our method in the EmotiW 2018 challenge and achieve promising results.
no code implementations • 20 Feb 2018 • Jiangyan Yi, Jian-Hua Tao, Zhengqi Wen, Bin Liu
The close-talking model is called the teacher model.
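A standard teacher-student setup of this kind can be sketched as follows: the close-talking (teacher) model supplies soft targets for the student. This is a generic distillation sketch under that assumption, not the paper's exact loss; the temperature value is illustrative.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Cross-entropy of the student's predictions against the
    teacher's softened output distribution."""
    t = softmax(teacher_logits, temperature)
    s = softmax(student_logits, temperature)
    return -sum(ti * math.log(si) for ti, si in zip(t, s))

loss = distillation_loss([2.0, 1.0, 0.1], [0.5, 1.5, 0.2])
```

The loss is minimized when the student's distribution matches the teacher's, which is what lets the teacher's behavior transfer to the student.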
no code implementations • 28 Mar 2016 • Linlin Chao, Jian-Hua Tao, Minghao Yang, Ya Li, Zhengqi Wen
The other one is locating and re-weighting the perception attentions in the whole audio-visual stream for better recognition.