no code implementations • 3 Dec 2023 • Zhilin Lu, Rongpeng Li, Ming Lei, Chan Wang, Zhifeng Zhao, Honggang Zhang
In particular, to enable stable optimization via a nondifferentiable semantic metric, we regard sentence similarity as a reward and formulate this learning process as an RL problem.
no code implementations • 6 Feb 2023 • Wenkang Xu, Yongbo Xiao, An Liu, Ming Lei, MinJian Zhao
A location domain channel modeling method is proposed based on the position of targets and scatterers in the scattering environment, and the resulting radar and communication channels exhibit a two-dimensional (2-D) joint burst sparsity.
no code implementations • 16 Feb 2022 • Yi Ren, Ming Lei, Zhiying Huang, Shiliang Zhang, Qian Chen, Zhijie Yan, Zhou Zhao
Specifically, we first introduce a word-level prosody encoder, which quantizes the low-frequency band of the speech and compresses prosody attributes in the latent prosody vector (LPV).
2 code implementations • 28 Nov 2021 • Zhihao Du, Shiliang Zhang, Siqi Zheng, Weilong Huang, Ming Lei
In this paper, we reformulate this task as a single-label prediction problem by encoding the multi-speaker labels with power set.
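As a rough illustration of the power-set idea, the sketch below maps a per-frame binary speaker-activity vector to a single class index and back. It assumes a fixed maximum number of speakers and a plain bitmask encoding; the function names `powerset_encode` and `powerset_decode` are hypothetical and are not taken from the paper or its released code.

```python
from typing import List


def powerset_encode(active: List[int]) -> int:
    """Map a binary speaker-activity vector to one class index.

    Assumption: speaker i contributes bit i, so N speakers yield
    2**N possible single-label classes (one per subset of speakers).
    """
    label = 0
    for i, bit in enumerate(active):
        if bit not in (0, 1):
            raise ValueError("activity values must be 0 or 1")
        label |= bit << i
    return label


def powerset_decode(label: int, num_speakers: int) -> List[int]:
    """Recover the binary speaker-activity vector from a class index."""
    if not 0 <= label < 2 ** num_speakers:
        raise ValueError("label out of range for this number of speakers")
    return [(label >> i) & 1 for i in range(num_speakers)]


# Speakers 0 and 2 talk at the same time in this frame (overlapped speech).
frame_label = powerset_encode([1, 0, 1])
frame_activity = powerset_decode(frame_label, num_speakers=3)
```

With this encoding, a classifier can predict one label per frame with an ordinary softmax over the 2^N classes instead of N independent sigmoid outputs.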
no code implementations • 14 Oct 2021 • Ziyue Jiang, Yi Ren, Ming Lei, Zhou Zhao
Federated learning enables collaborative training of machine learning models under strict privacy restrictions, and federated text-to-speech aims to synthesize natural speech for multiple users from only a few audio training samples stored locally on their devices.
no code implementations • 9 Sep 2021 • Siqi Zheng, Shiliang Zhang, Weilong Huang, Qian Chen, Hongbin Suo, Ming Lei, Jinwei Feng, Zhijie Yan
We propose BeamTransformer, an efficient architecture to leverage beamformer's edge in spatial filtering and transformer's capability in context sequence modeling.
no code implementations • 17 Jun 2021 • Chenye Cui, Yi Ren, Jinglin Liu, Feiyang Chen, Rongjie Huang, Ming Lei, Zhou Zhao
Finally, by showing a comparable performance in the emotional speech synthesis task, we successfully demonstrate the ability of the proposed model.
no code implementations • 11 Jun 2020 • Yi Wei, Ming-Min Zhao, Min-Jian Zhao, Ming Lei
Linear Programming (LP) is an important decoding technique for binary linear codes.
no code implementations • 21 May 2020 • Haoneng Luo, Shiliang Zhang, Ming Lei, Lei Xie
Transformer models have been introduced into end-to-end speech recognition with state-of-the-art performance on various tasks owing to their superiority in modeling long-term dependencies.
1 code implementation • 21 May 2020 • Shiliang Zhang, Zhifu Gao, Haoneng Luo, Ming Lei, Jie Gao, Zhijie Yan, Lei Xie
Recently, streaming end-to-end automatic speech recognition (E2E-ASR) has gained increasing attention.
no code implementations • 14 Feb 2020 • Yi Wei, Ming-Min Zhao, Min-Jian Zhao, Ming Lei
Inspired by the recent advances in deep learning (DL), this work presents a deep neural network aided decoding algorithm for binary linear codes.
1 code implementation • 10 Jun 2019 • Yi Wei, Ming-Min Zhao, Mingyi Hong, Min-Jian Zhao, Ming Lei
Furthermore, in order to reduce the memory costs, a novel quantized LcgNet is proposed, where a low-resolution nonuniform quantizer is integrated into the LcgNet to smartly quantize the aforementioned step-sizes.
no code implementations • 27 Mar 2019 • Shiliang Zhang, Ming Lei, Zhijie Yan
Results on a 20,000-hour Mandarin speech recognition task show that the proposed spelling correction model can achieve a CER of 3.41%, a relative improvement of 22.9% and 53.2% over the baseline CTC-based systems decoded with and without a language model, respectively.
1 code implementation • 4 Mar 2018 • Shiliang Zhang, Ming Lei, Zhijie Yan, Li-Rong Dai
On a 20,000-hour Mandarin recognition task, the LFR-trained DFSMN can achieve more than 20% relative improvement over the LFR-trained BLSTM.
no code implementations • 26 Feb 2018 • Mengxiao Bi, Heng Lu, Shiliang Zhang, Ming Lei, Zhijie Yan
The Bidirectional LSTM (BLSTM) RNN based speech synthesis system is among the best parametric Text-to-Speech (TTS) systems in terms of the naturalness of generated speech, especially the naturalness in prosody.
no code implementations • 4 Jun 2017 • Yan Zhang, An Pan, Ming Lei, Baoli Yao
Fourier ptychographic microscopy (FPM) is a recently proposed computational imaging technique with both high resolution and wide field-of-view.