Search Results for author: Yuekai Zhang

Found 6 papers, 5 papers with code

TorchAudio 2.1: Advancing speech recognition, self-supervised learning, and audio processing components for PyTorch

1 code implementation • 27 Oct 2023 • Jeff Hwang, Moto Hira, Caroline Chen, Xiaohui Zhang, Zhaoheng Ni, Guangzhi Sun, Pingchuan Ma, Ruizhe Huang, Vineel Pratap, Yuekai Zhang, Anurag Kumar, Chin-Yun Yu, Chuang Zhu, Chunxi Liu, Jacob Kahn, Mirco Ravanelli, Peng Sun, Shinji Watanabe, Yangyang Shi, Yumeng Tao, Robin Scheibler, Samuele Cornell, Sean Kim, Stavros Petridis

TorchAudio is an open-source audio and speech processing library built for PyTorch.

Self-Supervised Learning, Speech Enhancement, +2 more



LightVessel: Exploring Lightweight Coronary Artery Vessel Segmentation via Similarity Knowledge Distillation

no code implementations • 2 Nov 2022 • Hao Dang, Yuekai Zhang, Xingqun Qi, Wanting Zhou, Muyi Sun

To tackle this problem, we propose LightVessel, a Similarity Knowledge Distillation Framework, for lightweight coronary artery vessel segmentation.

Decoder Knowledge Distillation


TrimTail: Low-Latency Streaming ASR with Simple but Effective Spectrogram-Level Length Penalty

1 code implementation • 1 Nov 2022 • Xingchen Song, Di Wu, Zhiyong Wu, BinBin Zhang, Yuekai Zhang, Zhendong Peng, Wenpeng Li, Fuping Pan, Changbao Zhu

In this paper, we present TrimTail, a simple but effective emission regularization method to improve the latency of streaming ASR models.



ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet

2 code implementations • 29 Nov 2021 • Siddhant Arora, Siddharth Dalmia, Pavel Denisov, Xuankai Chang, Yushi Ueda, Yifan Peng, Yuekai Zhang, Sujay Kumar, Karthik Ganesan, Brian Yan, Ngoc Thang Vu, Alan W Black, Shinji Watanabe

However, there are few open source toolkits that can be used to generate reproducible results on different Spoken Language Understanding (SLU) benchmarks.

Spoken Language Understanding



SPGISpeech: 5,000 hours of transcribed financial audio for fully formatted end-to-end speech recognition

1 code implementation • 5 Apr 2021 • Patrick K. O'Neill, Vitaly Lavrukhin, Somshubra Majumdar, Vahid Noroozi, Yuekai Zhang, Oleksii Kuchaiev, Jagadeesh Balam, Yuliya Dovzhenko, Keenan Freyberg, Michael D. Shulman, Boris Ginsburg, Shinji Watanabe, Georg Kucsko

In the English speech-to-text (STT) machine learning task, acoustic models are conventionally trained on uncased Latin characters, and any necessary orthography (such as capitalization, punctuation, and denormalization of non-standard words) is imputed by separate post-processing models.

Ranked #3 on Speech Recognition on SPGISpeech

Speech Recognition



Sequence-to-sequence Singing Voice Synthesis with Perceptual Entropy Loss

1 code implementation • 22 Oct 2020 • Jiatong Shi, Shuai Guo, Nan Huo, Yuekai Zhang, Qin Jin

The neural network (NN) based singing voice synthesis (SVS) systems require sufficient data to train well and are prone to over-fitting due to data scarcity.

Singing Voice Synthesis

