no code implementations • 24 Mar 2024 • Bolin Ni, Hongbo Zhao, Chenghao Zhang, Ke Hu, Gaofeng Meng, Zhaoxiang Zhang, Shiming Xiang
Existing methods commonly utilize one-hot labels and randomly initialize the classifier head.
no code implementations • 23 Jan 2024 • W. Ronny Huang, Cyril Allauzen, Tongzhou Chen, Kilol Gupta, Ke Hu, James Qin, Yu Zhang, Yongqiang Wang, Shuo-Yiin Chang, Tara N. Sainath
In the era of large models, the autoregressive nature of decoding often makes latency a significant serving bottleneck.
1 code implementation • 12 Dec 2023 • Ke Hu, Weidong Qiu, Peng Tang
Our comprehensive analysis reveals that FNR-FL not only accelerates convergence but also significantly surpasses other contemporary federated learning algorithms in test accuracy, particularly under feature distribution skew scenarios.
no code implementations • 11 Aug 2023 • Cal Peyser, Zhong Meng, Ke Hu, Rohit Prabhavalkar, Andrew Rosenberg, Tara N. Sainath, Michael Picheny, Kyunghyun Cho
The last year has seen astonishing progress in text-prompted image generation premised on the idea of a cross-modal representation space in which the text and image domains are represented jointly.
no code implementations • 25 May 2023 • Ke Hu, Bo Li, Tara N. Sainath, Yu Zhang, Francoise Beaufays
We evaluate the proposed model on a set of 12 languages, and achieve an average 11.9% relative improvement in WER over the baseline.
no code implementations • 23 Mar 2023 • Sepand Mavandadi, Tara N. Sainath, Ke Hu, Zelin Wu
We propose a new two-pass E2E speech recognition model that improves ASR performance by training on a combination of paired data and unpaired text data.
no code implementations • 2 Mar 2023 • Yu Zhang, Wei Han, James Qin, Yongqiang Wang, Ankur Bapna, Zhehuai Chen, Nanxin Chen, Bo Li, Vera Axelrod, Gary Wang, Zhong Meng, Ke Hu, Andrew Rosenberg, Rohit Prabhavalkar, Daniel S. Park, Parisa Haghani, Jason Riesa, Ginger Perng, Hagen Soltau, Trevor Strohman, Bhuvana Ramabhadran, Tara Sainath, Pedro Moreno, Chung-Cheng Chiu, Johan Schalkwyk, Françoise Beaufays, Yonghui Wu
We introduce the Universal Speech Model (USM), a single large model that performs automatic speech recognition (ASR) across 100+ languages.
Automatic Speech Recognition (ASR) +3
no code implementations • 17 Feb 2023 • Ke Hu, Tara N. Sainath, Bo Li, Nan Du, Yanping Huang, Andrew M. Dai, Yu Zhang, Rodrigo Cabrera, Zhifeng Chen, Trevor Strohman
In this work, we propose to train a single multilingual language model (LM) for shallow fusion in multiple languages.
Automatic Speech Recognition (ASR) +2
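Shallow fusion, in its generic form, rescores ASR hypotheses by interpolating the ASR score with an external LM score in log space. The sketch below is a minimal illustration of that standard recipe, not the paper's multilingual implementation; the function names and the 0.3 weight are assumptions for the example.

```python
def shallow_fusion_score(asr_logprobs, lm_logprobs, lm_weight=0.3):
    """Fuse per-token ASR and LM log-probabilities for one hypothesis.

    Generic shallow fusion: score = log P_asr + lambda * log P_lm.
    """
    assert len(asr_logprobs) == len(lm_logprobs)
    return sum(a + lm_weight * l for a, l in zip(asr_logprobs, lm_logprobs))


def rerank(hypotheses, lm_weight=0.3):
    """Return the hypothesis text with the best fused score.

    `hypotheses` is a list of (text, asr_logprobs, lm_logprobs) tuples,
    e.g. the n-best list from a first-pass decoder.
    """
    return max(
        hypotheses,
        key=lambda h: shallow_fusion_score(h[1], h[2], lm_weight),
    )[0]
```

With `lm_weight=0`, the ranking reduces to the ASR model's own scores; a single multilingual LM, as proposed here, would supply `lm_logprobs` for every language from one model rather than one LM per language.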
no code implementations • 11 Oct 2022 • Ke Hu, Bo Li, Tara N. Sainath
In this work, we investigate second-pass deliberation for multilingual speech recognition.
Automatic Speech Recognition (ASR) +2
no code implementations • 29 Jun 2022 • Ke Hu, Tara N. Sainath, Yanzhang He, Rohit Prabhavalkar, Trevor Strohman, Sepand Mavandadi, Weiran Wang
Text-only and semi-supervised training based on audio-only data has gained popularity recently due to the wide availability of unlabeled text and speech data.
no code implementations • 5 May 2022 • Xin Chen, Qingtao Tang, Ke Hu, Yue Xu, Shihang Qiu, Jia Cheng, Jun Lei
At Meituan, one of the largest e-commerce platforms in China, an item is typically displayed with its image, and whether a user clicks the item is usually influenced by that image. This implies that users' image behaviors are helpful for understanding their visual preferences and improving the accuracy of CTR prediction.
no code implementations • 15 Apr 2022 • Weiran Wang, Ke Hu, Tara N. Sainath
We propose a streaming non-autoregressive (non-AR) decoding algorithm to deliberate the hypothesis alignment of a streaming RNN-T model.
no code implementations • 18 Jan 2022 • Ke Hu, Yi Qi, Jianqiang Huang, Jia Cheng, Jun Lei
To address this problem, we formulate CTR prediction as a continual learning task and propose COLF, a hybrid COntinual Learning Framework for CTR prediction. COLF has a memory-based modular architecture designed to adapt, learn, and give predictions continuously when faced with non-stationary, drifting click data streams.
1 code implementation • 25 Nov 2021 • Jin Xu, Mingjian Chen, Jianqiang Huang, Xingyuan Tang, Ke Hu, Jian Li, Jia Cheng, Jun Lei
Graph Neural Networks (GNNs) have become increasingly popular and achieved impressive results in many graph-based applications.
no code implementations • 6 Jul 2021 • Huaju Liang, Hongyang Bai, Ke Hu, Xinbo Lv
This paper proposes an artificial neural network to determine orientation using polarized skylight.
1 code implementation • 10 Jun 2021 • Jianqiang Huang, Ke Hu, Qingtao Tang, Mingjian Chen, Yi Qi, Jia Cheng, Jun Lei
Click-through rate (CTR) prediction plays an important role in online advertising and recommender systems.
no code implementations • 11 Mar 2021 • David Qiu, Qiujia Li, Yanzhang He, Yu Zhang, Bo Li, Liangliang Cao, Rohit Prabhavalkar, Deepti Bhatia, Wei Li, Ke Hu, Tara N. Sainath, Ian McGraw
We study the problem of word-level confidence estimation in subword-based end-to-end (E2E) models for automatic speech recognition (ASR).
Automatic Speech Recognition (ASR) +2
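The core difficulty named in this abstract is that confidences come out per subword while users care about words. A common baseline (an assumption here, not necessarily the paper's estimator) aggregates the subword confidences within each word, e.g. by minimum or product:

```python
def word_confidence(subword_probs, word_ends, agg="min"):
    """Aggregate subword-level confidences into word-level confidences.

    `subword_probs` gives one confidence per subword token; `word_ends`
    marks, for each subword, whether it is the last piece of a word.
    Two simple baselines: the minimum subword confidence per word, or
    the product of the word's subword confidences.
    """
    words, current = [], []
    for p, is_end in zip(subword_probs, word_ends):
        current.append(p)
        if is_end:
            if agg == "min":
                words.append(min(current))
            else:  # product of the word's subword confidences
                prod = 1.0
                for q in current:
                    prod *= q
                words.append(prod)
            current = []
    return words
```

For example, the word "hello" tokenized as "he" + "llo" with confidences 0.9 and 0.8 gets word confidence 0.8 under `min` aggregation. Learned estimators, as studied in the paper, replace these fixed rules with a model trained on word-level correctness labels.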
no code implementations • 27 Jan 2021 • Ke Hu, Ruoming Pang, Tara N. Sainath, Trevor Strohman
In this work, we explore using transformer layers instead of long-short term memory (LSTM) layers for deliberation rescoring.
no code implementations • 4 Nov 2020 • Xun Yuan, Ke Hu, Song Chen
We design Gaussian loss for the training process of SobelNet to detect corner points as keypoints.
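A common way to realize a "Gaussian loss" for keypoint detection (an assumed generic formulation, not necessarily SobelNet's exact one) is to regress a predicted heatmap against a target that places a Gaussian bump at each corner location:

```python
import numpy as np


def gaussian_target(shape, keypoints, sigma=2.0):
    """Build a target heatmap with a Gaussian bump at each keypoint.

    `shape` is (H, W); `keypoints` is a list of (row, col) corner
    locations. Overlapping bumps keep the per-pixel maximum.
    """
    rows, cols = np.mgrid[0:shape[0], 0:shape[1]]
    target = np.zeros(shape, dtype=np.float64)
    for r, c in keypoints:
        bump = np.exp(-((rows - r) ** 2 + (cols - c) ** 2) / (2.0 * sigma**2))
        target = np.maximum(target, bump)
    return target


def gaussian_loss(pred_heatmap, keypoints, sigma=2.0):
    """Mean-squared error between a predicted heatmap and the Gaussian target."""
    target = gaussian_target(pred_heatmap.shape, keypoints, sigma)
    return float(np.mean((pred_heatmap - target) ** 2))
```

The soft Gaussian target, as opposed to a single hot pixel per corner, gives the network a smooth training signal around each keypoint.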
no code implementations • 13 Aug 2020 • Shaojin Ding, Ye Jia, Ke Hu, Quan Wang
In this paper, we propose Textual Echo Cancellation (TEC), a framework for cancelling the text-to-speech (TTS) playback echo from overlapping speech recordings.
no code implementations • 28 Mar 2020 • Tara N. Sainath, Yanzhang He, Bo Li, Arun Narayanan, Ruoming Pang, Antoine Bruguier, Shuo-Yiin Chang, Wei Li, Raziel Alvarez, Zhifeng Chen, Chung-Cheng Chiu, David Garcia, Alex Gruenstein, Ke Hu, Minho Jin, Anjuli Kannan, Qiao Liang, Ian McGraw, Cal Peyser, Rohit Prabhavalkar, Golan Pundak, David Rybach, Yuan Shangguan, Yash Sheth, Trevor Strohman, Mirko Visontai, Yonghui Wu, Yu Zhang, Ding Zhao
Thus far, end-to-end (E2E) models have not been shown to outperform state-of-the-art conventional models with respect to both quality, i.e., word error rate (WER), and latency, i.e., the time the hypothesis is finalized after the user stops speaking.
no code implementations • 17 Mar 2020 • Ke Hu, Tara N. Sainath, Ruoming Pang, Rohit Prabhavalkar
End-to-end (E2E) models have made rapid progress in automatic speech recognition (ASR) and perform competitively relative to conventional models.
Automatic Speech Recognition (ASR) +4
no code implementations • 21 Jun 2019 • Ke Hu, Antoine Bruguier, Tara N. Sainath, Rohit Prabhavalkar, Golan Pundak
Contextual automatic speech recognition, i.e., biasing recognition towards a given context (e.g., a user's playlists or contacts), is challenging in end-to-end (E2E) models.
Automatic Speech Recognition (ASR) +1
no code implementations • 17 Jun 2019 • Ke Hu, Hasim Sak, Hank Liao
In this work, we apply the domain adversarial network to encourage the shared layers of a multilingual model to learn language-invariant features.
Automatic Speech Recognition (ASR) +2
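Domain-adversarial training of the kind described here is typically built on a gradient reversal layer: the forward pass is the identity, while the backward pass negates (and scales) the gradient flowing from the domain classifier, pushing the shared layers toward features the classifier cannot use. This is a minimal manual-backprop sketch of that standard mechanism, not the paper's full multilingual model.

```python
import numpy as np


class GradientReversal:
    """Gradient reversal layer for domain-adversarial training.

    Placed between a shared encoder and a domain (here: language)
    classifier. Forward is the identity; backward multiplies the
    incoming gradient by -lambda, so gradient descent on the
    classifier's loss becomes gradient *ascent* for the encoder,
    encouraging language-invariant shared features.
    """

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        # Identity: the classifier sees the encoder features unchanged.
        return x

    def backward(self, grad_output):
        # Reverse and scale the gradient headed back into the encoder.
        return -self.lam * grad_output
```

In a full model, only the language-classifier branch passes through this layer; the ASR task branch backpropagates into the shared layers normally.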
1 code implementation • WS 2018 • Antonio Toral, Sheila Castilho, Ke Hu, Andy Way
We reassess a recent study (Hassan et al., 2018) that claimed that machine translation (MT) has reached human parity for the translation of news from Chinese into English, using pairwise ranking and considering three variables that were not taken into account in that previous study: the language in which the source side of the test set was originally written, the translation proficiency of the evaluators, and the provision of inter-sentential context.