no code implementations • 28 Mar 2024 • Yash Jain, David Chan, Pranav Dheram, Aparna Khare, Olabanji Shonibare, Venkatesh Ravichandran, Shalini Ghosh
Recent advances in machine learning have demonstrated that multi-modal pre-training can improve automatic speech recognition (ASR) performance compared to randomly initialized models, even when models are fine-tuned on uni-modal tasks.
Automatic Speech Recognition (ASR)
no code implementations • 26 Jan 2024 • Jinhan Wang, Long Chen, Aparna Khare, Anirudh Raju, Pranav Dheram, Di He, Minhua Wu, Andreas Stolcke, Venkatesh Ravichandran
We propose an approach for continuous prediction of turn-taking and backchanneling locations in spoken dialogue by fusing a neural acoustic model with a large language model (LLM).
no code implementations • 17 Jan 2024 • Anirudh Raju, Aparna Khare, Di He, Ilya Sklyar, Long Chen, Sam Alptekin, Viet Anh Trinh, Zhe Zhang, Colin Vaz, Venkatesh Ravichandran, Roland Maas, Ariya Rastrow
Endpoint (EP) detection is a key component of far-field speech recognition systems that assist the user through voice commands.
no code implementations • 27 Mar 2023 • Srinath Tankasala, Long Chen, Andreas Stolcke, Anirudh Raju, Qianli Deng, Chander Chandak, Aparna Khare, Roland Maas, Venkatesh Ravichandran
We propose a novel approach for ASR N-best hypothesis rescoring with graph-based label propagation by leveraging cross-utterance acoustic similarity.
no code implementations • 22 Oct 2022 • Aparna Khare, Minhua Wu, Saurabhchand Bhati, Jasha Droppo, Roland Maas
Contrastive Predictive Coding (CPC) is a representation learning method that maximizes the mutual information between intermediate latent representations and the output of a given model.
Automatic Speech Recognition (ASR)
no code implementations • 2 Feb 2022 • Aparna Khare, Eunjung Han, Yuguang Yang, Andreas Stolcke
We present a Conformer-based end-to-end neural diarization (EEND) model that uses both acoustic input and features derived from an automatic speech recognition (ASR) model.
Automatic Speech Recognition (ASR)
no code implementations • 11 Feb 2021 • Karel Mundnich, Alexandra Fenster, Aparna Khare, Shiva Sundaram
To better study the task of highlight detection, we run a pilot experiment with highlight annotations for a small subset of video clips and fine-tune our best model on it.
no code implementations • 20 Nov 2020 • Aparna Khare, Srinivas Parthasarathy, Shiva Sundaram
Self-supervised learning has shown improvements on tasks with limited labeled datasets in domains like speech and natural language.
no code implementations • 10 Sep 2020 • Aparna Khare, Srinivas Parthasarathy, Shiva Sundaram
General embeddings like word2vec, GloVe and ELMo have shown a lot of success in natural language tasks.
Automatic Speech Recognition (ASR)
no code implementations • ACL 2020 • Georgios Paraskevopoulos, Srinivas Parthasarathy, Aparna Khare, Shiva Sundaram
We focus in particular on the scene context provided by the visual information to ground the ASR.
Automatic Speech Recognition (ASR)
no code implementations • 29 Apr 2020 • Georgios Paraskevopoulos, Srinivas Parthasarathy, Aparna Khare, Shiva Sundaram
We focus in particular on the scene context provided by the visual information to ground the ASR.
Automatic Speech Recognition (ASR)
no code implementations • 1 Feb 2020 • Sanna Wager, Aparna Khare, Minhua Wu, Kenichi Kumatani, Shiva Sundaram
Using a large offline teacher model trained on beamformed audio, we trained a simpler multi-channel student acoustic model used in the speech recognition system.
Automatic Speech Recognition (ASR)