Search Results for author: Aparna Khare

Found 12 papers, 0 papers with code

Multi-Stage Multi-Modal Pre-Training for Automatic Speech Recognition

no code implementations • 28 Mar 2024 • Yash Jain, David Chan, Pranav Dheram, Aparna Khare, Olabanji Shonibare, Venkatesh Ravichandran, Shalini Ghosh

Recent advances in machine learning have demonstrated that multi-modal pre-training can improve automatic speech recognition (ASR) performance compared to randomly initialized models, even when models are fine-tuned on uni-modal tasks.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Turn-taking and Backchannel Prediction with Acoustic and Large Language Model Fusion

no code implementations • 26 Jan 2024 • Jinhan Wang, Long Chen, Aparna Khare, Anirudh Raju, Pranav Dheram, Di He, Minhua Wu, Andreas Stolcke, Venkatesh Ravichandran

We propose an approach for continuous prediction of turn-taking and backchanneling locations in spoken dialogue by fusing a neural acoustic model with a large language model (LLM).

Language Modelling Large Language Model

Two-pass Endpoint Detection for Speech Recognition

no code implementations • 17 Jan 2024 • Anirudh Raju, Aparna Khare, Di He, Ilya Sklyar, Long Chen, Sam Alptekin, Viet Anh Trinh, Zhe Zhang, Colin Vaz, Venkatesh Ravichandran, Roland Maas, Ariya Rastrow

Endpoint (EP) detection is a key component of far-field speech recognition systems that assist the user through voice commands.

speech-recognition Speech Recognition

Cross-utterance ASR Rescoring with Graph-based Label Propagation

no code implementations • 27 Mar 2023 • Srinath Tankasala, Long Chen, Andreas Stolcke, Anirudh Raju, Qianli Deng, Chander Chandak, Aparna Khare, Roland Maas, Venkatesh Ravichandran

We propose a novel approach for ASR N-best hypothesis rescoring with graph-based label propagation by leveraging cross-utterance acoustic similarity.

Fairness Language Modelling

Guided contrastive self-supervised pre-training for automatic speech recognition

no code implementations • 22 Oct 2022 • Aparna Khare, Minhua Wu, Saurabhchand Bhati, Jasha Droppo, Roland Maas

Contrastive Predictive Coding (CPC) is a representation learning method that maximizes the mutual information between intermediate latent representations and the output of a given model.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

ASR-Aware End-to-end Neural Diarization

no code implementations • 2 Feb 2022 • Aparna Khare, Eunjung Han, Yuguang Yang, Andreas Stolcke

We present a Conformer-based end-to-end neural diarization (EEND) model that uses both acoustic input and features derived from an automatic speech recognition (ASR) model.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Audiovisual Highlight Detection in Videos

no code implementations • 11 Feb 2021 • Karel Mundnich, Alexandra Fenster, Aparna Khare, Shiva Sundaram

To better study the task of highlight detection, we run a pilot experiment with highlights annotations for a small subset of video clips and fine-tune our best model on it.

Highlight Detection Object Recognition +2

Self-Supervised learning with cross-modal transformers for emotion recognition

no code implementations • 20 Nov 2020 • Aparna Khare, Srinivas Parthasarathy, Shiva Sundaram

Self-supervised learning has shown improvements on tasks with limited labeled datasets in domains like speech and natural language.

Emotion Recognition Language Modelling +4

Multi-modal embeddings using multi-task learning for emotion recognition

no code implementations • 10 Sep 2020 • Aparna Khare, Srinivas Parthasarathy, Shiva Sundaram

General embeddings like word2vec, GloVe and ELMo have shown a lot of success in natural language tasks.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +6

Multimodal and Multiresolution Speech Recognition with Transformers

no code implementations • ACL 2020 • Georgios Paraskevopoulos, Srinivas Parthasarathy, Aparna Khare, Shiva Sundaram

We particularly focus on the scene context provided by the visual information, to ground the ASR.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Multiresolution and Multimodal Speech Recognition with Transformers

no code implementations • 29 Apr 2020 • Georgios Paraskevopoulos, Srinivas Parthasarathy, Aparna Khare, Shiva Sundaram

We particularly focus on the scene context provided by the visual information, to ground the ASR.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Fully Learnable Front-End for Multi-Channel Acoustic Modeling using Semi-Supervised Learning

no code implementations • 1 Feb 2020 • Sanna Wager, Aparna Khare, Minhua Wu, Kenichi Kumatani, Shiva Sundaram

Using a large offline teacher model trained on beamformed audio, we trained a simpler multi-channel student acoustic model used in the speech recognition system.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
