Search Results for author: Vineet Garg

Found 8 papers, 0 papers with code

Streaming Anchor Loss: Augmenting Supervision with Temporal Significance

no code implementations • 9 Oct 2023 • Utkarsh Sarawgi, John Berkowitz, Vineet Garg, Arnav Kundu, Minsik Cho, Sai Srujana Buddi, Saurabh Adya, Ahmed Tewfik

Streaming neural network models for fast frame-wise responses to various speech and sensory signals are widely adopted on resource-constrained platforms.

Does Single-channel Speech Enhancement Improve Keyword Spotting Accuracy? A Case Study

no code implementations • 27 Sep 2023 • Avamarie Brueggeman, Takuya Higuchi, Masood Delfarah, Stephen Shum, Vineet Garg

Our investigation reveals that SE can improve KWS accuracy on noisy speech when the backend model is trained on clean speech; however, despite our extensive exploration, it is difficult to improve the KWS accuracy with SE when the backend is trained on noisy speech.

Automatic Speech Recognition • Keyword Spotting • +3

Leveraging Large Language Models for Exploiting ASR Uncertainty

no code implementations • 9 Sep 2023 • Pranay Dighe, Yi Su, Shangshang Zheng, Yunshu Liu, Vineet Garg, Xiaochuan Niu, Ahmed Tewfik

While large language models excel in a variety of natural language processing (NLP) tasks, to perform well on spoken language understanding (SLU) tasks, they must either rely on off-the-shelf automatic speech recognition (ASR) systems for transcription, or be equipped with an in-built speech modality.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +7

Device-Directed Speech Detection: Regularization via Distillation for Weakly-Supervised Models

no code implementations • 30 Mar 2022 • Vineet Garg, Ognjen Rudovic, Pranay Dighe, Ahmed H. Abdelaziz, Erik Marchi, Saurabh Adya, Chandra Dhir, Ahmed Tewfik

We also show that the ensemble of the LatticeRNN and acoustic-distilled models brings a further accuracy improvement of 20%.

Knowledge Distillation

Streaming on-device detection of device directed speech from voice and touch-based invocation

no code implementations • 9 Oct 2021 • Ognjen Rudovic, Akanksha Bindal, Vineet Garg, Pramod Simha, Pranay Dighe, Sachin Kajarekar

When interacting with smart devices such as mobile phones or wearables, the user typically invokes a virtual assistant (VA) by saying a keyword or by pressing a button on the device.

Computational Efficiency

Streaming Transformer for Hardware Efficient Voice Trigger Detection and False Trigger Mitigation

no code implementations • 14 May 2021 • Vineet Garg, Wonil Chang, Siddharth Sigtia, Saurabh Adya, Pramod Simha, Pranay Dighe, Chandra Dhir

We propose a streaming transformer (TF) encoder architecture, which progressively processes incoming audio chunks and maintains audio context to perform both VTD and FTM tasks using only acoustic features.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) • +1

Progressive Voice Trigger Detection: Accuracy vs Latency

no code implementations • 29 Oct 2020 • Siddharth Sigtia, John Bridle, Hywel Richards, Pascal Clark, Erik Marchi, Vineet Garg

We first demonstrate that by including more audio context after a detected trigger phrase, we can indeed get a more accurate decision.

Hybrid Transformer/CTC Networks for Hardware Efficient Voice Triggering

no code implementations • 5 Aug 2020 • Saurabh Adya, Vineet Garg, Siddharth Sigtia, Pramod Simha, Chandra Dhir

Our baseline is an acoustic model (AM), with BiLSTM layers, trained by minimizing the CTC loss.

Multi-Task Learning
