Search Results for author: Vineet Garg

Found 8 papers, 0 papers with code

Streaming Anchor Loss: Augmenting Supervision with Temporal Significance

no code implementations • 9 Oct 2023 • Utkarsh, Sarawgi, John Berkowitz, Vineet Garg, Arnav Kundu, Minsik Cho, Sai Srujana Buddi, Saurabh Adya, Ahmed Tewfik

Streaming neural network models for fast frame-wise responses to various speech and sensory signals are widely adopted on resource-constrained platforms.

Does Single-channel Speech Enhancement Improve Keyword Spotting Accuracy? A Case Study

no code implementations • 27 Sep 2023 • Avamarie Brueggeman, Takuya Higuchi, Masood Delfarah, Stephen Shum, Vineet Garg

Our investigation reveals that SE can improve KWS accuracy on noisy speech when the backend model is trained on clean speech; however, despite our extensive exploration, it is difficult to improve the KWS accuracy with SE when the backend is trained on noisy speech.

Automatic Speech Recognition, Keyword Spotting, +3

Leveraging Large Language Models for Exploiting ASR Uncertainty

no code implementations • 9 Sep 2023 • Pranay Dighe, Yi Su, Shangshang Zheng, Yunshu Liu, Vineet Garg, Xiaochuan Niu, Ahmed Tewfik

While large language models excel in a variety of natural language processing (NLP) tasks, to perform well on spoken language understanding (SLU) tasks, they must either rely on off-the-shelf automatic speech recognition (ASR) systems for transcription, or be equipped with an in-built speech modality.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +7

Streaming on-device detection of device directed speech from voice and touch-based invocation

no code implementations • 9 Oct 2021 • Ognjen Rudovic, Akanksha Bindal, Vineet Garg, Pramod Simha, Pranay Dighe, Sachin Kajarekar

When interacting with smart devices such as mobile phones or wearables, the user typically invokes a virtual assistant (VA) by saying a keyword or by pressing a button on the device.

Computational Efficiency

Streaming Transformer for Hardware Efficient Voice Trigger Detection and False Trigger Mitigation

no code implementations • 14 May 2021 • Vineet Garg, Wonil Chang, Siddharth Sigtia, Saurabh Adya, Pramod Simha, Pranay Dighe, Chandra Dhir

We propose a streaming transformer (TF) encoder architecture, which progressively processes incoming audio chunks and maintains audio context to perform both VTD and FTM tasks using only acoustic features.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1

Progressive Voice Trigger Detection: Accuracy vs Latency

no code implementations • 29 Oct 2020 • Siddharth Sigtia, John Bridle, Hywel Richards, Pascal Clark, Erik Marchi, Vineet Garg

We first demonstrate that by including more audio context after a detected trigger phrase, we can indeed get a more accurate decision.
