Search Results for author: Roshan Sharma

Found 15 papers, 4 papers with code

AugSumm: towards generalizable speech summarization using synthetic labels from large language model

1 code implementation • 10 Jan 2024 • Jee-weon Jung, Roshan Sharma, William Chen, Bhiksha Raj, Shinji Watanabe

We tackle this challenge by proposing AugSumm, a method to leverage large language models (LLMs) as a proxy for human annotators to generate augmented summaries for training and evaluation.

Language Modelling, Large Language Model, +1

UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions

no code implementations • 4 Oct 2023 • Siddhant Arora, Hayato Futami, Jee-weon Jung, Yifan Peng, Roshan Sharma, Yosuke Kashiwagi, Emiru Tsunoo, Karen Livescu, Shinji Watanabe

Recent studies leverage large language models with multi-tasking capabilities, using natural language prompts to guide the model's behavior and surpassing performance of task-specific models.

 Ranked #1 on Spoken Language Understanding on Fluent Speech Commands (using extra training data)

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +3

Evaluating Speech Synthesis by Training Recognizers on Synthetic Speech

1 code implementation • 1 Oct 2023 • Dareen Alharthi, Roshan Sharma, Hira Dhamyal, Soumi Maiti, Bhiksha Raj, Rita Singh

In this paper, we propose an evaluation technique involving the training of an ASR model on synthetic speech and assessing its performance on real speech.

speech-recognition, Speech Recognition, +1

Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech

1 code implementation • 18 Sep 2023 • Chien-yu Huang, Ke-Han Lu, Shih-Heng Wang, Chi-Yuan Hsiao, Chun-Yi Kuan, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan Sharma, Shinji Watanabe, Bhiksha Ramakrishnan, Shady Shehata, Hung-Yi Lee

To achieve comprehensive coverage of diverse speech tasks and harness instruction tuning, we invite the community to collaborate and contribute, facilitating the dynamic growth of the benchmark.

Augmenting text for spoken language understanding with Large Language Models

no code implementations • 17 Sep 2023 • Roshan Sharma, Suyoun Kim, Daniel Lazar, Trang Le, Akshat Shrivastava, Kwanghoon Ahn, Piyush Kansal, Leda Sari, Ozlem Kalinli, Michael Seltzer

Using the generated text with JAT and TTS for spoken semantic parsing improves EM on STOP by 1.4% and 2.6% absolute for existing and new domains, respectively.

Semantic Parsing, Spoken Language Understanding

BASS: Block-wise Adaptation for Speech Summarization

no code implementations • 17 Jul 2023 • Roshan Sharma, Kenneth Zheng, Siddhant Arora, Shinji Watanabe, Rita Singh, Bhiksha Raj

End-to-end speech summarization has been shown to improve performance over cascade baselines.

SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks

no code implementations • 20 Dec 2022 • Suwon Shon, Siddhant Arora, Chyi-Jiunn Lin, Ankita Pasad, Felix Wu, Roshan Sharma, Wei-Lun Wu, Hung-Yi Lee, Karen Livescu, Shinji Watanabe

In this work, we introduce several new annotated SLU benchmark tasks based on freely available speech data, which complement existing benchmarks and address gaps in the SLU evaluation landscape.

Dialog Act Classification, Question Answering, +4

Egocentric Audio-Visual Noise Suppression

no code implementations • 7 Nov 2022 • Roshan Sharma, Weipeng He, Ju Lin, Egor Lakomkin, Yang Liu, Kaustubh Kalgaonkar

In this paper, we first demonstrate that egocentric visual information is helpful for noise suppression.

Action Classification, Event Detection, +3

XNOR-FORMER: Learning Accurate Approximations in Long Speech Transformers

no code implementations • 29 Oct 2022 • Roshan Sharma, Bhiksha Raj

Transformers are among the state-of-the-art models for many tasks in speech, vision, and natural language processing.

speech-recognition, Speech Recognition

Self-supervision and Learnable STRFs for Age, Emotion, and Country Prediction

no code implementations • 25 Jun 2022 • Roshan Sharma, Tyler Vuong, Mark Lindsey, Hira Dhamyal, Rita Singh, Bhiksha Raj

This work presents a multitask approach to the simultaneous estimation of age, country of origin, and emotion given vocal burst audio for the 2022 ICML Expressive Vocalizations Challenge ExVo-MultiTask track.

Speech Summarization using Restricted Self-Attention

no code implementations • 12 Oct 2021 • Roshan Sharma, Shruti Palaskar, Alan W Black, Florian Metze

End-to-end modeling of speech summarization models is challenging due to memory and compute constraints arising from long input audio sequences.

Document Summarization, speech-recognition, +2
