1 code implementation • 10 Jan 2024 • Jee-weon Jung, Roshan Sharma, William Chen, Bhiksha Raj, Shinji Watanabe
We tackle this challenge by proposing AugSumm, a method to leverage large language models (LLMs) as a proxy for human annotators to generate augmented summaries for training and evaluation.
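At a high level, using an LLM as a proxy human annotator amounts to prompting it to produce alternative reference summaries for augmentation. A minimal sketch of the prompt construction only, where the prompt wording is hypothetical (the paper's exact prompt may differ) and `call_llm` would be any chat-completion API:

```python
# Sketch of the AugSumm idea as described in the abstract: ask an LLM to
# act as a proxy human annotator and write an alternative valid summary.
# Only prompt construction is shown; sending it to a model is left to
# whatever LLM API is available.

def build_augsumm_prompt(transcript: str, reference_summary: str) -> str:
    # Hypothetical prompt wording -- an assumption, not the paper's text.
    return (
        "You are a human annotator writing summaries of spoken recordings.\n"
        f"Transcript:\n{transcript}\n\n"
        f"One valid reference summary:\n{reference_summary}\n\n"
        "Write another valid summary that conveys the same content "
        "with different wording."
    )

prompt = build_augsumm_prompt(
    transcript="speaker discusses quarterly results ...",
    reference_summary="The speaker reviews the company's quarterly results.",
)
print(prompt)
```

The generated paraphrases can then serve as additional references for both training and evaluation, as the abstract describes.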
no code implementations • 4 Oct 2023 • Siddhant Arora, Hayato Futami, Jee-weon Jung, Yifan Peng, Roshan Sharma, Yosuke Kashiwagi, Emiru Tsunoo, Karen Livescu, Shinji Watanabe
Recent studies leverage large language models with multi-tasking capabilities, using natural language prompts to guide the model's behavior and surpassing the performance of task-specific models.
Ranked #1 on Spoken Language Understanding on Fluent Speech Commands (using extra training data)
Automatic Speech Recognition (ASR) +3
no code implementations • 2 Oct 2023 • Muhammad Ahmed Shah, Roshan Sharma, Hira Dhamyal, Raphael Olivier, Ankit Shah, Joseph Konan, Dareen Alharthi, Hazim T Bukhari, Massa Baali, Soham Deshmukh, Michael Kuhlmann, Bhiksha Raj, Rita Singh
We hypothesize that for attacks to be transferable, it is sufficient for the proxy to approximate the target model in the neighborhood of the harmful query.
1 code implementation • 1 Oct 2023 • Dareen Alharthi, Roshan Sharma, Hira Dhamyal, Soumi Maiti, Bhiksha Raj, Rita Singh
In this paper, we propose an evaluation technique involving the training of an ASR model on synthetic speech and assessing its performance on real speech.
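Assessing the ASR model on real speech typically reduces to scoring its transcripts with word error rate (WER); a minimal, self-contained sketch of that metric with hypothetical transcripts (not the paper's data) is:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# Hypothetical example: a hypothesis from an ASR model trained on
# synthetic speech, scored against a real-speech reference transcript.
print(wer("the cat sat on the mat", "the cat sat on mat"))  # one deletion out of six words
```

Under the paper's setup, a model trained purely on synthetic speech that scores well on real test speech indicates the synthetic speech captures the acoustic properties that matter for recognition.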
no code implementations • 27 Sep 2023 • Xuankai Chang, Brian Yan, Kwanghee Choi, Jee-weon Jung, Yichen Lu, Soumi Maiti, Roshan Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe, Yuya Fujita, Takashi Maekaku, Pengcheng Guo, Yao-Fei Cheng, Pavel Denisov, Kohei Saijo, Hsiu-Hsuan Wang
Speech signals, typically sampled tens of thousands of times per second, contain redundancies that make sequence modeling inefficient.
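One common way to reduce that redundancy before sequence modeling (a generic sketch, not necessarily this paper's method) is to stack adjacent feature frames, trading sequence length for feature dimension:

```python
from typing import List

def stack_frames(frames: List[List[float]], factor: int = 4) -> List[List[float]]:
    """Concatenate every `factor` consecutive frames into one vector,
    shortening the sequence by `factor`. Tail frames that do not fill a
    complete group are dropped in this simple sketch."""
    stacked = []
    for start in range(0, len(frames) - factor + 1, factor):
        group = frames[start:start + factor]
        stacked.append([x for frame in group for x in frame])
    return stacked

# 100 frames of 80-dim features -> 25 frames of 320-dim features
feats = [[0.0] * 80 for _ in range(100)]
out = stack_frames(feats, factor=4)
print(len(out), len(out[0]))
```

Discrete speech units, as studied in the paper, push further in the same direction: they compress the frame sequence into a much shorter stream of learned symbols.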
1 code implementation • 25 Sep 2023 • Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-weon Jung, Soumi Maiti, Shinji Watanabe
Pre-training speech models on large volumes of data has achieved remarkable success.
1 code implementation • 18 Sep 2023 • Chien-yu Huang, Ke-Han Lu, Shih-Heng Wang, Chi-Yuan Hsiao, Chun-Yi Kuan, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan Sharma, Shinji Watanabe, Bhiksha Ramakrishnan, Shady Shehata, Hung-Yi Lee
To achieve comprehensive coverage of diverse speech tasks and harness instruction tuning, we invite the community to collaborate and contribute, facilitating the dynamic growth of the benchmark.
no code implementations • 17 Sep 2023 • Roshan Sharma, Suyoun Kim, Daniel Lazar, Trang Le, Akshat Shrivastava, Kwanghoon Ahn, Piyush Kansal, Leda Sari, Ozlem Kalinli, Michael Seltzer
Using the generated text with JAT and TTS for spoken semantic parsing improves EM on STOP by 1.4% and 2.6% absolute for existing and new domains, respectively.
no code implementations • 17 Jul 2023 • Roshan Sharma, Kenneth Zheng, Siddhant Arora, Shinji Watanabe, Rita Singh, Bhiksha Raj
End-to-end speech summarization has been shown to improve performance over cascade baselines.
no code implementations • 20 Dec 2022 • Suwon Shon, Siddhant Arora, Chyi-Jiunn Lin, Ankita Pasad, Felix Wu, Roshan Sharma, Wei-Lun Wu, Hung-Yi Lee, Karen Livescu, Shinji Watanabe
In this work, we introduce several new annotated SLU benchmark tasks based on freely available speech data, which complement existing benchmarks and address gaps in the SLU evaluation landscape.
no code implementations • 7 Nov 2022 • Roshan Sharma, Weipeng He, Ju Lin, Egor Lakomkin, Yang Liu, Kaustubh Kalgaonkar
In this paper, we first demonstrate that egocentric visual information is helpful for noise suppression.
no code implementations • 29 Oct 2022 • Roshan Sharma, Bhiksha Raj
Transformers achieve state-of-the-art results on many tasks in speech, vision, and natural language processing, among other domains.
no code implementations • 29 Oct 2022 • Roshan Sharma, Hira Dhamyal, Bhiksha Raj, Rita Singh
Accordingly, models that have been proposed for emotion detection use one or the other of these label types.
no code implementations • 25 Jun 2022 • Roshan Sharma, Tyler Vuong, Mark Lindsey, Hira Dhamyal, Rita Singh, Bhiksha Raj
This work presents a multitask approach to the simultaneous estimation of age, country of origin, and emotion from vocal burst audio for the ExVo-MultiTask track of the 2022 ICML Expressive Vocalizations Challenge.
no code implementations • 12 Oct 2021 • Roshan Sharma, Shruti Palaskar, Alan W Black, Florian Metze
End-to-end modeling of speech summarization models is challenging due to memory and compute constraints arising from long input audio sequences.