1 code implementation • 10 Jan 2024 • Jee-weon Jung, Roshan Sharma, William Chen, Bhiksha Raj, Shinji Watanabe
We tackle this challenge by proposing AugSumm, a method to leverage large language models (LLMs) as a proxy for human annotators to generate augmented summaries for training and evaluation.
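At a high level, using an LLM as a proxy human annotator amounts to prompting it to produce alternative reference summaries for augmentation. A minimal sketch of the prompt construction only, where the prompt wording is hypothetical (the paper's exact prompt may differ) and `call_llm` would be any chat-completion API:

```python
# Sketch of the AugSumm idea as described in the abstract: ask an LLM to
# act as a proxy human annotator and write an alternative valid summary.
# Only prompt construction is shown; sending it to a model is left to
# whatever LLM API is available.

def build_augsumm_prompt(transcript: str, reference_summary: str) -> str:
    # Hypothetical prompt wording -- an assumption, not the paper's text.
    return (
        "You are a human annotator writing summaries of spoken recordings.\n"
        f"Transcript:\n{transcript}\n\n"
        f"One valid reference summary:\n{reference_summary}\n\n"
        "Write another valid summary that conveys the same content "
        "with different wording."
    )

prompt = build_augsumm_prompt(
    transcript="speaker discusses quarterly results ...",
    reference_summary="The speaker reviews the company's quarterly results.",
)
print(prompt)
```

The generated paraphrases can then serve as additional references for both training and evaluation, as the abstract describes.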
no code implementations • 4 Oct 2023 • Siddhant Arora, Hayato Futami, Jee-weon Jung, Yifan Peng, Roshan Sharma, Yosuke Kashiwagi, Emiru Tsunoo, Karen Livescu, Shinji Watanabe
Recent studies leverage large language models with multi-tasking capabilities, using natural language prompts to guide the model's behavior and surpassing the performance of task-specific models.
Ranked #1 on Spoken Language Understanding on Fluent Speech Commands (using extra training data)
Automatic Speech Recognition (ASR) +3
no code implementations • 2 Oct 2023 • Muhammad Ahmed Shah, Roshan Sharma, Hira Dhamyal, Raphael Olivier, Ankit Shah, Joseph Konan, Dareen Alharthi, Hazim T Bukhari, Massa Baali, Soham Deshmukh, Michael Kuhlmann, Bhiksha Raj, Rita Singh
We hypothesize that for attacks to be transferable, it is sufficient for the proxy to approximate the target model in the neighborhood of the harmful query.
1 code implementation • 1 Oct 2023 • Dareen Alharthi, Roshan Sharma, Hira Dhamyal, Soumi Maiti, Bhiksha Raj, Rita Singh
In this paper, we propose an evaluation technique involving the training of an ASR model on synthetic speech and assessing its performance on real speech.
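Assessing the ASR model on real speech typically reduces to scoring its transcripts with word error rate (WER); a minimal, self-contained sketch of that metric with hypothetical transcripts (not the paper's data) is:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# Hypothetical example: a hypothesis from an ASR model trained on
# synthetic speech, scored against a real-speech reference transcript.
print(wer("the cat sat on the mat", "the cat sat on mat"))  # one deletion out of six words
```

Under the paper's setup, a model trained purely on synthetic speech that scores well on real test speech indicates the synthetic speech captures the acoustic properties that matter for recognition.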
no code implementations • 27 Sep 2023 • Xuankai Chang, Brian Yan, Kwanghee Choi, Jee-weon Jung, Yichen Lu, Soumi Maiti, Roshan Sharma, Jiatong Shi, Jinchuan Tian, Shinji Watanabe, Yuya Fujita, Takashi Maekaku, Pengcheng Guo, Yao-Fei Cheng, Pavel Denisov, Kohei Saijo, Hsiu-Hsuan Wang
Speech signals, typically sampled tens of thousands of times per second, contain redundancies that make sequence modeling inefficient.
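One common way to reduce that redundancy before sequence modeling (a generic sketch, not necessarily this paper's method) is to stack adjacent feature frames, trading sequence length for feature dimension:

```python
from typing import List

def stack_frames(frames: List[List[float]], factor: int = 4) -> List[List[float]]:
    """Concatenate every `factor` consecutive frames into one vector,
    shortening the sequence by `factor`. Tail frames that do not fill a
    complete group are dropped in this simple sketch."""
    stacked = []
    for start in range(0, len(frames) - factor + 1, factor):
        group = frames[start:start + factor]
        stacked.append([x for frame in group for x in frame])
    return stacked

# 100 frames of 80-dim features -> 25 frames of 320-dim features
feats = [[0.0] * 80 for _ in range(100)]
out = stack_frames(feats, factor=4)
print(len(out), len(out[0]))
```

Discrete speech units, as studied in the paper, push further in the same direction: they compress the frame sequence into a much shorter stream of learned symbols.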
1 code implementation • 25 Sep 2023 • Yifan Peng, Jinchuan Tian, Brian Yan, Dan Berrebbi, Xuankai Chang, Xinjian Li, Jiatong Shi, Siddhant Arora, William Chen, Roshan Sharma, Wangyou Zhang, Yui Sudo, Muhammad Shakeel, Jee-weon Jung, Soumi Maiti, Shinji Watanabe
Pre-training speech models on large volumes of data has achieved remarkable success.
1 code implementation • 18 Sep 2023 • Chien-yu Huang, Ke-Han Lu, Shih-Heng Wang, Chi-Yuan Hsiao, Chun-Yi Kuan, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan Sharma, Shinji Watanabe, Bhiksha Ramakrishnan, Shady Shehata, Hung-Yi Lee
To achieve comprehensive coverage of diverse speech tasks and harness instruction tuning, we invite the community to collaborate and contribute, facilitating the dynamic growth of the benchmark.
no code implementations • 17 Sep 2023 • Roshan Sharma, Suyoun Kim, Daniel Lazar, Trang Le, Akshat Shrivastava, Kwanghoon Ahn, Piyush Kansal, Leda Sari, Ozlem Kalinli, Michael Seltzer
Using the generated text with JAT and TTS for spoken semantic parsing improves EM on STOP by 1.4% and 2.6% absolute for existing and new domains, respectively.
no code implementations • 17 Jul 2023 • Roshan Sharma, Kenneth Zheng, Siddhant Arora, Shinji Watanabe, Rita Singh, Bhiksha Raj
End-to-end speech summarization has been shown to improve performance over cascade baselines.
no code implementations • 20 Dec 2022 • Suwon Shon, Siddhant Arora, Chyi-Jiunn Lin, Ankita Pasad, Felix Wu, Roshan Sharma, Wei-Lun Wu, Hung-Yi Lee, Karen Livescu, Shinji Watanabe
In this work, we introduce several new annotated SLU benchmark tasks based on freely available speech data, which complement existing benchmarks and address gaps in the SLU evaluation landscape.
no code implementations • 7 Nov 2022 • Roshan Sharma, Weipeng He, Ju Lin, Egor Lakomkin, Yang Liu, Kaustubh Kalgaonkar
In this paper, we first demonstrate that egocentric visual information is helpful for noise suppression.
no code implementations • 29 Oct 2022 • Roshan Sharma, Bhiksha Raj
Transformers achieve state-of-the-art results on many tasks in speech, vision, and natural language processing, among other domains.
no code implementations • 29 Oct 2022 • Roshan Sharma, Hira Dhamyal, Bhiksha Raj, Rita Singh
Accordingly, models that have been proposed for emotion detection use one or the other of these label types.
no code implementations • 25 Jun 2022 • Roshan Sharma, Tyler Vuong, Mark Lindsey, Hira Dhamyal, Rita Singh, Bhiksha Raj
This work presents a multitask approach to the simultaneous estimation of age, country of origin, and emotion from vocal burst audio for the ExVo-MultiTask track of the 2022 ICML Expressive Vocalizations Challenge.
no code implementations • 12 Oct 2021 • Roshan Sharma, Shruti Palaskar, Alan W Black, Florian Metze
End-to-end modeling of speech summarization models is challenging due to memory and compute constraints arising from long input audio sequences.