1 code implementation • 14 Jun 2023 • Anuj Diwan, Eunsol Choi, David Harwath
We present the first unified study of the efficiency of self-attention-based Transformer variants spanning text, speech and vision.
1 code implementation • 24 May 2023 • Anuj Diwan, Anirudh Srinivasan, David Harwath, Eunsol Choi
We train and evaluate our models for English-to-German, German-to-English and Marathi-to-English translation on three different domains (European Parliament, Common Voice, and All India Radio) with single-speaker synthesized speech data.
no code implementations • 2 Dec 2022 • Anuj Diwan, Ching-Feng Yeh, Wei-Ning Hsu, Paden Tomasello, Eunsol Choi, David Harwath, Abdelrahman Mohamed
Additionally, current speech recognition models and continual learning algorithms are not optimized to be compute-efficient.
no code implementations • 3 Nov 2022 • Anuj Diwan, Puyuan Peng, Raymond J. Mooney
For the majority of the machine learning community, the expensive nature of collecting high-quality human-annotated data and the inability to efficiently finetune very large state-of-the-art pretrained models on limited compute are major bottlenecks for building models for new tasks.
1 code implementation • 1 Nov 2022 • Anuj Diwan, Layne Berry, Eunsol Choi, David Harwath, Kyle Mahowald
Recent visuolinguistic pre-trained models show promising progress on various end tasks such as image retrieval and video captioning.
1 code implementation • 1 Apr 2021 • Anuj Diwan, Rakesh Vaideeswaran, Sanket Shah, Ankita Singh, Srinivasa Raghavan, Shreya Khare, Vinit Unni, Saurabh Vyas, Akash Rajpuria, Chiranjeevi Yarra, Ashish Mittal, Prasanta Kumar Ghosh, Preethi Jyothi, Kalika Bali, Vivek Seshadri, Sunayana Sitaram, Samarth Bharadwaj, Jai Nanavati, Raoul Nanavati, Karthik Sankaranarayanan, Tejaswi Seeram, Basil Abraham
For this purpose, we provide a total of ~600 hours of transcribed speech data, comprising train and test sets, in these languages including two code-switched language pairs, Hindi-English and Bengali-English.
no code implementations • 19 Oct 2020 • Anuj Diwan, Preethi Jyothi
This work presents a seemingly simple but effective technique to improve low-resource ASR systems for phonetic languages.