Search Results for author: Anuj Diwan

Found 7 papers, 4 papers with code

When to Use Efficient Self Attention? Profiling Text, Speech and Image Transformer Variants

1 code implementation • 14 Jun 2023 • Anuj Diwan, Eunsol Choi, David Harwath

We present the first unified study of the efficiency of self-attention-based Transformer variants spanning text, speech and vision.

Textless Low-Resource Speech-to-Speech Translation With Unit Language Models

1 code implementation • 24 May 2023 • Anuj Diwan, Anirudh Srinivasan, David Harwath, Eunsol Choi

We train and evaluate our models for English-to-German, German-to-English and Marathi-to-English translation on three different domains (European Parliament, Common Voice, and All India Radio) with single-speaker synthesized speech data.

Automatic Speech Recognition, Denoising, +6

Continual Learning for On-Device Speech Recognition using Disentangled Conformers

no code implementations • 2 Dec 2022 • Anuj Diwan, Ching-Feng Yeh, Wei-Ning Hsu, Paden Tomasello, Eunsol Choi, David Harwath, Abdelrahman Mohamed

Additionally, current speech recognition models and continual learning algorithms are not optimized to be compute-efficient.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +3

Zero-shot Video Moment Retrieval With Off-the-Shelf Models

no code implementations • 3 Nov 2022 • Anuj Diwan, Puyuan Peng, Raymond J. Mooney

For the majority of the machine learning community, the expensive nature of collecting high-quality human-annotated data and the inability to efficiently finetune very large state-of-the-art pretrained models on limited compute are major bottlenecks for building models for new tasks.

Moment Retrieval, Retrieval

Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality

1 code implementation • 1 Nov 2022 • Anuj Diwan, Layne Berry, Eunsol Choi, David Harwath, Kyle Mahowald

Recent visuolinguistic pre-trained models show promising progress on various end tasks such as image retrieval and video captioning.

Data Augmentation, Image Retrieval, +2

Multilingual and code-switching ASR challenges for low resource Indian languages

1 code implementation • 1 Apr 2021 • Anuj Diwan, Rakesh Vaideeswaran, Sanket Shah, Ankita Singh, Srinivasa Raghavan, Shreya Khare, Vinit Unni, Saurabh Vyas, Akash Rajpuria, Chiranjeevi Yarra, Ashish Mittal, Prasanta Kumar Ghosh, Preethi Jyothi, Kalika Bali, Vivek Seshadri, Sunayana Sitaram, Samarth Bharadwaj, Jai Nanavati, Raoul Nanavati, Karthik Sankaranarayanan, Tejaswi Seeram, Basil Abraham

For this purpose, we provide a total of ~600 hours of transcribed speech data, comprising train and test sets, in these languages including two code-switched language pairs, Hindi-English and Bengali-English.

Automatic Speech Recognition (ASR), Sentence

Reduce and Reconstruct: ASR for Low-Resource Phonetic Languages

no code implementations • 19 Oct 2020 • Anuj Diwan, Preethi Jyothi

This work presents a seemingly simple but effective technique to improve low-resource ASR systems for phonetic languages.

Speech Recognition
