Search Results for author: Anuj Diwan

Found 7 papers, 4 papers with code

When to Use Efficient Self Attention? Profiling Text, Speech and Image Transformer Variants

1 code implementation • 14 Jun 2023 • Anuj Diwan, Eunsol Choi, David Harwath

We present the first unified study of the efficiency of self-attention-based Transformer variants spanning text, speech and vision.

Textless Low-Resource Speech-to-Speech Translation With Unit Language Models

1 code implementation • 24 May 2023 • Anuj Diwan, Anirudh Srinivasan, David Harwath, Eunsol Choi

We train and evaluate our models for English-to-German, German-to-English and Marathi-to-English translation on three different domains (European Parliament, Common Voice, and All India Radio) with single-speaker synthesized speech data.

Automatic Speech Recognition, Denoising, +6

Zero-shot Video Moment Retrieval With Off-the-Shelf Models

no code implementations • 3 Nov 2022 • Anuj Diwan, Puyuan Peng, Raymond J. Mooney

For the majority of the machine learning community, the expensive nature of collecting high-quality human-annotated data and the inability to efficiently finetune very large state-of-the-art pretrained models on limited compute are major bottlenecks for building models for new tasks.

Moment Retrieval, Retrieval

Why is Winoground Hard? Investigating Failures in Visuolinguistic Compositionality

1 code implementation • 1 Nov 2022 • Anuj Diwan, Layne Berry, Eunsol Choi, David Harwath, Kyle Mahowald

Recent visuolinguistic pre-trained models show promising progress on various end tasks such as image retrieval and video captioning.

Data Augmentation, Image Retrieval, +2

Reduce and Reconstruct: ASR for Low-Resource Phonetic Languages

no code implementations • 19 Oct 2020 • Anuj Diwan, Preethi Jyothi

This work presents a seemingly simple but effective technique to improve low-resource ASR systems for phonetic languages.

Speech Recognition
