no code implementations • 26 Feb 2024 • Anne Wu, Kianté Brantley, Yoav Artzi
This study evaluates three state-of-the-art multimodal large language models (MLLMs), namely GPT-4V, Gemini Pro, and the open-source model IDEFICS, on the compositional natural language vision reasoning task NLVR.
no code implementations • 3 Nov 2022 • Anne Wu, Kianté Brantley, Noriyuki Kojima, Yoav Artzi
We present lilGym, a new benchmark for language-conditioned reinforcement learning in visual environments.
no code implementations • 14 Apr 2021 • Changhan Wang, Anne Wu, Juan Pino, Alexei Baevski, Michael Auli, Alexis Conneau
In this paper, we improve speech translation (ST) by effectively leveraging large quantities of unlabeled speech and text data in different and complementary ways.
1 code implementation • ACL 2021 • Changhan Wang, Morgane Rivière, Ann Lee, Anne Wu, Chaitanya Talnikar, Daniel Haziza, Mary Williamson, Juan Pino, Emmanuel Dupoux
We introduce VoxPopuli, a large-scale multilingual corpus providing 100K hours of unlabelled speech data in 23 languages.
Ranked #3 on Speech Recognition on Common Voice French (using extra training data)
3 code implementations • Asian Chapter of the Association for Computational Linguistics 2020 • Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Sravya Popuri, Dmytro Okhonko, Juan Pino
We introduce fairseq S2T, a fairseq extension for speech-to-text (S2T) modeling tasks such as end-to-end speech recognition and speech-to-text translation.
Ranked #8 on Speech-to-Text Translation on MuST-C EN->DE
2 code implementations • 20 Jul 2020 • Changhan Wang, Anne Wu, Juan Pino
Speech translation has recently become an increasingly popular topic of research, partly due to the development of benchmark datasets.
no code implementations • 22 Jun 2020 • Anne Wu, Changhan Wang, Juan Pino, Jiatao Gu
End-to-end speech-to-text translation can provide a simpler and smaller system but faces the challenge of data scarcity.
1 code implementation • LREC 2020 • Changhan Wang, Juan Pino, Anne Wu, Jiatao Gu
Spoken language translation has recently witnessed a resurgence in popularity, thanks to the development of end-to-end models and the creation of new corpora, such as Augmented LibriSpeech and MuST-C.