no code implementations • 3 Nov 2023 • Tian Yun, Zilai Zeng, Kunal Handa, Ashish V. Thapliyal, Bo Pang, Ellie Pavlick, Chen Sun
Decision making via sequence modeling aims to mimic the success of language models, where actions taken by an embodied agent are modeled as tokens to predict.
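The action-as-token idea can be sketched with a toy model. This is a hedged illustration only, not the paper's method: the names are hypothetical, and real decision-as-sequence-modeling systems use learned sequence models (e.g. Transformers) rather than bigram counts. It shows the shared interface: flatten a trajectory of (state, action) pairs into one token stream, then predict the next action token from context.

```python
from collections import Counter, defaultdict

def flatten(trajectory):
    """Interleave (state, action) pairs into a single token sequence."""
    tokens = []
    for state, action in trajectory:
        tokens.append(f"s:{state}")
        tokens.append(f"a:{action}")
    return tokens

def train_bigram(trajectories):
    """Count next-token frequencies conditioned on the previous token.
    Stands in for a learned sequence model's conditional distribution."""
    counts = defaultdict(Counter)
    for traj in trajectories:
        tokens = flatten(traj)
        for prev, nxt in zip(tokens, tokens[1:]):
            counts[prev][nxt] += 1
    return counts

def predict_action(counts, state):
    """Return the most frequent action token following a state token."""
    nxt = counts[f"s:{state}"]
    return nxt.most_common(1)[0][0] if nxt else None

# Hypothetical toy trajectories for an embodied agent.
trajectories = [
    [("door_closed", "open"), ("door_open", "walk")],
    [("door_closed", "open"), ("door_open", "walk")],
    [("door_closed", "knock"), ("door_closed", "open")],
]
model = train_bigram(trajectories)
print(predict_action(model, "door_closed"))  # prints "a:open"
```

Swapping the bigram counts for a causal Transformer trained with next-token prediction recovers the standard decision-as-sequence-modeling setup.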
1 code implementation • 12 Sep 2022 • Soravit Changpinyo, Linting Xue, Michal Yarom, Ashish V. Thapliyal, Idan Szpektor, Julien Amelot, Xi Chen, Radu Soricut
In this paper, we propose scalable solutions to multilingual visual question answering (mVQA), on both data and modeling fronts.
no code implementations • 25 May 2022 • Ashish V. Thapliyal, Jordi Pont-Tuset, Xi Chen, Radu Soricut
Research in massively multilingual image captioning has been severely hampered by a lack of high-quality evaluation datasets.
no code implementations • COLING 2022 • Wanrong Zhu, Bo Pang, Ashish V. Thapliyal, William Yang Wang, Radu Soricut
Dense video captioning aims to identify the events of interest in an input video, and generate descriptive captions for each event.
Ranked #3 for Dense Video Captioning on ViTT (CIDEr metric, using extra training data)
no code implementations • ACL 2020 • Ashish V. Thapliyal, Radu Soricut
The trend toward data-hungry models, combined with the scarcity of non-English annotations, directly hurts the ability of cross-modal language generation tasks such as image captioning to support non-English languages.
1 code implementation • NAACL 2021 • Tomer Levinboim, Ashish V. Thapliyal, Piyush Sharma, Radu Soricut
Automatic image captioning has improved significantly over the last few years, but the problem is far from solved: state-of-the-art models still often produce low-quality captions when used in the wild.