no code implementations • 9 Dec 2022 • Shen Yan, Tao Zhu, ZiRui Wang, Yuan Cao, Mi Zhang, Soham Ghosh, Yonghui Wu, Jiahui Yu
We explore an efficient approach to establish a foundational video-text model.
Ranked #1 on Video Captioning on ActivityNet Captions (using extra training data)
1 code implementation • NAACL 2019 • Soham Ghosh, Anuva Agarwal, Zarana Parekh, Alexander Hauptmann
The task of retrieving clips within videos based on a given natural language query requires cross-modal reasoning over multiple frames.
1 code implementation • 7 Mar 2019 • Emilio Parisotto, Soham Ghosh, Sai Bhargav Yalamanchi, Varsha Chinnaobireddy, Yuhuai Wu, Ruslan Salakhutdinov
In this multi-agent setting, a set of parallel agents is executed in the same environment, and each of these "rollout" agents is given the means to communicate with the others.
Ranked #1 on Meta Reinforcement Learning on 3-Reacher