Search Results for author: Minh Quan Do

Found 2 papers, 0 papers with code

Vamos: Versatile Action Models for Video Understanding

no code implementations • 22 Nov 2023 • Shijie Wang, Qi Zhao, Minh Quan Do, Nakul Agarwal, Kwonjoon Lee, Chen Sun

What makes good video representations for video understanding, such as anticipating future activities, or answering video-conditioned questions?

Ranked #2 on Zero-Shot Video Question Answer on EgoSchema (fullset)

Language Modelling, Large Language Model, +2


AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?

no code implementations • 31 Jul 2023 • Qi Zhao, Shijie Wang, Ce Zhang, Changcheng Fu, Minh Quan Do, Nakul Agarwal, Kwonjoon Lee, Chen Sun

We propose to formulate the long-term action anticipation (LTA) task from two perspectives: a bottom-up approach that predicts the next actions autoregressively by modeling temporal dynamics; and a top-down approach that infers the goal of the actor and plans the needed procedure to accomplish the goal.

Action Anticipation, counterfactual, +1

