Search Results for author: Minh Quan Do

Found 2 papers, 2 papers with code

Vamos: Versatile Action Models for Video Understanding

1 code implementation • 22 Nov 2023 • Shijie Wang, Qi Zhao, Minh Quan Do, Nakul Agarwal, Kwonjoon Lee, Chen Sun

To interpret the important text evidence for question answering, we generalize the concept bottleneck model to work with tokens and nonlinear models, which uses hard attention to select a small subset of tokens from the free-form text as inputs to the LLM reasoner.

Ranked #2 on Zero-Shot Video Question Answer on EgoSchema (fullset)

Language Modelling, Large Language Model (+2 more)

AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?

1 code implementation • 31 Jul 2023 • Qi Zhao, Shijie Wang, Ce Zhang, Changcheng Fu, Minh Quan Do, Nakul Agarwal, Kwonjoon Lee, Chen Sun

We propose to formulate the LTA task from two perspectives: a bottom-up approach that predicts the next actions autoregressively by modeling temporal dynamics; and a top-down approach that infers the goal of the actor and plans the needed procedure to accomplish the goal.

Ranked #1 on Long Term Action Anticipation on Ego4D

Action Anticipation, counterfactual (+1 more)
