Search Results for author: Minh Quan Do

Found 2 papers, 2 papers with code

Vamos: Versatile Action Models for Video Understanding

1 code implementation · 22 Nov 2023 · Shijie Wang, Qi Zhao, Minh Quan Do, Nakul Agarwal, Kwonjoon Lee, Chen Sun

To interpret the important text evidence for question answering, we generalize the concept bottleneck model to work with tokens and nonlinear models, which uses hard attention to select a small subset of tokens from the free-form text as inputs to the LLM reasoner.

Language Modelling, Large Language Model, +2 more

AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?

1 code implementation · 31 Jul 2023 · Qi Zhao, Shijie Wang, Ce Zhang, Changcheng Fu, Minh Quan Do, Nakul Agarwal, Kwonjoon Lee, Chen Sun

We propose to formulate the LTA task from two perspectives: a bottom-up approach that predicts the next actions autoregressively by modeling temporal dynamics; and a top-down approach that infers the goal of the actor and plans the needed procedure to accomplish the goal.

Action Anticipation, counterfactual, +1 more
