no code implementations • 24 Sep 2022 • Thao Minh Le
Visual perception and language understanding are fundamental components of human intelligence, enabling humans to understand and reason about objects and their interactions.
1 code implementation • 8 Jul 2022 • Hoang-Anh Pham, Thao Minh Le, Vuong Le, Tu Minh Phuong, Truyen Tran
To tackle these challenges, we present a new object-centric framework for video dialog that supports neural reasoning, dubbed COST, which stands for Conversation about Objects in Space-Time.
no code implementations • 25 May 2022 • Thao Minh Le, Vuong Le, Sunil Gupta, Svetha Venkatesh, Truyen Tran
This grounding guides the attention mechanism inside VQA models through a duality of mechanisms: pre-training the attention weight calculation, and directly guiding the weights at inference time on a case-by-case basis.
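The inference-time half of this duality can be sketched as interpolating a model's attention distribution toward an external grounding signal. The blending function, the `alpha` hyper-parameter, and the toy region weights below are all assumptions for illustration, not the paper's actual method.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D score vector."""
    e = np.exp(x - x.max())
    return e / e.sum()

def guided_attention(scores, grounding, alpha=0.5):
    """Blend model attention with external grounding weights (hypothetical sketch).

    scores    : raw attention logits over image regions (model-computed)
    grounding : normalized grounding weights over the same regions
    alpha     : interpolation strength (assumed hyper-parameter)
    """
    attn = softmax(scores)
    blended = (1 - alpha) * attn + alpha * grounding
    return blended / blended.sum()  # renormalize to a distribution

# Toy example: 4 image regions; grounding shifts mass onto region 1.
scores = np.array([2.0, 0.5, 0.1, -1.0])
grounding = np.array([0.1, 0.6, 0.2, 0.1])
weights = guided_attention(scores, grounding)
```

At `alpha=0`, the model's own attention is returned unchanged; at `alpha=1`, the grounding fully overrides it, so `alpha` trades model evidence against external guidance.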
no code implementations • 25 Jun 2021 • Long Hoang Dang, Thao Minh Le, Vuong Le, Truyen Tran
Toward this goal, we propose an object-oriented reasoning approach in which the video is abstracted as a dynamic stream of interacting objects.
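One minimal reading of "a dynamic stream of interacting objects" is a tensor of per-frame object features plus a per-frame pairwise interaction map. The shapes, the dot-product affinity, and the random stand-in features below are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical video: T frames, each with N detected objects of feature dim D.
T, N, D = 4, 3, 8
video_objects = rng.normal(size=(T, N, D))  # stand-in for detector features

def frame_interactions(objs):
    """Pairwise object-object affinity within one frame (dot-product sketch)."""
    return objs @ objs.T  # shape (N, N)

def temporal_stream(video_objects):
    """Abstract the video as a sequence of per-frame interaction maps."""
    return np.stack([frame_interactions(f) for f in video_objects])

stream = temporal_stream(video_objects)  # shape (T, N, N)
```

A reasoning module would then consume this (T, N, N) stream rather than raw pixels, which is what makes the representation object-centric.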
no code implementations • 12 Apr 2021 • Long Hoang Dang, Thao Minh Le, Vuong Le, Truyen Tran
Video question answering (Video QA) presents a powerful testbed for human-like intelligent behaviors.
no code implementations • 18 Oct 2020 • Thao Minh Le, Vuong Le, Svetha Venkatesh, Truyen Tran
Video QA challenges modelers on multiple fronts.
1 code implementation • 25 Sep 2020 • Tri Minh Nguyen, Thin Nguyen, Thao Minh Le, Truyen Tran
In addition, previous DTA methods learn protein representations solely from the small number of protein sequences in DTA datasets, neglecting proteins outside those datasets.
1 code implementation • 30 Apr 2020 • Thao Minh Le, Vuong Le, Svetha Venkatesh, Truyen Tran
We present Language-binding Object Graph Network, the first neural reasoning method with dynamic relational structures across both visual and textual domains with applications in visual question answering.
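A "dynamic relational structure across visual and textual domains" could be sketched as an object graph whose edge weights are modulated per question by a language context vector. The pooling, the outer-product bias, and the random features below are illustrative assumptions, not the LOGNet architecture itself.

```python
import numpy as np

rng = np.random.default_rng(1)
D = 6
objects = rng.normal(size=(4, D))  # visual object features (assumed)
words = rng.normal(size=(5, D))    # question word embeddings (assumed)

def softmax(x, axis=-1):
    """Row-wise numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def language_conditioned_graph(objects, words):
    """Object graph whose edges are biased by language relevance (sketch).

    Pool the word embeddings into a crude context vector, then bias each
    pairwise visual affinity by how strongly both endpoints relate to it.
    """
    ctx = words.mean(axis=0)                  # language summary vector
    relevance = objects @ ctx                 # per-object language affinity
    visual = objects @ objects.T              # pairwise visual affinity
    edges = visual + np.outer(relevance, relevance)
    return softmax(edges, axis=-1)            # row-normalized adjacency

adj = language_conditioned_graph(objects, words)
```

Because the adjacency depends on the question embeddings, a different question yields a different graph over the same objects, which is the sense in which the structure is dynamic.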
1 code implementation • CVPR 2020 • Thao Minh Le, Vuong Le, Svetha Venkatesh, Truyen Tran
Video question answering (VideoQA) is challenging as it requires modeling capacity to distill dynamic visual artifacts and distant relations and to associate them with linguistic concepts.
Ranked #3 on Audio-Visual Question Answering (AVQA) on AVQA
no code implementations • 10 Jul 2019 • Thao Minh Le, Vuong Le, Svetha Venkatesh, Truyen Tran
While recent advances in language and visual question answering have enabled sophisticated representations and neural reasoning mechanisms, major challenges in Video QA remain in the dynamic grounding of concepts, relations, and actions to support the reasoning process.
no code implementations • 12 Sep 2018 • Thao Minh Le, Nobuyuki Shimizu, Takashi Miyazaki, Koichi Shinoda
With the widespread use of intelligent systems such as smart speakers, addressee recognition has become a concern in human-computer interaction, as more and more people expect such systems to understand complicated social scenes, including those outdoors, in cafeterias, and in hospitals.
no code implementations • 30 May 2018 • Thao Minh Le, Nakamasa Inoue, Koichi Shinoda
This paper presents a new framework for human action recognition from a 3D skeleton sequence.
Ranked #99 on Skeleton Based Action Recognition on NTU RGB+D