no code implementations • 13 Jun 2023 • Chen Cai, Suchen Wang, Kim-Hui Yap, Yi Wang
Weakly-supervised grounded image captioning (WSGIC) aims to generate a caption and ground (localize) the predicted object words in the input image without using bounding box supervision.
1 code implementation • 26 Mar 2023 • Yue Zhang, Suchen Wang, Shichao Kan, Zhenyu Weng, Yigang Cen, Yap-Peng Tan
Our key idea is to formulate the pedestrian open-attribute recognition (POAR) problem as an image-text search problem.
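As a rough illustration of what treating recognition as image-text search can look like, the sketch below ranks candidate attribute phrases by cosine similarity to an image embedding. It is not the authors' method: the function names `cosine` and `rank_attributes`, the attribute phrases, and the three-dimensional embeddings are all hypothetical, and real embeddings would come from trained image and text encoders.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def rank_attributes(image_embedding, text_embeddings):
    """Return attribute phrases sorted from most to least similar to the image."""
    scores = {name: cosine(image_embedding, vec)
              for name, vec in text_embeddings.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy, hand-made embeddings (hypothetical; stand-ins for encoder outputs).
image_embedding = [0.9, 0.1, 0.0]
text_embeddings = {
    "wearing a hat": [1.0, 0.0, 0.0],
    "carrying a backpack": [0.0, 1.0, 0.0],
    "long hair": [0.0, 0.0, 1.0],
}

ranking = rank_attributes(image_embedding, text_embeddings)
print(ranking)
```

Because the candidates are plain text, attributes outside any fixed label set can be added to the search simply by embedding a new phrase.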
1 code implementation • 28 Oct 2022 • Henghui Ding, Chang Liu, Suchen Wang, Xudong Jiang
We propose a Vision-Language Transformer (VLT) framework for referring segmentation to facilitate deep interactions among multi-modal information and enhance the holistic understanding of vision-language features.
Ranked #3 on Referring Video Object Segmentation on MeViS
Referring Expression Segmentation • Referring Video Object Segmentation
1 code implementation • CVPR 2022 • Suchen Wang, Yueqi Duan, Henghui Ding, Yap-Peng Tan, Kim-Hui Yap, Junsong Yuan
More specifically, we propose a new HOI visual encoder to detect the interacting humans and objects, and map them to a joint feature space to perform interaction recognition.
1 code implementation • ICCV 2021 • Henghui Ding, Chang Liu, Suchen Wang, Xudong Jiang
We use a transformer with multi-head attention to build an encoder-decoder attention network that "queries" the given image with the language expression.
Generalized Referring Expression Comprehension • Generalized Referring Expression Segmentation
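The idea of "querying" an image with language can be sketched with scaled dot-product cross-attention, where language features act as queries and image region features act as keys and values. This is a single-head simplification for illustration only, not the paper's architecture; `cross_attention`, `word_queries`, and `region_feats` are hypothetical names, and the feature values are made up.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention: each query attends over all keys."""
    d = len(keys[0])
    outputs, weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(feature dim).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        # Weighted sum of the value vectors.
        out = [sum(wi * v[j] for wi, v in zip(w, values))
               for j in range(len(values[0]))]
        weights.append(w)
        outputs.append(out)
    return outputs, weights

# One language query and three image regions (toy values, hypothetical).
word_queries = [[4.0, 0.0]]
region_feats = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

attended, attn = cross_attention(word_queries, region_feats, region_feats)
print(attn[0])
```

The attention weights show which regions the expression selects: here the query aligns with the first region, so that region receives most of the weight.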
no code implementations • ICCV 2021 • Suchen Wang, Kim-Hui Yap, Henghui Ding, Jiyan Wu, Junsong Yuan, Yap-Peng Tan
In this work, we study the problem of human-object interaction (HOI) detection with large-vocabulary object categories.
1 code implementation • CVPR 2020 • Suchen Wang, Kim-Hui Yap, Junsong Yuan, Yap-Peng Tan
To recognize objects from unseen categories, we devise a zero-shot classification module on top of the classifier for seen categories.
no code implementations • CVPR 2019 • Suchen Wang, Jingjing Meng, Junsong Yuan, Yap-Peng Tan
Given labeled source data and large-scale unlabeled target data, we aim to find representatives in the target data that not only represent and associate data points belonging to each labeled category, but also discover novel categories in the target data, if any.