no code implementations • 1 Apr 2024 • Rongjie Li, Yu Wu, Xuming He
Generative vision-language models (VLMs) have shown impressive performance in zero-shot vision-language tasks like image captioning and visual question answering.
no code implementations • 1 Apr 2024 • Rongjie Li, Songyang Zhang, Dahua Lin, Kai Chen, Xuming He
Scene graph generation (SGG) aims to parse a visual scene into an intermediate graph representation for downstream reasoning tasks.
1 code implementation • 23 Jan 2024 • Rongjie Li, Songyang Zhang, Xuming He
Moreover, we design a graph assembling module to infer the connectivity of the bipartite scene graph based on our entity-aware structure, enabling us to generate the scene graph in an end-to-end manner.
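The snippet above does not say how the assembling module infers connectivity. As a rough illustration only, the sketch below assumes that each predicate node predicts a subject box and an object box, and that it is linked to the entity node whose box center is nearest. The classes `Entity` and `Predicate`, the function `assemble`, and the nearest-center rule are all hypothetical and are not the paper's module.

```python
from dataclasses import dataclass

# Hypothetical node types for a bipartite scene graph:
# entity nodes on one side, predicate nodes on the other.
@dataclass
class Entity:
    label: str
    box: tuple  # (x1, y1, x2, y2)

@dataclass
class Predicate:
    label: str
    subj_box: tuple  # assumed: box the predicate node predicts for its subject
    obj_box: tuple   # assumed: box the predicate node predicts for its object

def center_distance(a, b):
    """Euclidean distance between the centers of two (x1, y1, x2, y2) boxes."""
    ax, ay = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    bx, by = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    return ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5

def assemble(entities, predicates):
    """Link each predicate node to a subject and an object entity node.

    Assumed rule: pick the entity whose box center is closest to the
    predicted box; the object is chosen among entities other than the
    subject when more than one entity exists. Returns the edges as
    (subject_index, predicate_label, object_index) triplets.
    """
    triplets = []
    for p in predicates:
        subj = min(range(len(entities)),
                   key=lambda i: center_distance(entities[i].box, p.subj_box))
        candidates = [i for i in range(len(entities)) if i != subj] or [subj]
        obj = min(candidates,
                  key=lambda i: center_distance(entities[i].box, p.obj_box))
        triplets.append((subj, p.label, obj))
    return triplets

# Usage: one "riding" predicate whose predicted boxes sit on the two entities.
entities = [Entity("person", (0, 0, 10, 20)), Entity("horse", (5, 10, 30, 40))]
predicates = [Predicate("riding", (1, 1, 9, 19), (6, 11, 29, 39))]
triplets = assemble(entities, predicates)
print(triplets)  # [(0, 'riding', 1)]
```

Because each predicate node is resolved independently in this sketch, several predicates may share the same entity, which is what lets a bipartite graph express multiple relations per object.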
1 code implementation • CVPR 2022 • Rongjie Li, Songyang Zhang, Xuming He
Scene Graph Generation (SGG) remains a challenging visual understanding task due to its compositional property.
no code implementations • 29 Sep 2021 • Rongjie Li, Songyang Zhang, Xuming He
We develop a decoding-and-assembling paradigm for end-to-end scene graph generation.
3 code implementations • CVPR 2021 • Rongjie Li, Songyang Zhang, Bo Wan, Xuming He
Scene graph generation is an important visual understanding task with a broad range of vision applications.
1 code implementation • ICCV 2019 • Bo Wan, Desen Zhou, Yongfei Liu, Rongjie Li, Xuming He
Reasoning about human-object interactions is a core problem in human-centric scene understanding, and detecting such relations poses a unique challenge to vision systems due to large variations in human-object configurations, multiple co-occurring relation instances, and subtle visual differences between relation categories.