no code implementations • 20 Aug 2022 • Qingrong Cheng, Keyu Wen, Xiaodong Gu
To address this issue, we propose a novel Vision-Language Matching strategy for text-to-image synthesis, named VLMGAN*, which introduces a dual vision-language matching mechanism to improve both image quality and semantic consistency.
no code implementations • 13 Aug 2022 • Zhenshan Tan, Cheng Chen, Keyu Wen, Yuzhuo Qin, Xiaodong Gu
With the design of negative samples, the noise objects are suppressed.
no code implementations • 2 Jul 2022 • Keyu Wen, Zhenshan Tan, Qingrong Cheng, Cheng Chen, Xiaodong Gu
Concretely, the first module is a weight-sharing transformer that builds on the head of the visual and textual encoders, aiming to semantically align text and image.
1 code implementation • ICCV 2021 • Keyu Wen, Jin Xia, Yuanyuan Huang, Linyang Li, Jiayan Xu, Jie Shao
It has two key designs: a weight-sharing transformer on top of the visual and textual encoders that aligns text and image semantically, and three kinds of contrastive learning designed to share knowledge between the modalities.
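The general idea of aligning two modalities through shared weights and a contrastive objective can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: a single shared linear map (`shared_weights`, hypothetical) stands in for the weight-sharing transformer, and one generic symmetric InfoNCE loss stands in for the three contrastive objectives, which the snippet does not specify. All function names and the `temperature` default are assumptions.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length (guarding against a zero vector).
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def project(shared_weights, v):
    # Assumption: one weight matrix applied to BOTH modalities,
    # a stand-in for a weight-sharing transformer.
    return [sum(w * x for w, x in zip(row, v)) for row in shared_weights]

def info_nce(queries, keys, temperature=0.07):
    # Generic InfoNCE: the i-th key is the positive for the i-th query,
    # every other key in the batch is a negative.
    losses = []
    for i, q in enumerate(queries):
        logits = [sum(a * b for a, b in zip(q, k)) / temperature for k in keys]
        m = max(logits)  # subtract the max for numerical stability
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_denom - logits[i])
    return sum(losses) / len(losses)

def symmetric_contrastive_loss(image_feats, text_feats, shared_weights,
                               temperature=0.07):
    # Project both modalities with the same weights, then average the
    # image-to-text and text-to-image InfoNCE losses.
    img = [l2_normalize(project(shared_weights, v)) for v in image_feats]
    txt = [l2_normalize(project(shared_weights, v)) for v in text_feats]
    return 0.5 * (info_nce(img, txt, temperature)
                  + info_nce(txt, img, temperature))
```

With matched image and text features the loss is close to zero, and it grows when the pairs are shuffled, which is the signal that pulls paired embeddings together.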
1 code implementation • 22 Oct 2020 • Keyu Wen, Xiaodong Gu, Qingrong Cheng
Thus, a novel multi-level semantic relations enhancement approach named Dual Semantic Relations Attention Network (DSRAN) is proposed, which mainly consists of two modules: the separate semantic relations module and the joint semantic relations module.