1 code implementation • 9 Apr 2024 • Yupei Zhang, Li Pan, Qiushi Yang, Tan Li, Zhen Chen
Specifically, to enhance the representation abilities of vision and language encoders, we propose the Multi-level Reconstruction Pre-training (MR-Pretrain) strategy, including a feature-level and data-level reconstruction, which guides models to capture the semantic information from masked inputs of different modalities.
no code implementations • 3 Jul 2023 • Jiaqi Wang, Zhengliang Liu, Lin Zhao, Zihao Wu, Chong Ma, Sigang Yu, Haixing Dai, Qiushi Yang, Yiheng Liu, Songyao Zhang, Enze Shi, Yi Pan, Tuo Zhang, Dajiang Zhu, Xiang Li, Xi Jiang, Bao Ge, Yixuan Yuan, Dinggang Shen, Tianming Liu, Shu Zhang
This review aims to summarize the methods employed in the computer vision domain for large vision models and visual prompt engineering, exploring the latest advancements in visual prompt engineering.
no code implementations • 28 Apr 2023 • Jiaqi Wang, Enze Shi, Sigang Yu, Zihao Wu, Chong Ma, Haixing Dai, Qiushi Yang, Yanqing Kang, Jinru Wu, Huawen Hu, Chenxi Yue, Haiyang Zhang, Yiheng Liu, Yi Pan, Zhengliang Liu, Lichao Sun, Xiang Li, Bao Ge, Xi Jiang, Dajiang Zhu, Yixuan Yuan, Dinggang Shen, Tianming Liu, Shu Zhang
Prompt engineering is a critical technique in the field of natural language processing that involves designing and optimizing the prompts used to input information into models, aiming to enhance their performance on specific tasks.
no code implementations • ICCV 2023 • Qiushi Yang, Wuyang Li, Baopu Li, Yixuan Yuan
Moreover, to enhance semantic relation modeling, we propose relation matching to align the sample-wise relation between the intact and masked features.
1 code implementation • CVPR 2022 • Xinyu Liu, Wuyang Li, Qiushi Yang, Baopu Li, Yixuan Yuan
Domain Adaptive Object Detection (DAOD) models a joint distribution of images and labels from an annotated source domain and learns a domain-invariant transformation to estimate the target labels with the given target domain images.