no code implementations • 22 Mar 2024 • Zhichao Wei, Qingkun Su, Long Qin, Weizhi Wang
CLS embeddings are used, on the one hand, to augment the text embeddings and, on the other, together with patch embeddings, to derive a small number of detail-rich subject embeddings; both are efficiently integrated into the diffusion model through a carefully designed multimodal cross-attention mechanism.
no code implementations • 22 May 2023 • Zhenghao Zhang, Zhichao Wei, Shengfan Zhang, Zuozhuo Dai, Siyu Zhu
Unsupervised video object segmentation has made significant progress in recent years, but manually annotating video mask datasets is expensive, which limits the diversity of the available datasets.
no code implementations • 16 Jan 2023 • Zhichao Wei, Xiaohao Chen, Mingqiang Chen, Siyu Zhu
Referring image segmentation, a typical multimodal task, aims to segment the image region described by a given natural-language expression.