no code implementations • 24 Mar 2024 • Xiangpeng Yang, Linchao Zhu, Hehe Fan, Yi Yang
We find that the issue stems chiefly from the imprecise distribution of attention weights across designated regions, which shows up as inaccurate text-to-attribute control and attention leakage.
1 code implementation • 19 Jan 2024 • Xiangpeng Yang, Linchao Zhu, Xiaohan Wang, Yi Yang
(2) Equipping the visual and text encoders with separate prompts fails to mitigate the visual-text modality gap.