1 code implementation • 15 Mar 2024 • Enguang Wang, Zhimao Peng, Zhengyuan Xie, Xialei Liu, Ming-Ming Cheng
Specifically, our TES leverages the property that CLIP can generate aligned vision-language features, converting visual embeddings into tokens of the CLIP's text encoder to generate pseudo text embeddings.
1 code implementation • ICCV 2019 • Zhimao Peng, Zechao Li, Junge Zhang, Yan Li, Guo-Jun Qi, Jinhui Tang
To fully explore the prior knowledge, a semantic-visual mapping network is then developed to conduct knowledge inference, which enables to infer the classifiers for novel categories from base categories.