Generative Prompt Model for Weakly Supervised Object Localization

ICCV 2023 · Yuzhong Zhao, Qixiang Ye, Weijia Wu, Chunhua Shen, Fang Wan

Weakly supervised object localization (WSOL) remains challenging when learning object localization models from image category labels. Conventional methods that discriminatively train activation models ignore representative yet less discriminative object parts. In this study, we propose a generative prompt model (GenPromp), defining the first generative pipeline to localize less discriminative object parts by formulating WSOL as a conditional image denoising procedure. During training, GenPromp converts image category labels to learnable prompt embeddings, which are fed to a generative model to conditionally recover the input image with noise and learn representative embeddings. During inference, GenPromp combines the representative embeddings with discriminative embeddings (queried from an off-the-shelf vision-language model) for both representative and discriminative capacity. The combined embeddings are finally used to generate multi-scale high-quality attention maps, which facilitate localizing full object extent. Experiments on CUB-200-2011 and ILSVRC show that GenPromp respectively outperforms the best discriminative models by 5.2% and 5.6% (Top-1 Loc), setting a solid baseline for WSOL with the generative model. Code is available at https://github.com/callsys/GenPromp.
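The inference steps described in the abstract (combine two embeddings, aggregate attention maps, localize the object) can be illustrated with a minimal sketch. This is not the authors' implementation: the linear blending rule, the `weight` and `threshold` parameters, and all function names are assumptions made for illustration, and the attention maps are assumed to have already been resized to a common resolution.

```python
def combine_embeddings(representative, discriminative, weight=0.5):
    """Blend two equal-length embedding vectors.

    Assumption: a simple linear blend; the abstract only says the two
    embeddings are "combined".
    """
    if len(representative) != len(discriminative):
        raise ValueError("embeddings must have the same length")
    return [weight * r + (1.0 - weight) * d
            for r, d in zip(representative, discriminative)]


def average_maps(maps):
    """Average same-sized 2D attention maps element-wise.

    Assumption: multi-scale maps were resized to one resolution beforehand.
    """
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(m[i][j] for m in maps) / len(maps) for j in range(cols)]
            for i in range(rows)]


def attention_to_box(attention, threshold=0.5):
    """Return (x_min, y_min, x_max, y_max) of cells >= threshold * peak value."""
    peak = max(max(row) for row in attention)
    cutoff = threshold * peak
    hits = [(x, y) for y, row in enumerate(attention)
            for x, value in enumerate(row) if value >= cutoff]
    xs = [x for x, _ in hits]
    ys = [y for _, y in hits]
    return (min(xs), min(ys), max(xs), max(ys))


# Toy usage with hand-written values (hypothetical data).
prompt = combine_embeddings([1.0, 0.0], [0.0, 1.0], weight=0.25)
coarse = [[0.0, 0.0, 0.0], [0.0, 1.0, 1.0], [0.0, 0.0, 0.0]]
fine = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
box = attention_to_box(average_maps([coarse, fine]), threshold=0.5)
```

In the actual method the attention maps come from a generative model conditioned on the combined embedding; the sketch only shows the surrounding bookkeeping, with the maps supplied by hand.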


Results from the Paper


 Ranked #1 on Weakly-Supervised Object Localization on CUB-200-2011 (Top-1 Localization Accuracy metric, using extra training data)

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Weakly-Supervised Object Localization | CUB-200-2011 | GenPromp | Top-1 Localization Accuracy | 87.0 | # 1 |
| Weakly-Supervised Object Localization | CUB-200-2011 | Stable diffusion | Top-1 Localization Accuracy | 87.0 | # 2 |
| Weakly-Supervised Object Localization | CUB-200-2011 | Stable diffusion | GT-known localization accuracy | 98.0 | # 2 |
| Weakly-Supervised Object Localization | ImageNet | Stable diffusion | GT-known localization accuracy | 75.0 | # 1 |
| Weakly-Supervised Object Localization | ImageNet | Stable diffusion | Top-1 Localization Accuracy | 65.2 | # 1 |
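The page does not define the two metrics reported above. The sketch below assumes the convention commonly used in WSOL evaluation: a prediction counts for GT-known localization when its box overlaps the ground-truth box with IoU of at least 0.5, and counts for Top-1 localization when, in addition, the predicted class is correct. The sample format, the single ground-truth box per image, and the function names are assumptions for illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def localization_accuracy(samples, iou_threshold=0.5):
    """Return (top1_loc, gt_known) as percentages.

    Each sample is a dict with keys pred_label, true_label, pred_box,
    true_box (hypothetical format, one ground-truth box per image).
    """
    top1_hits = 0
    gt_known_hits = 0
    for s in samples:
        box_ok = iou(s["pred_box"], s["true_box"]) >= iou_threshold
        if box_ok:
            gt_known_hits += 1  # box is good regardless of predicted class
            if s["pred_label"] == s["true_label"]:
                top1_hits += 1  # box is good and class is correct
    n = len(samples)
    return (100.0 * top1_hits / n, 100.0 * gt_known_hits / n)


# Toy usage with made-up samples.
samples = [
    {"pred_label": 3, "true_label": 3, "pred_box": (0, 0, 10, 10), "true_box": (0, 0, 10, 10)},
    {"pred_label": 7, "true_label": 3, "pred_box": (0, 0, 10, 10), "true_box": (0, 0, 10, 10)},
    {"pred_label": 3, "true_label": 3, "pred_box": (0, 0, 5, 5), "true_box": (20, 20, 30, 30)},
    {"pred_label": 3, "true_label": 3, "pred_box": (0, 0, 10, 10), "true_box": (0, 0, 10, 20)},
]
top1_loc, gt_known = localization_accuracy(samples)
```

Under this convention GT-known accuracy is never lower than Top-1 localization accuracy, which is consistent with the values in the table.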
