Read-only Prompt Optimization for Vision-Language Few-shot Learning

In recent years, prompt tuning has proven effective in adapting pre-trained vision-language models to downstream tasks. These methods adapt the pre-trained models by introducing learnable prompts while keeping the pre-trained weights frozen. However, learnable prompts can affect the internal representations within the self-attention module, which can increase performance variance and hurt generalization, especially in data-deficient settings. To address these issues, we propose a novel approach, Read-only Prompt Optimization (RPO). RPO leverages masked attention to prevent the internal representation shift in the pre-trained model. Further, to facilitate the optimization of RPO, the read-only prompts are initialized based on special tokens of the pre-trained model. Our extensive experiments demonstrate that RPO outperforms CLIP and CoCoOp in base-to-new generalization and domain generalization while displaying better robustness. The proposed method also generalizes better in extremely data-deficient settings, while improving parameter efficiency and reducing computational overhead. Code is available at https://github.com/mlvlab/RPO.
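To make the masked-attention mechanism concrete, below is a minimal sketch of the read-only idea: appended prompts may attend to (read from) the frozen model's tokens, while the original tokens are blocked from attending to the prompts, so their internal representations are left unshifted. This is an illustration under stated assumptions, not the paper's actual implementation; the function name, the [original tokens | prompts] layout, and the dimensions are all assumed here.

```python
import torch

def read_only_attention_mask(num_tokens: int, num_prompts: int) -> torch.Tensor:
    """Additive attention mask implementing 'read-only' prompts.

    Assumed sequence layout: [original tokens | appended prompts].
    Rows index queries, columns index keys. Original-token queries are
    blocked (-inf before softmax) from attending to prompt keys, so the
    pre-trained tokens' representations are unaffected by the prompts.
    Prompt queries may attend to everything, i.e. they only *read*.
    """
    total = num_tokens + num_prompts
    mask = torch.zeros(total, total)
    mask[:num_tokens, num_tokens:] = float("-inf")
    return mask

# Usage with a standard attention layer (dimensions are illustrative):
attn = torch.nn.MultiheadAttention(embed_dim=512, num_heads=8, batch_first=True)
x = torch.randn(1, 6, 512)             # 4 original tokens + 2 read-only prompts
mask = read_only_attention_mask(4, 2)  # shape (6, 6), added to attention logits
out, _ = attn(x, x, x, attn_mask=mask)
```

In the paper's setup, only the appended prompt vectors are trained (initialized from the pre-trained model's special tokens, per the abstract) while all pre-trained weights stay frozen; the mask above is what keeps those prompts from writing into the frozen token representations.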


Results from the Paper


| Task | Dataset | Model | Metric | Value | Global Rank |
|---|---|---|---|---|---|
| Prompt Engineering | Caltech-101 | RPO | Harmonic mean | 96.03 | #6 |
| Prompt Engineering | DTD | RPO | Harmonic mean | 68.61 | #6 |
| Prompt Engineering | EuroSAT | RPO | Harmonic mean | 76.79 | #8 |
| Prompt Engineering | FGVC-Aircraft | RPO | Harmonic mean | 35.70 | #8 |
| Prompt Engineering | Food-101 | RPO | Harmonic mean | 90.58 | #9 |
| Prompt Engineering | ImageNet | RPO | Harmonic mean | 74.00 | #8 |
| Prompt Engineering | Oxford 102 Flower | RPO | Harmonic mean | 84.50 | #7 |
| Prompt Engineering | Oxford-IIIT Pet Dataset | RPO | Harmonic mean | 96.05 | #9 |
| Prompt Engineering | Stanford Cars | RPO | Harmonic mean | 74.69 | #7 |
| Prompt Engineering | SUN397 | RPO | Harmonic mean | 79.18 | #8 |
| Prompt Engineering | UCF101 | RPO | Harmonic mean | 79.34 | #8 |
