Parameter-Efficient Long-Tailed Recognition

18 Sep 2023 · Jiang-Xin Shi, Tong Wei, Zhi Zhou, Xin-Yan Han, Jie-Jing Shao, Yu-Feng Li

The "pre-training and fine-tuning" paradigm in addressing long-tailed recognition tasks has sparked significant interest since the emergence of large vision-language models like the contrastive language-image pre-training (CLIP). While previous studies have shown promise in adapting pre-trained models for these tasks, they often undesirably require extensive training epochs or additional training data to maintain good performance. In this paper, we propose PEL, a fine-tuning method that can effectively adapt pre-trained models to long-tailed recognition tasks in fewer than 20 epochs without the need for extra data. We first empirically find that commonly used fine-tuning methods, such as full fine-tuning and classifier fine-tuning, suffer from overfitting, resulting in performance deterioration on tail classes. To mitigate this issue, PEL introduces a small number of task-specific parameters by adopting the design of any existing parameter-efficient fine-tuning method. Additionally, to expedite convergence, PEL presents a novel semantic-aware classifier initialization technique derived from the CLIP textual encoder without adding any computational overhead. Our experimental results on four long-tailed datasets demonstrate that PEL consistently outperforms previous state-of-the-art approaches. The source code is available at https://github.com/shijxcs/PEL.


Results from the Paper


Ranked #1 on Long-tail Learning on CIFAR-100-LT (ρ=10) (using extra training data)

| Task | Dataset | Model | Metric | Metric Value (%) | Global Rank |
|---|---|---|---|---|---|
| Long-tail Learning | CIFAR-100-LT (ρ=10) | PEL (ViT-B/16, CLIP) | Error Rate | 15.1 | #4 |
| Long-tail Learning | CIFAR-100-LT (ρ=10) | PEL (ViT-B/16, ImageNet-21K pre-training) | Error Rate | 8.7 | #1 |
| Long-tail Learning | CIFAR-100-LT (ρ=100) | PEL (ViT-B/16, CLIP) | Error Rate | 18.3 | #3 |
| Long-tail Learning | CIFAR-100-LT (ρ=100) | PEL (ViT-B/16, ImageNet-21K pre-training) | Error Rate | 10.9 | #1 |
| Long-tail Learning | CIFAR-100-LT (ρ=50) | PEL (ViT-B/16, ImageNet-21K pre-training) | Error Rate | 9.8 | #1 |
| Long-tail Learning | CIFAR-100-LT (ρ=50) | PEL (ViT-B/16, CLIP) | Error Rate | 16.9 | #4 |
| Long-tail Learning | ImageNet-LT | PEL (ViT-B/16) | Top-1 Accuracy | 78.3 | #3 |
| Long-tail Learning | iNaturalist 2018 | PEL (ViT-B/16) | Top-1 Accuracy | 80.4 | #3 |
| Long-tail Learning | Places-LT | PEL (ViT-B/16) | Top-1 Accuracy | 52.2 | #1 |