Fine-Tuning Can Distort Pretrained Features and Underperform Out-of-Distribution

When transferring a pretrained model to a downstream task, two popular methods are fine-tuning (updating all the model parameters) and linear probing (updating only the last linear layer). It is well known that fine-tuning leads to better accuracy in-distribution (ID). However, in this paper, we show that fine-tuning can achieve worse accuracy than linear probing out-of-distribution (OOD), especially when the pretrained features are good and the distribution shift is large. On six distribution shift datasets (Breeds-Living17, Breeds-Entity30, DomainNet, CIFAR $\to$ STL, CIFAR10.1, FMoW), fine-tuning obtains an average 2% higher accuracy ID but 6% lower accuracy OOD than linear probing. We theoretically analyze the tradeoffs that arise when fine-tuning overparameterized two-layer linear networks, characterizing how fine-tuning can distort high-quality pretrained features, which leads to low OOD accuracy. Our analysis suggests the simple two-step strategy of linear probing then full fine-tuning, which combines the benefits of both fine-tuning and linear probing to achieve better ID and OOD accuracy than fine-tuning, both theoretically and on the above datasets (1% better ID, 8% better OOD).
