Spectral DeTuning

Introduced by Horwitz et al. in Recovering the Pre-Fine-Tuning Weights of Generative Models

Spectral DeTuning is a method that recovers the weights of a pre-fine-tuning model from a small number of low-rank (LoRA) fine-tuned versions of that model. Whereas previous attacks attempt to recover pre-fine-tuning capabilities, Spectral DeTuning aims to recover the exact pre-fine-tuning weights. The authors show that this vulnerability can be exploited against large-scale models such as a personalized Stable Diffusion and an aligned Mistral.

Source: Recovering the Pre-Fine-Tuning Weights of Generative Models
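
The recovery setting described above can be illustrated with a small numerical sketch. It assumes that each fine-tuned matrix equals one shared matrix plus its own update of rank at most r, and it alternates between two steps: estimating each low-rank update by truncated SVD, then averaging what is left over. This is a toy illustration of that idea and is not taken from the paper's code. The function names, matrix sizes, number of models, and step count are all hypothetical choices made for the example.

```python
import numpy as np


def truncate_rank(matrix, rank):
    """Best rank-`rank` approximation of `matrix` via truncated SVD."""
    u, s, vt = np.linalg.svd(matrix, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vt[:rank]


def recover_shared_weights(finetuned, rank, steps=200):
    """Estimate the shared matrix behind several low-rank-perturbed copies.

    Assumption (hypothetical setup): finetuned[i] = shared + update_i,
    where every update_i has rank at most `rank`.
    """
    stacked = np.stack(finetuned)
    # Start from the plain average of the fine-tuned matrices.
    estimate = stacked.mean(axis=0)
    for _ in range(steps):
        # Step 1: given the current estimate, fit each low-rank update.
        updates = np.stack(
            [truncate_rank(w - estimate, rank) for w in stacked]
        )
        # Step 2: given the updates, re-estimate the shared matrix.
        estimate = (stacked - updates).mean(axis=0)
    return estimate


# Synthetic data: one shared matrix and several rank-2 perturbations of it.
rng = np.random.default_rng(0)
dim, rank, n_models = 32, 2, 8
w_true = rng.normal(size=(dim, dim))
finetuned = [
    w_true + rng.normal(size=(dim, rank)) @ rng.normal(size=(rank, dim))
    for _ in range(n_models)
]

naive = np.mean(finetuned, axis=0)
recovered = recover_shared_weights(finetuned, rank)

err_naive = np.linalg.norm(naive - w_true)
err_recovered = np.linalg.norm(recovered - w_true)
print(f"error of plain average:  {err_naive:.4f}")
print(f"error after alternation: {err_recovered:.4f}")
```

Each of the two steps is the exact minimizer of the summed squared reconstruction error with the other variable held fixed, so that objective never increases from one iteration to the next. Whether the estimate reaches the true shared matrix depends on how many fine-tuned copies are available and on their rank.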
