Layer rotation: a surprisingly simple indicator of generalization in deep networks?

Our work presents empirical evidence that layer rotation, i.e. the evolution during training of the cosine distance between each layer's weight vector and its initialization, is a remarkably consistent indicator of generalization performance. Compared to previously studied indicators of generalization, layer rotation has the additional benefits of being easy to monitor and control, and of having a network-independent optimum: training procedures in which all layers' weights reach a cosine distance of 1 from their initialization consistently outperform other configurations, by up to 20% test accuracy. Finally, our results also suggest that the study of layer rotation can provide a unified framework for explaining the impact of weight decay and adaptive gradient methods on generalization.
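
To make the metric concrete, below is a minimal sketch of how per-layer rotation (one minus the cosine similarity between a layer's current weights and its weights at initialization) could be tracked during training, assuming a PyTorch model. The helper name `layer_rotation`, the decision to skip one-dimensional parameters (biases, normalization scales), and the snapshot convention are illustrative assumptions, not the authors' reference implementation.

```python
import torch

def layer_rotation(model, init_state):
    """Cosine distance between each layer's current weight vector and its
    value at initialization: 1 - cos(w, w0) on the flattened weights."""
    rotations = {}
    for name, param in model.named_parameters():
        if param.dim() < 2:  # assumption: only track weight matrices/kernels
            continue
        w = param.detach().flatten()
        w0 = init_state[name].flatten().to(w.device)
        cos = torch.dot(w, w0) / (w.norm() * w0.norm() + 1e-12)
        rotations[name] = 1.0 - cos.item()
    return rotations

# Usage sketch: snapshot the weights once at initialization,
# then query the rotations periodically during training.
# init_state = {k: v.detach().clone() for k, v in model.named_parameters()}
# ... train for some steps ...
# print(layer_rotation(model, init_state))
```

Monitoring these values over epochs yields the layer rotation curves the abstract refers to; a value approaching 1 for every layer corresponds to the configuration reported as performing best.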
