Feature Incay for Representation Regularization

ICLR 2018 · Yuhui Yuan, Kuiyuan Yang, Chao Zhang

Softmax loss is widely used in deep neural networks for multi-class classification: each class is represented by a weight vector, each sample is represented by a feature vector, and a sample is correctly classified when its feature vector has the largest projection onto the weight vector of the correct class. To ensure generalization, weight decay, which shrinks the weight norms, is commonly used as a regularizer. Unlike traditional learning algorithms, where features are fixed and only weights are tunable, in deep learning the features themselves are also learned and therefore tunable. We thus propose feature incay to regularize representation learning as well: it favors feature vectors with large norms when the corresponding samples are correctly classified. With feature incay, feature vectors are pushed further away from the origin along the directions of their corresponding weight vectors, which improves inter-class separability. In addition, feature incay encourages intra-class compactness along the weight-vector directions by increasing small feature norms faster than large ones. Empirical results on MNIST, CIFAR-10, and CIFAR-100 demonstrate that feature incay improves generalization.
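As a rough illustration of the idea, the sketch below adds a feature-incay term to the standard softmax cross-entropy loss. The specific reciprocal squared-norm penalty, the correct-classification mask, and the names `feature_incay_loss` and `lambda_incay` are assumptions made for illustration based on the abstract's description, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def feature_incay_loss(features, logits, labels, lambda_incay=1e-2, eps=1e-8):
    """Softmax cross-entropy plus a hypothetical feature-incay term.

    The incay term penalizes small feature norms (via a reciprocal of the
    squared norm) only for samples the model currently classifies correctly,
    nudging their feature vectors away from the origin. Because the penalty
    is larger for small norms, its gradient grows small norms faster than
    large ones, matching the intra-class compactness behavior described above.
    """
    ce = F.cross_entropy(logits, labels)

    # Mask of samples that are currently classified correctly.
    correct = (logits.argmax(dim=1) == labels).float()

    # Reciprocal squared-norm penalty on the feature vectors.
    sq_norm = (features ** 2).sum(dim=1)
    incay = correct * (1.0 / (sq_norm + eps))

    return ce + lambda_incay * incay.mean()
```

A typical training step under these assumptions would compute `features = backbone(x)`, `logits = classifier(features)`, and then `loss = feature_incay_loss(features, logits, y)` before backpropagating.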
