Complementary Relation Contrastive Distillation

Knowledge distillation aims to transfer representation ability from a teacher model to a student model. Previous approaches focus either on distilling individual representations or on preserving inter-sample similarity; we argue that inter-sample relations convey abundant information and should be distilled in a more effective way. In this paper, we propose a novel knowledge distillation method, Complementary Relation Contrastive Distillation (CRCD), to transfer structural knowledge from the teacher to the student. Specifically, we estimate mutual relations in an anchor-based way and distill each anchor-student relation under the supervision of the corresponding anchor-teacher relation. To make the relations more robust, they are modeled by two complementary elements: the feature and its gradient. Furthermore, a relation contrastive loss maximizes a lower bound on the mutual information between the anchor-teacher and anchor-student relation distributions, which distills both the sample representations and the inter-sample relations. Experiments on different benchmarks demonstrate the effectiveness of the proposed CRCD.
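
To illustrate the idea of contrasting anchor-based relations, here is a minimal PyTorch-style sketch. It is not the authors' reference implementation: the function name, the use of normalized feature differences as the relation, and the temperature value are assumptions made for illustration (the paper additionally models relations via gradients as a complementary cue).

```python
# Sketch of an anchor-based relation contrastive loss (illustrative only).
import torch
import torch.nn.functional as F

def relation_contrastive_loss(f_student, f_teacher, f_anchor, temperature=0.1):
    """InfoNCE-style loss between anchor-student and anchor-teacher relations.

    f_student, f_teacher: (N, D) features for the same batch of samples.
    f_anchor:             (M, D) features of anchor samples (e.g. drawn from
                          a teacher feature bank).
    """
    # Anchor-sample relations, modeled here simply as normalized feature
    # differences (an assumption for this sketch).
    r_s = F.normalize(f_student.unsqueeze(1) - f_anchor.unsqueeze(0), dim=-1)  # (N, M, D)
    r_t = F.normalize(f_teacher.unsqueeze(1) - f_anchor.unsqueeze(0), dim=-1)  # (N, M, D)

    r_s = r_s.reshape(-1, r_s.size(-1))  # (N*M, D)
    r_t = r_t.reshape(-1, r_t.size(-1))  # (N*M, D)

    # Similarity of every student relation to every teacher relation.
    logits = r_s @ r_t.t() / temperature                  # (N*M, N*M)
    labels = torch.arange(logits.size(0), device=logits.device)

    # The matching relation (same sample, same anchor) is the positive; all
    # other teacher relations serve as negatives. Minimizing this
    # cross-entropy maximizes a lower bound on the mutual information
    # between the two relation distributions.
    return F.cross_entropy(logits, labels)

# Usage with random tensors standing in for backbone features:
# loss = relation_contrastive_loss(torch.randn(8, 128), torch.randn(8, 128), torch.randn(16, 128))
```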

CVPR 2021

Results from the Paper


Task: Knowledge Distillation
Dataset: ImageNet
Model: CRCD (teacher: ResNet-34, student: ResNet-18)
Metric: Top-1 accuracy (%) = 71.96
Global rank: #26 (ImageNet leaderboard); #1 (CRD training setting)
