Multi-task Self-distillation for Graph-based Semi-Supervised Learning

2 Dec 2021 · Yating Ren, Junzhong Ji, Lingfeng Niu, Minglong Lei

Graph convolutional networks have made great progress in graph-based semi-supervised learning. Existing methods mainly assume that nodes connected by graph edges tend to have similar attributes and labels, so that features smoothed over local graph structures can reveal class similarities. However, mismatches between graph structures and labels are common in real-world scenarios, where the structures may propagate misleading features or labels that ultimately degrade model performance. In this paper, we propose a multi-task self-distillation framework that injects self-supervised learning and self-distillation into graph convolutional networks to address the mismatch problem from the structure side and the label side, respectively. First, we formulate a self-supervision pipeline based on pretext tasks to capture different levels of similarity in graphs. The feature extraction process is encouraged to capture more complex proximity by jointly optimizing the pretext task and the target task, which improves the local feature aggregations from the structure side. Second, self-distillation uses the model's own soft labels as additional supervision, which has an effect similar to label smoothing. The knowledge from the classification pipeline and the self-supervision pipeline is collectively distilled to improve the generalization ability of the model from the label side. Experimental results show that the proposed method achieves remarkable performance gains under several classic graph convolutional architectures.
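The paper's code is not yet available; below is a minimal sketch of how the combined objective described above could look in PyTorch, assuming a shared two-layer GCN encoder, a feature-reconstruction pretext task, and soft-label distillation from the model's own earlier predictions. The class names, the choice of pretext task, and the hyperparameters alpha, beta, and temperature T are illustrative assumptions, not details taken from the paper.

```python
import torch.nn as nn
import torch.nn.functional as F


class GCNLayer(nn.Module):
    """One graph convolution: H' = A_hat @ H @ W (A_hat: normalized adjacency)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, a_hat, h):
        return self.linear(a_hat @ h)


class MultiTaskGCN(nn.Module):
    """Shared GCN encoder with a classification head and a pretext head."""
    def __init__(self, in_dim, hid_dim, n_classes):
        super().__init__()
        self.conv = GCNLayer(in_dim, hid_dim)
        self.cls_head = GCNLayer(hid_dim, n_classes)  # target task: node classification
        self.ssl_head = nn.Linear(hid_dim, in_dim)    # assumed pretext task: reconstruct X

    def forward(self, a_hat, x):
        # a_hat: (N, N) normalized adjacency; x: (N, in_dim) node features
        h = F.relu(self.conv(a_hat, x))
        return self.cls_head(a_hat, h), self.ssl_head(h)


def multitask_distill_loss(logits, recon, x, labels, train_mask,
                           teacher_logits=None, alpha=0.5, beta=0.5, T=2.0):
    """Supervised loss + pretext loss + optional self-distillation term."""
    # Target task: cross-entropy on the few labeled nodes.
    loss = F.cross_entropy(logits[train_mask], labels[train_mask])
    # Pretext task: feature reconstruction on all nodes (structure side).
    loss = loss + alpha * F.mse_loss(recon, x)
    # Self-distillation: match the model's own earlier soft labels (label side).
    if teacher_logits is not None:
        loss = loss + beta * (T * T) * F.kl_div(
            F.log_softmax(logits / T, dim=-1),
            F.softmax(teacher_logits.detach() / T, dim=-1),
            reduction="batchmean",
        )
    return loss
```

In a self-distillation schedule, teacher_logits would typically be the model's own predictions saved from an earlier training stage, so no separate teacher network is needed; passing None reduces the objective to plain multi-task training.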

Results

Task                 Dataset                                    Model       Metric    Value  Global Rank
Node Classification  AMZ Computers (fixed 20 nodes per class)   SDSS-GCN    Accuracy  84.86  #1
Node Classification  Citeseer (fixed 20 nodes per class)        SDSS-GAT    Accuracy  76.35  #1
Node Classification  Cora (fixed 20 nodes per class)            SDSS-GCN    Accuracy  86.00  #1
Node Classification  Pubmed (fixed 20 nodes per class)          SDSS-APPNP  Accuracy  82.72  #1
