Label Deconvolution for Node Representation Learning on Large-scale Attributed Graphs against Learning Bias

26 Sep 2023 · Zhihao Shi, Jie Wang, Fanghua Lu, Hanzhu Chen, Defu Lian, Zheng Wang, Jieping Ye, Feng Wu

Node representation learning on attributed graphs, whose nodes are associated with rich attributes (e.g., texts and protein sequences), plays a crucial role in many important downstream tasks. To encode the attributes and graph structures simultaneously, recent studies integrate pre-trained models with graph neural networks (GNNs), where the pre-trained models serve as node encoders (NEs) that encode the attributes. Because jointly training large NEs and GNNs on large-scale graphs suffers from severe scalability issues, many methods train NEs and GNNs separately. Consequently, they do not take the feature convolutions in GNNs into account during the training phase of the NEs, which leads to a significant learning bias relative to joint training. To address this challenge, we propose an efficient label regularization technique, namely Label Deconvolution (LD), which alleviates the learning bias through a novel and highly scalable approximation to the inverse mapping of GNNs. The inverse mapping leads to an objective function that is equivalent to that of joint training, while effectively incorporating GNNs into the training phase of NEs to counter the learning bias. More importantly, we show that LD converges to the optimal objective function values of joint training under mild assumptions. Experiments demonstrate that LD significantly outperforms state-of-the-art methods on Open Graph Benchmark datasets.
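
The idea of training a node encoder against "deconvolved" labels can be illustrated with a small sketch. It is not the paper's implementation: it assumes a toy 4-node path graph, a single linear feature convolution `S = (1 - alpha) * I + alpha * A_hat` standing in for the GNN, and a truncated Neumann series standing in for the scalable inverse approximation. The names `normalized_adjacency`, `truncated_inverse_labels`, and `alpha` are all hypothetical.

```python
import numpy as np


def normalized_adjacency(adj):
    """Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}."""
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def truncated_inverse_labels(conv, labels, num_terms):
    """Approximate conv^{-1} @ labels by sum_{k < num_terms} (I - conv)^k @ labels.

    Assumption: the spectral radius of (I - conv) is below 1, so the series
    converges. Only matrix-vector style products are used, no explicit inverse.
    """
    residual = np.eye(conv.shape[0]) - conv
    term = labels.astype(float)
    total = np.zeros_like(term)
    for _ in range(num_terms):
        total += term
        term = residual @ term
    return total


# Toy path graph 0-1-2-3 with two classes (one-hot labels); illustrative only.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
Y = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)

alpha = 0.5  # hypothetical mixing weight for the toy convolution
S = (1 - alpha) * np.eye(4) + alpha * normalized_adjacency(A)

# Separate training fits the encoder to Y directly, so after the convolution
# a perfectly fitted encoder would output S @ Y, which is not Y.
naive_output = S @ Y

# Deconvolved targets: an encoder that fits these yields Y after convolution.
exact_inverse_labels = np.linalg.solve(S, Y)
approx_inverse_labels = truncated_inverse_labels(S, Y, num_terms=50)

print("bias of naive targets:", np.abs(naive_output - Y).max())
print("bias of deconvolved targets:", np.abs(S @ approx_inverse_labels - Y).max())
```

In this toy setting, an encoder that perfectly fits the original labels produces a convolved output that no longer matches them, whereas one that fits the deconvolved targets does. The truncated series is used here only because it avoids forming an explicit inverse; the approximation actually proposed in the paper may differ.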

| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Node Property Prediction | ogbn-arxiv | LD+REVGAT | Test Accuracy | 0.7726 ± 0.0017 | # 4 |
| | | | Validation Accuracy | 0.7762 ± 0.0008 | # 4 |
| | | | Number of params | 140438868 | # 5 |
| | | | Ext. data | Yes | # 1 |
| Node Property Prediction | ogbn-products | LD+GAMLP | Test Accuracy | 0.8645 ± 0.0012 | # 11 |
| | | | Validation Accuracy | 0.9415 ± 0.0003 | # 1 |
| | | | Number of params | 144331677 | # 2 |
| | | | Ext. data | Yes | # 1 |
| Node Property Prediction | ogbn-products | LD+GIANT+SAGN+SCR | Test Accuracy | 0.8718 ± 0.0004 | # 4 |
| | | | Validation Accuracy | 0.9399 ± 0.0002 | # 4 |
| | | | Number of params | 110636896 | # 6 |
| | | | Ext. data | Yes | # 1 |
| Node Property Prediction | ogbn-proteins | LD+GAT | Test ROC-AUC | 0.8942 ± 0.0007 | # 1 |
| | | | Validation ROC-AUC | 0.9527 ± 0.0007 | # 1 |
| | | | Number of params | 664233700 | # 1 |
| | | | Ext. data | Yes | # 1 |
