GraphSAINT: Graph Sampling Based Inductive Learning Method

Graph Convolutional Networks (GCNs) are powerful models for learning representations of attributed graphs. To scale GCNs to large graphs, state-of-the-art methods use various layer sampling techniques to alleviate the "neighbor explosion" problem during minibatch training. We propose GraphSAINT, a graph sampling based inductive learning method that improves training efficiency and accuracy in a fundamentally different way. By changing perspective, GraphSAINT constructs minibatches by sampling the training graph, rather than the nodes or edges across GCN layers. In each iteration, a complete GCN is built from the properly sampled subgraph. Thus, we ensure a fixed number of well-connected nodes in all layers. We further propose a normalization technique to eliminate bias, and sampling algorithms for variance reduction. Importantly, we can decouple the sampling from the forward and backward propagation, and extend GraphSAINT with many architecture variants (e.g., graph attention, jumping connection). GraphSAINT demonstrates superior performance in both accuracy and training time on five large graphs, and achieves new state-of-the-art F1 scores for PPI (0.995) and Reddit (0.970).
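The minibatch construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the uniform node sampler, the inverse-frequency loss weights, the mean aggregator, and all function names (`sample_subgraph`, `estimate_loss_weights`, `mean_aggregate`) are assumptions chosen for illustration, and the paper's own samplers and normalization may differ.

```python
import random
from collections import Counter


def sample_subgraph(adj, budget, rng):
    """Sample `budget` nodes uniformly and return the induced subgraph.

    Assumption: a uniform node sampler stands in for the paper's samplers.
    """
    nodes = set(rng.sample(sorted(adj), min(budget, len(adj))))
    # Keep only edges whose two endpoints were both sampled.
    return {v: [u for u in adj[v] if u in nodes] for v in nodes}


def estimate_loss_weights(adj, budget, num_presamples, rng):
    """Estimate a per-node loss weight as the inverse sampling frequency.

    Assumption: one simple way to counter sampling bias, used here only
    to illustrate the idea of normalization.
    """
    counts = Counter()
    for _ in range(num_presamples):
        counts.update(list(sample_subgraph(adj, budget, rng)))
    return {v: num_presamples / c for v, c in counts.items()}


def mean_aggregate(sub_adj, features):
    """One propagation step on the subgraph: average self and sampled neighbors."""
    out = {}
    for v, nbrs in sub_adj.items():
        group = [features[v]] + [features[u] for u in nbrs]
        out[v] = [sum(col) / len(group) for col in zip(*group)]
    return out


# Toy training graph: a ring of 8 nodes with 2-dimensional features.
rng = random.Random(0)
adj = {v: [(v - 1) % 8, (v + 1) % 8] for v in range(8)}
features = {v: [float(v), 1.0] for v in adj}

# Pre-sampling phase: estimate how often each node is drawn.
weights = estimate_loss_weights(adj, budget=4, num_presamples=200, rng=rng)

# One training iteration: sample a subgraph, then run every layer on it.
sub_adj = sample_subgraph(adj, budget=4, rng=rng)
hidden = mean_aggregate(sub_adj, features)
```

Because every layer operates on the same sampled subgraph, the number of nodes per layer stays at the sampling budget instead of growing with depth, and the sampler is independent of the propagation code.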

ICLR 2020

Datasets


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Link Property Prediction | ogbl-citation2 | GraphSAINT (GCN aggr) | Test MRR | 0.7985 ± 0.0040 | # 16 |
| | | | Validation MRR | 0.7975 ± 0.0039 | # 16 |
| | | | Number of params | 296449 | # 12 |
| | | | Ext. data | No | # 1 |
| Node Property Prediction | ogbn-mag | GraphSAINT (R-GCN aggr) | Test Accuracy | 0.4751 ± 0.0022 | # 28 |
| | | | Validation Accuracy | 0.4837 ± 0.0026 | # 28 |
| | | | Number of params | 154366772 | # 4 |
| | | | Ext. data | No | # 1 |
| Node Property Prediction | ogbn-mag | GraphSAINT + metapath2vec | Test Accuracy | 0.4966 ± 0.0022 | # 26 |
| | | | Validation Accuracy | 0.5066 ± 0.0017 | # 26 |
| | | | Number of params | 309764724 | # 2 |
| | | | Ext. data | No | # 1 |
| Node Property Prediction | ogbn-products | GraphSAINT-inductive | Test Accuracy | 0.8027 ± 0.0026 | # 44 |
| | | | Validation Accuracy | Not reported | # 59 |
| | | | Number of params | 331661 | # 40 |
| | | | Ext. data | No | # 1 |
| Node Property Prediction | ogbn-products | GraphSAINT (SAGE aggr) | Test Accuracy | 0.7908 ± 0.0024 | # 49 |
| | | | Validation Accuracy | 0.9162 ± 0.0008 | # 46 |
| | | | Number of params | 206895 | # 45 |
| | | | Ext. data | No | # 1 |
| Node Classification | PPI | GraphSAINT | F1 | 99.50 | # 3 |
| Node Classification | Reddit | GraphSAINT | Accuracy | 97.0% | # 6 |

Methods