Inductive Representation Learning on Large Graphs

Low-dimensional embeddings of nodes in large graphs have proved extremely useful in a variety of prediction tasks, from content recommendation to identifying protein functions. However, most existing approaches require that all nodes in the graph are present during training of the embeddings; these previous approaches are inherently transductive and do not naturally generalize to unseen nodes. Here we present GraphSAGE, a general, inductive framework that leverages node feature information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data. Instead of training individual embeddings for each node, we learn a function that generates embeddings by sampling and aggregating features from a node's local neighborhood. Our algorithm outperforms strong baselines on three inductive node-classification benchmarks: we classify the category of unseen nodes in evolving information graphs based on citation and Reddit post data, and we show that our algorithm generalizes to completely unseen graphs using a multi-graph dataset of protein-protein interactions.
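The sample-and-aggregate idea in the abstract can be illustrated with a minimal sketch. It assumes a mean aggregator, uniform neighbor sampling (with replacement when a node has fewer than `k` neighbors), a ReLU nonlinearity, and L2 normalization. The weight matrices are random stand-ins for parameters that would normally be learned, and every name here (`sample_neighbors`, `sage_layer`, `embed`, the toy graph) is hypothetical rather than taken from the authors' code.

```python
import math
import random


def sample_neighbors(adj, node, k, rng):
    """Uniformly sample k neighbors; sample with replacement if fewer than k exist."""
    nbrs = adj[node]
    if not nbrs:
        return []
    if len(nbrs) >= k:
        return rng.sample(nbrs, k)
    return [rng.choice(nbrs) for _ in range(k)]


def mean(vectors, dim):
    """Element-wise mean of a list of vectors (zero vector if the list is empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def l2_normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else v


def sage_layer(adj, h, W, k, rng):
    """One layer: h_v <- normalize(ReLU(W [h_v ; mean of sampled neighbor states]))."""
    out = {}
    for v in adj:
        sampled = sample_neighbors(adj, v, k, rng)
        agg = mean([h[u] for u in sampled], len(h[v]))
        z = matvec(W, h[v] + agg)  # list concatenation: [self ; aggregated]
        out[v] = l2_normalize([max(0.0, x) for x in z])
    return out


def embed(adj, features, weights, k=2, seed=0):
    """Apply one layer per weight matrix; the same function works for unseen nodes."""
    rng = random.Random(seed)
    h = features
    for W in weights:
        h = sage_layer(adj, h, W, k, rng)
    return h


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


# Stand-in parameters (assumption: in practice these are trained, not random).
init = random.Random(42)
weights = [random_matrix(4, 2 * 3, init), random_matrix(2, 2 * 4, init)]

# Toy graph with 3-dimensional node features.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
features = {0: [1.0, 0.0, 0.5], 1: [0.0, 1.0, 0.5], 2: [1.0, 1.0, 0.0]}

# A previously unseen node arrives; no per-node embedding has to be trained for it.
adj[3] = [0]
adj[0].append(3)
features[3] = [0.5, 0.5, 1.0]

emb = embed(adj, features, weights)
print(emb[3])
```

Because the embedding of a node depends only on the shared weight matrices and the features in its sampled neighborhood, node 3 receives an embedding without any retraining, which is the inductive property the abstract describes.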

NeurIPS 2017

Results from the Paper


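Task | Dataset | Model | Metric Name | Metric Value | Global Rank
Node Classification | Brazil Air-Traffic | GraphSAGE (Hamilton et al., [2017a]) | Accuracy | 0.404 | #6
Node Classification | Chameleon (60%/20%/20% random splits) | GraphSAGE | 1:1 Accuracy | 62.15 ± 0.42 | #24
Node Classification on Non-Homophilic (Heterophilic) Graphs | Chameleon (60%/20%/20% random splits) | GraphSAGE | 1:1 Accuracy | 62.15 ± 0.42 | #22
Graph Classification | CIFAR10 100k | GraphSage | Accuracy (%) | 66.08 | #11
Node Classification | CiteSeer (0.5%) | GraphSAGE | Accuracy | 33.8% | #14
Node Classification | CiteSeer (1%) | GraphSAGE | Accuracy | 51.0% | #13
Node Classification | CiteSeer (60%/20%/20% random splits) | GraphSAGE | 1:1 Accuracy | 78.24 ± 0.30 | #24
Node Classification | Citeseer Full-supervised | GraphSAGE | Accuracy | 71.40% | #7
Node Classification | CiteSeer with Public Split: fixed 20 nodes per class | GraphSAGE | Accuracy | 67.2 | #35
Node Classification | Cora (0.5%) | GraphSAGE | Accuracy | 37.5% | #14
Node Classification | Cora (1%) | GraphSAGE | Accuracy | 49.0% | #13
Node Classification | Cora (3%) | GraphSAGE | Accuracy | 64.2% | #13
Node Classification | Cora (60%/20%/20% random splits) | GraphSAGE | 1:1 Accuracy | 86.58 ± 0.26 | #24
Node Classification | Cora Full-supervised | GraphSAGE | Accuracy | 82.2% | #7
Node Classification | Cora with Public Split: fixed 20 nodes per class | GraphSAGE | Accuracy | 74.5% | #34
Node Classification | Cornell (60%/20%/20% random splits) | GraphSAGE | 1:1 Accuracy | 71.41 ± 1.24 | #31
Node Classification on Non-Homophilic (Heterophilic) Graphs | Cornell (60%/20%/20% random splits) | GraphSAGE | 1:1 Accuracy | 71.41 ± 1.24 | #29
Node Classification | Film (60%/20%/20% random splits) | GraphSAGE | 1:1 Accuracy | 36.37 ± 0.21 | #24
Node Classification | Flickr | GraphSAGE (Hamilton et al., [2017a]) | Accuracy | 0.641 | #3
Link Property Prediction | ogbl-citation2 | Full-batch GraphSAGE | Test MRR | 0.8260 ± 0.0036 | #13
Link Property Prediction | ogbl-citation2 | Full-batch GraphSAGE | Validation MRR | 0.8263 ± 0.0033 | #13
Link Property Prediction | ogbl-citation2 | Full-batch GraphSAGE | Number of params | 460289 | #9
Link Property Prediction | ogbl-citation2 | Full-batch GraphSAGE | Ext. data | No | #1
Link Property Prediction | ogbl-citation2 | NeighborSampling (SAGE aggr) | Test MRR | 0.8044 ± 0.0010 | #14
Link Property Prediction | ogbl-citation2 | NeighborSampling (SAGE aggr) | Validation MRR | 0.8054 ± 0.0009 | #14
Link Property Prediction | ogbl-citation2 | NeighborSampling (SAGE aggr) | Number of params | 460289 | #9
Link Property Prediction | ogbl-citation2 | NeighborSampling (SAGE aggr) | Ext. data | No | #1
Link Property Prediction | ogbl-collab | GraphSAGE (val as input) | Test Hits@50 | 0.5463 ± 0.0112 | #17
Link Property Prediction | ogbl-collab | GraphSAGE (val as input) | Validation Hits@50 | 0.5688 ± 0.0077 | #22
Link Property Prediction | ogbl-collab | GraphSAGE (val as input) | Number of params | 460289 | #19
Link Property Prediction | ogbl-collab | GraphSAGE (val as input) | Ext. data | No | #1
Link Property Prediction | ogbl-collab | GraphSAGE | Test Hits@50 | 0.4810 ± 0.0081 | #24
Link Property Prediction | ogbl-collab | GraphSAGE | Validation Hits@50 | 0.5688 ± 0.0077 | #22
Link Property Prediction | ogbl-collab | GraphSAGE | Number of params | 460289 | #19
Link Property Prediction | ogbl-collab | GraphSAGE | Ext. data | No | #1
Link Property Prediction | ogbl-ddi | GraphSAGE | Test Hits@20 | 0.5390 ± 0.0474 | #22
Link Property Prediction | ogbl-ddi | GraphSAGE | Validation Hits@20 | 0.6262 ± 0.0037 | #22
Link Property Prediction | ogbl-ddi | GraphSAGE | Number of params | 1421057 | #19
Link Property Prediction | ogbl-ddi | GraphSAGE | Ext. data | No | #1
Link Property Prediction | ogbl-ppa | GraphSAGE | Test Hits@100 | 0.1655 ± 0.0240 | #23
Link Property Prediction | ogbl-ppa | GraphSAGE | Validation Hits@100 | 0.1724 ± 0.0264 | #22
Link Property Prediction | ogbl-ppa | GraphSAGE | Number of params | 424449 | #12
Link Property Prediction | ogbl-ppa | GraphSAGE | Ext. data | No | #1
Node Property Prediction | ogbn-arxiv | GraphSAGE | Test Accuracy | 0.7149 ± 0.0027 | #71
Node Property Prediction | ogbn-arxiv | GraphSAGE | Validation Accuracy | 0.7277 ± 0.0016 | #69
Node Property Prediction | ogbn-arxiv | GraphSAGE | Number of params | 218664 | #54
Node Property Prediction | ogbn-arxiv | GraphSAGE | Ext. data | No | #1
Node Property Prediction | ogbn-mag | NeighborSampling (R-GCN aggr) | Test Accuracy | 0.4678 ± 0.0067 | #30
Node Property Prediction | ogbn-mag | NeighborSampling (R-GCN aggr) | Validation Accuracy | 0.4761 ± 0.0068 | #30
Node Property Prediction | ogbn-mag | NeighborSampling (R-GCN aggr) | Number of params | 154366772 | #4
Node Property Prediction | ogbn-mag | NeighborSampling (R-GCN aggr) | Ext. data | No | #1
Node Property Prediction | ogbn-papers100M | GraphSAGE_res_incep | Test Accuracy | 0.6706 ± 0.0017 | #13
Node Property Prediction | ogbn-papers100M | GraphSAGE_res_incep | Validation Accuracy | 0.7032 ± 0.0011 | #14
Node Property Prediction | ogbn-papers100M | GraphSAGE_res_incep | Number of params | 5755172 | #14
Node Property Prediction | ogbn-papers100M | GraphSAGE_res_incep | Ext. data | No | #1
Node Property Prediction | ogbn-products | GraphSAGE | Test Accuracy | 0.7829 ± 0.0016 | #53
Node Property Prediction | ogbn-products | GraphSAGE | Validation Accuracy | not reported | #59
Node Property Prediction | ogbn-products | GraphSAGE | Number of params | not reported | #61
Node Property Prediction | ogbn-products | GraphSAGE | Ext. data | No | #1
Node Property Prediction | ogbn-products | NeighborSampling (SAGE aggr) | Test Accuracy | 0.7870 ± 0.0036 | #51
Node Property Prediction | ogbn-products | NeighborSampling (SAGE aggr) | Validation Accuracy | 0.9170 ± 0.0009 | #45
Node Property Prediction | ogbn-products | NeighborSampling (SAGE aggr) | Number of params | 206895 | #45
Node Property Prediction | ogbn-products | NeighborSampling (SAGE aggr) | Ext. data | No | #1
Node Property Prediction | ogbn-products | Full-batch GraphSAGE | Test Accuracy | 0.7850 ± 0.0014 | #52
Node Property Prediction | ogbn-products | Full-batch GraphSAGE | Validation Accuracy | 0.9224 ± 0.0007 | #37
Node Property Prediction | ogbn-products | Full-batch GraphSAGE | Number of params | 206895 | #45
Node Property Prediction | ogbn-products | Full-batch GraphSAGE | Ext. data | No | #1
Node Property Prediction | ogbn-products | GraphSAGE + C&S + node2vec | Test Accuracy | 0.8154 ± 0.0050 | #35
Node Property Prediction | ogbn-products | GraphSAGE + C&S + node2vec | Validation Accuracy | 0.9238 ± 0.0006 | #32
Node Property Prediction | ogbn-products | GraphSAGE + C&S + node2vec | Number of params | 103983 | #51
Node Property Prediction | ogbn-products | GraphSAGE + C&S + node2vec | Ext. data | No | #1
Node Property Prediction | ogbn-proteins | GraphSAGE | Test ROC-AUC | 0.7768 ± 0.0020 | #21
Node Property Prediction | ogbn-proteins | GraphSAGE | Validation ROC-AUC | 0.8334 ± 0.0013 | #18
Node Property Prediction | ogbn-proteins | GraphSAGE | Number of params | 193136 | #20
Node Property Prediction | ogbn-proteins | GraphSAGE | Ext. data | No | #1
Node Classification | PATTERN 100k | GraphSage | Accuracy (%) | 50.516 | #9
Node Classification | PPI | GraphSAGE | F1 | 61.2 | #21
Node Classification | PubMed (0.03%) | GraphSAGE | Accuracy | 45.4% | #13
Node Classification | PubMed (0.05%) | GraphSAGE | Accuracy | 53.0% | #12
Node Classification | PubMed (0.1%) | GraphSAGE | Accuracy | 65.4% | #12
Node Classification | PubMed (60%/20%/20% random splits) | GraphSAGE | 1:1 Accuracy | 86.85 ± 0.11 | #29
Node Classification | Pubmed Full-supervised | GraphSAGE | Accuracy | 87.1% | #7
Node Classification | PubMed with Public Split: fixed 20 nodes per class | GraphSAGE | Accuracy | 76.8% | #28
Node Classification | Reddit | GraphSAGE | Accuracy | 94.32% | #13
Node Classification | Squirrel (60%/20%/20% random splits) | GraphSAGE | 1:1 Accuracy | 41.26 ± 0.26 | #25
Node Classification | Texas (60%/20%/20% random splits) | GraphSAGE | 1:1 Accuracy | 79.03 ± 1.20 | #31
Node Classification on Non-Homophilic (Heterophilic) Graphs | Texas (60%/20%/20% random splits) | GraphSAGE | 1:1 Accuracy | 79.03 ± 1.20 | #28
Node Classification | Wisconsin (60%/20%/20% random splits) | GraphSAGE | 1:1 Accuracy | 64.85 ± 5.14 | #33
Node Classification on Non-Homophilic (Heterophilic) Graphs | Wisconsin (60%/20%/20% random splits) | GraphSAGE | 1:1 Accuracy | 64.85 ± 5.14 | #30
Graph Regression | ZINC-500k | GraphSage | MAE | 0.398 | #28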

Results from Other Papers


Task | Dataset | Model | Metric Name | Metric Value | Rank
Node Classification | Europe Air-Traffic | GraphSAGE (Hamilton et al., [2017a]) | Accuracy | 27.2 | #7
Node Classification | Facebook | GraphSAGE (Hamilton et al., [2017a]) | Accuracy | 38.9 | #5
Node Classification | USA Air-Traffic | GraphSAGE (Hamilton et al., [2017a]) | Accuracy | 31.6 | #7
Node Classification | Wiki-Vote | GraphSAGE (Hamilton et al., [2017a]) | Accuracy | 24.5 | #6

Methods