node2vec: Scalable Feature Learning for Networks

3 Jul 2016 · Aditya Grover, Jure Leskovec

Prediction tasks over nodes and edges in networks require careful effort in engineering features used by learning algorithms. Recent research in the broader field of representation learning has led to significant progress in automating prediction by learning the features themselves. However, present feature learning approaches are not expressive enough to capture the diversity of connectivity patterns observed in networks. Here we propose node2vec, an algorithmic framework for learning continuous feature representations for nodes in networks. In node2vec, we learn a mapping of nodes to a low-dimensional space of features that maximizes the likelihood of preserving network neighborhoods of nodes. We define a flexible notion of a node's network neighborhood and design a biased random walk procedure, which efficiently explores diverse neighborhoods. Our algorithm generalizes prior work which is based on rigid notions of network neighborhoods, and we argue that the added flexibility in exploring neighborhoods is the key to learning richer representations. We demonstrate the efficacy of node2vec over existing state-of-the-art techniques on multi-label classification and link prediction in several real-world networks from diverse domains. Taken together, our work represents a new way for efficiently learning state-of-the-art task-independent representations in complex networks.
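The biased random walk mentioned in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the return parameter `p` and in-out parameter `q` come from the full paper rather than from this abstract, and the function name `biased_walk`, the toy graph, and the on-the-fly weight computation are choices made for this example only (the paper precomputes transition probabilities and draws from them with alias sampling).

```python
import random
from typing import Dict, List, Set

# Undirected, unweighted graph stored as adjacency sets (toy representation).
Graph = Dict[int, Set[int]]


def biased_walk(graph: Graph, start: int, length: int,
                p: float, q: float, rng: random.Random) -> List[int]:
    """Second-order random walk in the style of node2vec (illustrative sketch).

    For a walk that just moved prev -> cur, each neighbor nxt of cur gets an
    unnormalized weight:
      1/p  if nxt == prev                (step back to the previous node)
      1    if nxt is a neighbor of prev  (stay close to prev)
      1/q  otherwise                     (move farther away from prev)
    """
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(graph[cur])  # sorted for reproducible sampling
        if not neighbors:
            break  # dead end: isolated node
        if len(walk) == 1:
            # No previous node yet, so the first step is uniform.
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)
            elif nxt in graph[prev]:
                weights.append(1.0)
            else:
                weights.append(1.0 / q)
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


# Hypothetical toy graph: a triangle (0, 1, 2) with a tail 1-3-4.
graph: Graph = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1, 4}, 4: {3}}
rng = random.Random(0)
walks = [biased_walk(graph, node, 10, p=1.0, q=0.5, rng=rng) for node in graph]
```

In this sketch, a small `q` favors moving outward from the previous node and a small `p` favors stepping back to it. In the paper, walks like these are treated as sentences of node IDs and passed to a skip-gram objective to learn the embeddings; that training step is omitted here.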

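| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Node Classification | Eximtradedata | node2vec | Accuracy | 21.50% | #4 |
| Node Classification | Eximtradedata | node2vec | Macro-F1 | 0.206 | #5 |
| Link Property Prediction | ogbl-citation2 | Node2vec | Test MRR | 0.6141 ± 0.0011 | #17 |
| Link Property Prediction | ogbl-citation2 | Node2vec | Validation MRR | 0.6124 ± 0.0011 | #17 |
| Link Property Prediction | ogbl-citation2 | Node2vec | Number of params | 374911105 | #3 |
| Link Property Prediction | ogbl-citation2 | Node2vec | Ext. data | No | #1 |
| Link Property Prediction | ogbl-collab | Node2vec | Test Hits@50 | 0.4888 ± 0.0054 | #23 |
| Link Property Prediction | ogbl-collab | Node2vec | Validation Hits@50 | 0.5703 ± 0.0052 | #21 |
| Link Property Prediction | ogbl-collab | Node2vec | Number of params | 30322945 | #9 |
| Link Property Prediction | ogbl-collab | Node2vec | Ext. data | No | #1 |
| Link Property Prediction | ogbl-ddi | Node2vec | Test Hits@20 | 0.2326 ± 0.0209 | #25 |
| Link Property Prediction | ogbl-ddi | Node2vec | Validation Hits@20 | 0.3292 ± 0.0121 | #25 |
| Link Property Prediction | ogbl-ddi | Node2vec | Number of params | 645249 | #26 |
| Link Property Prediction | ogbl-ddi | Node2vec | Ext. data | No | #1 |
| Link Property Prediction | ogbl-ppa | Node2vec | Test Hits@100 | 0.2226 ± 0.0083 | #21 |
| Link Property Prediction | ogbl-ppa | Node2vec | Validation Hits@100 | 0.2253 ± 0.0088 | #20 |
| Link Property Prediction | ogbl-ppa | Node2vec | Number of params | 73878913 | #4 |
| Link Property Prediction | ogbl-ppa | Node2vec | Ext. data | No | #1 |
| Node Property Prediction | ogbn-arxiv | Node2vec | Test Accuracy | 0.7007 ± 0.0013 | #74 |
| Node Property Prediction | ogbn-arxiv | Node2vec | Validation Accuracy | 0.7129 ± 0.0013 | #72 |
| Node Property Prediction | ogbn-arxiv | Node2vec | Number of params | 21818792 | #8 |
| Node Property Prediction | ogbn-arxiv | Node2vec | Ext. data | No | #1 |
| Node Property Prediction | ogbn-papers100M | Node2vec | Test Accuracy | 0.5560 ± 0.0023 | #19 |
| Node Property Prediction | ogbn-papers100M | Node2vec | Validation Accuracy | 0.5807 ± 0.0028 | #19 |
| Node Property Prediction | ogbn-papers100M | Node2vec | Number of params | 14215818412 | #1 |
| Node Property Prediction | ogbn-papers100M | Node2vec | Ext. data | No | #1 |
| Node Property Prediction | ogbn-products | Node2vec | Test Accuracy | 0.7249 ± 0.0010 | #58 |
| Node Property Prediction | ogbn-products | Node2vec | Validation Accuracy | 0.9032 ± 0.0006 | #53 |
| Node Property Prediction | ogbn-products | Node2vec | Number of params | 313612207 | #1 |
| Node Property Prediction | ogbn-products | Node2vec | Ext. data | No | #1 |
| Node Property Prediction | ogbn-proteins | Node2vec | Test ROC-AUC | 0.6881 ± 0.0065 | #24 |
| Node Property Prediction | ogbn-proteins | Node2vec | Validation ROC-AUC | 0.7007 ± 0.0053 | #22 |
| Node Property Prediction | ogbn-proteins | Node2vec | Number of params | 17094000 | #5 |
| Node Property Prediction | ogbn-proteins | Node2vec | Ext. data | No | #1 |
| Link Prediction | USAir | N2V | AUC | 91.44 | #2 |
| Node Classification | Wikipedia | node2vec | Accuracy | 19.10% | #4 |
| Node Classification | Wikipedia | node2vec | Macro-F1 | 0.179 | #4 |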

Results from Other Papers


| Task | Dataset | Model | Metric Name | Metric Value | Rank |
|---|---|---|---|---|---|
| Malware Clustering | Android Malware Dataset | node2vec | ARI | 16.39 | #3 |
| Malware Detection | Android Malware Dataset | node2vec | Accuracy | 81.25 | #3 |

Methods