CHILI: Chemically-Informed Large-scale Inorganic Nanomaterials Dataset for Advancing Graph Machine Learning

Advances in graph machine learning (ML) have been driven by applications in chemistry, as graphs have remained the most expressive representations of molecules. While early graph ML methods focused primarily on small organic molecules, the scope of graph ML has recently expanded to include inorganic materials. Modelling the periodicity and symmetry of inorganic crystalline materials poses unique challenges, which existing graph ML methods are unable to address. Moving to inorganic nanomaterials increases complexity, as the number of nodes within each graph can vary widely ($10$ to $10^5$). The bulk of existing graph ML focuses on characterising molecules and materials by predicting target properties with graphs as input. However, the most exciting applications of graph ML will be in its generative capabilities, which are currently not on par with those in other domains such as images or text. We invite the graph ML community to address these open challenges by presenting two new chemically-informed large-scale inorganic (CHILI) nanomaterials datasets: a medium-scale dataset (with overall >6M nodes, >49M edges) of mono-metallic oxide nanomaterials generated from 12 selected crystal types (CHILI-3K) and a large-scale dataset (with overall >183M nodes, >1.2B edges) of nanomaterials generated from experimentally determined crystal structures (CHILI-100K). We define 11 property prediction tasks and 6 structure prediction tasks, which are of special interest for nanomaterial research. We benchmark the performance of a wide array of baseline methods and use these benchmarking results to highlight areas which need future work. To the best of our knowledge, CHILI-3K and CHILI-100K are the first open-source nanomaterial datasets of this scale -- both on the individual graph level and of the dataset as a whole -- and the only nanomaterials datasets with high structural and elemental diversity.

Datasets


Introduced in the Paper:

CHILI-3K, CHILI-100K

Used in the Paper:

OGB, Materials Project

Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| X-ray PDF regression | CHILI-100K | Mean | MSE | 0.007 | #1 |
| X-ray PDF regression | CHILI-100K | EdgeCNN | MSE | 0.012 +/- 0.000 | #2 |
| X-ray PDF regression | CHILI-100K | GIN | MSE | 0.013 +/- 0.000 | #3 |
| X-ray PDF regression | CHILI-100K | GraphUNet | MSE | 0.013 +/- 0.000 | #3 |
| X-ray PDF regression | CHILI-100K | GAT | MSE | 0.013 +/- 0.000 | #3 |
| X-ray PDF regression | CHILI-100K | GraphSAGE | MSE | 0.037 +/- 0.026 | #8 |
| X-ray PDF regression | CHILI-100K | PMLP | MSE | 0.013 +/- 0.000 | #3 |
| X-ray PDF regression | CHILI-100K | GCN | MSE | 0.014 +/- 0.000 | #7 |
| XRD regression | CHILI-100K | EdgeCNN | MSE | 0.006 +/- 0.000 | #1 |
| XRD regression | CHILI-100K | GIN | MSE | 0.009 +/- 0.000 | #3 |
| XRD regression | CHILI-100K | GraphUNet | MSE | 0.009 +/- 0.000 | #3 |
| XRD regression | CHILI-100K | GAT | MSE | 0.108 +/- 0.172 | #8 |
| XRD regression | CHILI-100K | GraphSAGE | MSE | 0.018 +/- 0.014 | #6 |
| XRD regression | CHILI-100K | PMLP | MSE | 0.008 +/- 0.001 | #2 |
| XRD regression | CHILI-100K | GCN | MSE | 0.009 +/- 0.000 | #3 |
| XRD regression | CHILI-100K | Mean | MSE | 0.021 | #7 |
| SAXS regression | CHILI-100K | EdgeCNN | MSE | 0.007 +/- 0.009 | #2 |
| SAXS regression | CHILI-100K | GIN | MSE | 0.009 +/- 0.000 | #3 |
| SAXS regression | CHILI-100K | GraphUNet | MSE | 0.009 +/- 0.000 | #3 |
| SAXS regression | CHILI-100K | GAT | MSE | 0.009 +/- 0.000 | #3 |
| SAXS regression | CHILI-100K | GraphSAGE | MSE | 0.011 +/- 0.002 | #7 |
| SAXS regression | CHILI-100K | PMLP | MSE | 0.003 +/- 0.000 | #1 |
| SAXS regression | CHILI-100K | GCN | MSE | 0.010 +/- 0.000 | #6 |
| SAXS regression | CHILI-100K | Mean | MSE | 0.038 | #8 |
| Distance regression | CHILI-100K | EdgeCNN | MSE | 0.030 +/- 0.001 | #1 |
| Distance regression | CHILI-100K | GIN | MSE | 0.491 +/- 0.038 | #8 |
| Distance regression | CHILI-100K | GraphUNet | MSE | 0.085 +/- 0.002 | #3 |
| Distance regression | CHILI-100K | GAT | MSE | 0.252 +/- 0.003 | #5 |
| Distance regression | CHILI-100K | GraphSAGE | MSE | 0.064 +/- 0.001 | #2 |
| Distance regression | CHILI-100K | PMLP | MSE | 0.486 +/- 0.014 | #7 |
| Distance regression | CHILI-100K | GCN | MSE | 0.090 +/- 0.002 | #4 |
| Distance regression | CHILI-100K | Mean | MSE | 0.307 | #6 |
| Position regression | CHILI-100K | EdgeCNN | Positional MAE | 16.336 +/- 0.000 | #2 |
| Position regression | CHILI-100K | GIN | Positional MAE | 16.336 +/- 0.000 | #2 |
| Position regression | CHILI-100K | GraphUNet | Positional MAE | 14.824 +/- 0.315 | #1 |
| Position regression | CHILI-100K | GAT | Positional MAE | 16.336 +/- 0.000 | #2 |
| Position regression | CHILI-100K | GraphSAGE | Positional MAE | 16.337 +/- 0.000 | #8 |
| Position regression | CHILI-100K | PMLP | Positional MAE | 16.336 +/- 0.000 | #2 |
| Position regression | CHILI-100K | GCN | Positional MAE | 16.336 +/- 0.000 | #2 |
| Position regression | CHILI-100K | Mean | Positional MAE | 16.336 | #2 |
| Space group classification | CHILI-100K | EdgeCNN | F1-score (Weighted) | 0.158 +/- 0.035 | #1 |
| Space group classification | CHILI-100K | GIN | F1-score (Weighted) | 0.043 +/- 0.000 | #5 |
| Space group classification | CHILI-100K | GraphUNet | F1-score (Weighted) | 0.043 +/- 0.000 | #5 |
| Space group classification | CHILI-100K | GAT | F1-score (Weighted) | 0.044 +/- 0.001 | #3 |
| Space group classification | CHILI-100K | GraphSAGE | F1-score (Weighted) | 0.044 +/- 0.002 | #3 |
| Space group classification | CHILI-100K | PMLP | F1-score (Weighted) | 0.047 +/- 0.012 | #2 |
| Space group classification | CHILI-100K | GCN | F1-score (Weighted) | 0.043 +/- 0.001 | #5 |
| Space group classification | CHILI-100K | Most Frequent Class | F1-score (Weighted) | 0.010 | #8 |
| Space group classification | CHILI-100K | Random | F1-score (Weighted) | 0.002 +/- 0.001 | #9 |
| Crystal system classification | CHILI-100K | EdgeCNN | F1-score (Weighted) | 0.072 +/- 0.047 | #4 |
| Crystal system classification | CHILI-100K | GIN | F1-score (Weighted) | 0.069 +/- 0.040 | #5 |
| Crystal system classification | CHILI-100K | GraphUNet | F1-score (Weighted) | 0.068 +/- 0.006 | #7 |
| Crystal system classification | CHILI-100K | GAT | F1-score (Weighted) | 0.110 +/- 0.029 | #3 |
| Crystal system classification | CHILI-100K | GraphSAGE | F1-score (Weighted) | 0.061 +/- 0.019 | #8 |
| Crystal system classification | CHILI-100K | PMLP | F1-score (Weighted) | 0.124 +/- 0.036 | #2 |
| Crystal system classification | CHILI-100K | GCN | F1-score (Weighted) | 0.069 +/- 0.023 | #5 |
| Crystal system classification | CHILI-100K | Most Frequent Class | F1-score (Weighted) | 0.046 | #9 |
| Crystal system classification | CHILI-100K | Random | F1-score (Weighted) | 0.168 +/- 0.014 | #1 |
| Atomic number classification | CHILI-100K | EdgeCNN | F1-score (Weighted) | 0.572 +/- 0.017 | #1 |
| Atomic number classification | CHILI-100K | GIN | F1-score (Weighted) | 0.336 +/- 0.005 | #2 |
| Atomic number classification | CHILI-100K | GraphUNet | F1-score (Weighted) | 0.287 +/- 0.004 | #3 |
| Atomic number classification | CHILI-100K | GAT | F1-score (Weighted) | 0.192 +/- 0.000 | #6 |
| Atomic number classification | CHILI-100K | GraphSAGE | F1-score (Weighted) | 0.195 +/- 0.007 | #5 |
| Atomic number classification | CHILI-100K | PMLP | F1-score (Weighted) | 0.191 +/- 0.000 | #8 |
| Atomic number classification | CHILI-100K | GCN | F1-score (Weighted) | 0.275 +/- 0.002 | #4 |
| Atomic number classification | CHILI-100K | Most Frequent Class | F1-score (Weighted) | 0.192 | #6 |
| Atomic number classification | CHILI-100K | Random | F1-score (Weighted) | 0.015 +/- 0.000 | #9 |
| X-ray PDF regression | CHILI-3K | EdgeCNN | MSE | 0.011 +/- 0.000 | #2 |
| X-ray PDF regression | CHILI-3K | GraphUNet | MSE | 0.012 +/- 0.000 | #3 |
| X-ray PDF regression | CHILI-3K | GAT | MSE | 0.029 +/- 0.030 | #7 |
| X-ray PDF regression | CHILI-3K | GraphSAGE | MSE | 0.012 +/- 0.000 | #3 |
| X-ray PDF regression | CHILI-3K | PMLP | MSE | 0.012 +/- 0.000 | #3 |
| X-ray PDF regression | CHILI-3K | GCN | MSE | 0.012 +/- 0.000 | #3 |
| X-ray PDF regression | CHILI-3K | Mean | MSE | 0.008 | #1 |
| XRD regression | CHILI-3K | EdgeCNN | MSE | 0.008 +/- 0.001 | #1 |
| XRD regression | CHILI-3K | GraphUNet | MSE | 0.010 +/- 0.000 | #2 |
| XRD regression | CHILI-3K | GAT | MSE | 0.010 +/- 0.000 | #2 |
| XRD regression | CHILI-3K | GraphSAGE | MSE | 0.010 +/- 0.000 | #2 |
| XRD regression | CHILI-3K | PMLP | MSE | 0.010 +/- 0.000 | #2 |
| XRD regression | CHILI-3K | GCN | MSE | 0.010 +/- 0.000 | #2 |
| XRD regression | CHILI-3K | Mean | MSE | 0.017 | #7 |
| SAXS regression | CHILI-3K | EdgeCNN | MSE | 0.006 +/- 0.004 | #1 |
| SAXS regression | CHILI-3K | GIN | MSE | 0.008 +/- 0.000 | #2 |
| SAXS regression | CHILI-3K | GraphUNet | MSE | 0.008 +/- 0.000 | #2 |
| SAXS regression | CHILI-3K | GAT | MSE | 0.008 +/- 0.000 | #2 |
| SAXS regression | CHILI-3K | GraphSAGE | MSE | 0.008 +/- 0.001 | #2 |
| SAXS regression | CHILI-3K | PMLP | MSE | 0.022 +/- 0.025 | #7 |
| SAXS regression | CHILI-3K | GCN | MSE | 0.008 +/- 0.000 | #2 |
| SAXS regression | CHILI-3K | Mean | MSE | 0.037 | #8 |
| Distance regression | CHILI-3K | EdgeCNN | MSE | 0.015 +/- 0.001 | #1 |
| Distance regression | CHILI-3K | GIN | MSE | 0.464 +/- 0.005 | #8 |
| Distance regression | CHILI-3K | GraphUNet | MSE | 0.055 +/- 0.001 | #2 |
| Distance regression | CHILI-3K | GAT | MSE | 0.342 +/- 0.117 | #6 |
| Distance regression | CHILI-3K | GraphSAGE | MSE | 0.055 +/- 0.002 | #2 |
| Distance regression | CHILI-3K | PMLP | MSE | 0.359 +/- 0.017 | #7 |
| Distance regression | CHILI-3K | GCN | MSE | 0.056 +/- 0.006 | #4 |
| Distance regression | CHILI-3K | Mean | MSE | 0.265 | #5 |
| Position regression | CHILI-3K | EdgeCNN | Positional MAE | 16.575 +/- 0.000 | #2 |
| Position regression | CHILI-3K | GIN | Positional MAE | 16.575 +/- 0.000 | #2 |
| Position regression | CHILI-3K | GraphUNet | Positional MAE | 14.765 +/- 0.395 | #1 |
| Position regression | CHILI-3K | GAT | Positional MAE | 16.575 +/- 0.000 | #2 |
| Position regression | CHILI-3K | GraphSAGE | Positional MAE | 16.575 +/- 0.000 | #2 |
| Position regression | CHILI-3K | PMLP | Positional MAE | 16.575 +/- 0.000 | #2 |
| Position regression | CHILI-3K | GCN | Positional MAE | 16.575 +/- 0.000 | #2 |
| Position regression | CHILI-3K | Mean | Positional MAE | 16.575 | #2 |
| Space group classification | CHILI-3K | EdgeCNN | F1-score (Weighted) | 0.733 +/- 0.207 | #1 |
| Space group classification | CHILI-3K | GIN | F1-score (Weighted) | 0.125 +/- 0.026 | #4 |
| Space group classification | CHILI-3K | GraphUNet | F1-score (Weighted) | 0.095 +/- 0.036 | #8 |
| Space group classification | CHILI-3K | GAT | F1-score (Weighted) | 0.113 +/- 0.013 | #5 |
| Space group classification | CHILI-3K | GraphSAGE | F1-score (Weighted) | 0.151 +/- 0.045 | #2 |
| Space group classification | CHILI-3K | PMLP | F1-score (Weighted) | 0.135 +/- 0.006 | #3 |
| Space group classification | CHILI-3K | GCN | F1-score (Weighted) | 0.099 +/- 0.019 | #7 |
| Space group classification | CHILI-3K | Most Frequent Class | F1-score (Weighted) | 0.108 | #6 |
| Space group classification | CHILI-3K | Random | F1-score (Weighted) | 0.009 +/- 0.008 | #9 |
| Crystal system classification | CHILI-3K | EdgeCNN | F1-score (Weighted) | 0.657 +/- 0.196 | #1 |
| Crystal system classification | CHILI-3K | GIN | F1-score (Weighted) | 0.438 +/- 0.004 | #5 |
| Crystal system classification | CHILI-3K | GraphUNet | F1-score (Weighted) | 0.431 +/- 0.014 | #6 |
| Crystal system classification | CHILI-3K | GAT | F1-score (Weighted) | 0.504 +/- 0.076 | #2 |
| Crystal system classification | CHILI-3K | GraphSAGE | F1-score (Weighted) | 0.422 +/- 0.037 | #7 |
| Crystal system classification | CHILI-3K | PMLP | F1-score (Weighted) | 0.440 +/- 0.036 | #3 |
| Crystal system classification | CHILI-3K | GCN | F1-score (Weighted) | 0.367 +/- 0.127 | #8 |
| Crystal system classification | CHILI-3K | Most Frequent Class | F1-score (Weighted) | 0.440 | #3 |
| Crystal system classification | CHILI-3K | Random | F1-score (Weighted) | 0.191 +/- 0.008 | #9 |
| Atomic number classification | CHILI-3K | EdgeCNN | F1-score (Weighted) | 0.632 +/- 0.009 | #1 |
| Atomic number classification | CHILI-3K | GIN | F1-score (Weighted) | 0.587 +/- 0.002 | #2 |
| Atomic number classification | CHILI-3K | GraphUNet | F1-score (Weighted) | 0.552 +/- 0.079 | #3 |
| Atomic number classification | CHILI-3K | GAT | F1-score (Weighted) | 0.461 +/- 0.000 | #6 |
| Atomic number classification | CHILI-3K | GraphSAGE | F1-score (Weighted) | 0.491 +/- 0.004 | #5 |
| Atomic number classification | CHILI-3K | PMLP | F1-score (Weighted) | 0.461 +/- 0.000 | #6 |
| Atomic number classification | CHILI-3K | GCN | F1-score (Weighted) | 0.496 +/- 0.001 | #4 |
| Atomic number classification | CHILI-3K | Most Frequent Class | F1-score (Weighted) | 0.461 | #6 |
| Atomic number classification | CHILI-3K | Random | F1-score (Weighted) | 0.016 +/- 0.000 | #9 |
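
The "Mean" and "Most Frequent Class" rows above are trivial reference baselines, and the classification tasks are scored with a support-weighted F1. The sketch below illustrates how such baselines and metrics are commonly computed; it is not the paper's evaluation code. The function names, the toy labels, and the use of scalar regression targets (for brevity) are all assumptions made for illustration.

```python
from collections import Counter


def mean_squared_error(y_true, y_pred):
    """Average squared difference between paired targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_baseline(train_targets, n_test):
    """Hypothetical 'Mean' baseline: predict the training mean everywhere."""
    mean = sum(train_targets) / len(train_targets)
    return [mean] * n_test


def most_frequent_baseline(train_labels, n_test):
    """Hypothetical 'Most Frequent Class' baseline: predict the modal label."""
    label, _ = Counter(train_labels).most_common(1)[0]
    return [label] * n_test


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights proportional to class support."""
    support = Counter(y_true)
    total = 0.0
    for cls, n_cls in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = n_cls - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += n_cls * f1
    return total / len(y_true)


# Toy data (made up for illustration, not taken from CHILI).
train_labels = ["cubic", "cubic", "cubic", "hexagonal"]
test_labels = ["cubic", "cubic", "hexagonal", "tetragonal"]
label_preds = most_frequent_baseline(train_labels, len(test_labels))
baseline_f1 = weighted_f1(test_labels, label_preds)

train_targets = [0.0, 1.0, 2.0]
test_targets = [1.0, 3.0]
target_preds = mean_baseline(train_targets, len(test_targets))
baseline_mse = mean_squared_error(test_targets, target_preds)

print(f"most-frequent-class weighted F1: {baseline_f1:.3f}")
print(f"mean-baseline MSE: {baseline_mse:.3f}")
```

Because the weighted F1 gives each class a weight equal to its share of the test labels, a constant predictor scores well only when one class dominates, which is one reason the table reports these baselines next to the learned models.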
