3 dataset results for Graph Embedding AND Texts

The IS-A dataset is a dataset of relations extracted from a medical ontology. The different entities in the ontology are related by the “is a” relation. For example, ‘acute leukemia’ is a ‘leukemia’. The dataset has 294,693 nodes with 356,541 edges between them.

4 PAPERS • NO BENCHMARKS YET

OLPBENCH

OLPBENCH is a large Open Link Prediction benchmark, which was derived from the state-of-the-art Open Information Extraction corpus OPIEC (Gashteovski et al., 2019). OLPBENCH contains 30M open triples, 1M distinct open relations and 2.5M distinct mentions of approximately 800K entities.

4 PAPERS • NO BENCHMARKS YET

HeriGraph (Multimodal Machine Learning Datasets on Graphs of Heritage Values and Attributes)

The dataset contains constructed multi-modal features (visual and textual), pseudo-labels (on heritage values and attributes), and graph structures (with temporal, social, and spatial links) constructed using User-Generated Content data collected from Flickr social media platform in three global cities containing UNESCO World Heritage property (Amsterdam, Suzhou, Venice). The motivation of data collection in this project is to provide datasets that could be both directly applicable for ML communities as test-bed, and theoretically informative for heritage and urban scholars to draw conclusions on for planning decision-making.

1 PAPER • NO BENCHMARKS YET