The FewRel (Few-Shot Relation Classification Dataset) contains 100 relations and 70,000 instances from Wikipedia. The dataset is divided into three subsets: training set (64 relations), validation set (16 relations) and test set (20 relations).
170 PAPERS • 4 BENCHMARKS
This is the Multi-Axis Temporal RElations for Start-points (i.e., MATRES) dataset
13 PAPERS • 2 BENCHMARKS
The Cornell eRulemaking Corpus – CDCP is an argument mining corpus annotated with argumentative structure information capturing the evaluability of arguments. The corpus consists of 731 user comments on Consumer Debt Collection Practices (CDCP) rule by the Consumer Financial Protection Bureau (CFPB); the resulting dataset contains 4931 elementary unit and 1221 support relation annotations. It is a resource for building argument mining systems that can not only extract arguments from unstructured text, but also identify what additional information is necessary for readers to understand and evaluate a given argument. Immediate applications include providing real-time feedback to commenters, specifying which types of support for which propositions can be added to construct better-formed arguments.
11 PAPERS • 3 BENCHMARKS
The Discovery datasets consists of adjacent sentence pairs (s1,s2) with a discourse marker (y) that occurred at the beginning of s2. They were extracted from the depcc web corpus.
9 PAPERS • 1 BENCHMARK
GUM is an open source multilayer English corpus of richly annotated texts from twelve text types. Annotations include:
8 PAPERS • 1 BENCHMARK
RELX is a benchmark dataset for cross-lingual relation classification in English, French, German, Spanish and Turkish.
6 PAPERS • NO BENCHMARKS YET
LabPics Chemistry Dataset
5 PAPERS • NO BENCHMARKS YET
The TACRED-Revisited dataset improves the crowd-sourced TACRED dataset for relation extraction by relabeling the dev and test sets using expert linguistic annotators. Relabeling focuses on the 5K most challenging instances in dev and test, in total, 51.2% of these are corrected. Published at ACL 2020.
5 PAPERS • 1 BENCHMARK
The DISRPT 2021 shared task, co-located with CODI 2021 at EMNLP, introduces the second iteration of a cross-formalism shared task on discourse unit segmentation and connective detection, as well as the first iteration of a cross-formalism discourse relation classification task.
3 PAPERS • NO BENCHMARKS YET
The Dr. Inventor Multi-Layer Scientific Corpus (DRI Corpus) includes 40 Computer Graphics papers, selected by domain experts. Each paper of the Corpus has been annotated by three annotators by providing the following layers of annotations, each one characterizing a core aspect of scientific publications:
2 PAPERS • 2 BENCHMARKS
The AbstRCT dataset consists of randomized controlled trials retrieved from the MEDLINE database via PubMed search. The trials are annotated with argument components and argumentative relations.
1 PAPER • 2 BENCHMARKS
Click to add a brief description of the dataset (Markdown and LaTeX enabled).
1 PAPER • NO BENCHMARKS YET
Graph Neural Networks (GNNs) have gained traction across different domains such as transportation, bio-informatics, language processing, and computer vision. However, there is a noticeable absence of research on applying GNNs to supply chain networks. Supply chain networks are inherently graphlike in structure, making them prime candidates for applying GNN methodologies. This opens up a world of possibilities for optimizing, predicting, and solving even the most complex supply chain problems. A major setback in this approach lies in the absence of real-world benchmark datasets to facilitate the research and resolution of supply chain problem using GNNs. To address the issue, we present a real-world benchmark dataset for temporal tasks, obtained from one of the leading FMCG companies in Bangladesh, focusing on supply chain planning for production purposes. The dataset includes temporal data as node features to enable sales predictions, production planning, and the identification of fact
Green family of datasets for emergent communications on relations.
Dataset Introduction TFH_Annotated_Dataset is an annotated patent dataset pertaining to thin film head technology in hard-disk. To the best of our knowledge, this is the second labeled patent dataset public available in technology management domain that annotates both entities and the semantic relations between entities, the first one is [1].
0 PAPER • NO BENCHMARKS YET