🔔 Share your dataset with the ML community!

Filter by Modality

Filter by Task (clear)

Filter by Language (clear)

15 dataset results for Relation Classification AND English

FewRel (Few-Shot Relation Classification Dataset)

The FewRel (Few-Shot Relation Classification Dataset) contains 100 relations and 70,000 instances from Wikipedia. The dataset is divided into three subsets: training set (64 relations), validation set (16 relations) and test set (20 relations).

170 PAPERS • 4 BENCHMARKS

MATRES

MATRES (Multi-Axis Temporal RElations for Start-points)

This is the Multi-Axis Temporal RElations for Start-points (i.e., MATRES) dataset

13 PAPERS • 2 BENCHMARKS

CDCP

CDCP (Cornell eRulemaking Corpus)

The Cornell eRulemaking Corpus – CDCP is an argument mining corpus annotated with argumentative structure information capturing the evaluability of arguments. The corpus consists of 731 user comments on Consumer Debt Collection Practices (CDCP) rule by the Consumer Financial Protection Bureau (CFPB); the resulting dataset contains 4931 elementary unit and 1221 support relation annotations. It is a resource for building argument mining systems that can not only extract arguments from unstructured text, but also identify what additional information is necessary for readers to understand and evaluate a given argument. Immediate applications include providing real-time feedback to commenters, specifying which types of support for which propositions can be added to construct better-formed arguments.

11 PAPERS • 3 BENCHMARKS

Discovery

The Discovery datasets consists of adjacent sentence pairs (s1,s2) with a discourse marker (y) that occurred at the beginning of s2. They were extracted from the depcc web corpus.

9 PAPERS • 1 BENCHMARK

GUM (Georgetown University Multilayer corpus)

GUM is an open source multilayer English corpus of richly annotated texts from twelve text types. Annotations include:

8 PAPERS • 1 BENCHMARK

RELX

RELX is a benchmark dataset for cross-lingual relation classification in English, French, German, Spanish and Turkish.

6 PAPERS • NO BENCHMARKS YET

LabPics (LabPics Dataset for computer vision for autonomous chemistry labs and medical labs)

LabPics Chemistry Dataset

5 PAPERS • NO BENCHMARKS YET

TACRED-Revisited

The TACRED-Revisited dataset improves the crowd-sourced TACRED dataset for relation extraction by relabeling the dev and test sets using expert linguistic annotators. Relabeling focuses on the 5K most challenging instances in dev and test, in total, 51.2% of these are corrected. Published at ACL 2020.

5 PAPERS • 1 BENCHMARK

DISRPT2021

DISRPT2021 (DISRPT2021 shared task on Discourse Unit Segmentation, Connective Detection and Discourse Relation Classification)

The DISRPT 2021 shared task, co-located with CODI 2021 at EMNLP, introduces the second iteration of a cross-formalism shared task on discourse unit segmentation and connective detection, as well as the first iteration of a cross-formalism discourse relation classification task.

3 PAPERS • NO BENCHMARKS YET

DRI Corpus

DRI Corpus (Dr. Inventor Multi-layer Scientific Corpus)

The Dr. Inventor Multi-Layer Scientific Corpus (DRI Corpus) includes 40 Computer Graphics papers, selected by domain experts. Each paper of the Corpus has been annotated by three annotators by providing the following layers of annotations, each one characterizing a core aspect of scientific publications:

2 PAPERS • 2 BENCHMARKS

AbstRCT - Neoplasm

The AbstRCT dataset consists of randomized controlled trials retrieved from the MEDLINE database via PubMed search. The trials are annotated with argument components and argumentative relations.

1 PAPER • 2 BENCHMARKS

CORE

CORE (Company Relation Extraction)

Click to add a brief description of the dataset (Markdown and LaTeX enabled).

1 PAPER • NO BENCHMARKS YET

SupplyGraph (SupplyGraph: A Benchmark Dataset for Supply Chain Planning using Graph Neural Networks)

Graph Neural Networks (GNNs) have gained traction across different domains such as transportation, bio-informatics, language processing, and computer vision. However, there is a noticeable absence of research on applying GNNs to supply chain networks. Supply chain networks are inherently graphlike in structure, making them prime candidates for applying GNN methodologies. This opens up a world of possibilities for optimizing, predicting, and solving even the most complex supply chain problems. A major setback in this approach lies in the absence of real-world benchmark datasets to facilitate the research and resolution of supply chain problem using GNNs. To address the issue, we present a real-world benchmark dataset for temporal tasks, obtained from one of the leading FMCG companies in Bangladesh, focusing on supply chain planning for production purposes. The dataset includes temporal data as node features to enable sales predictions, production planning, and the identification of fact

1 PAPER • NO BENCHMARKS YET

TexRel

Green family of datasets for emergent communications on relations.

1 PAPER • NO BENCHMARKS YET

TFH_Annotated_Dataset

TFH_Annotated_Dataset (Thin_Film_head_relevant_Patent_Annotated_Dataset)

Dataset Introduction TFH_Annotated_Dataset is an annotated patent dataset pertaining to thin film head technology in hard-disk. To the best of our knowledge, this is the second labeled patent dataset public available in technology management domain that annotates both entities and the semantic relations between entities, the first one is [1].

0 PAPER • NO BENCHMARKS YET

Datasets

15 dataset results for Relation Classification AND English