74 dataset results for Relation Extraction

TimeBankPT

TimeBankPT is a corpus of Portuguese text with temporal annotations, using an annotation scheme similar to TimeML. It was created by adapting the English corpus used in the first TempEval challenge to Portuguese.

4 PAPERS • 1 BENCHMARK

FB15k-237-low

The FB15k-237-low dataset is a variation of the FB15k-237 dataset where relations with a low number of triplets are kept.

3 PAPERS • NO BENCHMARKS YET

Biographical

Biographical (Biographical: A Semi-Supervised Relation Extraction Dataset)

Biographical is a semi-supervised dataset for relation extraction (RE). The dataset, which is aimed at digital humanities (DH) and historical research, is compiled automatically by aligning sentences from Wikipedia articles with matching structured data from sources including Pantheon and Wikidata.

2 PAPERS • NO BENCHMARKS YET

FOBIE (Focused Open Biological Information Extraction)

The Focused Open Biology Information Extraction (FOBIE) dataset aims to support information extraction (IE) for Computer-Aided Biomimetics. The dataset contains ~1,500 sentences from scientific biological texts, annotated with TRADE-OFFS and syntactically similar relations between unbounded arguments, as well as argument modifiers.

2 PAPERS • NO BENCHMARKS YET

FREDo

FREDo is a Few-Shot Document-Level Relation Extraction Benchmark based on DocRED and SciERC. The dataset is divided into four subsets: training set (62 relations), validation set (16 relations), in-domain test set (16 relations), and cross-domain test set (7 relations).

2 PAPERS • 2 BENCHMARKS

MobIE

MobIE is a German-language dataset which is human-annotated with 20 coarse- and fine-grained entity types and entity linking information for geographically linkable entities. The dataset consists of 3,232 social media texts and traffic reports with 91K tokens, and contains 20.5K annotated entities, 13.1K of which are linked to a knowledge base. A subset of the dataset is human-annotated with seven mobility-related, n-ary relation types, while the remaining documents are annotated using a weakly-supervised labeling approach implemented with the Snorkel framework.

2 PAPERS • NO BENCHMARKS YET

NYT-H

NYT-H is a dataset for distantly supervised (DS) relation extraction in which the training data are DS-labelled and the test data are labelled by hired human annotators. NYT-H can serve as a benchmark for distantly supervised relation extraction.

2 PAPERS • NO BENCHMARKS YET

X-WikiRE

X-WikiRE is a new, large-scale multilingual relation extraction dataset in which relation extraction is framed as a problem of reading comprehension to allow for generalization to unseen relations.

2 PAPERS • NO BENCHMARKS YET
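To make the reading-comprehension framing described for X-WikiRE concrete, here is a minimal Python sketch that turns one relation instance into a question-answering example. The relation names, the question templates in `QUESTION_TEMPLATES`, and the `to_qa` helper are hypothetical illustrations, not X-WikiRE's actual templates or data format.

```python
from typing import Optional

# Hypothetical question templates, one per relation (not X-WikiRE's real ones).
QUESTION_TEMPLATES = {
    "place_of_birth": "Where was {head} born?",
    "employer": "Who does {head} work for?",
}


def to_qa(relation: str, head: str, context: str, tail: Optional[str]) -> dict:
    """Turn one (relation, head, tail) instance into a QA example.

    If the tail is missing or does not occur in the context, the example
    is unanswerable and answer_start is -1.
    """
    question = QUESTION_TEMPLATES[relation].format(head=head)
    start = context.find(tail) if tail else -1
    return {
        "question": question,
        "context": context,
        "answer": tail if start >= 0 else None,
        "answer_start": start,
    }


ex = to_qa("place_of_birth", "Marie Curie",
           "Marie Curie was born in Warsaw in 1867.", "Warsaw")
print(ex["question"], ex["answer"], ex["answer_start"])
# → Where was Marie Curie born? Warsaw 24
```

Under this framing, supporting an unseen relation only requires writing a new question template; the reading-comprehension model does not need a new output class.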

CORE

CORE (Company Relation Extraction)

1 PAPER • NO BENCHMARKS YET

Chinese Literature NER RE

Chinese Literature NER RE is a discourse-level named entity recognition and relation extraction dataset for Chinese literary text. It is constructed from hundreds of Chinese literature articles.

1 PAPER • NO BENCHMARKS YET

Dataset: Relationship extraction for knowledge graph creation from biomedical literature (Gene-Disease relationships)

This is the dataset used for classifying Gene-Disease relationship types from sentences. The dataset consists of 3 files:

1 PAPER • 1 BENCHMARK

DiaKG

DiaKG is a high-quality Chinese dataset for building a diabetes knowledge graph.

1 PAPER • NO BENCHMARKS YET

FB1.5M

The FB1.5M dataset is a benchmark for knowledge graph completion. It is based on Freebase and contains 30 relations with fewer than 500 triplets, which are treated as low-resource relations.

1 PAPER • NO BENCHMARKS YET
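The low-resource criterion described for FB1.5M (relations with fewer than 500 triplets) can be sketched as a simple frequency filter over (head, relation, tail) triplets. This is an illustrative Python sketch, not the dataset's construction code; the triplet layout and the `low_resource_relations` function are assumptions.

```python
from collections import Counter

Triplet = tuple[str, str, str]  # assumed layout: (head, relation, tail)


def low_resource_relations(triplets: list[Triplet], max_count: int = 500) -> set[str]:
    """Return the relations that occur in fewer than `max_count` triplets."""
    counts = Counter(rel for _, rel, _ in triplets)
    return {rel for rel, n in counts.items() if n < max_count}


# Toy data with a tiny threshold so the effect is visible.
toy = [("e1", "r_frequent", "e2")] * 3 + [("e1", "r_rare", "e3")]
rare = low_resource_relations(toy, max_count=2)
low_resource_triplets = [t for t in toy if t[1] in rare]
print(rare, len(low_resource_triplets))  # → {'r_rare'} 1
```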

KGRED

KGRED (Knowledge-Graph-Enhanced Relation Extraction Datasets)


1 PAPER • NO BENCHMARKS YET

LPSC

LPSC (Planetary Science Data Set)

This data set contains annotated text versions of 1635 two-page abstracts, published at the Lunar and Planetary Science Conference from 1998 to 2020, that are relevant to four Mars missions. The annotations were generated using the named entity recognition and relation extraction provided by the MTE processing pipeline (available at https://github.com/wkiri/MTE), followed by manual review. Annotated entities include Element, Mineral, Property, and Target. Annotated relations include Contains(Target, Element | Mineral) and HasProperty(Target, Property). The extracted information (without full texts) is also available as a database of .csv files (https://pds-geosciences.wustl.edu/missions/mte/mte.htm).

1 PAPER • 2 BENCHMARKS
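The LPSC relation signatures, Contains(Target, Element | Mineral) and HasProperty(Target, Property), can be read as argument-type constraints. The following Python sketch checks a relation against them; the `Entity` and `Relation` classes and the example entity names are hypothetical and do not reflect the MTE pipeline's actual data model.

```python
from dataclasses import dataclass

# Argument-type constraints from the LPSC relation signatures:
# Contains(Target, Element | Mineral) and HasProperty(Target, Property).
RELATION_SIGNATURES = {
    "Contains": ("Target", {"Element", "Mineral"}),
    "HasProperty": ("Target", {"Property"}),
}


@dataclass(frozen=True)
class Entity:  # hypothetical container, not the MTE data model
    text: str
    label: str  # Element, Mineral, Property, or Target


@dataclass(frozen=True)
class Relation:  # hypothetical container
    name: str
    arg1: Entity
    arg2: Entity


def is_well_typed(rel: Relation) -> bool:
    """Check a relation's argument labels against its signature."""
    signature = RELATION_SIGNATURES.get(rel.name)
    if signature is None:
        return False  # unknown relation type
    arg1_type, arg2_types = signature
    return rel.arg1.label == arg1_type and rel.arg2.label in arg2_types


# Hypothetical example entities.
target = Entity("RockA", "Target")
mineral = Entity("hematite", "Mineral")
print(is_well_typed(Relation("Contains", target, mineral)))     # → True
print(is_well_typed(Relation("HasProperty", target, mineral)))  # → False
```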

Medical Case Report Corpus

Medical Case Report Corpus is a new corpus comprising annotations of medical entities in case reports, originating from PubMed Central's open access library.

1 PAPER • NO BENCHMARKS YET

Multi-CrossRE

Multi-CrossRE is the broadest multilingual dataset for relation extraction (RE), covering 26 languages in addition to English and six text domains. It is a machine-translated version of CrossRE, with a sub-portion of more than 200 sentences in seven diverse languages checked by native speakers.

1 PAPER • NO BENCHMARKS YET

MultiTACRED

MultiTACRED is a multilingual version of the large-scale TAC Relation Extraction Dataset. It covers 12 typologically diverse languages from 9 language families, and was created by the Speech & Language Technology group of DFKI by machine-translating the instances of the original TACRED dataset and automatically projecting their entity annotations. For details of the original TACRED's data collection and annotation process, see the Stanford paper. Translations are syntactically validated by checking the correctness of the XML tag markup. Any translations with an invalid tag structure, e.g. missing or invalid head or tail tag pairs, are discarded (on average, 2.3% of the instances).

1 PAPER • NO BENCHMARKS YET
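The MultiTACRED validation step (discarding translations whose head or tail tag pairs are missing or invalid) might look roughly like the Python sketch below. It assumes hypothetical `<H>...</H>` and `<T>...</T>` markers and treats nested, overlapping, duplicated, or empty spans as invalid; the tag names and the exact validity rules used for MultiTACRED may differ.

```python
import re

# Hypothetical head/tail markers; MultiTACRED's real tag names may differ.
TAG_RE = re.compile(r"</?[HT]>")
EMPTY_SPAN_RE = re.compile(r"<([HT])>\s*</\1>")

# Assumption: each entity is tagged exactly once and the spans neither nest
# nor overlap, so only these two tag sequences are accepted.
VALID_ORDERS = (
    ["<H>", "</H>", "<T>", "</T>"],
    ["<T>", "</T>", "<H>", "</H>"],
)


def has_valid_tags(text: str) -> bool:
    """Return True if the head and tail tag pairs are complete and well formed."""
    if TAG_RE.findall(text) not in VALID_ORDERS:
        return False  # missing, duplicated, nested, or mis-ordered tags
    return EMPTY_SPAN_RE.search(text) is None  # reject empty entity spans


def filter_translations(translations: list[str]) -> tuple[list[str], float]:
    """Keep valid translations and report the fraction discarded."""
    kept = [t for t in translations if has_valid_tags(t)]
    discarded = 1 - len(kept) / len(translations) if translations else 0.0
    return kept, discarded


examples = [
    "<H>Bill Gates</H> gründete <T>Microsoft</T> .",
    "<H>Bill Gates</H> gründete Microsoft .",  # tail tags lost in translation
    "<T>Microsoft</T> wurde von <H>Bill Gates</H> gegründet .",
]
kept, rate = filter_translations(examples)
print(len(kept), round(rate, 3))  # → 2 0.333
```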

Part Whole Relations

The Part-Whole Relations dataset is a dataset of semantic relations between entities. It contains the following subtypes:

- Component-Of
- Member-Of
- Portion-Of
- Stuff-Of
- Located-In
- Contained-In
- Phase-Of
- Participates-In

1 PAPER • NO BENCHMARKS YET

Product Reviews 2017

The corpus contains review sentences, mostly about products in the electronics domain, annotated and divided into 4 comparison categories. Each comparison sentence is annotated with the names of the products (PROD1 and PROD2), the aspect (ASP), and the predicate (PRED). The dataset contains sentences auto-labeled from the SNAP dataset and manually labeled sentences from the following corpora:

1 PAPER • 1 BENCHMARK

SOMD

SOMD (SOftware Mention Detection)

The dataset contains the training and test data for the SOftware Mention Detection challenge. The data is derived from the SoMeSci Knowledge Graph of software mentions.

1 PAPER • NO BENCHMARKS YET

THRED

THRED (Two-Hop Relation Extraction Dataset)

This is a two-hop relation extraction dataset derived from the WikiHop dataset [1].

1 PAPER • NO BENCHMARKS YET

TexRel

TexRel is a green family of datasets for emergent communication on relations.

1 PAPER • NO BENCHMARKS YET

Translated TACRED

Translated TACRED consists of 533 parallel examples sampled from TACRED and translated into Russian and Korean (plus 3 additional examples in Russian), accompanied by translations of a list of trigger words collected for the different relations.

1 PAPER • NO BENCHMARKS YET

TurkQA

TurkQA consists of a selection of sentences from English Wikipedia articles, with questions and answers crowdsourced from workers on Amazon Mechanical Turk.

1 PAPER • NO BENCHMARKS YET

TFH_Annotated_Dataset

TFH_Annotated_Dataset (Thin_Film_head_relevant_Patent_Annotated_Dataset)

TFH_Annotated_Dataset is an annotated patent dataset pertaining to thin film head technology in hard disk drives. To the best of our knowledge, it is the second publicly available labeled patent dataset in the technology management domain that annotates both entities and the semantic relations between them; the first is [1].

0 PAPERS • NO BENCHMARKS YET
