DocRED (Document-Level Relation Extraction Dataset) is a relation extraction dataset constructed from Wikipedia and Wikidata. Each document in the dataset is human-annotated with named entity mentions, coreference information, intra- and inter-sentence relations, and supporting evidence. DocRED requires reading multiple sentences in a document to extract entities and infer their relations by synthesizing all information of the document. Along with the human-annotated data, the dataset provides large-scale distantly supervised data.
DocRED contains 132,375 entities and 56,354 relational facts annotated on 5,053 Wikipedia documents. In addition to the human-annotated data, the dataset provides large-scale distantly supervised data over 101,873 documents.Source: DocRED: A Large-Scale Document-Level Relation Extraction Dataset