DAWT (Densely Annotated Wikipedia Texts)

Introduced by Spasojevic et al. in DAWT: Densely Annotated Wikipedia Texts across multiple languages

The DAWT dataset consists of Densely Annotated Wikipedia Texts across multiple languages. The annotations include labeled text mentions mapping to entities (represented by their Freebase machine ids) as well as the type of the entity. The data set contains total of 13.6M articles, 5.0B tokens, 13.8M mention entity co-occurrences. DAWT contains 4.8 times more anchor text to entity links than originally present in the Wikipedia markup. Moreover, it spans several languages including English, Spanish, Italian, German, French and Arabic.

Source: DAWT: Densely Annotated Wikipedia Texts across multiple languages

Papers


Paper Code Results Date Stars

Dataset Loaders


Tasks


License


  • Unknown

Modalities


Languages