OntoNotes 5.0

OntoNotes 5.0 is a large corpus comprising various genres of text (news, conversational telephone speech, weblogs, usenet newsgroups, broadcast, talk shows) in three languages (English, Chinese, and Arabic) with structural information (syntax and predicate argument structure) and shallow semantics (word sense linked to an ontology and coreference).

OntoNotes Release 5.0 contains the content of earlier releases - and adds source data from and/or additional annotations for, newswire, broadcast news, broadcast conversation, telephone conversation and web data in English and Chinese and newswire data in Arabic.

Source: https://catalog.ldc.upenn.edu/LDC2013T19

Papers


Paper Code Results Date Stars

Dataset Loaders


Tasks


Similar Datasets


License


  • Unknown

Modalities


Languages