HengamCorpus is a Persian corpus annotated with temporal tags in the standard BIO tagging scheme. The dataset was generated by applying HengamTagger (https://github.com/kargaranamir/parstdex) to a large number of sentences. The sentences come from two types of Persian text: formal (Persian Wikipedia and the Hamshahri Corpus) and informal (Twitter and HelloKish). To maximize the diversity of patterns for training and evaluation, the authors drew samples uniformly from sets of sentences that each share a unique "temporal pattern profile", that is, the presence/absence vector of the different temporal patterns within the sentence.
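The two ideas in the description, BIO tagging and sampling by temporal pattern profile, can be illustrated with a minimal Python sketch. It is not the authors' code: the function names `bio_tags` and `sample_by_profile`, the label name `TIME`, and the choice to sample with replacement are all assumptions made for illustration, and the corpus's actual tag names may differ.

```python
import random
from collections import defaultdict


def bio_tags(n_tokens, spans, label="TIME"):
    """Build BIO tags for a sentence of n_tokens tokens.

    spans: list of (start, end) token indices, end exclusive.
    label: hypothetical tag name, not necessarily the corpus's own.
    """
    tags = ["O"] * n_tokens
    for start, end in spans:
        tags[start] = f"B-{label}"          # first token of the expression
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the expression
    return tags


def sample_by_profile(sentences, profiles, k, seed=0):
    """Draw k sentences (with replacement, an assumption of this sketch).

    profiles[i] is the presence/absence vector of temporal patterns
    for sentences[i]. A profile is picked uniformly at random first,
    then a sentence is picked uniformly within that profile, so rare
    profiles are not swamped by common ones.
    """
    rng = random.Random(seed)
    groups = defaultdict(list)
    for sentence, profile in zip(sentences, profiles):
        groups[tuple(profile)].append(sentence)
    keys = sorted(groups)                   # fixed order for reproducibility
    return [rng.choice(groups[rng.choice(keys)]) for _ in range(k)]
```

For example, `bio_tags(5, [(1, 3)])` marks tokens 1 and 2 as one temporal expression and leaves the rest as `O`. With `sample_by_profile`, a sentence that is the only member of its profile is drawn about as often as all the sentences of a crowded profile combined, which is the diversity effect the description refers to.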
