DAGW (Danish Gigaword)

Introduced by Strømberg-Derczynski et al. in The Danish Gigaword Corpus

It is hard to develop good tools for processing Danish when no large, wide-coverage dataset of Danish text is readily available. To address this, the Danish Gigaword Project (DAGW) maintains a corpus of Danish text containing over a billion words. The project's general goals are to create a dataset that is:

  • representative;
  • accessible;
  • a suitable common starting point for Danish NLP models.

License


  • Unknown
