WMT 2021 Ge'ez-Amharic

WMT 2021 Ge'ez-Amharic is a parallel Ge'ez-Amharic dataset prepared for the neural machine translation (NMT) tasks of the 6th Workshop on NLP at Debre Berhan University, Ethiopia. The corpus was collected from:

  • The Ethiopian Orthodox Church old Bible (from ethiopianorthodox.org), the Anaphora, the Praise of St. Virgin Mary, the Praise of Lord Jesus, and other Church books.
  • Ge'ez teaching books.
  • Websites and other internet sources such as www.geez.org and www.debelo.org.

The dataset contains about 15,454 parallel Ge'ez-Amharic sentence pairs for training, 1,001 for testing, and 1,001 for validation.
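
As a rough illustration of the split described above, the following Python sketch computes each split's share of the corpus and shows one way to pair aligned sentences. The counts are the approximate figures quoted in the description; the names `SPLIT_SIZES`, `split_shares`, and `pair_sentences` are hypothetical helpers introduced here and are not part of any official loader for this dataset.

```python
# Split sizes as quoted in the dataset description (stated there as approximate).
SPLIT_SIZES = {"train": 15454, "test": 1001, "validation": 1001}


def split_shares(sizes):
    """Return each split's fraction of the total sentence-pair count."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


def pair_sentences(geez_lines, amharic_lines):
    """Pair aligned Ge'ez and Amharic lines, rejecting a length mismatch.

    Assumes the two sides are aligned line by line, which is a common
    convention for parallel corpora but is not confirmed for this dataset.
    """
    if len(geez_lines) != len(amharic_lines):
        raise ValueError("parallel sides differ in length")
    return [(g.strip(), a.strip()) for g, a in zip(geez_lines, amharic_lines)]


if __name__ == "__main__":
    print(f"total pairs: {sum(SPLIT_SIZES.values())}")
    for name, share in split_shares(SPLIT_SIZES).items():
        print(f"{name}: {share:.1%}")
```

With these figures the corpus totals 17,456 sentence pairs, of which the training split is a little under nine tenths.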
