scb-mt-en-th-2020

Introduced by Lowphansirikul et al. in scb-mt-en-th-2020: A Large English-Thai Parallel Corpus

scb-mt-en-th-2020 is an English-Thai machine translation dataset with over 1 million segment pairs curated from news, Wikipedia articles, SMS messages, task-based dialogs, web-crawled data, and government documents.

Source: scb-mt-en-th-2020: A Large English-Thai Parallel Corpus

License


  • Unknown
