The One Billion Word dataset is a benchmark corpus for language modeling. Its training and held-out data were produced from the WMT 2011 News Crawl data using a combination of Bash shell and Perl scripts.
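The sketch below shows how a language modeling corpus like this one is typically used. It fits an add-one (Laplace) smoothed unigram model on training sentences and reports perplexity on held-out sentences. It assumes the data is stored as one whitespace-tokenized sentence per line. The function names and the tiny inline corpus are hypothetical, and published results on this benchmark come from much stronger models.

```python
import math
from collections import Counter


def build_unigram_model(train_lines):
    """Count whitespace-separated tokens, appending an end-of-sentence token </s>.

    Assumption: one tokenized sentence per line.
    """
    counts = Counter()
    for line in train_lines:
        counts.update(line.split() + ["</s>"])
    return counts


def perplexity(counts, heldout_lines):
    """Add-one smoothed unigram perplexity on held-out sentences.

    All unseen words share a single extra vocabulary slot, so the
    smoothed probabilities still sum to 1.
    """
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # +1 reserves a bucket for unknown words
    log_prob = 0.0
    n_tokens = 0
    for line in heldout_lines:
        for tok in line.split() + ["</s>"]:
            p = (counts.get(tok, 0) + 1) / (total + vocab_size)
            log_prob += math.log(p)
            n_tokens += 1
    # Perplexity is the exponentiated average negative log-likelihood per token.
    return math.exp(-log_prob / n_tokens)


if __name__ == "__main__":
    # Hypothetical toy data standing in for the real training/held-out files.
    train = ["the cat sat", "the dog sat"]
    heldout = ["the cat ran"]
    model = build_unigram_model(train)
    print(round(perplexity(model, heldout), 2))
```

On the real corpus the same loop would stream lines from the training and held-out files instead of in-memory lists.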
