WikiText-TL-39 is a benchmark dataset for Filipino language modeling, with 39 million tokens in its training set.