WikiText-TL-39 is a benchmark language modeling dataset in Filipino with 39 million tokens in its training set.

Source: Evaluating Language Model Finetuning Techniques for Low-resource Languages

License: Unknown