Search Results for author: Yangling Tong

Found 1 paper, 0 papers with code

Perplexed by Quality: A Perplexity-based Method for Adult and Harmful Content Detection in Multilingual Heterogeneous Web Data

no code implementations • 20 Dec 2022 • Tim Jansen, Yangling Tong, Victoria Zevallos, Pedro Ortiz Suarez

As demand for large corpora increases with the size of current state-of-the-art language models, using web data as the main part of the pre-training corpus for these models has become a ubiquitous practice.

Language Modelling
