Search Results for author: Yangling Tong

Found 1 paper, 0 papers with code

Perplexed by Quality: A Perplexity-based Method for Adult and Harmful Content Detection in Multilingual Heterogeneous Web Data

no code implementations • 20 Dec 2022 • Tim Jansen, Yangling Tong, Victoria Zevallos, Pedro Ortiz Suarez

As demand for large corpora increases with the size of current state-of-the-art language models, using web data as the main part of the pre-training corpus for these models has become a ubiquitous practice.

Language Modelling
