8 Sep 2023 • Max Marion, Ahmet Üstün, Luiza Pozzobon, Alex Wang, Marzieh Fadaee, Sara Hooker
In this work, we take a wider view and explore scalable estimates of data quality that can be used to systematically measure the quality of pretraining data.