Search Results for author: Dániel Simig

Found 2 papers, 1 paper with code

MEGABYTE: Predicting Million-byte Sequences with Multiscale Transformers

No code implementations. NeurIPS 2023. Lili Yu, Dániel Simig, Colin Flaherty, Armen Aghajanyan, Luke Zettlemoyer, Mike Lewis

Autoregressive transformers are spectacular models for short sequences but scale poorly to long sequences such as high-resolution images, podcasts, code, or books.

Tasks: Density Estimation, Language Modelling

SemDeDup: Data-efficient learning at web-scale through semantic deduplication

1 code implementation. 16 Mar 2023. Amro Abbas, Kushal Tirumala, Dániel Simig, Surya Ganguli, Ari S. Morcos

Analyzing a subset of LAION, we show that SemDeDup can remove 50% of the data with minimal performance loss, effectively halving training time.
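The core idea of semantic deduplication is to compare examples in embedding space and discard near-duplicates whose cosine similarity exceeds a threshold. The sketch below is a simplified greedy illustration of that idea, not the paper's implementation (which clusters embeddings with k-means and deduplicates within clusters for scalability); the function name, threshold value, and toy vectors are all assumptions for illustration.

```python
import numpy as np

def semantic_dedup(embeddings, threshold=0.9):
    """Greedy semantic deduplication sketch (hypothetical helper):
    keep an example only if its cosine similarity to every
    already-kept example is below `threshold`."""
    # Normalize rows so plain dot products equal cosine similarities.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    kept = []
    for i, vec in enumerate(normed):
        if all(float(vec @ normed[j]) < threshold for j in kept):
            kept.append(i)
    return kept

# Toy example: rows 0 and 1 are near-duplicates, row 2 is distinct.
embs = np.array([
    [1.0, 0.0],
    [0.99, 0.05],  # near-duplicate of row 0 (cosine sim ~0.999)
    [0.0, 1.0],
])
print(semantic_dedup(embs))  # → [0, 2]
```

At web scale, the quadratic pairwise comparison above is infeasible, which is why the paper first partitions embeddings into clusters and only compares within each cluster.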
