Search Results for author: Konrad Staniszewski

Found 3 papers, 2 papers with code

Analysing The Impact of Sequence Composition on Language Model Pre-Training

1 code implementation · 21 Feb 2024 · Yu Zhao, Yuanbin Qu, Konrad Staniszewski, Szymon Tworkowski, Wei Liu, Piotr Miłoś, Yuxiang Wu, Pasquale Minervini

In this work, we find that applying causal masking can lead to the inclusion of distracting information from previous documents during pre-training, which negatively impacts the performance of the models on language modelling and downstream tasks.
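The finding above concerns packed pre-training sequences, where a plain causal mask lets a token attend to earlier, unrelated documents in the same batch row. A common way to avoid this is an intra-document (block-diagonal) causal mask, sketched here in illustrative Python; the function name and boolean-matrix representation are assumptions, not the paper's implementation:

```python
def intra_document_causal_mask(doc_ids):
    # mask[i][j] is True iff token i may attend to token j:
    # j must not be in the future (j <= i), and both tokens must
    # belong to the same document, so attention never crosses
    # document boundaries in a packed training sequence.
    n = len(doc_ids)
    return [[j <= i and doc_ids[i] == doc_ids[j] for j in range(n)]
            for i in range(n)]

# A packed sequence of two documents: tokens 0-2 from doc 0, 3-5 from doc 1.
mask = intra_document_causal_mask([0, 0, 0, 1, 1, 1])
```

With a plain causal mask, `mask[3][2]` would be True (token 3 sees the previous document); here it is False, while within-document causal entries such as `mask[2][0]` remain True.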

Tasks: In-Context Learning · Language Modelling · +1
