Search Results for author: Konrad Staniszewski

Found 3 papers, 2 papers with code

Analysing The Impact of Sequence Composition on Language Model Pre-Training

1 code implementation • 21 Feb 2024 • Yu Zhao, Yuanbin Qu, Konrad Staniszewski, Szymon Tworkowski, Wei Liu, Piotr Miłoś, Yuxiang Wu, Pasquale Minervini

In this work, we find that applying causal masking can lead to the inclusion of distracting information from previous documents during pre-training, which negatively impacts the performance of the models on language modelling and downstream tasks.

In-Context Learning, Language Modelling, +1


Structured Packing in LLM Training Improves Long Context Utilization

no code implementations • 28 Dec 2023 • Konrad Staniszewski, Szymon Tworkowski, Yu Zhao, Sebastian Jaszczur, Henryk Michalewski, Łukasz Kuciński, Piotr Miłoś

Recent developments in long-context large language models have attracted considerable attention.

Information Retrieval, Retrieval


Focused Transformer: Contrastive Training for Context Scaling

1 code implementation • NeurIPS 2023 • Szymon Tworkowski, Konrad Staniszewski, Mikołaj Pacek, Yuhuai Wu, Henryk Michalewski, Piotr Miłoś

The Focused Transformer's contrastive training enhances the structure of the (key, value) space, enabling an extension of the context length.

Contrastive Learning


