Search Results for author: Adam Ibrahim

Found 6 papers, 4 papers with code

Simple and Scalable Strategies to Continually Pre-train Large Language Models

1 code implementation • 13 Mar 2024 • Adam Ibrahim, Benjamin Thérien, Kshitij Gupta, Mats L. Richter, Quentin Anthony, Timothée Lesort, Eugene Belilovsky, Irina Rish

In this work, we show that a simple and scalable combination of learning rate (LR) re-warming, LR re-decaying, and replay of previous data is sufficient to match the performance of fully re-training from scratch on all available data, as measured by the final loss and the average score on several language model (LM) evaluation benchmarks.

Continual Learning • Language Modelling
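
The replay component of this recipe — mixing a small share of the original pre-training data into each new batch — can be sketched in a few lines. The function below is an illustration only; its name, batch size, and 5% replay fraction are placeholders, not hyperparameters taken from the paper.

```python
import random

def mixed_batch(new_data, old_data, batch_size=8, replay_fraction=0.05):
    """Assemble a batch mostly from new (downstream) examples, plus a small
    replayed share of old (upstream) examples to mitigate forgetting.
    All parameter values here are illustrative placeholders."""
    n_replay = max(1, int(batch_size * replay_fraction))
    batch = random.sample(new_data, batch_size - n_replay)
    batch += random.sample(old_data, n_replay)
    random.shuffle(batch)
    return batch
```

The other two ingredients, LR re-warming and re-decaying, amount to ramping the learning rate back up and decaying it again once continual pre-training on the new data begins; a schedule of that shape is sketched under the next entry.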

Continual Pre-Training of Large Language Models: How to (re)warm your model?

2 code implementations • 8 Aug 2023 • Kshitij Gupta, Benjamin Thérien, Adam Ibrahim, Mats L. Richter, Quentin Anthony, Eugene Belilovsky, Irina Rish, Timothée Lesort

We study the warmup phase of models pre-trained on the Pile (upstream data, 300B tokens) as we continue to pre-train on SlimPajama (downstream data, 297B tokens), following a linear warmup and cosine decay schedule.

Language Modelling
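
For reference, a linear warmup followed by cosine decay, as named in the snippet, has the shape below. The peak and minimum learning rates and the step counts are illustrative, not the values used in the paper.

```python
import math

def warmup_cosine_lr(step, total_steps, warmup_steps=1000, max_lr=3e-4, min_lr=3e-5):
    """Linear warmup from 0 to max_lr, then cosine decay down to min_lr.
    Hyperparameter values are illustrative placeholders."""
    if step < warmup_steps:
        return max_lr * step / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

In a training loop, this function would set the optimizer's learning rate at each step; re-warming a pre-trained model corresponds to restarting such a schedule on the downstream data.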

Towards Out-of-Distribution Adversarial Robustness

1 code implementation • 6 Oct 2022 • Adam Ibrahim, Charles Guille-Escuret, Ioannis Mitliagkas, Irina Rish, David Krueger, Pouya Bashivan

Compared to existing methods, we obtain similar or superior worst-case adversarial robustness on attacks seen during training.

Adversarial Robustness
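
As a generic illustration of what "worst-case robustness over a set of attacks" means — not the paper's actual training objective — one can evaluate a model under each attack and keep the largest loss:

```python
def worst_case_loss(model, x, y, attacks, loss_fn):
    """Return the largest loss over a set of attacks.

    `attacks` is a list of callables mapping (model, x, y) to a perturbed x;
    both the attack set and loss_fn are placeholders for illustration."""
    return max(loss_fn(model(attack(model, x, y)), y) for attack in attacks)
```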

Learning Robust Kernel Ensembles with Kernel Average Pooling

no code implementations • 30 Sep 2022 • Pouya Bashivan, Adam Ibrahim, Amirozhan Dehghani, Yifei Ren

Model ensembles have long been used in machine learning to reduce the variance in individual model predictions, making them more robust to input perturbations.

Adversarial Feature Desensitization

1 code implementation • NeurIPS 2021 • Pouya Bashivan, Reza Bayat, Adam Ibrahim, Kartik Ahuja, Mojtaba Faramarzi, Touraj Laleh, Blake Aaron Richards, Irina Rish

Our method, called Adversarial Feature Desensitization (AFD), aims at learning features that are invariant towards adversarial perturbations of the inputs.

Adversarial Robustness • Domain Adaptation • +1
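
One simple way to encourage feature invariance of this kind is to add a penalty pulling the features of adversarially perturbed inputs toward those of the clean inputs. The sketch below is a generic invariance penalty with hypothetical helper names, not necessarily the objective used in the paper.

```python
import torch.nn.functional as F

def invariance_penalized_loss(feature_extractor, classifier, x_clean, x_adv, y, weight=1.0):
    """Cross-entropy on clean inputs plus a penalty tying adversarial features
    to clean features. A generic sketch; names and the weight are placeholders."""
    f_clean = feature_extractor(x_clean)
    f_adv = feature_extractor(x_adv)
    task_loss = F.cross_entropy(classifier(f_clean), y)
    invariance = F.mse_loss(f_adv, f_clean.detach())
    return task_loss + weight * invariance
```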

Linear Lower Bounds and Conditioning of Differentiable Games

no code implementations • ICML 2020 • Adam Ibrahim, Waïss Azizian, Gauthier Gidel, Ioannis Mitliagkas

In this work, we approach the question of fundamental iteration complexity by providing lower bounds to complement the linear (i.e., geometric) upper bounds observed in the literature on a wide class of problems.
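
For readers unfamiliar with the terminology, "linear (i.e., geometric)" convergence of an iterative method toward a solution z* means the error contracts by a constant factor per step. The statement below is a common form of the standard definition, not a result from the paper.

```latex
% Linear (geometric) convergence: there exists a rate \rho \in (0, 1) such that
\[
  \| z_{t} - z^{*} \| \le \rho^{\, t} \, \| z_{0} - z^{*} \| ,
\]
% so each iteration shrinks the distance to the solution by at least a factor \rho.
% Lower bounds of the same form constrain how small \rho can be for any method
% in a given class, which is what iteration-complexity lower bounds address.
```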
