1 code implementation • 13 Mar 2024 • Adam Ibrahim, Benjamin Thérien, Kshitij Gupta, Mats L. Richter, Quentin Anthony, Timothée Lesort, Eugene Belilovsky, Irina Rish
In this work, we show that a simple and scalable combination of learning rate (LR) re-warming, LR re-decaying, and replay of previous data is sufficient to match the performance of fully re-training from scratch on all available data, as measured by the final loss and the average score on several language model (LM) evaluation benchmarks.
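A minimal sketch of the replay component described above, assuming simple list-like datasets and an illustrative 5% replay ratio (the paper evaluates several ratios); LR re-warming and re-decaying amount to running a fresh warmup-then-decay schedule on the new data, as sketched under the next entry.

```python
import random

def mixed_batch(downstream, upstream, batch_size=16, replay_fraction=0.05):
    """Replay: fill a small share of every continual pre-training batch with
    examples from the previously seen (upstream) data, the rest with new data.
    The 5% ratio is illustrative, not the paper's exact setting."""
    n_replay = max(1, int(round(replay_fraction * batch_size)))
    batch = random.sample(upstream, n_replay) + random.sample(downstream, batch_size - n_replay)
    random.shuffle(batch)
    return batch
```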
2 code implementations • 8 Aug 2023 • Kshitij Gupta, Benjamin Thérien, Adam Ibrahim, Mats L. Richter, Quentin Anthony, Eugene Belilovsky, Irina Rish, Timothée Lesort
We study the warmup phase of models pre-trained on the Pile (upstream data, 300B tokens) as we continue to pre-train on SlimPajama (downstream data, 297B tokens), following a linear warmup and cosine decay schedule.
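A minimal sketch of the linear-warmup, cosine-decay schedule referred to here; the step counts and learning-rate values below are placeholders, not the paper's exact hyperparameters.

```python
import math

def warmup_cosine_lr(step, total_steps, warmup_steps, peak_lr, min_lr=0.0):
    """Linear warmup to peak_lr, then cosine decay down to min_lr."""
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps          # linear warmup
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (peak_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# Example: 1% of steps spent on warmup (illustrative values)
schedule = [warmup_cosine_lr(s, total_steps=10_000, warmup_steps=100, peak_lr=3e-4)
            for s in range(10_000)]
```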
1 code implementation • 1 Jun 2023 • Pablo Pernias, Dominic Rampas, Mats L. Richter, Christopher J. Pal, Marc Aubreville
This highly compressed representation of an image provides much more detailed guidance than latent representations of language, and this significantly reduces the computational requirements needed to achieve state-of-the-art results.
no code implementations • 26 Nov 2022 • Mats L. Richter, Christopher Pal
By further developing and formalizing the analysis of receptive field expansion in convolutional neural networks, we can predict unproductive layers in an automated manner before ever training a model.
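A rough sketch of that idea using the standard receptive-field recursion: once a layer's receptive field already covers the whole input, later layers become candidates for being unproductive. The layer stack and the flagging criterion below are illustrative, not the paper's exact analysis.

```python
def receptive_fields(layers, input_size):
    """Track the receptive field per layer and flag layers whose receptive field
    already exceeds the input resolution (a rough proxy for 'unproductive')."""
    rf, jump = 1, 1
    report = []
    for name, kernel, stride in layers:
        rf += (kernel - 1) * jump   # standard receptive-field recursion
        jump *= stride
        report.append((name, rf, rf > input_size))
    return report

# Hypothetical VGG-like stack: (name, kernel_size, stride)
stack = [("conv1", 3, 1), ("pool1", 2, 2), ("conv2", 3, 1),
         ("pool2", 2, 2), ("conv3", 3, 1), ("conv4", 3, 1)]
for name, rf, beyond_input in receptive_fields(stack, input_size=16):
    print(f"{name}: receptive field {rf}x{rf}, beyond input: {beyond_input}")
```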
1 code implementation • 23 Jun 2021 • Mats L. Richter, Julius Schöning, Anna Wiedenroth, Ulf Krumnack
When optimizing convolutional neural networks (CNN) for a specific image-based task, specialists commonly overshoot the number of convolutional layers in their designs.
no code implementations • 17 Jun 2021 • Mats L. Richter, Leila Malihi, Anne-Kathrin Patricia Windler, Ulf Krumnack
In this work we explore the information processing inside neural networks using logistic regression probes and the saturation metric.
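A minimal sketch of a logistic regression probe, assuming pre-extracted layer activations and labels as NumPy arrays; this shows the probing setup in general, not the paper's exact training protocol.

```python
from sklearn.linear_model import LogisticRegression

def probe_accuracy(acts_train, y_train, acts_test, y_test):
    """Fit a linear probe on frozen layer activations and report test accuracy,
    a proxy for how much class information the layer exposes."""
    clf = LogisticRegression(max_iter=1000)
    # Flatten spatial dimensions in case the activations come from a conv layer.
    clf.fit(acts_train.reshape(len(acts_train), -1), y_train)
    return clf.score(acts_test.reshape(len(acts_test), -1), y_test)
```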
no code implementations • 2 Feb 2021 • Mats L. Richter, Wolf Byttner, Ulf Krumnack, Ludwig Schallner, Justin Shenk
Fully convolutional neural networks can process input of arbitrary size by applying a combination of downsampling and pooling.
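A minimal PyTorch sketch (illustrative layer sizes) of why this works: global pooling after the convolutional stack removes the remaining spatial dependence, so the output shape is independent of the input resolution.

```python
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),      # global pooling -> (N, 32, 1, 1) for any input size
    nn.Flatten(),
    nn.Linear(32, 10),
)

for size in (32, 64, 224):        # different input resolutions, same output shape
    print(net(torch.randn(1, 3, size, size)).shape)   # torch.Size([1, 10])
```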
2 code implementations • 15 Jun 2020 • Mats L. Richter, Justin Shenk, Wolf Byttner, Anders Arpteg, Mikael Huss
First, we show that a layer's output can be restricted to the eigenspace of its variance matrix without performance loss.
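A small NumPy sketch of that restriction, assuming a matrix of layer outputs with one sample per row: project the outputs onto the eigenvectors of their (co)variance matrix that explain 99% of the variance, then map back.

```python
import numpy as np

def project_to_eigenspace(acts, variance_explained=0.99):
    """Restrict layer outputs to the eigenspace of their variance matrix,
    keeping only the eigendirections needed to explain the given variance share."""
    mean = acts.mean(axis=0)
    centered = acts - mean
    cov = np.cov(centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)               # ascending order
    eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]   # descending order
    k = np.searchsorted(np.cumsum(eigvals) / eigvals.sum(), variance_explained) + 1
    basis = eigvecs[:, :k]                               # top-k eigenspace
    return centered @ basis @ basis.T + mean, k
```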
1 code implementation • 19 Jul 2019 • Justin Shenk, Mats L. Richter, Anders Arpteg, Mikael Huss
We propose a metric, Layer Saturation, defined as the proportion of eigendirections needed to explain 99% of the variance of the latent representations, for analyzing the learned representations of neural network layers.
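A minimal NumPy sketch of that definition, assuming a matrix of layer activations with one sample per row; it reuses the same eigenvalue computation as the projection sketch above.

```python
import numpy as np

def layer_saturation(acts, threshold=0.99):
    """Saturation: fraction of the layer's eigendirections needed to explain
    `threshold` (here 99%) of the variance of its activations."""
    cov = np.cov(acts - acts.mean(axis=0), rowvar=False)
    eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]          # descending eigenvalues
    k = np.searchsorted(np.cumsum(eigvals) / eigvals.sum(), threshold) + 1
    return k / acts.shape[1]
```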