Search Results for author: Mats L. Richter

Found 9 papers, 6 papers with code

Simple and Scalable Strategies to Continually Pre-train Large Language Models

1 code implementation • 13 Mar 2024 • Adam Ibrahim, Benjamin Thérien, Kshitij Gupta, Mats L. Richter, Quentin Anthony, Timothée Lesort, Eugene Belilovsky, Irina Rish

In this work, we show that a simple and scalable combination of learning rate (LR) re-warming, LR re-decaying, and replay of previous data is sufficient to match the performance of fully re-training from scratch on all available data, as measured by the final loss and the average score on several language model (LM) evaluation benchmarks.

Continual Learning, Language Modelling
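
A minimal sketch of the two ingredients named in the abstract, learning rate re-warming/re-decaying and replay, assuming generic iterables of token batches; the function names and the 5% replay fraction are illustrative, not the paper's exact settings.

    import math
    import random

    def rewarmed_cosine_lr(step, total_steps, warmup_steps, max_lr, min_lr):
        # Linear re-warmup from min_lr back up to max_lr, then cosine re-decay to min_lr.
        if step < warmup_steps:
            return min_lr + (max_lr - min_lr) * step / warmup_steps
        progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
        return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

    def mixed_batches(downstream_batches, upstream_batches, replay_fraction=0.05):
        # Replace a small fraction of new-data batches with replayed upstream batches.
        # upstream_batches is assumed to be a non-exhausting iterator over the old data.
        for batch in downstream_batches:
            if random.random() < replay_fraction:
                yield next(upstream_batches)
            else:
                yield batch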

Continual Pre-Training of Large Language Models: How to (re)warm your model?

2 code implementations • 8 Aug 2023 • Kshitij Gupta, Benjamin Thérien, Adam Ibrahim, Mats L. Richter, Quentin Anthony, Eugene Belilovsky, Irina Rish, Timothée Lesort

We study the warmup phase of models pre-trained on the Pile (upstream data, 300B tokens) as we continue to pre-train on SlimPajama (downstream data, 297B tokens), following a linear warmup and cosine decay schedule.

Language Modelling

Wuerstchen: An Efficient Architecture for Large-Scale Text-to-Image Diffusion Models

1 code implementation • 1 Jun 2023 • Pablo Pernias, Dominic Rampas, Mats L. Richter, Christopher J. Pal, Marc Aubreville

This highly compressed representation of an image provides much more detailed guidance than latent representations of language, and this significantly reduces the computational requirements to achieve state-of-the-art results.

Image Compression, Image Generation

Receptive Field Refinement for Convolutional Neural Networks Reliably Improves Predictive Performance

no code implementations • 26 Nov 2022 • Mats L. Richter, Christopher Pal

By further developing and formalizing the analysis of receptive field expansion in convolutional neural networks, we can predict unproductive layers in an automated manner before ever training a model.
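
A minimal sketch of the receptive field bookkeeping such an analysis builds on, assuming a plain sequential stack described by (kernel size, stride) pairs; treating a layer as unproductive once its receptive field already covers the whole input is a simplification of the paper's criterion, used here only for illustration.

    def receptive_fields(layers):
        # layers: list of (kernel_size, stride) pairs of a sequential conv/pool stack.
        rf, jump, fields = 1, 1, []
        for k, s in layers:
            rf += (k - 1) * jump   # receptive field grows by (k - 1) * effective stride so far
            jump *= s              # effective stride accumulates multiplicatively
            fields.append(rf)
        return fields

    # Flag layers whose receptive field already covers a hypothetical 32x32 input.
    stack = [(3, 1), (3, 2), (3, 1), (3, 2), (3, 1), (3, 2), (3, 1)]
    for i, rf in enumerate(receptive_fields(stack)):
        print(f"layer {i}: receptive field {rf}, covers input: {rf >= 32}")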

Should You Go Deeper? Optimizing Convolutional Neural Network Architectures without Training by Receptive Field Analysis

1 code implementation • 23 Jun 2021 • Mats L. Richter, Julius Schöning, Anna Wiedenroth, Ulf Krumnack

When optimizing convolutional neural networks (CNNs) for a specific image-based task, specialists commonly overshoot the number of convolutional layers in their designs.

Exploring the Properties and Evolution of Neural Network Eigenspaces during Training

no code implementations • 17 Jun 2021 • Mats L. Richter, Leila Malihi, Anne-Kathrin Patricia Windler, Ulf Krumnack

In this work we explore the information processing inside neural networks using logistic regression probes and the saturation metric.

Regression
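
A minimal sketch of a logistic regression probe, assuming the activations of one layer have already been extracted and flattened into a matrix; the probe's held-out accuracy on the class labels is the per-layer signal (scikit-learn is used here purely for illustration).

    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    def probe_accuracy(activations, labels):
        # activations: (n_samples, n_features) outputs of a single layer.
        X_train, X_test, y_train, y_test = train_test_split(
            activations, labels, test_size=0.2, random_state=0)
        probe = LogisticRegression(max_iter=1000)
        probe.fit(X_train, y_train)
        return probe.score(X_test, y_test)

    # Toy usage with random data; in practice the activations come from a trained network.
    acts = np.random.randn(500, 64)
    labels = np.random.randint(0, 10, size=500)
    print(probe_accuracy(acts, labels))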

Size Matters

no code implementations • 2 Feb 2021 • Mats L. Richter, Wolf Byttner, Ulf Krumnack, Ludwig Schallner, Justin Shenk

Fully convolutional neural networks can process input of arbitrary size by applying a combination of downsampling and pooling.
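
A short PyTorch sketch of why that works: convolution and pooling layers are size-agnostic, and a global pooling step collapses whatever spatial resolution remains, so the classifier head always sees a fixed-size vector (the layer sizes below are illustrative).

    import torch
    import torch.nn as nn

    net = nn.Sequential(
        nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
        nn.MaxPool2d(2),
        nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1),  # global average pooling, independent of input size
        nn.Flatten(),
        nn.Linear(32, 10),
    )

    for size in (32, 64, 224):          # three different input resolutions
        x = torch.randn(1, 3, size, size)
        print(size, net(x).shape)       # always torch.Size([1, 10])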

Feature Space Saturation during Training

2 code implementations • 15 Jun 2020 • Mats L. Richter, Justin Shenk, Wolf Byttner, Anders Arpteg, Mikael Huss

First, we show that a layer's output can be restricted to the eigenspace of its variance matrix without performance loss.
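
A minimal NumPy sketch of that restriction, assuming a matrix of one layer's outputs: diagonalize the covariance of the activations and project them onto the leading eigenvectors; the 99% variance threshold is illustrative and taken from the related saturation work.

    import numpy as np

    def restrict_to_eigenspace(acts, var_threshold=0.99):
        # acts: (n_samples, n_features) outputs of one layer.
        mean = acts.mean(axis=0)
        centered = acts - mean
        cov = np.cov(centered, rowvar=False)
        eigvals, eigvecs = np.linalg.eigh(cov)            # ascending order
        eigvals, eigvecs = eigvals[::-1], eigvecs[:, ::-1]
        explained = np.cumsum(eigvals) / eigvals.sum()
        k = int(np.searchsorted(explained, var_threshold)) + 1
        basis = eigvecs[:, :k]                            # leading eigenvectors
        # Project onto the k-dimensional eigenspace and map back to feature space.
        return centered @ basis @ basis.T + mean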

Spectral Analysis of Latent Representations

1 code implementation • 19 Jul 2019 • Justin Shenk, Mats L. Richter, Anders Arpteg, Mikael Huss

We propose a metric, Layer Saturation, defined as the proportion of the number of eigenvalues needed to explain 99% of the variance of the latent representations, for analyzing the learned representations of neural network layers.
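
A minimal NumPy sketch of the metric as defined in the abstract: the proportion of eigenvalues of the covariance of a layer's latent representations that is needed to explain 99% of their variance (the toy dimensions below are illustrative).

    import numpy as np

    def layer_saturation(acts, var_threshold=0.99):
        # acts: (n_samples, n_features) latent representations of one layer.
        cov = np.cov(acts, rowvar=False)
        eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]          # descending eigenvalues
        explained = np.cumsum(eigvals) / eigvals.sum()
        k = int(np.searchsorted(explained, var_threshold)) + 1    # eigenvalues for 99% variance
        return k / acts.shape[1]                                  # proportion of the layer's width

    # Toy usage: activations with an effective rank of ~8 out of 64 give low saturation.
    acts = np.random.randn(1000, 8) @ np.random.randn(8, 64)
    print(layer_saturation(acts))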
