Search Results for author: Aran Komatsuzaki

Found 6 papers, 3 papers with code

ARB: Advanced Reasoning Benchmark for Large Language Models

no code implementations · 25 Jul 2023 · Tomohiro Sawada, Daniel Paleka, Alexander Havrilla, Pranav Tadepalli, Paula Vidas, Alexander Kranias, John J. Nay, Kshitij Gupta, Aran Komatsuzaki

As a subset of ARB, we introduce a challenging set of math and physics problems which require advanced symbolic reasoning and domain knowledge.

Math

Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints

1 code implementation · 9 Dec 2022 · Aran Komatsuzaki, Joan Puigcerver, James Lee-Thorp, Carlos Riquelme Ruiz, Basil Mustafa, Joshua Ainslie, Yi Tay, Mostafa Dehghani, Neil Houlsby

In this work, we propose sparse upcycling -- a simple way to reuse sunk training costs by initializing a sparsely activated Mixture-of-Experts model from a dense checkpoint.
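A minimal sketch of the upcycling idea described above, assuming a toy Transformer feed-forward block as the "dense checkpoint"; the module and parameter names (DenseFFN, MoELayer, upcycle, num_experts) are illustrative and not the paper's actual code.

```python
# Sparse upcycling sketch: initialize every expert of an MoE layer with the
# pretrained dense FFN weights, while the router is freshly initialized.
import copy
import torch
import torch.nn as nn


class DenseFFN(nn.Module):
    """Standard Transformer feed-forward block (the dense-checkpoint part)."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.wi = nn.Linear(d_model, d_ff)
        self.wo = nn.Linear(d_ff, d_model)

    def forward(self, x):
        return self.wo(torch.relu(self.wi(x)))


class MoELayer(nn.Module):
    """Sparsely activated MoE layer: a router plus several expert FFNs."""
    def __init__(self, d_model: int, d_ff: int, num_experts: int, top_k: int = 1):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)  # new, randomly initialized
        self.experts = nn.ModuleList(DenseFFN(d_model, d_ff) for _ in range(num_experts))
        self.top_k = top_k

    def forward(self, x):                              # x: (tokens, d_model)
        gates = torch.softmax(self.router(x), dim=-1)  # (tokens, num_experts)
        top_gate, top_idx = gates.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = (top_idx == e).any(dim=-1)          # tokens routed to expert e
            if mask.any():
                weight = top_gate[mask][top_idx[mask] == e].unsqueeze(-1)
                out[mask] += weight * expert(x[mask])
        return out


def upcycle(dense_ffn: DenseFFN, num_experts: int) -> MoELayer:
    """Copy the pretrained dense FFN weights into every expert."""
    d_model, d_ff = dense_ffn.wi.in_features, dense_ffn.wi.out_features
    moe = MoELayer(d_model, d_ff, num_experts)
    for expert in moe.experts:
        expert.load_state_dict(copy.deepcopy(dense_ffn.state_dict()))
    return moe


# Usage: replace a pretrained block's FFN with its upcycled MoE counterpart,
# e.g. moe_layer = upcycle(pretrained_ffn, num_experts=8), then continue training.
```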

Current Limitations of Language Models: What You Need is Retrieval

1 code implementation · 15 Sep 2020 · Aran Komatsuzaki

We classify and re-examine some of the current approaches to improve the performance-computes trade-off of language models, including (1) non-causal models (such as masked language models), (2) extension of batch length with efficient attention, (3) recurrence, (4) conditional computation and (5) retrieval.

Retrieval, Text Generation
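A minimal sketch of the retrieval approach listed as item (5) in the abstract above: fetch the passages most relevant to a query and prepend them to the prompt passed to a language model. The bag-of-words scoring, corpus, and prompt format here are purely illustrative assumptions, not the paper's method.

```python
# Toy retrieval-augmented prompting: score passages against the query,
# keep the top-k, and prepend them to the prompt for the language model.
from collections import Counter
import math


def score(query: str, passage: str) -> float:
    """Cosine similarity between bag-of-words term counts."""
    q, p = Counter(query.lower().split()), Counter(passage.lower().split())
    dot = sum(q[t] * p[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in p.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the query."""
    return sorted(corpus, key=lambda passage: score(query, passage), reverse=True)[:k]


corpus = [
    "The Transformer architecture relies on self-attention.",
    "Mixture-of-Experts layers activate only a few experts per token.",
    "Retrieval augments a language model with external documents.",
]
query = "How does retrieval help a language model?"
prompt = "\n".join(retrieve(query, corpus)) + "\n\nQuestion: " + query
print(prompt)  # this augmented prompt would then be fed to the language model
```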

One Epoch Is All You Need

no code implementations · 16 Jun 2019 · Aran Komatsuzaki

We compare the wall-clock training time of models with different parameter budgets under one-epoch training, and we show that size/iteration adjustments based on our proposed heuristics lead to a 1-2.7x speedup in our cases.

Language Modelling

Extractive Summary as Discrete Latent Variables

no code implementations · 14 Nov 2018 · Aran Komatsuzaki

In this paper, we compare various methods to compress a text using a neural model.

Language Modelling, Text Generation
