Search Results for author: Aran Komatsuzaki

Found 6 papers, 3 papers with code

ARB: Advanced Reasoning Benchmark for Large Language Models

no code implementations · 25 Jul 2023 · Tomohiro Sawada, Daniel Paleka, Alexander Havrilla, Pranav Tadepalli, Paula Vidas, Alexander Kranias, John J. Nay, Kshitij Gupta, Aran Komatsuzaki

As a subset of ARB, we introduce a challenging set of math and physics problems which require advanced symbolic reasoning and domain knowledge.

Math

Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints

1 code implementation · 9 Dec 2022 · Aran Komatsuzaki, Joan Puigcerver, James Lee-Thorp, Carlos Riquelme Ruiz, Basil Mustafa, Joshua Ainslie, Yi Tay, Mostafa Dehghani, Neil Houlsby

In this work, we propose sparse upcycling -- a simple way to reuse sunk training costs by initializing a sparsely activated Mixture-of-Experts model from a dense checkpoint.
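A minimal sketch of the upcycling idea described above, assuming a toy Transformer feed-forward block as the "dense checkpoint"; the module and parameter names (DenseFFN, MoELayer, upcycle, num_experts) are illustrative and not the paper's actual code.

```python
# Sparse upcycling sketch: initialize every expert of an MoE layer with the
# pretrained dense FFN weights, while the router is freshly initialized.
import copy
import torch
import torch.nn as nn


class DenseFFN(nn.Module):
    """Standard Transformer feed-forward block (the dense-checkpoint part)."""
    def __init__(self, d_model: int, d_ff: int):
        super().__init__()
        self.wi = nn.Linear(d_model, d_ff)
        self.wo = nn.Linear(d_ff, d_model)

    def forward(self, x):
        return self.wo(torch.relu(self.wi(x)))


class MoELayer(nn.Module):
    """Sparsely activated MoE layer: a router plus several expert FFNs."""
    def __init__(self, d_model: int, d_ff: int, num_experts: int, top_k: int = 1):
        super().__init__()
        self.router = nn.Linear(d_model, num_experts)  # new, randomly initialized
        self.experts = nn.ModuleList(DenseFFN(d_model, d_ff) for _ in range(num_experts))
        self.top_k = top_k

    def forward(self, x):                              # x: (tokens, d_model)
        gates = torch.softmax(self.router(x), dim=-1)  # (tokens, num_experts)
        top_gate, top_idx = gates.topk(self.top_k, dim=-1)
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = (top_idx == e).any(dim=-1)          # tokens routed to expert e
            if mask.any():
                weight = top_gate[mask][top_idx[mask] == e].unsqueeze(-1)
                out[mask] += weight * expert(x[mask])
        return out


def upcycle(dense_ffn: DenseFFN, num_experts: int) -> MoELayer:
    """Copy the pretrained dense FFN weights into every expert."""
    d_model, d_ff = dense_ffn.wi.in_features, dense_ffn.wi.out_features
    moe = MoELayer(d_model, d_ff, num_experts)
    for expert in moe.experts:
        expert.load_state_dict(copy.deepcopy(dense_ffn.state_dict()))
    return moe


# Usage: replace a pretrained block's FFN with its upcycled MoE counterpart,
# e.g. moe_layer = upcycle(pretrained_ffn, num_experts=8), then continue training.
```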

Current Limitations of Language Models: What You Need is Retrieval

1 code implementation · 15 Sep 2020 · Aran Komatsuzaki

We classify and re-examine some of the current approaches to improve the performance-computes trade-off of language models, including (1) non-causal models (such as masked language models), (2) extension of batch length with efficient attention, (3) recurrence, (4) conditional computation and (5) retrieval.

Retrieval, Text Generation
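A minimal sketch of the retrieval approach listed as item (5) in the abstract above: fetch the passages most relevant to a query and prepend them to the prompt passed to a language model. The bag-of-words scoring, corpus, and prompt format here are purely illustrative assumptions, not the paper's method.

```python
# Toy retrieval-augmented prompting: score passages against the query,
# keep the top-k, and prepend them to the prompt for the language model.
from collections import Counter
import math


def score(query: str, passage: str) -> float:
    """Cosine similarity between bag-of-words term counts."""
    q, p = Counter(query.lower().split()), Counter(passage.lower().split())
    dot = sum(q[t] * p[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(sum(v * v for v in p.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    """Return the k passages most similar to the query."""
    return sorted(corpus, key=lambda passage: score(query, passage), reverse=True)[:k]


corpus = [
    "The Transformer architecture relies on self-attention.",
    "Mixture-of-Experts layers activate only a few experts per token.",
    "Retrieval augments a language model with external documents.",
]
query = "How does retrieval help a language model?"
prompt = "\n".join(retrieve(query, corpus)) + "\n\nQuestion: " + query
print(prompt)  # this augmented prompt would then be fed to the language model
```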

One Epoch Is All You Need

no code implementations · 16 Jun 2019 · Aran Komatsuzaki

We compare the wall-clock training time of models with different parameter budgets under one-epoch training, and we show that size/iteration adjustments based on our proposed heuristics lead to a 1-2.7x speedup in our cases.

Language Modelling

Extractive Summary as Discrete Latent Variables

no code implementations · 14 Nov 2018 · Aran Komatsuzaki

In this paper, we compare various methods to compress a text using a neural model.

Language Modelling, Text Generation
