Search Results for author: Karl Cobbe

Found 8 papers, 7 papers with code

Let's Verify Step by Step

3 code implementations Preprint 2023 Hunter Lightman, Vineet Kosaraju, Yura Burda, Harri Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, Karl Cobbe

We conduct our own investigation, finding that process supervision significantly outperforms outcome supervision for training models to solve problems from the challenging MATH dataset.

 Ranked #1 on Math Word Problem Solving on MATH minival (using extra training data)

Active Learning Math +2

Training Verifiers to Solve Math Word Problems

2 code implementations27 Oct 2021 Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, John Schulman

State-of-the-art language models can match human performance on many tasks, but they still struggle to robustly perform multi-step mathematical reasoning.

GSM8K Math +1

Batch size-invariance for policy optimization

1 code implementation1 Oct 2021 Jacob Hilton, Karl Cobbe, John Schulman

We say an algorithm is batch size-invariant if changes to the batch size can largely be compensated for by changes to other hyperparameters.

Phasic Policy Gradient

3 code implementations9 Sep 2020 Karl Cobbe, Jacob Hilton, Oleg Klimov, John Schulman

We introduce Phasic Policy Gradient (PPG), a reinforcement learning framework which modifies traditional on-policy actor-critic methods by separating policy and value function training into distinct phases.

 Ranked #1 on Reinforcement Learning (RL) on ProcGen (using extra training data)

Reinforcement Learning (RL)

Leveraging Procedural Generation to Benchmark Reinforcement Learning

6 code implementations ICML 2020 Karl Cobbe, Christopher Hesse, Jacob Hilton, John Schulman

We introduce Procgen Benchmark, a suite of 16 procedurally generated game-like environments designed to benchmark both sample efficiency and generalization in reinforcement learning.

Procgen Hard (100M) reinforcement-learning +1

Cannot find the paper you are looking for? You can Submit a new open access paper.