GSM8K
68 papers with code • 0 benchmarks • 0 datasets
Benchmarks
These leaderboards are used to track progress in GSM8K
Libraries
Use these libraries to find GSM8K models and implementations

Most implemented papers
Autonomous Data Selection with Language Models for Mathematical Texts
Our method achieves a twofold increase in pretraining token efficiency compared with state-of-the-art baselines, underscoring the potential of our approach for enhancing models' mathematical reasoning capabilities.
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Chain-of-thought prompting combined with pre-trained large language models has achieved encouraging results on complex reasoning tasks.
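The core of self-consistency is simple: sample several chain-of-thought completions at nonzero temperature and take a majority vote over the final answers. A minimal sketch, where `sample_fn` is a hypothetical stand-in for a stochastic LLM call:

```python
import itertools
from collections import Counter

def self_consistency(sample_fn, prompt, n=10):
    """Sample n chain-of-thought answers for `prompt` and return the
    majority-vote answer. `sample_fn` stands in for an LLM call that
    returns only the final answer string (a hypothetical interface)."""
    answers = [sample_fn(prompt) for _ in range(n)]
    answer, _count = Counter(answers).most_common(1)[0]
    return answer

# Toy deterministic "sampler" simulating a model that answers 18 three
# times out of four; a real system would sample an LLM at temperature > 0.
samples = itertools.cycle(["18", "18", "18", "20"])
def toy_sampler(prompt):
    return next(samples)

print(self_consistency(toy_sampler, "Janet's ducks lay 16 eggs...", n=20))  # prints 18
```

The vote aggregates only the final answers, not the reasoning chains, which is why diverse (sometimes wrong) chains can still converge on the correct result.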
Learning Math Reasoning from Self-Sampled Correct and Partially-Correct Solutions
We show that our use of self-sampled correct and partially-correct solutions can benefit learning and help guide the sampling process, leading to more efficient exploration of the solution space.
Distilling Reasoning Capabilities into Smaller Language Models
In this work, we propose an alternative reasoning scheme, Socratic CoT, that learns a decomposition of the original problem into a sequence of subproblems and uses it to guide the intermediate reasoning steps.
Large Language Models Are Latent Variable Models: Explaining and Finding Good Demonstrations for In-Context Learning
This study aims to examine the in-context learning phenomenon through a Bayesian lens, viewing real-world LLMs as latent variable models.
Boosted Prompt Ensembles for Large Language Models
Methods such as chain-of-thought prompting and self-consistency have pushed the frontier of language model reasoning performance with no additional training.
Solving Math Word Problems by Combining Language Models With Symbolic Solvers
Automatically generating high-quality step-by-step solutions to math word problems has many applications in education.
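In this line of work the LLM formalizes the word problem into equations and an external symbolic solver does the exact arithmetic. A minimal sketch, assuming the (hypothetical) LLM step has already reduced the problem to a linear equation a*x + b = c:

```python
from fractions import Fraction

def solve_linear(a, b, c):
    """Solve a*x + b = c exactly. The coefficients would come from an
    LLM's formalization of the word problem (hypothetical step); the
    solver guarantees the arithmetic is error-free."""
    return Fraction(c - b, a)

# Word problem: "3 pencils plus a $2 shipping fee cost $11. Price per pencil?"
# A hypothetical LLM formalizes this as 3*x + 2 = 11; the solver finishes.
print(solve_linear(3, 2, 11))  # prints 3
```

Delegating the final computation to a solver removes a common LLM failure mode: correct problem setup followed by a slip in the arithmetic.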
Progressive-Hint Prompting Improves Reasoning in Large Language Models
The performance of Large Language Models (LLMs) in reasoning tasks depends heavily on prompt design, with Chain-of-Thought (CoT) and self-consistency being critical methods that enhance this ability.
Automatic Model Selection with Large Language Models for Reasoning
Chain-of-Thought (CoT) and Program-Aided Language Models (PAL) represent two distinct reasoning methods, each with its own strengths.
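One simple selection policy between the two methods: prefer the program-aided (PAL) result when the generated program executes cleanly, and fall back to the CoT answer otherwise. This is a hedged sketch of that idea, not the paper's actual selector; both inputs stand in for hypothetical LLM outputs:

```python
def select_answer(pal_program, cot_answer):
    """Naive selector between two reasoning methods: run the generated
    PAL program and use its result if it executes without error;
    otherwise fall back to the chain-of-thought answer."""
    try:
        scope = {}
        exec(pal_program, scope)  # PAL obtains answers by executing code
        return scope["answer"]
    except Exception:
        return cot_answer

good_program = "answer = (16 - 3 - 4) * 2"
bad_program = "answer = 16 -"  # syntax error, so we fall back to CoT

print(select_answer(good_program, 17))  # prints 18
print(select_answer(bad_program, 17))   # prints 17
```

A real selector would be richer (e.g., letting a model judge which method's output to trust), but execution success is the cheapest available signal.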
PaD: Program-aided Distillation Can Teach Small Models Reasoning Better than Chain-of-thought Fine-tuning
While large language models (LLMs) excel in various natural language processing tasks, their huge size and the inaccessibility of parameters present challenges for practical deployment.