Mathematical Reasoning
119 papers with code • 4 benchmarks • 14 datasets
Libraries
Use these libraries to find Mathematical Reasoning models and implementations
Most implemented papers
PAL: Program-aided Language Models
Much of this success can be attributed to prompting methods such as "chain-of-thought", which employ LLMs both to understand the problem description by decomposing it into steps and to solve each step of the problem.
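The core idea of program-aided prompting is that the model emits an executable program as its reasoning trace, and the final answer comes from running that program in an interpreter rather than from the model's own arithmetic. A minimal sketch, where the hardcoded string stands in for a hypothetical LLM completion on a classic grade-school word problem:

```python
# The string below stands in for a (hypothetical) LLM completion: the model
# writes Python whose variables mirror the word problem's reasoning steps.
GENERATED_PROGRAM = """
tennis_balls = 5               # Roger starts with 5 tennis balls
bought = 2 * 3                 # he buys 2 cans of 3 balls each
answer = tennis_balls + bought
"""

def run_pal_program(program: str) -> int:
    """Execute a model-generated program and read off its `answer` variable."""
    namespace: dict = {}
    exec(program, namespace)   # offload the actual computation to Python
    return namespace["answer"]

print(run_pal_program(GENERATED_PROGRAM))  # 11
```

Because the interpreter does the arithmetic, errors in the model's computation (as opposed to its decomposition) are eliminated; a real system would of course sandbox the `exec` call.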
Reasoning with Language Model Prompting: A Survey
Reasoning, as an essential ability for complex problem-solving, can provide back-end support for various real-world applications, such as medical diagnosis, negotiation, etc.
Mathematical Capabilities of ChatGPT
We investigate the mathematical capabilities of two iterations of ChatGPT (released 9-January-2023 and 30-January-2023) and of GPT-4 by testing them on publicly available datasets, as well as hand-crafted ones, using a novel methodology.
Sparks of Artificial General Intelligence: Early experiments with GPT-4
We contend that (this early version of) GPT-4 is part of a new cohort of LLMs (along with ChatGPT and Google's PaLM for example) that exhibit more general intelligence than previous AI models.
Self-Refine: Iterative Refinement with Self-Feedback
Motivated by how humans refine their written text, we introduce Self-Refine, an approach for improving initial outputs from LLMs through iterative feedback and refinement.
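The refinement procedure is a simple generate/critique/revise loop in which the same model plays all three roles, stopping when the critique reports no remaining issues. A minimal sketch, where `generate`, `feedback`, and `refine` are hypothetical stand-ins for LLM calls, mocked here with rule-based logic:

```python
# Rule-based mocks for what would be three prompts to the same LLM.
def generate(task: str) -> str:
    return "4 + 4"                        # deliberately flawed first draft

def feedback(task: str, output: str) -> str:
    # A real system would prompt the LLM to critique its own output.
    return "" if eval(output) == 9 else "the expression should equal 9"

def refine(task: str, output: str, critique: str) -> str:
    return "4 + 5"                        # revised draft addressing the critique

def self_refine(task: str, max_iters: int = 3) -> str:
    """Iterate feedback-then-refine until the critique is empty."""
    output = generate(task)
    for _ in range(max_iters):
        critique = feedback(task, output)
        if not critique:                  # empty critique => stop refining
            break
        output = refine(task, output, critique)
    return output

print(self_refine("write an expression that equals 9"))  # 4 + 5
```

The loop needs no extra training or supervision: the stopping condition and the revision signal both come from the model's own feedback.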
SNIP: Bridging Mathematical Symbolic and Numeric Realms with Unified Pre-training
To bridge the gap, we introduce SNIP, a Symbolic-Numeric Integrated Pre-training model, which employs contrastive learning between symbolic and numeric domains, enhancing their mutual similarities in the embeddings.
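The contrastive objective behind this kind of symbolic-numeric pre-training pulls paired symbolic and numeric embeddings together while pushing mismatched pairs apart. A sketch of a symmetric InfoNCE-style loss over a toy batch, with randomly generated embeddings standing in for encoder outputs (the loss form and dimensions are illustrative assumptions, not the paper's exact setup):

```python
import numpy as np

def info_nce(sym: np.ndarray, num: np.ndarray, temperature: float = 0.1) -> float:
    """Symmetric InfoNCE loss: row i of `sym` is the positive for row i of `num`."""
    sym = sym / np.linalg.norm(sym, axis=1, keepdims=True)
    num = num / np.linalg.norm(num, axis=1, keepdims=True)
    logits = sym @ num.T / temperature               # pairwise cosine similarities
    idx = np.arange(len(sym))                        # matching pairs on the diagonal
    # symbolic -> numeric direction
    log_p = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    loss_s2n = -log_p[idx, idx].mean()
    # numeric -> symbolic direction
    log_p_t = logits.T - np.log(np.exp(logits.T).sum(axis=1, keepdims=True))
    loss_n2s = -log_p_t[idx, idx].mean()
    return float((loss_s2n + loss_n2s) / 2)

rng = np.random.default_rng(0)
sym = rng.normal(size=(4, 8))
loss_aligned = info_nce(sym, sym)                    # perfectly aligned pairs
loss_random = info_nce(sym, rng.normal(size=(4, 8)))  # unrelated pairs
```

Perfectly aligned pairs yield a much lower loss than unrelated ones, which is the gradient signal that makes cross-domain embeddings mutually similar.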
How Abilities in Large Language Models are Affected by Supervised Fine-tuning Data Composition
We propose four intriguing research questions to explore the association between model performance and various factors including data amount, composition ratio, model size and SFT strategies.
Autonomous Data Selection with Language Models for Mathematical Texts
Our method demonstrates a twofold increase in pretraining token efficiency compared to state-of-the-art baselines, underscoring the potential of our approach in enhancing models' mathematical reasoning capabilities.
Evaluating Mathematical Reasoning Beyond Accuracy
To measure reasoning beyond final-answer accuracy, we introduce ReasonEval, a new methodology for evaluating the quality of reasoning steps.
Sample-Efficient Human Evaluation of Large Language Models via Maximum Discrepancy Competition
The past years have witnessed a proliferation of large language models (LLMs).