Common Sense Reasoning

255 papers with code • 24 benchmarks • 52 datasets

Common sense reasoning tasks are intended to require the model to go beyond pattern recognition. Instead, the model should use "common sense" or world knowledge to make inferences.
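
Many benchmarks in this area are multiple-choice: the model must pick the answer that is plausible in the real world rather than the one that merely matches surface patterns. The sketch below shows the usual likelihood-based scoring loop; score_option and the example item are illustrative stand-ins, not any particular benchmark's API.

    # A hypothetical scoring loop for a multiple-choice commonsense item.
    from typing import List

    def score_option(question: str, option: str) -> float:
        # Placeholder: a real evaluation would return the language model's
        # log-likelihood for `option` as a continuation of `question`.
        # This trivial stand-in only keeps the sketch runnable.
        return -float(len(option))

    def predict(question: str, options: List[str]) -> str:
        # The prediction is whichever option the model scores as most plausible.
        return max(options, key=lambda opt: score_option(question, opt))

    # Item in the style of CommonsenseQA: the correct answer ("on a horse")
    # requires world knowledge about saddles, not pattern matching.
    question = "Where would you put a saddle before riding?"
    options = ["on a shelf", "on a horse", "in a river"]
    print(predict(question, options))  # the placeholder's guess, not an LM's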

Latest papers with no code

The Claude 3 Model Family: Opus, Sonnet, Haiku

no code yet • Preprint 2024

We introduce Claude 3, a new family of large multimodal models: Claude 3 Opus, our most capable offering; Claude 3 Sonnet, which provides a combination of skills and speed; and Claude 3 Haiku, our fastest and least expensive model.

SERVAL: Synergy Learning between Vertical Models and LLMs towards Oracle-Level Zero-shot Medical Prediction

no code yet • 3 Mar 2024

Recent development of large language models (LLMs) has exhibited impressive zero-shot proficiency on generic and common sense questions.

Know your exceptions: Towards an Ontology of Exceptions in Knowledge Representation

no code yet • 1 Mar 2024

Defeasible reasoning is a kind of reasoning where some generalisations may not be valid in all circumstances; that is, general conclusions may fail in some cases.

Commonsense Ontology Micropatterns

no code yet • 28 Feb 2024

The previously introduced Modular Ontology Modeling methodology (MOMo) attempts to mimic the human analogical process by using modular patterns to assemble more complex concepts.

Fact-and-Reflection (FaR) Improves Confidence Calibration of Large Language Models

no code yet • 27 Feb 2024

For an LLM to be trustworthy, its confidence level should be well-calibrated with its actual performance.
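
Well-calibrated has a standard quantitative reading: among the answers a model gives with stated confidence p, roughly a fraction p should be correct. A common summary metric is expected calibration error (ECE). The sketch below illustrates ECE generically; it is not necessarily the calibration measure used in the FaR paper.

    import numpy as np

    def expected_calibration_error(confidences, correct, n_bins=10):
        # ECE: bin predictions by stated confidence, then average the gap
        # |accuracy - mean confidence| over bins, weighted by bin size.
        confidences = np.asarray(confidences, dtype=float)
        correct = np.asarray(correct, dtype=float)
        edges = np.linspace(0.0, 1.0, n_bins + 1)
        ece = 0.0
        for lo, hi in zip(edges[:-1], edges[1:]):
            in_bin = (confidences > lo) & (confidences <= hi)
            if in_bin.any():
                gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
                ece += in_bin.mean() * gap
        return ece

    # A model that says 0.9 but is right only 3 times out of 5 is
    # overconfident: ECE = 1.0 * |0.6 - 0.9| = 0.3.
    print(expected_calibration_error([0.9] * 5, [1, 1, 1, 0, 0]))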

RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation

no code yet • 22 Feb 2024

To bridge this "ideal-to-real" gap, this paper presents RobotScript, a platform for (1) a deployable robot manipulation pipeline powered by code generation and (2) a code generation benchmark for robot manipulation tasks in free-form natural language.

EvoGrad: A Dynamic Take on the Winograd Schema Challenge with Human Adversaries

no code yet • 20 Feb 2024

Our results emphasize the challenge posed by EvoGrad: even the best-performing LLM, GPT-3.5, achieves an accuracy of 65.0% with an average error depth of 7.2, a stark contrast to human performance of 92.
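
For context, a Winograd schema is a pronoun-resolution problem whose correct referent flips when a single word changes, which rules out surface statistics. The sketch below shows the item format and accuracy computation, using the classic trophy/suitcase example rather than an actual EvoGrad item.

    # Classic Winograd schema (trophy/suitcase); not an actual EvoGrad item.
    # Swapping one word ("big" -> "small") flips the correct referent, which
    # is what defeats surface pattern matching. EvoGrad builds a dynamic set
    # of such perturbations with human adversaries in the loop.
    item = {
        "sentence": "The trophy doesn't fit in the suitcase because it is too big.",
        "pronoun": "it",
        "candidates": ["the trophy", "the suitcase"],
        "answer": "the trophy",  # with "small" instead of "big": "the suitcase"
    }

    def accuracy(predict, items):
        # `predict(sentence, pronoun, candidates) -> str` is any resolver,
        # e.g. an LLM prompted to choose between the two candidates.
        hits = sum(predict(i["sentence"], i["pronoun"], i["candidates"]) == i["answer"]
                   for i in items)
        return hits / len(items)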

Multi Task Inverse Reinforcement Learning for Common Sense Reward

no code yet • 17 Feb 2024

One of the challenges in applying reinforcement learning in a complex real-world environment lies in providing the agent with a sufficiently detailed reward function.

Understanding In-Context Learning with a Pelican Soup Framework

no code yet • 16 Feb 2024

In this framework, we introduce (1) the notion of a common sense knowledge base, (2) a general formalism for natural language classification tasks, and (3) the notion of meaning association.

OpenFMNav: Towards Open-Set Zero-Shot Object Navigation via Vision-Language Foundation Models

no code yet • 16 Feb 2024

By leveraging the reasoning and generalizing abilities of foundation models, our method can understand free-form human instructions and perform effective open-set zero-shot navigation in diverse environments.