Common Sense Reasoning

255 papers with code • 24 benchmarks • 52 datasets

Common sense reasoning tasks are intended to require the model to go beyond pattern recognition. Instead, the model should use "common sense" or world knowledge to make inferences.
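
Many benchmarks in this area are multiple-choice: the model must pick the answer that is plausible in the real world rather than the one that merely matches surface patterns. The sketch below shows the usual likelihood-based scoring loop; score_option and the example item are illustrative stand-ins, not any particular benchmark's API.

    # A hypothetical scoring loop for a multiple-choice commonsense item.
    from typing import List

    def score_option(question: str, option: str) -> float:
        # Placeholder: a real evaluation would return the language model's
        # log-likelihood for `option` as a continuation of `question`.
        # This trivial stand-in only keeps the sketch runnable.
        return -float(len(option))

    def predict(question: str, options: List[str]) -> str:
        # The prediction is whichever option the model scores as most plausible.
        return max(options, key=lambda opt: score_option(question, opt))

    # Item in the style of CommonsenseQA: the correct answer ("on a horse")
    # requires world knowledge about saddles, not pattern matching.
    question = "Where would you put a saddle before riding?"
    options = ["on a shelf", "on a horse", "in a river"]
    print(predict(question, options))  # the placeholder's guess, not an LM's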

Latest papers with no code

The Claude 3 Model Family: Opus, Sonnet, Haiku

no code yet • Preprint 2024

We introduce Claude 3, a new family of large multimodal models: Claude 3 Opus, our most capable offering; Claude 3 Sonnet, which provides a combination of skills and speed; and Claude 3 Haiku, our fastest and least expensive model.

SERVAL: Synergy Learning between Vertical Models and LLMs towards Oracle-Level Zero-shot Medical Prediction

no code yet • 3 Mar 2024

Recent development of large language models (LLMs) has exhibited impressive zero-shot proficiency on generic and common sense questions.

Know your exceptions: Towards an Ontology of Exceptions in Knowledge Representation

no code yet • 1 Mar 2024

Defeasible reasoning is a kind of reasoning where some generalisations may not be valid in all circumstances; that is, general conclusions may fail in some cases.

Commonsense Ontology Micropatterns

no code yet • 28 Feb 2024

The previously introduced Modular Ontology Modeling methodology (MOMo) attempts to mimic the human analogical process by using modular patterns to assemble more complex concepts.

Fact-and-Reflection (FaR) Improves Confidence Calibration of Large Language Models

no code yet • 27 Feb 2024

For an LLM to be trustworthy, its confidence level should be well-calibrated with its actual performance.
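
Well-calibrated has a standard quantitative reading: among the answers a model gives with stated confidence p, roughly a fraction p should be correct. A common summary metric is expected calibration error (ECE). The sketch below illustrates ECE generically; it is not necessarily the calibration measure used in the FaR paper.

    import numpy as np

    def expected_calibration_error(confidences, correct, n_bins=10):
        # ECE: bin predictions by stated confidence, then average the gap
        # |accuracy - mean confidence| over bins, weighted by bin size.
        confidences = np.asarray(confidences, dtype=float)
        correct = np.asarray(correct, dtype=float)
        edges = np.linspace(0.0, 1.0, n_bins + 1)
        ece = 0.0
        for lo, hi in zip(edges[:-1], edges[1:]):
            in_bin = (confidences > lo) & (confidences <= hi)
            if in_bin.any():
                gap = abs(correct[in_bin].mean() - confidences[in_bin].mean())
                ece += in_bin.mean() * gap
        return ece

    # A model that says 0.9 but is right only 3 times out of 5 is
    # overconfident: ECE = 1.0 * |0.6 - 0.9| = 0.3.
    print(expected_calibration_error([0.9] * 5, [1, 1, 1, 0, 0]))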

RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation

no code yet • 22 Feb 2024

To bridge this "ideal-to-real" gap, this paper presents RobotScript, a platform for (1) a deployable robot manipulation pipeline powered by code generation and (2) a code generation benchmark for robot manipulation tasks in free-form natural language.

EvoGrad: A Dynamic Take on the Winograd Schema Challenge with Human Adversaries

no code yet • 20 Feb 2024

Our results emphasize the challenge posed by EvoGrad: even the best-performing LLM, GPT-3.5, achieves an accuracy of 65.0% with an average error depth of 7.2, a stark contrast to human performance of 92.
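
For context, a Winograd schema is a pronoun-resolution problem whose correct referent flips when a single word changes, which rules out surface statistics. The sketch below shows the item format and accuracy computation, using the classic trophy/suitcase example rather than an actual EvoGrad item.

    # Classic Winograd schema (trophy/suitcase); not an actual EvoGrad item.
    # Swapping one word ("big" -> "small") flips the correct referent, which
    # is what defeats surface pattern matching. EvoGrad builds a dynamic set
    # of such perturbations with human adversaries in the loop.
    item = {
        "sentence": "The trophy doesn't fit in the suitcase because it is too big.",
        "pronoun": "it",
        "candidates": ["the trophy", "the suitcase"],
        "answer": "the trophy",  # with "small" instead of "big": "the suitcase"
    }

    def accuracy(predict, items):
        # `predict(sentence, pronoun, candidates) -> str` is any resolver,
        # e.g. an LLM prompted to choose between the two candidates.
        hits = sum(predict(i["sentence"], i["pronoun"], i["candidates"]) == i["answer"]
                   for i in items)
        return hits / len(items)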

Multi Task Inverse Reinforcement Learning for Common Sense Reward

no code yet • 17 Feb 2024

One of the challenges in applying reinforcement learning in a complex real-world environment lies in providing the agent with a sufficiently detailed reward function.

Understanding In-Context Learning with a Pelican Soup Framework

no code yet • 16 Feb 2024

In this framework, we introduce (1) the notion of a common sense knowledge base, (2) a general formalism for natural language classification tasks, and (3) the notion of meaning association.

OpenFMNav: Towards Open-Set Zero-Shot Object Navigation via Vision-Language Foundation Models

no code yet • 16 Feb 2024

By leveraging the reasoning and generalizing abilities of foundation models, our method can understand free-form human instructions and perform effective open-set zero-shot navigation in diverse environments.