TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Zero-Shot Text Classification	This is not a Dataset	Flan-T5-xxl	Accuracy	66.1	# 1
Zero-Shot Text Classification	This is not a Dataset	Flan-T5-xxl	Coherence	0.9	# 1
Zero-Shot Text Classification	This is not a Dataset	Falcon40B-instruct	Accuracy	54.7	# 4
Zero-Shot Text Classification	This is not a Dataset	Falcon40B-instruct	Coherence	0.1	# 3
Zero-Shot Text Classification	This is not a Dataset	WizardLM 30B	Accuracy	57.3	# 3
Zero-Shot Text Classification	This is not a Dataset	WizardLM 30B	Coherence	0.0	# 4
Zero-Shot Text Classification	This is not a Dataset	Vicuna 13B v1.1	Accuracy	57.8	# 2
Zero-Shot Text Classification	This is not a Dataset	Vicuna 13B v1.1	Coherence	0.2	# 2
Zero-Shot Text Classification	This is not a Dataset	LLaMA 65B	Accuracy	50.3	# 5
Zero-Shot Text Classification	This is not a Dataset	LLaMA 65B	Coherence	0.0	# 4
Text Classification	This is not a Dataset	Flan-T5-xxl	Accuracy	94.1	# 2
Text Classification	This is not a Dataset	Flan-T5-xxl	Coherence	51.8	# 2
Text Classification	This is not a Dataset	Vicuna 13B v1.1	Accuracy	95.7	# 1
Text Classification	This is not a Dataset	Vicuna 13B v1.1	Coherence	81.2	# 1

This is not a Dataset: A Large Negation Benchmark to Challenge Large Language Models

24 Oct 2023 · Iker García-Ferrero, Begoña Altuna, Javier Álvez, Itziar Gonzalez-Dios, German Rigau

Although large language models (LLMs) have apparently acquired a certain level of grammatical knowledge and the ability to make generalizations, they fail to interpret negation, a crucial step in Natural Language Processing. We try to clarify the reasons for the sub-optimal performance of LLMs in understanding negation. We introduce a large, semi-automatically generated dataset of circa 400,000 descriptive sentences about commonsense knowledge that can be true or false, in which negation is present, in different forms, in about 2/3 of the corpus. We used our dataset with the largest available open LLMs in a zero-shot approach to gauge their generalization and inference capability, and we also fine-tuned some of the models to assess whether the understanding of negation can be trained. Our findings show that, while LLMs are proficient at classifying affirmative sentences, they struggle with negative sentences and lack a deep understanding of negation, often relying on superficial cues. Although fine-tuning the models on negative sentences improves their performance, the lack of generalization in handling negation persists, highlighting the ongoing challenges LLMs face in negation understanding and generalization. The dataset and code are publicly available.
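The contrast the abstract draws between affirmative and negative sentences can be illustrated with a minimal Python sketch that scores true/false predictions separately for each group. The record fields (`sentence`, `label`, `has_negation`), the toy sentences, and the function name `accuracy_by_negation` are assumptions made for illustration; they are not taken from the released dataset or code, and the sketch does not attempt to reproduce the Coherence metric reported on this page.

```python
from collections import defaultdict


def accuracy_by_negation(examples, predictions):
    """Return accuracy per group ('affirmative' or 'negated').

    `examples` is a list of dicts with hypothetical keys 'label' (bool)
    and 'has_negation' (bool); `predictions` is a parallel list of bools.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for example, predicted in zip(examples, predictions):
        group = "negated" if example["has_negation"] else "affirmative"
        total[group] += 1
        correct[group] += int(predicted == example["label"])
    return {group: correct[group] / total[group] for group in total}


# Toy sentences invented for this sketch (not drawn from the dataset).
examples = [
    {"sentence": "A dog is an animal.", "label": True, "has_negation": False},
    {"sentence": "A dog is not an animal.", "label": False, "has_negation": True},
    {"sentence": "A chair is a piece of furniture.", "label": True, "has_negation": False},
    {"sentence": "A chair is not a vehicle.", "label": True, "has_negation": True},
]

# A hypothetical model that answers "true" regardless of the input.
predictions = [True, True, True, True]

scores = accuracy_by_negation(examples, predictions)
print(scores)  # {'affirmative': 1.0, 'negated': 0.5}
```

In this toy case, a model that ignores negation scores perfectly on affirmative sentences and only at chance on negated ones, which is why an aggregate accuracy figure alone can hide the weakness.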

Code

hitz-zentroa/this-is-not-a-dataset official

Tasks

Descriptive

Negation

Text Classification

Zero-Shot Text Classification

Datasets

Introduced in the Paper:

This is not a Dataset

Results from the Paper

Ranked #1 on Zero-Shot Text Classification on This is not a Dataset
