Textual Entailment Recognition has been proposed as a generic task that captures major semantic inference needs across many NLP applications, such as Question Answering, Information Retrieval, Information Extraction, and Text Summarization. The task requires recognizing, given two text fragments, whether the meaning of one text is entailed by (can be inferred from) the other.
7 PAPERS • 1 BENCHMARK
Natural Language Inference (NLI), also called Textual Entailment, is an important task in NLP with the goal of determining the inference relationship between a premise p and a hypothesis h. It is a three-class problem, where each pair (p, h) is assigned to one of these classes: "ENTAILMENT" if the hypothesis can be inferred from the premise, "CONTRADICTION" if the hypothesis contradicts the premise, and "NEUTRAL" if neither of the above holds. There are large NLI datasets for English, such as SNLI, MNLI, and SciTail, but few for low-resource languages like Persian. Persian (Farsi) is a pluricentric language spoken by around 110 million people in countries including Iran, Afghanistan, and Tajikistan. FarsTail is the first relatively large-scale Persian dataset for the NLI task. A total of 10,367 samples were generated from a collection of 3,539 multiple-choice questions. The train, validation, and test portions include 7,266, 1,537, and 1,564 instances, respectively.
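The three-class setup described above can be sketched in a few lines. This is a minimal illustration of the task format only: the toy pairs and the stub classifier are invented for the example, not drawn from FarsTail or any trained model.

```python
# Minimal sketch of the three-class NLI task format (premise, hypothesis, label).
# The pairs below are toy illustrations; the classifier is a stub, not a model.

LABELS = ("ENTAILMENT", "CONTRADICTION", "NEUTRAL")

def evaluate(pairs, classify):
    """Accuracy of a classifier over (premise, hypothesis, gold_label) triples."""
    correct = sum(classify(p, h) == gold for p, h, gold in pairs)
    return correct / len(pairs)

# One toy example per class:
pairs = [
    ("A man is playing a guitar.", "A person is making music.", "ENTAILMENT"),
    ("A man is playing a guitar.", "Nobody is playing an instrument.", "CONTRADICTION"),
    ("A man is playing a guitar.", "The man is a professional musician.", "NEUTRAL"),
]

# A trivial majority-class stub gets exactly one of the three pairs right:
print(evaluate(pairs, lambda p, h: "ENTAILMENT"))
```

Benchmark evaluation on FarsTail and similar datasets follows this shape: a model replaces the stub, and accuracy over the labeled pairs is reported.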
6 PAPERS • 1 BENCHMARK
JGLUE (Japanese General Language Understanding Evaluation) is a benchmark built to measure general natural language understanding ability in Japanese.
6 PAPERS • NO BENCHMARKS YET
The Russian Commitment Bank is a corpus of naturally occurring discourses whose final sentence contains a clause-embedding predicate under an entailment cancelling operator (question, modal, negation, antecedent of conditional).
IMPPRES (IMPlicature and PRESupposition diagnostic dataset) consists of more than 25k semi-automatically generated sentence pairs illustrating well-studied pragmatic inference types.
5 PAPERS • NO BENCHMARKS YET
LiDiRus is a diagnostic dataset that covers a large volume of linguistic phenomena while allowing systems to be evaluated on a simple textual entailment recognition test.
5 PAPERS • 1 BENCHMARK
RuSentRel is a corpus of analytical articles in the domain of international politics, obtained from authoritative foreign sources and translated into Russian. The collected articles contain both the author's opinion on the subject matter and a large number of relations between the participants of the described situations. In total, 73 large analytical texts were labeled with about 2,000 relations.
BioNLI is a dataset for natural language inference in the biomedical domain. It contains abstracts from the biomedical literature and mechanistic premises generated with nine different strategies.
3 PAPERS • 1 BENCHMARK
IndoNLI is the first human-elicited NLI dataset for Indonesian consisting of nearly 18K sentence pairs annotated by crowd workers and experts.
3 PAPERS • NO BENCHMARKS YET
The dataset contains 3,304 cases from the Supreme Court of the United States from 1955 to 2021. Each case includes its identifiers as well as the facts of the case and the decision outcome. Related datasets rarely include the facts of the case, which can be helpful in natural language processing applications. One potential use case is predicting the outcome of a case from its facts.
Pars-ABSA is a manually annotated Persian dataset for aspect-based sentiment analysis, verified by three native Persian speakers. It consists of 5,114 positive, 3,061 negative, and 1,827 neutral samples from 5,602 unique reviews.
Hugging Face Datasets is a great library, but it lacks standardization, and datasets require preprocessing work before they can be used interchangeably. tasksource automates this preprocessing and facilitates scaling reproducible multi-task learning.
WiLI-2018 is a benchmark dataset for monolingual written natural language identification. It is a publicly available, free-of-charge dataset of short text extracts from Wikipedia, containing 1,000 paragraphs for each of 235 languages, totaling 235,000 paragraphs. WiLI is a classification dataset: given an unknown paragraph written in one dominant language, the task is to decide which language it is.
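The classification task WiLI poses can be sketched with a simple character n-gram baseline. This is an illustrative toy, not the dataset's reference method: the two-language training snippets are invented, and WiLI itself spans 235 languages.

```python
# Toy sketch of WiLI-style language identification: given a paragraph,
# predict its dominant language. Uses a character-trigram overlap baseline
# with invented two-language training data (illustration only).

from collections import Counter

def profile(text, n=3):
    """Character n-gram frequency profile of a text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def identify(paragraph, training_texts):
    """Pick the language whose training profile overlaps the paragraph most."""
    para = profile(paragraph)
    def overlap(lang):
        return sum((para & profile(training_texts[lang])).values())
    return max(training_texts, key=overlap)

training = {
    "eng": "the quick brown fox jumps over the lazy dog and the cat",
    "deu": "der schnelle braune fuchs springt ueber den faulen hund und die katze",
}
print(identify("the dog and the fox", training))  # → eng
```

Real systems trained on WiLI replace the trigram-overlap heuristic with a learned classifier, but the input/output contract (paragraph in, language label out) is the same.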
XWINO is a multilingual collection of Winograd Schemas in six languages that can be used for evaluation of cross-lingual commonsense reasoning capabilities.
esXNLI is a bilingual NLI dataset. It comprises 2,490 examples from 5 different genres that were originally annotated in Spanish, and translated into English by professional translators. It serves as a counterpoint to XNLI, which was originally annotated in English and translated into 14 other languages, including Spanish. The dataset was conceived to be used in conjunction with the XNLI development set to analyse the effect of translation in cross-lingual transfer learning.
Natural Language Inference processes pairs of sentences to extract their semantic relations. Sentence pairs are annotated with one of three classes (Entailment, Contradiction, Neutral).
2 PAPERS • NO BENCHMARKS YET
The Japanese Adversarial NLI (JaNLI) dataset is designed to require understanding of Japanese linguistic phenomena and illuminate the vulnerabilities of models. Please see the paper Assessing the Generalization Capacity of Pre-trained Language Models through Japanese Adversarial Natural Language Inference for details.
We generate epistemic reasoning problems using modal logic to target theory of mind (ToM) in natural language processing models.
2 PAPERS • 1 BENCHMARK
Natural Language Inference in Turkish (NLI-TR) provides translations of two large English NLI datasets into Turkish; a team of experts validated their translation quality and fidelity to the original labels.
NewsPH-NLI is a sentence entailment benchmark dataset in the low-resource Filipino language.
The CANDOR corpus is a large, novel, multimodal corpus of 1,656 recorded conversations in spoken English. This 7+ million word, 850-hour corpus totals over 1 TB of audio, video, and transcripts, with moment-to-moment measures of vocal, facial, and semantic expression, along with an extensive survey of post-conversation reflections from speakers.
1 PAPER • NO BENCHMARKS YET
DistNLI is a synthesized benchmark for probing neural network models on how conjunctions interact with distributivity in the NLI task in American English. It consists of minimal premise-hypothesis pairs that differ in conjunction structure and in distributivity-related linguistic phenomena. The dataset currently comprises 328 sentences (164 for distributive and 164 for ambiguous predicates), annotated by four proficient English speakers with backgrounds in NLP and linguistics. Due to the specificity of the linguistic phenomenon involved and the dataset's size, DistNLI should be used only as an adversarial dataset for investigating the distributivity of verb predication.
This is a set of debiased Natural Language Inference (NLI) datasets produced by the paper Generating Data to Mitigate Spurious Correlations in Natural Language Inference Datasets. The datasets are constructed by augmenting SNLI or MNLI with data samples that are generated to mitigate the spurious correlations in the original datasets. Please visit this repository for more details.
GQNLI-FR is a manually translated French version of the GQNLI challenge dataset, originally written in English.
The Gigaword Entailment dataset is a dataset for entailment prediction between an article and its headline. It is built from the Gigaword dataset.
HANS (Heuristic Analysis for NLI Systems) is a dataset containing many examples on which common syntactic heuristics adopted by NLI models fail.
1 PAPER • 1 BENCHMARK
The NLI4Wills corpus can be used to train transformer and sentence-transformer models for evaluating the validity of legal will statements. The dataset consists of ID numbers, three types of inputs (legal will statements, laws, and conditions), and classifications (support, refute, or unrelated).
This dataset tests the capabilities of language models to correctly capture the meaning of words denoting probabilities (WEP, also called verbal probabilities), e.g. words like "probably", "maybe", "surely", "impossible".
PropSegmEnt is a corpus of over 35K propositions annotated by expert human raters. The dataset structure resembles the tasks of (1) segmenting sentences within a document to the set of propositions, and (2) classifying the entailment relation of each proposition with respect to a different yet topically-aligned document, i.e. documents describing the same event or entity.
RTE3-FR dataset is the French translation of the Textual Entailment English dataset used in the RTE-3 Challenge (https://nlp.stanford.edu/RTE3-pilot).