Natural language inference is the task of determining whether a "hypothesis" is true (entailment), false (contradiction), or undetermined (neutral) given a "premise".
| Premise | Label | Hypothesis |
| --- | --- | --- |
| A man inspects the uniform of a figure in some East Asian country. | contradiction | The man is sleeping. |
| An older and younger man smiling. | neutral | Two men are smiling and laughing at the cats playing on the floor. |
| A soccer game with multiple males playing. | entailment | Some men are playing a sport. |
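The examples above follow the standard premise/hypothesis/label format used by NLI corpora such as SNLI. A minimal sketch of how such data might be represented and validated in Python (the field names and `validate` helper are illustrative, not any particular library's API):

```python
# Each NLI example pairs a premise with a hypothesis and one of three labels.
LABELS = {"entailment", "contradiction", "neutral"}

examples = [
    {"premise": "A man inspects the uniform of a figure in some East Asian country.",
     "hypothesis": "The man is sleeping.",
     "label": "contradiction"},
    {"premise": "An older and younger man smiling.",
     "hypothesis": "Two men are smiling and laughing at the cats playing on the floor.",
     "label": "neutral"},
    {"premise": "A soccer game with multiple males playing.",
     "hypothesis": "Some men are playing a sport.",
     "label": "entailment"},
]

def validate(example):
    """Check that an example has non-empty text fields and a valid label."""
    return (
        bool(example.get("premise"))
        and bool(example.get("hypothesis"))
        and example.get("label") in LABELS
    )

assert all(validate(e) for e in examples)
```

In practice an NLI model consumes the premise/hypothesis pair and predicts one of the three labels; this structure mirrors the rows in the table above.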
By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do.
In this work, we focus on the task of natural language inference (NLI) and address the following question: can we build NLI systems which produce labels with high accuracy, while also generating faithful explanations of their decisions?
This work explores the application of textual entailment in news claim verification and stance prediction using a new corpus in Arabic.
Comparative constructions pose a challenge in Natural Language Inference (NLI), which is the task of determining whether a text entails a hypothesis.
While deep learning models are making fast progress on the task of Natural Language Inference, recent studies have also shown that these models achieve high accuracy by exploiting several dataset biases, without a deep understanding of language semantics.
Recently, there has been much interest in the question of whether deep natural language understanding models exhibit systematicity: generalizing such that units like words make consistent contributions to the meaning of the sentences in which they appear.
To ensure a common evaluation scheme and promote models that generalize to different NLU tasks, the benchmark includes datasets from varying domains and applications.