Winograd Automatic (Winograd)

Introduced by Taktasheva et al. in TAPE: Assessing Few-shot Russian Language Understanding

The Winograd schema challenge comprises tasks with syntactic ambiguity that can be resolved with logic and reasoning.

Motivation

The dataset presents an extended version of the traditional Winograd schema challenge (Levesque et al., 2012): each sentence contains unresolved homonymy, which can be resolved with commonsense knowledge and reasoning. The Winograd schema can be extended with real-life sentences filtered from the National Corpora using a set of 11 syntactic queries, which extract sentences like "Katya asked Masha if she..." (two possible referents for a pronoun) and "A change of scenery that..." (a noun phrase and a subordinate clause with "that" in the same gender and number). The extraction pipeline can be adapted to other languages, depending on the set of ambiguous syntactic constructions each language allows.

An example in English for illustration purposes:

{
    'text': 'But then I was glad, because in the end the singer from Turkey who performed something national, although in a modern version, won.',
    'answer': 'singer',
    'label': 1,
    'options': ['singer', 'Turkey'],
    'reference': 'who',
    'homonymia_type': 1.1,
    'episode': [15],
    'perturbation': 'winograd'
}

Data Fields

  • text: a string containing the sentence text
  • answer: a string with a candidate for the coreference resolution
  • options: a list of all the possible candidates present in the text
  • reference: a string containing an anaphor (a word or phrase that refers back to an earlier word or phrase)
  • homonymia_type: a float corresponding to the type of the structure with syntactic homonymy
  • label: an integer, either 0 or 1, indicating whether the homonymy is resolved correctly or not
  • perturbation: a string containing the name of the perturbation applied to the text. If no perturbation was applied, the dataset name is used
  • episode: a list of episodes in which the instance is used. This field is used only in the training set
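The field list above can be sketched as a typed record. This is a minimal illustration, not part of the dataset's tooling: the names `WinogradInstance` and `validate` are hypothetical, and the check that `answer` appears in `options` is an assumption inferred from the field descriptions rather than a documented guarantee.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WinogradInstance:
    """One record, mirroring the data fields described above (hypothetical class)."""
    text: str              # the sentence text
    answer: str            # candidate for the coreference resolution
    options: List[str]     # all possible candidates present in the text
    reference: str         # the anaphor
    homonymia_type: float  # type of the structure with syntactic homonymy
    label: int             # 1 if the homonymy is resolved correctly, else 0
    perturbation: str      # perturbation name, or the dataset name if none
    episode: List[int]     # episodes that use this instance (train set only)


def validate(inst: WinogradInstance) -> None:
    """Raise ValueError if the record contradicts the field descriptions."""
    if inst.label not in (0, 1):
        raise ValueError("label must be 0 or 1")
    # Assumption: the candidate answer is always one of the listed options.
    if inst.answer not in inst.options:
        raise ValueError("answer must be one of the options")


example = WinogradInstance(
    text=("But then I was glad, because in the end the singer from Turkey "
          "who performed something national, although in a modern version, won."),
    answer="singer",
    options=["singer", "Turkey"],
    reference="who",
    homonymia_type=1.1,
    label=1,
    perturbation="winograd",
    episode=[15],
)
validate(example)  # passes silently for the illustrative record
```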

Data Splits

The dataset consists of a training set with labeled examples and a test set in two configurations:

  • raw data: includes the original data with no additional sampling
  • episodes: the data is split into evaluation episodes and includes several perturbations of the test set for robustness evaluation

The train and test sets are disjoint with respect to (sentence, candidate answer) pairs, but they may overlap in individual sentences and homonymy types.
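The disjointness property can be illustrated with a small check. This is a sketch on toy records, assuming each record is a dict with `text` and `answer` keys as in the data fields; the helper names `pair_overlap` and `sentence_overlap` are hypothetical.

```python
def pair_overlap(train, test):
    """Return the (text, answer) pairs that occur in both splits."""
    train_pairs = {(r["text"], r["answer"]) for r in train}
    test_pairs = {(r["text"], r["answer"]) for r in test}
    return train_pairs & test_pairs


def sentence_overlap(train, test):
    """Return the sentences that occur in both splits."""
    return {r["text"] for r in train} & {r["text"] for r in test}


# Toy data: the same sentence appears in both splits,
# but with a different candidate answer in each.
sentence = "Katya asked Masha if she was ready."
toy_train = [{"text": sentence, "answer": "Katya"}]
toy_test = [{"text": sentence, "answer": "Masha"}]

shared_pairs = pair_overlap(toy_train, toy_test)
shared_sentences = sentence_overlap(toy_train, toy_test)
```

Here `shared_pairs` is empty while `shared_sentences` contains the one sentence, which is the situation the paragraph above allows.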

Test Perturbations

Each training episode in the dataset corresponds to six test variations: the original test data and five adversarial test sets, obtained by modifying the original test data with the following text perturbations:

  • ButterFingers: randomly adds noise to data by mimicking spelling mistakes made by humans through character swaps based on their keyboard distance
  • Emojify: replaces the input words with the corresponding emojis, preserving their original meaning
  • EDAdelete: randomly deletes tokens in the text
  • EDAswap: randomly swaps tokens in the text
  • AddSent: generates extra words or a sentence at the end of the text
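To make the two EDA perturbations above concrete, here is a minimal sketch of random token deletion and random token swapping. It is not the implementation used to build the dataset: whitespace tokenization, the deletion probability `p`, the number of swaps `n_swaps`, and the function names are all assumptions chosen for illustration.

```python
import random


def eda_delete(text: str, p: float = 0.1, seed: int = 0) -> str:
    """Drop each whitespace token independently with probability p."""
    rng = random.Random(seed)
    tokens = text.split()
    if len(tokens) <= 1:
        return text
    kept = [t for t in tokens if rng.random() >= p]
    if not kept:
        # Keep at least one token so the output is never empty.
        kept = [rng.choice(tokens)]
    return " ".join(kept)


def eda_swap(text: str, n_swaps: int = 1, seed: int = 0) -> str:
    """Swap the tokens at two distinct random positions, n_swaps times."""
    rng = random.Random(seed)
    tokens = text.split()
    if len(tokens) < 2:
        return text
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return " ".join(tokens)


sentence = "Katya asked Masha if she was ready"
deleted = eda_delete(sentence, p=0.3, seed=1)
swapped = eda_swap(sentence, n_swaps=1, seed=1)
```

Fixing the seed keeps the perturbed test set reproducible across runs, which matters when the same perturbed data is used to compare several models.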
