CheGeKa is a Jeopardy!-like Russian QA dataset collected from the official Russian quiz database ChGK.
Motivation
The task is among the most challenging in terms of reasoning, knowledge, and logic. The QA pairs use a free response form (no answer choices), and the correct answer is reached through a long chain of causal relationships between facts and associations.
The original corpus of the CheGeKa game was introduced in Mikhalkova (2021).
An example in English for illustration purposes:
```
{
    'question_id': 3665,
    'question': 'THIS MAN replaced John Lennon when the Beatles got together for the last time.',
    'answer': 'Julian Lennon',
    'topic': 'The Liverpool Four',
    'author': 'Bayram Kuliyev',
    'tour_name': 'Jeopardy!. Ashgabat-1996',
    'tour_link': 'https://db.chgk.info/tour/ash96sv',
    'episode': [16],
    'perturbation': 'chegeka'
}
```
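Because answers are free-form, a record like the one above has to be turned into a prompt and the model output compared against the gold `answer` string. The following is a minimal sketch of that flow. The helper names `build_prompt`, `normalize`, and `exact_match`, the prompt layout, and the normalized string-equality rule are all assumptions made for illustration; they are not part of the dataset or its official evaluation.

```python
import string

# Example record copied from the illustration above.
record = {
    "question_id": 3665,
    "question": "THIS MAN replaced John Lennon when the Beatles got together for the last time.",
    "answer": "Julian Lennon",
    "topic": "The Liverpool Four",
    "author": "Bayram Kuliyev",
    "tour_name": "Jeopardy!. Ashgabat-1996",
    "tour_link": "https://db.chgk.info/tour/ash96sv",
    "episode": [16],
    "perturbation": "chegeka",
}


def build_prompt(rec: dict) -> str:
    """Hypothetical prompt layout: topic, then question, then an answer slot."""
    return f"Topic: {rec['topic']}\nQuestion: {rec['question']}\nAnswer:"


def normalize(text: str) -> str:
    """Lowercase, drop ASCII punctuation, and collapse runs of whitespace."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())


def exact_match(prediction: str, rec: dict) -> bool:
    """Assumed scoring rule: normalized equality with the gold answer."""
    return normalize(prediction) == normalize(rec["answer"])


if __name__ == "__main__":
    print(build_prompt(record))
    print(exact_match("  julian LENNON. ", record))
```

Normalizing before comparison keeps differences in casing, punctuation, and spacing from counting as errors. A token-overlap score could be used instead if partial credit is wanted.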
Data Fields
Data Splits
The dataset consists of a training set with labeled examples and a test set in two configurations:
Test Perturbations
Each training episode in the dataset corresponds to seven test variations: the original test data and six adversarial test sets, obtained by modifying the original test set with the following text perturbations: