COSTRA 1.0

Introduced by Barancikova et al. in COSTRA 1.0: A Dataset of Complex Sentence Transformations

COSTRA 1.0 is a dataset of complex sentence transformations, intended for the study of sentence-level embeddings beyond simple word alternations or standard paraphrasing. The first version is limited to sentences in Czech, but the construction method is universal and the authors plan to apply it to other languages as well. The dataset consists of 4,262 unique sentences with an average length of 10 words, illustrating 15 types of modifications such as simplification, generalization, or formal and informal language variation.

Source: COSTRA 1.0: A Dataset of Complex Sentence Transformations

Similar Datasets

SentEval

License

Unknown

Modalities

Texts

Languages

Czech
