no code implementations • NAACL (SIGMORPHON) 2022 • Niyati Bafna, Zdeněk Žabokrtský
Word embeddings are becoming a crucial resource in NLP for any language.
no code implementations • LREC 2022 • Lukáš Kyjánek, Olga Lyashevskaya, Anna Nedoluzhko, Daniil Vodolazsky, Zdeněk Žabokrtský
Therefore, we devote this paper to improving one of the methods for constructing such resources and to applying it to a Russian lexicon, which results in the largest lexical resource of Russian derivational relations.
no code implementations • LREC 2022 • Anna Nedoluzhko, Michal Novák, Martin Popel, Zdeněk Žabokrtský, Amir Zeldes, Daniel Zeman
Recent advances in standardization for annotated language resources have led to successful large scale efforts, such as the Universal Dependencies (UD) project for multilingual syntactically annotated data.
no code implementations • Findings (EMNLP) 2021 • Martin Popel, Zdeněk Žabokrtský, Anna Nedoluzhko, Michal Novák, Daniel Zeman
One can find dozens of data resources for various languages in which coreference (a relation between two or more expressions that refer to the same real-world entity) is manually annotated.
no code implementations • CL (ACL) 2020 • Zdeněk Žabokrtský, Daniel Zeman, Magda Ševčíková
This article gives an overview of how sentence meaning is represented in eleven deep-syntactic frameworks, ranging from those based on linguistic theories elaborated for decades to rather lightweight NLP-motivated approaches.
no code implementations • LREC 2022 • Zdeněk Žabokrtský, Niyati Bafna, Jan Bodnár, Lukáš Kyjánek, Emil Svoboda, Magda Ševčíková, Jonáš Vidra
Our work aims at developing a multilingual data resource for morphological segmentation.
1 code implementation • CRAC (ACL) 2022 • Zdeněk Žabokrtský, Miloslav Konopík, Anna Nedoluzhko, Michal Novák, Maciej Ogrodniczuk, Martin Popel, Ondřej Pražák, Jakub Sido, Daniel Zeman, Yilun Zhu
The public edition of CorefUD 1.0, which contains 13 datasets for 10 languages, was used as the source of training and evaluation data.
1 code implementation • NAACL (SIGMORPHON) 2022 • Khuyagbaatar Batsuren, Gábor Bella, Aryaman Arora, Viktor Martinović, Kyle Gorman, Zdeněk Žabokrtský, Amarsanaa Ganbold, Šárka Dohnalová, Magda Ševčíková, Kateřina Pelegrinová, Fausto Giunchiglia, Ryan Cotterell, Ekaterina Vylomova
The SIGMORPHON 2022 shared task on morpheme segmentation challenged systems to decompose a word into a sequence of morphemes and covered most types of morphology: compounds, derivations, and inflections.
Ranked #8 on Morpheme Segmentation on UniMorph 4.0
1 code implementation • 22 Aug 2019 • Rudolf Rosa, Zdeněk Žabokrtský
We focus on the task of unsupervised lemmatization, i.e., grouping together inflected forms of one word under one label (a lemma) without the use of annotated training data.
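The following example makes the task definition above concrete with a deliberately naive baseline. It is not the method of Rosa and Žabokrtský. The example assumes that forms sharing their first `min_stem` characters belong to one lemma, and it labels each group with its shortest member. The function name `prefix_lemmatize` and the `min_stem` parameter are hypothetical and exist only for this illustration.

```python
from collections import defaultdict


def prefix_lemmatize(forms, min_stem=4):
    """Toy unsupervised lemmatizer (illustrative assumption only).

    Groups word forms by their first `min_stem` lowercase characters,
    then labels every form in a group with the group's shortest member
    (ties broken alphabetically) as a pseudo-lemma. No annotated data
    is used.
    """
    groups = defaultdict(list)
    for form in forms:
        groups[form.lower()[:min_stem]].append(form)

    labels = {}
    for members in groups.values():
        # The shortest form serves as the pseudo-lemma for the cluster.
        lemma = min(members, key=lambda w: (len(w), w))
        for form in members:
            labels[form] = lemma
    return labels


forms = ["walk", "walks", "walked", "walking", "talk", "talks", "talked"]
print(prefix_lemmatize(forms))
```

A fixed-prefix heuristic like this fails on suppletive or stem-changing forms (e.g., "go"/"went") and over-merges unrelated words that share a prefix. Those cases are what make unsupervised lemmatization a research problem rather than a string-matching exercise.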