Search Results for author: Milan Straka

Found 45 papers, 18 papers with code

ÚFAL LatinPipe at EvaLatin 2024: Morphosyntactic Analysis of Latin

1 code implementation • 8 Apr 2024 • Milan Straka, Jana Straková, Federica Gamba

Our system consists of a fine-tuned concatenation of base and large pre-trained LMs, with a dot-product attention head for parsing and softmax classification heads for morphology to jointly learn both dependency parsing and morphological analysis.

Dependency Parsing Morphological Analysis

Paper
Code

Practical End-to-End Optical Music Recognition for Pianoform Music

1 code implementation • 20 Mar 2024 • Jiří Mayer, Milan Straka, Jan Hajič jr., Pavel Pecina

(c) We train and fine-tune an end-to-end model to serve as a baseline on the dataset and employ the TEDn metric to evaluate the model.

Benchmarking

Paper
Code

ÚFAL CorPipe at CRAC 2023: Larger Context Improves Multilingual Coreference Resolution

1 code implementation • 24 Nov 2023 • Milan Straka

We present CorPipe, the winning entry to the CRAC 2023 Shared Task on Multilingual Coreference Resolution.

coreference-resolution Language Modelling

Paper
Code

DaMuEL: A Large Multilingual Dataset for Entity Linking

no code implementations • 15 Jun 2023 • David Kubeša, Milan Straka

The dataset contains 27.9M named entities in the knowledge base and 12.3G tokens from Wikipedia texts.

Entity Linking

Paper
Add Code

Quality and Efficiency of Manual Annotation: Pre-annotation Bias

no code implementations • LREC 2022 • Marie Mikulová, Milan Straka, Jan Štěpánek, Barbora Štěpánková, Jan Hajič

This paper presents an analysis of annotation using an automatic pre-annotation for a mid-level annotation complexity task: dependency syntax annotation.

Paper
Add Code

ÚFAL CorPipe at CRAC 2022: Effectivity of Multilingual Models for Coreference Resolution

1 code implementation • CRAC (ACL) 2022 • Milan Straka, Jana Straková

We describe the winning submission to the CRAC 2022 Shared Task on Multilingual Coreference Resolution.

coreference-resolution

Paper
Code

Czech Grammar Error Correction with a Large and Diverse Corpus

no code implementations • 14 Jan 2022 • Jakub Náplava, Milan Straka, Jana Straková, Alexandr Rosen

We introduce a large and diverse Czech corpus annotated for grammatical error correction (GEC) with the aim to contribute to the still scarce data resources in this domain for languages other than English.

Grammatical Error Correction

Paper
Add Code

Character Transformations for Non-Autoregressive GEC Tagging

1 code implementation • WNUT (ACL) 2021 • Milan Straka, Jakub Náplava, Jana Straková

We propose a character-based non-autoregressive GEC approach, with automatically generated character transformations.

Paper
Code

ÚFAL at MultiLexNorm 2021: Improving Multilingual Lexical Normalization by Fine-tuning ByT5

1 code implementation • WNUT (ACL) 2021 • David Samuel, Milan Straka

We present the winning entry to the Multilingual Lexical Normalization (MultiLexNorm) shared task at W-NUT 2021 (van der Goot et al., 2021a), which evaluates lexical-normalization systems on 12 social media datasets in 11 languages.

Dependency Parsing Language Modelling +1

Paper
Code

Understanding Model Robustness to User-generated Noisy Texts

1 code implementation • WNUT (ACL) 2021 • Jakub Náplava, Martin Popel, Milan Straka, Jana Straková

We also compare two approaches to address the performance drop: a) training the NLP models with noised data generated by our framework; and b) reducing the input noise with an external system for natural language correction.

Grammatical Error Correction Machine Translation +5

Paper
Code

RobeCzech: Czech RoBERTa, a monolingual contextualized language representation model

no code implementations • 24 May 2021 • Milan Straka, Jakub Náplava, Jana Straková, David Samuel

We present RobeCzech, a monolingual RoBERTa language representation model trained on Czech data.

Ranked #1 on Semantic Parsing on PTG (czech, MRP 2020)

Semantic Parsing

Paper
Add Code

Diacritics Restoration using BERT with Analysis on Czech language

1 code implementation • 24 May 2021 • Jakub Náplava, Milan Straka, Jana Straková

We propose a new architecture for diacritics restoration based on contextualized embeddings, namely BERT, and we evaluate it on 12 languages with diacritics.

Ranked #1 on Czech Text Diacritization on Multilingual Dataset for Training and Evaluating Diacritics Restoration Systems

Croatian Text Diacritization Czech Text Diacritization +10

Paper
Code

A matrix approach to detect temporal behavioral patterns at electric vehicle charging stations

no code implementations • 18 Feb 2021 • Milan Straka, Lucia Piatriková, Peter van Bokhoven, Ľuboš Buzna

Based on the electric vehicle (EV) arrival times and the duration of EV connection to the charging station, we identify charging patterns and derive groups of charging stations with similar charging patterns applying two approaches.

Clustering

Paper
Add Code

ÚFAL at MRP 2020: Permutation-invariant Semantic Parsing in PERIN

2 code implementations • 2 Nov 2020 • David Samuel, Milan Straka

PERIN was one of the winners of the shared task.

Ranked #1 on Semantic Parsing on DRG (english, MRP 2020)

Semantic Parsing Sentence

Paper
Code

ÚFAL at MRP 2020: Permutation-invariant Semantic Parsing in PERIN

1 code implementation • CONLL 2020 • David Samuel, Milan Straka

PERIN was one of the winners of the shared task.

Semantic Parsing Sentence

Paper
Code

Reading Comprehension in Czech via Machine Translation and Cross-lingual Transfer

no code implementations • 3 Jul 2020 • Kateřina Macková, Milan Straka

We report that an XLM-RoBERTa model trained on English data and evaluated on Czech achieves very competitive performance, only approximately 2 percentage points worse than a model trained on the translated Czech data.

Cross-Lingual Transfer Machine Translation +2

Paper
Add Code

Prague Dependency Treebank -- Consolidated 1.0

no code implementations • 5 Jun 2020 • Jan Hajič, Eduard Bejček, Jaroslava Hlaváčová, Marie Mikulová, Milan Straka, Jan Štěpánek, Barbora Štěpánková

We present a richly annotated and genre-diversified language resource, the Prague Dependency Treebank-Consolidated 1.0 (PDT-C 1.0), the purpose of which is, as has always been the case for the family of the Prague Dependency Treebanks, to serve both as training data for various types of NLP tasks and as a resource for linguistically oriented research.

Translation

Paper
Add Code

UDPipe at EvaLatin 2020: Contextualized Embeddings and Treebank Embeddings

no code implementations • LREC 2020 • Milan Straka, Jana Straková

We present our contribution to the EvaLatin shared task, which is the first evaluation campaign devoted to the evaluation of NLP tools for Latin.

Lemmatization POS +1

Paper
Add Code

Explaining the distribution of energy consumption at slow charging infrastructure for electric vehicles from socio-economic data

no code implementations • 2 Jun 2020 • Milan Straka, Rui Carvalho, Gijs van der Poel, Ľuboš Buzna

We identified the most influential features correlated with energy consumption, indicating that the spatial context of the charging infrastructure affects its utilization pattern.

regression Variable Selection

Paper
Add Code

Prague Dependency Treebank - Consolidated 1.0

no code implementations • LREC 2020 • Jan Hajič, Eduard Bejček, Jaroslava Hlaváčová, Marie Mikulová, Milan Straka, Jan Štěpánek, Barbora Štěpánková

Translation

Paper
Add Code

MRP 2019: Cross-Framework Meaning Representation Parsing

no code implementations • CONLL 2019 • Stephan Oepen, Omri Abend, Jan Hajic, Daniel Hershcovich, Marco Kuhlmann, Tim O'Gorman, Nianwen Xue, Jayeol Chun, Milan Straka, Zdenka Uresova

The 2019 Shared Task at the Conference for Computational Language Learning (CoNLL) was devoted to Meaning Representation Parsing (MRP) across frameworks.

Sentence

Paper
Add Code

ÚFAL MRPipe at MRP 2019: UDPipe Goes Semantic in the Meaning Representation Parsing Shared Task

1 code implementation • CONLL 2019 • Milan Straka, Jana Straková

We present a system description of our contribution to the CoNLL 2019 shared task, Cross-Framework Meaning Representation Parsing (MRP 2019).

Dependency Parsing Lemmatization +3

Paper
Code

ÚFAL MRPipe at MRP 2019: UDPipe Goes Semantic in the Meaning Representation Parsing Shared Task

1 code implementation • 24 Oct 2019 • Milan Straka, Jana Straková

We present a system description of our contribution to the CoNLL 2019 shared task, Cross-Framework Meaning Representation Parsing (MRP 2019).

Dependency Parsing Lemmatization +3

Paper
Code

Predicting popularity of EV charging infrastructure from GIS data

no code implementations • 6 Oct 2019 • Milan Straka, Pasquale De Falco, Gabriella Ferruzzi, Daniela Proto, Gijs van der Poel, Shahab Khormali, Ľuboš Buzna

The availability of charging infrastructure is essential for large-scale adoption of electric vehicles (EV).

Binary Classification regression

Paper
Add Code

Grammatical Error Correction in Low-Resource Scenarios

1 code implementation • WS 2019 • Jakub Náplava, Milan Straka

Grammatical error correction in English is a long-studied problem with many existing systems and datasets.

Ranked #2 on Grammatical Error Correction on Falko-MERLIN (using extra training data)

Grammatical Error Correction Machine Translation +1

Paper
Code

CUNI System for the Building Educational Applications 2019 Shared Task: Grammatical Error Correction

no code implementations • WS 2019 • Jakub Náplava, Milan Straka

In this paper, we describe our systems submitted to the Building Educational Applications (BEA) 2019 Shared Task (Bryant et al., 2019).

Grammatical Error Correction NMT

Paper
Add Code

Czech Text Processing with Contextual Embeddings: POS Tagging, Lemmatization, Parsing and NER

no code implementations • 8 Sep 2019 • Milan Straka, Jana Straková, Jan Hajič

We evaluate two methods for precomputing such embeddings, BERT and Flair, on four Czech text processing tasks: part-of-speech (POS) tagging, lemmatization, dependency parsing and named entity recognition (NER).

Dependency Parsing Lemmatization +6

Paper
Add Code

Evaluating Contextualized Embeddings on 54 Languages in POS Tagging, Lemmatization and Dependency Parsing

no code implementations • 20 Aug 2019 • Milan Straka, Jana Straková, Jan Hajič

We present an extensive evaluation of three recently proposed methods for contextualized embeddings on 89 corpora in 54 languages of the Universal Dependencies 2.3 in three tasks: POS tagging, lemmatization, and dependency parsing.

Ranked #1 on Dependency Parsing on Universal Dependencies

Dependency Parsing Lemmatization +3

Paper
Add Code

UDPipe at SIGMORPHON 2019: Contextualized Embeddings, Regularization with Morphological Categories, Corpora Merging

no code implementations • WS 2019 • Milan Straka, Jana Straková, Jan Hajič

In the morphological analysis, our system placed a close second: our morphological analysis accuracy was 93.19, the winning system's 93.23.

Lemmatization Morphological Analysis +1

Paper
Add Code

Neural Architectures for Nested NER through Linearization

1 code implementation • ACL 2019 • Jana Straková, Milan Straka, Jan Hajič

We propose two neural network architectures for nested named entity recognition (NER), a setting in which named entities may overlap and also be labeled with more than one label.

Ranked #3 on Nested Mention Recognition on ACE 2005

Hard Attention named-entity-recognition +4

Paper
Code

75 Languages, 1 Model: Parsing Universal Dependencies Universally

3 code implementations • IJCNLP 2019 • Dan Kondratyuk, Milan Straka

We present UDify, a multilingual multi-task model capable of accurately predicting universal part-of-speech, morphological features, lemmas, and dependency trees simultaneously for all 124 Universal Dependencies treebanks across 75 languages.

Ranked #2 on Dependency Parsing on French GSD

Dependency Parsing Zero-Shot Learning

Paper
Code

LemmaTag: Jointly Tagging and Lemmatizing for Morphologically Rich Languages with BRNNs

1 code implementation • EMNLP 2018 • Daniel Kondratyuk, Tomáš Gavenčiak, Milan Straka, Jan Hajič

We present LemmaTag, a featureless neural network architecture that jointly generates part-of-speech tags and lemmas for sentences by using bidirectional RNNs with character-level and word-level embeddings.

Lemmatization Machine Translation +4

Paper
Code

UDPipe 2.0 Prototype at CoNLL 2018 UD Shared Task

no code implementations • CONLL 2018 • Milan Straka

UDPipe is a trainable pipeline which performs sentence segmentation, tokenization, POS tagging, lemmatization and dependency parsing.

Ranked #6 on Dependency Parsing on Universal Dependencies

Dependency Parsing Lemmatization +5

Paper
Add Code

CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

no code implementations • CONLL 2018 • Daniel Zeman, Jan Hajič, Martin Popel, Martin Potthast, Milan Straka, Filip Ginter, Joakim Nivre, Slav Petrov

Every year, the Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets.

Dependency Parsing Morphological Analysis +1

Paper
Add Code

LemmaTag: Jointly Tagging and Lemmatizing for Morphologically-Rich Languages with BRNNs

2 code implementations • 10 Aug 2018 • Daniel Kondratyuk, Tomáš Gavenčiak, Milan Straka, Jan Hajič

Lemmatization Part-Of-Speech Tagging +1

Paper
Code

Using Adversarial Examples in Natural Language Processing

no code implementations • LREC 2018 • Petr Bělohlávek, Ondřej Plátek, Zdeněk Žabokrtský, Milan Straka

Image Classification

Paper
Add Code

Diacritics Restoration Using Neural Networks

1 code implementation • LREC 2018 • Jakub Náplava, Milan Straka, Pavel Straňák, Jan Hajič

Ranked #2 on Czech Text Diacritization on Multilingual Dataset for Training and Evaluating Diacritics Restoration Systems

Croatian Text Diacritization Czech Text Diacritization +10

Paper
Code

SumeCzech: Large Czech News-Based Summarization Dataset

no code implementations • LREC 2018 • Milan Straka, Nikita Mediankin, Tom Kocmi, Zdeněk Žabokrtský, Vojtěch Hudeček, Jan Hajič

Document Summarization Machine Translation +1

Paper
Add Code

CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

no code implementations • CONLL 2017 • Daniel Zeman, Martin Popel, Milan Straka, Jan Hajič, Joakim Nivre, Filip Ginter, Juhani Luotolahti, Sampo Pyysalo, Slav Petrov, Martin Potthast, Francis Tyers, Elena Badmaeva, Memduh Gokirmak, Anna Nedoluzhko, Silvie Cinková, Jan Hajič jr., Jaroslava Hlaváčová, Václava Kettnerová, Zdeňka Urešová, Jenna Kanerva, Stina Ojala, Anna Missilä, Christopher D. Manning, Sebastian Schuster, Siva Reddy, Dima Taji, Nizar Habash, Herman Leung, Marie-Catherine de Marneffe, Manuela Sanguinetti, Maria Simi, Hiroshi Kanayama, Valeria de Paiva, Kira Droganova, Héctor Martínez Alonso, Çağrı Çöltekin, Umut Sulubacak, Hans Uszkoreit, Vivien Macketanz, Aljoscha Burchardt, Kim Harris, Katrin Marheinecke, Georg Rehm, Tolga Kayadelen, Mohammed Attia, Ali Elkahky, Zhuoran Yu, Emily Pitler, Saran Lertpradit, Michael Mandl, Jesse Kirchner, Hector Fernandez Alcalde, Jana Strnadová, Esha Banerjee, Ruli Manurung, Antonio Stella, Atsuko Shimada, Sookyoung Kwak, Gustavo Mendonça, Tatiana Lando, Rattima Nitisaroj, Josie Li

The Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets.

Dependency Parsing

Paper
Add Code

Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe

no code implementations • CONLL 2017 • Milan Straka, Jana Straková

A multilingual pipeline performing these steps can be trained using the Universal Dependencies project, which contains annotations of the described tasks for 50 languages in the latest release UD 2.0.

Dependency Parsing Lemmatization +2

Paper
Add Code

Neural Networks for Multi-Word Expression Detection

no code implementations • WS 2017 • Natalia Klyueva, Antoine Doucet, Milan Straka

In this paper we describe the MUMULS system that participated in the 2017 shared task on automatic identification of verbal multiword expressions (VMWEs).

Machine Translation

Paper
Add Code

UDPipe: Trainable Pipeline for Processing CoNLL-U Files Performing Tokenization, Morphological Analysis, POS Tagging and Parsing

no code implementations • LREC 2016 • Milan Straka, Jan Hajič, Jana Straková

Automatic natural language processing of large texts often presents recurring challenges in multiple languages: even for most advanced tasks, the texts are first processed by basic processing steps, from tokenization to parsing.

Dependency Parsing Lemmatization +4

Paper
Add Code

Merging Data Resources for Inflectional and Derivational Morphology in Czech

no code implementations • LREC 2016 • Zdeněk Žabokrtský, Magda Ševčíková, Milan Straka, Jonáš Vidra, Adéla Limburská

The paper deals with merging two complementary resources of morphological data previously existing for Czech, namely the inflectional dictionary MorfFlex CZ and the recently developed lexical network DeriNet.

Lemmatization Morphological Analysis

Paper
Add Code

Open-Source Tools for Morphology, Lemmatization, POS Tagging and Named Entity Recognition

no code implementations • ACL 2014 • Jana Straková, Milan Straka, Jan Hajič

Lemmatization Morphological Analysis +6

Paper
Add Code

Stop-probability estimates computed on a large corpus improve Unsupervised Dependency Parsing

no code implementations • ACL 2013 • David Mareček, Milan Straka

Unsupervised Dependency Parsing

Paper
Add Code
