Search Results for author: Milan Straka

Found 45 papers, 18 papers with code

ÚFAL LatinPipe at EvaLatin 2024: Morphosyntactic Analysis of Latin

1 code implementation 8 Apr 2024 Milan Straka, Jana Straková, Federica Gamba

Our system consists of a fine-tuned concatenation of base and large pre-trained LMs, with a dot-product attention head for parsing and softmax classification heads for morphology to jointly learn both dependency parsing and morphological analysis.

Dependency Parsing Morphological Analysis

Practical End-to-End Optical Music Recognition for Pianoform Music

1 code implementation 20 Mar 2024 Jiří Mayer, Milan Straka, Jan Hajič jr., Pavel Pecina

(c) We train and fine-tune an end-to-end model to serve as a baseline on the dataset and employ the TEDn metric to evaluate the model.

Benchmarking

ÚFAL CorPipe at CRAC 2023: Larger Context Improves Multilingual Coreference Resolution

1 code implementation 24 Nov 2023 Milan Straka

We present CorPipe, the winning entry to the CRAC 2023 Shared Task on Multilingual Coreference Resolution.

coreference-resolution Language Modelling

DaMuEL: A Large Multilingual Dataset for Entity Linking

no code implementations 15 Jun 2023 David Kubeša, Milan Straka

The dataset contains 27.9M named entities in the knowledge base and 12.3G tokens from Wikipedia texts.

Entity Linking

Quality and Efficiency of Manual Annotation: Pre-annotation Bias

no code implementations LREC 2022 Marie Mikulová, Milan Straka, Jan Štěpánek, Barbora Štěpánková, Jan Hajič

This paper presents an analysis of annotation using an automatic pre-annotation for a mid-level annotation complexity task -- dependency syntax annotation.

Czech Grammar Error Correction with a Large and Diverse Corpus

no code implementations 14 Jan 2022 Jakub Náplava, Milan Straka, Jana Straková, Alexandr Rosen

We introduce a large and diverse Czech corpus annotated for grammatical error correction (GEC) with the aim to contribute to the still scarce data resources in this domain for languages other than English.

Grammatical Error Correction

Character Transformations for Non-Autoregressive GEC Tagging

1 code implementation WNUT (ACL) 2021 Milan Straka, Jakub Náplava, Jana Straková

We propose a character-based nonautoregressive GEC approach, with automatically generated character transformations.

ÚFAL at MultiLexNorm 2021: Improving Multilingual Lexical Normalization by Fine-tuning ByT5

1 code implementation WNUT (ACL) 2021 David Samuel, Milan Straka

We present the winning entry to the Multilingual Lexical Normalization (MultiLexNorm) shared task at W-NUT 2021 (van der Goot et al., 2021a), which evaluates lexical-normalization systems on 12 social media datasets in 11 languages.

Dependency Parsing Language Modelling +1

Understanding Model Robustness to User-generated Noisy Texts

1 code implementation WNUT (ACL) 2021 Jakub Náplava, Martin Popel, Milan Straka, Jana Straková

We also compare two approaches to address the performance drop: a) training the NLP models with noised data generated by our framework; and b) reducing the input noise with an external system for natural language correction.

Grammatical Error Correction Machine Translation +5

A matrix approach to detect temporal behavioral patterns at electric vehicle charging stations

no code implementations 18 Feb 2021 Milan Straka, Lucia Piatriková, Peter van Bokhoven, Ľuboš Buzna

Based on the electric vehicle (EV) arrival times and the duration of EV connection to the charging station, we identify charging patterns and derive groups of charging stations with similar charging patterns applying two approaches.

Clustering

Reading Comprehension in Czech via Machine Translation and Cross-lingual Transfer

no code implementations 3 Jul 2020 Kateřina Macková, Milan Straka

We report that an XLM-RoBERTa model trained on English data and evaluated on Czech achieves very competitive performance, only approximately 2 percentage points worse than a model trained on the translated Czech data.

Cross-Lingual Transfer Machine Translation +2

Prague Dependency Treebank -- Consolidated 1.0

no code implementations 5 Jun 2020 Jan Hajič, Eduard Bejček, Jaroslava Hlaváčová, Marie Mikulová, Milan Straka, Jan Štěpánek, Barbora Štěpánková

We present a richly annotated and genre-diversified language resource, the Prague Dependency Treebank-Consolidated 1.0 (PDT-C 1.0), the purpose of which is - as has always been the case for the family of the Prague Dependency Treebanks - to serve both as training data for various types of NLP tasks and as a basis for linguistically-oriented research.

Translation

UDPipe at EvaLatin 2020: Contextualized Embeddings and Treebank Embeddings

no code implementations LREC 2020 Milan Straka, Jana Straková

We present our contribution to the EvaLatin shared task, which is the first evaluation campaign devoted to the evaluation of NLP tools for Latin.

Lemmatization POS +1

Explaining the distribution of energy consumption at slow charging infrastructure for electric vehicles from socio-economic data

no code implementations 2 Jun 2020 Milan Straka, Rui Carvalho, Gijs van der Poel, Ľuboš Buzna

We identified the most influential features correlated with energy consumption, indicating that the spatial context of the charging infrastructure affects its utilization pattern.

regression Variable Selection

Prague Dependency Treebank - Consolidated 1.0

no code implementations LREC 2020 Jan Hajič, Eduard Bejček, Jaroslava Hlaváčová, Marie Mikulová, Milan Straka, Jan Štěpánek, Barbora Štěpánková

We present a richly annotated and genre-diversified language resource, the Prague Dependency Treebank-Consolidated 1.0 (PDT-C 1.0), the purpose of which is - as has always been the case for the family of the Prague Dependency Treebanks - to serve both as training data for various types of NLP tasks and as a basis for linguistically-oriented research.

Translation

MRP 2019: Cross-Framework Meaning Representation Parsing

no code implementations CONLL 2019 Stephan Oepen, Omri Abend, Jan Hajic, Daniel Hershcovich, Marco Kuhlmann, Tim O'Gorman, Nianwen Xue, Jayeol Chun, Milan Straka, Zdenka Uresova

The 2019 Shared Task at the Conference for Computational Language Learning (CoNLL) was devoted to Meaning Representation Parsing (MRP) across frameworks.

Sentence

ÚFAL MRPipe at MRP 2019: UDPipe Goes Semantic in the Meaning Representation Parsing Shared Task

1 code implementation CONLL 2019 Milan Straka, Jana Straková

We present a system description of our contribution to the CoNLL 2019 shared task, Cross-Framework Meaning Representation Parsing (MRP 2019).

Dependency Parsing Lemmatization +3

ÚFAL MRPipe at MRP 2019: UDPipe Goes Semantic in the Meaning Representation Parsing Shared Task

1 code implementation 24 Oct 2019 Milan Straka, Jana Straková

We present a system description of our contribution to the CoNLL 2019 shared task, Cross-Framework Meaning Representation Parsing (MRP 2019).

Dependency Parsing Lemmatization +3

Grammatical Error Correction in Low-Resource Scenarios

1 code implementation WS 2019 Jakub Náplava, Milan Straka

Grammatical error correction in English is a long studied problem with many existing systems and datasets.

Ranked #2 on Grammatical Error Correction on Falko-MERLIN (using extra training data)

Grammatical Error Correction Machine Translation +1

CUNI System for the Building Educational Applications 2019 Shared Task: Grammatical Error Correction

no code implementations WS 2019 Jakub Náplava, Milan Straka

In this paper, we describe our systems submitted to the Building Educational Applications (BEA) 2019 Shared Task (Bryant et al., 2019).

Grammatical Error Correction NMT

Czech Text Processing with Contextual Embeddings: POS Tagging, Lemmatization, Parsing and NER

no code implementations 8 Sep 2019 Milan Straka, Jana Straková, Jan Hajič

We evaluate two methods for precomputing such embeddings, BERT and Flair, on four Czech text processing tasks: part-of-speech (POS) tagging, lemmatization, dependency parsing and named entity recognition (NER).

Dependency Parsing Lemmatization +6

Evaluating Contextualized Embeddings on 54 Languages in POS Tagging, Lemmatization and Dependency Parsing

no code implementations 20 Aug 2019 Milan Straka, Jana Straková, Jan Hajič

We present an extensive evaluation of three recently proposed methods for contextualized embeddings on 89 corpora in 54 languages of the Universal Dependencies 2.3 in three tasks: POS tagging, lemmatization, and dependency parsing.

Dependency Parsing Lemmatization +3

Neural Architectures for Nested NER through Linearization

1 code implementation ACL 2019 Jana Straková, Milan Straka, Jan Hajič

We propose two neural network architectures for nested named entity recognition (NER), a setting in which named entities may overlap and also be labeled with more than one label.

Hard Attention named-entity-recognition +4

75 Languages, 1 Model: Parsing Universal Dependencies Universally

3 code implementations IJCNLP 2019 Dan Kondratyuk, Milan Straka

We present UDify, a multilingual multi-task model capable of accurately predicting universal part-of-speech, morphological features, lemmas, and dependency trees simultaneously for all 124 Universal Dependencies treebanks across 75 languages.

Dependency Parsing Zero-Shot Learning

LemmaTag: Jointly Tagging and Lemmatizing for Morphologically Rich Languages with BRNNs

1 code implementation EMNLP 2018 Daniel Kondratyuk, Tomáš Gavenčiak, Milan Straka, Jan Hajič

We present LemmaTag, a featureless neural network architecture that jointly generates part-of-speech tags and lemmas for sentences by using bidirectional RNNs with character-level and word-level embeddings.

Lemmatization Machine Translation +4

UDPipe 2.0 Prototype at CoNLL 2018 UD Shared Task

no code implementations CONLL 2018 Milan Straka

UDPipe is a trainable pipeline which performs sentence segmentation, tokenization, POS tagging, lemmatization and dependency parsing.

Dependency Parsing Lemmatization +5

CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies

no code implementations CONLL 2018 Daniel Zeman, Jan Hajič, Martin Popel, Martin Potthast, Milan Straka, Filip Ginter, Joakim Nivre, Slav Petrov

Every year, the Conference on Computational Natural Language Learning (CoNLL) features a shared task, in which participants train and test their learning systems on the same data sets.

Dependency Parsing Morphological Analysis +1

LemmaTag: Jointly Tagging and Lemmatizing for Morphologically-Rich Languages with BRNNs

2 code implementations 10 Aug 2018 Daniel Kondratyuk, Tomáš Gavenčiak, Milan Straka, Jan Hajič

We present LemmaTag, a featureless neural network architecture that jointly generates part-of-speech tags and lemmas for sentences by using bidirectional RNNs with character-level and word-level embeddings.

Lemmatization Part-Of-Speech Tagging +1

Tokenizing, POS Tagging, Lemmatizing and Parsing UD 2.0 with UDPipe

no code implementations CONLL 2017 Milan Straka, Jana Straková

A multilingual pipeline performing these steps can be trained using the Universal Dependencies project, which contains annotations of the described tasks for 50 languages in the latest release UD 2.0.

Dependency Parsing Lemmatization +2

Neural Networks for Multi-Word Expression Detection

no code implementations WS 2017 Natalia Klyueva, Antoine Doucet, Milan Straka

In this paper we describe the MUMULS system that participated in the 2017 shared task on automatic identification of verbal multiword expressions (VMWEs).

Machine Translation

UDPipe: Trainable Pipeline for Processing CoNLL-U Files Performing Tokenization, Morphological Analysis, POS Tagging and Parsing

no code implementations LREC 2016 Milan Straka, Jan Hajič, Jana Straková

Automatic natural language processing of large texts often presents recurring challenges in multiple languages: even for most advanced tasks, the texts are first processed by basic processing steps - from tokenization to parsing.

Dependency Parsing Lemmatization +4

Merging Data Resources for Inflectional and Derivational Morphology in Czech

no code implementations LREC 2016 Zdeněk Žabokrtský, Magda Ševčíková, Milan Straka, Jonáš Vidra, Adéla Limburská

The paper deals with merging two complementary resources of morphological data previously existing for Czech, namely the inflectional dictionary MorfFlex CZ and the recently developed lexical network DeriNet.

Lemmatization Morphological Analysis
