Lemmatization
62 papers with code • 0 benchmarks • 3 datasets
Lemmatization is the process of determining the base or dictionary form (lemma) of a given surface form. For languages with rich morphology in particular, normalizing words to their base forms is important for applications such as search engines and linguistic studies. The main difficulties in lemmatization are handling words that were not seen during training but appear at inference time, and disambiguating surface forms that, depending on context, can be inflected variants of several different base forms.
Source: Universal Lemmatizer: A Sequence to Sequence Model for Lemmatizing Universal Dependencies Treebanks
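To make the two difficulties named above concrete, here is a minimal Python sketch. It is not the method of the cited paper or of any paper listed on this page: the lexicon, the suffix rules, and the function name `lemmatize` are all hypothetical and chosen only for illustration. A part-of-speech tag stands in for context when disambiguating a surface form such as "saw", and a crude suffix-stripping fallback stands in for the handling of unseen words.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon: (surface form, POS tag) -> lemma.
# The POS tag is a stand-in for the context a real system would use.
LEXICON: Dict[Tuple[str, str], str] = {
    ("saw", "VERB"): "see",
    ("saw", "NOUN"): "saw",
    ("leaves", "VERB"): "leave",
    ("leaves", "NOUN"): "leaf",
    ("mice", "NOUN"): "mouse",
}

# Hypothetical fallback rules for unseen words: (POS, suffix, replacement).
# Rules are tried in order, so longer suffixes come first.
SUFFIX_RULES: List[Tuple[str, str, str]] = [
    ("NOUN", "ies", "y"),
    ("NOUN", "s", ""),
    ("VERB", "ing", ""),
    ("VERB", "ed", ""),
]


def lemmatize(form: str, pos: str) -> str:
    """Return a lemma for `form`, using `pos` to resolve ambiguity."""
    word = form.lower()

    # 1. Known (form, POS) pairs: the tag picks between candidate lemmas.
    if (word, pos) in LEXICON:
        return LEXICON[(word, pos)]

    # 2. Unseen words: strip a matching suffix, keeping a stem of at
    #    least three characters to avoid mangling very short words.
    for rule_pos, suffix, replacement in SUFFIX_RULES:
        if pos == rule_pos and word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)] + replacement

    # 3. Nothing applies: fall back to the lowercased surface form.
    return word


if __name__ == "__main__":
    for form, pos in [("saw", "VERB"), ("saw", "NOUN"), ("treebanks", "NOUN")]:
        print(form, pos, "->", lemmatize(form, pos))
```

The fallback rules are deliberately naive and will produce wrong lemmas for many words (for example, "parsed" becomes "pars"), which is the kind of failure that motivates the learned, context-aware models listed below.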
Most implemented papers
CMU-01 at the SIGMORPHON 2019 Shared Task on Crosslinguality and Context in Morphology
This paper presents the submission by the CMU-01 team to the SIGMORPHON 2019 task 2 of Morphological Analysis and Lemmatization in Context.
Cross-Lingual Lemmatization and Morphology Tagging with Two-Stage Multilingual BERT Fine-Tuning
We present our CHARLES-SAARLAND system for the SIGMORPHON 2019 Shared Task on Crosslinguality and Context in Morphology, in task 2, Morphological Analysis and Lemmatization in Context.
Morpheus: A Neural Network for Jointly Learning Contextual Lemmatization and Morphological Tagging
In this study, we present Morpheus, a joint contextual lemmatizer and morphological tagger.
Unsupervised Lemmatization as Embeddings-Based Word Clustering
We focus on the task of unsupervised lemmatization, i.e., grouping together inflected forms of one word under one label (a lemma) without the use of annotated training data.
ÚFAL MRPipe at MRP 2019: UDPipe Goes Semantic in the Meaning Representation Parsing Shared Task
We present a system description of our contribution to the CoNLL 2019 shared task, Cross-Framework Meaning Representation Parsing (MRP 2019).
Morphological Tagging and Lemmatization of Albanian: A Manually Annotated Corpus and Neural Models
In this paper, we present the first publicly available part-of-speech and morphologically tagged corpus for the Albanian language, as well as a neural morphological tagger and lemmatizer trained on it.
Grammatical gender associations outweigh topical gender bias in crosslinguistic word embeddings
Recent research has demonstrated that vector space models of semantics can reflect undesirable biases in human culture.
The Frankfurt Latin Lexicon: From Morphological Expansion and Word Embeddings to SemioGraphs
In this article we present the Frankfurt Latin Lexicon (FLL), a lexical resource for Medieval Latin that is used both for the lemmatization of Latin texts and for the post-editing of lemmatizations.
Tagging and parsing of multidomain collections
In this paper we describe our submission to the GramEval2020 competition on morphological tagging, lemmatization and dependency parsing.