Lemmatization

61 papers with code • 0 benchmarks • 3 datasets

Lemmatization is the process of determining the base or dictionary form (lemma) of a given surface form. For languages with rich morphology in particular, normalizing words into their base forms is important for applications such as search engines and linguistic studies. The main difficulties in lemmatization arise from encountering previously unseen words at inference time and from disambiguating surface forms that can be inflected variants of several different base forms depending on the context.

Source: Universal Lemmatizer: A Sequence to Sequence Model for Lemmatizing Universal Dependencies Treebanks
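
The two difficulties named in the description can be illustrated with a minimal sketch. It is not the method of any paper listed here: the lexicon, the suffix rules, and the `lemmatize` function are hypothetical, and a part-of-speech tag stands in for the sentence context that a real contextual lemmatizer would use.

```python
from typing import Optional

# Hypothetical toy lexicon: (surface form, part-of-speech tag) -> lemma.
# The tag is an assumed stand-in for sentence context.
LEXICON = {
    ("saw", "VERB"): "see",
    ("saw", "NOUN"): "saw",
    ("leaves", "VERB"): "leave",
    ("leaves", "NOUN"): "leaf",
    ("studies", "NOUN"): "study",
}

# Hypothetical fallback rules for unseen words: (suffix, replacement),
# tried in order.
SUFFIX_RULES = [("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")]


def lemmatize(form: str, pos: Optional[str] = None) -> str:
    """Return a lemma for `form`, disambiguating by `pos` when it is known."""
    word = form.lower()

    # Ambiguous surface forms: the same string maps to different lemmas
    # depending on the tag supplied with it.
    if (word, pos) in LEXICON:
        return LEXICON[(word, pos)]

    # Unseen words: fall back to crude suffix rewriting, keeping a stem of
    # at least three characters.
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: len(word) - len(suffix)] + replacement

    # Nothing applies: return the lowercased form unchanged.
    return word


print(lemmatize("saw", "VERB"))     # lexicon hit, verb reading
print(lemmatize("saw", "NOUN"))     # lexicon hit, noun reading
print(lemmatize("ponies"))          # unseen word, handled by a suffix rule
print(lemmatize("running"))         # unseen word, suffix rule over-strips
```

The fallback rules are deliberately crude: they handle "ponies" but turn "running" into "runn", which is the kind of error on unseen words that the learned models listed below aim to avoid.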

Most implemented papers

CMU-01 at the SIGMORPHON 2019 Shared Task on Crosslinguality and Context in Morphology

Aditi138/MorphologicalAnalysis WS 2019

This paper presents the submission by the CMU-01 team to the SIGMORPHON 2019 task 2 of Morphological Analysis and Lemmatization in Context.

Cross-Lingual Lemmatization and Morphology Tagging with Two-Stage Multilingual BERT Fine-Tuning

hyperparticle/udify WS 2019

We present our CHARLES-SAARLAND system for the SIGMORPHON 2019 Shared Task on Crosslinguality and Context in Morphology, in task 2, Morphological Analysis and Lemmatization in Context.

Morpheus: A Neural Network for Jointly Learning Contextual Lemmatization and Morphological Tagging

erayyildiz/Morpheus WS 2019

In this study, we present Morpheus, a joint contextual lemmatizer and morphological tagger.

Unsupervised Lemmatization as Embeddings-Based Word Clustering

ptakopysk/lemata 22 Aug 2019

We focus on the task of unsupervised lemmatization, i.e., grouping together inflected forms of one word under one label (a lemma) without the use of annotated training data.

ÚFAL MRPipe at MRP 2019: UDPipe Goes Semantic in the Meaning Representation Parsing Shared Task

ufal/mrpipe-conll2019 24 Oct 2019

We present a system description of our contribution to the CoNLL 2019 shared task, Cross-Framework Meaning Representation Parsing (MRP 2019).

ÚFAL MRPipe at MRP 2019: UDPipe Goes Semantic in the Meaning Representation Parsing Shared Task

ufal/mrpipe-conll2019 CONLL 2019

We present a system description of our contribution to the CoNLL 2019 shared task, Cross-Framework Meaning Representation Parsing (MRP 2019).

Morphological Tagging and Lemmatization of Albanian: A Manually Annotated Corpus and Neural Models

NeldaKote/Albanian-POS 2 Dec 2019

In this paper, we present the first publicly available part-of-speech and morphologically tagged corpus for the Albanian language, as well as a neural morphological tagger and lemmatizer trained on it.

Grammatical gender associations outweigh topical gender bias in crosslinguistic word embeddings

kmccurdy/w2v-gender 18 May 2020

Recent research has demonstrated that vector space models of semantics can reflect undesirable biases in human culture.

The Frankfurt Latin Lexicon: From Morphological Expansion and Word Embeddings to SemioGraphs

texttechnologylab/SemioGraph 21 May 2020

In this article we present the Frankfurt Latin Lexicon (FLL), a lexical resource for Medieval Latin that is used both for the lemmatization of Latin texts and for the post-editing of lemmatizations.

Tagging and parsing of multidomain collections

AlexeySorokin/GramEval2020 Proceedings of the International Conference “Dialogue 2020” 2020

In this paper, we describe our submission to the GramEval2020 competition on morphological tagging, lemmatization, and dependency parsing.