Lemmatization

62 papers with code • 0 benchmarks • 3 datasets

Lemmatization is the process of determining the base or dictionary form (lemma) of a given surface form. For languages with rich morphology in particular, normalizing words to their base forms is important for applications such as search engines and linguistic studies. The main difficulties arise from encountering previously unseen words at inference time and from disambiguating surface forms that can be inflected variants of several different base forms, depending on the context.

Source: Universal Lemmatizer: A Sequence to Sequence Model for Lemmatizing Universal Dependencies Treebanks
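
The two difficulties named above, unseen words and context-dependent ambiguity, can be illustrated with a toy sketch. This is not the method of the cited paper or of any listed library: the lexicon, the suffix rules, and the function name `lemmatize` are all hypothetical and chosen only for illustration, and English is used for readability.

```python
from typing import Optional

# Toy lexicon (hypothetical data): (surface form, POS tag) -> lemma.
# The POS tag stands in for "context": the same surface form maps to
# different lemmas depending on how it is used.
LEXICON = {
    ("saw", "VERB"): "see",
    ("saw", "NOUN"): "saw",
    ("mice", "NOUN"): "mouse",
    ("better", "ADJ"): "good",
}

# Fallback suffix rules for words missing from the lexicon,
# as (suffix, replacement) pairs tried in order.
SUFFIX_RULES = [
    ("ies", "y"),
    ("ing", ""),
    ("ed", ""),
    ("s", ""),
]


def lemmatize(word: str, pos: Optional[str] = None) -> str:
    """Return a lemma guess for `word`, optionally disambiguated by a POS tag."""
    form = word.lower()
    # 1. Known (form, POS) pair: the lexicon resolves the ambiguity.
    if (form, pos) in LEXICON:
        return LEXICON[(form, pos)]
    # 2. Unseen word: apply the first suffix rule that leaves a stem
    #    of at least three characters.
    for suffix, replacement in SUFFIX_RULES:
        if form.endswith(suffix) and len(form) - len(suffix) >= 3:
            return form[: len(form) - len(suffix)] + replacement
    # 3. No rule applies: return the lowercased form unchanged.
    return form
```

With a POS tag, `lemmatize("saw", "VERB")` and `lemmatize("saw", "NOUN")` resolve to different lemmas, while an unseen word such as "studies" falls through to the suffix rules. The rules are deliberately crude: "running" comes out as "runn", which hints at why the systems listed on this page learn lemmatization from annotated data and sentence context instead of relying on fixed rules.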

Benchmarks

No evaluation results have been submitted for Lemmatization yet.

Libraries

Use these libraries to find Lemmatization models and implementations

huspacy/huspacy

3 papers

149

Datasets

Most implemented papers

CMU-01 at the SIGMORPHON 2019 Shared Task on Crosslinguality and Context in Morphology

Aditi138/MorphologicalAnalysis • WS 2019

This paper presents the submission by the CMU-01 team to the SIGMORPHON 2019 task 2 of Morphological Analysis and Lemmatization in Context.

Cross-Lingual Lemmatization and Morphology Tagging with Two-Stage Multilingual BERT Fine-Tuning

hyperparticle/udify • WS 2019

We present our CHARLES-SAARLAND system for the SIGMORPHON 2019 Shared Task on Crosslinguality and Context in Morphology, in task 2, Morphological Analysis and Lemmatization in Context.

Morpheus: A Neural Network for Jointly Learning Contextual Lemmatization and Morphological Tagging

erayyildiz/Morpheus • WS 2019

In this study, we present Morpheus, a joint contextual lemmatizer and morphological tagger.

Unsupervised Lemmatization as Embeddings-Based Word Clustering

ptakopysk/lemata • 22 Aug 2019

We focus on the task of unsupervised lemmatization, i.e. grouping together inflected forms of one word under one label (a lemma) without the use of annotated training data.

ÚFAL MRPipe at MRP 2019: UDPipe Goes Semantic in the Meaning Representation Parsing Shared Task

ufal/mrpipe-conll2019 • 24 Oct 2019

We present a system description of our contribution to the CoNLL 2019 shared task, Cross-Framework Meaning Representation Parsing (MRP 2019).

ÚFAL MRPipe at MRP 2019: UDPipe Goes Semantic in the Meaning Representation Parsing Shared Task

ufal/mrpipe-conll2019 • CONLL 2019

We present a system description of our contribution to the CoNLL 2019 shared task, Cross-Framework Meaning Representation Parsing (MRP 2019).

Morphological Tagging and Lemmatization of Albanian: A Manually Annotated Corpus and Neural Models

NeldaKote/Albanian-POS • 2 Dec 2019

In this paper, we present the first publicly available part-of-speech and morphologically tagged corpus for the Albanian language, as well as a neural morphological tagger and lemmatizer trained on it.

Grammatical gender associations outweigh topical gender bias in crosslinguistic word embeddings

kmccurdy/w2v-gender • 18 May 2020

Recent research has demonstrated that vector space models of semantics can reflect undesirable biases in human culture.

The Frankfurt Latin Lexicon: From Morphological Expansion and Word Embeddings to SemioGraphs

texttechnologylab/SemioGraph • 21 May 2020

In this article we present the Frankfurt Latin Lexicon (FLL), a lexical resource for Medieval Latin that is used both for the lemmatization of Latin texts and for the post-editing of lemmatizations.

Tagging and parsing of multidomain collections

AlexeySorokin/GramEval2020 • Proceedings of the International Conference “Dialogue 2020” 2020

In this paper, we describe our submission to the GramEval2020 competition on morphological tagging, lemmatization, and dependency parsing.
