Lemmatization

61 papers with code • 0 benchmarks • 3 datasets

Lemmatization is the process of determining the base or dictionary form (lemma) of a given surface form. For languages with rich morphology in particular, it is important to be able to normalize words into their base forms to better support, for example, search engines and linguistic studies. The main difficulties in lemmatization arise from encountering previously unseen words at inference time and from disambiguating surface forms that can be inflected variants of several different base forms depending on the context.

Source: Universal Lemmatizer: A Sequence to Sequence Model for Lemmatizing Universal Dependencies Treebanks
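
The two difficulties named above can be illustrated with a minimal lookup-based sketch. It is not the method of the cited paper: the lexicon `TOY_LEXICON`, the function `lemmatize`, and the coarse part-of-speech tags are all hypothetical, chosen only to show how context (here reduced to a POS tag) selects between competing lemmas and what a naive fallback does with unseen words.

```python
from typing import Dict, Tuple

# Hypothetical toy lexicon: (lowercased surface form, coarse POS tag) -> lemma.
# A real system would learn this mapping or use a full morphological dictionary.
TOY_LEXICON: Dict[Tuple[str, str], str] = {
    ("leaves", "NOUN"): "leaf",
    ("leaves", "VERB"): "leave",
    ("saw", "VERB"): "see",
    ("saw", "NOUN"): "saw",
    ("mice", "NOUN"): "mouse",
}


def lemmatize(form: str, pos: str) -> str:
    """Return the lemma for a surface form, using the POS tag as context.

    Unseen (form, POS) pairs fall back to the lowercased surface form,
    which is exactly where a purely dictionary-based approach breaks down.
    """
    key = (form.lower(), pos)
    return TOY_LEXICON.get(key, form.lower())


if __name__ == "__main__":
    # Same surface form, different lemmas depending on context.
    print(lemmatize("leaves", "NOUN"))
    print(lemmatize("leaves", "VERB"))
    # Unseen word: the fallback simply echoes the input.
    print(lemmatize("Blorfed", "VERB"))
```

The fallback branch is the weak point: an inflected word missing from the lexicon is returned unchanged, which is why learned models that generalize to unseen forms are of interest.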

Most implemented papers

Urdu Summary Corpus

humsha/USCorpus LREC 2016

This paper reports the construction of a benchmark corpus for Urdu summaries (abstracts) to facilitate the development and evaluation of single-document summarization systems for the Urdu language.

An Automated Text Categorization Framework based on Hyperparameter Optimization

INGEOTEC/microTC 6 Apr 2017

The compared datasets include several problems like topic and polarity classification, spam detection, user profiling and authorship attribution.

IUCM at SemEval-2018 Task 11: Similar-Topic Texts as a Comprehension Knowledge Source

sonyareznikova/semeval2018task11 SEMEVAL 2018

This paper describes the IUCM entry at SemEval-2018 Task 11, on machine comprehension using commonsense knowledge.

Resource-Size matters: Improving Neural Named Entity Recognition with Optimized Large Corpora

FID-Biodiversity/GermanWordEmbeddings-NER 26 Jul 2018

This study improves the performance of neural named entity recognition by a margin of up to 11% in F-score on the example of a low-resource language like German, thereby outperforming existing baselines and establishing a new state-of-the-art on each single open-source dataset.

Neural Transition-based String Transduction for Limited-Resource Setting in Morphology

ZurichNLP/coling2018-neural-transition-based-morphology COLING 2018

We present a neural transition-based model that uses a simple set of edit actions (copy, delete, insert) for morphological transduction tasks such as inflection generation, lemmatization, and reinflection.
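
As a rough illustration of how a sequence of such edit actions rewrites one string into another, the sketch below executes a given action list over an input word. It only shows the execution step: in the paper the actions are predicted by a neural model, which is not reproduced here. The action encoding (`"COPY"`, `"DELETE"`, `"INSERT"` paired with an optional character) and the function `apply_actions` are assumptions made for this example.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding: (action name, character to insert or None).
Action = Tuple[str, Optional[str]]


def apply_actions(source: str, actions: List[Action]) -> str:
    """Execute COPY / DELETE / INSERT actions left to right over `source`.

    COPY   appends the current input character and advances the input pointer.
    DELETE skips the current input character.
    INSERT appends a new character without consuming any input.
    """
    output: List[str] = []
    pos = 0  # index of the next unread character in `source`
    for name, char in actions:
        if name in ("COPY", "DELETE"):
            if pos >= len(source):
                raise ValueError(f"{name} issued with no input left")
            if name == "COPY":
                output.append(source[pos])
            pos += 1
        elif name == "INSERT":
            if char is None:
                raise ValueError("INSERT requires a character")
            output.append(char)
        else:
            raise ValueError(f"unknown action: {name}")
    return "".join(output)


if __name__ == "__main__":
    # Lemmatizing "walking" -> "walk": copy the stem, delete the suffix.
    walking = [("COPY", None)] * 4 + [("DELETE", None)] * 3
    print(apply_actions("walking", walking))
    # Lemmatizing "ran" -> "run": replace the vowel via DELETE + INSERT.
    ran = [("COPY", None), ("DELETE", None), ("INSERT", "u"), ("COPY", None)]
    print(apply_actions("ran", ran))
```

Because most characters of an inflected form are shared with its lemma, the bulk of a typical action sequence is COPY, which is one plausible reason such a formulation suits limited-resource settings.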

From Text to Lexicon: Bridging the Gap between Word Embeddings and Lexical Resources

UKPLab/coling2018-wcs COLING 2018

We examine the effect of lemmatization and POS typing on word embedding performance in a novel resource-based evaluation scenario, as well as on standard similarity benchmarks.