Lemmatization

26 papers with code · Natural Language Processing

Lemmatization is the process of determining the base or dictionary form (lemma) of a given surface form. For languages with rich morphology in particular, it is important to be able to normalize words into their base forms, for example to better support search engines and linguistic studies. The main difficulties in lemmatization arise from encountering previously unseen words at inference time and from disambiguating surface forms that can be inflected variants of several different base forms depending on the context.

Source: Universal Lemmatizer: A Sequence to Sequence Model for Lemmatizing Universal Dependencies Treebanks
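
The two difficulties named in the description above (ambiguous surface forms and previously unseen words) can be illustrated with a minimal Python sketch. This is not the sequence-to-sequence approach of the cited paper; it is a toy lookup-plus-rules lemmatizer, and the lexicon entries, suffix rules, and the function name `lemmatize` are all hypothetical, chosen only for illustration.

```python
# Toy lexicon (hypothetical): maps (surface form, part-of-speech tag) to a lemma.
# Keying on the POS tag is one simple way to disambiguate forms such as "saw".
LEXICON = {
    ("saw", "VERB"): "see",
    ("saw", "NOUN"): "saw",
    ("mice", "NOUN"): "mouse",
    ("better", "ADJ"): "good",
}

# Naive fallback rules for words missing from the lexicon: (suffix, replacement).
# Rules are tried in order; the first one that matches is applied.
SUFFIX_RULES = [("ies", "y"), ("ing", ""), ("ed", ""), ("s", "")]

MIN_STEM_LENGTH = 3  # avoid stripping suffixes from very short words like "bus"


def lemmatize(form: str, pos: str) -> str:
    """Return a lemma for `form`, using `pos` as context for disambiguation."""
    word = form.lower()

    # 1. Known word: the POS tag selects among competing base forms.
    if (word, pos) in LEXICON:
        return LEXICON[(word, pos)]

    # 2. Unseen word: fall back to crude suffix stripping.
    for suffix, replacement in SUFFIX_RULES:
        stem_length = len(word) - len(suffix)
        if word.endswith(suffix) and stem_length >= MIN_STEM_LENGTH:
            return word[:stem_length] + replacement

    # 3. No rule applies: assume the word is already in its base form.
    return word


if __name__ == "__main__":
    print(lemmatize("saw", "VERB"))      # ambiguous form, verb reading
    print(lemmatize("saw", "NOUN"))      # ambiguous form, noun reading
    print(lemmatize("studies", "NOUN"))  # unseen word, handled by a suffix rule
```

The fallback rules are deliberately crude: for an unseen form such as "making" they return "mak", which is the kind of error that motivates learned, context-aware lemmatizers.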

Benchmarks

No evaluation results yet. Help compare methods by submitting evaluation metrics.

Greatest papers with code

Top2Vec: Distributed Representations of Topics

19 Aug 2020 ddangelov/Top2Vec

Distributed representations of documents and words have gained popularity due to their ability to capture semantics of words and documents.

LEMMATIZATION · SEMANTIC SIMILARITY · SEMANTIC TEXTUAL SIMILARITY

NLP-Cube: End-to-End Raw Text Processing With Neural Networks

CONLL 2018 adobe/NLP-Cube

We introduce NLP-Cube: an end-to-end Natural Language Processing framework, evaluated in CoNLL's "Multilingual Parsing from Raw Text to Universal Dependencies 2018" Shared Task.

LEMMATIZATION · TOKENIZATION

LemmaTag: Jointly Tagging and Lemmatizing for Morphologically Rich Languages with BRNNs

EMNLP 2018 hyperparticle/LemmaTag

We present LemmaTag, a featureless neural network architecture that jointly generates part-of-speech tags and lemmas for sentences by using bidirectional RNNs with character-level and word-level embeddings.

LEMMATIZATION · MACHINE TRANSLATION · PART-OF-SPEECH TAGGING · SEMANTIC ROLE LABELING · SENTIMENT ANALYSIS

LemmaTag: Jointly Tagging and Lemmatizing for Morphologically-Rich Languages with BRNNs

10 Aug 2018 hyperparticle/LemmaTag

We present LemmaTag, a featureless neural network architecture that jointly generates part-of-speech tags and lemmas for sentences by using bidirectional RNNs with character-level and word-level embeddings.

LEMMATIZATION · PART-OF-SPEECH TAGGING

Revisiting NMT for Normalization of Early English Letters

WS 2019 mikahama/natas

This paper studies the use of NMT (neural machine translation) as a normalization method for an early English letter corpus.

LEMMATIZATION · MACHINE TRANSLATION