Lemmatization

61 papers with code • 0 benchmarks • 3 datasets

Lemmatization is the process of determining the base or dictionary form (lemma) of a given surface form. For languages with rich morphology in particular, it is important to be able to normalize words into their base forms to better support, for example, search engines and linguistic studies. The main difficulties in lemmatization arise from encountering previously unseen words at inference time and from disambiguating surface forms that can be inflected variants of several different base forms depending on the context.

Source: Universal Lemmatizer: A Sequence to Sequence Model for Lemmatizing Universal Dependencies Treebanks
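
The two difficulties named above can be illustrated with a minimal sketch. It is not taken from any of the papers listed here: the lexicon, the suffix rules, and the function name `lemmatize` are hypothetical, and the part-of-speech tag stands in for the context that a real system would infer from the sentence.

```python
from typing import Dict, List, Tuple

# Toy lexicon (hypothetical): (surface form, POS tag) -> lemma.
# The POS tag is the "context" that disambiguates forms such as "saw".
LEXICON: Dict[Tuple[str, str], str] = {
    ("saw", "VERB"): "see",
    ("saw", "NOUN"): "saw",
    ("leaves", "VERB"): "leave",
    ("leaves", "NOUN"): "leaf",
    ("mice", "NOUN"): "mouse",
}

# Hypothetical fallback rules for unseen words: (suffix, replacement),
# tried in order. Real lemmatizers learn such transformations from data.
SUFFIX_RULES: List[Tuple[str, str]] = [
    ("ies", "y"),
    ("ing", ""),
    ("ed", ""),
    ("s", ""),
]


def lemmatize(form: str, pos: str) -> str:
    """Return a lemma for `form`, using `pos` to resolve ambiguity."""
    word = form.lower()

    # 1. Known (form, POS) pair: look the lemma up directly.
    if (word, pos) in LEXICON:
        return LEXICON[(word, pos)]

    # 2. Unseen word: strip the first matching suffix, but only if
    #    at least three characters of stem remain.
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: len(word) - len(suffix)] + replacement

    # 3. No rule applies: assume the word is already its own lemma.
    return word


if __name__ == "__main__":
    for form, pos in [("saw", "VERB"), ("saw", "NOUN"), ("studies", "NOUN"), ("running", "VERB")]:
        print(form, pos, lemmatize(form, pos))
```

In this sketch the ambiguous form "saw" maps to "see" or "saw" depending on the tag, while the unseen form "running" falls through to the suffix rules and comes out as "runn". That failure is the kind of case that motivates the learned, context-sensitive models listed on this page.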

Most implemented papers

Imitation Learning for Neural Morphological String Transduction

ZurichNLP/emnlp2018-imitation-learning-for-neural-morphology EMNLP 2018

We employ imitation learning to train a neural transition-based string transducer for morphological tasks such as inflection generation and lemmatization.

LemmaTag: Jointly Tagging and Lemmatizing for Morphologically Rich Languages with BRNNs

hyperparticle/LemmaTag EMNLP 2018

We present LemmaTag, a featureless neural network architecture that jointly generates part-of-speech tags and lemmas for sentences by using bidirectional RNNs with character-level and word-level embeddings.

Joint Learning of POS and Dependencies for Multilingual Universal Dependency Parsing

bcmi220/joint_stackptr CoNLL 2018

This paper describes the system of team LeisureX in the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Tree-Stack LSTM in Transition Based Dependency Parsing

kirnap/ku-dependency-parser2 CoNLL 2018

We introduce tree-stack LSTM to model the state of a transition-based parser with recurrent neural networks.

NLP-Cube: End-to-End Raw Text Processing With Neural Networks

adobe/NLP-Cube CoNLL 2018

We introduce NLP-Cube: an end-to-end Natural Language Processing framework, evaluated in CoNLL's "Multilingual Parsing from Raw Text to Universal Dependencies 2018" Shared Task.

Training Data Augmentation for Context-Sensitive Neural Lemmatization Using Inflection Tables and Raw Text

tomsbergmanis/data_augumentation_um_wiki 2 Apr 2019

Lemmatization aims to reduce the sparse data problem by relating the inflected forms of a word to its dictionary form.

Morphological parsing of low‑resource languages

AlexeySorokin/NeuralMorphoTagger1 Dialogue 2019 conference 2019

In this paper we study morphological parsing and lemmatization using material from the Evenk and Selkup languages.

Revisiting NMT for Normalization of Early English Letters

mikahama/natas WS 2019

This paper studies the use of NMT (neural machine translation) as a normalization method for an early English letter corpus.

Training Data Augmentation for Context-Sensitive Neural Lemmatizer Using Inflection Tables and Raw Text

tomsbergmanis/data_augumentation_um_wiki NAACL 2019

Lemmatization aims to reduce the sparse data problem by relating the inflected forms of a word to its dictionary form.