Lemmatization
61 papers with code • 0 benchmarks • 3 datasets
Lemmatization is the process of determining the base or dictionary form (lemma) of a given surface form. For languages with rich morphology in particular, normalizing words to their base forms is important for applications such as search engines and linguistic studies. The main difficulties in lemmatization are handling previously unseen words at inference time and disambiguating surface forms that, depending on context, can be inflected variants of several different base forms.
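The two difficulties named above, ambiguous forms and unseen words, can be illustrated with a deliberately simple rule-based sketch rather than a learned model. It assumes a hypothetical lexicon keyed by (form, part-of-speech tag), so that a POS tag taken from context selects among candidate lemmas, and hypothetical suffix-stripping rules as a fallback for words missing from the lexicon. None of the names below come from any particular library.

```python
# Hypothetical toy lexicon: (surface form, POS tag) -> lemma.
# Keying on the POS tag lets context disambiguate forms such as "saw".
LEXICON: dict[tuple[str, str], str] = {
    ("saw", "VERB"): "see",
    ("saw", "NOUN"): "saw",
    ("better", "ADJ"): "good",
    ("was", "AUX"): "be",
}

# Hypothetical fallback rules for unseen words: (POS, suffix, replacement).
# Rules are tried in order and the first matching one wins.
SUFFIX_RULES: list[tuple[str, str, str]] = [
    ("NOUN", "ies", "y"),
    ("NOUN", "s", ""),
    ("VERB", "ing", ""),
    ("VERB", "ed", ""),
]


def lemmatize(form: str, pos: str) -> str:
    """Return a lemma for `form` given its part-of-speech tag `pos`."""
    lowered = form.lower()
    # Known (form, POS) pair: the lexicon resolves the ambiguity directly.
    if (lowered, pos) in LEXICON:
        return LEXICON[(lowered, pos)]
    # Unseen word: apply the first suffix rule that matches this POS.
    for rule_pos, suffix, replacement in SUFFIX_RULES:
        if pos == rule_pos and lowered.endswith(suffix) and len(lowered) > len(suffix):
            return lowered[: -len(suffix)] + replacement
    # No rule applies: treat the form itself as its lemma.
    return lowered
```

With these toy tables, `lemmatize("saw", "VERB")` returns `"see"` while `lemmatize("saw", "NOUN")` returns `"saw"`, and the unseen form `"berries"` falls back to the rule that yields `"berry"`. The same rules also turn `"running"` into `"runn"`, which shows why hand-written fallbacks generalize poorly and why the papers below learn the mapping from data instead.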
Source: Universal Lemmatizer: A Sequence to Sequence Model for Lemmatizing Universal Dependencies Treebanks
Most implemented papers
Imitation Learning for Neural Morphological String Transduction
We employ imitation learning to train a neural transition-based string transducer for morphological tasks such as inflection generation and lemmatization.
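A transition-based string transducer rewrites a word by reading it left to right and emitting edit actions. The sketch below assumes a simplified action set of COPY, DELETE, and INSERT(c), plus a naive expert that copies the longest common prefix. It only shows how an action sequence turns a form into its lemma and how expert demonstrations could be derived from a (form, lemma) pair; the neural policy that predicts actions, and the exact action inventory and expert used in the paper, are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    kind: str       # "COPY", "DELETE" or "INSERT" (simplified, assumed inventory)
    char: str = ""  # character written by INSERT; empty for other actions


def apply_actions(source: str, actions: list[Action]) -> str:
    """Replay edit actions over `source`, reading it left to right."""
    i = 0  # read-head position in the source string
    out: list[str] = []
    for action in actions:
        if action.kind in ("COPY", "DELETE"):
            if i >= len(source):
                raise ValueError(f"{action.kind} past end of input")
            if action.kind == "COPY":
                out.append(source[i])  # copy the current input character
            i += 1  # both COPY and DELETE consume one input character
        elif action.kind == "INSERT":
            out.append(action.char)  # write without consuming input
        else:
            raise ValueError(f"unknown action {action.kind!r}")
    if i != len(source):
        raise ValueError("actions did not consume the whole input")
    return "".join(out)


def prefix_oracle(source: str, target: str) -> list[Action]:
    """Naive expert: copy the common prefix, delete the rest, insert the new ending."""
    k = 0
    while k < min(len(source), len(target)) and source[k] == target[k]:
        k += 1
    return (
        [Action("COPY")] * k
        + [Action("DELETE")] * (len(source) - k)
        + [Action("INSERT", c) for c in target[k:]]
    )
```

For example, `prefix_oracle("studies", "study")` yields four COPY actions, three DELETE actions, and INSERT("y"), and `apply_actions` replays that sequence to produce `"study"`. In imitation learning, a learned policy is trained to reproduce such expert actions. The prefix-based expert always reaches the target, but for stem-internal changes (German "Mütter" → "Mutter") an expert based on minimal edit distance can produce much shorter action sequences.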
Towards JointUD: Part-of-speech Tagging and Lemmatization using Recurrent Neural Networks
This paper describes our submission to CoNLL 2018 UD Shared Task.
LemmaTag: Jointly Tagging and Lemmatizing for Morphologically Rich Languages with BRNNs
We present LemmaTag, a featureless neural network architecture that jointly generates part-of-speech tags and lemmas for sentences by using bidirectional RNNs with character-level and word-level embeddings.
Joint Learning of POS and Dependencies for Multilingual Universal Dependency Parsing
This paper describes the system of team LeisureX in the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.
Tree-Stack LSTM in Transition Based Dependency Parsing
We introduce tree-stack LSTM to model the state of a transition-based parser with recurrent neural networks.
NLP-Cube: End-to-End Raw Text Processing With Neural Networks
We introduce NLP-Cube: an end-to-end Natural Language Processing framework, evaluated in CoNLL's "Multilingual Parsing from Raw Text to Universal Dependencies 2018" Shared Task.
Training Data Augmentation for Context-Sensitive Neural Lemmatization Using Inflection Tables and Raw Text
Lemmatization aims to reduce the sparse data problem by relating the inflected forms of a word to its dictionary form.
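The title above describes augmenting lemmatizer training data with inflection tables and raw text. One way to picture that, sketched here under our own assumptions rather than as the paper's procedure, is to invert each inflection table into a form-to-lemma map and then collect occurrences of those forms in unannotated sentences, so that each new example pairs a form and its lemma with a context window. The function name `augment`, the table contents, the `window` size, and the choice to keep only the first lemma for an ambiguous form are all hypothetical simplifications.

```python
def augment(
    tables: dict[str, list[str]],
    sentences: list[list[str]],
    window: int = 2,
) -> list[tuple[list[str], str, str]]:
    """Return (context tokens, form, lemma) examples for table forms found in raw text."""
    # Invert the inflection tables: each inflected form points to its lemma.
    form_to_lemma: dict[str, str] = {}
    for lemma, forms in tables.items():
        for form in forms:
            # Simplification: keep the first lemma if a form is ambiguous.
            form_to_lemma.setdefault(form, lemma)

    examples: list[tuple[list[str], str, str]] = []
    for tokens in sentences:
        for i, token in enumerate(tokens):
            lemma = form_to_lemma.get(token)
            if lemma is not None:
                # Context: up to `window` tokens on each side of the form.
                context = tokens[max(0, i - window): i + window + 1]
                examples.append((context, token, lemma))
    return examples
```

With `tables = {"walk": ["walk", "walks", "walked", "walking"], "child": ["child", "children"]}` and the single sentence `["the", "children", "walked", "home"]`, `augment` returns two examples, one for `"children"` → `"child"` and one for `"walked"` → `"walk"`, each carrying the surrounding tokens as context.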
Morphological parsing of low‑resource languages
In this paper we study morphological parsing and lemmatization using data from the Evenk and Selkup languages.
Revisiting NMT for Normalization of Early English Letters
This paper studies the use of NMT (neural machine translation) as a normalization method for an early English letter corpus.