Lemmatization

61 papers with code • 0 benchmarks • 3 datasets

Lemmatization is the process of determining the base or dictionary form (lemma) of a given surface form. For languages with rich morphology in particular, normalizing words into their base forms is important for applications such as search engines and linguistic studies. The main difficulties in lemmatization are handling words not seen before at inference time and disambiguating surface forms that, depending on the context, can be inflected variants of several different base forms.

Source: Universal Lemmatizer: A Sequence to Sequence Model for Lemmatizing Universal Dependencies Treebanks
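The two difficulties named in the description can be illustrated with a minimal sketch. It is not taken from the cited paper: the toy lexicon, the suffix rules, and the `lemmatize` function are all hypothetical, and a part-of-speech tag stands in for "context". Ambiguous forms are resolved by looking up the (form, tag) pair, and unseen words fall back to crude suffix stripping.

```python
from typing import Optional

# Hypothetical toy lexicon: (surface form, coarse POS tag) -> lemma.
# The same surface form maps to different lemmas depending on the tag.
LEXICON = {
    ("saw", "VERB"): "see",
    ("saw", "NOUN"): "saw",
    ("leaves", "VERB"): "leave",
    ("leaves", "NOUN"): "leaf",
    ("mice", "NOUN"): "mouse",
}

# Hypothetical fallback rules for unseen words: (suffix, replacement),
# tried in order. Real systems learn such transformations from data.
SUFFIX_RULES = [
    ("ies", "y"),
    ("ing", ""),
    ("ed", ""),
    ("s", ""),
]


def lemmatize(form: str, pos: Optional[str] = None) -> str:
    """Return a lemma for `form`, using `pos` to disambiguate when known."""
    word = form.lower()
    # 1. Known (form, tag) pair: the tag resolves the ambiguity.
    if (word, pos) in LEXICON:
        return LEXICON[(word, pos)]
    # 2. Unseen word: apply the first matching suffix rule,
    #    keeping at least three characters of the stem.
    for suffix, replacement in SUFFIX_RULES:
        stem_length = len(word) - len(suffix)
        if word.endswith(suffix) and stem_length >= 3:
            return word[:stem_length] + replacement
    # 3. Nothing applies: assume the word is already its own lemma.
    return word


print(lemmatize("saw", "VERB"))   # see
print(lemmatize("saw", "NOUN"))   # saw
print(lemmatize("parties"))       # party
```

The fallback rules are deliberately naive (for example, `lemmatize("running")` yields `runn`), which is the kind of error on unseen words that learned lemmatizers aim to avoid.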

Benchmarks

These leaderboards are used to track progress in Lemmatization

No evaluation results yet.

Libraries

Use these libraries to find Lemmatization models and implementations

huspacy/huspacy

3 papers

148

Most implemented papers

Imitation Learning for Neural Morphological String Transduction

ZurichNLP/emnlp2018-imitation-learning-for-neural-morphology • EMNLP 2018

We employ imitation learning to train a neural transition-based string transducer for morphological tasks such as inflection generation and lemmatization.

Towards JointUD: Part-of-speech Tagging and Lemmatization using Recurrent Neural Networks

YerevaNN/JointUD • CoNLL 2018

This paper describes our submission to the CoNLL 2018 UD Shared Task.

LemmaTag: Jointly Tagging and Lemmatizing for Morphologically Rich Languages with BRNNs

hyperparticle/LemmaTag • EMNLP 2018

We present LemmaTag, a featureless neural network architecture that jointly generates part-of-speech tags and lemmas for sentences by using bidirectional RNNs with character-level and word-level embeddings.

Joint Learning of POS and Dependencies for Multilingual Universal Dependency Parsing

bcmi220/joint_stackptr • CoNLL 2018

This paper describes the system of team LeisureX in the CoNLL 2018 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies.

Tree-Stack LSTM in Transition Based Dependency Parsing

kirnap/ku-dependency-parser2 • CoNLL 2018

We introduce tree-stack LSTM to model the state of a transition-based parser with recurrent neural networks.

NLP-Cube: End-to-End Raw Text Processing With Neural Networks

adobe/NLP-Cube • CoNLL 2018

We introduce NLP-Cube: an end-to-end Natural Language Processing framework, evaluated in CoNLL's "Multilingual Parsing from Raw Text to Universal Dependencies 2018" Shared Task.
