Lemmatization

61 papers with code • 0 benchmarks • 3 datasets

Lemmatization is the process of determining the base or dictionary form (lemma) of a given surface form. For languages with rich morphology in particular, normalizing words to their base forms is important for applications such as search engines and linguistic studies. The main difficulties in lemmatization arise from encountering previously unseen words at inference time and from disambiguating surface forms that can be inflected variants of several different base forms, depending on context.

Source: Universal Lemmatizer: A Sequence to Sequence Model for Lemmatizing Universal Dependencies Treebanks
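
To make the two difficulties named in the definition concrete, here is a minimal sketch of a lookup-based lemmatizer. It is not the method of the cited paper (which is a sequence-to-sequence model); the lexicon, the part-of-speech labels, and the function name `lemmatize` are hypothetical and chosen only for illustration. Ambiguous surface forms are resolved by a part-of-speech tag supplied by the caller, and unseen words simply fall back to the lowercased token.

```python
from typing import Dict, Tuple

# Hypothetical toy lexicon: (surface form, POS tag) -> lemma.
# A real system would learn or load this from a treebank.
LEXICON: Dict[Tuple[str, str], str] = {
    ("saw", "VERB"): "see",
    ("saw", "NOUN"): "saw",
    ("leaves", "VERB"): "leave",
    ("leaves", "NOUN"): "leaf",
    ("mice", "NOUN"): "mouse",
}


def lemmatize(token: str, pos: str) -> str:
    """Return the lemma for (token, pos).

    Unseen (token, pos) pairs fall back to the lowercased token,
    which is exactly where a lookup-only approach breaks down.
    """
    return LEXICON.get((token.lower(), pos), token.lower())


if __name__ == "__main__":
    # The same surface form maps to different lemmas depending on context.
    print(lemmatize("saw", "VERB"))
    print(lemmatize("saw", "NOUN"))
    # An out-of-lexicon word is returned unchanged (lowercased).
    print(lemmatize("Blorked", "VERB"))
```

The fallback branch shows why unseen words are hard for this kind of system: without a generalizing model, an out-of-lexicon inflected form is never reduced to its base form.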

Latest papers with no code

On the Role of Morphological Information for Contextual Lemmatization

no code yet • 1 Feb 2023

Given that the process to obtain a lemma from an inflected word can be explained by looking at its morphosyntactic category, including fine-grained morphosyntactic information to train contextual lemmatizers has become common practice, without considering whether that is the optimum in terms of downstream performance.

Automated Identification of Disaster News For Crisis Management Using Machine Learning

no code yet • 24 Jan 2023

Many news sources, along with fake news outlets, covered Typhoon Rai (known locally as Typhoon Odette).

H2-Golden-Retriever: Methodology and Tool for an Evidence-Based Hydrogen Research Grantsmanship

no code yet • 16 Nov 2022

The Knowledge Graph module was used for the generation of meaningful entities and their relationships, trends and patterns in relevant H2 papers, thanks to an ontology of the hydrogen production domain.

Development of a rule-based lemmatization algorithm through Finite State Machine for Uzbek language

no code yet • 28 Oct 2022

The lemmatization algorithm draws on general rules and part-of-speech data for the Uzbek language, a classification of its affixes, and a finite state machine for each affix class that removes affixes to determine the lemma of a word.
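
The idea of removing affixes class by class can be sketched as a small linear state machine. This is an illustrative assumption, not the paper's algorithm or rule set: the suffix lists are a handful of common Uzbek noun suffixes picked for the example, the state names, `MIN_STEM` guard, and `strip_affixes` function are hypothetical, and the part-of-speech data the paper mentions is omitted entirely.

```python
from typing import List, Tuple

# States are visited from the end of the word inward. Each state may strip
# at most one suffix of its class. The lists are illustrative and incomplete.
SUFFIX_CLASSES: List[Tuple[str, List[str]]] = [
    ("CASE", ["ning", "dan", "ni", "ga", "da"]),
    ("POSSESSIVE", ["ingiz", "imiz", "ing", "im"]),
    ("PLURAL", ["lar"]),
]

# Hypothetical guard against stripping a word down to nothing.
MIN_STEM = 3


def strip_affixes(word: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Strip one suffix per class, returning the stem and what was removed."""
    stem = word.lower()
    removed: List[Tuple[str, str]] = []
    for state, suffixes in SUFFIX_CLASSES:
        # Try longer suffixes first so "ning" is preferred over "ni".
        for suffix in sorted(suffixes, key=len, reverse=True):
            if stem.endswith(suffix) and len(stem) - len(suffix) >= MIN_STEM:
                stem = stem[: -len(suffix)]
                removed.append((state, suffix))
                break  # move on to the next state
    return stem, removed


if __name__ == "__main__":
    print(strip_affixes("kitoblarimda"))
    print(strip_affixes("kitob"))
```

A sketch like this will over-strip any stem that happens to end in one of the listed suffix strings, which is one reason a practical rule-based lemmatizer needs part-of-speech information and a stem lexicon in addition to the affix rules.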

Arabic Word-level Readability Visualization for Assisted Text Simplification

no code yet • 19 Oct 2022

This demo paper presents a Google Docs add-on for automatic Arabic word-level readability visualization.

Social Media Personal Event Notifier Using NLP and Machine Learning

no code yet • 10 Oct 2022

Social media apps have become very promising and omnipresent in daily life.

Context based lemmatizer for Polish language

no code yet • 23 Jul 2022

In computational linguistics, lemmatisation is the algorithmic process of determining the lemma of a word based on its intended meaning.

TArC: Tunisian Arabish Corpus First complete release

no code yet • 11 Jul 2022

In this paper we present the final result of a project on Tunisian Arabic encoded in Arabizi, the Latin-based writing system for digital conversations.

The 2021 Urdu Fake News Detection Task using Supervised Machine Learning and Feature Combinations

no code yet • 6 Apr 2022

Our submitted results ranked fifth in the competition.

Abusive and Threatening Language Detection in Urdu using Supervised Machine Learning and Feature Combinations

no code yet • 6 Apr 2022

This paper reports a non-exhaustive list of experiments that allowed us to reach the submitted results.