Transliteration

46 papers with code • 0 benchmarks • 5 datasets

Transliteration is the task of converting a word from a source (foreign) language into the script of a target language, and it often adopts approaches from machine translation. In machine translation, the objective is to preserve the semantic meaning of the utterance as much as possible while following the syntactic structure of the target language. In transliteration, the objective is to preserve the original pronunciation of the source word as much as possible while following the phonological structure of the target language.
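
As a minimal illustration, a rule-based transliterator maps source graphemes to target graphemes that approximate the source pronunciation. The sketch below uses a tiny hand-written Latin-to-Cyrillic table; it is purely illustrative and not taken from any of the papers listed on this page.

```python
# Toy grapheme-level transliterator: longest-match lookup in a small,
# hand-written Latin-to-Cyrillic table. Real systems learn these
# correspondences (and their context dependence) from data.
RULES = {
    "ch": "ч", "sh": "ш",
    "a": "а", "e": "е", "m": "м", "n": "н",
    "r": "р", "s": "с", "t": "т",
}

def transliterate(word: str) -> str:
    out, i = [], 0
    while i < len(word):
        for width in (2, 1):  # prefer the longer source grapheme
            chunk = word[i:i + width].lower()
            if chunk in RULES:
                out.append(RULES[chunk])
                i += width
                break
        else:
            out.append(word[i])  # pass unknown characters through unchanged
            i += 1
    return "".join(out)

print(transliterate("Manchester"))  # -> "манчестер"
```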

For example, the city name “Manchester” has become well known to speakers of languages other than English, each of which renders it in its own script. Such words are often named entities that are important in cross-lingual information retrieval, information extraction, and machine translation, and they frequently present out-of-vocabulary challenges to spoken language technologies such as automatic speech recognition, spoken keyword search, and text-to-speech.

Source: Phonology-Augmented Statistical Framework for Machine Transliteration using Limited Linguistic Resources

Most implemented papers

Processing South Asian Languages Written in the Latin Script: the Dakshina Dataset

google-research-datasets/dakshina LREC 2020

This paper describes the Dakshina dataset, a new resource consisting of text in both the Latin and native scripts for 12 South Asian languages.
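
A minimal sketch of loading one of the dataset's romanization lexicons, assuming a tab-separated layout of native-script word, Latin-script romanization, and attestation count (the file name below is hypothetical; check the dataset's documentation for the exact layout):

```python
import csv
from collections import defaultdict

def load_lexicon(path):
    """Map each native-script word to its attested romanizations with counts.

    Assumes tab-separated lines of the form:
        <native-script word>\t<Latin romanization>\t<count>
    (verify against the dataset documentation before relying on this).
    """
    lexicon = defaultdict(dict)
    with open(path, encoding="utf-8") as f:
        for native, roman, count in csv.reader(f, delimiter="\t"):
            lexicon[native][roman] = int(count)
    return lexicon

# Hypothetical file name; the released archive organizes files per language.
lex = load_lexicon("hi.translit.sampled.train.tsv")
```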

Leveraging Multilingual News Websites for Building a Kurdish Parallel Corpus

KurdishBLARK/InterdialectCorpus 4 Oct 2020

We present a corpus containing 12,327 translation pairs in the two major dialects of Kurdish, Sorani and Kurmanji.

A Large-scale Evaluation of Neural Machine Transliteration for Indic Languages

anoopkunchukuttan/indic_transiteration_analysis EACL 2021

We take up the task of large-scale evaluation of neural machine transliteration between English and Indic languages, with a focus on multilingual transliteration to utilize orthographic similarity between Indian languages.
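
Such evaluations typically report exact-match (top-1) accuracy over a held-out lexicon, often allowing several acceptable references per source word. A minimal sketch of that metric, with hypothetical input structures:

```python
def top1_accuracy(predictions, references):
    """Fraction of source words whose top prediction matches any reference.

    predictions: dict mapping source word -> best predicted transliteration
    references:  dict mapping source word -> set of acceptable transliterations
    (both structures are hypothetical; adapt to the dataset's actual format).
    """
    correct = sum(
        1 for src, pred in predictions.items() if pred in references.get(src, set())
    )
    return correct / len(predictions) if predictions else 0.0

refs = {"manchester": {"मैनचेस्टर", "मानचेस्टर"}}
preds = {"manchester": "मैनचेस्टर"}
print(top1_accuracy(preds, refs))  # 1.0
```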

On Biasing Transformer Attention Towards Monotonicity

ZurichNLP/monotonicity_loss NAACL 2021

Many sequence-to-sequence tasks in natural language processing are roughly monotonic in the alignment between source and target sequence, and previous work has facilitated or enforced learning of monotonic attention behavior via specialized attention functions or pretraining.
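
One simple way to encourage such behavior is a regularizer that penalizes the attention's expected source position for moving backwards between consecutive target steps. The sketch below is illustrative of this idea only and is not the exact loss proposed in the paper:

```python
import torch

def monotonicity_penalty(attn: torch.Tensor) -> torch.Tensor:
    """Penalize attention whose expected source position moves backwards.

    attn: (batch, target_len, source_len) attention weights, rows sum to 1.
    Illustrative regularizer only, not the paper's exact objective.
    """
    src_positions = torch.arange(attn.size(-1), dtype=attn.dtype, device=attn.device)
    expected = (attn * src_positions).sum(dim=-1)          # (batch, target_len)
    backwards = (expected[:, :-1] - expected[:, 1:]).clamp(min=0.0)
    return backwards.mean()

attn = torch.softmax(torch.randn(2, 5, 7), dim=-1)
print(monotonicity_penalty(attn))  # added to the main loss with a small weight
```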

Neural String Edit Distance

jlibovicky/neural-string-edit-distance spnlp (ACL) 2022

We propose the neural string edit distance model for string-pair matching and string transduction based on learnable string edit distance.
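
For reference, the model generalizes the classic dynamic-programming recursion for edit distance, in which fixed unit costs and the hard minimum are replaced by learnable scores and a differentiable aggregation. The plain (non-learned) recursion looks like:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the standard dynamic program.

    In the neural variant, the fixed unit costs below become learnable,
    character-pair-dependent scores and the hard min becomes a
    differentiable aggregation.
    """
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion of ca
                curr[j - 1] + 1,           # insertion of cb
                prev[j - 1] + (ca != cb),  # substitution (free if match)
            ))
        prev = curr
    return prev[-1]

print(edit_distance("kitten", "sitting"))  # 3
```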

Exploiting Language Relatedness for Low Web-Resource Language Model Adaptation: An Indic Languages Study

yashkhem1/RelateLM ACL 2021

RelateLM uses transliteration to convert the unseen script of limited LRL text into the script of a Related Prominent Language (RPL) (Hindi in our case).
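
Because the major Indic scripts occupy parallel, ISCII-derived Unicode blocks, such script conversion can be approximated by shifting code points between blocks. The sketch below illustrates that general trick only; RelateLM's actual conversion procedure may differ, and a few characters have no one-to-one counterpart:

```python
# Rough script conversion between Indic Unicode blocks, which share a
# parallel ISCII-derived layout: shift each code point by the difference
# between the two blocks' starting offsets. Characters outside the source
# block (or without a counterpart) are passed through unchanged.
BLOCK_START = {"devanagari": 0x0900, "bengali": 0x0980, "gujarati": 0x0A80}

def convert_script(text: str, src: str, tgt: str) -> str:
    lo = BLOCK_START[src]
    shift = BLOCK_START[tgt] - lo
    return "".join(
        chr(ord(ch) + shift) if lo <= ord(ch) < lo + 0x80 else ch
        for ch in text
    )

# Bengali "বাংলা" mapped onto the Devanagari block.
print(convert_script("বাংলা", "bengali", "devanagari"))
```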

Specializing Multilingual Language Models: An Empirical Study

ethch18/specializing-multilingual EMNLP (MRL) 2021

Pretrained multilingual language models have become a common tool in transferring NLP capabilities to low-resource languages, often with adaptations.

Towards Offensive Language Identification for Tamil Code-Mixed YouTube Comments and Posts

chaarangan/odl-tamil-sn 24 Aug 2021

The experimental results showed that ULMFiT is the best model for this task.

Cross-Lingual Text Classification of Transliterated Hindi and Malayalam

jitinkrishnan/transliteration-hindi-malayalam 31 Aug 2021

Transliteration is very common on social media, but transliterated text is not adequately handled by modern neural models for various NLP tasks.

Role of Language Relatedness in Multilingual Fine-tuning of Language Models: A Case Study in Indo-Aryan Languages

ibm/indo-aryan-language-family-model EMNLP 2021

We hypothesize and validate that multilingual fine-tuning of pre-trained language models can yield better performance on downstream NLP applications, compared to models fine-tuned on individual languages.