Transliteration

45 papers with code • 0 benchmarks • 5 datasets

Transliteration converts a word from a source (foreign) language into a target language, and it often adopts approaches from machine translation. In machine translation, the objective is to preserve the semantic meaning of the utterance as much as possible while following the syntactic structure of the target language. In transliteration, the objective is to preserve the original pronunciation of the source word as much as possible while following the phonological structures of the target language.

For example, the city name “Manchester” has become well known to speakers of languages other than English. Such new words are often named entities that are important in cross-lingual information retrieval, information extraction, and machine translation, and they often present out-of-vocabulary challenges to spoken language technologies such as automatic speech recognition, spoken keyword search, and text-to-speech.

Source: Phonology-Augmented Statistical Framework for Machine Transliteration using Limited Linguistic Resources
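
As a rough illustration of the pronunciation-preserving goal described above, the sketch below maps the Russian spelling of “Manchester” back into Latin script with a character lookup table. The table `CYRILLIC_TO_LATIN` and the function `transliterate` are hypothetical names introduced here, and the mapping covers only the letters this example needs; it is not a standard romanization scheme, and the statistical and neural systems listed on this page learn such correspondences from data rather than from a fixed table.

```python
# Toy Cyrillic-to-Latin table (illustrative assumption; not a standard romanization scheme).
CYRILLIC_TO_LATIN = {
    "а": "a", "в": "v", "е": "e", "к": "k", "м": "m", "н": "n",
    "о": "o", "р": "r", "с": "s", "т": "t", "ч": "ch",
}


def transliterate(word: str, table: dict = CYRILLIC_TO_LATIN) -> str:
    """Map each source character to a target-script string.

    Characters missing from the table are passed through unchanged.
    """
    return "".join(table.get(ch, ch) for ch in word.lower())


print(transliterate("Манчестер"))  # manchester
```

A per-character table like this ignores context, so it cannot handle letters whose sound depends on their neighbours, which is one reason learned sequence-to-sequence models are commonly used instead.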

Does Transliteration Help Multilingual Language Modeling?

ibraheem-moosa/xlm-indic 29 Jan 2022

We empirically measure the effect of transliteration on multilingual language models (MLLMs) in this context.

IIITT@Dravidian-CodeMix-FIRE2021: Transliterate or translate? Sentiment analysis of code-mixed text in Dravidian languages

karthikpuranik11/fire2021 15 Nov 2021

This paper makes a small contribution to this line of research in the form of sentiment analysis of code-mixed social media comments in the popular Dravidian languages Kannada, Tamil and Malayalam.

Role of Language Relatedness in Multilingual Fine-tuning of Language Models: A Case Study in Indo-Aryan Languages

ibm/indo-aryan-language-family-model EMNLP 2021

We hypothesize and validate that multilingual fine-tuning of pre-trained language models can yield better performance on downstream NLP applications, compared to models fine-tuned on individual languages.

22 Sep 2021

Cross-Lingual Text Classification of Transliterated Hindi and Malayalam

jitinkrishnan/transliteration-hindi-malayalam 31 Aug 2021

Transliteration is very common on social media, but transliterated text is not adequately handled by modern neural models for various NLP tasks.

Towards Offensive Language Identification for Tamil Code-Mixed YouTube Comments and Posts

chaarangan/odl-tamil-sn 24 Aug 2021

The experimental results showed that ULMFiT is the best model for this task.

Specializing Multilingual Language Models: An Empirical Study

ethch18/specializing-multilingual EMNLP (MRL) 2021

Pretrained multilingual language models have become a common tool in transferring NLP capabilities to low-resource languages, often with adaptations.

16 Jun 2021

Exploiting Language Relatedness for Low Web-Resource Language Model Adaptation: An Indic Languages Study

yashkhem1/RelateLM ACL 2021

RelateLM uses transliteration to convert the unseen script of limited low web-resource language (LRL) text into the script of a Related Prominent Language (RPL) (Hindi in our case).

07 Jun 2021

Sub-Character Tokenization for Chinese Pretrained Language Models

thunlp/subchartokenization 1 Jun 2021

Pronunciation-based SubChar tokenizers can encode Chinese homophones into the same transliteration sequences and produce the same tokenization output, hence being robust to homophone typos.
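
The homophone property can be sketched with a tiny lookup: if each character is first replaced by its romanized pronunciation, two strings that differ only by a homophone typo yield the same sequence. The dictionary `TOY_PINYIN` and the function `to_pronunciation` below are hypothetical and use toneless pinyin for four characters only; this is not the paper's tokenizer, which would need a full pronunciation lexicon and a learned subword vocabulary.

```python
# Toy character-to-pinyin table (toneless; an assumption made for illustration).
TOY_PINYIN = {"他": "ta", "她": "ta", "它": "ta", "们": "men"}


def to_pronunciation(text: str) -> list:
    """Replace each character with its pinyin; unknown characters are kept as-is."""
    return [TOY_PINYIN.get(ch, ch) for ch in text]


# "他们" and "她们" differ by one homophone character but share a pronunciation sequence.
print(to_pronunciation("他们"))  # ['ta', 'men']
print(to_pronunciation("她们"))  # ['ta', 'men']
```

The same mapping also discards the distinction between the homophones, so a real system has to decide how much of that information to keep.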

Neural String Edit Distance

jlibovicky/neural-string-edit-distance spnlp (ACL) 2022

We propose the neural string edit distance model for string-pair matching and string transduction based on learnable string edit distance.

16 Apr 2021
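
For background on the entry above, the classic (non-neural) string edit distance can be computed with dynamic programming. The sketch below is the standard Levenshtein recurrence with unit costs, not the learnable model the paper proposes; the function name `edit_distance` is chosen here for illustration.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit cost for insertion, deletion and substitution."""
    # prev[j] holds the distance between the processed prefix of `a` and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb, or match
            ))
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))  # 3
```

Keeping only two rows of the table holds memory at O(len(b)) while the running time stays O(len(a) * len(b)).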

On Biasing Transformer Attention Towards Monotonicity

ZurichNLP/monotonicity_loss NAACL 2021

Many sequence-to-sequence tasks in natural language processing are roughly monotonic in the alignment between source and target sequence, and previous work has facilitated or enforced learning of monotonic attention behavior via specialized attention functions or pretraining.

08 Apr 2021