Transliteration

45 papers with code • 0 benchmarks • 5 datasets

Transliteration converts a word from a source (foreign) language into a target language, and often adopts approaches from machine translation. In machine translation, the objective is to preserve the meaning of the utterance as much as possible while following the syntactic structure of the target language. In transliteration, the objective is to preserve the original pronunciation of the source word as much as possible while following the phonological structures of the target language.

For example, the city name “Manchester” has become well known to speakers of languages other than English. Such borrowed words are often named entities that are important in cross-lingual information retrieval, information extraction, and machine translation, and they often present out-of-vocabulary challenges to spoken language technologies such as automatic speech recognition, spoken keyword search, and text-to-speech.

Source: Phonology-Augmented Statistical Framework for Machine Transliteration using Limited Linguistic Resources
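
The description above can be illustrated with a toy rule-based sketch. It rewrites source graphemes into target-script graphemes using greedy longest-match lookup, so that digraphs such as "ch" map to a single target letter. The table `LATIN_TO_CYRILLIC` and the function `transliterate` are hypothetical names introduced for this illustration, the table covers only the letters needed for the "Manchester" example, and the approach is purely orthographic; it is not the statistical framework of the cited paper.

```python
# Toy grapheme table: an assumption for illustration, not a complete standard.
LATIN_TO_CYRILLIC = {
    "ch": "ч", "sh": "ш",
    "a": "а", "e": "е", "m": "м", "n": "н",
    "r": "р", "s": "с", "t": "т",
}


def transliterate(word: str, table: dict[str, str]) -> str:
    """Rewrite source graphemes into target graphemes, longest match first."""
    max_len = max(len(key) for key in table)
    lowered = word.lower()
    out = []
    i = 0
    while i < len(lowered):
        # Try the longest candidate grapheme first so "ch" wins over "c" + "h".
        for size in range(max_len, 0, -1):
            chunk = lowered[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += len(chunk)
                break
        else:
            # Symbols missing from the table pass through unchanged.
            out.append(word[i])
            i += 1
    result = "".join(out)
    # Restore an initial capital, as is usual for named entities.
    if word[:1].isupper():
        result = result[:1].upper() + result[1:]
    return result


print(transliterate("Manchester", LATIN_TO_CYRILLIC))  # Манчестер
```

A practical system would have to model pronunciation rather than spelling, which is where the machine-translation-style approaches mentioned above come in.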

Cross-Lingual Transfer from Related Languages: Treating Low-Resource Maltese as Multilingual Code-Switching

mlrs/malti 30 Jan 2024

Although multilingual language models exhibit impressive cross-lingual transfer capabilities on unseen languages, the performance on downstream tasks is impacted when there is a script disparity with the languages used in the multilingual model's pre-training data.

1

TransliCo: A Contrastive Learning Framework to Address the Script Barrier in Multilingual Pretrained Language Models

cisnlp/translico 12 Jan 2024

As a result, mPLMs present a script barrier: representations from different scripts are located in different subspaces, which is a strong indicator of why crosslingual transfer involving languages of different scripts shows sub-optimal performance.

4

Taqyim: Evaluating Arabic NLP Tasks Using ChatGPT Models

arbml/taqyim 28 Jun 2023

Large language models (LLMs) have demonstrated impressive performance on various downstream tasks without requiring fine-tuning, including ChatGPT, a chat-based model built on top of LLMs such as GPT-3.5 and GPT-4.

18

DeepScribe: Localization and Classification of Elamite Cuneiform Signs Via Deep Learning

edwardclem/deepscribe 2 Jun 2023

The end-to-end pipeline achieves a top-5 classification accuracy of 0.80.

7

Multilingual Text-to-Speech Synthesis for Turkic Languages Using Transliteration

is2ai/turkictts 25 May 2023

This work aims to build a multilingual text-to-speech (TTS) synthesis system for ten lower-resourced Turkic languages: Azerbaijani, Bashkir, Kazakh, Kyrgyz, Sakha, Tatar, Turkish, Turkmen, Uyghur, and Uzbek.

38

XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages

google-research/xtreme-up 19 May 2023

We evaluate commonly used models on the benchmark.

50

Beyond Arabic: Software for Perso-Arabic Script Manipulation

google-research/nisaba 26 Jan 2023

This paper presents an open-source software library that provides a set of finite-state transducer (FST) components and corresponding utilities for manipulating the writing systems of languages that use the Perso-Arabic script.

28

A machine transliteration tool between Uzbek alphabets

ulugbeksalaev/uztransliterator 19 May 2022

Machine transliteration, as defined in this paper, is a process of automatically transforming written script of words from a source alphabet into words of another target alphabet within the same language, while preserving their meaning, as well as pronunciation.

5
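
Transliteration between two alphabets of the same language, as defined in the entry above, can be sketched as a character table lookup. The following is a minimal illustration for Uzbek Cyrillic to Latin, not the authors' tool: `UZ_CYR_TO_LAT` and `cyrillic_to_latin` are hypothetical names, the table is a partial subset of the alphabet, and context-dependent rules (for example, letters whose Latin form depends on their position in the word) are deliberately left out.

```python
# Partial Uzbek Cyrillic -> Latin table (illustrative subset, an assumption).
UZ_CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "е": "e",
    "ж": "j", "з": "z", "и": "i", "й": "y", "к": "k", "л": "l",
    "м": "m", "н": "n", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "у": "u", "ф": "f", "х": "x", "ч": "ch", "ш": "sh",
    "ў": "oʻ", "қ": "q", "ғ": "gʻ", "ҳ": "h",
}


def cyrillic_to_latin(text: str) -> str:
    """Map each Cyrillic letter to its Latin counterpart, keeping case."""
    out = []
    for ch in text:
        mapped = UZ_CYR_TO_LAT.get(ch.lower())
        if mapped is None:
            out.append(ch)                   # unknown symbols pass through
        elif ch.isupper():
            out.append(mapped.capitalize())  # "ш" -> "Sh", "ў" -> "Oʻ"
        else:
            out.append(mapped)
    return "".join(out)


print(cyrillic_to_latin("Ўзбекистон"))  # Oʻzbekiston
print(cyrillic_to_latin("Тошкент"))     # Toshkent
```

The reverse direction is harder for a plain table because Latin digraphs such as "sh" must be recognised before single letters.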

Aksharantar: Open Indic-language Transliteration datasets and models for the Next Billion Users

ai4bharat/indicllmsuite 6 May 2022

Transliteration is very important in the Indian language context due to the usage of multiple scripts and the widespread use of romanized inputs.

67

ParaNames: A Massively Multilingual Entity Name Corpus

bltlab/paranames NAACL (SIGTYP) 2022

We demonstrate an application of ParaNames by training a multilingual model for canonical name translation to and from English.

22
28 Feb 2022