Transliteration

45 papers with code • 0 benchmarks • 5 datasets

Transliteration is the process of converting a word from a source (foreign) language into a target language, and it often adopts approaches from machine translation. In machine translation, the objective is to preserve the semantic meaning of the utterance as much as possible while following the syntactic structure of the target language. In transliteration, the objective is to preserve the original pronunciation of the source word as much as possible while following the phonological structures of the target language.
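As a rough illustration of the idea, the sketch below maps Latin-script spellings to Cyrillic with a greedy longest-match over a small rule table. The table `RULES` and the function `transliterate` are hypothetical names introduced here, the mapping is deliberately incomplete, and none of this is taken from the papers listed on this page, which typically learn such mappings statistically or with neural models rather than from hand-written rules.

```python
# Toy Latin-to-Cyrillic rule table (an illustrative assumption, not a standard).
# Multi-letter keys such as "sh" must be tried before single letters.
RULES = {
    "sh": "ш", "ch": "ч", "zh": "ж",
    "a": "а", "k": "к", "m": "м", "o": "о",
    "s": "с", "v": "в",
}


def transliterate(word: str, rules: dict = RULES) -> str:
    """Greedy longest-match transliteration; unknown characters pass through."""
    max_len = max(len(key) for key in rules)
    out = []
    i = 0
    while i < len(word):
        # Try the longest candidate chunk first, then shorter ones.
        for size in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in rules:
                out.append(rules[chunk])
                i += size
                break
        else:
            # No rule matched: copy the character unchanged.
            out.append(word[i])
            i += 1
    return "".join(out)


print(transliterate("moskva"))
print(transliterate("chasha"))
```

Trying longer keys first keeps a digraph like "sh" from being split into two separate letters. A rule table of this kind ignores context and pronunciation, which is why it only approximates the objective described above.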

For example, the city name “Manchester” has become well known to speakers of languages other than English. Such borrowed words are often named entities that are important in cross-lingual information retrieval, information extraction, and machine translation, and they often present out-of-vocabulary challenges to spoken language technologies such as automatic speech recognition, spoken keyword search, and text-to-speech.

Source: Phonology-Augmented Statistical Framework for Machine Transliteration using Limited Linguistic Resources

Benchmarks

No evaluation results yet.

Latest papers

Cross-Lingual Transfer from Related Languages: Treating Low-Resource Maltese as Multilingual Code-Switching

mlrs/malti • 30 Jan 2024

Although multilingual language models exhibit impressive cross-lingual transfer capabilities on unseen languages, the performance on downstream tasks is impacted when there is a script disparity with the languages used in the multilingual model's pre-training data.

TransliCo: A Contrastive Learning Framework to Address the Script Barrier in Multilingual Pretrained Language Models

cisnlp/translico • 12 Jan 2024

As a result, mPLMs present a script barrier: representations from different scripts are located in different subspaces, which is a strong indicator of why crosslingual transfer involving languages of different scripts shows sub-optimal performance.

Taqyim: Evaluating Arabic NLP Tasks Using ChatGPT Models

arbml/taqyim • 28 Jun 2023

Large language models (LLMs) have demonstrated impressive performance on various downstream tasks without requiring fine-tuning, including ChatGPT, a chat-based model built on top of LLMs such as GPT-3.5 and GPT-4.

DeepScribe: Localization and Classification of Elamite Cuneiform Signs Via Deep Learning

edwardclem/deepscribe • 2 Jun 2023

The end-to-end pipeline achieves a top-5 classification accuracy of 0.80.

Multilingual Text-to-Speech Synthesis for Turkic Languages Using Transliteration

is2ai/turkictts • 25 May 2023

This work aims to build a multilingual text-to-speech (TTS) synthesis system for ten lower-resourced Turkic languages: Azerbaijani, Bashkir, Kazakh, Kyrgyz, Sakha, Tatar, Turkish, Turkmen, Uyghur, and Uzbek.

XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages

google-research/xtreme-up • 19 May 2023

We evaluate commonly used models on the benchmark.

Beyond Arabic: Software for Perso-Arabic Script Manipulation

google-research/nisaba • 26 Jan 2023

This paper presents an open-source software library that provides a set of finite-state transducer (FST) components and corresponding utilities for manipulating the writing systems of languages that use the Perso-Arabic script.

A machine transliteration tool between Uzbek alphabets

ulugbeksalaev/uztransliterator • 19 May 2022

Machine transliteration, as defined in this paper, is a process of automatically transforming written script of words from a source alphabet into words of another target alphabet within the same language, while preserving their meaning, as well as pronunciation.
