Transliteration
45 papers with code • 0 benchmarks • 5 datasets
Transliteration is a mechanism for converting a word in a source (foreign) language into a target language, and it often adopts approaches from machine translation. In machine translation, the objective is to preserve the semantic meaning of the utterance as much as possible while following the syntactic structure of the target language. In transliteration, the objective is to preserve the original pronunciation of the source word as much as possible while following the phonological structures of the target language.
For example, the city name “Manchester” has become well known to speakers of languages other than English. Such newly borrowed words are often named entities that matter in cross-lingual information retrieval, information extraction, and machine translation, and they often pose out-of-vocabulary challenges for spoken language technologies such as automatic speech recognition, spoken keyword search, and text-to-speech.
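As a rough illustration of the idea, the sketch below rewrites “Manchester” into Cyrillic with a greedy longest-match rule table. The table `RULES` and the function `transliterate` are hypothetical names introduced here, and the mappings are a toy assumption rather than a standard romanization scheme or the method of any paper listed on this page. Practical systems usually learn such correspondences from data instead of hand-writing them.

```python
# Toy grapheme rules (an assumption for illustration only, not a standard scheme).
RULES = {
    "ch": "ч",
    "sh": "ш",
    "a": "а",
    "e": "е",
    "m": "м",
    "n": "н",
    "r": "р",
    "s": "с",
    "t": "т",
}


def transliterate(word: str, rules: dict[str, str] = RULES) -> str:
    """Rewrite `word` left to right, always trying the longest rule first.

    Characters with no matching rule are copied through unchanged.
    """
    max_len = max(len(key) for key in rules)
    text = word.lower()  # the toy table only covers lowercase input
    out = []
    i = 0
    while i < len(text):
        for size in range(max_len, 0, -1):
            chunk = text[i:i + size]
            if chunk in rules:
                out.append(rules[chunk])
                i += len(chunk)
                break
        else:
            # No rule matched at this position: keep the character as-is.
            out.append(text[i])
            i += 1
    return "".join(out)


print(transliterate("Manchester"))  # prints "манчестер"
```

Trying multi-character rules before single-character ones is what lets “ch” map to one target letter instead of being split into “c” and “h”.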
Latest papers
Cross-Lingual Transfer from Related Languages: Treating Low-Resource Maltese as Multilingual Code-Switching
Although multilingual language models exhibit impressive cross-lingual transfer capabilities on unseen languages, the performance on downstream tasks is impacted when there is a script disparity with the languages used in the multilingual model's pre-training data.
TransliCo: A Contrastive Learning Framework to Address the Script Barrier in Multilingual Pretrained Language Models
As a result, mPLMs present a script barrier: representations from different scripts are located in different subspaces, which is a strong indicator of why crosslingual transfer involving languages of different scripts shows sub-optimal performance.
Taqyim: Evaluating Arabic NLP Tasks Using ChatGPT Models
Large language models (LLMs) have demonstrated impressive performance on various downstream tasks without requiring fine-tuning, including ChatGPT, a chat-based model built on top of LLMs such as GPT-3.5 and GPT-4.
DeepScribe: Localization and Classification of Elamite Cuneiform Signs Via Deep Learning
The end-to-end pipeline achieves a top-5 classification accuracy of 0.80.
Multilingual Text-to-Speech Synthesis for Turkic Languages Using Transliteration
This work aims to build a multilingual text-to-speech (TTS) synthesis system for ten lower-resourced Turkic languages: Azerbaijani, Bashkir, Kazakh, Kyrgyz, Sakha, Tatar, Turkish, Turkmen, Uyghur, and Uzbek.
XTREME-UP: A User-Centric Scarce-Data Benchmark for Under-Represented Languages
We evaluate commonly used models on the benchmark.
Beyond Arabic: Software for Perso-Arabic Script Manipulation
This paper presents an open-source software library that provides a set of finite-state transducer (FST) components and corresponding utilities for manipulating the writing systems of languages that use the Perso-Arabic script.
A machine transliteration tool between Uzbek alphabets
Machine transliteration, as defined in this paper, is a process of automatically transforming written script of words from a source alphabet into words of another target alphabet within the same language, while preserving their meaning, as well as pronunciation.
Aksharantar: Open Indic-language Transliteration datasets and models for the Next Billion Users
Transliteration is very important in the Indian language context due to the usage of multiple scripts and the widespread use of romanized inputs.
ParaNames: A Massively Multilingual Entity Name Corpus
We demonstrate an application of ParaNames by training a multilingual model for canonical name translation to and from English.