Transliteration

45 papers with code • 0 benchmarks • 5 datasets

Transliteration is a mechanism for converting a word in a source (foreign) language into a target language, and it often adopts approaches from machine translation. In machine translation, the objective is to preserve the semantic meaning of the utterance as much as possible while following the syntactic structure of the target language. In transliteration, the objective is to preserve the original pronunciation of the source word as much as possible while following the phonological structures of the target language.
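The pronunciation-preserving mapping described above can be sketched as a simple rule-based transliterator: a table from source graphemes to target graphemes applied with greedy longest-match. The mapping below is an illustrative subset of a common Russian-to-Latin romanization, not a complete or standard table.

```python
# Minimal rule-based transliteration sketch: map source graphemes to target
# graphemes (preserving pronunciation, not meaning) via greedy longest-match.
# The table is an illustrative subset of Russian-to-Latin romanization.

RU_TO_LAT = {
    "м": "m", "а": "a", "н": "n", "ч": "ch", "е": "e",
    "с": "s", "т": "t", "р": "r", "ш": "sh", "щ": "shch",
}

def transliterate(word: str, table: dict[str, str]) -> str:
    """Greedily match the longest source substring at each position."""
    max_len = max(len(k) for k in table)
    out, i = [], 0
    while i < len(word):
        for span in range(max_len, 0, -1):  # try longest match first
            chunk = word[i:i + span]
            if chunk in table:
                out.append(table[chunk])
                i += span
                break
        else:  # unmapped character: copy through unchanged
            out.append(word[i])
            i += 1
    return "".join(out)

print(transliterate("манчестер", RU_TO_LAT))  # -> manchester
```

Real systems (statistical or neural) learn such grapheme or phoneme correspondences from data rather than hand-writing them, but the input/output contract is the same.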

For example, the city name “Manchester” has become well known to speakers of languages other than English, each of which renders it in its own script. Such words are often named entities that are important in cross-lingual information retrieval, information extraction, and machine translation, and they often present out-of-vocabulary challenges to spoken language technologies such as automatic speech recognition, spoken keyword search, and text-to-speech.

Source: Phonology-Augmented Statistical Framework for Machine Transliteration using Limited Linguistic Resources

Latest papers with no code

Charles Translator: A Machine Translation System between Ukrainian and Czech

no code yet • 10 Apr 2024

We present Charles Translator, a machine translation system between Ukrainian and Czech, developed as part of a society-wide effort to mitigate the impact of the Russian-Ukrainian war on individuals and society.

Cost-Performance Optimization for Processing Low-Resource Language Tasks Using Commercial LLMs

no code yet • 8 Mar 2024

As means to reduce the number of tokens processed by the LLM, we consider code-mixing, translation, and transliteration of LRLs to HRLs.
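One reason transliterating a low-resource language (LRL) into a high-resource language's (HRL's) script can reduce token counts: many scripts need three UTF-8 bytes per character, and byte-level tokenizers with little training data for that script tend to fall back to near-byte-level splits. The sketch below uses encoded byte length as a rough, illustrative proxy for tokenizer cost; actual token counts depend on the specific tokenizer.

```python
# Illustrative comparison: UTF-8 byte length of a Devanagari word vs. its
# Latin transliteration. Byte length is only a rough proxy for the number
# of tokens a byte-level BPE tokenizer would produce.

word_deva = "नमस्ते"    # Devanagari: each code point is 3 bytes in UTF-8
word_latin = "namaste"  # Latin transliteration: 1 byte per character

print(len(word_deva.encode("utf-8")))   # -> 18
print(len(word_latin.encode("utf-8")))  # -> 7
```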

Training a Bilingual Language Model by Mapping Tokens onto a Shared Character Space

no code yet • 25 Feb 2024

We train a bilingual Arabic-Hebrew language model using a transliterated version of Arabic texts in Hebrew, to ensure both languages are represented in the same script.

Language Detection for Transliterated Content

no code yet • 9 Jan 2024

The comprehensive exploration of transliteration dynamics, supported by innovative approaches and cutting-edge technologies like BERT, positions our research at the forefront of addressing the unique challenges in the linguistic landscape of digital communication.

Code-Mixed Text to Speech Synthesis under Low-Resource Constraints

no code yet • 2 Dec 2023

We further present an exhaustive evaluation of single-speaker adaptation and multi-speaker training with Tacotron2 + Waveglow setup to show that the former approach works better.

Character-Level Bangla Text-to-IPA Transcription Using Transformer Architecture with Sequence Alignment

no code yet • 7 Nov 2023

The International Phonetic Alphabet (IPA) is indispensable in language learning and understanding, aiding users in accurate pronunciation and comprehension.

BenLLMEval: A Comprehensive Evaluation into the Potentials and Pitfalls of Large Language Models on Bengali NLP

no code yet • 22 Sep 2023

Large Language Models (LLMs) have emerged as one of the most important breakthroughs in NLP for their impressive skills in language generation and other language-specific tasks.

Exploring Linguistic Similarity and Zero-Shot Learning for Multilingual Translation of Dravidian Languages

no code yet • 10 Aug 2023

Pivot-based neural machine translation is preferred over a single-encoder model for most settings despite the increased training and evaluation time.

Multilingual Neural Machine Translation System for Indic to Indic Languages

no code yet • 22 Jun 2023

To achieve this, English-Indic (EN-IL) models are also developed, with and without the usage of related languages.

Learning Cross-lingual Mappings for Data Augmentation to Improve Low-Resource Speech Recognition

no code yet • 14 Jun 2023

The results show that any source language ASR model can be used for low-resource target language recognition, followed by the proposed mapping model.