Transliteration
45 papers with code • 0 benchmarks • 5 datasets
Transliteration is a mechanism for converting a word in a source (foreign) language into a target language's script, and it often adopts approaches from machine translation. In machine translation, the objective is to preserve the semantic meaning of the utterance as much as possible while following the syntactic structure of the target language. In transliteration, the objective is to preserve the original pronunciation of the source word as much as possible while following the phonological structure of the target language.
For example, the city name “Manchester” has become well known to speakers of many languages other than English, each of which renders it in its own script. Such words are often named entities that are important in cross-lingual information retrieval, information extraction, and machine translation, and they frequently present out-of-vocabulary challenges to spoken language technologies such as automatic speech recognition, spoken keyword search, and text-to-speech.
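In its simplest form, the grapheme-to-grapheme mapping described above can be implemented as a greedy longest-match substitution over a rule table. The sketch below is illustrative only: the Latin-to-Cyrillic rule table is a hypothetical toy, not a complete or standard transliteration scheme.

```python
# Minimal sketch of rule-based transliteration via greedy longest-match
# grapheme substitution. The rule table is a toy example (assumption),
# covering just enough Latin -> Cyrillic mappings for the demo word.

RULES = {
    "ch": "ч", "sh": "ш",            # multi-character graphemes
    "a": "а", "e": "е", "m": "м", "n": "н",
    "r": "р", "s": "с", "t": "т",
}

def transliterate(word: str, rules: dict = RULES) -> str:
    out, i = [], 0
    longest = max(map(len, rules))
    while i < len(word):
        # Try the longest substring first so "ch" wins over "c" + "h".
        for size in range(longest, 0, -1):
            chunk = word[i:i + size].lower()
            if chunk in rules:
                out.append(rules[chunk])
                i += size
                break
        else:
            out.append(word[i])  # pass unmapped characters through
            i += 1
    return "".join(out)

print(transliterate("manchester"))  # манчестер
```

Real systems replace the hand-written table with learned models (e.g., sequence-to-sequence networks borrowed from machine translation), but the longest-match pass illustrates why context matters: a digraph like “ch” maps to a single target phoneme, not to its letters taken separately.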
Latest papers with no code
Charles Translator: A Machine Translation System between Ukrainian and Czech
We present Charles Translator, a machine translation system between Ukrainian and Czech, developed as part of a society-wide effort to mitigate the impact of the Russian-Ukrainian war on individuals and society.
Cost-Performance Optimization for Processing Low-Resource Language Tasks Using Commercial LLMs
As means to reduce the number of tokens processed by the LLM, we consider code-mixing, translation, and transliteration of LRLs to HRLs.
Training a Bilingual Language Model by Mapping Tokens onto a Shared Character Space
We train a bilingual Arabic-Hebrew language model using a transliterated version of Arabic texts in Hebrew, to ensure both languages are represented in the same script.
Language Detection for Transliterated Content
The comprehensive exploration of transliteration dynamics, supported by innovative approaches and cutting-edge technologies like BERT, positions our research at the forefront of addressing unique challenges in the linguistic landscape of digital communication.
Code-Mixed Text to Speech Synthesis under Low-Resource Constraints
We further present an exhaustive evaluation of single-speaker adaptation and multi-speaker training with Tacotron2 + Waveglow setup to show that the former approach works better.
Character-Level Bangla Text-to-IPA Transcription Using Transformer Architecture with Sequence Alignment
The International Phonetic Alphabet (IPA) is indispensable in language learning and understanding, aiding users in accurate pronunciation and comprehension.
BenLLMEval: A Comprehensive Evaluation into the Potentials and Pitfalls of Large Language Models on Bengali NLP
Large Language Models (LLMs) have emerged as one of the most important breakthroughs in NLP for their impressive skills in language generation and other language-specific tasks.
Exploring Linguistic Similarity and Zero-Shot Learning for Multilingual Translation of Dravidian Languages
Pivot based neural machine translation is preferred over a single-encoder model for most settings despite the increased training and evaluation time.
Multilingual Neural Machine Translation System for Indic to Indic Languages
To achieve this, English-Indic (EN-IL) models are also developed, with and without the usage of related languages.
Learning Cross-lingual Mappings for Data Augmentation to Improve Low-Resource Speech Recognition
The results show that any source language ASR model can be used for low-resource target language recognition, followed by the proposed mapping model.