Transliteration
45 papers with code • 0 benchmarks • 5 datasets
Transliteration is a mechanism for converting a word in a source (foreign) language into a target language, and it often adopts approaches from machine translation. In machine translation, the objective is to preserve the semantic meaning of the utterance as much as possible while following the syntactic structure of the target language. In transliteration, the objective is instead to preserve the original pronunciation of the source word as much as possible while following the phonological structure of the target language.
For example, the city name “Manchester” has, through its transliterated forms, become well known to speakers of languages other than English. Such transliterated words are often named entities that are important in cross-lingual information retrieval, information extraction, and machine translation, and they frequently present out-of-vocabulary challenges to spoken language technologies such as automatic speech recognition, spoken keyword search, and text-to-speech.
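The pronunciation-preserving mapping described above can be illustrated with a toy rule-based transliterator. The character table below is a small, hand-picked subset of Devanagari-to-Latin correspondences chosen for illustration only; it is not a complete or standard romanization scheme such as ISO 15919, and real systems learn such mappings rather than hard-coding them.

```python
# Toy rule-based transliteration: Devanagari -> Latin approximation.
# Illustrative subset only; not a complete or standard scheme.
DEVANAGARI_TO_LATIN = {
    "म": "ma", "न": "na", "च": "cha", "स": "sa", "ट": "ta", "र": "ra",
    "क": "ka", "ल": "la",
    # Vowel signs and the virama modify the preceding consonant.
    "ा": "a", "ि": "i", "ी": "i", "े": "e", "ै": "ai", "्": "",
}

VOWEL_SIGNS = "ािीेै्"

def transliterate(text: str) -> str:
    """Map each character via the table; pass unknown characters through."""
    out = []
    for ch in text:
        if ch in DEVANAGARI_TO_LATIN:
            # A vowel sign or virama replaces the inherent 'a' of the
            # preceding consonant, so strip that 'a' first.
            if ch in VOWEL_SIGNS and out and out[-1].endswith("a"):
                out[-1] = out[-1][:-1]
            out.append(DEVANAGARI_TO_LATIN[ch])
        else:
            out.append(ch)
    return "".join(out)

# A naive pass over "मैनचेस्टर" (Manchester) yields "mainachestara":
# the consonant mappings are recognizable, but the spurious inherent
# vowels show why real transliteration needs richer phonological rules.
print(transliterate("मैनचेस्टर"))
```

The leftover inherent vowels in the output are exactly the kind of phonological detail that statistical and neural transliteration models are trained to resolve.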
Benchmarks
These leaderboards are used to track progress in Transliteration.
Latest papers with no code
Learning Cross-lingual Mappings for Data Augmentation to Improve Low-Resource Speech Recognition
The results show that an ASR model from any source language can be used for low-resource target-language recognition when followed by the proposed mapping model.
Towards Transliteration between Sindhi Scripts from Devanagari to Perso-Arabic
In this paper, we present a script conversion (transliteration) technique that converts Sindhi text from the Devanagari script to the Perso-Arabic script.
Investigating Lexical Sharing in Multilingual Machine Translation for Indian Languages
Multilingual language models have shown impressive cross-lingual transfer ability across a diverse set of languages and tasks.
Romanization-based Large-scale Adaptation of Multilingual Language Models
In order to boost the capacity of mPLMs to deal with low-resource and unseen languages, we explore the potential of leveraging transliteration on a massive scale.
Unsupervised Language agnostic WER Standardization
However, WER fails to provide a fair evaluation of human-perceived quality in the presence of spelling variations, abbreviations, or compound words arising from agglutination.
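The limitation described above follows directly from how WER is conventionally computed: a word-level edit distance over the reference, so any surface spelling variant or differently segmented compound counts as a full error. A minimal sketch of the standard computation, assuming whitespace tokenization:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Standard dynamic-programming edit distance over word tokens.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = d[i - 1][j] + 1
            insertion = d[i][j - 1] + 1
            d[i][j] = min(substitution, deletion, insertion)
    return d[len(ref)][len(hyp)] / len(ref)

# A spelling variant is a full substitution error:
print(wer("the colour is red", "the color is red"))  # 0.25
# An agglutinated compound scores 100% WER despite identical content:
print(wer("ice cream", "icecream"))  # 1.0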
EPIK: Eliminating multi-model Pipelines with Knowledge-distillation
The EPIK model is distilled from the MATra model using knowledge distillation.
Benchmarking Evaluation Metrics for Code-Switching Automatic Speech Recognition
Code-switching poses a number of challenges and opportunities for multilingual automatic speech recognition.
Towards Zero-Shot Code-Switched Speech Recognition
In this work, we seek to build effective code-switched (CS) automatic speech recognition systems (ASR) under the zero-shot setting where no transcribed CS speech data is available for training.
DuDe: Dual-Decoder Multilingual ASR for Indian Languages using Common Label Set
We also propose a novel architecture called Encoder-Decoder-Decoder for building multilingual systems that use both CLS and native script labels.
Gui at MixMT 2022 : English-Hinglish: An MT approach for translation of code mixed data
The first task dealt with both the Roman and Devanagari scripts, since we had monolingual data in both English and Hindi, whereas the second task had data in the Roman script only.