Transliteration

45 papers with code • 0 benchmarks • 5 datasets

Transliteration is a mechanism for converting a word from a source (foreign) language into a target language, and it often adopts approaches from machine translation. In machine translation, the objective is to preserve the semantic meaning of the utterance as much as possible while following the syntactic structure of the target language. In transliteration, the objective is to preserve the original pronunciation of the source word as much as possible while following the phonological structures of the target language.

For example, the city name “Manchester” has become well known to speakers of languages other than English, who render it in their own scripts by approximating its pronunciation. Such borrowed words are often named entities that are important in cross-lingual information retrieval, information extraction, and machine translation, and they often present out-of-vocabulary challenges to spoken language technologies such as automatic speech recognition, spoken keyword search, and text-to-speech.

Source: Phonology-Augmented Statistical Framework for Machine Transliteration using Limited Linguistic Resources
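As a minimal illustration of the idea, the sketch below maps Roman-script graphemes to Cyrillic approximations using a hand-written table and a greedy longest-match rule. The mapping table, the matching rule, and the function name are assumptions made for this example, not the method of the cited paper or of any paper listed below; real systems typically learn such mappings statistically or neurally rather than from a fixed table.

```python
# A toy grapheme-level transliterator: greedily matches the longest source
# grapheme and emits a target-script approximation of its pronunciation.
# The mapping and the greedy rule are illustrative assumptions only.

ROMAN_TO_CYRILLIC = {
    "ch": "ч", "sh": "ш",          # digraphs first (longest match wins)
    "a": "а", "e": "е", "m": "м",
    "n": "н", "r": "р", "s": "с", "t": "т",
}

def transliterate(word: str, table: dict) -> str:
    word = word.lower()
    out, i = [], 0
    max_len = max(len(k) for k in table)
    while i < len(word):
        # Try the longest grapheme first so "ch" is not split into "c" + "h".
        for length in range(max_len, 0, -1):
            chunk = word[i:i + length]
            if chunk in table:
                out.append(table[chunk])
                i += length
                break
        else:
            out.append(word[i])  # pass through unmapped characters unchanged
            i += 1
    return "".join(out)

print(transliterate("Manchester", ROMAN_TO_CYRILLIC))  # манчестер
```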

Latest papers with no code

Learning Cross-lingual Mappings for Data Augmentation to Improve Low-Resource Speech Recognition

no code yet • 14 Jun 2023

The results show that any source-language ASR model can be used for low-resource target-language recognition when followed by the proposed mapping model.

Towards Transliteration between Sindhi Scripts from Devanagari to Perso-Arabic

no code yet • 12 May 2023

In this paper, we present a script conversion (transliteration) technique that converts Sindhi text in the Devanagari script to the Perso-Arabic script.

Investigating Lexical Sharing in Multilingual Machine Translation for Indian Languages

no code yet • 4 May 2023

Multilingual language models have shown impressive cross-lingual transfer ability across a diverse set of languages and tasks.

Romanization-based Large-scale Adaptation of Multilingual Language Models

no code yet • 18 Apr 2023

In order to boost the capacity of mPLMs to deal with low-resource and unseen languages, we explore the potential of leveraging transliteration on a massive scale.

Unsupervised Language agnostic WER Standardization

no code yet • 9 Mar 2023

However, WER fails to provide a fair evaluation of human-perceived quality in the presence of spelling variations, abbreviations, or compound words arising out of agglutination.
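As a rough sketch of why this matters, the snippet below computes the standard word error rate via word-level edit distance; a harmless spelling variation (“colour” vs. “color”) is then counted as a full substitution. The function and example strings are illustrative assumptions, not code from the paper.

```python
# Minimal WER: word-level Levenshtein distance (substitutions, deletions,
# insertions), normalised by the reference length.
def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution / match
    return d[len(ref)][len(hyp)] / len(ref)

# A spelling variant is penalised as a full word error:
print(wer("the colour is red", "the color is red"))  # 0.25
```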

EPIK: Eliminating multi-model Pipelines with Knowledge-distillation

no code yet • 27 Nov 2022

The EPIK model has been distilled from the MATra model using knowledge distillation.

Benchmarking Evaluation Metrics for Code-Switching Automatic Speech Recognition

no code yet • 22 Nov 2022

Code-switching poses a number of challenges and opportunities for multilingual automatic speech recognition.

Towards Zero-Shot Code-Switched Speech Recognition

no code yet • 2 Nov 2022

In this work, we seek to build effective code-switched (CS) automatic speech recognition systems (ASR) under the zero-shot setting where no transcribed CS speech data is available for training.

DuDe: Dual-Decoder Multilingual ASR for Indian Languages using Common Label Set

no code yet • 30 Oct 2022

We also propose a novel architecture called Encoder-Decoder-Decoder for building multilingual systems that use both CLS and native script labels.

Gui at MixMT 2022 : English-Hinglish: An MT approach for translation of code mixed data

no code yet • 21 Oct 2022

The first task dealt with both the Roman and Devanagari scripts, as we had monolingual data in both English and Hindi, whereas the second task only had data in the Roman script.