Search Results for author: Cibu Johny

Found 8 papers, 4 papers with code

Extensions to Brahmic script processing within the Nisaba library: new scripts, languages and utilities

no code implementations LREC 2022 Alexander Gutkin, Cibu Johny, Raiomond Doctor, Lawrence Wolf-Sonkin, Brian Roark

The Brahmic family of scripts is used to record some of the most spoken languages in the world and is arguably the most diverse family of writing systems.

Transliteration

Criteria for Useful Automatic Romanization in South Asian Languages

no code implementations LREC 2022 Isin Demirsahin, Cibu Johny, Alexander Gutkin, Brian Roark

This paper presents a number of possible criteria for systems that transliterate South Asian languages from their native scripts into the Latin script, a process known as romanization.

Beyond Arabic: Software for Perso-Arabic Script Manipulation

1 code implementation 26 Jan 2023 Alexander Gutkin, Cibu Johny, Raiomond Doctor, Brian Roark, Richard Sproat

This paper presents an open-source software library that provides a set of finite-state transducer (FST) components and corresponding utilities for manipulating the writing systems of languages that use the Perso-Arabic script.

Transliteration

Graphemic Normalization of the Perso-Arabic Script

1 code implementation 21 Oct 2022 Raiomond Doctor, Alexander Gutkin, Cibu Johny, Brian Roark, Richard Sproat

Since its original appearance in 1991, the Perso-Arabic script representation in Unicode has grown from 169 to over 440 atomic isolated characters spread over several code pages representing standard letters, various diacritics and punctuation for the original Arabic and numerous other regional orthographic traditions.

Language Modelling, Machine Translation

Open-source Multi-speaker Speech Corpora for Building Gujarati, Kannada, Malayalam, Marathi, Tamil and Telugu Speech Synthesis Systems

no code implementations LREC 2020 Fei He, Shan-Hui Cathy Chu, Oddur Kjartansson, Clara Rivera, Anna Katanova, Alexander Gutkin, Isin Demirsahin, Cibu Johny, Martin Jansche, Supheakmungkol Sarin, Knot Pipatsrisawat

We present free high-quality multi-speaker speech corpora for Gujarati, Kannada, Malayalam, Marathi, Tamil and Telugu, which are six of the twenty-two official languages of India spoken by 374 million native speakers.

Speech Synthesis
