no code implementations • LREC 2022 • Alexander Gutkin, Cibu Johny, Raiomond Doctor, Lawrence Wolf-Sonkin, Brian Roark
The Brahmic family of scripts is used to record some of the most spoken languages in the world and is arguably the most diverse family of writing systems.
1 code implementation • 26 Jan 2023 • Alexander Gutkin, Cibu Johny, Raiomond Doctor, Brian Roark, Richard Sproat
This paper presents an open-source software library that provides a set of finite-state transducer (FST) components and corresponding utilities for manipulating the writing systems of languages that use the Perso-Arabic script.
1 code implementation • 21 Oct 2022 • Raiomond Doctor, Alexander Gutkin, Cibu Johny, Brian Roark, Richard Sproat
Since its original appearance in 1991, the Perso-Arabic script representation in Unicode has grown from 169 to over 440 atomic isolated characters spread over several code pages representing standard letters, various diacritics and punctuation for the original Arabic and numerous other regional orthographic traditions.