Search Results for author: Arturo Oncevay

Found 25 papers, 6 papers with code

The University of Edinburgh’s English-Tamil and English-Inuktitut Submissions to the WMT20 News Translation Task

no code implementations WMT (EMNLP) 2020 Rachel Bawden, Alexandra Birch, Radina Dobreva, Arturo Oncevay, Antonio Valerio Miceli Barone, Philip Williams

We describe the University of Edinburgh’s submissions to the WMT20 news translation shared task for the low resource language pair English-Tamil and the mid-resource language pair English-Inuktitut.

Language Modelling, Machine Translation

Peru is Multilingual, Its Machine Translation Should Be Too?

1 code implementation NAACL (AmericasNLP) 2021 Arturo Oncevay

Peru is a multilingual country with a long history of contact between the indigenous languages and Spanish.

Machine Translation, Translation

CLD² Language Documentation Meets Natural Language Processing for Revitalising Endangered Languages

no code implementations ComputEL (ACL) 2022 Roberto Zariquiey, Arturo Oncevay, Javier Vera

Language revitalisation should not be understood as a direct outcome of language documentation, which is mainly focused on the creation of language repositories.

UniMorph 4.0: Universal Morphology

no code implementations LREC 2022 Khuyagbaatar Batsuren, Omer Goldman, Salam Khalifa, Nizar Habash, Witold Kieraś, Gábor Bella, Brian Leonard, Garrett Nicolai, Kyle Gorman, Yustinus Ghanggo Ate, Maria Ryskina, Sabrina J. Mielke, Elena Budianskaya, Charbel El-Khaissi, Tiago Pimentel, Michael Gasser, William Lane, Mohit Raj, Matt Coler, Jaime Rafael Montoya Samame, Delio Siticonatzi Camaiteri, Benoît Sagot, Esaú Zumaeta Rojas, Didier López Francis, Arturo Oncevay, Juan López Bautista, Gema Celeste Silva Villegas, Lucas Torroba Hennigen, Adam Ek, David Guriel, Peter Dirix, Jean-Philippe Bernardy, Andrey Scherbakov, Aziyana Bayyr-ool, Antonios Anastasopoulos, Roberto Zariquiey, Karina Sheifer, Sofya Ganieva, Hilaria Cruz, Ritván Karahóǧa, Stella Markantonatou, George Pavlidis, Matvey Plugaryov, Elena Klyachko, Ali Salehi, Candy Angulo, Jatayu Baxi, Andrew Krizhanovsky, Natalia Krizhanovskaya, Elizabeth Salesky, Clara Vania, Sardana Ivanova, Jennifer White, Rowan Hall Maudslay, Josef Valvoda, Ran Zmigrod, Paula Czarnowska, Irene Nikkarinen, Aelita Salchak, Brijesh Bhatt, Christopher Straughn, Zoey Liu, Jonathan North Washington, Yuval Pinter, Duygu Ataman, Marcin Wolinski, Totok Suhardijanto, Anna Yablonskaya, Niklas Stoehr, Hossep Dolatian, Zahroh Nuriah, Shyam Ratan, Francis M. Tyers, Edoardo M. Ponti, Grant Aiton, Aryaman Arora, Richard J. Hatcher, Ritesh Kumar, Jeremiah Young, Daria Rodionova, Anastasia Yemelina, Taras Andrushko, Igor Marchenko, Polina Mashkovtseva, Alexandra Serova, Emily Prud'hommeaux, Maria Nepomniashchaya, Fausto Giunchiglia, Eleanor Chodroff, Mans Hulden, Miikka Silfverberg, Arya D. McCarthy, David Yarowsky, Ryan Cotterell, Reut Tsarfaty, Ekaterina Vylomova

The project comprises two major thrusts: a language-independent feature schema for rich morphological annotation and a type-level resource of annotated data in diverse languages realizing that schema.

Morphological Inflection

BPE vs. Morphological Segmentation: A Case Study on Machine Translation of Four Polysynthetic Languages

no code implementations Findings (ACL) 2022 Manuel Mager, Arturo Oncevay, Elisabeth Mager, Katharina Kann, Ngoc Thang Vu

Morphologically-rich polysynthetic languages present a challenge for NLP systems due to data sparsity, and a common strategy to handle this issue is to apply subword segmentation.

Machine Translation, Segmentation

Revisiting Neural Language Modelling with Syllables

no code implementations 24 Oct 2020 Arturo Oncevay, Kervy Rivas Rojas

Language modelling is regularly analysed at word, subword or character units, but syllables are seldom used.

Language Modelling

Monolingual corpus creation and evaluation of truly low-resource languages from Peru

no code implementations WS 2020 Gina Bustamante, Arturo Oncevay

We introduce new monolingual corpora for four indigenous and endangered languages from Peru: Shipibo-konibo, Ashaninka, Yanesha and Yine.

Language Modelling

No Data to Crawl? Monolingual Corpus Creation from PDF Files of Truly low-Resource Languages in Peru

no code implementations LREC 2020 Gina Bustamante, Arturo Oncevay, Roberto Zariquiey

We introduce new monolingual corpora for four indigenous and endangered languages from Peru: Shipibo-konibo, Ashaninka, Yanesha and Yine.

Language Modelling

Bridging Linguistic Typology and Multilingual Machine Translation with Multi-View Language Representations

1 code implementation EMNLP 2020 Arturo Oncevay, Barry Haddow, Alexandra Birch

Sparse language vectors from linguistic typology databases and learned embeddings from tasks like multilingual machine translation have been investigated in isolation, without analysing how they could benefit from each other's language characterisation.

Clustering, Machine Translation

CSI Peru News: finding the culprit, victim and location in news articles

no code implementations WS 2019 Gina Bustamante, Arturo Oncevay

We introduce a shift on the distant supervision (DS) method over the domain of crime-related news from Peru, attempting to find the culprit, victim and location of a crime description from a relation extraction (RE) perspective.

Toward Universal Dependencies for Shipibo-Konibo

no code implementations WS 2018 Alonso Vasquez, Renzo Ego Aguirre, Candy Angulo, John Miller, Claudia Villanueva, Željko Agić, Roberto Zariquiey, Arturo Oncevay

We present an initial version of the Universal Dependencies (UD) treebank for Shipibo-Konibo, the first South American, Amazonian, Panoan and Peruvian language with a resource built under UD.

Dependency Parsing, Machine Translation

Corpus Creation and Initial SMT Experiments between Spanish and Shipibo-konibo

no code implementations RANLP 2017 Ana-Paula Galarreta, Andrés Melgar, Arturo Oncevay

In this paper, we present the first attempts to develop a machine translation (MT) system between Spanish and Shipibo-konibo (es-shp).

Machine Translation, Translation

Spell-Checking based on Syllabification and Character-level Graphs for a Peruvian Agglutinative Language

no code implementations WS 2017 Carlo Alva, Arturo Oncevay

In this way, this spelling corrector is being developed based on two steps: an automatic rule-based syllabification method and a character-level graph to detect the degree of error in a misspelled word.
