Search Results for author: Katharina Kann

Found 73 papers, 13 papers with code

Machine Translation Between High-resource Languages in a Language Documentation Setting

no code implementations FieldMatters (COLING) 2022 Katharina Kann, Abteen Ebrahimi, Kristine Stenzel, Alexis Palmer

This translation task is challenging for multiple reasons: (1) the data is out-of-domain with respect to the MT system’s training data, (2) much of the data is conversational, (3) existing translations include non-standard and uncommon expressions, often reflecting properties of the documented language, and (4) the data includes borrowings from other regional languages.

Machine Translation, Translation

IGT2P: From Interlinear Glossed Texts to Paradigms

no code implementations EMNLP 2020 Sarah Moeller, Ling Liu, Changbing Yang, Katharina Kann, Mans Hulden

An intermediate step in the linguistic analysis of an under-documented language is to find and organize inflected forms that are attested in natural speech.

POS

Findings of the SIGMORPHON 2021 Shared Task on Unsupervised Morphological Paradigm Clustering

no code implementations ACL (SIGMORPHON) 2021 Adam Wiemerslage, Arya D. McCarthy, Alexander Erdmann, Garrett Nicolai, Manex Agirrezabal, Miikka Silfverberg, Mans Hulden, Katharina Kann

We describe the second SIGMORPHON shared task on unsupervised morphology: the goal of the SIGMORPHON 2021 Shared Task on Unsupervised Morphological Paradigm Clustering is to cluster word types from a raw text corpus into paradigms.

Clustering

Paradigm Clustering with Weighted Edit Distance

no code implementations ACL (SIGMORPHON) 2021 Andrew Gerlach, Adam Wiemerslage, Katharina Kann

This paper describes our system for the SIGMORPHON 2021 Shared Task on Unsupervised Morphological Paradigm Clustering, which asks participants to group inflected forms together according to their underlying lemma without the aid of annotated training data.

Clustering, LEMMA
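The weighted edit distance underlying this kind of approach can be sketched as a standard Levenshtein dynamic program whose substitution cost comes from a user-supplied weight function. This is an illustrative sketch, not the paper's system; the vowel-discount cost below is a hypothetical example of a weighting that groups stem-vowel alternants.

```python
def weighted_edit_distance(a, b, sub_cost, indel_cost=1.0):
    """Levenshtein distance with a custom substitution cost per character pair."""
    m, n = len(a), len(b)
    # dp[i][j] = cost of transforming a[:i] into b[:j]
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = i * indel_cost
    for j in range(1, n + 1):
        dp[0][j] = j * indel_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = min(
                dp[i - 1][j] + indel_cost,                        # deletion
                dp[i][j - 1] + indel_cost,                        # insertion
                dp[i - 1][j - 1] + sub_cost(a[i - 1], b[j - 1]),  # substitution/match
            )
    return dp[m][n]

# Hypothetical weighting: vowel-for-vowel substitutions are cheap, so forms
# that differ only in stem vowels (e.g. "sing"/"sang") end up close together.
VOWELS = set("aeiou")
def sub_cost(x, y):
    if x == y:
        return 0.0
    return 0.3 if x in VOWELS and y in VOWELS else 1.0
```

With such weights, `weighted_edit_distance("sing", "sang", sub_cost)` is 0.3, while a consonant change or an insertion still costs a full 1.0, which biases clustering toward paradigm-internal alternations.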

Morphological Processing of Low-Resource Languages: Where We Are and What’s Next

no code implementations Findings (ACL) 2022 Adam Wiemerslage, Miikka Silfverberg, Changbing Yang, Arya McCarthy, Garrett Nicolai, Eliana Colunga, Katharina Kann

Automatic morphological processing can aid downstream natural language processing applications, especially for low-resource languages, and assist language documentation efforts for endangered languages.

On the Automatic Generation and Simplification of Children's Stories

no code implementations 27 Oct 2023 Maria Valentini, Jennifer Weber, Jesus Salcido, Téa Wright, Eliana Colunga, Katharina Kann

With recent advances in large language models (LLMs), the concept of automatically generating children's educational materials has become increasingly realistic.

Lexical Simplification

An Investigation of Noise in Morphological Inflection

1 code implementation 26 May 2023 Adam Wiemerslage, Changbing Yang, Garrett Nicolai, Miikka Silfverberg, Katharina Kann

We aim at closing this gap by investigating the types of noise encountered within a pipeline for truly unsupervised morphological paradigm completion and its impact on morphological inflection systems: First, we propose an error taxonomy and annotation pipeline for inflection training data.

Language Modelling, Masked Language Modeling

Mind the Knowledge Gap: A Survey of Knowledge-enhanced Dialogue Systems

no code implementations 19 Dec 2022 Sagi Shaier, Lawrence Hunter, Katharina Kann

Many dialogue systems (DSs) lack characteristics humans have, such as emotion perception, factuality, and informativeness.

Informativeness

A Major Obstacle for NLP Research: Let's Talk about Time Allocation!

no code implementations 30 Nov 2022 Katharina Kann, Shiran Dudy, Arya D. McCarthy

The field of natural language processing (NLP) has grown over the last few years: conferences have become larger, we have published an incredible amount of papers, and state-of-the-art research has been implemented in a large variety of customer-facing products.

A Comprehensive Comparison of Neural Networks as Cognitive Models of Inflection

no code implementations 22 Oct 2022 Adam Wiemerslage, Shiran Dudy, Katharina Kann

Neural networks have long been at the center of a debate around the cognitive mechanism by which humans process inflectional morphology.

Morphological Inflection

Morphological Processing of Low-Resource Languages: Where We Are and What's Next

no code implementations 16 Mar 2022 Adam Wiemerslage, Miikka Silfverberg, Changbing Yang, Arya D. McCarthy, Garrett Nicolai, Eliana Colunga, Katharina Kann

Automatic morphological processing can aid downstream natural language processing applications, especially for low-resource languages, and assist language documentation efforts for endangered languages.

BPE vs. Morphological Segmentation: A Case Study on Machine Translation of Four Polysynthetic Languages

no code implementations Findings (ACL) 2022 Manuel Mager, Arturo Oncevay, Elisabeth Mager, Katharina Kann, Ngoc Thang Vu

Morphologically-rich polysynthetic languages present a challenge for NLP systems due to data sparsity, and a common strategy to handle this issue is to apply subword segmentation.

Machine Translation, Segmentation

Findings of the LoResMT 2021 Shared Task on COVID and Sign Language for Low-resource Languages

no code implementations MTSummit 2021 Atul Kr. Ojha, Chao-Hong Liu, Katharina Kann, John Ortega, Sheetal Shatam, Theodorus Fransen

Maximum system performance was computed using BLEU and is as follows: 36.0 for English--Irish, 34.6 for Irish--English, 24.2 for English--Marathi, and 31.3 for Marathi--English.

Machine Translation, Translation

Don't Rule Out Monolingual Speakers: A Method For Crowdsourcing Machine Translation Data

no code implementations ACL 2021 Rajat Bhatnagar, Ananya Ganesh, Katharina Kann

Based on the insight that humans pay specific attention to movements, we use graphics interchange formats (GIFs) as a pivot to collect parallel sentences from monolingual annotators.

Machine Translation, Sentence

What Would a Teacher Do? Predicting Future Talk Moves

no code implementations Findings (ACL) 2021 Ananya Ganesh, Martha Palmer, Katharina Kann

Recent advances in natural language processing (NLP) have the ability to transform how classroom learning takes place.

Question Answering

PROST: Physical Reasoning of Objects through Space and Time

1 code implementation 7 Jun 2021 Stéphane Aroca-Ouellette, Cory Paik, Alessandro Roncone, Katharina Kann

We present a new probing dataset named PROST: Physical Reasoning about Objects Through Space and Time.

Multiple-choice

How to Adapt Your Pretrained Multilingual Model to 1600 Languages

no code implementations ACL 2021 Abteen Ebrahimi, Katharina Kann

Pretrained multilingual models (PMMs) enable zero-shot learning via cross-lingual transfer, performing best for languages seen during pretraining.

Cross-Lingual Transfer, NER

CLiMP: A Benchmark for Chinese Language Model Evaluation

no code implementations EACL 2021 Beilei Xiang, Changbing Yang, Yu Li, Alex Warstadt, Katharina Kann

CLiMP consists of sets of 1,000 minimal pairs (MPs) for 16 syntactic contrasts in Mandarin, covering 9 major Mandarin linguistic phenomena.

Language Modelling
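Minimal-pair benchmarks of this kind are typically scored by checking whether a language model assigns higher probability to the acceptable member of each pair. The sketch below shows that evaluation loop with a stand-in scoring function; it is a generic illustration, not CLiMP's actual harness.

```python
def minimal_pair_accuracy(pairs, score):
    """Fraction of pairs where the model scores the grammatical sentence higher.

    pairs: list of (grammatical, ungrammatical) sentence pairs.
    score: function mapping a sentence to a model log-probability.
    """
    correct = sum(1 for good, bad in pairs if score(good) > score(bad))
    return correct / len(pairs)

# Toy scorer for illustration only: shorter sentences get higher "log-probability".
pairs = [("the cat sleeps", "the cat sleep quickly now"),
         ("she runs", "she run far away today")]
toy_score = lambda s: -len(s.split())
print(minimal_pair_accuracy(pairs, toy_score))  # 1.0 on this toy data
```

In practice `score` would sum token log-probabilities from a trained language model; the loop itself stays the same.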

Acrostic Poem Generation

no code implementations EMNLP 2020 Rajat Agarwal, Katharina Kann

We propose a new task in the area of computational creativity: acrostic poem generation in English.

Language Modelling

Frustratingly Easy Multilingual Grapheme-to-Phoneme Conversion

no code implementations WS 2020 Nikhil Prabhu, Katharina Kann

In this paper, we describe two CU-Boulder submissions to the SIGMORPHON 2020 Task 1 on multilingual grapheme-to-phoneme conversion (G2P).

The IMS--CUBoulder System for the SIGMORPHON 2020 Shared Task on Unsupervised Morphological Paradigm Completion

no code implementations WS 2020 Manuel Mager, Katharina Kann

In this paper, we present the systems of the University of Stuttgart IMS and the University of Colorado Boulder (IMS--CUBoulder) for SIGMORPHON 2020 Task 2 on unsupervised morphological paradigm completion (Kann et al., 2020).

Task 2

The NYU-CUBoulder Systems for SIGMORPHON 2020 Task 0 and Task 2

no code implementations WS 2020 Assaf Singer, Katharina Kann

Second, as inflected forms share most characters with the lemma, we further propose a pointer-generator transformer model to allow easy copying of input characters.

LEMMA, Morphological Inflection
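A pointer-generator output distribution mixes a generation distribution over the vocabulary with a copy distribution given by attention over source positions. The sketch below shows only that mixing step with hypothetical toy inputs, not the paper's full transformer model.

```python
import numpy as np

def pointer_generator_dist(p_gen, vocab_dist, attn_weights, src_ids):
    """Combine generation and copy distributions.

    p_gen:        scalar in [0, 1], probability of generating vs. copying
    vocab_dist:   generation distribution over the vocabulary (sums to 1)
    attn_weights: attention over source positions (sums to 1)
    src_ids:      vocabulary id of the character at each source position
    """
    final = p_gen * vocab_dist  # new array; vocab_dist is left untouched
    for weight, idx in zip(attn_weights, src_ids):
        final[idx] += (1.0 - p_gen) * weight  # route copy mass to source chars
    return final

# Toy example: vocabulary of 4 characters, source word of 2 characters.
vocab_dist = np.array([0.1, 0.2, 0.3, 0.4])
attn = np.array([0.75, 0.25])
out = pointer_generator_dist(0.6, vocab_dist, attn, src_ids=[2, 0])
# out still sums to 1; the attended source character (id 2) gains copy mass
```

Because inflected forms share most characters with the lemma, the copy term lets the decoder reproduce stem characters cheaply while the generation term handles affixes.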

The SIGMORPHON 2020 Shared Task on Unsupervised Morphological Paradigm Completion

no code implementations WS 2020 Katharina Kann, Arya McCarthy, Garrett Nicolai, Mans Hulden

In this paper, we describe the findings of the SIGMORPHON 2020 shared task on unsupervised morphological paradigm completion (SIGMORPHON 2020 Task 2), a novel task in the field of inflectional morphology.

LEMMA, Task 2

Self-Training for Unsupervised Parsing with PRPN

no code implementations WS 2020 Anhad Mohananey, Katharina Kann, Samuel R. Bowman

To be able to use our model's predictions during training, we extend a recent neural UP architecture, the PRPN (Shen et al., 2018a) such that it can be trained in a semi-supervised fashion.

Language Modelling

English Intermediate-Task Training Improves Zero-Shot Cross-Lingual Transfer Too

no code implementations AACL 2020 Jason Phang, Iacer Calixto, Phu Mon Htut, Yada Pruksachatkun, Haokun Liu, Clara Vania, Katharina Kann, Samuel R. Bowman

Intermediate-task training (fine-tuning a pretrained model on an intermediate task before fine-tuning again on the target task) often improves model performance substantially on language understanding tasks in monolingual English settings.

Question Answering, Retrieval

The IMS-CUBoulder System for the SIGMORPHON 2020 Shared Task on Unsupervised Morphological Paradigm Completion

no code implementations 25 May 2020 Manuel Mager, Katharina Kann

In this paper, we present the systems of the University of Stuttgart IMS and the University of Colorado Boulder (IMS-CUBoulder) for SIGMORPHON 2020 Task 2 on unsupervised morphological paradigm completion (Kann et al., 2020).

Task 2

Learning to Learn Morphological Inflection for Resource-Poor Languages

no code implementations 28 Apr 2020 Katharina Kann, Samuel R. Bowman, Kyunghyun Cho

We propose to cast the task of morphological inflection (mapping a lemma to an indicated inflected form) for resource-poor languages as a meta-learning problem.

Cross-Lingual Transfer, LEMMA

Weakly Supervised POS Taggers Perform Poorly on Truly Low-Resource Languages

no code implementations 28 Apr 2020 Katharina Kann, Ophélie Lacroix, Anders Søgaard

Part-of-speech (POS) taggers for low-resource languages which are exclusively based on various forms of weak supervision (e.g., cross-lingual transfer, type-level supervision, or a combination thereof) have been reported to perform almost as well as supervised ones.

Cross-Lingual Transfer, POS

Neural Unsupervised Parsing Beyond English

no code implementations WS 2019 Katharina Kann, Anhad Mohananey, Samuel R. Bowman, Kyunghyun Cho

Recently, neural network models which automatically infer syntactic structure from raw text have started to achieve promising results.

Acquisition of Inflectional Morphology in Artificial Neural Networks With Prior Knowledge

no code implementations SCiL 2020 Katharina Kann

How does knowledge of one language's morphology influence learning of inflection rules in a second one?

Towards Realistic Practices In Low-Resource Natural Language Processing: The Development Set

no code implementations IJCNLP 2019 Katharina Kann, Kyunghyun Cho, Samuel R. Bowman

Here, we aim to answer the following questions: Does using a development set for early stopping in the low-resource setting influence results as compared to a more realistic alternative, where the number of training epochs is tuned on development languages?

Transductive Auxiliary Task Self-Training for Neural Multi-Task Models

no code implementations WS 2019 Johannes Bjerva, Katharina Kann, Isabelle Augenstein

Multi-task learning and self-training are two common ways to improve a machine learning model's performance in settings with limited training data.

Multi-Task Learning

Subword-Level Language Identification for Intra-Word Code-Switching

no code implementations NAACL 2019 Manuel Mager, Özlem Çetinoğlu, Katharina Kann

Language identification for code-switching (CS), the phenomenon of alternating between two or more languages in conversations, has traditionally been approached under the assumption of a single language per token.

Language Identification

Verb Argument Structure Alternations in Word and Sentence Embeddings

no code implementations WS 2019 Katharina Kann, Alex Warstadt, Adina Williams, Samuel R. Bowman

For converging evidence, we further construct LaVA, a corresponding word-level dataset, and investigate whether the same syntactic features can be extracted from word embeddings.

Sentence, Sentence Embedding

The CoNLL--SIGMORPHON 2018 Shared Task: Universal Morphological Reinflection

no code implementations CONLL 2018 Ryan Cotterell, Christo Kirov, John Sylak-Glassman, Géraldine Walther, Ekaterina Vylomova, Arya D. McCarthy, Katharina Kann, Sabrina J. Mielke, Garrett Nicolai, Miikka Silfverberg, David Yarowsky, Jason Eisner, Mans Hulden

Apart from extending the number of languages involved in earlier supervised tasks of generating inflected forms, this year the shared task also featured a new second task which asked participants to inflect words in sentential context, similar to a cloze task.

LEMMA, Task 2

Sentence-Level Fluency Evaluation: References Help, But Can Be Spared!

no code implementations CONLL 2018 Katharina Kann, Sascha Rothe, Katja Filippova

Motivated by recent findings on the probabilistic modeling of acceptability judgments, we propose syntactic log-odds ratio (SLOR), a normalized language model score, as a metric for referenceless fluency evaluation of natural language generation output at the sentence level.

Language Modelling, Sentence
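SLOR, as usually defined, subtracts a unigram baseline from the sentence's language-model log-probability and normalizes by length, so fluency scores are not dominated by sentence length or rare words. A minimal sketch, assuming the log-probabilities come from a trained LM and a unigram model:

```python
def slor(lm_logprob, unigram_logprobs):
    """Syntactic log-odds ratio for one sentence.

    lm_logprob:       log p_LM(sentence) under a language model
    unigram_logprobs: per-token log-probabilities under a unigram model
    """
    n = len(unigram_logprobs)
    return (lm_logprob - sum(unigram_logprobs)) / n

# A sentence whose LM score beats its unigram baseline gets a positive SLOR.
print(slor(-10.0, [-4.0, -4.0, -4.0]))  # (-10 - (-12)) / 3 ≈ 0.667
```

The normalization is what makes the score referenceless: two sentences of different lengths and word frequencies become directly comparable.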

Fortification of Neural Morphological Segmentation Models for Polysynthetic Minimal-Resource Languages

no code implementations NAACL 2018 Katharina Kann, Manuel Mager, Ivan Meza-Ruiz, Hinrich Schütze

Morphological segmentation for polysynthetic languages is challenging, because a word may consist of many individual morphemes and training data can be extremely scarce.

Cross-Lingual Transfer, Data Augmentation

Unlabeled Data for Morphological Generation With Character-Based Sequence-to-Sequence Models

no code implementations WS 2017 Katharina Kann, Hinrich Schütze

We present a semi-supervised way of training a character-based encoder-decoder recurrent neural network for morphological reinflection, the task of generating one inflected word form from another.

One-Shot Neural Cross-Lingual Transfer for Paradigm Completion

no code implementations ACL 2017 Katharina Kann, Ryan Cotterell, Hinrich Schütze

We present a novel cross-lingual transfer method for paradigm completion, the task of mapping a lemma to its inflected forms, using a neural encoder-decoder model, the state of the art for the monolingual task.

Cross-Lingual Transfer, LEMMA

Comparative Study of CNN and RNN for Natural Language Processing

4 code implementations 7 Feb 2017 Wenpeng Yin, Katharina Kann, Mo Yu, Hinrich Schütze

Deep neural networks (DNN) have revolutionized the field of natural language processing (NLP).

Position

Neural Multi-Source Morphological Reinflection

no code implementations EACL 2017 Katharina Kann, Ryan Cotterell, Hinrich Schütze

We explore the task of multi-source morphological reinflection, which generalizes the standard, single-source version.

LEMMA, TAG

Single-Model Encoder-Decoder with Explicit Morphological Representation for Reinflection

1 code implementation ACL 2016 Katharina Kann, Hinrich Schütze

Morphological reinflection is the task of generating a target form given a source form, a source tag and a target tag.

TAG
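Encoder-decoder reinflection systems of this type commonly serialize the source form and the morphological tags into a single input sequence for the encoder. The helper below is a hypothetical sketch of such a serialization (the separator tokens and ordering are assumptions, not necessarily the paper's exact scheme):

```python
def format_reinflection_input(source_form, source_tags, target_tags):
    """Serialize (source form, source tag, target tag) as one token sequence."""
    return (["<src>"] + list(source_tags)
            + ["<trg>"] + list(target_tags)
            + ["<form>"] + list(source_form))

seq = format_reinflection_input("flog", ["V", "PST"], ["V", "PRS", "3", "SG"])
# ['<src>', 'V', 'PST', '<trg>', 'V', 'PRS', '3', 'SG', '<form>', 'f', 'l', 'o', 'g']
```

The decoder then emits the target form character by character, conditioned on both tag sets and the source characters.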
