Search Results for author: David Thulke

Found 11 papers, 8 papers with code

ClimateGPT: Towards AI Synthesizing Interdisciplinary Research on Climate Change

1 code implementation • 17 Jan 2024 • David Thulke, Yingbo Gao, Petrus Pelser, Rein Brune, Rricha Jalota, Floris Fok, Michael Ramos, Ian van Wyk, Abdallah Nasir, Hayden Goldstein, Taylor Tragemann, Katie Nguyen, Ariana Fowler, Andrew Stanco, Jon Gabriel, Jordan Taylor, Dean Moro, Evgenii Tsymbalov, Juliette de Waal, Evgeny Matusov, Mudar Yaghi, Mohammad Shihadah, Hermann Ney, Christian Dugast, Jonathan Dotan, Daniel Erasmus

To increase the accessibility of our model to non-English speakers, we propose to make use of cascaded machine translation and show that this approach can perform comparably to natively multilingual models while being easier to scale to a large number of languages.

Machine Translation, Retrieval

Exploring Spoken Named Entity Recognition: A Cross-Lingual Perspective

1 code implementation • 3 Jul 2023 • Moncef Benaicha, David Thulke, M. A. Tuğtekin Turan

Recent advancements in Named Entity Recognition (NER) have significantly improved the identification of entities in textual data.

Cross-Lingual Transfer, Named Entity Recognition (NER)

Task-oriented Document-Grounded Dialog Systems by HLTPR@RWTH for DSTC9 and DSTC10

no code implementations • 14 Apr 2023 • David Thulke, Nico Daheim, Christian Dugast, Hermann Ney

This paper summarizes our contributions to the document-grounded dialog tasks at the 9th and 10th Dialog System Technology Challenges (DSTC9 and DSTC10).

Automatic Speech Recognition, Data Augmentation

Mask More and Mask Later: Efficient Pre-training of Masked Language Models by Disentangling the [MASK] Token

1 code implementation • 9 Nov 2022 • Baohao Liao, David Thulke, Sanjika Hewavitharana, Hermann Ney, Christof Monz

We show: (1) [MASK]s can indeed be appended at a later layer, being disentangled from the word embedding; (2) the gathering of contextualized information from unmasked tokens can be conducted with a few layers.

Controllable Factuality in Document-Grounded Dialog Systems Using a Noisy Channel Model

1 code implementation • 31 Oct 2022 • Nico Daheim, David Thulke, Christian Dugast, Hermann Ney

In this work, we present a model for document-grounded response generation in dialog that is decomposed into two components according to Bayes' theorem.

Response Generation

Does Joint Training Really Help Cascaded Speech Translation?

1 code implementation • 24 Oct 2022 • Viet Anh Khoa Tran, David Thulke, Yingbo Gao, Christian Herold, Hermann Ney

Currently, in speech translation, the straightforward approach of cascading a recognition system with a translation system delivers state-of-the-art results.

Automatic Speech Recognition (ASR)

Adapting Document-Grounded Dialog Systems to Spoken Conversations using Data Augmentation and a Noisy Channel Model

1 code implementation • 16 Dec 2021 • David Thulke, Nico Daheim, Christian Dugast, Hermann Ney

This paper summarizes our submission to Task 2 of the second track of the 10th Dialog System Technology Challenge (DSTC10) "Knowledge-grounded Task-oriented Dialogue Modeling on Spoken Conversations".

Data Augmentation, Task 2

Investigation on Data Adaptation Techniques for Neural Named Entity Recognition

no code implementations • ACL 2021 • Evgeniia Tokarchuk, David Thulke, Weiyue Wang, Christian Dugast, Hermann Ney

Data processing is an important step in various natural language processing tasks.

Data Augmentation, Named Entity Recognition (NER)

Cascaded Span Extraction and Response Generation for Document-Grounded Dialog

1 code implementation • ACL (dialdoc) 2021 • Nico Daheim, David Thulke, Christian Dugast, Hermann Ney

For the second subtask, we use a cascaded model which grounds the response prediction on the predicted span instead of the full document.

Response Generation, valid

On Sampling-Based Training Criteria for Neural Language Modeling

no code implementations • 21 Apr 2021 • Yingbo Gao, David Thulke, Alexander Gerstenberger, Khoa Viet Tran, Ralf Schlüter, Hermann Ney

As the vocabulary size of modern word-based language models becomes ever larger, many sampling-based training criteria have been proposed and investigated.

Automatic Speech Recognition (ASR)

Efficient Retrieval Augmented Generation from Unstructured Knowledge for Task-Oriented Dialog

1 code implementation • 9 Feb 2021 • David Thulke, Nico Daheim, Christian Dugast, Hermann Ney

This paper summarizes our work on the first track of the ninth Dialog System Technology Challenge (DSTC 9), "Beyond Domain APIs: Task-oriented Conversational Modeling with Unstructured Knowledge Access".

Retrieval