Search Results for author: Ngoc Thang Vu

Found 104 papers, 24 papers with code

“It seemed like an annoying woman”: On the Perception and Ethical Considerations of Affective Language in Text-Based Conversational Agents

no code implementations CoNLL (EMNLP) 2021 Lindsey Vanderlyn, Gianna Weber, Michael Neumann, Dirk Väth, Sarina Meyer, Ngoc Thang Vu

Based on statistical and qualitative analysis of the responses, we found that language style played an important role in how human-like, as well as how likable, participants perceived a dialog agent to be.

Chatbot

Teaching a Multilingual Large Language Model to Understand Multilingual Speech via Multi-Instructional Training

1 code implementation 16 Apr 2024 Pavel Denisov, Ngoc Thang Vu

Our zero-shot evaluation results confirm the robustness of our approach across multiple tasks, including speech translation and multilingual spoken language understanding, thereby opening new avenues for applying LLMs in the speech domain.

Intrinsic Subgraph Generation for Interpretable Graph based Visual Question Answering

1 code implementation 26 Mar 2024 Pascal Tilli, Ngoc Thang Vu

In this work, we introduce an interpretable approach for graph-based VQA and demonstrate competitive performance on the GQA dataset.

Decision Making Explainable artificial intelligence +3

Towards a Zero-Data, Controllable, Adaptive Dialog System

no code implementations 26 Mar 2024 Dirk Väth, Lindsey Vanderlyn, Ngoc Thang Vu

Conversational Tree Search (Väth et al., 2023) is a recent approach to controllable dialog systems, where domain experts shape the behavior of a Reinforcement Learning agent through a dialog tree.

Language Modelling Large Language Model +1

Explaining Pre-Trained Language Models with Attribution Scores: An Analysis in Low-Resource Settings

no code implementations 8 Mar 2024 Wei Zhou, Heike Adel, Hendrik Schuff, Ngoc Thang Vu

Attribution scores indicate the importance of different input parts and can, thus, explain model behaviour.

The IMS Toucan System for the Blizzard Challenge 2023

1 code implementation 26 Oct 2023 Florian Lux, Julia Koch, Sarina Meyer, Thomas Bott, Nadja Schauffler, Pavel Denisov, Antje Schweitzer, Ngoc Thang Vu

For our contribution to the Blizzard Challenge 2023, we improved on the system we submitted to the Blizzard Challenge 2021.

Controllable Generation of Artificial Speaker Embeddings through Discovery of Principal Directions

no code implementations 26 Oct 2023 Florian Lux, Pascal Tilli, Sarina Meyer, Ngoc Thang Vu

Customizing voice and speaking style in a speech synthesis system with intuitive and fine-grained controls is challenging, given that little data with appropriate labels is available.

Speech Synthesis

Data Augmentation Techniques for Machine Translation of Code-Switched Texts: A Comparative Study

no code implementations 23 Oct 2023 Injy Hamed, Nizar Habash, Ngoc Thang Vu

Linguistic theories and random lexical replacement prove to be effective in the lack of CSW parallel data, where both approaches achieve similar results.

Data Augmentation Machine Translation +2

Leveraging Multilingual Self-Supervised Pretrained Models for Sequence-to-Sequence End-to-End Spoken Language Understanding

1 code implementation 9 Oct 2023 Pavel Denisov, Ngoc Thang Vu

A number of methods have been proposed for End-to-End Spoken Language Understanding (E2E-SLU) using pretrained models, however their evaluation often lacks multilingual setup and tasks that require prediction of lexical fillers, such as slot filling.

slot-filling Slot Filling +3

Neighboring Words Affect Human Interpretation of Saliency Explanations

1 code implementation 4 May 2023 Alon Jacovi, Hendrik Schuff, Heike Adel, Ngoc Thang Vu, Yoav Goldberg

Word-level saliency explanations ("heat maps over words") are often used to communicate feature-attribution in text-based models.

Modeling Speaker-Listener Interaction for Backchannel Prediction

no code implementations 10 Apr 2023 Daniel Ortega, Sarina Meyer, Antje Schweitzer, Ngoc Thang Vu

We present our latest findings on backchannel modeling novelly motivated by the canonical use of the minimal responses Yeah and Uh-huh in English and their correspondent tokens in German, and the effect of encoding the speaker-listener interaction.

Oh, Jeez! or Uh-huh? A Listener-aware Backchannel Predictor on ASR Transcriptions

no code implementations 10 Apr 2023 Daniel Ortega, Chia-Yu Li, Ngoc Thang Vu

This paper presents our latest investigation on modeling backchannel in conversations.

Conversational Tree Search: A New Hybrid Dialog Task

1 code implementation 17 Mar 2023 Dirk Väth, Lindsey Vanderlyn, Ngoc Thang Vu

Conversational interfaces provide a flexible and easy way for users to seek information that may otherwise be difficult or inconvenient to obtain.

Information Retrieval Navigate +1

Low-Resource Multilingual and Zero-Shot Multispeaker TTS

1 code implementation 21 Oct 2022 Florian Lux, Julia Koch, Ngoc Thang Vu

While neural methods for text-to-speech (TTS) have shown great advances in modeling multiple speakers, even in zero-shot settings, the amount of data needed for those approaches is generally not feasible for the vast majority of the world's over 6,000 spoken languages.

Meta-Learning Voice Cloning

Improving Semi-supervised End-to-end Automatic Speech Recognition using CycleGAN and Inter-domain Losses

no code implementations 20 Oct 2022 Chia-Yu Li, Ngoc Thang Vu

In this paper, we exploit the advantages from both inter-domain loss and CycleGAN to achieve better shared representation of unpaired speech and text inputs and thus improve the speech-to-text mapping.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Challenges in Explanation Quality Evaluation

no code implementations 13 Oct 2022 Hendrik Schuff, Heike Adel, Peng Qi, Ngoc Thang Vu

This approach assumes that explanations which reach higher proxy scores will also provide a greater benefit to human users.

Question Answering

Anonymizing Speech with Generative Adversarial Networks to Preserve Speaker Privacy

1 code implementation 13 Oct 2022 Sarina Meyer, Pascal Tilli, Pavel Denisov, Florian Lux, Julia Koch, Ngoc Thang Vu

In order to protect the privacy of speech data, speaker anonymization aims for hiding the identity of a speaker by changing the voice in speech recordings.

Generative Adversarial Network

The Who in Code-Switching: A Case Study for Predicting Egyptian Arabic-English Code-Switching Levels based on Character Profiles

no code implementations 31 Jul 2022 Injy Hamed, Alia El Bolock, Cornelia Herbert, Slim Abdennadher, Ngoc Thang Vu

Given that the factors giving rise to CS vary from one country to the other, as well as from one person to the other, CS is found to be a speaker-dependent behaviour, where the frequency by which the foreign language is embedded differs across speakers.

Speaker Anonymization with Phonetic Intermediate Representations

1 code implementation 11 Jul 2022 Sarina Meyer, Florian Lux, Pavel Denisov, Julia Koch, Pascal Tilli, Ngoc Thang Vu

In this work, we propose a speaker anonymization pipeline that leverages high quality automatic speech recognition and synthesis systems to generate speech conditioned on phonetic transcriptions and anonymized speaker embeddings.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Exact Prosody Cloning in Zero-Shot Multispeaker Text-to-Speech

2 code implementations 24 Jun 2022 Florian Lux, Julia Koch, Ngoc Thang Vu

The cloning of a speaker's voice using an untranscribed reference sample is one of the great advances of modern neural text-to-speech (TTS) methods.

Investigating Lexical Replacements for Arabic-English Code-Switched Data Augmentation

no code implementations 25 May 2022 Injy Hamed, Nizar Habash, Slim Abdennadher, Ngoc Thang Vu

Results show that using a predictive model results in more natural CS sentences compared to the random approach, as reported in human judgements.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

BPE vs. Morphological Segmentation: A Case Study on Machine Translation of Four Polysynthetic Languages

no code implementations Findings (ACL) 2022 Manuel Mager, Arturo Oncevay, Elisabeth Mager, Katharina Kann, Ngoc Thang Vu

Morphologically-rich polysynthetic languages present a challenge for NLP systems due to data sparsity, and a common strategy to handle this issue is to apply subword segmentation.

Machine Translation Segmentation +1

Language-Agnostic Meta-Learning for Low-Resource Text-to-Speech with Articulatory Features

1 code implementation ACL 2022 Florian Lux, Ngoc Thang Vu

While neural text-to-speech systems perform remarkably well in high-resource scenarios, they cannot be applied to the majority of the over 6,000 spoken languages in the world due to a lack of appropriate training data.

Meta-Learning

Human Interpretation of Saliency-based Explanation Over Text

1 code implementation 27 Jan 2022 Hendrik Schuff, Alon Jacovi, Heike Adel, Yoav Goldberg, Ngoc Thang Vu

In this work, we focus on this question through a study of saliency-based explanations over textual data.

Investigation of Densely Connected Convolutional Networks with Domain Adversarial Learning for Noise Robust Speech Recognition

no code implementations 19 Dec 2021 Chia Yu Li, Ngoc Thang Vu

We investigate densely connected convolutional networks (DenseNets) and their extension with domain adversarial training for noise robust speech recognition.

Robust Speech Recognition speech-recognition

Integrating Knowledge in End-to-End Automatic Speech Recognition for Mandarin-English Code-Switching

no code implementations 19 Dec 2021 Chia-Yu Li, Ngoc Thang Vu

Code-Switching (CS) is a common linguistic phenomenon in multilingual communities that consists of switching between languages while speaking.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Predicting User Code-Switching Level from Sociological and Psychological Profiles

no code implementations 13 Dec 2021 Injy Hamed, Alia El Bolock, Nader Rizk, Cornelia Herbert, Slim Abdennadher, Ngoc Thang Vu

Multilingual speakers tend to alternate between languages within a conversation, a phenomenon referred to as "code-switching" (CS).

ESPnet-SLU: Advancing Spoken Language Understanding through ESPnet

2 code implementations 29 Nov 2021 Siddhant Arora, Siddharth Dalmia, Pavel Denisov, Xuankai Chang, Yushi Ueda, Yifan Peng, Yuekai Zhang, Sujay Kumar, Karthik Ganesan, Brian Yan, Ngoc Thang Vu, Alan W Black, Shinji Watanabe

However, there are few open source toolkits that can be used to generate reproducible results on different Spoken Language Understanding (SLU) benchmarks.

Spoken Language Understanding

Beyond Accuracy: A Consolidated Tool for Visual Question Answering Benchmarking

1 code implementation EMNLP (ACL) 2021 Dirk Väth, Pascal Tilli, Ngoc Thang Vu

On the way towards general Visual Question Answering (VQA) systems that are able to answer arbitrary questions, the need arises for evaluation beyond single-metric leaderboards for specific datasets.

Benchmarking Question Answering +1

Does External Knowledge Help Explainable Natural Language Inference? Automatic Evaluation vs. Human Ratings

1 code implementation EMNLP (BlackboxNLP) 2021 Hendrik Schuff, Hsiu-Yu Yang, Heike Adel, Ngoc Thang Vu

For this, we investigate different sources of external knowledge and evaluate the performance of our models on in-domain data as well as on special transfer datasets that are designed to assess fine-grained reasoning capabilities.

Natural Language Inference

Thought Flow Nets: From Single Predictions to Trains of Model Thought

no code implementations 26 Jul 2021 Hendrik Schuff, Heike Adel, Ngoc Thang Vu

In addition, we conduct a qualitative analysis of thought flow correction patterns and explore how thought flow predictions affect human users within a crowdsourcing study.

Question Answering

Few-shot Learning for Slot Tagging with Attentive Relational Network

no code implementations EACL 2021 Cennet Oguz, Ngoc Thang Vu

Metric-based learning is a well-known family of methods for few-shot learning, especially in computer vision.

Few-Shot Learning

Investigations on Audiovisual Emotion Recognition in Noisy Conditions

no code implementations 2 Mar 2021 Michael Neumann, Ngoc Thang Vu

In this paper we explore audiovisual emotion recognition under noisy acoustic conditions with a focus on speech features.

Speech Emotion Recognition

Meta-Learning for improving rare word recognition in end-to-end ASR

no code implementations 25 Feb 2021 Florian Lux, Ngoc Thang Vu

We propose a new method of generating meaningful embeddings for speech, changes to four commonly used meta learning approaches to enable them to perform keyword spotting in continuous signals and an approach of combining their outcomes into an end-to-end automatic speech recognition system to improve rare word recognition.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Fine-tuning BERT for Low-Resource Natural Language Understanding via Active Learning

no code implementations COLING 2020 Daniel Grießhaber, Johannes Maucher, Ngoc Thang Vu

Recently, leveraging pre-trained Transformer based language models in down stream, task specific models has advanced state of the art results in natural language understanding tasks.

Active Learning Language Modelling +1

Interpreting Attention Models with Human Visual Attention in Machine Reading Comprehension

no code implementations CONLL 2020 Ekta Sood, Simon Tannert, Diego Frassinelli, Andreas Bulling, Ngoc Thang Vu

We compare state of the art networks based on long short-term memory (LSTM), convolutional neural models (CNN) and XLNet Transformer architectures.

Machine Reading Comprehension

F1 is Not Enough! Models and Evaluation Towards User-Centered Explainable Question Answering

1 code implementation EMNLP 2020 Hendrik Schuff, Heike Adel, Ngoc Thang Vu

The user study shows that our models increase the ability of the users to judge the correctness of the system and that scores like F1 are not enough to estimate the usefulness of a model in a practical setting with human users.

Model Selection Question Answering

Pretrained Semantic Speech Embeddings for End-to-End Spoken Language Understanding via Cross-Modal Teacher-Student Learning

no code implementations 3 Jul 2020 Pavel Denisov, Ngoc Thang Vu

Spoken language understanding is typically based on pipeline architectures including speech recognition and natural language understanding steps.

Natural Language Understanding speech-recognition +2

Ensemble Self-Training for Low-Resource Languages: Grapheme-to-Phoneme Conversion and Morphological Inflection

no code implementations WS 2020 Xiang Yu, Ngoc Thang Vu, Jonas Kuhn

We present an iterative data augmentation framework, which trains and searches for an optimal ensemble and simultaneously annotates new training data in a self-training style.

Data Augmentation Morphological Inflection

ADVISER: A Toolkit for Developing Multi-modal, Multi-domain and Socially-engaged Conversational Agents

1 code implementation ACL 2020 Chia-Yu Li, Daniel Ortega, Dirk Väth, Florian Lux, Lindsey Vanderlyn, Maximilian Schmidt, Michael Neumann, Moritz Völkel, Pavel Denisov, Sabrina Jenne, Zorica Kacarevic, Ngoc Thang Vu

We present ADVISER - an open-source, multi-domain dialog system toolkit that enables the development of multi-modal (incorporating speech, text and vision), socially-engaged (e.g. emotion recognition, engagement level prediction and backchanneling) conversational agents.

BIG-bench Machine Learning Emotion Recognition

Head-First Linearization with Tree-Structured Representation

no code implementations WS 2019 Xiang Yu, Agnieszka Falenska, Ngoc Thang Vu, Jonas Kuhn

We present a dependency tree linearization model with two novel components: (1) a tree-structured encoder based on bidirectional Tree-LSTM that propagates information first bottom-up then top-down, which allows each token to access information from the entire tree; and (2) a linguistically motivated head-first decoder that emphasizes the central role of the head and linearizes the subtree by incrementally attaching the dependents on both sides of the head.

To Combine or Not To Combine? A Rainbow Deep Reinforcement Learning Agent for Dialog Policies

no code implementations WS 2019 Dirk Väth, Ngoc Thang Vu

In this paper, we explore state-of-the-art deep reinforcement learning methods for dialog policy training such as prioritized experience replay, double deep Q-Networks, dueling network architectures and distributional learning.

reinforcement-learning Reinforcement Learning (RL)

IMS-Speech: A Speech to Text Tool

no code implementations 13 Aug 2019 Pavel Denisov, Ngoc Thang Vu

We present the IMS-Speech, a web based tool for German and English speech transcription aiming to facilitate research in various disciplines which require accesses to lexical information in spoken language materials.

Ranked #4 on Speech Recognition on TUDA (using extra training data)

speech-recognition Speech Recognition

Learning the Dyck Language with Attention-based Seq2Seq Models

no code implementations WS 2019 Xiang Yu, Ngoc Thang Vu, Jonas Kuhn

The generalized Dyck language has been used to analyze the ability of Recurrent Neural Networks (RNNs) to learn context-free grammars (CFGs).

Approximate Dynamic Oracle for Dependency Parsing with Reinforcement Learning

no code implementations WS 2018 Xiang Yu, Ngoc Thang Vu, Jonas Kuhn

We present a general approach with reinforcement learning (RL) to approximate dynamic oracles for transition systems where exact dynamic oracles are difficult to derive.

Dependency Parsing Imitation Learning +4

Sequence-to-Sequence Models for Data-to-Text Natural Language Generation: Word- vs. Character-based Processing and Output Diversity

no code implementations WS 2018 Glorianna Jagfeld, Sabrina Jenne, Ngoc Thang Vu

We present a comparison of word-based and character-based sequence-to-sequence models for data-to-text natural language generation, which generate natural language descriptions for structured inputs.

Text Generation

Comparing Attention-based Convolutional and Recurrent Neural Networks: Success and Limitations in Machine Reading Comprehension

1 code implementation CONLL 2018 Matthias Blohm, Glorianna Jagfeld, Ekta Sood, Xiang Yu, Ngoc Thang Vu

We propose a machine reading comprehension model based on the compare-aggregate framework with two-staged attention that achieves state-of-the-art results on the MovieQA question answering dataset.

Machine Reading Comprehension Question Answering

Densely Connected Convolutional Networks for Speech Recognition

no code implementations 10 Aug 2018 Chia Yu Li, Ngoc Thang Vu

This paper presents our latest investigation on Densely Connected Convolutional Networks (DenseNets) for acoustic modelling (AM) in automatic speech recognition.

Acoustic Modelling Automatic Speech Recognition +2

Unsupervised Domain Adaptation by Adversarial Learning for Robust Speech Recognition

no code implementations 30 Jul 2018 Pavel Denisov, Ngoc Thang Vu, Marc Ferras Font

In this paper, we investigate the use of adversarial learning for unsupervised adaptation to unseen recording conditions, more specifically, single microphone far-field speech.

Robust Speech Recognition speech-recognition +1

Low-Resource Text Classification using Domain-Adversarial Learning

no code implementations 13 Jul 2018 Daniel Grießhaber, Ngoc Thang Vu, Johannes Maucher

Deep learning techniques have recently been shown to be successful in many natural language processing tasks, forming state-of-the-art systems.

General Classification text-classification +1

Effects of Word Embeddings on Neural Network-based Pitch Accent Detection

no code implementations 14 May 2018 Sabrina Stehwien, Ngoc Thang Vu, Antje Schweitzer

Pitch accent detection often makes use of both acoustic and lexical features based on the fact that pitch accents tend to correlate with certain words.

Cross-corpus Word Embeddings

Investigations on End-to-End Audiovisual Fusion

no code implementations 30 Apr 2018 Michael Wand, Ngoc Thang Vu, Juergen Schmidhuber

Audiovisual speech recognition (AVSR) is a method to alleviate the adverse effect of noise in the acoustic signal.

speech-recognition Speech Recognition

Introducing two Vietnamese Datasets for Evaluating Semantic Models of (Dis-)Similarity and Relatedness

no code implementations NAACL 2018 Kim Anh Nguyen, Sabine Schulte im Walde, Ngoc Thang Vu

We present two novel datasets for the low-resource language Vietnamese to assess models of semantic similarity: ViCon comprises pairs of synonyms and antonyms across word classes, thus offering data to distinguish between similarity and dissimilarity.

Semantic Similarity Semantic Textual Similarity +1

Cross-lingual and Multilingual Speech Emotion Recognition on English and French

no code implementations 1 Mar 2018 Michael Neumann, Ngoc Thang Vu

Research on multilingual speech emotion recognition faces the problem that most available speech corpora differ from each other in important ways, such as annotation methods or interaction scenarios.

Speech Emotion Recognition

Syntactic and Semantic Features For Code-Switching Factored Language Models

no code implementations 4 Oct 2017 Heike Adel, Ngoc Thang Vu, Katrin Kirchhoff, Dominic Telaar, Tanja Schultz

The experimental results reveal that Brown word clusters, part-of-speech tags and open-class words are the most effective at reducing the perplexity of factored language models on the Mandarin-English Code-Switching corpus SEAME.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Improving coreference resolution with automatically predicted prosodic information

no code implementations WS 2017 Ina Rösiger, Sabrina Stehwien, Arndt Riester, Ngoc Thang Vu

Adding manually annotated prosodic information, specifically pitch accents and phrasing, to the typical text-based feature set for coreference resolution has previously been shown to have a positive effect on German data.

coreference-resolution

Encoding Word Confusion Networks with Recurrent Neural Networks for Dialog State Tracking

no code implementations WS 2017 Glorianna Jagfeld, Ngoc Thang Vu

This paper presents our novel method to encode word confusion networks, which can represent a rich hypothesis space of automatic speech recognition systems, via recurrent neural networks.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

A General-Purpose Tagger with Convolutional Neural Networks

1 code implementation WS 2017 Xiang Yu, Agnieszka Faleńska, Ngoc Thang Vu

We present a general-purpose tagger based on convolutional neural networks (CNN), used for both composing word vectors and encoding context information.

Morphological Tagging Part-Of-Speech Tagging

Prosodic Event Recognition using Convolutional Neural Networks with Context Information

no code implementations 2 Jun 2017 Sabrina Stehwien, Ngoc Thang Vu

This paper demonstrates the potential of convolutional neural networks (CNN) for detecting and classifying prosodic events on words, specifically pitch accents and phrase boundary tones, from frame-based acoustic features.

Position

Character Composition Model with Convolutional Neural Networks for Dependency Parsing on Morphologically Rich Languages

1 code implementation ACL 2017 Xiang Yu, Ngoc Thang Vu

We present a transition-based dependency parser that uses a convolutional neural network to compose word representations from characters.

Dependency Parsing Word Embeddings

Challenges of Computational Processing of Code-Switching

no code implementations WS 2016 Özlem Çetinoğlu, Sarah Schulz, Ngoc Thang Vu

This paper addresses challenges of Natural Language Processing (NLP) on non-canonical multilingual data in which two or more languages are mixed.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +7

Sequential Convolutional Neural Networks for Slot Filling in Spoken Language Understanding

no code implementations 24 Jun 2016 Ngoc Thang Vu

We investigate the usage of convolutional neural networks (CNNs) for the slot filling task in spoken language understanding.

General Classification slot-filling +2

Integrating Distributional Lexical Contrast into Word Embeddings for Antonym-Synonym Distinction

no code implementations ACL 2016 Kim Anh Nguyen, Sabine Schulte im Walde, Ngoc Thang Vu

We propose a novel vector representation that integrates lexical contrast into distributional vectors and strengthens the most salient features for determining degrees of word similarity.

Word Embeddings Word Similarity

Combining Recurrent and Convolutional Neural Networks for Relation Classification

no code implementations NAACL 2016 Ngoc Thang Vu, Heike Adel, Pankaj Gupta, Hinrich Schütze

This paper investigates two different neural architectures for the task of relation classification: convolutional neural networks and recurrent neural networks.

Classification General Classification +2
