1 code implementation • 26 Mar 2024 • Pascal Tilli, Ngoc Thang Vu
In this work, we introduce an interpretable approach for graph-based VQA and demonstrate competitive performance on the GQA dataset.
no code implementations • 26 Oct 2023 • Florian Lux, Pascal Tilli, Sarina Meyer, Ngoc Thang Vu
Customizing voice and speaking style in a speech synthesis system with intuitive and fine-grained controls is challenging, given that little data with appropriate labels is available.
1 code implementation • 13 Oct 2022 • Sarina Meyer, Pascal Tilli, Pavel Denisov, Florian Lux, Julia Koch, Ngoc Thang Vu
To protect the privacy of speech data, speaker anonymization aims to hide the identity of a speaker by changing the voice in speech recordings.
1 code implementation • 11 Jul 2022 • Sarina Meyer, Florian Lux, Pavel Denisov, Julia Koch, Pascal Tilli, Ngoc Thang Vu
In this work, we propose a speaker anonymization pipeline that leverages high-quality automatic speech recognition and synthesis systems to generate speech conditioned on phonetic transcriptions and anonymized speaker embeddings.
1 code implementation • EMNLP (ACL) 2021 • Dirk Väth, Pascal Tilli, Ngoc Thang Vu
On the way towards general Visual Question Answering (VQA) systems that are able to answer arbitrary questions, the need arises for evaluation beyond single-metric leaderboards for specific datasets.