Search Results for author: Gabriel Skantze

Found 34 papers, 10 papers with code

How “open” are the conversations with open-domain chatbots? A proposal for Speech Event based evaluation

no code implementations SIGDIAL (ACL) 2021 A. Seza Doğruöz, Gabriel Skantze

To clarify the boundaries of “openness”, we conduct two studies: First, we classify the types of “speech events” encountered in a chatbot evaluation data set (i.e., Meena by Google) and find that these conversations mainly cover the “small talk” category and exclude the other speech event categories encountered in real-life human-human communication.

Chatbot

Projection of Turn Completion in Incremental Spoken Dialogue Systems

no code implementations SIGDIAL (ACL) 2021 Erik Ekstedt, Gabriel Skantze

The ability to take turns in a fluent way (i.e., without long response delays or frequent interruptions) is a fundamental aspect of any spoken dialog system.

Language Modelling speech-recognition +2

Multilingual Turn-taking Prediction Using Voice Activity Projection

no code implementations 11 Mar 2024 Koji Inoue, Bing'er Jiang, Erik Ekstedt, Tatsuya Kawahara, Gabriel Skantze

The results show that a monolingual VAP model trained on one language does not make good predictions when applied to other languages.

An Analysis of User Behaviors for Objectively Evaluating Spoken Dialogue Systems

no code implementations 10 Jan 2024 Koji Inoue, Divesh Lala, Keiko Ochi, Tatsuya Kawahara, Gabriel Skantze

To address this issue, we propose a framework for indirectly but objectively evaluating systems based on users' behaviors.

Spoken Dialogue Systems

Real-time and Continuous Turn-taking Prediction Using Voice Activity Projection

1 code implementation 10 Jan 2024 Koji Inoue, Bing'er Jiang, Erik Ekstedt, Tatsuya Kawahara, Gabriel Skantze

A demonstration of a real-time and continuous turn-taking prediction system is presented.

Resolving References in Visually-Grounded Dialogue via Text Generation

1 code implementation 23 Sep 2023 Bram Willemsen, Livia Qian, Gabriel Skantze

Vision-language models (VLMs) have been shown to be effective at image retrieval based on simple text queries, but text-image retrieval based on conversational input remains a challenge.

Image Retrieval Language Modelling +3

Collecting Visually-Grounded Dialogue with A Game Of Sorts

1 code implementation LREC 2022 Bram Willemsen, Dmytro Kalpakchi, Gabriel Skantze

We address these concerns by introducing a collaborative image ranking task, a grounded agreement game we call "A Game Of Sorts".

Coreference Resolution Image Retrieval +6

Towards Objective Evaluation of Socially-Situated Conversational Robots: Assessing Human-Likeness through Multimodal User Behaviors

no code implementations 21 Aug 2023 Koji Inoue, Divesh Lala, Keiko Ochi, Tatsuya Kawahara, Gabriel Skantze

This paper tackles the challenging task of evaluating socially situated conversational robots and presents a novel objective evaluation approach that relies on multimodal user behaviors.

Using Large Language Models for Zero-Shot Natural Language Generation from Knowledge Graphs

1 code implementation 14 Jul 2023 Agnes Axelsson, Gabriel Skantze

In any system that uses structured knowledge graph (KG) data as its underlying knowledge representation, KG-to-text generation is a useful tool for turning parts of the graph data into text that can be understood by humans.

KG-to-Text Generation Knowledge Graphs +1

Automatic Evaluation of Turn-taking Cues in Conversational Speech Synthesis

no code implementations 29 May 2023 Erik Ekstedt, Siyang Wang, Éva Székely, Joakim Gustafson, Gabriel Skantze

Turn-taking is a fundamental aspect of human communication where speakers convey their intention to either hold, or yield, their turn through prosodic cues.

Speech Synthesis

Response-conditioned Turn-taking Prediction

no code implementations 3 May 2023 Bing'er Jiang, Erik Ekstedt, Gabriel Skantze

Treating the turn-prediction and response-ranking as a one-stage process, our findings suggest that our model can be used as an incremental response ranker, which can be applied in various settings.

Response Generation

What makes a good pause? Investigating the turn-holding effects of fillers

no code implementations 3 May 2023 Bing'er Jiang, Erik Ekstedt, Gabriel Skantze

Filled pauses (or fillers), such as "uh" and "um", are frequent in spontaneous speech and can serve as a turn-holding cue for the listener, indicating that the current speaker is not done yet.

Position

The Open-domain Paradox for Chatbots: Common Ground as the Basis for Human-like Dialogue

no code implementations 21 Mar 2023 Gabriel Skantze, A. Seza Doğruöz

There is a surge in interest in the development of open-domain chatbots, driven by the recent advancements of large language models.

Position

How "open" are the conversations with open-domain chatbots? A proposal for Speech Event based evaluation

no code implementations 24 Nov 2022 A. Seza Doğruöz, Gabriel Skantze

To clarify the boundaries of "openness", we conduct two studies: First, we classify the types of "speech events" encountered in a chatbot evaluation data set (i.e., Meena by Google) and find that these conversations mainly cover the "small talk" category and exclude the other speech event categories encountered in real-life human-human communication.

Chatbot

How Much Does Prosody Help Turn-taking? Investigations using Voice Activity Projection Models

2 code implementations SIGDIAL (ACL) 2022 Erik Ekstedt, Gabriel Skantze

Turn-taking is a fundamental aspect of human communication and can be described as the ability to take turns, project upcoming turn shifts, and supply backchannels at appropriate locations throughout a conversation.

Voice Activity Projection: Self-supervised Learning of Turn-taking Events

3 code implementations 19 May 2022 Erik Ekstedt, Gabriel Skantze

The modeling of turn-taking in dialog can be viewed as the modeling of the dynamics of voice activity of the interlocutors.

Self-Supervised Learning

CoLLIE: Continual Learning of Language Grounding from Language-Image Embeddings

1 code implementation 15 Nov 2021 Gabriel Skantze, Bram Willemsen

This paper presents CoLLIE: a simple, yet effective model for continual learning of how language is grounded in vision.

Continual Learning Few-Shot Learning

TurnGPT: a Transformer-based Language Model for Predicting Turn-taking in Spoken Dialog

1 code implementation Findings of the Association for Computational Linguistics 2020 Erik Ekstedt, Gabriel Skantze

Syntactic and pragmatic completeness is known to be important for turn-taking prediction, but so far machine learning models of turn-taking have used such linguistic information in a limited way.

Language Modelling

Modelling Adaptive Presentations in Human-Robot Interaction using Behaviour Trees

no code implementations WS 2019 Nils Axelsson, Gabriel Skantze

In dialogue, speakers continuously adapt their speech to accommodate the listener, based on the feedback they receive.

Using Lexical Alignment and Referring Ability to Address Data Sparsity in Situated Dialog Reference Resolution

no code implementations EMNLP 2018 Todd Shore, Gabriel Skantze

Referring to entities in situated dialog is a collaborative process, whereby interlocutors often expand, repair and/or replace referring expressions in an iterative process, converging on conceptual pacts of referring language use in doing so.

Referring Expression

Multimodal Continuous Turn-Taking Prediction Using Multiscale RNNs

1 code implementation 31 Aug 2018 Matthew Roddy, Gabriel Skantze, Naomi Harte

To design spoken dialog systems that can conduct fluid interactions it is desirable to incorporate cues from separate modalities into turn-taking models.

Investigating Speech Features for Continuous Turn-Taking Prediction Using LSTMs

1 code implementation 29 Jun 2018 Matthew Roddy, Gabriel Skantze, Naomi Harte

The continuous predictions represent generalized turn-taking behaviors observed in the training data and can be applied to make decisions that are not just limited to end-of-turn detection.

Towards a General, Continuous Model of Turn-taking in Spoken Dialogue using LSTM Recurrent Neural Networks

no code implementations WS 2017 Gabriel Skantze

Previous models of turn-taking have mostly been trained for specific turn-taking decisions, such as discriminating between turn shifts and turn retention in pauses.

Feature Engineering Spoken Dialogue Systems
