Search Results for author: Gabriel Skantze

Found 34 papers, 10 papers with code

Annotation of Communicative Functions of Short Feedback Tokens in Switchboard

no code implementations • LREC 2022 • Carol Figueroa, Adaeze Adigwe, Magalie Ochs, Gabriel Skantze

There has been a lot of work on predicting the timing of feedback in conversational systems.

How “open” are the conversations with open-domain chatbots? A proposal for Speech Event based evaluation

no code implementations • SIGDIAL (ACL) 2021 • A. Seza Doğruöz, Gabriel Skantze

To clarify the boundaries of “openness”, we conduct two studies: First, we classify the types of “speech events” encountered in a chatbot evaluation data set (i.e., Meena by Google) and find that these conversations mainly cover the “small talk” category and exclude the other speech event categories encountered in real life human-human communication.

Chatbot

Projection of Turn Completion in Incremental Spoken Dialogue Systems

no code implementations • SIGDIAL (ACL) 2021 • Erik Ekstedt, Gabriel Skantze

The ability to take turns in a fluent way (i.e., without long response delays or frequent interruptions) is a fundamental aspect of any spoken dialog system.

Language Modelling speech-recognition +2

Multilingual Turn-taking Prediction Using Voice Activity Projection

no code implementations • 11 Mar 2024 • Koji Inoue, Bing'er Jiang, Erik Ekstedt, Tatsuya Kawahara, Gabriel Skantze

The results show that a monolingual VAP model trained on one language does not make good predictions when applied to other languages.

An Analysis of User Behaviors for Objectively Evaluating Spoken Dialogue Systems

no code implementations • 10 Jan 2024 • Koji Inoue, Divesh Lala, Keiko Ochi, Tatsuya Kawahara, Gabriel Skantze

To address this issue, we propose a framework for indirectly but objectively evaluating systems based on users' behaviors.

Spoken Dialogue Systems

Real-time and Continuous Turn-taking Prediction Using Voice Activity Projection

1 code implementation • 10 Jan 2024 • Koji Inoue, Bing'er Jiang, Erik Ekstedt, Tatsuya Kawahara, Gabriel Skantze

A demonstration of a real-time and continuous turn-taking prediction system is presented.

Resolving References in Visually-Grounded Dialogue via Text Generation

1 code implementation • 23 Sep 2023 • Bram Willemsen, Livia Qian, Gabriel Skantze

Vision-language models (VLMs) have been shown to be effective at image retrieval based on simple text queries, but text-image retrieval based on conversational input remains a challenge.

Image Retrieval Language Modelling +3

Collecting Visually-Grounded Dialogue with A Game Of Sorts

1 code implementation • LREC 2022 • Bram Willemsen, Dmytro Kalpakchi, Gabriel Skantze

We address these concerns by introducing a collaborative image ranking task, a grounded agreement game we call "A Game Of Sorts".

Coreference Resolution Image Retrieval +6

Towards Objective Evaluation of Socially-Situated Conversational Robots: Assessing Human-Likeness through Multimodal User Behaviors

no code implementations • 21 Aug 2023 • Koji Inoue, Divesh Lala, Keiko Ochi, Tatsuya Kawahara, Gabriel Skantze

This paper tackles the challenging task of evaluating socially situated conversational robots and presents a novel objective evaluation approach that relies on multimodal user behaviors.

Using Large Language Models for Zero-Shot Natural Language Generation from Knowledge Graphs

1 code implementation • 14 Jul 2023 • Agnes Axelsson, Gabriel Skantze

In any system that uses structured knowledge graph (KG) data as its underlying knowledge representation, KG-to-text generation is a useful tool for turning parts of the graph data into text that can be understood by humans.

KG-to-Text Generation Knowledge Graphs +1

Automatic Evaluation of Turn-taking Cues in Conversational Speech Synthesis

no code implementations • 29 May 2023 • Erik Ekstedt, Siyang Wang, Éva Székely, Joakim Gustafson, Gabriel Skantze

Turn-taking is a fundamental aspect of human communication where speakers convey their intention to either hold, or yield, their turn through prosodic cues.

Speech Synthesis

Response-conditioned Turn-taking Prediction

no code implementations • 3 May 2023 • Bing'er Jiang, Erik Ekstedt, Gabriel Skantze

Treating the turn-prediction and response-ranking as a one-stage process, our findings suggest that our model can be used as an incremental response ranker, which can be applied in various settings.

Response Generation

What makes a good pause? Investigating the turn-holding effects of fillers

no code implementations • 3 May 2023 • Bing'er Jiang, Erik Ekstedt, Gabriel Skantze

Filled pauses (or fillers), such as "uh" and "um", are frequent in spontaneous speech and can serve as a turn-holding cue for the listener, indicating that the current speaker is not done yet.

Position

The Open-domain Paradox for Chatbots: Common Ground as the Basis for Human-like Dialogue

no code implementations • 21 Mar 2023 • Gabriel Skantze, A. Seza Doğruöz

There is a surge in interest in the development of open-domain chatbots, driven by the recent advancements of large language models.

Position

How "open" are the conversations with open-domain chatbots? A proposal for Speech Event based evaluation

no code implementations • 24 Nov 2022 • A. Seza Doğruöz, Gabriel Skantze

To clarify the boundaries of "openness", we conduct two studies: First, we classify the types of "speech events" encountered in a chatbot evaluation data set (i.e., Meena by Google) and find that these conversations mainly cover the "small talk" category and exclude the other speech event categories encountered in real life human-human communication.

Chatbot

How Much Does Prosody Help Turn-taking? Investigations using Voice Activity Projection Models

2 code implementations • SIGDIAL (ACL) 2022 • Erik Ekstedt, Gabriel Skantze

Turn-taking is a fundamental aspect of human communication and can be described as the ability to take turns, project upcoming turn shifts, and supply backchannels at appropriate locations throughout a conversation.

Voice Activity Projection: Self-supervised Learning of Turn-taking Events

3 code implementations • 19 May 2022 • Erik Ekstedt, Gabriel Skantze

The modeling of turn-taking in dialog can be viewed as the modeling of the dynamics of voice activity of the interlocutors.

Self-Supervised Learning

CoLLIE: Continual Learning of Language Grounding from Language-Image Embeddings

1 code implementation • 15 Nov 2021 • Gabriel Skantze, Bram Willemsen

This paper presents CoLLIE: a simple, yet effective model for continual learning of how language is grounded in vision.

Continual Learning Few-Shot Learning

TurnGPT: a Transformer-based Language Model for Predicting Turn-taking in Spoken Dialog

1 code implementation • Findings of the Association for Computational Linguistics 2020 • Erik Ekstedt, Gabriel Skantze

Syntactic and pragmatic completeness is known to be important for turn-taking prediction, but so far machine learning models of turn-taking have used such linguistic information in a limited way.

Language Modelling

Modelling Adaptive Presentations in Human-Robot Interaction using Behaviour Trees

no code implementations • WS 2019 • Nils Axelsson, Gabriel Skantze

In dialogue, speakers continuously adapt their speech to accommodate the listener, based on the feedback they receive.

Using Lexical Alignment and Referring Ability to Address Data Sparsity in Situated Dialog Reference Resolution

no code implementations • EMNLP 2018 • Todd Shore, Gabriel Skantze

Referring to entities in situated dialog is a collaborative process, whereby interlocutors often expand, repair and/or replace referring expressions in an iterative process, converging on conceptual pacts of referring language use in doing so.

Referring Expression

Multimodal Continuous Turn-Taking Prediction Using Multiscale RNNs

1 code implementation • 31 Aug 2018 • Matthew Roddy, Gabriel Skantze, Naomi Harte

To design spoken dialog systems that can conduct fluid interactions it is desirable to incorporate cues from separate modalities into turn-taking models.

Investigating Speech Features for Continuous Turn-Taking Prediction Using LSTMs

1 code implementation • 29 Jun 2018 • Matthew Roddy, Gabriel Skantze, Naomi Harte

The continuous predictions represent generalized turn-taking behaviors observed in the training data and can be applied to make decisions that are not just limited to end-of-turn detection.

A Multimodal Corpus for Mutual Gaze and Joint Attention in Multiparty Situated Interaction

no code implementations • LREC 2018 • Dimosthenis Kontogiorgos, Vanya Avramova, Simon Alexanderson, Patrik Jonell, Catharine Oertel, Jonas Beskow, Gabriel Skantze, Joakim Gustafson

Mutual Gaze

KTH Tangrams: A Dataset for Research on Alignment and Conceptual Pacts in Task-Oriented Dialogue

no code implementations • LREC 2018 • Todd Shore, Theofronia Androulakaki, Gabriel Skantze

Towards a General, Continuous Model of Turn-taking in Spoken Dialogue using LSTM Recurrent Neural Networks

no code implementations • WS 2017 • Gabriel Skantze

Previous models of turn-taking have mostly been trained for specific turn-taking decisions, such as discriminating between turn shifts and turn retention in pauses.

Feature Engineering Spoken Dialogue Systems