Search Results for author: Tatsuya Kawahara

Found 55 papers, 12 papers with code

An Attentive Listening System with Android ERICA: Comparison of Autonomous and WOZ Interactions

no code implementations SIGDIAL (ACL) 2020 Koji Inoue, Divesh Lala, Kenta Yamamoto, Shizuka Nakamura, Katsuya Takanashi, Tatsuya Kawahara

The proposed system generates several types of listener responses: backchannels, repeats, elaborating questions, assessments, generic sentimental responses, and generic responses.

Dialogue Understanding

A multi-party attentive listening robot which stimulates involvement from side participants

no code implementations SIGDIAL (ACL) 2021 Koji Inoue, Hiromi Sakamoto, Kenta Yamamoto, Divesh Lala, Tatsuya Kawahara

We demonstrate the moderating abilities of a multi-party attentive listening robot system when multiple people are speaking in turns.

Simultaneous Job Interview System Using Multiple Semi-autonomous Agents

no code implementations SIGDIAL (ACL) 2022 Haruki Kawai, Yusuke Muraki, Kenta Yamamoto, Divesh Lala, Koji Inoue, Tatsuya Kawahara

We propose a simultaneous job interview system, where one interviewer can conduct one-on-one interviews with multiple applicants simultaneously by cooperating with multiple autonomous job interview dialogue systems.

Dialogue Understanding Keyword Extraction +1

Multilingual Turn-taking Prediction Using Voice Activity Projection

no code implementations 11 Mar 2024 Koji Inoue, Bing'er Jiang, Erik Ekstedt, Tatsuya Kawahara, Gabriel Skantze

The results show that a monolingual VAP model trained on one language does not make good predictions when applied to other languages.

Evaluation of a semi-autonomous attentive listening system with takeover prompting

no code implementations 21 Feb 2024 Haruki Kawai, Divesh Lala, Koji Inoue, Keiko Ochi, Tatsuya Kawahara

To this end, we propose a semi-autonomous system, where a remote operator can take control of an autonomous attentive listening system in real time.

Spoken Dialogue Systems

MOS-FAD: Improving Fake Audio Detection Via Automatic Mean Opinion Score Prediction

no code implementations 24 Jan 2024 Wangjin Zhou, Zhengdong Yang, Chenhui Chu, Sheng Li, Raj Dabre, Yi Zhao, Tatsuya Kawahara

We propose MOS-FAD, where MOS can be leveraged at two key points in FAD: training data selection and model fusion.

FAD
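The snippet above names two points where MOS prediction is leveraged in fake audio detection; the first, training data selection, can be sketched as a simple threshold filter. This is an illustrative sketch, not the authors' implementation: `predict_mos`, the sample format, and the threshold value are all hypothetical stand-ins for a trained MOS predictor.

```python
def select_training_data(samples, predict_mos, threshold=3.0):
    """Keep only samples whose predicted mean opinion score passes a threshold."""
    return [s for s in samples if predict_mos(s) >= threshold]

# Toy usage: a stand-in MOS predictor that reads a stored quality field.
samples = [{"id": 1, "quality": 4.2}, {"id": 2, "quality": 2.1}, {"id": 3, "quality": 3.5}]
kept = select_training_data(samples, predict_mos=lambda s: s["quality"])
print([s["id"] for s in kept])  # -> [1, 3]
```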

Enhancing Personality Recognition in Dialogue by Data Augmentation and Heterogeneous Conversational Graph Networks

1 code implementation 11 Jan 2024 Yahui Fu, Haiyue Song, Tianyu Zhao, Tatsuya Kawahara

Personality recognition is useful for enhancing robots' ability to tailor user-adaptive responses, thus fostering rich human-robot interactions.

Data Augmentation

Real-time and Continuous Turn-taking Prediction Using Voice Activity Projection

1 code implementation 10 Jan 2024 Koji Inoue, Bing'er Jiang, Erik Ekstedt, Tatsuya Kawahara, Gabriel Skantze

A demonstration of a real-time and continuous turn-taking prediction system is presented.
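A voice activity projection (VAP) model predicts the future voice activity of both speakers. As a hedged illustration of how such projections can drive a turn-taking decision (the function, data shapes, and numbers are hypothetical, not the released system), the decision can be reduced to comparing each speaker's projected activity over a future window:

```python
def predict_next_speaker(vap_probs):
    """vap_probs: one list per speaker of projected voice-activity
    probabilities over future time bins; returns the index of the
    speaker with the larger total projected activity."""
    totals = [sum(bins) for bins in vap_probs]
    return max(range(len(totals)), key=lambda i: totals[i])

# Toy projection: speaker 1 is projected to become more active than speaker 0.
next_speaker = predict_next_speaker([[0.9, 0.4, 0.1, 0.0],
                                     [0.1, 0.5, 0.8, 0.9]])
print(next_speaker)  # -> 1
```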

An Analysis of User Behaviors for Objectively Evaluating Spoken Dialogue Systems

no code implementations 10 Jan 2024 Koji Inoue, Divesh Lala, Keiko Ochi, Tatsuya Kawahara, Gabriel Skantze

To address this issue, we propose a framework for indirectly but objectively evaluating systems based on users' behaviors.

Spoken Dialogue Systems

Towards Objective Evaluation of Socially-Situated Conversational Robots: Assessing Human-Likeness through Multimodal User Behaviors

no code implementations 21 Aug 2023 Koji Inoue, Divesh Lala, Keiko Ochi, Tatsuya Kawahara, Gabriel Skantze

This paper tackles the challenging task of evaluating socially situated conversational robots and presents a novel objective evaluation approach that relies on multimodal user behaviors.

Non-autoregressive Error Correction for CTC-based ASR with Phone-conditioned Masked LM

1 code implementation 8 Sep 2022 Hayato Futami, Hirofumi Inaguma, Sei Ueno, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara

Connectionist temporal classification (CTC)-based models are attractive in automatic speech recognition (ASR) because of their non-autoregressive nature.

Automatic Speech Recognition (ASR) +3
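The non-autoregressive nature of CTC comes from its frame-wise outputs and collapse rule: repeated labels are merged and blank symbols removed, so all frames can be emitted in parallel. A minimal sketch of the standard collapse rule used in greedy CTC decoding:

```python
def ctc_collapse(frame_labels, blank="_"):
    """Greedy CTC decoding: merge consecutive repeats, then drop blanks."""
    out = []
    prev = None
    for lab in frame_labels:
        if lab != prev and lab != blank:
            out.append(lab)
        prev = lab
    return "".join(out)

print(ctc_collapse(list("hh_e_ll_lo_")))  # -> hello
```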

Distilling the Knowledge of BERT for CTC-based ASR

no code implementations 5 Sep 2022 Hayato Futami, Hirofumi Inaguma, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara

In this study, we propose to distill the knowledge of BERT for CTC-based ASR, extending our previous study for attention-based ASR.

Automatic Speech Recognition (ASR) +2

End-to-end Speech-to-Punctuated-Text Recognition

no code implementations 7 Jul 2022 Jumon Nozaki, Tatsuya Kawahara, Kenkichi Ishizuka, Taiichi Hashimoto

We also propose to incorporate an auxiliary loss to train the model using the output of the intermediate layer and unpunctuated texts.

Automatic Speech Recognition (ASR) +3
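An auxiliary loss on an intermediate layer, as described in the snippet above, is typically combined with the main loss as a weighted sum. The weight value and argument names below are illustrative assumptions, not the paper's reported configuration:

```python
def total_loss(final_punctuated_loss, intermediate_unpunctuated_loss, aux_weight=0.3):
    """Combine the main loss (punctuated targets, final layer) with an
    auxiliary loss (unpunctuated targets, intermediate layer)."""
    return (1 - aux_weight) * final_punctuated_loss + aux_weight * intermediate_unpunctuated_loss

loss = total_loss(2.0, 3.0)
print(loss)  # -> 2.3
```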

ASR Rescoring and Confidence Estimation with ELECTRA

no code implementations 5 Oct 2021 Hayato Futami, Hirofumi Inaguma, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara

We propose an ASR rescoring method for directly detecting errors with ELECTRA, which was originally proposed as a pre-training method for NLP tasks.

Automatic Speech Recognition (ASR) +2

Non-autoregressive End-to-end Speech Translation with Parallel Autoregressive Rescoring

no code implementations 9 Sep 2021 Hirofumi Inaguma, Yosuke Higuchi, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe

We propose a unified NAR E2E-ST framework called Orthros, which has an NAR decoder and an auxiliary shallow AR decoder on top of the shared encoder.

Language Modelling Translation

VAD-free Streaming Hybrid CTC/Attention ASR for Unsegmented Recording

no code implementations 15 Jul 2021 Hirofumi Inaguma, Tatsuya Kawahara

In this work, we propose novel decoding algorithms to enable streaming automatic speech recognition (ASR) on unsegmented long-form recordings without voice activity detection (VAD), based on monotonic chunkwise attention (MoChA) with an auxiliary connectionist temporal classification (CTC) objective.

Action Detection Activity Detection +3

ERICA: An Empathetic Android Companion for Covid-19 Quarantine

no code implementations SIGDIAL (ACL) 2021 Etsuko Ishii, Genta Indra Winata, Samuel Cahyawijaya, Divesh Lala, Tatsuya Kawahara, Pascale Fung

Over the past year, research in various domains, including Natural Language Processing (NLP), has been accelerated to fight against the COVID-19 pandemic, yet such research has just started on dialogue systems.

Alignment Knowledge Distillation for Online Streaming Attention-based Speech Recognition

no code implementations 28 Feb 2021 Hirofumi Inaguma, Tatsuya Kawahara

We compare CTC-ST with several methods that distill alignment knowledge from a hybrid ASR system and show that the CTC-ST can achieve a comparable tradeoff of accuracy and latency without relying on external alignment information.

Automatic Speech Recognition (ASR) +2

Topic-relevant Response Generation using Optimal Transport for an Open-domain Dialog System

no code implementations COLING 2020 Shuying Zhang, Tianyu Zhao, Tatsuya Kawahara

We propose a semantic constraint that encourages a response to be semantically related to its context by regularizing the decoding objective function with a semantic distance term.

Open-Domain Dialog Response Generation
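The semantic constraint above can be illustrated with a simplified rescoring rule: a candidate's decoding score is its log-probability minus a semantic-distance penalty to the context. This sketch uses plain cosine distance over bag-of-words vectors as a stand-in; the paper's actual formulation is based on optimal transport, and all numbers here are toy values.

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity between two sparse bag-of-words vectors."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return 1.0 - dot / (nu * nv) if nu and nv else 1.0

def constrained_score(log_prob, response_vec, context_vec, lam=1.0):
    """Decoding objective regularized by semantic distance to the context."""
    return log_prob - lam * cosine_distance(response_vec, context_vec)

context = {"weather": 1.0, "rain": 1.0}
on_topic = constrained_score(-2.0, {"rain": 1.0, "umbrella": 1.0}, context)
generic = constrained_score(-1.8, {"ok": 1.0, "sure": 1.0}, context)
print(on_topic > generic)  # -> True: the topical response outscores the likelier generic one
```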

Orthros: Non-autoregressive End-to-end Speech Translation with Dual-decoder

no code implementations 25 Oct 2020 Hirofumi Inaguma, Yosuke Higuchi, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe

Fast inference speed is an important goal towards real-world deployment of speech translation (ST) systems.

Translation

Multi-Referenced Training for Dialogue Response Generation

1 code implementation SIGDIAL (ACL) 2021 Tianyu Zhao, Tatsuya Kawahara

In this work, we first analyze the training objective of dialogue models from the view of Kullback-Leibler divergence (KLD) and show that the gap between the real world probability distribution and the single-referenced data's probability distribution prevents the model from learning the one-to-many relations efficiently.

Response Generation
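The KLD gap described above can be made concrete with a toy one-to-many example: measuring KL(p_real || q) against a single-referenced target distribution versus a multi-referenced one. The distributions below are illustrative numbers, not from the paper.

```python
import math

def kl(p, q, eps=1e-12):
    """KL divergence between discrete distributions given as dicts."""
    return sum(pw * math.log(pw / max(q.get(k, 0.0), eps))
               for k, pw in p.items() if pw > 0)

# A context admits three valid responses under the real-world distribution.
p_real = {"a": 0.5, "b": 0.3, "c": 0.2}
p_single = {"a": 1.0}                       # single-referenced data: one answer
p_multi = {"a": 0.5, "b": 0.3, "c": 0.2}    # multi-referenced data: full coverage

gap_single = kl(p_real, p_single)
gap_multi = kl(p_real, p_multi)
print(gap_multi < gap_single)  # -> True: multi-referenced targets close the KLD gap
```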

End-to-end Music-mixed Speech Recognition

1 code implementation 27 Aug 2020 Jeongwoo Woo, Masato Mimura, Kazuyoshi Yoshii, Tatsuya Kawahara

The time-domain separation method outperformed a frequency-domain separation method, which reuses the phase information of the input mixture signal, both in simple cascading and joint training settings.

Audio and Speech Processing

Distilling the Knowledge of BERT for Sequence-to-Sequence ASR

1 code implementation 9 Aug 2020 Hayato Futami, Hirofumi Inaguma, Sei Ueno, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara

Experimental evaluations show that our method significantly improves the ASR performance from the seq2seq baseline on the Corpus of Spontaneous Japanese (CSJ).

Automatic Speech Recognition (ASR) +3

Enhancing Monotonic Multihead Attention for Streaming ASR

1 code implementation 19 May 2020 Hirofumi Inaguma, Masato Mimura, Tatsuya Kawahara

For streaming inference, all monotonic attention (MA) heads should learn proper alignments because the next token is not generated until all heads detect the corresponding token boundaries.

Automatic Speech Recognition (ASR) +2

CTC-synchronous Training for Monotonic Attention Model

1 code implementation 10 May 2020 Hirofumi Inaguma, Masato Mimura, Tatsuya Kawahara

Monotonic chunkwise attention (MoChA) has been studied for the online streaming automatic speech recognition (ASR) based on a sequence-to-sequence framework.

Automatic Speech Recognition (ASR) +1

End-to-end speech-to-dialog-act recognition

no code implementations 23 Apr 2020 Viet-Trung Dang, Tianyu Zhao, Sei Ueno, Hirofumi Inaguma, Tatsuya Kawahara

In the proposed model, the dialog act recognition network is connected to an acoustic-to-word ASR model at its latent layer before the softmax layer, which provides a distributed representation of word-level ASR decoding information.

Automatic Speech Recognition (ASR) +2

Designing Precise and Robust Dialogue Response Evaluators

1 code implementation ACL 2020 Tianyu Zhao, Divesh Lala, Tatsuya Kawahara

Automatic dialogue response evaluators have been proposed as an alternative to automated metrics and human evaluation.

Semi-Supervised Multichannel Speech Enhancement With a Deep Speech Prior

1 code implementation IEEE/ACM Transactions on Audio, Speech, and Language Processing 2019 Kouhei Sekiguchi, Yoshiaki Bando, Aditya Arie Nugraha, Kazuyoshi Yoshii, Tatsuya Kawahara

To solve this problem, we replace a low-rank speech model with a deep generative speech model, i.e., we formulate a probabilistic model of noisy speech by integrating a deep speech model, a low-rank noise model, and a full-rank or rank-1 model of the spatial characteristics of speech and noise.

Speech Enhancement

Multilingual End-to-End Speech Translation

1 code implementation 1 Oct 2019 Hirofumi Inaguma, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe

In this paper, we propose a simple yet effective framework for multilingual end-to-end speech translation (ST), in which speech utterances in source languages are directly translated to the desired target languages with a universal sequence-to-sequence architecture.

Automatic Speech Recognition (ASR) +4

Improving OOV Detection and Resolution with External Language Models in Acoustic-to-Word ASR

no code implementations 22 Sep 2019 Hirofumi Inaguma, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara

Moreover, the A2C model can be used to recover out-of-vocabulary (OOV) words that are not covered by the A2W model, but this requires accurate detection of OOV words.

Automatic Speech Recognition (ASR) +1

Effective Incorporation of Speaker Information in Utterance Encoding in Dialog

no code implementations 12 Jul 2019 Tianyu Zhao, Tatsuya Kawahara

In dialog studies, we often encode a dialog using a hierarchical encoder where each utterance is converted into an utterance vector, and then a sequence of utterance vectors is converted into a dialog vector.

Response Generation

Content Word-based Sentence Decoding and Evaluating for Open-domain Neural Response Generation

no code implementations 31 May 2019 Tianyu Zhao, Shinsuke Mori, Tatsuya Kawahara

Various encoder-decoder models have been applied to response generation in open-domain dialogs, but a majority of conventional models directly learn a mapping from lexical input to lexical output without explicitly modeling intermediate representations.

Response Generation Sentence

Transfer learning of language-independent end-to-end ASR with language model fusion

no code implementations 6 Nov 2018 Hirofumi Inaguma, Jaejin Cho, Murali Karthick Baskar, Tatsuya Kawahara, Shinji Watanabe

This work explores better adaptation methods to low-resource languages using an external language model (LM) under the framework of transfer learning.

Language Modelling Transfer Learning

Statistical Speech Enhancement Based on Probabilistic Integration of Variational Autoencoder and Non-Negative Matrix Factorization

no code implementations 31 Oct 2017 Yoshiaki Bando, Masato Mimura, Katsutoshi Itoyama, Kazuyoshi Yoshii, Tatsuya Kawahara

This paper presents a statistical method of single-channel speech enhancement that uses a variational autoencoder (VAE) as a prior distribution on clean speech.

Speech Enhancement

Automatic Speech Recognition Errors as a Predictor of L2 Listening Difficulties

no code implementations WS 2016 Maryam Sadat Mirzaei, Kourosh Meshgi, Tatsuya Kawahara

To improve the choice of words in this system, and to explore a better method of detecting speech challenges, ASR errors were investigated as a model of the L2 listener, hypothesizing that some of these errors are similar to those made by language learners when transcribing the videos.

Automatic Speech Recognition (ASR) +2
