no code implementations • LREC 2022 • Nobukatsu Hojo, Satoshi Kobashikawa, Saki Mizuno, Ryo Masumura
To investigate SPNOP, a corpus with various psychological measurements is beneficial because the interaction process of negotiation relates to many aspects of psychology.
no code implementations • COLING 2022 • Mana Ihori, Hiroshi Sato, Tomohiro Tanaka, Ryo Masumura
To model the task, we design a novel Japanese multi-perspective document revision dataset that simultaneously handles seven perspectives to improve the readability and clarity of a document.
no code implementations • ICCV 2023 • Satoshi Suzuki, Shin'ya Yamaguchi, Shoichiro Takeda, Sekitoshi Kanai, Naoki Makishima, Atsushi Ando, Ryo Masumura
This paper addresses the tradeoff between standard accuracy on clean examples and robustness against adversarial examples in deep neural networks (DNNs).
no code implementations • 4 Jun 2023 • Ryo Masumura, Naoki Makishima, Taiga Yamane, Yoshihiko Yamazaki, Saki Mizuno, Mana Ihori, Mihiro Uchida, Keita Suzuki, Hiroshi Sato, Tomohiro Tanaka, Akihiko Takashima, Satoshi Suzuki, Takafumi Moriya, Nobukatsu Hojo, Atsushi Ando
Target-speaker ASR systems are a promising way to transcribe only a target speaker's speech by enrolling the target speaker's information.
Automatic Speech Recognition (ASR) +1
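As a rough illustration of the enrollment idea above, here is a minimal PyTorch sketch in which a speaker vector derived from the enrollment utterance gates the mixture encoding. Module names and dimensions are invented for illustration, not taken from the paper.

```python
import torch
import torch.nn as nn

class ToyTargetSpeakerEncoder(nn.Module):
    """Toy ASR encoder conditioned on a target-speaker embedding.

    The enrollment utterance is summarized into a fixed vector, which
    gates the mixture features so the decoder focuses on one speaker.
    """
    def __init__(self, feat_dim=80, hidden=256):
        super().__init__()
        self.spk_net = nn.GRU(feat_dim, hidden, batch_first=True)
        self.mix_net = nn.GRU(feat_dim, hidden, batch_first=True)
        self.gate = nn.Linear(hidden, hidden)

    def forward(self, mixture, enrollment):
        # Summarize the enrollment utterance into one speaker vector.
        _, spk_h = self.spk_net(enrollment)          # (1, B, H)
        spk_vec = spk_h[-1]                          # (B, H)
        mix_h, _ = self.mix_net(mixture)             # (B, T, H)
        # Element-wise gating biases the encoder toward the target speaker.
        return mix_h * torch.sigmoid(self.gate(spk_vec)).unsqueeze(1)

enc = ToyTargetSpeakerEncoder()
out = enc(torch.randn(2, 120, 80), torch.randn(2, 60, 80))
print(out.shape)  # torch.Size([2, 120, 256])
```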
no code implementations • 25 May 2023 • Takafumi Moriya, Takanori Ashihara, Hiroshi Sato, Kohei Matsuura, Tomohiro Tanaka, Ryo Masumura
Experiments on three datasets confirm that RNNT trained with our SS approach achieves the best ASR performance.
Automatic Speech Recognition (ASR) +3
no code implementations • 25 May 2023 • Takafumi Moriya, Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Takanori Ashihara, Kohei Matsuura, Tomohiro Tanaka, Ryo Masumura, Atsunori Ogawa, Taichi Asami
Neural transducer (RNNT)-based target-speaker speech recognition (TS-RNNT) directly transcribes a target speaker's voice from a multi-talker mixture.
no code implementations • 24 May 2023 • Hiroshi Sato, Ryo Masumura, Tsubasa Ochiai, Marc Delcroix, Takafumi Moriya, Takanori Ashihara, Kentaro Shinayama, Saki Mizuno, Mana Ihori, Tomohiro Tanaka, Nobukatsu Hojo
In this work, we propose a new SE training criterion that minimizes the distance between clean and enhanced signals in the feature representation of the SSL model to alleviate the mismatch.
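A minimal sketch of such a criterion, assuming a frozen SSL-style feature extractor; the tiny conv stack below is a stand-in for a real self-supervised speech model, not the authors' setup.

```python
import torch
import torch.nn as nn

def ssl_feature_loss(ssl_model: nn.Module,
                     enhanced: torch.Tensor,
                     clean: torch.Tensor) -> torch.Tensor:
    """Distance between SSL features of enhanced and clean signals.

    `ssl_model` stands in for a frozen self-supervised speech encoder;
    here any waveform-to-feature module works.
    """
    with torch.no_grad():              # target features from clean speech are fixed
        target = ssl_model(clean)
    estimate = ssl_model(enhanced)     # gradients flow back into the enhancer
    return torch.mean((estimate - target) ** 2)

# Toy stand-in for a frozen SSL encoder: strided conv over raw waveforms.
ssl = nn.Sequential(nn.Conv1d(1, 32, 10, stride=5), nn.ReLU())
clean = torch.randn(2, 1, 16000)
enhanced = (clean + 0.1 * torch.randn_like(clean)).requires_grad_(True)
loss = ssl_feature_loss(ssl, enhanced, clean)
loss.backward()
print(loss.item())
```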
no code implementations • 2 Mar 2023 • Kohei Matsuura, Takanori Ashihara, Takafumi Moriya, Tomohiro Tanaka, Atsunori Ogawa, Marc Delcroix, Ryo Masumura
The first technique is to utilize a text-to-speech (TTS) system to generate synthesized speech, which is used for E2E SSum training with the text summary.
Automatic Speech Recognition (ASR) +2
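A hypothetical version of the TTS-based augmentation loop described above; `tts_synthesize` and `SummaryPair` are invented placeholders, not the authors' code.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SummaryPair:
    transcript: str   # source text of a document
    summary: str      # its reference text summary

def tts_synthesize(text: str) -> List[float]:
    """Stand-in for a TTS system; returns a fake waveform."""
    return [0.0] * (100 * len(text))

def augment(pairs: List[SummaryPair]):
    """Turn text-only (transcript, summary) pairs into
    (synthesized speech, summary) pairs for E2E SSum training."""
    for p in pairs:
        speech = tts_synthesize(p.transcript)
        yield speech, p.summary

pairs = [SummaryPair("the meeting covered budgets", "budget meeting")]
for speech, summary in augment(pairs):
    print(len(speech), summary)
```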
1 code implementation • 28 Oct 2022 • Atsushi Ando, Ryo Masumura, Akihiko Takashima, Satoshi Suzuki, Naoki Makishima, Keita Suzuki, Takafumi Moriya, Takanori Ashihara, Hiroshi Sato
This paper investigates the effectiveness and implementation of modality-specific large-scale pre-trained encoders for multimodal sentiment analysis (MSA).
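A toy late-fusion sketch of modality-specific encoders for MSA; the linear stubs stand in for large pre-trained text and audio encoders, and all dimensions are illustrative rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class ToyMSAModel(nn.Module):
    """Late fusion of modality-specific encoders for sentiment regression.

    `text_enc` and `audio_enc` stand in for large pre-trained encoders;
    here they are tiny stubs with plausible output sizes.
    """
    def __init__(self, text_dim=768, audio_dim=512, hidden=128):
        super().__init__()
        self.text_enc = nn.Linear(300, text_dim)    # stub text encoder
        self.audio_enc = nn.Linear(80, audio_dim)   # stub audio encoder
        self.head = nn.Sequential(
            nn.Linear(text_dim + audio_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))                   # sentiment score

    def forward(self, text_feats, audio_feats):
        # Pool each modality to one utterance-level vector, then fuse.
        t = self.text_enc(text_feats).mean(dim=1)
        a = self.audio_enc(audio_feats).mean(dim=1)
        return self.head(torch.cat([t, a], dim=-1))

model = ToyMSAModel()
score = model(torch.randn(2, 20, 300), torch.randn(2, 100, 80))
print(score.shape)  # torch.Size([2, 1])
```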
no code implementations • 16 Jun 2022 • Hiroshi Sato, Tsubasa Ochiai, Marc Delcroix, Keisuke Kinoshita, Takafumi Moriya, Naoki Makishima, Mana Ihori, Tomohiro Tanaka, Ryo Masumura
Experimental validation reveals that both worst-enrollment target training and SI-loss training improve robustness against enrollment variations by increasing speaker discriminability.
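One possible reading of worst-enrollment target training, sketched below: evaluate every candidate enrollment and backpropagate only the hardest one. The selection logic is an assumption about the scheme, not the paper's exact formulation.

```python
import torch

def worst_enrollment_loss(model, mixture, target, enrollments, loss_fn):
    """Backpropagate only the hardest enrollment.

    Evaluate each candidate enrollment of the target speaker and keep
    the highest loss, so the extractor stays robust to unfavorable
    enrollment recordings.
    """
    losses = torch.stack([loss_fn(model(mixture, e), target)
                          for e in enrollments])
    return losses.max()

# Toy usage: `model` is any callable taking (mixture, enrollment).
model = lambda mix, enr: mix * enr.mean()
mix, tgt = torch.randn(16000), torch.randn(16000)
enrollments = [torch.randn(8000) for _ in range(3)]
loss = worst_enrollment_loss(model, mix, tgt, enrollments,
                             torch.nn.functional.mse_loss)
print(loss.item())
```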
no code implementations • 21 Feb 2022 • Yoshihiro Yamazaki, Shota Orihashi, Ryo Masumura, Mihiro Uchida, Akihiko Takashima
There have been many attempts to build multimodal dialog systems that can respond to a question about given audio-visual information, and the representative task for such systems is the Audio Visual Scene-Aware Dialog (AVSD).
no code implementations • 24 Nov 2021 • Shota Orihashi, Yoshihiro Yamazaki, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Ryo Masumura
To this end, the proposed method pre-trains the encoder by using a multilingual dataset that combines the resource-poor language's dataset and the resource-rich language's dataset to learn language-invariant knowledge for scene text recognition.
no code implementations • 22 Nov 2021 • Shota Orihashi, Yoshihiro Yamazaki, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Ryo Masumura
Dialogue sequence labeling is a supervised learning task that estimates labels for each utterance in the target dialogue document, and is useful for many applications such as dialogue act estimation.
no code implementations • 7 Jul 2021 • Tomohiro Tanaka, Ryo Masumura, Mana Ihori, Akihiko Takashima, Shota Orihashi, Naoki Makishima
We propose a semi-supervised learning method for building end-to-end rich transcription-style automatic speech recognition (RT-ASR) systems from small-scale rich transcription-style and large-scale common transcription-style datasets.
Automatic Speech Recognition (ASR) +1
no code implementations • 4 Jul 2021 • Ryo Masumura, Daiki Okamura, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi
To address this problem, we propose unified autoregressive modeling for joint end-to-end multi-talker overlapped ASR and speaker attribute estimation.
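A minimal illustration of serializing multi-talker references into a single autoregressive target, so one decoder predicts speaker attributes and words end to end; the token names (`<gender:...>`, `<sep>`) are invented for the example.

```python
def serialize(utterances):
    """Flatten a multi-talker mixture's references into one token stream
    that an autoregressive decoder can predict: speaker attribute tokens
    first, then the words, per speaker."""
    tokens = []
    for utt in utterances:
        tokens += [f"<gender:{utt['gender']}>", f"<age:{utt['age']}>"]
        tokens += utt["words"]
        tokens.append("<sep>")
    return tokens

refs = [
    {"gender": "f", "age": "20s", "words": ["hello", "there"]},
    {"gender": "m", "age": "40s", "words": ["good", "morning"]},
]
print(serialize(refs))
```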
no code implementations • 4 Jul 2021 • Tomohiro Tanaka, Ryo Masumura, Mana Ihori, Akihiko Takashima, Takafumi Moriya, Takanori Ashihara, Shota Orihashi, Naoki Makishima
However, the conventional method cannot take into account the relationships between these two different modal inputs because the input contexts are encoded separately for each modality.
Automatic Speech Recognition (ASR) +2
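A minimal cross-modal attention sketch of the alternative: letting one modality attend to the other, rather than encoding the two contexts separately, so the fused representation reflects their relationship. Dimensions are toy values, not the paper's architecture.

```python
import torch
import torch.nn as nn

d = 64
cross_attn = nn.MultiheadAttention(embed_dim=d, num_heads=4,
                                   batch_first=True)
speech_ctx = torch.randn(2, 50, d)   # encoded speech context
text_ctx = torch.randn(2, 12, d)     # encoded text context
# Text tokens query the speech context, so each fused text position
# carries information about the related speech frames.
fused, _ = cross_attn(query=text_ctx, key=speech_ctx, value=speech_ctx)
print(fused.shape)  # torch.Size([2, 12, 64])
```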
no code implementations • 23 Jun 2021 • Mana Ihori, Naoki Makishima, Tomohiro Tanaka, Akihiko Takashima, Shota Orihashi, Ryo Masumura
To execute multiple conversion tasks simultaneously without preparing matched datasets, our key idea is to distinguish individual conversion tasks using the on-off switch.
Automatic Speech Recognition (ASR) +3
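A hedged sketch of such an on-off switch: a binary task vector is embedded and added to the encoder input so one model can execute any subset of conversion tasks. The task count and wiring are assumptions about the mechanism, not the paper's design.

```python
import torch
import torch.nn as nn

class SwitchConditioner(nn.Module):
    """Embed a binary on-off switch over conversion tasks.

    Each position in `switch` turns one conversion task on or off; the
    embedding is added to the encoder input so a single model executes
    any subset of tasks. Sizes are illustrative.
    """
    def __init__(self, num_tasks=7, d_model=64):
        super().__init__()
        self.proj = nn.Linear(num_tasks, d_model)

    def forward(self, x, switch):
        # x: (B, T, d_model); switch: (B, num_tasks) of 0/1 flags.
        return x + self.proj(switch.float()).unsqueeze(1)

cond = SwitchConditioner()
x = torch.randn(2, 10, 64)
switch = torch.tensor([[1, 0, 0, 1, 0, 0, 0],   # two tasks on
                       [1, 1, 1, 1, 1, 1, 1]])  # all tasks on
print(cond(x, switch).shape)  # torch.Size([2, 10, 64])
```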
no code implementations • 2 Mar 2021 • Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi, Ryo Masumura
We present an audio-visual speech separation learning method that considers the correspondence between the separated signals and the visual signals to reflect the speech characteristics during training.
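A contrastive-style sketch of one way such a correspondence objective could look: similarity between each separated signal's embedding and its own visual stream is pushed up, mismatched pairs down. The embedding extractors and temperature are assumptions, not the paper's loss.

```python
import torch
import torch.nn.functional as F

def correspondence_loss(sep_emb, vis_emb):
    """Encourage each separated signal to match its own visual stream.

    Cosine similarity between audio and visual embeddings should be
    high on the diagonal (matched speaker) and low off it.
    """
    sim = F.cosine_similarity(sep_emb.unsqueeze(1),
                              vis_emb.unsqueeze(0), dim=-1)  # (N, N)
    labels = torch.arange(sep_emb.size(0))
    return F.cross_entropy(sim / 0.1, labels)  # 0.1 = temperature

loss = correspondence_loss(torch.randn(4, 128), torch.randn(4, 128))
print(loss.item())
```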
no code implementations • 16 Feb 2021 • Ryo Masumura, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi
We evaluate the effectiveness of the proposed model and proposed training method on Japanese discourse ASR tasks.
Automatic Speech Recognition (ASR) +4
no code implementations • 16 Feb 2021 • Ryo Masumura, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Takanori Ashihara
We also show that combining DML with the existing training techniques effectively improves ASR performance.
Automatic Speech Recognition (ASR) +3
no code implementations • 16 Feb 2021 • Ryo Masumura, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Shota Orihashi
This paper presents a novel self-supervised learning method for handling conversational documents consisting of transcribed text of human-to-human conversations.
no code implementations • 15 Feb 2021 • Mana Ihori, Naoki Makishima, Tomohiro Tanaka, Akihiko Takashima, Shota Orihashi, Ryo Masumura
However, these models require a large amount of paired data of spoken-style text and style normalized text, and it is difficult to prepare such a volume of data.
no code implementations • INLG (ACL) 2020 • Mana Ihori, Ryo Masumura, Naoki Makishima, Tomohiro Tanaka, Akihiko Takashima, Shota Orihashi
Thus, it is important to leverage memorized knowledge in the external LM for building the seq2seq model, since it is hard to prepare a large amount of paired data.
no code implementations • 1 Jul 2020 • Yuma Koizumi, Ryo Masumura, Kyosuke Nishida, Masahiro Yasuda, Shoichiro Saito
TRACKE estimates keywords, which comprise a word set corresponding to audio events/scenes in the input audio, and generates the caption while referring to the estimated keywords to reduce word-selection indeterminacy.
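A toy sketch of keyword-guided decoding in this spirit; the keyword head, top-k selection, and attention wiring are illustrative guesses at the mechanism, with toy sizes throughout.

```python
import torch
import torch.nn as nn

class ToyKeywordGuidedCaptioner(nn.Module):
    """Keyword-guided caption decoding in the spirit of TRACKE.

    A keyword head predicts which vocabulary words describe the audio;
    the decoder then attends over embeddings of the top keywords to
    reduce word-selection indeterminacy.
    """
    def __init__(self, vocab=1000, d=64, top_k=5):
        super().__init__()
        self.top_k = top_k
        self.kw_head = nn.Linear(d, vocab)      # keyword estimator
        self.embed = nn.Embedding(vocab, d)
        self.attn = nn.MultiheadAttention(d, 4, batch_first=True)

    def forward(self, audio_feats, dec_state):
        # audio_feats: (B, T, d) encoded audio; dec_state: (B, L, d).
        pooled = audio_feats.mean(dim=1)                    # (B, d)
        kw_logits = self.kw_head(pooled)                    # (B, vocab)
        top_kw = kw_logits.topk(self.top_k, dim=-1).indices # (B, K)
        kw_emb = self.embed(top_kw)                         # (B, K, d)
        ctx, _ = self.attn(dec_state, kw_emb, kw_emb)
        return kw_logits, ctx  # keyword loss target + decoder context

m = ToyKeywordGuidedCaptioner()
kw_logits, ctx = m(torch.randn(2, 30, 64), torch.randn(2, 8, 64))
print(kw_logits.shape, ctx.shape)
```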
no code implementations • LREC 2020 • Mana Ihori, Akihiko Takashima, Ryo Masumura
Therefore, we created a new Japanese parallel corpus of spoken-style text and written-style text that can simultaneously handle general problems and Japanese-specific ones.
Automatic Speech Recognition (ASR) +2
no code implementations • LREC 2020 • Takashi Kodama, Ryuichiro Higashinaka, Koh Mitsuda, Ryo Masumura, Yushi Aono, Ryuta Nakamura, Noritake Adachi, Hidetoshi Kawabata
This paper concerns the problem of realizing consistent personalities in neural conversational modeling by using user generated question-answer pairs as training data.
no code implementations • LREC 2020 • Yuki Yamashita, Tomoki Koriyama, Yuki Saito, Shinnosuke Takamichi, Yusuke Ijima, Ryo Masumura, Hiroshi Saruwatari
In this paper, we investigate the effectiveness of using rich annotations in deep neural network (DNN)-based statistical speech synthesis.
no code implementations • EMNLP 2018 • Ryo Masumura, Yusuke Shinohara, Ryuichiro Higashinaka, Yushi Aono
This is achieved by introducing both language-specific networks shared among different tasks and task-specific networks shared among different languages.
no code implementations • COLING 2018 • Ryo Masumura, Tomohiro Tanaka, Ryuichiro Higashinaka, Hirokazu Masataki, Yushi Aono
In addition, to effectively transfer knowledge between different task data sets and different language data sets, this paper proposes a partially-shared modeling method that possesses both shared components and components specific to individual data sets.
no code implementations • WS 2018 • Ryo Masumura, Tomohiro Tanaka, Atsushi Ando, Ryo Ishii, Ryuichiro Higashinaka, Yushi Aono
This paper proposes a fully neural network based dialogue-context online end-of-turn detection method that can utilize long-range interactive information extracted from both speaker's utterances and collocutor's utterances.
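A minimal sketch of dialogue-context end-of-turn detection: features from both participants are interleaved with a speaker flag and a recurrent network emits a per-frame end-of-turn probability. Architecture details are assumptions, not the paper's model.

```python
import torch
import torch.nn as nn

class ToyEndOfTurnDetector(nn.Module):
    """Dialogue-context end-of-turn detection sketch.

    Features from both the speaker's and the collocutor's utterances
    are interleaved in time so the recurrent state carries long-range
    interactive context; each step emits P(end of turn).
    """
    def __init__(self, feat_dim=40, hidden=128):
        super().__init__()
        # +1 input dim for a flag marking who produced each frame.
        self.rnn = nn.LSTM(feat_dim + 1, hidden, batch_first=True)
        self.out = nn.Linear(hidden, 1)

    def forward(self, feats, who):
        # feats: (B, T, feat_dim); who: (B, T) 0=speaker, 1=collocutor.
        x = torch.cat([feats, who.unsqueeze(-1).float()], dim=-1)
        h, _ = self.rnn(x)
        return torch.sigmoid(self.out(h)).squeeze(-1)  # (B, T)

det = ToyEndOfTurnDetector()
p = det(torch.randn(2, 200, 40), torch.randint(0, 2, (2, 200)))
print(p.shape)  # torch.Size([2, 200])
```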
no code implementations • IJCNLP 2017 • Ryo Masumura, Taichi Asami, Hirokazu Masataki, Kugatsu Sadamitsu, Kyosuke Nishida, Ryuichiro Higashinaka
In addition, this paper reveals relationships between hyperspherical QLMs and conventional QLMs.
no code implementations • IJCNLP 2017 • Itsumi Saito, Jun Suzuki, Kyosuke Nishida, Kugatsu Sadamitsu, Satoshi Kobashikawa, Ryo Masumura, Yuji Matsumoto, Junji Tomita
In this study, we investigated the effectiveness of augmented data for encoder-decoder-based neural normalization models.