Search Results for author: Hayato Futami

Found 14 papers, 3 papers with code

Phoneme-aware Encoding for Prefix-tree-based Contextual ASR

no code implementations • 15 Dec 2023 • Hayato Futami, Emiru Tsunoo, Yosuke Kashiwagi, Hiroaki Ogawa, Siddhant Arora, Shinji Watanabe

While the original TCPGen relies on grapheme-based encoding, we propose extending it with phoneme-aware encoding to better recognize words of unusual pronunciations.

Speech Recognition

UniverSLU: Universal Spoken Language Understanding for Diverse Tasks with Natural Language Instructions

no code implementations • 4 Oct 2023 • Siddhant Arora, Hayato Futami, Jee-weon Jung, Yifan Peng, Roshan Sharma, Yosuke Kashiwagi, Emiru Tsunoo, Karen Livescu, Shinji Watanabe

Recent studies leverage large language models with multi-tasking capabilities, using natural language prompts to guide the model's behavior and surpassing performance of task-specific models.

Ranked #1 on Spoken Language Understanding on Fluent Speech Commands (using extra training data)

Automatic Speech Recognition (ASR) +3

Decoder-only Architecture for Speech Recognition with CTC Prompts and Text Data Augmentation

no code implementations • 16 Sep 2023 • Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, Shinji Watanabe

Because the decoder architecture is the same as an autoregressive LM, it is simple to enhance the model by leveraging external text data with LM training.

Automatic Speech Recognition (ASR) +3

Integration of Frame- and Label-synchronous Beam Search for Streaming Encoder-decoder Speech Recognition

no code implementations • 24 Jul 2023 • Emiru Tsunoo, Hayato Futami, Yosuke Kashiwagi, Siddhant Arora, Shinji Watanabe

Although frame-based models, such as CTC and transducers, have an affinity for streaming automatic speech recognition, their decoding uses no future knowledge, which could lead to incorrect pruning.

Automatic Speech Recognition +1

Integrating Pretrained ASR and LM to Perform Sequence Generation for Spoken Language Understanding

no code implementations • 20 Jul 2023 • Siddhant Arora, Hayato Futami, Yosuke Kashiwagi, Emiru Tsunoo, Brian Yan, Shinji Watanabe

There has been an increased interest in the integration of pretrained speech recognition (ASR) and language models (LM) into the SLU framework.

Speech Recognition +1

Tensor decomposition for minimization of E2E SLU model toward on-device processing

no code implementations • 2 Jun 2023 • Yosuke Kashiwagi, Siddhant Arora, Hayato Futami, Jessica Huynh, Shih-Lun Wu, Yifan Peng, Brian Yan, Emiru Tsunoo, Shinji Watanabe

We reduce the model size by applying tensor decomposition to the Conformer and E-Branchformer architectures used in our E2E SLU models.

Speech Recognition +2

A Study on the Integration of Pipeline and E2E SLU systems for Spoken Semantic Parsing toward STOP Quality Challenge

no code implementations • 2 May 2023 • Siddhant Arora, Hayato Futami, Shih-Lun Wu, Jessica Huynh, Yifan Peng, Yosuke Kashiwagi, Emiru Tsunoo, Brian Yan, Shinji Watanabe

Recently there have been efforts to introduce new benchmark tasks for spoken language understanding (SLU), like semantic parsing.

Automatic Speech Recognition (ASR) +3

The Pipeline System of ASR and NLU with MLM-based Data Augmentation toward STOP Low-resource Challenge

no code implementations • 2 May 2023 • Hayato Futami, Jessica Huynh, Siddhant Arora, Shih-Lun Wu, Yosuke Kashiwagi, Yifan Peng, Brian Yan, Emiru Tsunoo, Shinji Watanabe

In the track, we adopt a pipeline approach of ASR and NLU.

Data Augmentation, Domain Adaptation +2

Joint Modelling of Spoken Language Understanding Tasks with Integrated Dialog History

no code implementations • 1 May 2023 • Siddhant Arora, Hayato Futami, Emiru Tsunoo, Brian Yan, Shinji Watanabe

Most human interactions occur in the form of spoken conversations where the semantic meaning of a given utterance depends on the context.

Spoken Language Understanding

Streaming Joint Speech Recognition and Disfluency Detection

1 code implementation • 16 Nov 2022 • Hayato Futami, Emiru Tsunoo, Kentaro Shibata, Yosuke Kashiwagi, Takao Okuda, Siddhant Arora, Shinji Watanabe

In this study, we propose Transformer-based encoder-decoder models that jointly solve speech recognition and disfluency detection, which work in a streaming manner.

Language Modelling, Speech Recognition +1

Non-autoregressive Error Correction for CTC-based ASR with Phone-conditioned Masked LM

1 code implementation • 8 Sep 2022 • Hayato Futami, Hirofumi Inaguma, Sei Ueno, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara

Connectionist temporal classification (CTC)-based models are attractive in automatic speech recognition (ASR) because of their non-autoregressive nature.

Automatic Speech Recognition (ASR) +3

Distilling the Knowledge of BERT for CTC-based ASR

no code implementations • 5 Sep 2022 • Hayato Futami, Hirofumi Inaguma, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara

In this study, we propose to distill the knowledge of BERT for CTC-based ASR, extending our previous study for attention-based ASR.

Automatic Speech Recognition (ASR) +2

ASR Rescoring and Confidence Estimation with ELECTRA

no code implementations • 5 Oct 2021 • Hayato Futami, Hirofumi Inaguma, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara

We propose an ASR rescoring method for directly detecting errors with ELECTRA, which is originally a pre-training method for NLP tasks.

Automatic Speech Recognition (ASR) +2

Distilling the Knowledge of BERT for Sequence-to-Sequence ASR

1 code implementation • 9 Aug 2020 • Hayato Futami, Hirofumi Inaguma, Sei Ueno, Masato Mimura, Shinsuke Sakai, Tatsuya Kawahara

Experimental evaluations show that our method significantly improves the ASR performance from the seq2seq baseline on the Corpus of Spontaneous Japanese (CSJ).

Automatic Speech Recognition (ASR) +3
