Search Results for author: Kei Sawada

Found 10 papers, 0 papers with code

Release of Pre-Trained Models for the Japanese Language

no code implementations • 2 Apr 2024 • Kei Sawada, Tianyu Zhao, Makoto Shing, Kentaro Mitsui, Akio Kaga, Yukiya Hono, Toshiaki Wakatsuki, Koh Mitsuda

AI democratization aims to create a world in which the average person can utilize AI techniques.

An Integration of Pre-Trained Speech and Language Models for End-to-End Speech Recognition

no code implementations • 6 Dec 2023 • Yukiya Hono, Koh Mitsuda, Tianyu Zhao, Kentaro Mitsui, Toshiaki Wakatsuki, Kei Sawada

Advances in machine learning have made it possible to perform various text and speech processing tasks, including automatic speech recognition (ASR), in an end-to-end (E2E) manner.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +5 more

Towards human-like spoken dialogue generation between AI agents from written dialogue

no code implementations • 2 Oct 2023 • Kentaro Mitsui, Yukiya Hono, Kei Sawada

The advent of large language models (LLMs) has made it possible to generate natural written dialogues between two agents.

Dialogue Generation

Focused Prefix Tuning for Controllable Text Generation

no code implementations • 1 Jun 2023 • Congda Ma, Tianyu Zhao, Makoto Shing, Kei Sawada, Manabu Okumura

In a controllable text generation dataset, there exist unannotated attributes that could provide irrelevant learning signals to models that use it for training and thus degrade their performance.

Attribute, Text Generation

UniFLG: Unified Facial Landmark Generator from Text or Speech

no code implementations • 28 Feb 2023 • Kentaro Mitsui, Yukiya Hono, Kei Sawada

The two primary frameworks used for talking face generation comprise a text-driven framework, which generates synchronized speech and talking faces from text, and a speech-driven framework, which generates talking faces from speech.

Speech Synthesis, Talking Face Generation

Text-Guided Scene Sketch-to-Photo Synthesis

no code implementations • 14 Feb 2023 • AprilPyone MaungMaung, Makoto Shing, Kentaro Mitsui, Kei Sawada, Fumio Okura

We leverage knowledge from recent large-scale pre-trained generative models, resulting in text-guided sketch-to-photo synthesis without the need for reference images.

Self-Supervised Learning

MSR-NV: Neural Vocoder Using Multiple Sampling Rates

no code implementations • 28 Sep 2021 • Kentaro Mitsui, Kei Sawada

In this study, we propose a method, called MSR-NV, to handle multiple sampling rates in a single neural vocoder (NV).

Hierarchical Multi-Grained Generative Model for Expressive Speech Synthesis

no code implementations • 17 Sep 2020 • Yukiya Hono, Kazuna Tsuboi, Kei Sawada, Kei Hashimoto, Keiichiro Oura, Yoshihiko Nankaku, Keiichi Tokuda

This framework consists of a multi-grained variational autoencoder, a conditional prior, and a multi-level auto-regressive latent converter to obtain the different time-resolution latent variables and sample the finer-level latent variables from the coarser-level ones by taking into account the input text.

Expressive Speech Synthesis, Text-To-Speech Synthesis

Dance Revolution: Long-Term Dance Generation with Music via Curriculum Learning

no code implementations • ICLR 2021 • Ruozi Huang, Huang Hu, Wei Wu, Kei Sawada, Mi Zhang, Daxin Jiang

In this paper, we formalize the music-conditioned dance generation as a sequence-to-sequence learning problem and devise a novel seq2seq architecture to efficiently process long sequences of music features and capture the fine-grained correspondence between music and dance.

Motion Synthesis, Pose Estimation
