6 Sep 2023 • Hyungseob Lim, Kyungguen Byun, Sunkuk Moon, Erik Visser
Finally, content information extracted from the source speech and content-dependent target style embeddings are fed into a diffusion-based decoder to generate the converted speech mel-spectrogram.
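The sketch below shows how such a decoder might be conditioned. It is a minimal illustration under stated assumptions, not the authors' model. It assumes the content features and the content-dependent style embeddings are already frame-aligned, so they can be concatenated per frame. It also uses a standard DDPM reverse-sampling loop with a linear noise schedule, which may differ from the diffusion formulation the paper actually uses. The names `build_condition`, `toy_denoiser`, and `sample_mel` are hypothetical, and `toy_denoiser` is a placeholder for a trained noise-prediction network.

```python
import numpy as np


def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    # Assumed linear noise schedule (a common DDPM default, not taken from the paper).
    return np.linspace(beta_start, beta_end, T)


def build_condition(content, style):
    # content: (frames, d_c) content features extracted from the source speech.
    # style:   (frames, d_s) content-dependent target style embeddings,
    #          ASSUMED to be aligned frame-by-frame with the content features.
    assert content.shape[0] == style.shape[0], "content and style must share frames"
    return np.concatenate([content, style], axis=-1)


def toy_denoiser(x_t, t, cond, proj):
    # HYPOTHETICAL stand-in for a trained network eps_theta(x_t, t, cond).
    # It ignores t and simply pulls x_t toward a linear projection of the condition.
    return 0.1 * (x_t - cond @ proj)


def sample_mel(content, style, n_mels=80, T=50, seed=0):
    rng = np.random.default_rng(seed)
    cond = build_condition(content, style)
    # Random projection used only by the toy denoiser (hypothetical).
    proj = rng.standard_normal((cond.shape[1], n_mels)) / np.sqrt(cond.shape[1])

    betas = linear_beta_schedule(T)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)

    # Start from Gaussian noise shaped like the target mel-spectrogram.
    x = rng.standard_normal((content.shape[0], n_mels))
    for t in reversed(range(T)):
        eps = toy_denoiser(x, t, cond, proj)
        # DDPM posterior mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
        mean = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:
            x = mean + np.sqrt(betas[t]) * rng.standard_normal(x.shape)
        else:
            x = mean  # no noise is added at the final step
    return x  # (frames, n_mels) "converted" mel-spectrogram
```

For example, with 100 frames of 256-dimensional content features and 64-dimensional style embeddings, `sample_mel(content, style)` returns a `(100, 80)` array. Concatenating per frame is only one way to inject the condition. A real decoder might instead use cross-attention or feature-wise modulation.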