no code implementations • 3 Apr 2024 • Jaehyeon Kim, Keon Lee, Seungjun Chung, Jaewoong Cho
With the emergence of neural audio codecs, which encode multiple streams of discrete tokens from audio, large language models have recently gained attention as a promising approach for zero-shot Text-to-Speech (TTS) synthesis.