no code implementations • 9 Jan 2024 • Alon Ziv, Itai Gat, Gael Le Lan, Tal Remez, Felix Kreuk, Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi
We introduce MAGNeT, a masked generative sequence modeling method that operates directly over several streams of audio tokens.
no code implementations • 1 Nov 2023 • Ernie Chang, Pin-Jie Lin, Yang Li, Sidd Srinivasan, Gael Le Lan, David Kant, Yangyang Shi, Forrest Iandola, Vikas Chandra
We show that the framework enhanced the audio quality across the set of collected user prompts, which were edited with reference to the training captions as exemplars.
no code implementations • 19 Sep 2023 • Xinhao Mei, Varun Nagaraja, Gael Le Lan, Zhaoheng Ni, Ernie Chang, Yangyang Shi, Vikas Chandra
A prevalent problem in V2A generation is the misalignment of generated audio with the visible actions in the video.
no code implementations • 19 Sep 2023 • Zhaoheng Ni, Sravya Popuri, Ning Dong, Kohei Saijo, Xiaohui Zhang, Gael Le Lan, Yangyang Shi, Vikas Chandra, Changhan Wang
High-quality and intelligible speech is essential to text-to-speech (TTS) model training, however, obtaining high-quality data for low-resource languages is challenging and expensive.
Automatic Speech Recognition Automatic Speech Recognition (ASR) +3
no code implementations • 15 Sep 2023 • Yangyang Shi, Gael Le Lan, Varun Nagaraja, Zhaoheng Ni, Xinhao Mei, Ernie Chang, Forrest Iandola, Yang Liu, Vikas Chandra
This paper presents an innovative approach to enhance control over audio generation by emphasizing the alignment between audio and text representations during model training.
no code implementations • 15 Sep 2023 • Gael Le Lan, Varun Nagaraja, Ernie Chang, David Kant, Zhaoheng Ni, Yangyang Shi, Forrest Iandola, Vikas Chandra
In language modeling based music generation, a generated waveform is represented by a sequence of hierarchical token stacks that can be decoded either in an auto-regressive manner or in parallel, depending on the codebook patterns.
no code implementations • 14 Mar 2022 • Giuseppe Stragapede, Ruben Vera-Rodriguez, Ruben Tolosana, Aythami Morales, Alejandro Acien, Gael Le Lan
Current mobile user authentication systems based on PIN codes, fingerprint, and face recognition have several shortcomings.