no code implementations • 21 Feb 2024 • Heli Ben-Hamu, Omri Puny, Itai Gat, Brian Karrer, Uriel Singer, Yaron Lipman
Taming the generation outcome of state-of-the-art Diffusion and Flow-Matching (FM) models without having to re-train a task-specific model unlocks a powerful tool for solving inverse problems, conditional generation, and controlled generation in general.
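The general idea behind such training-free control can be illustrated with a minimal sketch: optimize the source noise of a pretrained flow so the generated sample minimizes a task cost, differentiating through the ODE integration. This is a generic sketch, not necessarily the paper's exact procedure; `velocity_model` (the pretrained vector field) and `cost` (e.g., a measurement-consistency loss for an inverse problem) are placeholders.

```python
import torch

def generate(velocity_model, x0, steps=50):
    """Euler integration of dx/dt = v(x, t) from t=0 to t=1."""
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        t = torch.full((x.shape[0],), i * dt, device=x.device)
        x = x + dt * velocity_model(x, t)
    return x

def controlled_sample(velocity_model, cost, shape, iters=100, lr=0.1):
    # Optimize the source point x0 so the generated sample minimizes the
    # task cost, backpropagating through the entire ODE solve.
    x0 = torch.randn(shape, requires_grad=True)
    opt = torch.optim.Adam([x0], lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        loss = cost(generate(velocity_model, x0))
        loss.backward()
        opt.step()
    return generate(velocity_model, x0.detach())
```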
no code implementations • 8 Feb 2024 • Tu Anh Nguyen, Benjamin Muller, Bokai Yu, Marta R. Costa-Jussa, Maha Elbayad, Sravya Popuri, Paul-Ambroise Duquenne, Robin Algayres, Ruslan Mavlyutov, Itai Gat, Gabriel Synnaeve, Juan Pino, Benoit Sagot, Emmanuel Dupoux
We introduce SPIRIT-LM, a foundation multimodal language model that freely mixes text and speech.
no code implementations • 9 Jan 2024 • Alon Ziv, Itai Gat, Gael Le Lan, Tal Remez, Felix Kreuk, Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi
We introduce MAGNeT, a masked generative sequence modeling method that operates directly over several streams of audio tokens.
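As a rough illustration of masked, non-autoregressive decoding over discrete audio tokens (a simplified single-stream sketch; `model` is a placeholder masked-token predictor returning logits of shape [B, T, vocab], and the unmasking schedule is illustrative):

```python
import torch

def masked_decode(model, length, mask_id, steps=10):
    # Start fully masked; at each step predict every position, keep a
    # growing fraction of the most confident predictions, re-mask the rest.
    tokens = torch.full((1, length), mask_id, dtype=torch.long)
    for step in range(steps):
        logits = model(tokens)            # [1, T, vocab]
        conf, pred = logits.softmax(-1).max(-1)
        k = int(length * (step + 1) / steps)
        keep = conf[0].topk(k).indices
        new = torch.full_like(tokens, mask_id)
        new[0, keep] = pred[0, keep]
        tokens = new
    return tokens
```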
1 code implementation • 28 Sep 2023 • Guy Yariv, Itai Gat, Sagie Benaim, Lior Wolf, Idan Schwartz, Yossi Adi
The proposed method is based on a lightweight adaptor network, which learns to map the audio-based representation to the input representation expected by the text-to-video generation model.
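A minimal sketch of such an adaptor (the architecture and dimensions below are assumptions for illustration, not the paper's exact design): a small MLP maps a pretrained audio embedding into the conditioning space the generator expects, so the pretrained text-to-video model can stay frozen while only the adaptor is trained.

```python
import torch.nn as nn

class AudioAdaptor(nn.Module):
    def __init__(self, audio_dim=768, cond_dim=1024, hidden=2048):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(audio_dim, hidden),
            nn.GELU(),
            nn.Linear(hidden, cond_dim),
        )

    def forward(self, audio_emb):    # [B, audio_dim] from a frozen audio encoder
        return self.net(audio_emb)   # [B, cond_dim], fed to the frozen generator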
2 code implementations • 24 Aug 2023 • Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Romain Sauvestre, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve
We release Code Llama, a family of large language models for code based on Llama 2 providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction following ability for programming tasks.
Ranked #27 on Code Generation on MBPP
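The released checkpoints are available through Hugging Face Transformers; a minimal usage sketch (the model id is the released 7B base variant, and the generation settings are illustrative):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "codellama/CodeLlama-7b-hf"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, device_map="auto")

prompt = "def fibonacci(n):"
out = model.generate(**tok(prompt, return_tensors="pt").to(model.device),
                     max_new_tokens=64)
print(tok.decode(out[0], skip_special_tokens=True))
```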
no code implementations • 10 Aug 2023 • Tu Anh Nguyen, Wei-Ning Hsu, Antony D'Avirro, Bowen Shi, Itai Gat, Maryam Fazel-Zarani, Tal Remez, Jade Copet, Gabriel Synnaeve, Michael Hassid, Felix Kreuk, Yossi Adi, Emmanuel Dupoux
Recent work has shown that it is possible to resynthesize high-quality speech not from text but from low-bitrate discrete units that have been learned in a self-supervised fashion and can therefore capture expressive aspects of speech that are hard to transcribe (prosody, voice styles, non-verbal vocalizations).
2 code implementations • NeurIPS 2023 • Jade Copet, Felix Kreuk, Itai Gat, Tal Remez, David Kant, Gabriel Synnaeve, Yossi Adi, Alexandre Défossez
We tackle the task of conditional music generation.
Ranked #4 on Text-to-Music Generation on MusicCaps
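The accompanying MusicGen models ship with the audiocraft library; a minimal usage sketch (checkpoint name, duration, and prompt are illustrative):

```python
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

model = MusicGen.get_pretrained("facebook/musicgen-small")
model.set_generation_params(duration=8)   # seconds of audio to generate
wavs = model.generate(["lo-fi hip hop beat with warm piano"])
audio_write("sample", wavs[0].cpu(), model.sample_rate, strategy="loudness")
```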
1 code implementation • NeurIPS 2023 • Michael Hassid, Tal Remez, Tu Anh Nguyen, Itai Gat, Alexis Conneau, Felix Kreuk, Jade Copet, Alexandre Defossez, Gabriel Synnaeve, Emmanuel Dupoux, Roy Schwartz, Yossi Adi
In this work, we propose TWIST, a method for training SpeechLMs using a warm start from pretrained textual language models.
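A hypothetical sketch of the warm-start idea: keep the transformer body of a pretrained text LM, swap its vocabulary for discrete speech units, and continue training with the usual next-token objective on unit sequences. The checkpoint name, unit count, and re-initialization choice below are assumptions for illustration.

```python
import torch.nn as nn
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("facebook/opt-350m")
num_speech_units = 500            # e.g. k-means clusters of self-supervised features
model.resize_token_embeddings(num_speech_units)
# Fresh embeddings for the unit vocabulary; the transformer body stays warm-started.
nn.init.normal_(model.get_input_embeddings().weight, std=0.02)
```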
1 code implementation • Interspeech 2023 • Guy Yariv, Itai Gat, Lior Wolf, Yossi Adi, Idan Schwartz
In this paper, we propose a novel method that utilizes latent diffusion models trained for text-to-image generation to generate images conditioned on audio recordings.
no code implementations • 21 May 2023 • Guy Lorberbom, Itai Gat, Yossi Adi, Alex Schwing, Tamir Hazan
We show that the current version of the forward-forward algorithm is suboptimal when considering information flow in the network, resulting in a lack of collaboration between its layers.
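For context, a sketch of the layer-local "goodness" objective used by forward-forward (following Hinton's formulation; `layer` is any module ending in a non-linearity and `theta` is the goodness threshold). Because each layer optimizes this loss in isolation, no gradient signal couples the layers, which is the missing collaboration the paper targets.

```python
import torch.nn.functional as F

def ff_layer_loss(layer, x_pos, x_neg, theta=2.0):
    g_pos = layer(x_pos).pow(2).sum(dim=1)   # goodness on positive samples
    g_neg = layer(x_neg).pow(2).sum(dim=1)   # goodness on negative samples
    # Push goodness above theta for positive data, below theta for negative.
    return (F.softplus(theta - g_pos) + F.softplus(g_neg - theta)).mean()
```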
no code implementations • 12 Oct 2022 • Itai Gat, Yossi Adi, Alexander Schwing, Tamir Hazan
Generalization bounds, which assess the difference between the true risk and the empirical risk, have been studied extensively.
no code implementations • 30 Sep 2022 • Itai Gat, Felix Kreuk, Tu Anh Nguyen, Ann Lee, Jade Copet, Gabriel Synnaeve, Emmanuel Dupoux, Yossi Adi
This work focuses on improving the robustness of discrete input representations for generative spoken language modeling.
1 code implementation • 12 Jun 2022 • Itai Gat, Nitay Calderon, Roi Reichart, Tamir Hazan
This work suggests a theoretical framework for model interpretability by measuring the contribution of relevant features to the functional entropy of the network with respect to the input.
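For reference, the functional entropy of a non-negative function $f$ under a measure $\mu$ is given by the standard definition below; the proposed attributions measure how much individual input features contribute to this quantity.

```latex
\mathrm{Ent}_{\mu}(f) \;=\; \int f \log f \, d\mu \;-\; \Big(\int f \, d\mu\Big)\log\Big(\int f \, d\mu\Big), \qquad f \ge 0 .
```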
no code implementations • 1 Mar 2022 • Hagai Aronowitz, Itai Gat, Edmilson Morais, Weizhong Zhu, Ron Hoory
Beyond that, a common engine should be capable of supporting distributed training with clients' in-house private data.
no code implementations • ICASSP 2022 • Edmilson Morais, Ron Hoory, Weizhong Zhu, Itai Gat, Matheus Damasceno, Hagai Aronowitz
Self-supervised pre-trained features have consistently delivered state-of-the-art results in the field of natural language processing (NLP); however, their merits in the field of speech emotion recognition (SER) still need further investigation.
no code implementations • 2 Feb 2022 • Itai Gat, Hagai Aronowitz, Weizhong Zhu, Edmilson Morais, Ron Hoory
Large speech emotion recognition datasets are hard to obtain, and small datasets may contain biases.
Ranked #1 on Speech Emotion Recognition on IEMOCAP (AUC metric)
no code implementations • 9 Dec 2021 • Itai Gat, Guy Lorberbom, Idan Schwartz, Tamir Hazan
The success of deep neural nets heavily relies on their ability to encode complex relations between their input and their output.
1 code implementation • NeurIPS 2021 • Itai Gat, Idan Schwartz, Alexander Schwing
To study and quantify this concern, we introduce the perceptual score, a metric that assesses the degree to which a model relies on the different subsets of the input features, i.e., modalities.
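A minimal sketch of a permutation-based reading of such a score (names and the exact normalization are illustrative, not the paper's precise definition): evaluate the model with one modality intact, then with that modality permuted across the batch so it no longer matches the labels, and report the accuracy drop.

```python
import torch

def modality_reliance(model, x_a, x_b, y, permute="b"):
    # Compare accuracy with both modalities intact vs. one modality shuffled
    # across examples; a large drop indicates heavy reliance on that modality.
    with torch.no_grad():
        acc = (model(x_a, x_b).argmax(-1) == y).float().mean()
        idx = torch.randperm(x_b.shape[0])
        if permute == "b":
            acc_perm = (model(x_a, x_b[idx]).argmax(-1) == y).float().mean()
        else:
            acc_perm = (model(x_a[idx], x_b).argmax(-1) == y).float().mean()
    return (acc - acc_perm).item()
```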
no code implementations • ACL 2021 • Daniel Rosenberg, Itai Gat, Amir Feder, Roi Reichart
Deep learning algorithms have shown promising results in visual question answering (VQA) tasks, but a more careful look reveals that they often do not understand the rich signal they are being fed.
1 code implementation • NeurIPS 2020 • Itai Gat, Idan Schwartz, Alexander Schwing, Tamir Hazan
However, regularization with the functional entropy is challenging.
Ranked #3 on Visual Question Answering (VQA) on VQA-CP
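A standard way around this difficulty, which this line of work builds on, is the log-Sobolev inequality for the Gaussian measure $\gamma$, which upper-bounds the functional entropy by a gradient-based quantity (the functional Fisher information):

```latex
\mathrm{Ent}_{\gamma}(f^{2}) \;\le\; 2 \int \lVert \nabla f \rVert^{2} \, d\gamma .
```

The right-hand side is computable from input gradients, so it can serve as a tractable surrogate when maximizing the functional entropy as a regularizer.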