Search Results for author: Itai Gat

Found 20 papers, 8 papers with code

D-Flow: Differentiating through Flows for Controlled Generation

no code implementations · 21 Feb 2024 · Heli Ben-Hamu, Omri Puny, Itai Gat, Brian Karrer, Uriel Singer, Yaron Lipman

Taming the generation outcome of state-of-the-art Diffusion and Flow-Matching (FM) models without having to re-train a task-specific model unlocks a powerful tool for solving inverse problems, conditional generation, and controlled generation in general.
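
For context, a minimal sketch of the title's idea: differentiate through the flow's ODE integration and optimize the source noise so that the generated sample satisfies a downstream constraint, with no retraining of the model. The velocity field and the constraint below are toy stand-ins, not the paper's pretrained model or losses.

```python
import torch

# Toy stand-in for a pretrained flow-matching velocity field v(x, t);
# in the paper this would be the state-of-the-art FM/diffusion model itself.
def velocity(x, t):
    return torch.tanh(x) * (1.0 - t)

def flow(x0, n_steps=50):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with explicit Euler."""
    x, dt = x0, 1.0 / n_steps
    for i in range(n_steps):
        x = x + dt * velocity(x, i * dt)
    return x

# Controlled generation without retraining: optimize the *source* point x0 so the
# generated sample meets a constraint, by differentiating through the whole solve.
x0 = torch.randn(16, requires_grad=True)
target = torch.zeros(16)                   # hypothetical constraint, for illustration
opt = torch.optim.Adam([x0], lr=0.1)

for step in range(100):
    opt.zero_grad()
    x1 = flow(x0)                          # differentiable ODE integration
    loss = ((x1 - target) ** 2).sum()      # any downstream cost, e.g. an inverse-problem loss
    loss.backward()                        # gradients flow back through every Euler step
    opt.step()
```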

Masked Audio Generation using a Single Non-Autoregressive Transformer

no code implementations · 9 Jan 2024 · Alon Ziv, Itai Gat, Gael Le Lan, Tal Remez, Felix Kreuk, Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi

We introduce MAGNeT, a masked generative sequence modeling method that operates directly over several streams of audio tokens.

Audio Generation
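
A minimal sketch of the masked, non-autoregressive decoding loop described above, with a random scorer standing in for MAGNeT's transformer; the stream count K, sequence length T, codebook size V, and the confidence schedule are illustrative assumptions.

```python
import torch

V, K, T = 1024, 4, 100      # codebook size, parallel token streams, sequence length
MASK = V                     # extra index used as the mask token

def predict_logits(tokens):
    # Stand-in for the non-autoregressive transformer, which would map the
    # partially masked (K, T) token grid to per-position logits over the codebook.
    return torch.randn(K, T, V)

def masked_generate(n_iters=10):
    tokens = torch.full((K, T), MASK)
    for it in range(n_iters):
        probs = predict_logits(tokens).softmax(dim=-1)
        conf, pred = probs.max(dim=-1)                     # confidence / argmax per position
        masked = tokens == MASK
        if not masked.any():
            break
        conf = torch.where(masked, conf, torch.full_like(conf, -1.0))
        # Commit the most confident masked positions; re-predict the rest next round.
        n_commit = max(1, int(masked.sum().item() / (n_iters - it)))
        flat_idx = conf.flatten().topk(n_commit).indices
        tokens.view(-1)[flat_idx] = pred.view(-1)[flat_idx]
    return tokens

codes = masked_generate()    # (K, T) grid of audio codec tokens, one row per stream
```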

Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation

1 code implementation · 28 Sep 2023 · Guy Yariv, Itai Gat, Sagie Benaim, Lior Wolf, Idan Schwartz, Yossi Adi

The proposed method is based on a lightweight adaptor network, which learns to map the audio-based representation to the input representation expected by the text-to-video generation model.

Text-to-Video Generation · Video Generation
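
A hedged sketch of what such a lightweight adaptor could look like: a small trainable module that maps a sequence of audio features into conditioning vectors in the text-to-video model's text-embedding space, while the audio encoder and generator stay frozen. The dimensions, pooling, and layer choices are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class AudioAdaptor(nn.Module):
    """Hypothetical lightweight adaptor: maps frame-level audio features to a fixed
    number of conditioning vectors in the (frozen) text encoder's embedding space."""
    def __init__(self, audio_dim=768, cond_dim=1024, n_cond_tokens=8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool1d(n_cond_tokens)      # temporal pooling to a fixed length
        self.proj = nn.Sequential(
            nn.Linear(audio_dim, cond_dim),
            nn.GELU(),
            nn.Linear(cond_dim, cond_dim),
        )

    def forward(self, audio_feats):                          # (B, T_audio, audio_dim)
        x = self.pool(audio_feats.transpose(1, 2)).transpose(1, 2)  # (B, n_cond_tokens, audio_dim)
        return self.proj(x)                                  # (B, n_cond_tokens, cond_dim)

# The adaptor is the only trained component; the audio encoder and the
# text-to-video generator would remain frozen.
cond = AudioAdaptor()(torch.randn(2, 250, 768))              # e.g. features from a frozen audio encoder
print(cond.shape)                                            # torch.Size([2, 8, 1024])
```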

Code Llama: Open Foundation Models for Code

2 code implementations · 24 Aug 2023 · Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Romain Sauvestre, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve

We release Code Llama, a family of large language models for code based on Llama 2, providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction-following ability for programming tasks.

16k · Code Generation · +1
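
A minimal usage sketch (not from the paper), assuming the transformers library and the publicly released checkpoint name codellama/CodeLlama-7b-hf:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "codellama/CodeLlama-7b-hf"          # assumed public checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Left-to-right code completion; the release also supports infilling and long contexts.
prompt = 'def fibonacci(n: int) -> int:\n    """Return the n-th Fibonacci number."""\n'
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```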

EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis

no code implementations · 10 Aug 2023 · Tu Anh Nguyen, Wei-Ning Hsu, Antony D'Avirro, Bowen Shi, Itai Gat, Maryam Fazel-Zarani, Tal Remez, Jade Copet, Gabriel Synnaeve, Michael Hassid, Felix Kreuk, Yossi Adi, Emmanuel Dupoux

Recent work has shown that it is possible to resynthesize high-quality speech based not on text, but on low-bitrate discrete units that have been learned in a self-supervised fashion and can therefore capture expressive aspects of speech that are hard to transcribe (prosody, voice styles, non-verbal vocalization).

Resynthesis · Speech Synthesis
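
To make "low-bitrate discrete units" concrete, here is a hedged sketch of the common unit-extraction step in this line of work: cluster frame-level self-supervised features with k-means so that each frame becomes one of a small number of discrete ids. The features below are random stand-ins, and the unit-to-speech vocoder that completes resynthesis is not shown.

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for frame-level self-supervised speech features (e.g. from a HuBERT-style
# encoder); random vectors here, just to illustrate the unit-extraction step.
frames = np.random.randn(2000, 768).astype(np.float32)      # (n_frames, feature_dim)

# Quantize the feature space into a small codebook: each frame becomes one of
# n_units discrete ids, giving a low-bitrate "pseudo-text" for the utterance.
n_units = 200
kmeans = KMeans(n_clusters=n_units, n_init=10, random_state=0).fit(frames)
units = kmeans.predict(frames)                               # (n_frames,) discrete unit ids

# A unit-to-speech vocoder (not shown) would then resynthesize the waveform,
# which is the setting the EXPRESSO benchmark evaluates for expressive speech.
print(units[:20])
```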

AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation

1 code implementation · Interspeech 2023 · Guy Yariv, Itai Gat, Lior Wolf, Yossi Adi, Idan Schwartz

In this paper, we propose a novel method utilizing latent diffusion models trained for text-to-image generation to generate images conditioned on audio recordings.

audio-visual learning · Text-to-Image Generation

Layer Collaboration in the Forward-Forward Algorithm

no code implementations · 21 May 2023 · Guy Lorberbom, Itai Gat, Yossi Adi, Alex Schwing, Tamir Hazan

We show that the current version of the forward-forward algorithm is suboptimal when considering information flow in the network, resulting in a lack of collaboration between layers of the network.
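
For reference, a minimal sketch of the baseline forward-forward layer that the paper analyzes (Hinton's layer-local formulation, not the paper's improved collaborative variant): each layer is trained on its own objective to assign high "goodness" (sum of squared activations) to positive samples and low goodness to negative ones, with no global backpropagation.

```python
import torch
import torch.nn as nn

class FFLayer(nn.Module):
    """One forward-forward layer with a purely local training objective."""
    def __init__(self, d_in, d_out, threshold=2.0, lr=0.03):
        super().__init__()
        self.linear = nn.Linear(d_in, d_out)
        self.threshold = threshold
        self.opt = torch.optim.Adam(self.parameters(), lr=lr)

    def forward(self, x):
        # Normalize so only the direction of the activity is passed forward.
        x = x / (x.norm(dim=1, keepdim=True) + 1e-8)
        return torch.relu(self.linear(x))

    def train_step(self, x_pos, x_neg):
        g_pos = self.forward(x_pos).pow(2).sum(dim=1)        # goodness of positive samples
        g_neg = self.forward(x_neg).pow(2).sum(dim=1)        # goodness of negative samples
        # Push positive goodness above the threshold and negative goodness below it.
        loss = torch.log1p(torch.exp(torch.cat([self.threshold - g_pos,
                                                g_neg - self.threshold]))).mean()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        # Detach so the next layer trains only on its own local objective.
        return self.forward(x_pos).detach(), self.forward(x_neg).detach()

layers = [FFLayer(784, 256), FFLayer(256, 256)]
x_pos, x_neg = torch.randn(32, 784), torch.randn(32, 784)
for layer in layers:                     # layer-local updates, no global backpropagation
    x_pos, x_neg = layer.train_step(x_pos, x_neg)
```

Because each layer detaches its output before passing it on, no gradient signal couples the layers, which is precisely the kind of limited information flow between layers that the paper examines.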

On the Importance of Gradient Norm in PAC-Bayesian Bounds

no code implementations · 12 Oct 2022 · Itai Gat, Yossi Adi, Alexander Schwing, Tamir Hazan

Generalization bounds, which assess the difference between the true risk and the empirical risk, have been studied extensively.

Generalization Bounds
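
For context, a standard PAC-Bayesian generalization bound of the kind referenced (a McAllester/Maurer-style statement, not the paper's gradient-norm result): with L the true risk, \hat{L}_S the empirical risk on a sample S of n points, P a prior and Q any posterior over hypotheses,

```latex
% With probability at least 1 - \delta over an i.i.d. sample S of size n,
% simultaneously for all posteriors Q (P is a prior fixed before seeing S):
\mathbb{E}_{h \sim Q}\!\left[ L(h) \right]
  \;\le\;
\mathbb{E}_{h \sim Q}\!\left[ \hat{L}_S(h) \right]
  + \sqrt{ \frac{ \mathrm{KL}(Q \,\|\, P) + \ln \frac{2\sqrt{n}}{\delta} }{ 2n } }
```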

A Functional Information Perspective on Model Interpretation

1 code implementation · 12 Jun 2022 · Itai Gat, Nitay Calderon, Roi Reichart, Tamir Hazan

This work suggests a theoretical framework for model interpretability by measuring the contribution of relevant features to the functional entropy of the network with respect to the input.
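
In its usual formulation (a hedged reconstruction, not quoted from the paper), the functional entropy of a non-negative function f of an input x distributed according to \mu is

```latex
% Functional entropy of a non-negative function f under the input distribution \mu:
\mathrm{Ent}_{\mu}(f)
  = \mathbb{E}_{x \sim \mu}\big[ f(x) \log f(x) \big]
  - \mathbb{E}_{x \sim \mu}\big[ f(x) \big] \,
    \log \mathbb{E}_{x \sim \mu}\big[ f(x) \big]
```

It is non-negative and vanishes exactly when f is constant, so attributing it to individual input features measures how much each feature makes the network's output vary.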

Towards a Common Speech Analysis Engine

no code implementations · 1 Mar 2022 · Hagai Aronowitz, Itai Gat, Edmilson Morais, Weizhong Zhu, Ron Hoory

Beyond that, a common engine should be capable of supporting distributed training with clients' in-house private data.

Emotion Recognition · Language Identification · +1

Speech Emotion Recognition using Self-Supervised Features

no code implementations · ICASSP 2022 · Edmilson Morais, Ron Hoory, Weizhong Zhu, Itai Gat, Matheus Damasceno, Hagai Aronowitz

Self-supervised pre-trained features have consistently delivered state-of-the-art results in the field of natural language processing (NLP); however, their merits in the field of speech emotion recognition (SER) still need further investigation.

Speech Emotion Recognition

Latent Space Explanation by Intervention

no code implementations · 9 Dec 2021 · Itai Gat, Guy Lorberbom, Idan Schwartz, Tamir Hazan

The success of deep neural nets heavily relies on their ability to encode complex relations between their input and their output.

Perceptual Score: What Data Modalities Does Your Model Perceive?

1 code implementation · NeurIPS 2021 · Itai Gat, Idan Schwartz, Alexander Schwing

To study and quantify this concern, we introduce the perceptual score, a metric that assesses the degree to which a model relies on the different subsets of the input features, i.e., modalities.

Question Answering · Visual Dialog · +1
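
A hedged sketch of a perceptual-score-style measurement: compare accuracy on intact inputs with accuracy after one modality is permuted across the batch, which breaks its association with the other modality and the label. The toy model, dimensions, and the use of a raw accuracy gap (rather than any normalization) are assumptions, not the paper's exact definition.

```python
import torch

def accuracy(model, a, b, labels):
    return (model(a, b).argmax(dim=1) == labels).float().mean().item()

def perceptual_score(model, a, b, labels, n_perm=5):
    """How much does accuracy drop when modality `b` (e.g. the image in VQA) is
    permuted across the batch, severing its link to the other modality and label?"""
    base = accuracy(model, a, b, labels)
    permuted = sum(accuracy(model, a, b[torch.randperm(b.shape[0])], labels)
                   for _ in range(n_perm)) / n_perm
    return base - permuted           # large gap -> the model genuinely uses modality b

# Toy bimodal classifier and data, only to make the sketch runnable.
Wa, Wb = torch.randn(16, 10), torch.randn(32, 10)
model = lambda a, b: a @ Wa + b @ Wb
a, b = torch.randn(64, 16), torch.randn(64, 32)
labels = torch.randint(0, 10, (64,))
print(perceptual_score(model, a, b, labels))
```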

Are VQA Systems RAD? Measuring Robustness to Augmented Data with Focused Interventions

no code implementations · ACL 2021 · Daniel Rosenberg, Itai Gat, Amir Feder, Roi Reichart

Deep learning algorithms have shown promising results in visual question answering (VQA) tasks, but a more careful look reveals that they often do not understand the rich signal they are being fed with.

Question Answering · Visual Question Answering
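
A hedged sketch (an assumed formulation, not quoted from the paper) of a robustness-to-augmented-data style metric: among the examples a VQA model answers correctly, the fraction it still answers correctly after a semantics-preserving augmentation of the question or image.

```python
# Both arguments are lists of booleans aligned over the same examples:
# correct on the original input, and correct on its augmented counterpart.
def robustness_to_augmentation(correct_original, correct_augmented):
    kept = [o and a for o, a in zip(correct_original, correct_augmented)]
    n_orig = sum(correct_original)
    return sum(kept) / n_orig if n_orig else 0.0

print(robustness_to_augmentation([True, True, False, True],
                                 [True, False, False, True]))   # 2/3 of correct answers survive
```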
