Search Results for author: Itai Gat

Found 20 papers, 8 papers with code

D-Flow: Differentiating through Flows for Controlled Generation

no code implementations · 21 Feb 2024 · Heli Ben-Hamu, Omri Puny, Itai Gat, Brian Karrer, Uriel Singer, Yaron Lipman

Taming the generation outcome of state-of-the-art Diffusion and Flow-Matching (FM) models without having to re-train a task-specific model unlocks a powerful tool for solving inverse problems, conditional generation, and controlled generation in general.
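
For context, a minimal sketch of the title's idea: differentiate through the flow's ODE integration and optimize the source noise so that the generated sample satisfies a downstream constraint, with no retraining of the model. The velocity field and the constraint below are toy stand-ins, not the paper's pretrained model or losses.

```python
import torch

# Toy stand-in for a pretrained flow-matching velocity field v(x, t);
# in the paper this would be the state-of-the-art FM/diffusion model itself.
def velocity(x, t):
    return torch.tanh(x) * (1.0 - t)

def flow(x0, n_steps=50):
    """Integrate dx/dt = v(x, t) from t = 0 to t = 1 with explicit Euler."""
    x, dt = x0, 1.0 / n_steps
    for i in range(n_steps):
        x = x + dt * velocity(x, i * dt)
    return x

# Controlled generation without retraining: optimize the *source* point x0 so the
# generated sample meets a constraint, by differentiating through the whole solve.
x0 = torch.randn(16, requires_grad=True)
target = torch.zeros(16)                   # hypothetical constraint, for illustration
opt = torch.optim.Adam([x0], lr=0.1)

for step in range(100):
    opt.zero_grad()
    x1 = flow(x0)                          # differentiable ODE integration
    loss = ((x1 - target) ** 2).sum()      # any downstream cost, e.g. an inverse-problem loss
    loss.backward()                        # gradients flow back through every Euler step
    opt.step()
```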

Masked Audio Generation using a Single Non-Autoregressive Transformer

no code implementations · 9 Jan 2024 · Alon Ziv, Itai Gat, Gael Le Lan, Tal Remez, Felix Kreuk, Alexandre Défossez, Jade Copet, Gabriel Synnaeve, Yossi Adi

We introduce MAGNeT, a masked generative sequence modeling method that operates directly over several streams of audio tokens.

Audio Generation
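
A minimal sketch of the masked, non-autoregressive decoding loop described above, with a random scorer standing in for MAGNeT's transformer; the stream count K, sequence length T, codebook size V, and the confidence schedule are illustrative assumptions.

```python
import torch

V, K, T = 1024, 4, 100      # codebook size, parallel token streams, sequence length
MASK = V                     # extra index used as the mask token

def predict_logits(tokens):
    # Stand-in for the non-autoregressive transformer, which would map the
    # partially masked (K, T) token grid to per-position logits over the codebook.
    return torch.randn(K, T, V)

def masked_generate(n_iters=10):
    tokens = torch.full((K, T), MASK)
    for it in range(n_iters):
        probs = predict_logits(tokens).softmax(dim=-1)
        conf, pred = probs.max(dim=-1)                     # confidence / argmax per position
        masked = tokens == MASK
        if not masked.any():
            break
        conf = torch.where(masked, conf, torch.full_like(conf, -1.0))
        # Commit the most confident masked positions; re-predict the rest next round.
        n_commit = max(1, int(masked.sum().item() / (n_iters - it)))
        flat_idx = conf.flatten().topk(n_commit).indices
        tokens.view(-1)[flat_idx] = pred.view(-1)[flat_idx]
    return tokens

codes = masked_generate()    # (K, T) grid of audio codec tokens, one row per stream
```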

Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation

1 code implementation · 28 Sep 2023 · Guy Yariv, Itai Gat, Sagie Benaim, Lior Wolf, Idan Schwartz, Yossi Adi

The proposed method is based on a lightweight adaptor network, which learns to map the audio-based representation to the input representation expected by the text-to-video generation model.

Text-to-Video Generation · Video Generation
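
A hedged sketch of what such a lightweight adaptor could look like: a small trainable module that maps a sequence of audio features into conditioning vectors in the text-to-video model's text-embedding space, while the audio encoder and generator stay frozen. The dimensions, pooling, and layer choices are assumptions, not the paper's architecture.

```python
import torch
import torch.nn as nn

class AudioAdaptor(nn.Module):
    """Hypothetical lightweight adaptor: maps frame-level audio features to a fixed
    number of conditioning vectors in the (frozen) text encoder's embedding space."""
    def __init__(self, audio_dim=768, cond_dim=1024, n_cond_tokens=8):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool1d(n_cond_tokens)      # temporal pooling to a fixed length
        self.proj = nn.Sequential(
            nn.Linear(audio_dim, cond_dim),
            nn.GELU(),
            nn.Linear(cond_dim, cond_dim),
        )

    def forward(self, audio_feats):                          # (B, T_audio, audio_dim)
        x = self.pool(audio_feats.transpose(1, 2)).transpose(1, 2)  # (B, n_cond_tokens, audio_dim)
        return self.proj(x)                                  # (B, n_cond_tokens, cond_dim)

# The adaptor is the only trained component; the audio encoder and the
# text-to-video generator would remain frozen.
cond = AudioAdaptor()(torch.randn(2, 250, 768))              # e.g. features from a frozen audio encoder
print(cond.shape)                                            # torch.Size([2, 8, 1024])
```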

Code Llama: Open Foundation Models for Code

2 code implementations · 24 Aug 2023 · Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Romain Sauvestre, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Défossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Scialom, Gabriel Synnaeve

We release Code Llama, a family of large language models for code based on Llama 2, providing state-of-the-art performance among open models, infilling capabilities, support for large input contexts, and zero-shot instruction-following ability for programming tasks.

16k · Code Generation · +1
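
A minimal usage sketch (not from the paper), assuming the transformers library and the publicly released checkpoint name codellama/CodeLlama-7b-hf:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "codellama/CodeLlama-7b-hf"          # assumed public checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

# Left-to-right code completion; the release also supports infilling and long contexts.
prompt = 'def fibonacci(n: int) -> int:\n    """Return the n-th Fibonacci number."""\n'
inputs = tokenizer(prompt, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```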

EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis

no code implementations · 10 Aug 2023 · Tu Anh Nguyen, Wei-Ning Hsu, Antony D'Avirro, Bowen Shi, Itai Gat, Maryam Fazel-Zarani, Tal Remez, Jade Copet, Gabriel Synnaeve, Michael Hassid, Felix Kreuk, Yossi Adi, Emmanuel Dupoux

Recent work has shown that it is possible to resynthesize high-quality speech based not on text, but on low-bitrate discrete units that have been learned in a self-supervised fashion and can therefore capture expressive aspects of speech that are hard to transcribe (prosody, voice styles, non-verbal vocalization).

Resynthesis · Speech Synthesis
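
To make "low-bitrate discrete units" concrete, here is a hedged sketch of the common unit-extraction step in this line of work: cluster frame-level self-supervised features with k-means so that each frame becomes one of a small number of discrete ids. The features below are random stand-ins, and the unit-to-speech vocoder that completes resynthesis is not shown.

```python
import numpy as np
from sklearn.cluster import KMeans

# Stand-in for frame-level self-supervised speech features (e.g. from a HuBERT-style
# encoder); random vectors here, just to illustrate the unit-extraction step.
frames = np.random.randn(2000, 768).astype(np.float32)      # (n_frames, feature_dim)

# Quantize the feature space into a small codebook: each frame becomes one of
# n_units discrete ids, giving a low-bitrate "pseudo-text" for the utterance.
n_units = 200
kmeans = KMeans(n_clusters=n_units, n_init=10, random_state=0).fit(frames)
units = kmeans.predict(frames)                               # (n_frames,) discrete unit ids

# A unit-to-speech vocoder (not shown) would then resynthesize the waveform,
# which is the setting the EXPRESSO benchmark evaluates for expressive speech.
print(units[:20])
```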

AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation

1 code implementation · Interspeech 2023 · Guy Yariv, Itai Gat, Lior Wolf, Yossi Adi, Idan Schwartz

In this paper, we propose a novel method utilizing latent diffusion models trained for text-to-image generation to generate images conditioned on audio recordings.

audio-visual learning · Text-to-Image Generation

Layer Collaboration in the Forward-Forward Algorithm

no code implementations · 21 May 2023 · Guy Lorberbom, Itai Gat, Yossi Adi, Alex Schwing, Tamir Hazan

We show that the current version of the forward-forward algorithm is suboptimal when considering information flow in the network, resulting in a lack of collaboration between layers of the network.
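
For reference, a minimal sketch of the baseline forward-forward layer that the paper analyzes (Hinton's layer-local formulation, not the paper's improved collaborative variant): each layer is trained on its own objective to assign high "goodness" (sum of squared activations) to positive samples and low goodness to negative ones, with no global backpropagation.

```python
import torch
import torch.nn as nn

class FFLayer(nn.Module):
    """One forward-forward layer with a purely local training objective."""
    def __init__(self, d_in, d_out, threshold=2.0, lr=0.03):
        super().__init__()
        self.linear = nn.Linear(d_in, d_out)
        self.threshold = threshold
        self.opt = torch.optim.Adam(self.parameters(), lr=lr)

    def forward(self, x):
        # Normalize so only the direction of the activity is passed forward.
        x = x / (x.norm(dim=1, keepdim=True) + 1e-8)
        return torch.relu(self.linear(x))

    def train_step(self, x_pos, x_neg):
        g_pos = self.forward(x_pos).pow(2).sum(dim=1)        # goodness of positive samples
        g_neg = self.forward(x_neg).pow(2).sum(dim=1)        # goodness of negative samples
        # Push positive goodness above the threshold and negative goodness below it.
        loss = torch.log1p(torch.exp(torch.cat([self.threshold - g_pos,
                                                g_neg - self.threshold]))).mean()
        self.opt.zero_grad()
        loss.backward()
        self.opt.step()
        # Detach so the next layer trains only on its own local objective.
        return self.forward(x_pos).detach(), self.forward(x_neg).detach()

layers = [FFLayer(784, 256), FFLayer(256, 256)]
x_pos, x_neg = torch.randn(32, 784), torch.randn(32, 784)
for layer in layers:                     # layer-local updates, no global backpropagation
    x_pos, x_neg = layer.train_step(x_pos, x_neg)
```

Because each layer detaches its output before passing it on, no gradient signal couples the layers, which is precisely the kind of limited information flow between layers that the paper examines.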

On the Importance of Gradient Norm in PAC-Bayesian Bounds

no code implementations · 12 Oct 2022 · Itai Gat, Yossi Adi, Alexander Schwing, Tamir Hazan

Generalization bounds, which assess the difference between the true risk and the empirical risk, have been studied extensively.

Generalization Bounds
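
For context, a standard PAC-Bayesian generalization bound of the kind referenced (a McAllester/Maurer-style statement, not the paper's gradient-norm result): with L the true risk, \hat{L}_S the empirical risk on a sample S of n points, P a prior and Q any posterior over hypotheses,

```latex
% With probability at least 1 - \delta over an i.i.d. sample S of size n,
% simultaneously for all posteriors Q (P is a prior fixed before seeing S):
\mathbb{E}_{h \sim Q}\!\left[ L(h) \right]
  \;\le\;
\mathbb{E}_{h \sim Q}\!\left[ \hat{L}_S(h) \right]
  + \sqrt{ \frac{ \mathrm{KL}(Q \,\|\, P) + \ln \frac{2\sqrt{n}}{\delta} }{ 2n } }
```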

A Functional Information Perspective on Model Interpretation

1 code implementation · 12 Jun 2022 · Itai Gat, Nitay Calderon, Roi Reichart, Tamir Hazan

This work suggests a theoretical framework for model interpretability by measuring the contribution of relevant features to the functional entropy of the network with respect to the input.
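
In its usual formulation (a hedged reconstruction, not quoted from the paper), the functional entropy of a non-negative function f of an input x distributed according to \mu is

```latex
% Functional entropy of a non-negative function f under the input distribution \mu:
\mathrm{Ent}_{\mu}(f)
  = \mathbb{E}_{x \sim \mu}\big[ f(x) \log f(x) \big]
  - \mathbb{E}_{x \sim \mu}\big[ f(x) \big] \,
    \log \mathbb{E}_{x \sim \mu}\big[ f(x) \big]
```

It is non-negative and vanishes exactly when f is constant, so attributing it to individual input features measures how much each feature makes the network's output vary.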

Towards a Common Speech Analysis Engine

no code implementations · 1 Mar 2022 · Hagai Aronowitz, Itai Gat, Edmilson Morais, Weizhong Zhu, Ron Hoory

Beyond that, a common engine should be capable of supporting distributed training with clients' in-house private data.

Emotion Recognition · Language Identification · +1

Speech Emotion Recognition using Self-Supervised Features

no code implementations · ICASSP 2022 · Edmilson Morais, Ron Hoory, Weizhong Zhu, Itai Gat, Matheus Damasceno, Hagai Aronowitz

Self-supervised pre-trained features have consistently delivered state-of-the-art results in the field of natural language processing (NLP); however, their merits in the field of speech emotion recognition (SER) still need further investigation.

Speech Emotion Recognition

Latent Space Explanation by Intervention

no code implementations · 9 Dec 2021 · Itai Gat, Guy Lorberbom, Idan Schwartz, Tamir Hazan

The success of deep neural nets heavily relies on their ability to encode complex relations between their input and their output.

Perceptual Score: What Data Modalities Does Your Model Perceive?

1 code implementation · NeurIPS 2021 · Itai Gat, Idan Schwartz, Alexander Schwing

To study and quantify this concern, we introduce the perceptual score, a metric that assesses the degree to which a model relies on the different subsets of the input features, i.e., modalities.

Question Answering · Visual Dialog · +1
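
A hedged sketch of a perceptual-score-style measurement: compare accuracy on intact inputs with accuracy after one modality is permuted across the batch, which breaks its association with the other modality and the label. The toy model, dimensions, and the use of a raw accuracy gap (rather than any normalization) are assumptions, not the paper's exact definition.

```python
import torch

def accuracy(model, a, b, labels):
    return (model(a, b).argmax(dim=1) == labels).float().mean().item()

def perceptual_score(model, a, b, labels, n_perm=5):
    """How much does accuracy drop when modality `b` (e.g. the image in VQA) is
    permuted across the batch, severing its link to the other modality and label?"""
    base = accuracy(model, a, b, labels)
    permuted = sum(accuracy(model, a, b[torch.randperm(b.shape[0])], labels)
                   for _ in range(n_perm)) / n_perm
    return base - permuted           # large gap -> the model genuinely uses modality b

# Toy bimodal classifier and data, only to make the sketch runnable.
Wa, Wb = torch.randn(16, 10), torch.randn(32, 10)
model = lambda a, b: a @ Wa + b @ Wb
a, b = torch.randn(64, 16), torch.randn(64, 32)
labels = torch.randint(0, 10, (64,))
print(perceptual_score(model, a, b, labels))
```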

Are VQA Systems RAD? Measuring Robustness to Augmented Data with Focused Interventions

no code implementations · ACL 2021 · Daniel Rosenberg, Itai Gat, Amir Feder, Roi Reichart

Deep learning algorithms have shown promising results in visual question answering (VQA) tasks, but a more careful look reveals that they often do not understand the rich signal they are being fed with.

Question Answering · Visual Question Answering
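
A hedged sketch (an assumed formulation, not quoted from the paper) of a robustness-to-augmented-data style metric: among the examples a VQA model answers correctly, the fraction it still answers correctly after a semantics-preserving augmentation of the question or image.

```python
# Both arguments are lists of booleans aligned over the same examples:
# correct on the original input, and correct on its augmented counterpart.
def robustness_to_augmentation(correct_original, correct_augmented):
    kept = [o and a for o, a in zip(correct_original, correct_augmented)]
    n_orig = sum(correct_original)
    return sum(kept) / n_orig if n_orig else 0.0

print(robustness_to_augmentation([True, True, False, True],
                                 [True, False, False, True]))   # 2/3 of correct answers survive
```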
