Search Results for author: Idan Schwartz

Found 16 papers, 14 papers with code

Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation

1 code implementation · 28 Sep 2023 · Guy Yariv, Itai Gat, Sagie Benaim, Lior Wolf, Idan Schwartz, Yossi Adi

The proposed method is based on a lightweight adaptor network, which learns to map the audio-based representation to the input representation expected by the text-to-video generation model.

Text-to-Video Generation, Video Generation
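To picture the adaptor idea from the snippet above, here is a toy sketch: a small two-layer network maps one audio embedding to a few pseudo-token vectors whose width matches what a text encoder would expect as input. Everything in it is an assumption made for illustration (the layer sizes, the random weights, the number of pseudo-tokens, and the hypothetical function name `adapt_audio_embedding`). It is not the paper's architecture, and it includes no training loop.

```python
import random

def make_linear(in_dim, out_dim, rng):
    """Hypothetical randomly initialised dense layer, returned as (weights, bias)."""
    weights = [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim

def apply_linear(layer, x):
    """Compute W x + b for a single input vector x."""
    weights, bias = layer
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def adapt_audio_embedding(audio_emb, hidden, out, n_tokens, token_dim):
    """Map one audio embedding to n_tokens pseudo-token vectors of size token_dim."""
    h = [max(0.0, v) for v in apply_linear(hidden, audio_emb)]  # ReLU hidden layer
    flat = apply_linear(out, h)                                  # n_tokens * token_dim values
    # Split the flat output into a sequence of token-sized vectors.
    return [flat[i * token_dim:(i + 1) * token_dim] for i in range(n_tokens)]

rng = random.Random(0)
AUDIO_DIM, HIDDEN_DIM, N_TOKENS, TOKEN_DIM = 16, 32, 4, 8  # toy sizes (assumed)
hidden = make_linear(AUDIO_DIM, HIDDEN_DIM, rng)
out = make_linear(HIDDEN_DIM, N_TOKENS * TOKEN_DIM, rng)

audio_emb = [rng.uniform(-1.0, 1.0) for _ in range(AUDIO_DIM)]  # stand-in audio features
tokens = adapt_audio_embedding(audio_emb, hidden, out, N_TOKENS, TOKEN_DIM)
print(len(tokens), len(tokens[0]))  # 4 8
```

The word "adaptor" suggests the text-to-video model stays frozen. If so, only these few weights would need training. See the paper for how its adaptor is actually built and trained.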

AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation

1 code implementation · Interspeech 2023 · Guy Yariv, Itai Gat, Lior Wolf, Yossi Adi, Idan Schwartz

In this paper, we propose a novel method utilizing latent diffusion models trained for text-to-image generation to generate images conditioned on audio recordings.

audio-visual learning, Text-to-Image Generation

Discriminative Class Tokens for Text-to-Image Diffusion Models

1 code implementation · ICCV 2023 · Idan Schwartz, Vésteinn Snæbjarnarson, Hila Chefer, Ryan Cotterell, Serge Belongie, Lior Wolf, Sagie Benaim

This approach has two disadvantages: (i) supervised datasets are generally small compared to large-scale scraped text-image datasets on which text-to-image models are trained, affecting the quality and diversity of the generated images, or (ii) the input is a hard-coded label, as opposed to free-form text, limiting the control over the generated images.

Describing Sets of Images with Textual-PCA

1 code implementation · 21 Oct 2022 · Oded Hupert, Idan Schwartz, Lior Wolf

We seek to semantically describe a set of images, capturing both the attributes of single images and the variations within the set.

Semantic Similarity, Semantic Textual Similarity
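The name Textual-PCA points to principal component analysis over image representations. In that view, the mean captures what the images share and the leading principal direction captures how they vary most. The sketch below is plain PCA by power iteration on hypothetical 2-D embedding vectors, and `top_principal_direction` is a hypothetical helper. How the paper turns such directions into text is not reproduced here.

```python
def mean_vector(vectors):
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def top_principal_direction(vectors, iters=100):
    """Return (mean, unit leading eigenvector of the scatter matrix) via power iteration."""
    mu = mean_vector(vectors)
    centered = [[x - m for x, m in zip(v, mu)] for v in vectors]
    # Arbitrary start vector; power iteration needs it to overlap the leading direction.
    direction = [1.0] * len(mu)
    for _ in range(iters):
        # Scatter-matrix product without forming the matrix: S d = sum_i x_i (x_i . d)
        proj = [sum(a * b for a, b in zip(x, direction)) for x in centered]
        new = [sum(p * x[j] for p, x in zip(proj, centered)) for j in range(len(mu))]
        norm = sum(v * v for v in new) ** 0.5
        if norm == 0.0:
            break  # start vector had no component along any direction of variance
        direction = [v / norm for v in new]
    return mu, direction

# Hypothetical image embeddings: the set varies along the first axis only.
points = [[0.0, 1.0], [1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]
mu, direction = top_principal_direction(points)
print(mu, direction)  # [1.5, 1.0] [1.0, 0.0]
```

Here the mean `[1.5, 1.0]` plays the role of the shared attributes. The direction `[1.0, 0.0]` (defined up to sign) is the main axis of variation within the set.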

Zero-Shot Video Captioning with Evolving Pseudo-Tokens

1 code implementation · 22 Jul 2022 · Yoad Tewel, Yoav Shalev, Roy Nadler, Idan Schwartz, Lior Wolf

We introduce a zero-shot video captioning method that employs two frozen networks: the GPT-2 language model and the CLIP image-text matching model.

Image Captioning, Image-text matching (+6 more)

Optimizing Relevance Maps of Vision Transformers Improves Robustness

1 code implementation · 2 Jun 2022 · Hila Chefer, Idan Schwartz, Lior Wolf

It has been observed that visual classification models often rely mostly on the image background, neglecting the foreground, which hurts their robustness to distribution changes.

Image Classification, Out-of-Distribution Generalization

Latent Space Explanation by Intervention

no code implementations · 9 Dec 2021 · Itai Gat, Guy Lorberbom, Idan Schwartz, Tamir Hazan

The success of deep neural nets heavily relies on their ability to encode complex relations between their input and their output.

ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic

1 code implementation · CVPR 2022 · Yoad Tewel, Yoav Shalev, Idan Schwartz, Lior Wolf

While such models can provide a powerful score for matching and subsequent zero-shot tasks, they are not capable of generating a caption given an image.

Contrastive Learning, Descriptive (+6 more)

Perceptual Score: What Data Modalities Does Your Model Perceive?

1 code implementation · NeurIPS 2021 · Itai Gat, Idan Schwartz, Alexander Schwing

To study and quantify this concern, we introduce the perceptual score, a metric that assesses the degree to which a model relies on the different subsets of the input features, i.e., modalities.

Question Answering, Visual Dialog (+1 more)
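One simple way to probe how much a model relies on a modality is to shuffle that modality's inputs across the evaluation set and measure the accuracy drop. The sketch below assumes this permutation-based formulation. The toy model, the data, the un-normalized score, and the helper name `permutation_drop` are all hypothetical, and the paper's exact definition (for example, any normalization) may differ.

```python
import random

def accuracy(model, samples):
    """Fraction of (text, image, label) samples the model classifies correctly."""
    correct = sum(1 for text, image, label in samples if model(text, image) == label)
    return correct / len(samples)

def permutation_drop(model, samples, modality, n_perms=10, seed=0):
    """Average accuracy drop when one modality is shuffled across the dataset."""
    rng = random.Random(seed)
    base = accuracy(model, samples)
    drops = []
    for _ in range(n_perms):
        texts, images, labels = (list(col) for col in zip(*samples))
        # Break the pairing between the chosen modality and the rest of each sample.
        if modality == "text":
            rng.shuffle(texts)
        else:
            rng.shuffle(images)
        drops.append(base - accuracy(model, list(zip(texts, images, labels))))
    return sum(drops) / n_perms

def text_only_model(text, image):
    """Toy model that ignores the image entirely and predicts from the text alone."""
    return text % 2

samples = [(t, 100 + t, t % 2) for t in range(20)]
print(permutation_drop(text_only_model, samples, "image"))  # 0.0
print(permutation_drop(text_only_model, samples, "text"))
```

Shuffling the ignored image modality costs nothing, so its score is zero. Shuffling the text breaks the model. This is the kind of asymmetry a modality-reliance metric is meant to expose.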

Video and Text Matching with Conditioned Embeddings

1 code implementation · 21 Oct 2021 · Ameen Ali, Idan Schwartz, Tamir Hazan, Lior Wolf

Traditionally, video and text matching is done by learning a shared embedding space, and the encoding of one modality is independent of the other.

Machine Translation, Sentence (+4 more)
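In the traditional setup described above, each modality is encoded on its own into a shared space, so retrieval reduces to ranking by similarity. The sketch below assumes precomputed, hypothetical embeddings and cosine similarity. It illustrates the baseline the paper contrasts itself with, not its conditioned-embedding method.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def rank_videos(text_emb, video_embs):
    """Return video indices sorted by cosine similarity to the text embedding."""
    scores = [cosine(text_emb, v) for v in video_embs]
    return sorted(range(len(video_embs)), key=lambda i: scores[i], reverse=True)

# Hypothetical embeddings; each modality was encoded independently of the other.
video_embs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
query = [0.9, 0.1, 0.0]
print(rank_videos(query, video_embs))  # [0, 2, 1]
```

Because the video embeddings never depend on the query, they can be computed once and reused for every text query. That independence is the design choice the snippet describes as traditional.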

Ordered Attention for Coherent Visual Storytelling

no code implementations · 4 Aug 2021 · Tom Braude, Idan Schwartz, Alexander Schwing, Ariel Shamir

OIA models interactions between the sentence-corresponding image and important regions in other images of the sequence.

Sentence, Visual Storytelling

Ensemble of MRR and NDCG models for Visual Dialog

1 code implementation · NAACL 2021 · Idan Schwartz

However, the NDCG metric favors generally applicable uncertain answers such as "I don't know."

Visual Dialog
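The tension between the two metrics can be shown with a single toy question. MRR rewards ranking the one ground-truth answer first. NDCG rewards ranking by graded relevance, which can favor a broadly acceptable "I don't know." The answers, relevance scores, and cutoff `k` below are hypothetical, and the exact NDCG protocol used for Visual Dialog is not reproduced. The paper's ensembling of MRR- and NDCG-oriented models is also not shown.

```python
import math

def reciprocal_rank(ranking, gt_answer):
    """1 / (1-based rank of the single ground-truth answer)."""
    return 1.0 / (ranking.index(gt_answer) + 1)

def ndcg(ranking, relevance, k):
    """NDCG@k with graded relevance; answers missing from `relevance` count as 0."""
    def dcg(items):
        return sum(relevance.get(a, 0.0) / math.log2(i + 2) for i, a in enumerate(items[:k]))
    ideal = sorted(relevance, key=relevance.get, reverse=True)
    return dcg(ranking) / dcg(ideal)

gt = "yes"
# Hypothetical dense annotations that rate the uncertain answer as most relevant.
relevance = {"yes": 0.6, "i don't know": 1.0, "maybe": 0.8}
mrr_style = ["yes", "no", "i don't know", "maybe"]
ndcg_style = ["i don't know", "maybe", "yes", "no"]

print(reciprocal_rank(mrr_style, gt), round(ndcg(mrr_style, relevance, k=3), 2))  # 1.0 0.61
print(round(reciprocal_rank(ndcg_style, gt), 2), ndcg(ndcg_style, relevance, k=3))  # 0.33 1.0
```

Neither ranking wins on both metrics. This is the kind of trade-off that motivates combining models tuned for each.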

A Simple Baseline for Audio-Visual Scene-Aware Dialog

1 code implementation · CVPR 2019 · Idan Schwartz, Alexander G. Schwing, Tamir Hazan

The recently proposed audio-visual scene-aware dialog task paves the way toward more data-driven learning of virtual assistants, smart speakers, and car navigation systems.

Scene-Aware Dialogue

Factor Graph Attention

1 code implementation · CVPR 2019 · Idan Schwartz, Seunghak Yu, Tamir Hazan, Alexander Schwing

We address this issue and develop a general attention mechanism for visual dialog which operates on any number of data utilities.

Graph Attention, Question Answering (+2 more)
