Search Results for author: Idan Schwartz

Found 16 papers, 14 papers with code

Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation

1 code implementation • 28 Sep 2023 • Guy Yariv, Itai Gat, Sagie Benaim, Lior Wolf, Idan Schwartz, Yossi Adi

The proposed method is based on a lightweight adaptor network, which learns to map the audio-based representation to the input representation expected by the text-to-video generation model.

Text-to-Video Generation, Video Generation

AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation

1 code implementation • Interspeech 2023 • Guy Yariv, Itai Gat, Lior Wolf, Yossi Adi, Idan Schwartz

In this paper, we propose a novel method utilizing latent diffusion models trained for text-to-image generation to generate images conditioned on audio recordings.

audio-visual learning, Text-to-Image Generation

Discriminative Class Tokens for Text-to-Image Diffusion Models

1 code implementation • ICCV 2023 • Idan Schwartz, Vésteinn Snæbjarnarson, Hila Chefer, Ryan Cotterell, Serge Belongie, Lior Wolf, Sagie Benaim

This approach has two disadvantages: (i) supervised datasets are generally small compared to large-scale scraped text-image datasets on which text-to-image models are trained, affecting the quality and diversity of the generated images, or (ii) the input is a hard-coded label, as opposed to free-form text, limiting the control over the generated images.

Describing Sets of Images with Textual-PCA

1 code implementation • 21 Oct 2022 • Oded Hupert, Idan Schwartz, Lior Wolf

We seek to semantically describe a set of images, capturing both the attributes of single images and the variations within the set.

Zero-Shot Video Captioning with Evolving Pseudo-Tokens

1 code implementation • 22 Jul 2022 • Yoad Tewel, Yoav Shalev, Roy Nadler, Idan Schwartz, Lior Wolf

We introduce a zero-shot video captioning method that employs two frozen networks: the GPT-2 language model and the CLIP image-text matching model.

Image Captioning, Image-text matching (+6 more)

Optimizing Relevance Maps of Vision Transformers Improves Robustness

1 code implementation • 2 Jun 2022 • Hila Chefer, Idan Schwartz, Lior Wolf

It has been observed that visual classification models often rely mostly on the image background, neglecting the foreground, which hurts their robustness to distribution changes.

Ranked #1 on Out-of-Distribution Generalization on ImageNet-W

Image Classification, Out-of-Distribution Generalization

Latent Space Explanation by Intervention

no code implementations • 9 Dec 2021 • Itai Gat, Guy Lorberbom, Idan Schwartz, Tamir Hazan

The success of deep neural nets heavily relies on their ability to encode complex relations between their input and their output.

ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic

1 code implementation • CVPR 2022 • Yoad Tewel, Yoav Shalev, Idan Schwartz, Lior Wolf

While such models can provide a powerful score for matching and subsequent zero-shot tasks, they are not capable of generating a caption given an image.

Contrastive Learning, Descriptive (+6 more)

Perceptual Score: What Data Modalities Does Your Model Perceive?

1 code implementation • NeurIPS 2021 • Itai Gat, Idan Schwartz, Alexander Schwing

To study and quantify this concern, we introduce the perceptual score, a metric that assesses the degree to which a model relies on the different subsets of the input features, i.e., modalities.

Question Answering, Visual Dialog (+1 more)

Video and Text Matching with Conditioned Embeddings

1 code implementation • 21 Oct 2021 • Ameen Ali, Idan Schwartz, Tamir Hazan, Lior Wolf

Traditionally, video and text matching is done by learning a shared embedding space, and the encoding of one modality is independent of the other.

Ranked #1 on Video-Guided Machine Translation on VATEX English-to-Chinese

Machine Translation, Sentence (+4 more)

Ordered Attention for Coherent Visual Storytelling

no code implementations • 4 Aug 2021 • Tom Braude, Idan Schwartz, Alexander Schwing, Ariel Shamir

Ordered image attention (OIA) models interactions between the sentence-corresponding image and important regions in other images of the sequence.

Sentence, Visual Storytelling

Ensemble of MRR and NDCG models for Visual Dialog

1 code implementation • NAACL 2021 • Idan Schwartz

However, the NDCG metric favors the usually applicable uncertain answers such as 'I don't know.'

Ranked #1 on Visual Dialog on VisDial v1.0 test-std

Visual Dialog

Removing Bias in Multi-modal Classifiers: Regularization by Maximizing Functional Entropies

1 code implementation • NeurIPS 2020 • Itai Gat, Idan Schwartz, Alexander Schwing, Tamir Hazan

However, regularization with the functional entropy is challenging.

Ranked #3 on Visual Question Answering (VQA) on VQA-CP

Question Answering, Visual Question Answering

A Simple Baseline for Audio-Visual Scene-Aware Dialog

1 code implementation • CVPR 2019 • Idan Schwartz, Alexander G. Schwing, Tamir Hazan

The recently proposed audio-visual scene-aware dialog task paves the way to a more data-driven way of learning virtual assistants, smart speakers and car navigation systems.

Ranked #1 on Scene-Aware Dialogue on AVSD