1 code implementation • 28 Sep 2023 • Guy Yariv, Itai Gat, Sagie Benaim, Lior Wolf, Idan Schwartz, Yossi Adi
The proposed method is based on a lightweight adaptor network, which learns to map the audio-based representation to the input representation expected by the text-to-video generation model.
1 code implementation • Interspeech 2023 • Guy Yariv, Itai Gat, Lior Wolf, Yossi Adi, Idan Schwartz
In this paper, we propose a novel method utilizing latent diffusion models trained for text-to-image generation to generate images conditioned on audio recordings.
1 code implementation • ICCV 2023 • Idan Schwartz, Vésteinn Snæbjarnarson, Hila Chefer, Ryan Cotterell, Serge Belongie, Lior Wolf, Sagie Benaim
This approach has two disadvantages: (i) supervised datasets are generally small compared to the large-scale scraped text-image datasets on which text-to-image models are trained, affecting the quality and diversity of the generated images, and (ii) the input is a hard-coded label, as opposed to free-form text, limiting the control over the generated images.
1 code implementation • 21 Oct 2022 • Oded Hupert, Idan Schwartz, Lior Wolf
We seek to semantically describe a set of images, capturing both the attributes of single images and the variations within the set.
1 code implementation • 22 Jul 2022 • Yoad Tewel, Yoav Shalev, Roy Nadler, Idan Schwartz, Lior Wolf
We introduce a zero-shot video captioning method that employs two frozen networks: the GPT-2 language model and the CLIP image-text matching model.
1 code implementation • 2 Jun 2022 • Hila Chefer, Idan Schwartz, Lior Wolf
It has been observed that visual classification models often rely mostly on the image background, neglecting the foreground, which hurts their robustness to distribution changes.
Ranked #1 on Out-of-Distribution Generalization on ImageNet-W
no code implementations • 9 Dec 2021 • Itai Gat, Guy Lorberbom, Idan Schwartz, Tamir Hazan
The success of deep neural nets heavily relies on their ability to encode complex relations between their input and their output.
1 code implementation • CVPR 2022 • Yoad Tewel, Yoav Shalev, Idan Schwartz, Lior Wolf
While such models can provide a powerful score for matching and subsequent zero-shot tasks, they are not capable of generating a caption given an image.
1 code implementation • NeurIPS 2021 • Itai Gat, Idan Schwartz, Alexander Schwing
To study and quantify this concern, we introduce the perceptual score, a metric that assesses the degree to which a model relies on the different subsets of the input features, i.e., modalities.
1 code implementation • 21 Oct 2021 • Ameen Ali, Idan Schwartz, Tamir Hazan, Lior Wolf
Traditionally, video and text matching is done by learning a shared embedding space, and the encoding of one modality is independent of the other.
no code implementations • 4 Aug 2021 • Tom Braude, Idan Schwartz, Alexander Schwing, Ariel Shamir
OIA models interactions between the sentence-corresponding image and important regions in other images of the sequence.
1 code implementation • NAACL 2021 • Idan Schwartz
However, the NDCG metric favors the usually applicable uncertain answers such as "I don't know."
Ranked #1 on Visual Dialog on VisDial v1.0 test-std
1 code implementation • NeurIPS 2020 • Itai Gat, Idan Schwartz, Alexander Schwing, Tamir Hazan
However, regularization with the functional entropy is challenging.
Ranked #3 on Visual Question Answering (VQA) on VQA-CP
1 code implementation • CVPR 2019 • Idan Schwartz, Alexander G. Schwing, Tamir Hazan
The recently proposed audio-visual scene-aware dialog task paves the way to a more data-driven way of learning virtual assistants, smart speakers and car navigation systems.
Ranked #1 on Scene-Aware Dialogue on AVSD
1 code implementation • CVPR 2019 • Idan Schwartz, Seunghak Yu, Tamir Hazan, Alexander Schwing
We address this issue and develop a general attention mechanism for visual dialog which operates on any number of data utilities.
Ranked #1 on Visual Dialog on VisDial v0.9 val
1 code implementation • NeurIPS 2017 • Idan Schwartz, Alexander G. Schwing, Tamir Hazan
The quest for algorithms that enable cognitive abilities is an important part of machine learning.