no code implementations • 20 Nov 2023 • Aishwarya Agarwal, Srikrishna Karanam, Tripti Shukla, Balaji Vasan Srinivasan
Another line of techniques expand the inversion space to learn multiple embeddings but they do this only along the layer dimension (e. g., one per layer of the DDPM model) or the timestep dimension (one for a set of timesteps in the denoising process), leading to suboptimal attribute disentanglement.
no code implementations • 1 Sep 2023 • K J Joseph, Prateksha Udhayanan, Tripti Shukla, Aishwarya Agarwal, Srikrishna Karanam, Koustava Goswami, Balaji Vasan Srinivasan
We hope our work would attract attention to this newly identified, pragmatic problem setting.
no code implementations • 26 Jun 2023 • Aishwarya Agarwal, Srikrishna Karanam, Balaji Vasan Srinivasan
Recent works in self-supervised learning have shown impressive results on single-object images, but they struggle to perform well on complex multi-object images as evidenced by their poor visual grounding.
no code implementations • ICCV 2023 • Aishwarya Agarwal, Srikrishna Karanam, K J Joseph, Apoorv Saxena, Koustava Goswami, Balaji Vasan Srinivasan
First, our attention segregation loss reduces the cross-attention overlap between attention maps of different concepts in the text prompt, thereby reducing the confusion/conflict among various concepts and the eventual capture of all concepts in the generated output.
no code implementations • NAACL 2021 • Hrituraj Singh, Anshul Nasery, Denil Mehta, Aishwarya Agarwal, Jatin Lamba, Balaji Vasan Srinivasan
In this paper, we propose a novel task - MIMOQA - Multimodal Input Multimodal Output Question Answering in which the output is also multimodal.