Search Results for author: Aishwarya Agarwal

Found 5 papers, 0 papers with code

An Image is Worth Multiple Words: Multi-attribute Inversion for Constrained Text-to-Image Synthesis

no code implementations • 20 Nov 2023 • Aishwarya Agarwal, Srikrishna Karanam, Tripti Shukla, Balaji Vasan Srinivasan

Another line of techniques expands the inversion space to learn multiple embeddings, but does so only along the layer dimension (e.g., one per layer of the DDPM model) or the timestep dimension (one for a set of timesteps in the denoising process), leading to suboptimal attribute disentanglement.

Attribute Denoising +2

Iterative Multi-granular Image Editing using Diffusion Models

no code implementations • 1 Sep 2023 • K J Joseph, Prateksha Udhayanan, Tripti Shukla, Aishwarya Agarwal, Srikrishna Karanam, Koustava Goswami, Balaji Vasan Srinivasan

We hope our work will attract attention to this newly identified, pragmatic problem setting.

Image Generation

Learning with Difference Attention for Visually Grounded Self-supervised Representations

no code implementations • 26 Jun 2023 • Aishwarya Agarwal, Srikrishna Karanam, Balaji Vasan Srinivasan

Recent works in self-supervised learning have shown impressive results on single-object images, but they struggle to perform well on complex multi-object images as evidenced by their poor visual grounding.

Self-Supervised Learning, Visual Grounding

A-STAR: Test-time Attention Segregation and Retention for Text-to-image Synthesis

no code implementations • ICCV 2023 • Aishwarya Agarwal, Srikrishna Karanam, K J Joseph, Apoorv Saxena, Koustava Goswami, Balaji Vasan Srinivasan

First, our attention segregation loss reduces the cross-attention overlap between attention maps of different concepts in the text prompt, thereby reducing the confusion/conflict among various concepts and enabling the eventual capture of all concepts in the generated output.

Denoising, Image Generation

MIMOQA: Multimodal Input Multimodal Output Question Answering

no code implementations • NAACL 2021 • Hrituraj Singh, Anshul Nasery, Denil Mehta, Aishwarya Agarwal, Jatin Lamba, Balaji Vasan Srinivasan

In this paper, we propose a novel task, MIMOQA (Multimodal Input Multimodal Output Question Answering), in which the output is also multimodal.

Question Answering, Visual Question Answering
