no code implementations • 28 Feb 2023 • Wonwoong Cho, Hareesh Ravi, Midhun Harikumar, Vinh Khuc, Krishna Kumar Singh, Jingwan Lu, David I. Inouye, Ajinkya Kale
We rely on the inductive bias of the progressive denoising process of diffusion models to encode pose/layout information in the spatial structure mask and semantic/style information in the style code.
no code implementations • 23 Feb 2023 • Pranav Aggarwal, Hareesh Ravi, Naveen Marri, Sachin Kelkar, Fengbin Chen, Vinh Khuc, Midhun Harikumar, Ritiz Tambi, Sudharshan Reddy Kakumanu, Purvak Lapsiya, Alvin Ghouas, Sarah Saber, Malavika Ramprasad, Baldo Faieta, Ajinkya Kale
We observe that Diffusion Prior can be used in a memory and compute efficient way to constrain the generation to a specific domain without altering the larger Diffusion Decoder.
no code implementations • 15 Feb 2023 • Hareesh Ravi, Sachin Kelkar, Midhun Harikumar, Ajinkya Kale
We combine this with structure preserving edits on the image decoder using existing approaches such as reverse DDIM to perform text guided image editing.
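The "reverse DDIM" edit mentioned above relies on deterministically re-noising an image latent so it can later be regenerated (and edited) by the ordinary DDIM sampler. A minimal sketch of one inversion step, assuming a hypothetical noise predictor `predict_noise` standing in for a trained diffusion U-Net:

```python
import numpy as np

def predict_noise(x, t):
    # Hypothetical epsilon-prediction network; a fixed toy function here.
    return 0.1 * x

def ddim_inversion_step(x_t, alpha_t, alpha_next, t):
    """One deterministic reverse-DDIM step from timestep t to a noisier one."""
    eps = predict_noise(x_t, t)
    # Estimate the clean latent x_0 implied by the current noise prediction.
    x0_pred = (x_t - np.sqrt(1.0 - alpha_t) * eps) / np.sqrt(alpha_t)
    # Re-noise deterministically toward the next (noisier) timestep.
    return np.sqrt(alpha_next) * x0_pred + np.sqrt(1.0 - alpha_next) * eps

x = np.ones((4, 4))
x_next = ddim_inversion_step(x, alpha_t=0.9, alpha_next=0.8, t=10)
print(x_next.shape)  # (4, 4)
```

Because no noise is sampled, running the same steps in reverse recovers the original latent, which is what preserves the image's structure during text-guided edits.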
1 code implementation • CVPR 2023 • Qiucheng Wu, Yujian Liu, Handong Zhao, Ajinkya Kale, Trung Bui, Tong Yu, Zhe Lin, Yang Zhang, Shiyu Chang
Based on this finding, we further propose a simple, lightweight image editing algorithm where the mixing weights of the two text embeddings are optimized for style matching and content preservation.
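The idea of optimizing mixing weights between two text embeddings can be sketched in a toy form. Everything below is a stand-in: the paper optimizes weights inside a diffusion model, whereas this sketch uses a single scalar weight, random vectors for the embeddings, and cosine distances as hypothetical style-matching and content-preservation losses.

```python
import numpy as np

rng = np.random.default_rng(0)
content_emb = rng.normal(size=64)   # hypothetical content text embedding
style_emb = rng.normal(size=64)     # hypothetical style text embedding
target_style = style_emb + 0.1 * rng.normal(size=64)  # toy style target

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def loss(w):
    mixed = w * style_emb + (1.0 - w) * content_emb
    style_match = 1.0 - cosine(mixed, target_style)  # pull toward the style
    content_keep = 1.0 - cosine(mixed, content_emb)  # stay near the content
    return style_match + 0.5 * content_keep

# Simple grid search over the scalar mixing weight.
weights = np.linspace(0.0, 1.0, 101)
best_w = min(weights, key=loss)
print(best_w)
```

The two loss terms make the trade-off explicit: a larger weight on `content_keep` biases the edit toward preserving the source image's content.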
1 code implementation • Findings (NAACL) 2022 • Jaemin Cho, Seunghyun Yoon, Ajinkya Kale, Franck Dernoncourt, Trung Bui, Mohit Bansal
Toward more descriptive and distinctive caption generation, we propose using CLIP, a multimodal encoder trained on a huge corpus of image-text pairs from the web, to calculate multimodal similarity and use it as a reward function.
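A CLIP-style similarity reward reduces to cosine similarity between image and caption embeddings. The sketch below illustrates this with hypothetical precomputed embeddings in place of real CLIP encoders:

```python
import numpy as np

def clip_reward(image_emb, caption_emb):
    """Cosine similarity between L2-normalized embeddings, in [-1, 1]."""
    image_emb = image_emb / np.linalg.norm(image_emb)
    caption_emb = caption_emb / np.linalg.norm(caption_emb)
    return float(image_emb @ caption_emb)

rng = np.random.default_rng(42)
image_emb = rng.normal(size=512)
# A caption embedding close to the image, and a random distractor.
good_caption = image_emb + 0.1 * rng.normal(size=512)
bad_caption = rng.normal(size=512)

# The reward prefers the caption aligned with the image.
print(clip_reward(image_emb, good_caption) > clip_reward(image_emb, bad_caption))
```

Used as a reward in caption-generation training, this score favors captions that are distinctive for the specific image rather than generically plausible.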
Ranked #26 on Image Captioning on COCO Captions
no code implementations • 10 Mar 2022 • Dan Ruta, Andrew Gilbert, Pranav Aggarwal, Naveen Marri, Ajinkya Kale, Jo Briggs, Chris Speed, Hailin Jin, Baldo Faieta, Alex Filipkowski, Zhe Lin, John Collomosse
We present StyleBabel, a unique open access dataset of natural language captions and free-form tags describing the artistic style of over 135K digital artworks, collected via a novel participatory method from experts studying at specialist art and design schools.
no code implementations • CVPR 2022 • Haoyu Ma, Handong Zhao, Zhe Lin, Ajinkya Kale, Zhangyang Wang, Tong Yu, Jiuxiang Gu, Sunav Choudhary, Xiaohui Xie
recommendation, and marketing services.
2 code implementations • 15 Sep 2021 • Pranav Aggarwal, Ritiz Tambi, Ajinkya Kale
There has been a recent spike in interest in multi-modal Language and Vision problems.
no code implementations • CVPR 2021 • Xin Yuan, Zhe Lin, Jason Kuen, Jianming Zhang, Yilin Wang, Michael Maire, Ajinkya Kale, Baldo Faieta
We first train our model on COCO and evaluate the learned visual representations on various downstream tasks including image classification, object detection, and instance segmentation.
1 code implementation • 24 Nov 2020 • Pranav Aggarwal, Ajinkya Kale
There has been a recent spike in interest in multi-modal Language and Vision problems.
no code implementations • 4 Oct 2020 • Aashish Kumar Misraa, Ajinkya Kale, Pranav Aggarwal, Ali Aminian
Most real-world applications of image retrieval, such as Adobe Stock, which is a marketplace for stock photography and illustrations, need a way for users to find images that are both visually (i.e., aesthetically) and conceptually (i.e., containing the same salient objects) similar to a query image.
no code implementations • LREC 2020 • Ritiz Tambi, Ajinkya Kale, Tracy Holloway King
Language identification is a well-known task for natural language documents.
no code implementations • 25 Jul 2017 • Ajinkya Kale, Thrivikrama Taula, Sanjika Hewavitharana, Amit Srivastava
Query Segmentation is one of the critical components for understanding users' search intent in Information Retrieval tasks.
no code implementations • 10 Jun 2017 • Fan Yang, Ajinkya Kale, Yury Bubnov, Leon Stein, Qiaosong Wang, Hadi Kiapour, Robinson Piramuthu
We harness the availability of large image collection of eBay listings and state-of-the-art deep learning techniques to perform visual search at scale.