no code implementations • 29 Jun 2023 • Abhirama Subramanyam Penamakuri, Manish Gupta, Mithun Das Gupta, Anand Mishra
We study visual question answering in a setting where the answer has to be mined from a pool of relevant and irrelevant images given as a context.
no code implementations • 16 Oct 2022 • Prajwal Gatti, Abhirama Subramanyam Penamakuri, Revant Teotia, Anand Mishra, Shubhashis Sengupta, Roshni Ramnani
To enable both commonsense and factual reasoning in the image search, we present a unified framework, namely Knowledge Retrieval-Augmented Multimodal Transformer (KRAMT), that treats the named visual entities in an image as a gateway to encyclopedic knowledge and leverages them along with natural language query to ground relevant knowledge.