1 code implementation • 26 Mar 2024 • Badri N. Patro, Vinay P. Namboodiri, Vijay S. Agneeswaran
Transformers in vision have been investigated through diverse architectures such as ViT, PVT, and Swin.
1 code implementation • 22 Mar 2024 • Badri N. Patro, Vijay S. Agneeswaran
Transformers have widely adopted attention networks for sequence mixing and MLPs for channel mixing, and these two components have played a pivotal role in achieving breakthroughs across domains.
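The two mixing roles named above can be illustrated on a tiny token matrix. This is a minimal sketch, not the paper's architecture: the function names and the uniform mixing weights are illustrative stand-ins for attention and the MLP.

```python
# Illustrative sketch of the two mixing roles in a Transformer block.
# sequence_mix plays the role of attention (mixes ACROSS tokens);
# channel_mix plays the role of the MLP (mixes ACROSS features of one token).

def sequence_mix(tokens, weights):
    """Mix information across tokens.
    tokens: list of n feature vectors; weights: n x n row-stochastic matrix."""
    n, d = len(tokens), len(tokens[0])
    return [[sum(weights[i][j] * tokens[j][k] for j in range(n))
             for k in range(d)] for i in range(n)]

def channel_mix(tokens, w):
    """Mix information across the channels of each token independently.
    w: d x d weight matrix applied to every token."""
    d = len(w)
    return [[sum(tok[j] * w[j][k] for j in range(d)) for k in range(d)]
            for tok in tokens]

tokens = [[1.0, 0.0], [0.0, 1.0]]
uniform = [[0.5, 0.5], [0.5, 0.5]]        # attention-like averaging weights
mixed_seq = sequence_mix(tokens, uniform)  # each token now sees the other
identity = [[1.0, 0.0], [0.0, 1.0]]
mixed_ch = channel_mix(tokens, identity)   # per-token transform, tokens untouched
```

Note that `sequence_mix` is the only place information flows between positions; `channel_mix` never lets tokens interact, which is why the two are complementary.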
1 code implementation • 13 Apr 2023 • Badri N. Patro, Vinay P. Namboodiri, Vijay Srinivas Agneeswaran
Vision transformers have been applied successfully for image recognition tasks.
1 code implementation • 16 Feb 2023 • Badri N. Patro, Vijay Srinivas Agneeswaran
Transformers are widely used for solving tasks in natural language processing, computer vision, speech, and music domains.
1 code implementation • 7 Mar 2022 • Abhishek Jha, Badri N. Patro, Luc van Gool, Tinne Tuytelaars
In this paper, we propose a novel regularization for VQA models, Constrained Optimization using Barlow's theory (COB), that improves the information content of the joint space by minimizing the redundancy.
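The redundancy-reduction idea behind Barlow's principle can be sketched numerically. This is a hedged illustration, not COB's actual loss: it pushes the cross-correlation matrix of two embedding sets toward the identity, so on-diagonal terms (agreement) approach 1 and off-diagonal terms (redundancy) approach 0.

```python
# Barlow-style redundancy-reduction sketch (illustrative, not the paper's code):
# penalize off-diagonal cross-correlations between two sets of embeddings.

def normalize(rows):
    """Standardize each embedding dimension (column) across the batch."""
    cols = []
    for col in zip(*rows):
        m = sum(col) / len(col)
        var = sum((x - m) ** 2 for x in col) / len(col)
        s = var ** 0.5 or 1.0  # guard against zero variance
        cols.append([(x - m) / s for x in col])
    return list(map(list, zip(*cols)))

def redundancy_loss(za, zb):
    """(1 - diagonal)^2 plus squared off-diagonal cross-correlations."""
    za, zb = normalize(za), normalize(zb)
    n, d = len(za), len(za[0])
    c = [[sum(za[b][i] * zb[b][j] for b in range(n)) / n for j in range(d)]
         for i in range(d)]
    on = sum((1.0 - c[i][i]) ** 2 for i in range(d))
    off = sum(c[i][j] ** 2 for i in range(d) for j in range(d) if i != j)
    return on + off

# Decorrelated dimensions give zero loss; duplicated dimensions are penalized.
za = [[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]]
zb = [[r[0], r[0]] for r in za]  # second dimension is a redundant copy
```

Minimizing the off-diagonal term is what "improving the information content of the joint space by minimizing the redundancy" amounts to in this toy setting.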
no code implementations • 3 Feb 2021 • Vinod K Kurmi, Badri N. Patro, Venkatesh K. Subramanian, Vinay P. Namboodiri
We define distillation losses in terms of aleatoric uncertainty and self-attention.
1 code implementation • 23 Jan 2020 • Badri N. Patro, Shivansh Patel, Vinay P. Namboodiri
Our model explains the answers obtained through a VQA model by providing visual and textual explanations.
no code implementations • 23 Jan 2020 • Badri N. Patro, Vinod K. Kurmi, Sandeep Kumar, Vinay P. Namboodiri
This is a Bayesian framework and the results show a remarkable similarity to natural questions as validated by a human study.
no code implementations • 23 Jan 2020 • Badri N. Patro, Mayank Lunayach, Vinay P. Namboodiri
These have two-fold benefits: a) improvement in obtaining the certainty estimates that correlate better with misclassified samples and b) improved attention maps that provide state-of-the-art results in terms of correlation with human attention regions.
1 code implementation • 31 Dec 2019 • Badri N. Patro, Dev Chauhan, Vinod K. Kurmi, Vinay P. Namboodiri
One way to ensure this is to add constraints that pull true paraphrase embeddings close to each other and push unrelated paraphrase-candidate sentence embeddings far apart.
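The close/far constraint above can be written as a triplet-style hinge. This is a hedged sketch under assumed names and an assumed margin value, not the paper's implementation: the true paraphrase must sit at least `margin` closer to the anchor than the unrelated candidate does.

```python
# Triplet-style constraint sketch (illustrative, not the paper's exact loss):
# zero loss once the paraphrase is at least `margin` closer than the
# unrelated candidate; otherwise a positive penalty.

def l2(u, v):
    """Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def paraphrase_constraint_loss(anchor, paraphrase, unrelated, margin=1.0):
    return max(0.0, l2(anchor, paraphrase) - l2(anchor, unrelated) + margin)

anchor = [0.0, 0.0]
satisfied = paraphrase_constraint_loss(anchor, [0.1, 0.0], [3.0, 0.0])
violated = paraphrase_constraint_loss(anchor, [2.0, 0.0], [0.5, 0.0])
```

When the constraint already holds with slack, the hinge clamps the loss to zero, so only violating triples contribute gradient.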
no code implementations • 19 Dec 2019 • Badri N. Patro, Vinay P. Namboodiri
Specifically, we incorporate exemplar-based approaches and show that an exemplar-based module can be added to almost any deep learning architecture proposed in the literature, and that adding such a block improves performance on these tasks.
no code implementations • 19 Nov 2019 • Badri N. Patro, Anupriy, Vinay P. Namboodiri
It also results in a good improvement in rank correlation metric on the VQA task.
no code implementations • 14 Oct 2019 • Soumik Dasgupta, Badri N. Patro, Vinay P. Namboodiri
In this work, we show that Dynamic Attention helps in achieving grounding and also aids in the policy learning objective.
no code implementations • 13 Oct 2019 • Badri N. Patro, Shivansh Patel, Vinay P. Namboodiri
In particular, we propose a new method, Granular Multi-modal Attention, which addresses the question of the right granularity at which one needs to attend while solving the Visual Dialog task.
no code implementations • 11 Sep 2019 • Badri N. Patro, Anupriy, Vinay P. Namboodiri
In this paper, we propose a probabilistic framework for solving the task of Visual Dialog.
Ranked #1 on Common Sense Reasoning on Visual Dialog v0.9
no code implementations • ICCV 2019 • Badri N. Patro, Mayank Lunayach, Shivansh Patel, Vinay P. Namboodiri
These have two-fold benefits: a) improvement in obtaining the certainty estimates that correlate better with misclassified samples and b) improved attention maps that provide state-of-the-art results in terms of correlation with human attention regions.
1 code implementation • EMNLP 2018 • Badri N. Patro, Sandeep Kumar, Vinod K. Kurmi, Vinay P. Namboodiri
Generating natural questions from an image is a semantic task that requires using visual and language modality to learn multimodal representations.
2 code implementations • COLING 2018 • Badri N. Patro, Vinod K. Kurmi, Sandeep Kumar, Vinay P. Namboodiri
One way to ensure this is to add constraints that pull true paraphrase embeddings close to each other and push unrelated paraphrase-candidate sentence embeddings far apart.