Search Results for author: Vijay Kumar BG

Found 5 papers, 4 papers with code

Self-Training Large Language Models for Improved Visual Program Synthesis With Visual Reinforcement

no code implementations • 6 Apr 2024 • Zaid Khan, Vijay Kumar BG, Samuel Schulter, Yun Fu, Manmohan Chandraker

We propose a method where we exploit existing annotations for a vision-language task to improvise a coarse reward signal for that task, treat the LLM as a policy, and apply reinforced self-training to improve the visual program synthesis ability of the LLM for that task.

Object Detection +4

Q: How to Specialize Large Vision-Language Models to Data-Scarce VQA Tasks? A: Self-Train on Unlabeled Images!

1 code implementation • CVPR 2023 • Zaid Khan, Vijay Kumar BG, Samuel Schulter, Xiang Yu, Yun Fu, Manmohan Chandraker

We introduce SelTDA (Self-Taught Data Augmentation), a strategy for finetuning large VLMs on small-scale VQA datasets.

counterfactual Data Augmentation +5

Single-Stream Multi-Level Alignment for Vision-Language Pretraining

1 code implementation • 27 Mar 2022 • Zaid Khan, Vijay Kumar BG, Xiang Yu, Samuel Schulter, Manmohan Chandraker, Yun Fu

Self-supervised vision-language pretraining from pure images and text with a contrastive loss is effective, but ignores fine-grained alignment due to a dual-stream architecture that aligns image and text representations only on a global level.

Question Answering Referring Expression +4

Large Scale Multimodal Classification Using an Ensemble of Transformer Models and Co-Attention

1 code implementation • 23 Nov 2020 • Varnith Chordia, Vijay Kumar BG

Accurate and efficient product classification is significant for E-commerce applications, as it enables various downstream tasks such as recommendation, retrieval, and pricing.

Classification General Classification +3

Unsupervised CNN for Single View Depth Estimation: Geometry to the Rescue

2 code implementations • 16 Mar 2016 • Ravi Garg, Vijay Kumar BG, Gustavo Carneiro, Ian Reid

In this work we propose an unsupervised framework to learn a deep convolutional neural network for single view depth prediction, without requiring a pre-training stage or annotated ground truth depths.

Depth Estimation
