Search Results for author: Erhan Bas

Found 6 papers, 1 paper with code

Let's Go Shopping (LGS) -- Web-Scale Image-Text Dataset for Visual Concept Understanding

no code implementations • 9 Jan 2024 • Yatong Bai, Utsav Garg, Apaar Shanker, Haoming Zhang, Samyak Parajuli, Erhan Bas, Isidora Filipovic, Amelia N. Chu, Eugenia D Fomitcheva, Elliot Branson, Aerin Kim, Somayeh Sojoudi, Kyunghyun Cho

Vision and vision-language applications of neural networks, such as image classification and captioning, rely on large-scale annotated datasets that require non-trivial data-collecting processes.

Image Captioning, Image Classification (+3 more)

On the Performance of Multimodal Language Models

no code implementations • 4 Oct 2023 • Utsav Garg, Erhan Bas

Instruction-tuned large language models (LLMs) have demonstrated promising zero-shot generalization capabilities across various downstream tasks.

Benchmarking, Binary Classification (+4 more)

Detecting and Preventing Hallucinations in Large Vision Language Models

1 code implementation • 11 Aug 2023 • Anisha Gunjal, Jihan Yin, Erhan Bas

We find that even the current state-of-the-art LVLMs (InstructBLIP) still contain a staggering 30 percent of the hallucinatory text in the form of non-existent objects, unfaithful descriptions, and inaccurate relationships.

16k, Hallucination (+2 more)

Masked Vision and Language Modeling for Multi-modal Representation Learning

no code implementations • 3 Aug 2022 • Gukyeong Kwon, Zhaowei Cai, Avinash Ravichandran, Erhan Bas, Rahul Bhotika, Stefano Soatto

Instead of developing masked language modeling (MLM) and masked image modeling (MIM) independently, we propose to build joint masked vision and language modeling, where the masked signal of one modality is reconstructed with the help from another modality.

Language Modelling, Masked Language Modeling (+1 more)

X-DETR: A Versatile Architecture for Instance-wise Vision-Language Tasks

no code implementations • 12 Apr 2022 • Zhaowei Cai, Gukyeong Kwon, Avinash Ravichandran, Erhan Bas, Zhuowen Tu, Rahul Bhotika, Stefano Soatto

In this paper, we study the challenging instance-wise vision-language tasks, where the free-form language is required to align with the objects instead of the whole image.

End-to-End Piece-Wise Unwarping of Document Images

no code implementations • ICCV 2021 • Sagnik Das, Kunwar Yashraj Singh, Jon Wu, Erhan Bas, Vijay Mahadevan, Rahul Bhotika, Dimitris Samaras

Document unwarping attempts to undo the physical deformation of the paper and recover a 'flatbed' scanned document-image for downstream tasks such as OCR.

MS-SSIM, Optical Character Recognition (OCR) (+1 more)
