Search Results for author: Adarsh Jha

Found 3 papers, 1 paper with code

V-Zen: Efficient GUI Understanding and Precise Grounding With A Novel Multimodal LLM

no code implementations · 24 May 2024 · Abdur Rahman, Rajat Chawla, Muskaan Kumar, Arkajit Datta, Adarsh Jha, Mukunda NS, Ishaan Bhola

In the rapidly evolving landscape of AI research and application, Multimodal Large Language Models (MLLMs) have emerged as a transformative force, adept at interpreting and integrating information from diverse modalities such as text, images, and Graphical User Interfaces (GUIs).

Language Modelling · Large Language Model

GUIDE: Graphical User Interface Data for Execution

no code implementations · 9 Apr 2024 · Rajat Chawla, Adarsh Jha, Muskaan Kumar, Mukunda NS, Ishaan Bhola

The dataset's multi-platform nature and coverage of diverse websites enable the exploration of cross-interface capabilities in automation tasks.

Language Modelling · Large Language Model +1

Veagle: Advancements in Multimodal Representation Learning

1 code implementation · 18 Jan 2024 · Rajat Chawla, Arkajit Datta, Tushar Verma, Adarsh Jha, Anmol Gautam, Ayush Vatsal, Sukrit Chaterjee, Mukunda NS, Ishaan Bhola

In response to the limitations observed in current Vision Language Models (VLMs) and Multimodal Large Language Models (MLLMs), our proposed model, Veagle, incorporates a unique mechanism inspired by the successes and insights of previous works.

Image Captioning · Language Modelling +4
