Search Results for author: Jiayin Hu

Found 2 papers, 0 papers with code

VisionGPT: Vision-Language Understanding Agent Using Generalized Multimodal Framework

no code implementations • 14 Mar 2024 • Chris Kelly, Luhui Hu, Bang Yang, Yu Tian, Deshun Yang, Cindy Yang, Zaoshan Huang, Zihao Li, Jiayin Hu, Yuexian Zou

With the emergence of large language models (LLMs) and vision foundation models, how to combine the intelligence and capacity of these open-sourced or API-available models to achieve open-world visual perception remains an open question.

Language Modelling Large Language Model +2

Paper
Add Code

VisionGPT-3D: A Generalized Multimodal Agent for Enhanced 3D Vision Understanding

no code implementations • 14 Mar 2024 • Chris Kelly, Luhui Hu, Jiayin Hu, Yu Tian, Deshun Yang, Bang Yang, Cindy Yang, Zihao Li, Zaoshan Huang, Yuexian Zou

It seamlessly integrates various SOTA vision models and brings the automation in the selection of SOTA vision models, identifies the suitable 3D mesh creation algorithms corresponding to 2D depth maps analysis, generates optimal results based on diverse multimodal inputs such as text prompts.

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.