1 code implementation • 4 Apr 2024 • Kirolos Ataallah, Xiaoqian Shen, Eslam Abdelrahman, Essam Sleiman, Deyao Zhu, Jian Ding, Mohamed Elhoseiny
This paper introduces MiniGPT4-Video, a multimodal Large Language Model (LLM) designed specifically for video understanding.
Ranked #3 on Zero-Shot Video Question Answer on TVQA
1 code implementation • 14 Oct 2023 • Jun Chen, Deyao Zhu, Xiaoqian Shen, Xiang Li, Zechun Liu, Pengchuan Zhang, Raghuraman Krishnamoorthi, Vikas Chandra, Yunyang Xiong, Mohamed Elhoseiny
Motivated by this, we aim to build a unified interface for completing many vision-language tasks, including image description, visual question answering, and visual grounding, among others.
Ranked #10 on Visual Question Answering on BenchLMM
no code implementations • 1 Jun 2023 • Jun Chen, Deyao Zhu, Guocheng Qian, Bernard Ghanem, Zhicheng Yan, Chenchen Zhu, Fanyi Xiao, Mohamed Elhoseiny, Sean Chang Culatana
Although these VL models have acquired extensive knowledge of visual concepts, it is non-trivial to exploit that knowledge for semantic segmentation, as they are usually trained at the image level.
5 code implementations • 20 Apr 2023 • Deyao Zhu, Jun Chen, Xiaoqian Shen, Xiang Li, Mohamed Elhoseiny
Our work, for the first time, uncovers that properly aligning visual features with an advanced large language model can yield numerous advanced multi-modal abilities demonstrated by GPT-4, such as generating detailed image descriptions and creating websites from hand-drawn drafts.
Ranked #9 on Visual Question Answering on BenchLMM
1 code implementation • 9 Apr 2023 • Jun Chen, Deyao Zhu, Kilichbek Haydarov, Xiang Li, Mohamed Elhoseiny
Video captioning aims to convey dynamic scenes from videos using natural language, facilitating the understanding of spatiotemporal information within our environment.
1 code implementation • 12 Mar 2023 • Deyao Zhu, Jun Chen, Kilichbek Haydarov, Xiaoqian Shen, Wenxuan Zhang, Mohamed Elhoseiny
By continually acquiring new visual information from BLIP-2's answers, ChatCaptioner is able to generate richer image descriptions.
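The iterative scheme described above can be sketched as a simple dialog loop. This is an illustrative outline only, not the paper's implementation: `ask_question` and `answer_question` are hypothetical stand-ins for the questioner LLM and the VQA model (BLIP-2 in the paper).

```python
def chat_captioner_sketch(ask_question, answer_question, max_rounds=5):
    """Hypothetical sketch of a ChatCaptioner-style loop: a questioner
    model keeps asking about the image, a VQA model answers, and the
    accumulated dialog supplies material for an enriched caption.
    ask_question/answer_question are stand-in callables, not real APIs."""
    dialog = []
    for _ in range(max_rounds):
        question = ask_question(dialog)   # propose a question given the dialog so far
        answer = answer_question(question)  # VQA model answers from the image
        dialog.append((question, answer))
    return dialog
```

In practice the final caption would be produced by summarizing the returned dialog with the questioner model.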
1 code implementation • 30 Jan 2023 • Deyao Zhu, Yuhui Wang, Jürgen Schmidhuber, Mohamed Elhoseiny
In this paper, we investigate the potential of using action-free offline datasets to improve online reinforcement learning, and name this problem Reinforcement Learning with Action-Free Offline Pretraining (AFP-RL).
no code implementations • ICCV 2023 • Jun Chen, Deyao Zhu, Guocheng Qian, Bernard Ghanem, Zhicheng Yan, Chenchen Zhu, Fanyi Xiao, Sean Chang Culatana, Mohamed Elhoseiny
Semantic segmentation is a crucial task in computer vision that involves segmenting images into semantically meaningful regions at the pixel level.
1 code implementation • 9 Jun 2022 • Deyao Zhu, Li Erran Li, Mohamed Elhoseiny
In some complex environments with continuous state-action spaces, sparse rewards, and/or long temporal horizons, learning a good policy in the original environments can be difficult.
1 code implementation • 6 Mar 2022 • Abduallah Mohamed, Deyao Zhu, Warren Vu, Mohamed Elhoseiny, Christian Claudel
AMD is a metric that quantifies how close the whole set of generated samples is to the ground truth.
Ranked #1 on Trajectory Prediction on Stanford Drone (ADE (in world coordinates) metric)
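A distributional metric of this kind can be sketched as an average Mahalanobis-style distance from the ground truth to the distribution of generated samples. This is a minimal illustrative sketch under that assumption, not the paper's exact definition of AMD.

```python
import numpy as np

def amd_sketch(generated, ground_truth):
    """Hedged sketch: Mahalanobis distance from the ground-truth point
    to the empirical distribution of generated samples, so that a
    prediction set is judged as a whole rather than by its best sample.
    generated: (n_samples, dim) array; ground_truth: (dim,) array."""
    mu = generated.mean(axis=0)
    cov = np.cov(generated, rowvar=False)
    cov += 1e-6 * np.eye(cov.shape[0])  # regularize for invertibility
    diff = ground_truth - mu
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))
```

A lower value means the generated distribution as a whole sits closer to the ground truth, which is the property the abstract attributes to AMD.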
no code implementations • 29 Sep 2021 • Deyao Zhu, Li Erran Li, Mohamed Elhoseiny
Deep reinforcement learning agents trained to perform manipulation tasks in real-world environments with limited diversity of object properties tend to overfit and fail to generalize to unseen testing environments.
1 code implementation • CVPR 2022 • Jun Chen, Aniket Agarwal, Sherif Abdelkarim, Deyao Zhu, Mohamed Elhoseiny
This paper shows that modeling an effective message-passing flow through an attention mechanism can be critical to tackling the compositionality and long-tail challenges in visual relationship recognition (VRR).
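The core idea of attention-based message passing can be illustrated in a few lines. This is a generic sketch of scaled dot-product attention over a set of node features, not the paper's architecture; the learned query/key/value projections are replaced by identities for brevity.

```python
import numpy as np

def attention_message_passing(node_feats):
    """Illustrative sketch (not the paper's model): each node aggregates
    messages from all nodes, weighted by softmax-normalized dot-product
    affinity, so information can flow along the most relevant edges.
    node_feats: (n_nodes, dim) array of node representations."""
    d = node_feats.shape[-1]
    scores = node_feats @ node_feats.T / np.sqrt(d)  # pairwise affinities
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # rows sum to 1
    return weights @ node_feats                      # updated node features
```

In a full model these projections would be learned and the update repeated over several layers, letting rare (long-tail) relationships borrow evidence from related, more frequent ones.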
no code implementations • ICLR 2021 • Deyao Zhu, Mohamed Zahran, Li Erran Li, Mohamed Elhoseiny
Our model's learned representation leads to better and more semantically meaningful coverage of the trajectory distribution.
no code implementations • 1 Jan 2021 • Deyao Zhu, Mohamed Zahran, Li Erran Li, Mohamed Elhoseiny
We propose a new objective, unlikelihood training, which forces generated trajectories that conflict with contextual information to be assigned a lower probability by our model.
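The objective described above combines a standard likelihood term with a penalty on implausible samples. The sketch below is illustrative of the general unlikelihood-training idea, not the paper's exact formulation; the probabilities and the `alpha` weight are hypothetical inputs.

```python
import math

def unlikelihood_loss(p_observed, p_conflicting, alpha=1.0):
    """Hedged sketch of an unlikelihood-style objective: maximize the
    likelihood of the observed trajectory while pushing down the
    probability the model assigns to trajectories that conflict with
    context (e.g. leave the road or collide with another agent).
    p_observed: model probability of the ground-truth trajectory.
    p_conflicting: probabilities assigned to conflicting samples."""
    eps = 1e-12  # guard against log(0)
    nll = -math.log(p_observed + eps)  # standard likelihood term
    # Unlikelihood term: -log(1 - p) grows as the model puts more
    # mass on a conflicting trajectory, driving that mass down.
    unlikelihood = -sum(math.log(1.0 - p + eps) for p in p_conflicting)
    return nll + alpha * unlikelihood
```

Minimizing this loss rewards a model that concentrates probability on the observed trajectory and away from context-violating ones.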