Search Results for author: Sirui Zhao

Found 7 papers, 6 papers with code

A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise

2 code implementations • 19 Dec 2023 • Chaoyou Fu, Renrui Zhang, Zihan Wang, Yubo Huang, Zhengye Zhang, Longtian Qiu, Gaoxiang Ye, Yunhang Shen, Mengdan Zhang, Peixian Chen, Sirui Zhao, Shaohui Lin, Deqiang Jiang, Di Yin, Peng Gao, Ke Li, Hongsheng Li, Xing Sun

They endow Large Language Models (LLMs) with powerful capabilities in visual understanding, enabling them to tackle diverse multi-modal tasks.

Visual Reasoning

8,973

Paper
Code

APGL4SR: A Generic Framework with Adaptive and Personalized Global Collaborative Information in Sequential Recommendation

1 code implementation • 6 Nov 2023 • Mingjia Yin, Hao Wang, Xiang Xu, Likang Wu, Sirui Zhao, Wei Guo, Yong liu, Ruiming Tang, Defu Lian, Enhong Chen

To this end, we propose a graph-driven framework, named Adaptive and Personalized Graph Learning for Sequential Recommendation (APGL4SR), that incorporates adaptive and personalized global collaborative information into sequential recommendation systems.

Graph Learning Multi-Task Learning +1

Paper
Code

Woodpecker: Hallucination Correction for Multimodal Large Language Models

1 code implementation • 24 Oct 2023 • Shukang Yin, Chaoyou Fu, Sirui Zhao, Tong Xu, Hao Wang, Dianbo Sui, Yunhang Shen, Ke Li, Xing Sun, Enhong Chen

Hallucination is a big shadow hanging over the rapidly evolving Multimodal Large Language Models (MLLMs), referring to the phenomenon that the generated text is inconsistent with the image content.

Hallucination

539

Paper
Code

A Solution to CVPR'2023 AQTC Challenge: Video Alignment for Multi-Step Inference

1 code implementation • 26 Jun 2023 • Chao Zhang, Shiwei Wu, Sirui Zhao, Tong Xu, Enhong Chen

In this paper, we present a solution for enhancing video alignment to improve multi-step inference.

Video Alignment

Paper
Code

A Survey on Multimodal Large Language Models

1 code implementation • 23 Jun 2023 • Shukang Yin, Chaoyou Fu, Sirui Zhao, Ke Li, Xing Sun, Tong Xu, Enhong Chen

Recently, Multimodal Large Language Model (MLLM) represented by GPT-4V has been a new rising research hotspot, which uses powerful Large Language Models (LLMs) as a brain to perform multimodal tasks.

Hallucination In-Context Learning +5

8,973

Paper
Code

AU-aware graph convolutional network for Macro- and Micro-expression spotting

1 code implementation • 16 Mar 2023 • Shukang Yin, Shiwei Wu, Tong Xu, Shifeng Liu, Sirui Zhao, Enhong Chen

Automatic Micro-Expression (ME) spotting in long videos is a crucial step in ME analysis but also a challenging task due to the short duration and low intensity of MEs.

Micro-Expression Spotting

Paper
Code

More is Better: A Database for Spontaneous Micro-Expression with High Frame Rates

no code implementations • 3 Jan 2023 • Sirui Zhao, Huaying Tang, Xinglong Mao, Shifeng Liu, Hanqing Tao, Hao Wang, Tong Xu, Enhong Chen

To solve the problem of ME data hunger, we construct a dynamic spontaneous ME dataset with the largest current ME data scale, called DFME (Dynamic Facial Micro-expressions), which includes 7, 526 well-labeled ME videos induced by 671 participants and annotated by more than 20 annotators throughout three years.

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.