TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Video Question Answering	ActivityNet-QA	Video Chat	Accuracy	26.5	# 31
Video Question Answering	ActivityNet-QA	Video Chat	Confidence score	2.2	# 10
Zero-Shot Video Question Answer	ActivityNet-QA	Video Chat	Confidence Score	2.2	# 16
Zero-Shot Video Question Answer	ActivityNet-QA	Video Chat	Accuracy	26.5	# 16
Zero-Shot Video Question Answer	MSRVTT-QA	Video Chat-7B	Accuracy	45.0	# 18
Zero-Shot Video Question Answer	MSRVTT-QA	Video Chat-7B	Confidence Score	2.5	# 18
Zero-Shot Video Question Answer	MSVD-QA	Video Chat-7B	Accuracy	56.3	# 14
Zero-Shot Video Question Answer	MSVD-QA	Video Chat-7B	Confidence Score	2.8	# 15
Video Question Answering	MVBench	VideoChat	Avg.	35.5	# 7
Zero-Shot Video Question Answer	TGIF-QA	Video Chat-7B	Accuracy	34.4	# 8
Zero-Shot Video Question Answer	TGIF-QA	Video Chat-7B	Confidence Score	2.3	# 7
Video-based Generative Performance Benchmarking (Correctness of Information)	VideoInstruct	Video Chat	gpt-score	2.32	# 9
Video-based Generative Performance Benchmarking	VideoInstruct	Video Chat	Correctness of Information	2.23	# 13
Video-based Generative Performance Benchmarking	VideoInstruct	Video Chat	Detail Orientation	2.50	# 13
Video-based Generative Performance Benchmarking	VideoInstruct	Video Chat	Contextual Understanding	2.53	# 14
Video-based Generative Performance Benchmarking	VideoInstruct	Video Chat	Temporal Understanding	1.94	# 15
Video-based Generative Performance Benchmarking	VideoInstruct	Video Chat	Consistency	2.24	# 13
Video-based Generative Performance Benchmarking	VideoInstruct	Video Chat	mean	2.29	# 14
Video-based Generative Performance Benchmarking (Detail Orientation)	VideoInstruct	Video Chat	gpt-score	2.50	# 9
Video-based Generative Performance Benchmarking (Temporal Understanding)	VideoInstruct	Video Chat	gpt-score	1.94	# 11
Video-based Generative Performance Benchmarking (Contextual Understanding)	VideoInstruct	Video Chat	gpt-score	2.53	# 10
Video-based Generative Performance Benchmarking (Consistency)	VideoInstruct	Video Chat	gpt-score	2.24	# 9
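
The `mean` row for VideoInstruct in the table above looks like the unweighted average of the five per-dimension scores. The page does not state how it is computed, so the sketch below is only a quick consistency check under that assumption; the helper name `mean_score` is illustrative and not taken from the benchmark's code.

```python
# Per-dimension VideoInstruct scores for Video Chat, copied from the
# "Video-based Generative Performance Benchmarking" rows of the table.
scores = {
    "Correctness of Information": 2.23,
    "Detail Orientation": 2.50,
    "Contextual Understanding": 2.53,
    "Temporal Understanding": 1.94,
    "Consistency": 2.24,
}


def mean_score(values):
    """Unweighted arithmetic mean (assumed, not confirmed by the source)."""
    values = list(values)
    return sum(values) / len(values)


mean = mean_score(scores.values())
print(f"{mean:.2f}")  # prints 2.29, matching the reported mean
```

The five scores sum to 11.44, and 11.44 / 5 = 2.288, which rounds to the reported 2.29.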

VideoChat: Chat-Centric Video Understanding

10 May 2023 · Kunchang Li, Yinan He, Yi Wang, Yizhuo Li, Wenhai Wang, Ping Luo, Yali Wang, LiMin Wang, Yu Qiao

In this paper, we make an initial attempt at developing an end-to-end chat-centric video understanding system, called VideoChat. It integrates video foundation models and large language models via a learnable neural interface and excels at spatiotemporal reasoning, event localization, and causal relationship inference. To instruction-tune this system, we build a video-centric instruction dataset composed of thousands of videos paired with detailed descriptions and conversations. The dataset emphasizes spatiotemporal reasoning and causal relationships, making it a valuable asset for training chat-centric video understanding systems. Preliminary qualitative experiments demonstrate the system's potential across a broad spectrum of video applications, and it could serve as a simple prototype for future research on chat-centric video understanding. Code and data are available at https://github.com/OpenGVLab/Ask-Anything

Code

opengvlab/ask-anything (official) · 2,669 GitHub stars

Tasks

Video-based Generative Performance Benchmarking

Video-based Generative Performance Benchmarking (Consistency)

Video-based Generative Performance Benchmarking (Contextual Understanding)

Video-based Generative Performance Benchmarking (Correctness of Information)

Video-based Generative Performance Benchmarking (Detail Orientation)

Video-based Generative Performance Benchmarking (Temporal Understanding)

Video Question Answering

Video Understanding

Zero-Shot Video Question Answer

Datasets

WebVid

ActivityNet-QA

TGIF-QA

MSRVTT-QA

MSVD-QA

VideoInstruct

MVBench

Results from the Paper

Ranked #7 on Video Question Answering on MVBench

