TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Open-World Video Segmentation	BURST-val	DEVA (Mask2Former)	OWTA (all)	69.9	# 1
Open-World Video Segmentation	BURST-val	DEVA (Mask2Former)	OWTA (com)	75.2	# 1
Open-World Video Segmentation	BURST-val	DEVA (Mask2Former)	OWTA (unc)	41.5	# 2
Open-World Video Segmentation	BURST-val	DEVA (EntitySeg)	OWTA (all)	69.5	# 2
Open-World Video Segmentation	BURST-val	DEVA (EntitySeg)	OWTA (com)	73.3	# 2
Open-World Video Segmentation	BURST-val	DEVA (EntitySeg)	OWTA (unc)	50.5	# 1
Unsupervised Video Object Segmentation	DAVIS 2016 val	DEVA (DIS)	G	88.9	# 1
Unsupervised Video Object Segmentation	DAVIS 2016 val	DEVA (DIS)	J	87.6	# 2
Unsupervised Video Object Segmentation	DAVIS 2016 val	DEVA (DIS)	F	90.2	# 1
Unsupervised Video Object Segmentation	DAVIS 2017 (test-dev)	DEVA (EntitySeg)	J&F	62.1	# 1
Semi-Supervised Video Object Segmentation	DAVIS 2017 (test-dev)	DEVA	J&F	83.2	# 7
Semi-Supervised Video Object Segmentation	DAVIS 2017 (test-dev)	DEVA	Jaccard (Mean)	79.6	# 8
Semi-Supervised Video Object Segmentation	DAVIS 2017 (test-dev)	DEVA	F-measure (Mean)	86.8	# 7
Semi-Supervised Video Object Segmentation	DAVIS 2017 (test-dev)	DEVA	FPS	25.3	# 10
Referring Expression Segmentation	DAVIS 2017 (val)	DEVA (ReferFormer)	J&F 1st frame	66.3	# 2
Semi-Supervised Video Object Segmentation	DAVIS 2017 (val)	DEVA	Jaccard (Mean)	84.2	# 9
Semi-Supervised Video Object Segmentation	DAVIS 2017 (val)	DEVA	F-measure (Mean)	91.0	# 7
Semi-Supervised Video Object Segmentation	DAVIS 2017 (val)	DEVA	J&F	87.6	# 10
Semi-Supervised Video Object Segmentation	DAVIS 2017 (val)	DEVA	Speed (FPS)	25.3	# 13
Unsupervised Video Object Segmentation	DAVIS 2017 (val)	DEVA (EntitySeg)	J&F	73.4	# 1
Unsupervised Video Object Segmentation	DAVIS 2017 (val)	DEVA (EntitySeg)	Jaccard (Mean)	70.4	# 1
Unsupervised Video Object Segmentation	DAVIS 2017 (val)	DEVA (EntitySeg)	F-measure (Mean)	76.4	# 1
Semi-Supervised Video Object Segmentation	MOSE	DEVA (no OVIS)	J&F	60.0	# 10
Semi-Supervised Video Object Segmentation	MOSE	DEVA (no OVIS)	J	55.8	# 10
Semi-Supervised Video Object Segmentation	MOSE	DEVA (no OVIS)	F	64.3	# 10
Semi-Supervised Video Object Segmentation	MOSE	DEVA (no OVIS)	FPS	25.3	# 7
Semi-Supervised Video Object Segmentation	MOSE	DEVA (with OVIS)	J&F	66.5	# 7
Semi-Supervised Video Object Segmentation	MOSE	DEVA (with OVIS)	J	62.3	# 7
Semi-Supervised Video Object Segmentation	MOSE	DEVA (with OVIS)	F	70.8	# 7
Semi-Supervised Video Object Segmentation	MOSE	DEVA (with OVIS)	FPS	25.3	# 7
Referring Expression Segmentation	Refer-YouTube-VOS (2021 public validation)	DEVA (ReferFormer)	J&F	66.0	# 8
Video Panoptic Segmentation	VIPSeg	DEVA (Mask2Former - SwinB)	VPQ	55.0	# 5
Video Panoptic Segmentation	VIPSeg	DEVA (Mask2Former - SwinB)	STQ	52.2	# 5
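
The J&F rows above (and the G row on DAVIS 2016) appear to follow the usual DAVIS convention of averaging region similarity J and contour accuracy F. The sketch below checks that reading against the table. The helper names `mean_jf` and `consistent` are hypothetical. The 0.1 tolerance is an assumption that covers the fact that each reported number is already rounded to one decimal.

```python
from typing import List, Tuple


def mean_jf(j: float, f: float) -> float:
    """Average of region similarity (J) and contour accuracy (F)."""
    return (j + f) / 2.0


def consistent(j: float, f: float, reported: float, tol: float = 0.1) -> bool:
    """True if the reported combined score matches mean(J, F) up to rounding."""
    return abs(mean_jf(j, f) - reported) <= tol


# (label, J, F, reported combined score), copied from the results table.
ROWS: List[Tuple[str, float, float, float]] = [
    ("DAVIS 2016 val, DEVA (DIS), G", 87.6, 90.2, 88.9),
    ("DAVIS 2017 test-dev, DEVA, J&F", 79.6, 86.8, 83.2),
    ("DAVIS 2017 val, DEVA, J&F", 84.2, 91.0, 87.6),
    ("DAVIS 2017 val, DEVA (EntitySeg), J&F", 70.4, 76.4, 73.4),
    ("MOSE, DEVA (no OVIS), J&F", 55.8, 64.3, 60.0),
    ("MOSE, DEVA (with OVIS), J&F", 62.3, 70.8, 66.5),
]

# Collect any row whose combined score is not explained by mean(J, F).
mismatches = [label for label, j, f, rep in ROWS if not consistent(j, f, rep)]
```

With the values in the table, `mismatches` stays empty, which supports reading every combined score as the plain mean of its J and F rows.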

Tracking Anything with Decoupled Video Segmentation

ICCV 2023 · Ho Kei Cheng, Seoung Wug Oh, Brian Price, Alexander Schwing, Joon-Young Lee

Training data for video segmentation are expensive to annotate. This impedes extensions of end-to-end algorithms to new video segmentation tasks, especially in large-vocabulary settings. To 'track anything' without training on video data for every individual task, we develop a decoupled video segmentation approach (DEVA), composed of task-specific image-level segmentation and class/task-agnostic bi-directional temporal propagation. Due to this design, we only need an image-level model for the target task (which is cheaper to train) and a universal temporal propagation model which is trained once and generalizes across tasks. To effectively combine these two modules, we use bi-directional propagation for (semi-)online fusion of segmentation hypotheses from different frames to generate a coherent segmentation. We show that this decoupled formulation compares favorably to end-to-end approaches in several data-scarce tasks including large-vocabulary video panoptic segmentation, open-world video segmentation, referring video segmentation, and unsupervised video object segmentation. Code is available at: https://hkchengrex.github.io/Tracking-Anything-with-DEVA
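
The decoupling described in the abstract can be illustrated with a toy sketch: a task-specific image-level segmenter and a task-agnostic propagator are passed in as interchangeable callables, and their outputs are merged per frame. This is not the authors' implementation. All names here (`decoupled_track`, `segment_image`, `propagate`, `detect_every`, `match_iou`) are hypothetical, masks are plain sets of pixel coordinates, and a simple forward-only IoU match stands in for the bi-directional, semi-online fusion of the paper.

```python
from typing import Callable, Dict, List, Set, Tuple

Pixel = Tuple[int, int]
Mask = Set[Pixel]


def iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def decoupled_track(
    frames: List[object],
    segment_image: Callable[[object], List[Mask]],
    propagate: Callable[[Dict[int, Mask], object], Dict[int, Mask]],
    detect_every: int = 2,
    match_iou: float = 0.5,
) -> List[Dict[int, Mask]]:
    """Toy decoupled tracker: image-level detections plus temporal propagation."""
    tracks: Dict[int, Mask] = {}
    next_id = 0
    results: List[Dict[int, Mask]] = []
    for t, frame in enumerate(frames):
        # Task-agnostic step: carry existing segments into the current frame.
        tracks = propagate(tracks, frame) if tracks else {}
        if t % detect_every == 0:
            # Task-specific step: query the image-level model on this frame.
            for det in segment_image(frame):
                matched = any(iou(det, m) >= match_iou for m in tracks.values())
                if not matched:
                    # Unmatched detection starts a new track.
                    tracks[next_id] = det
                    next_id += 1
                # Matched detections are dropped; the propagated mask is kept.
        results.append(dict(tracks))
    return results


# Toy usage: one two-pixel object moving one pixel to the right per frame.
def toy_segment(frame: int) -> List[Mask]:
    return [{(frame, 0), (frame + 1, 0)}]


def toy_propagate(tracks: Dict[int, Mask], frame: int) -> Dict[int, Mask]:
    return {tid: {(x + 1, y) for (x, y) in m} for tid, m in tracks.items()}


results = decoupled_track([0, 1, 2, 3], toy_segment, toy_propagate)
```

Swapping `toy_segment` for a different image-level model changes the task without touching the propagation step, which is the point of the decoupled design. The fusion rule is where a faithful implementation would differ most from this sketch.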

Code

hkchengrex/Tracking-Anything-with-D… official

Tasks

Open-Vocabulary Video Segmentation

Open-World Video Segmentation

Panoptic Segmentation

Referring Expression Segmentation

Referring Video Object Segmentation

Segmentation

Semantic Segmentation

Semi-Supervised Video Object Segmentation

Unsupervised Video Object Segmentation

Video Object Segmentation

Video Panoptic Segmentation

Video Segmentation

Video Semantic Segmentation

Datasets

MS COCO

DAVIS

DAVIS 2017

DAVIS 2016

YouTube-VOS 2018

YouTube-VIS 2019

Referring Expressions for DAVIS 2016 & 2017

Refer-YouTube-VOS

VIPSeg

MOSE

BURST

Results from the Paper

Ranked #1 on Unsupervised Video Object Segmentation on DAVIS 2016 val (using extra training data)
