TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Semi-Supervised Video Object Segmentation	BURST-test	Cutie (base, MEGA, 600 pixels)	HOTA (all)	66.0	# 1
Semi-Supervised Video Object Segmentation	BURST-test	Cutie (base, MEGA, 600 pixels)	HOTA (common)	66.5	# 1
Semi-Supervised Video Object Segmentation	BURST-test	Cutie (base, MEGA, 600 pixels)	HOTA (uncommon)	65.9	# 1
Semi-Supervised Video Object Segmentation	BURST-test	Cutie (base, with mose, 600 pixels)	HOTA (all)	62.6	# 2
Semi-Supervised Video Object Segmentation	BURST-test	Cutie (base, with mose, 600 pixels)	HOTA (common)	63.8	# 2
Semi-Supervised Video Object Segmentation	BURST-test	Cutie (base, with mose, 600 pixels)	HOTA (uncommon)	62.3	# 2
Semi-Supervised Video Object Segmentation	BURST-val	Cutie (base, MEGA, 600 pixels)	HOTA (all)	61.2	# 1
Semi-Supervised Video Object Segmentation	BURST-val	Cutie (base, MEGA, 600 pixels)	HOTA (common)	65.0	# 1
Semi-Supervised Video Object Segmentation	BURST-val	Cutie (base, MEGA, 600 pixels)	HOTA (uncommon)	60.3	# 1
Semi-Supervised Video Object Segmentation	BURST-val	Cutie (base, with mose, 600 pixels)	HOTA (all)	58.4	# 2
Semi-Supervised Video Object Segmentation	BURST-val	Cutie (base, with mose, 600 pixels)	HOTA (common)	61.8	# 2
Semi-Supervised Video Object Segmentation	BURST-val	Cutie (base, with mose, 600 pixels)	HOTA (uncommon)	57.5	# 2
Semi-Supervised Video Object Segmentation	DAVIS 2017 (test-dev)	Cutie+ (base)	J&F	85.9	# 3
Semi-Supervised Video Object Segmentation	DAVIS 2017 (test-dev)	Cutie+ (base)	Jaccard (Mean)	82.6	# 2
Semi-Supervised Video Object Segmentation	DAVIS 2017 (test-dev)	Cutie+ (base)	F-measure (Mean)	89.2	# 3
Semi-Supervised Video Object Segmentation	DAVIS 2017 (test-dev)	Cutie+ (base)	FPS	17.9	# 14
Semi-Supervised Video Object Segmentation	DAVIS 2017 (test-dev)	Cutie (base, MEGA)	J&F	86.1	# 2
Semi-Supervised Video Object Segmentation	DAVIS 2017 (test-dev)	Cutie (base, MEGA)	Jaccard (Mean)	82.4	# 3
Semi-Supervised Video Object Segmentation	DAVIS 2017 (test-dev)	Cutie (base, MEGA)	F-measure (Mean)	89.9	# 2
Semi-Supervised Video Object Segmentation	DAVIS 2017 (test-dev)	Cutie (base, MEGA)	FPS	36.4	# 6
Semi-Supervised Video Object Segmentation	DAVIS 2017 (test-dev)	Cutie+ (base, MEGA)	J&F	88.1	# 1
Semi-Supervised Video Object Segmentation	DAVIS 2017 (test-dev)	Cutie+ (base, MEGA)	Jaccard (Mean)	84.7	# 1
Semi-Supervised Video Object Segmentation	DAVIS 2017 (test-dev)	Cutie+ (base, MEGA)	F-measure (Mean)	91.4	# 1
Semi-Supervised Video Object Segmentation	DAVIS 2017 (test-dev)	Cutie+ (base, MEGA)	FPS	17.9	# 14
Semi-Supervised Video Object Segmentation	DAVIS 2017 (val)	Cutie+ (base, MEGA)	Jaccard (Mean)	85.5	# 5
Semi-Supervised Video Object Segmentation	DAVIS 2017 (val)	Cutie+ (base, MEGA)	F-measure (Mean)	90.8	# 9
Semi-Supervised Video Object Segmentation	DAVIS 2017 (val)	Cutie+ (base, MEGA)	J&F	88.1	# 7
Semi-Supervised Video Object Segmentation	DAVIS 2017 (val)	Cutie+ (base, MEGA)	Speed (FPS)	17.9	# 23
Semi-Supervised Video Object Segmentation	DAVIS 2017 (val)	Cutie (base)	Jaccard (Mean)	84.6	# 7
Semi-Supervised Video Object Segmentation	DAVIS 2017 (val)	Cutie (base)	F-measure (Mean)	91.1	# 6
Semi-Supervised Video Object Segmentation	DAVIS 2017 (val)	Cutie (base)	J&F	87.9	# 8
Semi-Supervised Video Object Segmentation	DAVIS 2017 (val)	Cutie (base)	Params(M)	36.4	# 17
Semi-Supervised Video Object Segmentation	DAVIS 2017 (val)	Cutie+ (base)	Jaccard (Mean)	87.5	# 1
Semi-Supervised Video Object Segmentation	DAVIS 2017 (val)	Cutie+ (base)	F-measure (Mean)	93.4	# 1
Semi-Supervised Video Object Segmentation	DAVIS 2017 (val)	Cutie+ (base)	J&F	90.5	# 1
Semi-Supervised Video Object Segmentation	DAVIS 2017 (val)	Cutie+ (base)	Params(M)	17.9	# 15
Semi-Supervised Video Object Segmentation	MOSE	Cutie (small)	J&F	62.2	# 9
Semi-Supervised Video Object Segmentation	MOSE	Cutie (small)	J	58.2	# 9
Semi-Supervised Video Object Segmentation	MOSE	Cutie (small)	F	66.2	# 9
Semi-Supervised Video Object Segmentation	MOSE	Cutie (small)	FPS	45.5	# 1
Semi-Supervised Video Object Segmentation	MOSE	Cutie (small, MEGA)	J&F	68.6	# 4
Semi-Supervised Video Object Segmentation	MOSE	Cutie (small, MEGA)	J	64.3	# 4
Semi-Supervised Video Object Segmentation	MOSE	Cutie (small, MEGA)	F	72.9	# 4
Semi-Supervised Video Object Segmentation	MOSE	Cutie (small, MEGA)	FPS	45.5	# 1
Semi-Supervised Video Object Segmentation	MOSE	Cutie (base, MEGA)	J&F	69.9	# 3
Semi-Supervised Video Object Segmentation	MOSE	Cutie (base, MEGA)	J	65.8	# 3
Semi-Supervised Video Object Segmentation	MOSE	Cutie (base, MEGA)	F	74.1	# 3
Semi-Supervised Video Object Segmentation	MOSE	Cutie (base, MEGA)	FPS	36.4	# 4
Semi-Supervised Video Object Segmentation	MOSE	Cutie+ (small, MEGA)	J&F	70.3	# 2
Semi-Supervised Video Object Segmentation	MOSE	Cutie+ (small, MEGA)	J	66.0	# 2
Semi-Supervised Video Object Segmentation	MOSE	Cutie+ (small, MEGA)	F	74.5	# 2
Semi-Supervised Video Object Segmentation	MOSE	Cutie+ (small, MEGA)	FPS	20.6	# 9
Semi-Supervised Video Object Segmentation	MOSE	Cutie (base)	J&F	64.0	# 8
Semi-Supervised Video Object Segmentation	MOSE	Cutie (base)	J	60.0	# 8
Semi-Supervised Video Object Segmentation	MOSE	Cutie (base)	F	67.9	# 8
Semi-Supervised Video Object Segmentation	MOSE	Cutie (base)	FPS	36.4	# 4
Semi-Supervised Video Object Segmentation	MOSE	Cutie+ (base, MEGA)	J&F	71.7	# 1
Semi-Supervised Video Object Segmentation	MOSE	Cutie+ (base, MEGA)	J	67.6	# 1
Semi-Supervised Video Object Segmentation	MOSE	Cutie+ (base, MEGA)	F	75.8	# 1
Semi-Supervised Video Object Segmentation	MOSE	Cutie+ (base, MEGA)	FPS	17.9	# 10
Semi-Supervised Video Object Segmentation	MOSE	Cutie (base, with mose)	J&F	68.3	# 5
Semi-Supervised Video Object Segmentation	MOSE	Cutie (base, with mose)	J	64.2	# 5
Semi-Supervised Video Object Segmentation	MOSE	Cutie (base, with mose)	F	72.3	# 5
Semi-Supervised Video Object Segmentation	MOSE	Cutie (base, with mose)	FPS	36.4	# 4
Semi-Supervised Video Object Segmentation	MOSE	Cutie (small, with mose)	J&F	67.4	# 6
Semi-Supervised Video Object Segmentation	MOSE	Cutie (small, with mose)	J	63.1	# 6
Semi-Supervised Video Object Segmentation	MOSE	Cutie (small, with mose)	F	71.7	# 6
Semi-Supervised Video Object Segmentation	MOSE	Cutie (small, with mose)	FPS	45.5	# 1
Semi-Supervised Video Object Segmentation	YouTube-VOS 2018	Cutie+ (base, MEGA)	F-Measure (Seen)	91.0	# 1
Semi-Supervised Video Object Segmentation	YouTube-VOS 2018	Cutie+ (base, MEGA)	F-Measure (Unseen)	90.1	# 2
Semi-Supervised Video Object Segmentation	YouTube-VOS 2018	Cutie+ (base, MEGA)	Overall	87.5	# 1
Semi-Supervised Video Object Segmentation	YouTube-VOS 2018	Cutie+ (base, MEGA)	Jaccard (Seen)	86.6	# 1
Semi-Supervised Video Object Segmentation	YouTube-VOS 2018	Cutie+ (base, MEGA)	Jaccard (Unseen)	82.2	# 1
Semi-Supervised Video Object Segmentation	YouTube-VOS 2018	Cutie+ (base, MEGA)	Speed (FPS)	17.9	# 9
Semi-Supervised Video Object Segmentation	YouTube-VOS 2019	Cutie+ (base, MEGA)	Overall	87.5	# 1
Semi-Supervised Video Object Segmentation	YouTube-VOS 2019	Cutie+ (base, MEGA)	Jaccard (Seen)	86.3	# 1
Semi-Supervised Video Object Segmentation	YouTube-VOS 2019	Cutie+ (base, MEGA)	Jaccard (Unseen)	82.7	# 3
Semi-Supervised Video Object Segmentation	YouTube-VOS 2019	Cutie+ (base, MEGA)	F-Measure (Seen)	90.6	# 1
Semi-Supervised Video Object Segmentation	YouTube-VOS 2019	Cutie+ (base, MEGA)	F-Measure (Unseen)	90.5	# 1
Semi-Supervised Video Object Segmentation	YouTube-VOS 2019	Cutie+ (base, MEGA)	J&F	17.9	# 3
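
The J&F values in the table can be cross-checked against their components. On DAVIS-style benchmarks, J&F is the arithmetic mean of region similarity (J) and contour accuracy (F), and the YouTube-VOS "Overall" score is the mean of J and F over seen and unseen classes. The sketch below recomputes a few rows copied from the table. The helper names and the 0.1 rounding tolerance are illustrative assumptions, not part of any benchmark toolkit.

```python
def mean_score(*parts: float) -> float:
    """Arithmetic mean of the component scores."""
    return sum(parts) / len(parts)


def consistent(parts, reported, tol=0.1):
    """True if the reported score matches the mean of its parts.

    tol is an assumed slack: the table shows values rounded to one
    decimal, so the recomputed mean can be off by a rounding step.
    """
    return abs(mean_score(*parts) - reported) <= tol + 1e-9


# (J, F, reported J&F) copied from the MOSE rows of the table.
MOSE_ROWS = {
    "Cutie (small)": (58.2, 66.2, 62.2),
    "Cutie (base)": (60.0, 67.9, 64.0),
    "Cutie+ (base, MEGA)": (67.6, 75.8, 71.7),
}

# YouTube-VOS 2018, Cutie+ (base, MEGA):
# J seen, J unseen, F seen, F unseen, reported Overall.
YTVOS18 = (86.6, 82.2, 91.0, 90.1, 87.5)

if __name__ == "__main__":
    for model, (j, f, jf) in MOSE_ROWS.items():
        print(model, round(mean_score(j, f), 2), jf, consistent((j, f), jf))
    print("YouTube-VOS 2018", consistent(YTVOS18[:4], YTVOS18[4]))
```

Applied to the YouTube-VOS 2019 row labelled J&F (17.9), the same check fails by a wide margin. That value coincides with the 17.9 FPS listed for Cutie+ (base, MEGA) on other datasets, so it may be a speed figure entered under the wrong metric name. This is an inference from the table, not something the source states.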

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/putting-the-object-back-into-video-object/semi-supervised-video-object-segmentation-on-23)](https://paperswithcode.com/sota/semi-supervised-video-object-segmentation-on-23?p=putting-the-object-back-into-video-object)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/putting-the-object-back-into-video-object/semi-supervised-video-object-segmentation-on-22)](https://paperswithcode.com/sota/semi-supervised-video-object-segmentation-on-22?p=putting-the-object-back-into-video-object)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/putting-the-object-back-into-video-object/semi-supervised-video-object-segmentation-on-1)](https://paperswithcode.com/sota/semi-supervised-video-object-segmentation-on-1?p=putting-the-object-back-into-video-object)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/putting-the-object-back-into-video-object/visual-object-tracking-on-davis-2017)](https://paperswithcode.com/sota/visual-object-tracking-on-davis-2017?p=putting-the-object-back-into-video-object)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/putting-the-object-back-into-video-object/semi-supervised-video-object-segmentation-on-21)](https://paperswithcode.com/sota/semi-supervised-video-object-segmentation-on-21?p=putting-the-object-back-into-video-object)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/putting-the-object-back-into-video-object/video-object-segmentation-on-youtube-vos)](https://paperswithcode.com/sota/video-object-segmentation-on-youtube-vos?p=putting-the-object-back-into-video-object)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/putting-the-object-back-into-video-object/semi-supervised-video-object-segmentation-on-18)](https://paperswithcode.com/sota/semi-supervised-video-object-segmentation-on-18?p=putting-the-object-back-into-video-object)`

Putting the Object Back into Video Object Segmentation

19 Oct 2023 · Ho Kei Cheng, Seoung Wug Oh, Brian Price, Joon-Young Lee, Alexander Schwing

We present Cutie, a video object segmentation (VOS) network with object-level memory reading, which puts the object representation from memory back into the video object segmentation result. Recent works on VOS employ bottom-up pixel-level memory reading, which struggles with matching noise, especially in the presence of distractors, and therefore performs worse on more challenging data. In contrast, Cutie performs top-down object-level memory reading by adapting a small set of object queries. Through these queries, it interacts iteratively with the bottom-up pixel features using a query-based object transformer (qt, hence Cutie). The object queries act as a high-level summary of the target object, while high-resolution feature maps are retained for accurate segmentation. Together with foreground-background masked attention, Cutie cleanly separates the semantics of the foreground object from the background. On the challenging MOSE dataset, Cutie improves by 8.7 J&F over XMem with a similar running time and improves by 4.2 J&F over DeAOT while being three times faster. Code is available at: https://hkchengrex.github.io/Cutie
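
To make the idea of foreground-background masked attention concrete, here is a minimal NumPy sketch. It is not the authors' implementation: it uses a single attention head with no learned projections, and it assumes that the first half of the object queries may attend only to foreground pixels while the second half may attend only to background pixels. The function name `fg_bg_masked_attention` and all argument names are hypothetical.

```python
import numpy as np


def fg_bg_masked_attention(queries, pixel_feats, fg_mask):
    """Cross-attention from object queries to pixels with a fg/bg mask.

    queries:     (N, C) float array of object queries, N even (assumed).
    pixel_feats: (P, C) float array of flattened pixel features,
                 used here as both keys and values (a simplification).
    fg_mask:     (P,) bool array, True where a pixel is foreground.
    Returns (updated_queries of shape (N, C), weights of shape (N, P)).
    """
    n, c = queries.shape
    if n % 2 != 0:
        raise ValueError("this sketch assumes an even number of queries")
    if fg_mask.all() or not fg_mask.any():
        raise ValueError("need at least one foreground and one background pixel")

    # Scaled dot-product similarity between every query and every pixel.
    logits = queries @ pixel_feats.T / np.sqrt(c)

    # Assumed split: first half sees foreground only, second half background only.
    allowed = np.empty((n, fg_mask.shape[0]), dtype=bool)
    allowed[: n // 2] = fg_mask
    allowed[n // 2 :] = ~fg_mask
    logits = np.where(allowed, logits, -np.inf)

    # Numerically stable softmax over pixels; masked pixels get weight 0.
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)
    weights = weights / weights.sum(axis=1, keepdims=True)

    # Each query becomes a weighted summary of the pixels it may see.
    return weights @ pixel_feats, weights
```

Because masked logits are set to negative infinity before the softmax, each query's summary is built only from its permitted region, which is one simple way to keep foreground and background semantics apart.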

Code

hkchengrex/Cutie official

Tasks

Object

Segmentation

Semantic Segmentation

Semi-Supervised Video Object Segmentation

Video Object Segmentation

Video Semantic Segmentation

Datasets

DAVIS

DAVIS 2017

YouTube-VOS 2018

Referring Expressions for DAVIS 2016 & 2017

MOSE

BURST

LVOS

Results from the Paper

Ranked #1 on Semi-Supervised Video Object Segmentation on MOSE

Methods

VOS
