TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Visual Object Tracking	LaSOT	MITS	AUC	72.0	# 10
Visual Object Tracking	LaSOT	MITS	Normalized Precision	80.1	# 11
Visual Object Tracking	LaSOT	MITS	Precision	78.5	# 7
Visual Object Tracking	TrackingNet	MITS	Precision	84.6	# 6
Visual Object Tracking	TrackingNet	MITS	Normalized Precision	88.9	# 7
Visual Object Tracking	TrackingNet	MITS	Accuracy	83.4	# 13

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/integrating-boxes-and-masks-a-multi-object/visual-object-tracking-on-lasot)](https://paperswithcode.com/sota/visual-object-tracking-on-lasot?p=integrating-boxes-and-masks-a-multi-object)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/integrating-boxes-and-masks-a-multi-object/visual-object-tracking-on-trackingnet)](https://paperswithcode.com/sota/visual-object-tracking-on-trackingnet?p=integrating-boxes-and-masks-a-multi-object)`

Integrating Boxes and Masks: A Multi-Object Framework for Unified Visual Tracking and Segmentation

ICCV 2023 · Yuanyou Xu, Zongxin Yang, Yi Yang

Tracking any given object(s) spatially and temporally is a goal shared by Visual Object Tracking (VOT) and Video Object Segmentation (VOS). Some studies have attempted joint tracking and segmentation, but they often lack full compatibility with both boxes and masks in initialization and prediction, and they mainly focus on single-object scenarios. To address these limitations, this paper proposes a Multi-object Mask-box Integrated framework for unified Tracking and Segmentation, dubbed MITS. First, a unified identification module is proposed to support both box and mask references for initialization, where detailed object information is inferred from boxes or directly retained from masks. Second, a novel pinpoint box predictor is proposed for accurate multi-object box prediction, facilitating target-oriented representation learning. All target objects are processed simultaneously from encoding to propagation and decoding, forming a unified pipeline for VOT and VOS. Experimental results show that MITS achieves state-of-the-art performance on both VOT and VOS benchmarks. Notably, MITS surpasses the best prior VOT competitor by around 6% on the GOT-10k test set, and significantly improves the performance of box initialization on VOS benchmarks. The code is available at https://github.com/yoxu515/MITS.
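To make the box and mask relationship concrete, the following is a minimal geometric sketch, not the MITS implementation. The abstract describes a learned identification module that infers object detail from boxes; the code here only shows the trivial conversions between the two reference types. The names `mask_to_box` and `box_to_coarse_mask`, and the inclusive `(x_min, y_min, x_max, y_max)` box convention, are assumptions made for illustration.

```python
from typing import List, Tuple

Mask = List[List[int]]            # binary mask as rows of 0/1 values
Box = Tuple[int, int, int, int]   # (x_min, y_min, x_max, y_max), inclusive pixel indices


def mask_to_box(mask: Mask) -> Box:
    """Return the tightest axis-aligned box around the nonzero pixels of a mask."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, value in enumerate(row) if value]
    if not ys:
        raise ValueError("mask contains no foreground pixels")
    return (min(xs), min(ys), max(xs), max(ys))


def box_to_coarse_mask(box: Box, height: int, width: int) -> Mask:
    """Fill the box region with ones.

    This is only a crude stand-in for a box reference: it marks every pixel
    inside the box as foreground and carries no shape detail.
    """
    x0, y0, x1, y1 = box
    return [
        [1 if x0 <= x <= x1 and y0 <= y <= y1 else 0 for x in range(width)]
        for y in range(height)
    ]


if __name__ == "__main__":
    example = [
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [0, 0, 1, 0],
    ]
    box = mask_to_box(example)
    print(box)
    print(box_to_coarse_mask(box, height=3, width=4))
```

Going from a mask to a box loses shape information, while going from a box to a filled rectangle adds none, which is why a box reference needs a learned step to recover object detail.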

Code

yoxu515/mits official

Tasks

Object

Object Tracking

Representation Learning

Segmentation

Semantic Segmentation

Video Object Segmentation

Video Semantic Segmentation

Visual Object Tracking

Visual Tracking

Datasets

DAVIS 2017

LaSOT

GOT-10k

TrackingNet

YouTube-VOS 2018

Results from the Paper

Ranked #10 on Visual Object Tracking on LaSOT

Task	Dataset	Model	Metric Name	Metric Value	Global Rank
Visual Object Tracking	LaSOT	MITS	AUC	72.0	# 10
Visual Object Tracking	LaSOT	MITS	Normalized Precision	80.1	# 11
Visual Object Tracking	LaSOT	MITS	Precision	78.5	# 7
Visual Object Tracking	TrackingNet	MITS	Precision	84.6	# 6
Visual Object Tracking	TrackingNet	MITS	Normalized Precision	88.9	# 7
Visual Object Tracking	TrackingNet	MITS	Accuracy	83.4	# 13
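
For readers unfamiliar with the AUC metric in the table, the sketch below shows one common convention for the success-plot AUC used in single-object tracking benchmarks: compute the per-frame IoU between predicted and ground-truth boxes, measure the fraction of frames whose IoU exceeds each of 21 evenly spaced thresholds in [0, 1], and average those fractions. The function names, the `(x, y, w, h)` box format, and the threshold count are assumptions made for illustration. Official benchmark toolkits may differ in details, so this is not expected to reproduce the reported numbers exactly.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height), assumed format


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def success_auc(preds: Sequence[Box], gts: Sequence[Box], num_thresholds: int = 21) -> float:
    """Average success rate over evenly spaced IoU thresholds in [0, 1]."""
    if len(preds) != len(gts) or not preds:
        raise ValueError("preds and gts must be non-empty and of equal length")
    overlaps: List[float] = [iou(p, g) for p, g in zip(preds, gts)]
    thresholds = [i / (num_thresholds - 1) for i in range(num_thresholds)]
    # Success rate at a threshold: fraction of frames with IoU strictly above it.
    rates = [sum(o > t for o in overlaps) / len(overlaps) for t in thresholds]
    return sum(rates) / len(rates)


if __name__ == "__main__":
    ground_truth = [(0.0, 0.0, 2.0, 2.0), (0.0, 0.0, 2.0, 2.0)]
    predictions = [(0.0, 0.0, 2.0, 2.0), (5.0, 5.0, 2.0, 2.0)]  # one hit, one miss
    print(round(100 * success_auc(predictions, ground_truth), 1))
```

The table reports values on a 0 to 100 scale, so the result is multiplied by 100 when printed.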

Methods

Focus • VOS
