TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Action Spotting	SoccerNet-v2	COMEDIAN (ViSwin T ens.)	Average-mAP	77.6	# 2
Action Spotting	SoccerNet-v2	COMEDIAN (ViSwin T ens.)	Tight Average-mAP	73.1	# 1
Action Spotting	SoccerNet-v2	COMEDIAN (ViSwin T)	Average-mAP	76.6	# 4
Action Spotting	SoccerNet-v2	COMEDIAN (ViSwin T)	Tight Average-mAP	71.6	# 3
Action Spotting	SoccerNet-v2	COMEDIAN (ViViT T ens.)	Average-mAP	77.1	# 3
Action Spotting	SoccerNet-v2	COMEDIAN (ViViT T ens.)	Tight Average-mAP	72.0	# 2
Action Spotting	SoccerNet-v2	COMEDIAN (ViViT T)	Average-mAP	76.1	# 5
Action Spotting	SoccerNet-v2	COMEDIAN (ViViT T)	Tight Average-mAP	70.7	# 4
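
Both metrics in the table score predicted timestamps against annotated timestamps under a temporal tolerance, then average the result over several tolerances. The sketch below illustrates that idea for a single action class. The greedy matching rule, the plain non-interpolated average precision, the tolerance convention (maximum absolute distance in seconds), the list of tolerances, and all function names are assumptions made for illustration; this is not the official SoccerNet evaluation code.

```python
from typing import List, Tuple


def match_spots(predictions: List[Tuple[float, float]],
                ground_truth: List[float],
                tolerance_s: float) -> List[bool]:
    """Flag each prediction as a hit or a miss, in descending-confidence order.

    predictions: (timestamp in seconds, confidence) pairs.
    Each ground-truth timestamp can be matched at most once.
    """
    unmatched = list(ground_truth)
    flags = []
    for time_s, _conf in sorted(predictions, key=lambda p: p[1], reverse=True):
        # Closest ground-truth timestamp that has not been matched yet.
        best = min(unmatched, key=lambda g: abs(g - time_s), default=None)
        if best is not None and abs(best - time_s) <= tolerance_s:
            unmatched.remove(best)
            flags.append(True)
        else:
            flags.append(False)
    return flags


def average_precision(flags: List[bool], num_gt: int) -> float:
    """Non-interpolated AP: mean of precision values at each hit."""
    hits, total = 0, 0.0
    for rank, is_hit in enumerate(flags, start=1):
        if is_hit:
            hits += 1
            total += hits / rank
    return total / num_gt if num_gt else 0.0


def average_over_tolerances(predictions, ground_truth, tolerances_s) -> float:
    """Average the single-class AP over a list of tolerances."""
    aps = [
        average_precision(match_spots(predictions, ground_truth, t), len(ground_truth))
        for t in tolerances_s
    ]
    return sum(aps) / len(aps)


# Hypothetical data: three predictions, two annotated actions.
predictions = [(10.0, 0.9), (33.0, 0.8), (61.5, 0.7)]
ground_truth = [12.0, 60.0]
print(match_spots(predictions, ground_truth, 5.0))  # [True, False, True]
print(average_over_tolerances(predictions, ground_truth, [1, 2, 3, 4, 5]))
```

With a 1-second tolerance none of these predictions counts as a hit, which is why a metric averaged over small tolerances is typically lower than one averaged over large tolerances for the same model. A full mAP would additionally average over action classes.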

COMEDIAN: Self-Supervised Learning and Knowledge Distillation for Action Spotting using Transformers

3 Sep 2023 · Julien Denize, Mykola Liashuha, Jaonary Rabarisoa, Astrid Orcesi, Romain Hérault

We present COMEDIAN, a novel pipeline for initializing spatiotemporal transformers for action spotting, which involves self-supervised learning and knowledge distillation. Action spotting is a timestamp-level temporal action detection task. Our pipeline consists of three steps, the first two of which are initialization stages. First, we perform self-supervised initialization of a spatial transformer using short videos as input. Second, we initialize a temporal transformer, which enriches the spatial transformer's outputs with global context, through knowledge distillation from a pre-computed feature bank aligned with each short video segment. Finally, we fine-tune both transformers on the action spotting task. The experiments, conducted on the SoccerNet-v2 dataset, demonstrate state-of-the-art performance and validate the effectiveness of COMEDIAN's pretraining paradigm. Our results highlight several advantages of our pretraining pipeline, including improved performance and faster convergence compared to non-pretrained models.
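
The distillation stage described in the abstract, where temporal-transformer outputs are pulled toward a pre-computed feature bank aligned with each short video segment, can be sketched as follows. The abstract does not state the distillation objective, so cosine distance is used here purely as a stand-in; the segment ids, vector sizes, and function names are hypothetical, and plain Python lists replace the tensors a real training loop would use.

```python
import math
from typing import Dict, List


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def distillation_loss(student_outputs: Dict[str, List[float]],
                      feature_bank: Dict[str, List[float]]) -> float:
    """Mean cosine distance between student outputs and bank features.

    Both dictionaries are keyed by segment id, so each student output is
    compared with the pre-computed feature of the same video segment.
    Cosine distance is an assumed stand-in objective, not the paper's loss.
    """
    losses = [
        1.0 - cosine_similarity(vector, feature_bank[segment_id])
        for segment_id, vector in student_outputs.items()
    ]
    return sum(losses) / len(losses)


# Hypothetical pre-computed teacher features, one per short video segment.
feature_bank = {"seg_0": [1.0, 0.0], "seg_1": [0.0, 1.0]}
# Hypothetical student outputs for the same segments.
student_outputs = {"seg_0": [2.0, 0.0], "seg_1": [1.0, 1.0]}

print(distillation_loss(student_outputs, feature_bank))
```

Because the bank is computed once in advance, the teacher does not need to run during this stage; the student only needs the segment id to look up its target.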

Code

juliendenize/eztorch official

Tasks

Action Detection

Action Spotting

Knowledge Distillation

Self-Supervised Learning

Datasets

Kinetics

Kinetics 400

SoccerNet-v2

SoccerNet

Results from the Paper

Ranked #1 on Action Spotting on SoccerNet-v2 (Tight Average-mAP)

Methods

Knowledge Distillation • Spatial Transformer
