AE-Net: Adjoint Enhancement Network for Efficient Action Recognition in Video Understanding

Action recognition in video understanding is a challenging task, largely because of the difficulty of temporal modeling, which causes existing methods to suffer from motion information loss and misalignment of temporal attention along the spatial dimension. To overcome these difficulties, we propose a novel temporal modeling method called the Adjoint Enhancement Network (AE-Net), which fully exploits motion and temporal cues in long-range structures. AE-Net consists of two new modules: the Initial Adjoint Enhancement Module (IAE-Module), which operates on shallow features, and the Global Adjoint Enhancement Module (GAE-Module), which operates on global features. Through a novel mechanism of parallel spatio-temporal convolution and difference fusion, the IAE-Module strengthens the motion transformations encoded in shallow network features, exciting the latent motion flow and avoiding motion information loss. The GAE-Module improves the local temporal representation in long-range structures by feeding enhanced feature differences into a spatial cascade module with residuals, resolving the misalignment of temporal attention in the spatial dimension. Experimental results show that AE-Net achieves state-of-the-art results on the Something-Something V1, UCF101, and HMDB-51 datasets.
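
Since the abstract describes the two modules only at a high level, the following minimal PyTorch sketch illustrates one plausible reading of the design. The class names (IAEModule, GAEModule), tensor layout, kernel sizes, depthwise convolutions, and sigmoid gating are all our assumptions for illustration; the paper does not provide code here, and the actual architecture may differ.

```python
# Minimal sketch of the two modules as described in the abstract.
# ASSUMPTIONS: class names, kernel sizes, depthwise convolutions, and the
# sigmoid gating are illustrative choices, not the paper's exact design.
import torch
import torch.nn as nn


def frame_difference(x: torch.Tensor) -> torch.Tensor:
    """Forward frame difference along the temporal axis of (N, C, T, H, W);
    the first frame, which has no predecessor, is zero-padded."""
    zeros = torch.zeros_like(x[:, :, :1])
    return torch.cat([zeros, x[:, :, 1:] - x[:, :, :-1]], dim=2)


class IAEModule(nn.Module):
    """Initial Adjoint Enhancement: parallel spatial and temporal
    convolutions fused with frame differences on shallow features."""

    def __init__(self, channels: int):
        super().__init__()
        # Parallel spatio-temporal branches (depthwise for efficiency).
        self.spatial = nn.Conv3d(channels, channels, (1, 3, 3),
                                 padding=(0, 1, 1), groups=channels)
        self.temporal = nn.Conv3d(channels, channels, (3, 1, 1),
                                  padding=(1, 0, 0), groups=channels)
        self.fuse = nn.Conv3d(channels, channels, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Difference fusion excites motion cues in the shallow features.
        enhanced = self.spatial(x) + self.temporal(frame_difference(x))
        return x + self.fuse(enhanced)  # residual enhancement


class GAEModule(nn.Module):
    """Global Adjoint Enhancement: enhanced feature differences drive a
    residual spatial cascade that re-weights the input, aligning temporal
    attention with the spatial dimension."""

    def __init__(self, channels: int, depth: int = 2):
        super().__init__()
        self.cascade = nn.ModuleList(
            nn.Conv3d(channels, channels, (1, 3, 3), padding=(0, 1, 1))
            for _ in range(depth))
        self.act = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        attn = frame_difference(x)
        for conv in self.cascade:
            attn = attn + self.act(conv(attn))  # spatial cascade w/ residuals
        return x + x * torch.sigmoid(attn)  # gated, identity-preserving


if __name__ == "__main__":
    clip = torch.randn(2, 64, 8, 56, 56)  # (batch, channels, frames, H, W)
    out = GAEModule(64)(IAEModule(64)(clip))
    print(out.shape)  # torch.Size([2, 64, 8, 56, 56])
```

Both modules are residual, so they can in principle be inserted into an existing 2D/3D backbone without changing its output shapes, which matches the abstract's framing of enhancing shallow and global features in place.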

Task               | Dataset                | Model                | Metric         | Value | Global Rank
-------------------|------------------------|----------------------|----------------|-------|------------
Action Recognition | Something-Something V1 | AE-Net (8+16 frames) | Top 1 Accuracy | 55.0  | #28
