You Only Learn One Representation: Unified Network for Multiple Tasks

10 May 2021 · Chien-Yao Wang, I-Hau Yeh, Hong-Yuan Mark Liao

People "understand" the world through vision, hearing, touch, and past experience. Human experience can be acquired through normal learning (we call it explicit knowledge) or subconsciously (we call it implicit knowledge). Experience acquired in either way is encoded and stored in the brain. Using this abundant experience as a huge database, human beings can effectively process data, even data they have never seen before. In this paper, we propose a unified network that encodes implicit knowledge and explicit knowledge together, just as the human brain learns from both normal learning and subconscious learning. The unified network generates a unified representation that simultaneously serves various tasks. We can perform kernel space alignment, prediction refinement, and multi-task learning in a convolutional neural network. The results demonstrate that introducing implicit knowledge into the neural network benefits the performance of all tasks. We further analyze the implicit representation learned by the proposed unified network, and it shows great capability for capturing the physical meaning of different tasks. The source code of this work is available at https://github.com/WongKinYiu/yolor.
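The abstract does not say how the implicit and explicit representations are combined. As a rough illustration only, the sketch below assumes that implicit knowledge is a small set of learned per-channel values that do not depend on the input, and that it is merged with the input-dependent (explicit) feature map by addition or by multiplication. The function names, shapes, and numbers are hypothetical and are not taken from the paper or its repository.

```python
# Hypothetical sketch: names, shapes, and values are illustrative assumptions,
# not taken from the paper or its repository.

def combine_additive(features, implicit):
    """Add one input-independent value per channel to every position."""
    return [[x + z for x in channel] for channel, z in zip(features, implicit)]


def combine_multiplicative(features, implicit):
    """Scale every position of a channel by one input-independent value."""
    return [[x * z for x in channel] for channel, z in zip(features, implicit)]


# Explicit representation: 2 channels with 3 positions each (depends on the input).
features = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]

# Implicit representation: one value per channel. In a real network these would
# be trainable parameters; here they are fixed by hand.
implicit_shift = [0.5, -1.0]
implicit_scale = [2.0, 0.5]

shifted = combine_additive(features, implicit_shift)
scaled = combine_multiplicative(features, implicit_scale)
print(shifted)
print(scaled)
```

Under this assumption the same implicit values are applied to every input, so they behave like a learned prior on each channel rather than a function of the image.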


Results from the Paper


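| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Object Detection | COCO minival | YOLOR-D6 (1280, single-scale, 31 fps) | AP50 | 73.5 | #9 |
| Object Detection | COCO minival | YOLOR-D6 (1280, single-scale, 31 fps) | AP75 | 60.6 | #6 |
| Object Detection | COCO minival | YOLOR-D6 (1280, single-scale, 31 fps) | APS | 40.4 | #4 |
| Object Detection | COCO minival | YOLOR-D6 (1280, single-scale, 31 fps) | APM | 60.1 | #4 |
| Object Detection | COCO minival | YOLOR-D6 (1280, single-scale, 31 fps) | APL | 68.7 | #10 |
| Object Detection | COCO minival | YOLOR-P6 (1280, single-scale, 72 fps) | AP50 | 70.6 | #16 |
| Object Detection | COCO minival | YOLOR-P6 (1280, single-scale, 72 fps) | AP75 | 57.4 | #9 |
| Object Detection | COCO minival | YOLOR-P6 (1280, single-scale, 72 fps) | APS | 37.4 | #8 |
| Object Detection | COCO minival | YOLOR-P6 (1280, single-scale, 72 fps) | APM | 57.3 | #9 |
| Object Detection | COCO minival | YOLOR-P6 (1280, single-scale, 72 fps) | APL | 65.2 | #18 |
| Object Detection | COCO test-dev | YOLOR-D6 (1280, single-scale, 30 fps) | box mAP | 55.4 | #45 |
| Object Detection | COCO test-dev | YOLOR-D6 (1280, single-scale, 30 fps) | AP50 | 73.3 | #13 |
| Object Detection | COCO test-dev | YOLOR-D6 (1280, single-scale, 30 fps) | AP75 | 60.6 | #12 |
| Real-Time Object Detection | MS COCO | YOLOR-P6 | FPS (V100, b=1) | 49 | #27 |
| Real-Time Object Detection | MS COCO | YOLOR-P6 | box AP | 52.6 | #32 |
| Real-Time Object Detection | MS COCO | YOLOR-P6 | FPS | 49 | #26 |
| Real-Time Object Detection | MS COCO | YOLOR-E6 | FPS (V100, b=1) | 37 | #35 |
| Real-Time Object Detection | MS COCO | YOLOR-E6 | box AP | 54.8 | #14 |
| Real-Time Object Detection | MS COCO | YOLOR-E6 | FPS | 37 | #34 |
| Real-Time Object Detection | MS COCO | YOLOR-W6 | FPS (V100, b=1) | 47 | #29 |
| Real-Time Object Detection | MS COCO | YOLOR-W6 | box AP | 54.1 | #18 |
| Real-Time Object Detection | MS COCO | YOLOR-W6 | FPS | 47 | #28 |
| Real-Time Object Detection | MS COCO | YOLOR-D6 | FPS (V100, b=1) | 30 | #39 |
| Real-Time Object Detection | MS COCO | YOLOR-D6 | box AP | 55.4 | #8 |
| Real-Time Object Detection | MS COCO | YOLOR-D6 | FPS | 30 | #38 |
| Real-Time Object Detection | MS COCO | YOLOR-P6D | FPS (V100, b=1) | 49 | #27 |
| Real-Time Object Detection | MS COCO | YOLOR-P6D | box AP | 53 | #25 |
| Real-Time Object Detection | MS COCO | YOLOR-P6D | FPS | 49 | #26 |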

Methods