Deep High-Resolution Representation Learning for Visual Recognition

20 Aug 2019
Jingdong Wang, Ke Sun, Tianheng Cheng, Borui Jiang, Chaorui Deng, Yang Zhao, Dong Liu, Yadong Mu, Mingkui Tan, Xinggang Wang, Wenyu Liu, Bin Xiao

High-resolution representations are essential for position-sensitive vision problems, such as human pose estimation, semantic segmentation, and object detection. Existing state-of-the-art frameworks first encode the input image as a low-resolution representation through a subnetwork that is formed by connecting high-to-low resolution convolutions in series (e.g., ResNet, VGGNet), and then recover the high-resolution representation from the encoded low-resolution representation...
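The encode-then-recover pattern described above can be illustrated with a toy sketch. It is not taken from the paper: real networks use learned strided convolutions and learned upsampling, whereas this example assumes fixed average pooling and nearest-neighbour upsampling on a 1-D signal, and the helper names `avg_pool2`, `upsample2`, and `encode_decode` are hypothetical. It shows that once a sharp peak has been reduced to a low resolution, upsampling alone cannot restore its exact position.

```python
from typing import List, Tuple


def avg_pool2(x: List[float]) -> List[float]:
    """Halve the resolution by averaging non-overlapping pairs (stride 2)."""
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]


def upsample2(x: List[float]) -> List[float]:
    """Double the resolution by nearest-neighbour repetition."""
    out: List[float] = []
    for v in x:
        out.extend([v, v])
    return out


def encode_decode(x: List[float], stages: int) -> Tuple[List[float], List[float]]:
    # Encoder: high-to-low resolution steps connected in series.
    low = x
    for _ in range(stages):
        low = avg_pool2(low)
    # Decoder: recover the original resolution from the low-resolution result only.
    high = low
    for _ in range(stages):
        high = upsample2(high)
    return low, high


signal = [0, 0, 0, 8, 0, 0, 0, 0]  # a single sharp peak at index 3
low, recovered = encode_decode(signal, stages=2)
print(low)        # [2.0, 0.0]
print(recovered)  # [2.0, 2.0, 2.0, 2.0, 0.0, 0.0, 0.0, 0.0]
```

After two halvings, the recovered signal only says that something lies somewhere in the first four samples, with a quarter of the original amplitude; this loss of spatial detail is what makes position-sensitive tasks sensitive to how high-resolution representations are obtained.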

TASK                   DATASET         MODEL                  METRIC NAME  METRIC VALUE  GLOBAL RANK
Object Detection       COCO test-dev   HTC (HRNetV2p-W48)     box AP       47.3          #18
                                                              AP50         65.9          #24
                                                              AP75         51.2          #24
                                                              APS          28.0          #28
                                                              APM          49.7          #22
                                                              APL          59.8          #21
Semantic Segmentation  PASCAL Context  HRNetV2 (HRNetV2-W48)  mIoU         54.0          #8

Methods used in the Paper