Thinned U-shape Module (TUM) is a feature extraction block for object detection models, introduced as part of the M2Det architecture. Unlike the U-shaped structures in FPN and RetinaNet, TUM adopts a thinner U-shape structure, as illustrated in the figure to the right. The encoder is a series of 3x3 convolution layers with stride 2. The decoder takes the outputs of these layers as its reference set of feature maps, whereas the original FPN uses the output of the last layer of each stage in the ResNet backbone.
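
To make the encoder's downsampling concrete, here is a minimal sketch of the spatial sizes produced by a chain of 3x3, stride-2 convolutions. It uses the standard convolution output-size formula. The padding of 1, the 40x40 input, and the five encoder layers are assumptions chosen for illustration, and the helper names `conv_out_size` and `tum_scales` are hypothetical, not taken from M2Det.

```python
def conv_out_size(size, kernel=3, stride=2, padding=1):
    """Spatial size after one convolution (standard floor formula).

    padding=1 is an assumption; the text only fixes kernel=3 and stride=2.
    """
    return (size + 2 * padding - kernel) // stride + 1


def tum_scales(input_size, num_encoder_layers=5):
    """Return (encoder_sizes, decoder_sizes) for a square input feature map.

    encoder_sizes lists the input size followed by the size after each
    stride-2 convolution. The decoder uses those maps as its reference
    set, so its sizes are the same list walked from smallest to largest.
    """
    encoder = [input_size]
    for _ in range(num_encoder_layers):
        encoder.append(conv_out_size(encoder[-1]))
    decoder = list(reversed(encoder))
    return encoder, decoder


encoder_sizes, decoder_sizes = tum_scales(40)
print(encoder_sizes)  # [40, 20, 10, 5, 3, 2]
print(decoder_sizes)  # [2, 3, 5, 10, 20, 40]
```

Under these assumptions a single TUM yields feature maps at six scales. A different padding on the last layers would change the smallest sizes.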

In addition, TUM adds 1x1 convolution layers after the upsample and element-wise sum operations in the decoder branch to enhance learning ability and keep the features smooth. In the context of M2Det, the decoder outputs of each TUM together form the multi-scale features of the current level. Taken as a whole, the outputs of the stacked TUMs form the multi-level, multi-scale features: the front TUM mainly provides shallow-level features, the middle TUM provides medium-level features, and the back TUM provides deep-level features.
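
The level-by-scale organisation described above can be sketched as a nested list, with one entry per TUM (level) and one entry per decoder output (scale). This is only a bookkeeping illustration using string labels in place of tensors. The three levels, the three scales, and the helper names `stack_tum_outputs` and `group_by_scale` are hypothetical. Regrouping by scale is shown as one plausible way a later stage might consume the features, not as something the text above specifies.

```python
def stack_tum_outputs(num_levels, scales):
    """Build features[level][i]: a label for the decoder output of the
    TUM at `level` (0 = front/shallow) at spatial size scales[i]."""
    return [
        [f"tum{level}_scale{size}" for size in scales]
        for level in range(num_levels)
    ]


def group_by_scale(features):
    """Regroup so each entry holds the same-scale outputs of every level,
    ordered from the front (shallow) TUM to the back (deep) TUM."""
    return [list(same_scale) for same_scale in zip(*features)]


features = stack_tum_outputs(num_levels=3, scales=[40, 20, 10])
by_scale = group_by_scale(features)
print(features[0])  # multi-scale features of the front (shallow) TUM
print(by_scale[0])  # largest-scale features across all three levels
```

Here `features[0]` is `['tum0_scale40', 'tum0_scale20', 'tum0_scale10']` and `by_scale[0]` is `['tum0_scale40', 'tum1_scale40', 'tum2_scale40']`.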

Source: M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network

Latest Papers

Learning to Segment Dynamic Objects using SLAM Outliers
Adrian Bojko, Romain Dupont, Mohamed Tamaazousti, Hervé Le Borgne
Learning Rolling Shutter Correction from Real Data without Camera Motion Assumption
Jiawei Mo, Md Jahidul Islam, Junaed Sattar
Deep Probabilistic Feature-metric Tracking
Binbin Xu, Andrew J. Davison, Stefan Leutenegger
A Multi-Task Learning Approach for Human Activity Segmentation and Ergonomics Risk Assessment
Behnoosh Parsa, Ashis G. Banerjee
Structure-SLAM: Low-Drift Monocular SLAM in Indoor Environments
Yanyan Li, Nikolas Brasch, Yida Wang, Nassir Navab, Federico Tombari
Dynamic Object Tracking and Masking for Visual SLAM
Jonathan Vincent, Mathieu Labbé, Jean-Samuel Lauzon, François Grondin, Pier-Marc Comtois-Rivet, François Michaud
Learning Monocular Visual Odometry via Self-Supervised Long-Term Modeling
Yuliang Zou, Pan Ji, Quoc-Huy Tran, Jia-Bin Huang, Manmohan Chandraker
VOLDOR: Visual Odometry From Log-Logistic Dense Optical Flow Residuals
Zhixiang Min, Yiding Yang, Enrique Dunn
M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network
Qijie Zhao, Tao Sheng, Yongtao Wang, Zhi Tang, Ying Chen, Ling Cai, Haibin Ling