Thinned U-shape Module, or TUM, is a feature extraction block used in object detection models. It was introduced as part of the M2Det architecture. Unlike FPN and RetinaNet, TUM adopts a thinner U-shape structure, as illustrated in the figure to the right. Its encoder is a series of 3x3 convolution layers with stride 2, and its decoder takes the outputs of these layers as its reference set of feature maps, whereas the original FPN takes the output of the last layer of each stage of the ResNet backbone.
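The encoder/decoder data flow can be illustrated with a minimal, dependency-free sketch. It assumes single-channel feature maps stored as nested lists, one hypothetical 3x3 kernel `w3` shared by all encoder layers, and a scalar `w1` standing in for the 1x1 convolution that TUM applies after each upsample-and-sum step in the decoder. Real TUM layers are multi-channel with learned weights, and the names `tum_forward`, `conv3x3_stride2`, and so on are illustrative only. The 40x40 input and six scales are example values; the paper's configuration may use different padding, especially at the smallest scales.

```python
from typing import List

# A single-channel feature map stored as rows of floats (a simplification:
# real TUM layers operate on multi-channel tensors with learned weights).
Grid = List[List[float]]


def conv3x3_stride2(x: Grid, w: Grid) -> Grid:
    """3x3 cross-correlation with stride 2 and zero padding 1 (one channel)."""
    h, wd = len(x), len(x[0])
    oh, ow = (h - 1) // 2 + 1, (wd - 1) // 2 + 1  # floor((n + 2*1 - 3) / 2) + 1
    out = [[0.0] * ow for _ in range(oh)]
    for i in range(oh):
        for j in range(ow):
            acc = 0.0
            for di in range(3):
                for dj in range(3):
                    r, c = 2 * i + di - 1, 2 * j + dj - 1
                    if 0 <= r < h and 0 <= c < wd:  # zero padding outside the map
                        acc += w[di][dj] * x[r][c]
            out[i][j] = acc
    return out


def upsample_nearest(x: Grid, oh: int, ow: int) -> Grid:
    """Nearest-neighbour upsampling to an explicit (oh, ow) target size."""
    h, w = len(x), len(x[0])
    return [[x[i * h // oh][j * w // ow] for j in range(ow)] for i in range(oh)]


def add(a: Grid, b: Grid) -> Grid:
    """Element-wise sum of two equally sized maps."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def conv1x1(x: Grid, weight: float, bias: float = 0.0) -> Grid:
    """With a single channel, a 1x1 convolution reduces to a scale plus bias."""
    return [[weight * v + bias for v in row] for row in x]


def tum_forward(x: Grid, num_scales: int, w3: Grid, w1: float) -> List[Grid]:
    """Hypothetical single-channel TUM: returns one map per scale, largest first."""
    # Encoder: each stride-2 3x3 convolution roughly halves height and width.
    enc = [x]
    for _ in range(num_scales - 1):
        enc.append(conv3x3_stride2(enc[-1], w3))
    # Decoder: start at the smallest map, then upsample, add the encoder map of
    # the same scale (the reference map), and apply a 1x1 convolution.
    outs = [enc[-1]]
    for skip in reversed(enc[:-1]):
        up = upsample_nearest(outs[-1], len(skip), len(skip[0]))
        outs.append(conv1x1(add(up, skip), w1))
    return outs[::-1]


x = [[float(i + j) for j in range(40)] for i in range(40)]
w3 = [[1.0 / 9] * 3 for _ in range(3)]  # hypothetical averaging kernel
feats = tum_forward(x, num_scales=6, w3=w3, w1=1.0)
print([(len(f), len(f[0])) for f in feats])
# [(40, 40), (20, 20), (10, 10), (5, 5), (3, 3), (2, 2)]
```

Sharing one kernel across encoder layers only keeps the example short; each real layer would have its own weights, followed by normalization and an activation.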
In addition, TUM adds 1x1 convolution layers after the upsample and element-wise sum operations in the decoder branch to enhance learning ability and keep the features smooth. In M2Det, all of the outputs of the decoder of each TUM form the multi-scale features of the current level. Taken together, the outputs of the stacked TUMs form multi-level, multi-scale features: the front TUM mainly provides shallow-level features, the middle TUMs provide medium-level features, and the back TUM provides deep-level features.
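How stacked TUMs yield a levels-by-scales grid of features can be sketched with shape bookkeeping alone. The sketch assumes every TUM emits the same set of scales with the same channel count, and, following M2Det's scale-wise aggregation (not described in this section), that same-scale features from all levels are concatenated along the channel axis. The values of 8 levels, 128 channels, and a 40x40 largest scale are hypothetical, as are the function names.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (channels, height, width)


def tum_output_shapes(channels: int, size: int, num_scales: int) -> List[Shape]:
    """Shapes of one TUM's decoder outputs, largest scale first."""
    shapes = []
    for _ in range(num_scales):
        shapes.append((channels, size, size))
        size = (size - 1) // 2 + 1  # assumed 3x3 conv, stride 2, padding 1
    return shapes


def stacked_tum_shapes(num_levels: int, channels: int, size: int,
                       num_scales: int) -> List[List[Shape]]:
    """levels[l][s]: level l (0 = front/shallow TUM, last = back/deep TUM), scale s."""
    return [tum_output_shapes(channels, size, num_scales) for _ in range(num_levels)]


def concat_by_scale(levels: List[List[Shape]]) -> List[Shape]:
    """Join the same-scale features of every level along the channel axis."""
    grouped = []
    for s in range(len(levels[0])):
        _, h, w = levels[0][s]
        channels = sum(level[s][0] for level in levels)
        grouped.append((channels, h, w))
    return grouped


levels = stacked_tum_shapes(num_levels=8, channels=128, size=40, num_scales=6)
print(concat_by_scale(levels))
# [(1024, 40, 40), (1024, 20, 20), (1024, 10, 10), (1024, 5, 5), (1024, 3, 3), (1024, 2, 2)]
```

Each row of `levels` corresponds to one TUM's multi-scale output, so indexing by level first and scale second mirrors the "multi-level multi-scale" description above.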
Source: M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network

| PAPER | DATE |
|---|---|
| Learning to Segment Dynamic Objects using SLAM Outliers | 2020-11-12 |
| Learning Rolling Shutter Correction from Real Data without Camera Motion Assumption | 2020-11-05 |
| Deep Probabilistic Feature-metric Tracking | 2020-08-31 |
| A Multi-Task Learning Approach for Human Activity Segmentation and Ergonomics Risk Assessment | 2020-08-07 |
| Structure-SLAM: Low-Drift Monocular SLAM in Indoor Environments | 2020-08-05 |
| Dynamic Object Tracking and Masking for Visual SLAM | 2020-07-31 |
| Learning Monocular Visual Odometry via Self-Supervised Long-Term Modeling | 2020-07-21 |
| VOLDOR: Visual Odometry From Log-Logistic Dense Optical Flow Residuals | 2020-06-01 |
| M2Det: A Single-Shot Object Detector based on Multi-Level Feature Pyramid Network | 2018-11-12 |
TASK | PAPERS | SHARE |
---|---|---|
Object Tracking | 2 | 12.50% |
Visual Odometry | 2 | 12.50% |
Semantic Segmentation | 1 | 6.25% |
Structure from Motion | 1 | 6.25% |
Action Detection | 1 | 6.25% |
Multi-Task Learning | 1 | 6.25% |
Loop Closure Detection | 1 | 6.25% |
Monocular Visual Odometry | 1 | 6.25% |
Pose Estimation | 1 | 6.25% |
| COMPONENT TYPE |
|---|
| Convolutions |
| Normalization |
| Convolutions |
| Activation Functions |