The idea of semantic segmentation is to recognize and understand what is in an image at the pixel-level.
|Trend||Dataset||Best Method||Paper title||Paper||Code||Compare|
The former networks are able to encode multi-scale contextual information by probing the incoming features with filters or pooling operations at multiple rates and multiple effective fields-of-view, while the latter networks can capture sharper object boundaries by gradually recovering the spatial information.
SOTA for Semantic Segmentation on Cityscapes (using extra training data)
In this paper we describe a new mobile architecture, MobileNetV2, that improves the state of the art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes.
To handle the problem of segmenting objects at multiple scales, we design modules which employ atrous convolution in cascade or in parallel to capture multi-scale context by adopting multiple atrous rates.
#2 best model for Semantic Segmentation on PASCAL VOC 2012
Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance.
#2 best model for Multi-Human Parsing on MHP v1.0
ASPP probes an incoming convolutional feature layer with filters at multiple sampling rates and effective fields-of-views, thus capturing objects as well as image context at multiple scales.
#6 best model for Semantic Segmentation on PASCAL Context
When we add our proposed global feature, and a technique for learning normalization parameters, accuracy increases consistently even over our improved versions of the baselines.
#11 best model for Semantic Segmentation on PASCAL Context
This is due to the very invariance properties that make DCNNs good for high level tasks.
SOTA for Scene Segmentation on SUN-RGBD
We study the problem of video-to-video synthesis, whose goal is to learn a mapping function from an input source video (e. g., a sequence of semantic segmentation masks) to an output photorealistic video that precisely depicts the content of the source video.