Semantic Segmentation

5036 papers with code • 119 benchmarks • 300 datasets

Semantic segmentation is a computer vision task in which every pixel of an image is assigned to a class, producing a dense, pixel-wise segmentation map. Example benchmarks for this task include Cityscapes, PASCAL VOC, and ADE20K. Models are usually evaluated with the mean Intersection-over-Union (mean IoU) and pixel accuracy metrics.
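The two metrics named above can be sketched in a few lines. This is a minimal illustration, not any benchmark's official evaluation script: the function names, the `ignore_index` parameter, and the toy label maps are all assumptions made for the example. It uses plain Python lists for clarity, where real evaluation code would typically operate on arrays. Here mean IoU averages only over classes that appear in the prediction or the ground truth, which is one common convention; benchmarks differ on this detail.

```python
def confusion_matrix(pred, target, num_classes, ignore_index=None):
    """Count pixels by (true class, predicted class) over flattened label maps."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # skip pixels marked as unlabeled (hypothetical convention)
        cm[t][p] += 1
    return cm


def pixel_accuracy(cm):
    """Fraction of counted pixels whose predicted class equals the true class."""
    correct = sum(cm[c][c] for c in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total if total else 0.0


def mean_iou(cm):
    """Average of per-class IoU = TP / (TP + FP + FN).

    Classes that occur in neither map (empty union) are left out of the average.
    """
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # true class c, predicted as something else
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, true class is something else
        union = tp + fp + fn
        if union:
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy 2x3 label maps with three classes, flattened row by row (made-up data).
target = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]

cm = confusion_matrix(pred, target, num_classes=3)
print("pixel accuracy:", pixel_accuracy(cm))
print("mean IoU:", mean_iou(cm))
```

In this toy case four of six pixels are correct, while the per-class IoUs are 1/3, 2/3, and 1/2. Mean IoU penalizes errors on every class equally, whereas pixel accuracy is dominated by whichever classes cover the most pixels.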

(Image credit: CSAILVision)

Most implemented papers

U-Net: Convolutional Networks for Biomedical Image Segmentation

labmlai/annotated_deep_learning_paper_implementations 18 May 2015

There is large consent that successful training of deep networks requires many thousand annotated training samples.

Deep Residual Learning for Image Recognition

tensorflow/models CVPR 2016

Deep residual nets are foundations of our submissions to ILSVRC & COCO 2015 competitions, where we also won the 1st places on the tasks of ImageNet detection, ImageNet localization, COCO detection, and COCO segmentation.

Mask R-CNN

tensorflow/models ICCV 2017

Our approach efficiently detects objects in an image while simultaneously generating a high-quality segmentation mask for each instance.

MobileNetV2: Inverted Residuals and Linear Bottlenecks

tensorflow/models CVPR 2018

In this paper we describe a new mobile architecture, MobileNetV2, that improves the state of the art performance of mobile models on multiple tasks and benchmarks as well as across a spectrum of different model sizes.

MMDetection: Open MMLab Detection Toolbox and Benchmark

open-mmlab/mmdetection 17 Jun 2019

In this paper, we introduce the various features of this toolbox.

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

google-research/vision_transformer ICLR 2021

While the Transformer architecture has become the de-facto standard for natural language processing tasks, its applications to computer vision remain limited.

FCOS: Fully Convolutional One-Stage Object Detection

tianzhi0549/FCOS ICCV 2019

By eliminating the predefined set of anchor boxes, FCOS completely avoids the complicated computation related to anchor boxes such as calculating overlapping during training.

Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation

tensorflow/models ECCV 2018

The former networks are able to encode multi-scale contextual information by probing the incoming features with filters or pooling operations at multiple rates and multiple effective fields-of-view, while the latter networks can capture sharper object boundaries by gradually recovering the spatial information.

SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation

PaddlePaddle/PaddleSeg 2 Nov 2015

We show that SegNet provides good performance with competitive inference time and more efficient inference memory-wise as compared to other architectures.