Semantic Segmentation
5082 papers with code • 120 benchmarks • 303 datasets
Semantic Segmentation is a computer vision task whose goal is to produce a dense pixel-wise segmentation map of an image, assigning each pixel to a specific class or object. Example benchmarks for this task include Cityscapes, PASCAL VOC, and ADE20K. Models are usually evaluated with the Mean Intersection-over-Union (Mean IoU) and Pixel Accuracy metrics.
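The two standard metrics can be computed directly from predicted and ground-truth label maps. A minimal NumPy sketch (function names are illustrative, not from any particular library):

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean Intersection-over-Union, averaged over classes that
    appear in either the prediction or the ground truth."""
    ious = []
    for c in range(num_classes):
        pred_c = pred == c
        target_c = target == c
        union = np.logical_or(pred_c, target_c).sum()
        if union == 0:
            continue  # class absent from both maps; skip it
        intersection = np.logical_and(pred_c, target_c).sum()
        ious.append(intersection / union)
    return float(np.mean(ious))

def pixel_accuracy(pred, target):
    """Fraction of pixels whose predicted class matches the ground truth."""
    return float((pred == target).mean())

# Tiny 2x2 example with two classes
pred = np.array([[0, 1], [1, 1]])
target = np.array([[0, 1], [0, 1]])
print(mean_iou(pred, target, num_classes=2))   # (1/2 + 2/3) / 2 ≈ 0.583
print(pixel_accuracy(pred, target))            # 3 of 4 pixels correct = 0.75
```

Benchmark evaluation scripts typically accumulate per-class intersection and union counts over the whole dataset before dividing, rather than averaging per-image IoUs as this sketch does for brevity.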
(Image credit: CSAILVision)
Libraries
Use these libraries to find Semantic Segmentation models and implementations
Subtasks
- Tumor Segmentation
- Panoptic Segmentation
- 3D Semantic Segmentation
- Weakly-Supervised Semantic Segmentation
- Scene Segmentation
- Semi-Supervised Semantic Segmentation
- Real-Time Semantic Segmentation
- 3D Part Segmentation
- Unsupervised Semantic Segmentation
- Road Segmentation
- One-Shot Segmentation
- Bird's-Eye View Semantic Segmentation
- Crack Segmentation
- Universal Segmentation
- Class-Incremental Semantic Segmentation
- UNET Segmentation
- Polyp Segmentation
- Vision-Language Segmentation
- 4D Spatio Temporal Semantic Segmentation
- Histopathological Segmentation
- Attentive segmentation networks
- Text-Line Extraction
- Aerial Video Semantic Segmentation
- Amodal Panoptic Segmentation
- Robust BEV Map Segmentation
Latest papers
PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition
In this paper, we further adapt the selective scanning process of Mamba to the visual domain, enhancing its ability to learn features from two-dimensional images by (i) a continuous 2D scanning process that improves spatial continuity by ensuring adjacency of tokens in the scanning sequence, and (ii) direction-aware updating which enables the model to discern the spatial relations of tokens by encoding directional information.
The Need for Speed: Pruning Transformers with One Recipe
We introduce the One-shot Pruning Technique for Interchangeable Networks (OPTIN) framework as a tool to increase the efficiency of pre-trained transformer architectures without requiring re-training.
CoDA: Instructive Chain-of-Domain Adaptation with Severity-Aware Visual Prompt Tuning
SAVPT features a novel metric Severity that divides all adverse scene images into low-severity and high-severity images.
Efficient Video Object Segmentation via Modulated Cross-Attention Memory
Recently, transformer-based approaches have shown promising results for semi-supervised video object segmentation.
Optimizing LiDAR Placements for Robust Driving Perception in Adverse Conditions
The robustness of driving perception systems under unprecedented conditions is crucial for safety-critical usages.
3D-EffiViTCaps: 3D Efficient Vision Transformer with Capsule for Medical Image Segmentation
Our encoder uses capsule blocks and EfficientViT blocks to jointly capture local and global semantic information more effectively and efficiently with less information loss, while the decoder employs CNN blocks and EfficientViT blocks to capture finer details for segmentation.
Segment Anything Model for Road Network Graph Extraction
We propose SAM-Road, an adaptation of the Segment Anything Model (SAM) for extracting large-scale, vectorized road network graphs from satellite imagery.
MatchSeg: Towards Better Segmentation via Reference Image Matching
Few-shot learning aims to overcome the need for annotated data by using a small labeled dataset, known as a support set, to guide predicting labels for new, unlabeled images, known as the query set.
Anytime, Anywhere, Anyone: Investigating the Feasibility of Segment Anything Model for Crowd-Sourcing Medical Image Annotations
Curating annotations for medical image segmentation is a labor-intensive and time-consuming task that requires domain expertise, resulting in "narrowly" focused deep learning (DL) models with limited translational utility.
BSNet: Box-Supervised Simulation-assisted Mean Teacher for 3D Instance Segmentation
To generate higher quality pseudo-labels and achieve more precise weakly supervised 3DIS results, we propose the Box-Supervised Simulation-assisted Mean Teacher for 3D Instance Segmentation (BSNet), which devises a novel pseudo-labeler called Simulation-assisted Transformer.