Scene Understanding
516 papers with code • 3 benchmarks • 43 datasets
Scene Understanding is the task of interpreting the overall content of a scene. For instance, the iPhone includes an accessibility feature that helps visually impaired users take photos by describing what the camera sees. This is an example of Scene Understanding.
Benchmarks
These leaderboards are used to track progress in Scene Understanding
Libraries
Use these libraries to find Scene Understanding models and implementations
Datasets
Subtasks
Most implemented papers
M3D-RPN: Monocular 3D Region Proposal Network for Object Detection
Understanding the world in 3D is a critical component of urban autonomous driving.
Cityscapes-Panoptic-Parts and PASCAL-Panoptic-Parts datasets for Scene Understanding
In this technical report, we present two novel datasets for image scene understanding.
Multi-View Radar Semantic Segmentation
Understanding the scene around the ego-vehicle is key to assisted and autonomous driving.
P2T: Pyramid Pooling Transformer for Scene Understanding
A popular solution to this problem is to use a single pooling operation to reduce the sequence length.
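The idea of shortening the attention sequence by pooling can be sketched in a few lines. This is a hedged illustration, not P2T's actual implementation: the function names (`avg_pool_tokens`, `pyramid_pool`) and the ratio choices are made up here, and only the token-pooling step is shown, not the full transformer.

```python
import numpy as np

def avg_pool_tokens(x, pool):
    """Average-pool a token sequence of shape (L, C) down to (L // pool, C)."""
    L, C = x.shape
    L2 = L // pool
    return x[:L2 * pool].reshape(L2, pool, C).mean(axis=1)

def pyramid_pool(x, ratios=(2, 4, 8)):
    """Concatenate poolings at several ratios to form a short multi-scale
    key/value sequence for attention, in the spirit of pyramid pooling."""
    return np.concatenate([avg_pool_tokens(x, r) for r in ratios], axis=0)

tokens = np.random.rand(64, 32)   # 64 tokens, 32 channels
kv = pyramid_pool(tokens)
print(kv.shape)                   # (56, 32): 32 + 16 + 8 pooled tokens
```

Attention over the 56 pooled tokens is cheaper than over the original 64, and the saving grows with sequence length.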
Pixel-Wise Recognition for Holistic Surgical Scene Understanding
This paper presents the Holistic and Multi-Granular Surgical Scene Understanding of Prostatectomies (GraSP) dataset, a curated benchmark that models surgical scene understanding as a hierarchy of complementary tasks with varying levels of granularity.
Joint 2D-3D-Semantic Data for Indoor Scene Understanding
We present a dataset of large-scale indoor spaces that provides a variety of mutually registered modalities from 2D, 2.5D and 3D domains, with instance-level semantic and geometric annotations.
Dilated Residual Networks
Convolutional networks for image classification progressively reduce resolution until the image is represented by tiny feature maps in which the spatial structure of the scene is no longer discernible.
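Dilated convolutions address exactly this: the receptive field grows with the dilation rate while the output keeps the input's resolution, so spatial structure is never collapsed. A minimal 1-D numpy sketch (illustrative only, not the paper's code):

```python
import numpy as np

def dilated_conv1d(x, w, dilation=1):
    """'Same'-padded 1-D convolution with dilation: the kernel taps are
    spaced `dilation` apart, enlarging the receptive field, but the
    output length always equals len(x)."""
    k = len(w)
    span = (k - 1) * dilation          # total extent the kernel covers
    pad = span // 2
    xp = np.pad(x, (pad, span - pad))
    return np.array([
        sum(w[j] * xp[i + j * dilation] for j in range(k))
        for i in range(len(x))
    ])

x = np.arange(8, dtype=float)
w = np.array([1.0, 1.0, 1.0]) / 3.0
y1 = dilated_conv1d(x, w, dilation=1)  # receptive field 3
y2 = dilated_conv1d(x, w, dilation=2)  # receptive field 5, same output size
print(y1.shape, y2.shape)              # (8,) (8,)
```

Stacking layers with increasing dilation covers a large context without any downsampling, which is why tiny feature maps become unnecessary.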
Efficient ConvNet for Real-time Semantic Segmentation
Semantic segmentation is a task that covers most of the perception needs of intelligent vehicles in a unified way.
Multi-Resolution Multi-Modal Sensor Fusion For Remote Sensing Data With Label Uncertainty
It is valuable to fuse outputs from multiple sensors to boost overall performance.
Single Shot Scene Text Retrieval
In this way, the text-based image retrieval task can be cast as a simple nearest-neighbor search of the query text representation over the CNN outputs for the entire image database.
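The nearest-neighbor step above can be sketched with cosine similarity. This is a generic illustration under assumed shapes, not the paper's pipeline: `nearest_images` and the feature dimensions are hypothetical, and the real system derives the image-side features from a text-spotting CNN.

```python
import numpy as np

def nearest_images(query_vec, image_feats, k=3):
    """Rank database images by cosine similarity between a query text
    embedding (D,) and per-image feature vectors (N, D); return the
    indices of the top-k matches."""
    q = query_vec / np.linalg.norm(query_vec)
    F = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    sims = F @ q                       # one dot product per database image
    return np.argsort(-sims)[:k]

feats = np.eye(4)                      # toy database: 4 one-hot "images"
ranked = nearest_images(np.array([0.0, 1.0, 0.0, 0.0]), feats, k=2)
print(ranked[0])                       # 1: the image matching the query
```

Because retrieval reduces to dot products against precomputed features, it scales to large image databases with standard approximate-nearest-neighbor indexes.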