Scene Understanding

516 papers with code • 3 benchmarks • 43 datasets

Scene Understanding is something that to understand a scene. For instance, iPhone has function that help eye disabled person to take a photo by discribing what the camera sees. This is an example of Scene Understanding.

Libraries

Use these libraries to find Scene Understanding models and implementations
4 papers
2,919
4 papers
1,149
See all 5 libraries.

Most implemented papers

M3D-RPN: Monocular 3D Region Proposal Network for Object Detection

garrickbrazil/M3D-RPN ICCV 2019

Understanding the world in 3D is a critical component of urban autonomous driving.

Cityscapes-Panoptic-Parts and PASCAL-Panoptic-Parts datasets for Scene Understanding

tue-mps/panoptic_parts 16 Apr 2020

In this technical report, we present two novel datasets for image scene understanding.

Multi-View Radar Semantic Segmentation

valeoai/MVRSS ICCV 2021

Understanding the scene around the ego-vehicle is key to assisted and autonomous driving.

P2T: Pyramid Pooling Transformer for Scene Understanding

yuhuan-wu/P2T 22 Jun 2021

A popular solution to this problem is to use a single pooling operation to reduce the sequence length.

Pixel-Wise Recognition for Holistic Surgical Scene Understanding

bcv-uniandes/tapir 20 Jan 2024

This paper presents the Holistic and Multi-Granular Surgical Scene Understanding of Prostatectomies (GraSP) dataset, a curated benchmark that models surgical scene understanding as a hierarchy of complementary tasks with varying levels of granularity.

Joint 2D-3D-Semantic Data for Indoor Scene Understanding

alexsax/2D-3D-Semantics 3 Feb 2017

We present a dataset of large-scale indoor spaces that provides a variety of mutually registered modalities from 2D, 2. 5D and 3D domains, with instance-level semantic and geometric annotations.

Dilated Residual Networks

fyu/drn CVPR 2017

Convolutional networks for image classification progressively reduce resolution until the image is represented by tiny feature maps in which the spatial structure of the scene is no longer discernible.

Efficient ConvNet for Real-time Semantic Segmentation

mindspore-ai/models 1 Jun 2017

Semantic segmentation is a task that covers most of the perception needs of intelligent vehicles in an unified way.

Multi-Resolution Multi-Modal Sensor Fusion For Remote Sensing Data With Label Uncertainty

GatorSense/MIMRF 2 May 2018

It is valuable to fuse outputs from multiple sensors to boost overall performance.

Single Shot Scene Text Retrieval

lluisgomez/single-shot-str ECCV 2018

In this way, the text based image retrieval task can be casted as a simple nearest neighbor search of the query text representation over the outputs of the CNN over the entire image database.