Scene Understanding
516 papers with code • 3 benchmarks • 43 datasets
Scene Understanding is the task of interpreting the overall content of a scene. For instance, the iPhone includes an accessibility feature that helps visually impaired users take photos by describing what the camera sees. This is an example of Scene Understanding.
Benchmarks
These leaderboards are used to track progress in Scene Understanding
Libraries
Use these libraries to find Scene Understanding models and implementations
Datasets
Subtasks
Most implemented papers
M3D-RPN: Monocular 3D Region Proposal Network for Object Detection
Understanding the world in 3D is a critical component of urban autonomous driving.
Cityscapes-Panoptic-Parts and PASCAL-Panoptic-Parts datasets for Scene Understanding
In this technical report, we present two novel datasets for image scene understanding.
Multi-View Radar Semantic Segmentation
Understanding the scene around the ego-vehicle is key to assisted and autonomous driving.
P2T: Pyramid Pooling Transformer for Scene Understanding
A popular solution to this problem is to use a single pooling operation to reduce the sequence length.
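The idea of shortening the attention sequence by pooling can be sketched in a few lines. This is a hedged illustration, not P2T's actual implementation: the function names (`avg_pool_tokens`, `pyramid_pool`) and the ratio choices are made up here, and only the token-pooling step is shown, not the full transformer.

```python
import numpy as np

def avg_pool_tokens(x, pool):
    """Average-pool a token sequence of shape (L, C) down to (L // pool, C)."""
    L, C = x.shape
    L2 = L // pool
    return x[:L2 * pool].reshape(L2, pool, C).mean(axis=1)

def pyramid_pool(x, ratios=(2, 4, 8)):
    """Concatenate poolings at several ratios to form a short multi-scale
    key/value sequence for attention, in the spirit of pyramid pooling."""
    return np.concatenate([avg_pool_tokens(x, r) for r in ratios], axis=0)

tokens = np.random.rand(64, 32)   # 64 tokens, 32 channels
kv = pyramid_pool(tokens)
print(kv.shape)                   # (56, 32): 32 + 16 + 8 pooled tokens
```

Attention over the 56 pooled tokens is cheaper than over the original 64, and the saving grows with sequence length.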
Pixel-Wise Recognition for Holistic Surgical Scene Understanding
This paper presents the Holistic and Multi-Granular Surgical Scene Understanding of Prostatectomies (GraSP) dataset, a curated benchmark that models surgical scene understanding as a hierarchy of complementary tasks with varying levels of granularity.
Joint 2D-3D-Semantic Data for Indoor Scene Understanding
We present a dataset of large-scale indoor spaces that provides a variety of mutually registered modalities from 2D, 2.5D and 3D domains, with instance-level semantic and geometric annotations.
Dilated Residual Networks
Convolutional networks for image classification progressively reduce resolution until the image is represented by tiny feature maps in which the spatial structure of the scene is no longer discernible.
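Dilated convolutions address exactly this: the receptive field grows with the dilation rate while the output keeps the input's resolution, so spatial structure is never collapsed. A minimal 1-D numpy sketch (illustrative only, not the paper's code):

```python
import numpy as np

def dilated_conv1d(x, w, dilation=1):
    """'Same'-padded 1-D convolution with dilation: the kernel taps are
    spaced `dilation` apart, enlarging the receptive field, but the
    output length always equals len(x)."""
    k = len(w)
    span = (k - 1) * dilation          # total extent the kernel covers
    pad = span // 2
    xp = np.pad(x, (pad, span - pad))
    return np.array([
        sum(w[j] * xp[i + j * dilation] for j in range(k))
        for i in range(len(x))
    ])

x = np.arange(8, dtype=float)
w = np.array([1.0, 1.0, 1.0]) / 3.0
y1 = dilated_conv1d(x, w, dilation=1)  # receptive field 3
y2 = dilated_conv1d(x, w, dilation=2)  # receptive field 5, same output size
print(y1.shape, y2.shape)              # (8,) (8,)
```

Stacking layers with increasing dilation covers a large context without any downsampling, which is why tiny feature maps become unnecessary.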
Efficient ConvNet for Real-time Semantic Segmentation
Semantic segmentation is a task that covers most of the perception needs of intelligent vehicles in a unified way.
Multi-Resolution Multi-Modal Sensor Fusion For Remote Sensing Data With Label Uncertainty
It is valuable to fuse outputs from multiple sensors to boost overall performance.
Single Shot Scene Text Retrieval
In this way, the text-based image retrieval task can be cast as a simple nearest-neighbor search of the query text representation over the CNN outputs for the entire image database.
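The nearest-neighbor step above can be sketched with cosine similarity. This is a generic illustration under assumed shapes, not the paper's pipeline: `nearest_images` and the feature dimensions are hypothetical, and the real system derives the image-side features from a text-spotting CNN.

```python
import numpy as np

def nearest_images(query_vec, image_feats, k=3):
    """Rank database images by cosine similarity between a query text
    embedding (D,) and per-image feature vectors (N, D); return the
    indices of the top-k matches."""
    q = query_vec / np.linalg.norm(query_vec)
    F = image_feats / np.linalg.norm(image_feats, axis=1, keepdims=True)
    sims = F @ q                       # one dot product per database image
    return np.argsort(-sims)[:k]

feats = np.eye(4)                      # toy database: 4 one-hot "images"
ranked = nearest_images(np.array([0.0, 1.0, 0.0, 0.0]), feats, k=2)
print(ranked[0])                       # 1: the image matching the query
```

Because retrieval reduces to dot products against precomputed features, it scales to large image databases with standard approximate-nearest-neighbor indexes.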