Scene Understanding
513 papers with code • 3 benchmarks • 43 datasets
Scene Understanding is the task of perceiving and interpreting the contents of a scene. For instance, the iPhone includes an accessibility feature that helps visually impaired users take photos by describing what the camera sees; this is an example of Scene Understanding in practice.
Benchmarks
These leaderboards are used to track progress in Scene Understanding
Libraries
Use these libraries to find Scene Understanding models and implementations
Datasets
Subtasks
Latest papers
AccidentBlip2: Accident Detection With Multi-View MotionBlip2
We also extend our approach to a multi-vehicle cooperative system by deploying Motion Qformer on each vehicle and simultaneously inputting the inference-generated query into the MLP for autoregressive inference.
ECLAIR: A High-Fidelity Aerial LiDAR Dataset for Semantic Segmentation
We introduce ECLAIR (Extended Classification of Lidar for AI Recognition), a new outdoor large-scale aerial LiDAR dataset designed specifically for advancing research in point cloud semantic segmentation.
Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation
In this work, we introduce Sigma, a Siamese Mamba network for multi-modal semantic segmentation, utilizing the Selective Structured State Space Model, Mamba.
VSRD: Instance-Aware Volumetric Silhouette Rendering for Weakly Supervised 3D Object Detection
In the auto-labeling stage, we represent the surface of each instance as a signed distance field (SDF) and render its silhouette as an instance mask through our proposed instance-aware volumetric silhouette rendering.
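To make the signed distance field (SDF) idea concrete, here is a minimal, illustrative sketch (not the paper's implementation) using a hypothetical unit-sphere instance: the SDF is negative inside the surface, and an orthographic silhouette mask marks each pixel whose viewing ray passes through a point with non-positive SDF.

```python
import math

def sphere_sdf(x, y, z, radius=1.0):
    """Signed distance to a sphere centered at the origin:
    negative inside the surface, positive outside."""
    return math.sqrt(x * x + y * y + z * z) - radius

def silhouette_mask(width, height, radius=1.0, extent=2.0):
    """Orthographic silhouette: a pixel belongs to the mask if any
    sample along its viewing ray (the z axis here) has SDF <= 0."""
    mask = []
    for j in range(height):
        y = -extent + 2 * extent * (j + 0.5) / height
        row = []
        for i in range(width):
            x = -extent + 2 * extent * (i + 0.5) / width
            inside = any(
                sphere_sdf(x, y, -extent + 2 * extent * k / 63, radius) <= 0
                for k in range(64)
            )
            row.append(1 if inside else 0)
        mask.append(row)
    return mask
```

The actual method renders per-instance silhouettes differentiably for auto-labeling; this sketch only shows how an SDF sign test yields a binary mask.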
Improving Visual Recognition with Hyperbolical Visual Hierarchy Mapping
Visual scenes are naturally organized in a hierarchy, where a coarse semantic is recursively comprised of several fine details.
GOV-NeSF: Generalizable Open-Vocabulary Neural Semantic Fields
Recent advancements in vision-language foundation models have significantly enhanced open-vocabulary 3D scene understanding.
Object Pose Estimation via the Aggregation of Diffusion Features
To achieve this, we propose three distinct architectures that can effectively capture and aggregate diffusion features of different granularity, greatly improving the generalizability of object pose estimation.
Calib3D: Calibrating Model Preferences for Reliable 3D Scene Understanding
Safety-critical 3D scene understanding tasks necessitate not only accurate but also confident predictions from 3D perception models.
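Calibration here means that a model's confidence should match its empirical accuracy. A standard way to quantify miscalibration (not specific to Calib3D) is the Expected Calibration Error, sketched below in plain Python:

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by confidence, then compare each bin's average
    confidence to its empirical accuracy; the ECE is the gap weighted
    by the fraction of samples in each bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / n) * abs(avg_conf - accuracy)
    return ece
```

A model predicting 80% confidence and being right 80% of the time scores an ECE of zero; an overconfident model scores higher.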
DOCTR: Disentangled Object-Centric Transformer for Point Scene Understanding
In this work, we propose a novel Disentangled Object-Centric TRansformer (DOCTR) that explores object-centric representation to facilitate learning with multiple objects for the multiple sub-tasks in a unified manner.
Volumetric Environment Representation for Vision-Language Navigation
To achieve a comprehensive 3D representation with fine-grained details, we introduce a Volumetric Environment Representation (VER), which voxelizes the physical world into structured 3D cells.
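The core voxelization step, turning an unstructured point set into structured 3D cells, can be sketched as follows (an illustrative stand-in, not the VER implementation): each point is assigned to the cell indexed by the floor of its coordinates divided by the cell size.

```python
import math

def voxelize(points, cell_size=0.5):
    """Group 3D points into structured voxel cells keyed by integer
    (i, j, k) indices; each cell collects the points that fall in it."""
    cells = {}
    for x, y, z in points:
        key = (math.floor(x / cell_size),
               math.floor(y / cell_size),
               math.floor(z / cell_size))
        cells.setdefault(key, []).append((x, y, z))
    return cells
```

Per-cell features (occupancy, mean color, learned embeddings) can then be aggregated over each cell's point list.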