Scene Understanding
516 papers with code • 3 benchmarks • 43 datasets
Scene Understanding is the task of analyzing an image or video to recognize the objects, layout, and overall context of a scene. For instance, the iPhone includes an accessibility feature that helps blind and low-vision users take photos by describing what the camera sees. This is an example of Scene Understanding.
Benchmarks
These leaderboards are used to track progress in Scene Understanding.
Libraries
Use these libraries to find Scene Understanding models and implementations
Datasets
Subtasks
Latest papers
Volumetric Environment Representation for Vision-Language Navigation
To achieve a comprehensive 3D representation with fine-grained details, we introduce a Volumetric Environment Representation (VER), which voxelizes the physical world into structured 3D cells.
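The voxelization idea described above can be illustrated with a minimal sketch: quantizing a 3D point cloud into a fixed grid of structured cells. This is only a demonstration of the general technique, not the paper's implementation; the `voxelize` helper, cell size, and grid shape are assumed for illustration.

```python
import numpy as np

def voxelize(points, cell_size=0.5, grid_shape=(8, 8, 8), origin=(0.0, 0.0, 0.0)):
    """Quantize 3D points into a boolean occupancy grid of structured cells.

    points: (N, 3) array of xyz coordinates.
    Returns an occupancy volume of shape grid_shape.
    """
    grid = np.zeros(grid_shape, dtype=bool)
    # Map each point to the index of the cell it falls into.
    idx = np.floor((points - np.asarray(origin)) / cell_size).astype(int)
    # Discard points that fall outside the grid bounds.
    valid = np.all((idx >= 0) & (idx < np.asarray(grid_shape)), axis=1)
    i, j, k = idx[valid].T
    grid[i, j, k] = True
    return grid

pts = np.array([[0.1, 0.1, 0.1],   # lands in cell (0, 0, 0)
                [1.2, 0.3, 2.9],   # lands in cell (2, 0, 5)
                [10.0, 0.0, 0.0]]) # outside the 8x8x8 grid, dropped
occ = voxelize(pts)
print(occ.sum())  # 2 occupied cells
```

A full VER-style representation would store learned features per cell rather than a boolean flag, but the spatial discretization step is the same.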
Hierarchical Spatial Proximity Reasoning for Vision-and-Language Navigation
Most Vision-and-Language Navigation (VLN) algorithms tend to make decision errors, primarily due to a lack of visual common sense and insufficient reasoning capabilities.
GroupContrast: Semantic-aware Self-supervised Representation Learning for 3D Understanding
To address this issue, we propose GroupContrast, a novel approach that combines segment grouping and semantic-aware contrastive learning.
MoAI: Mixture of All Intelligence for Large Language and Vision Models
Therefore, we present a new LLVM, Mixture of All Intelligence (MoAI), which leverages auxiliary visual information obtained from the outputs of external segmentation, detection, SGG, and OCR models.
Stealing Stable Diffusion Prior for Robust Monocular Depth Estimation
This paper introduces a novel approach named Stealing Stable Diffusion (SSD) prior for robust monocular depth estimation.
Embodied Understanding of Driving Scenarios
Hereby, we introduce the Embodied Language Model (ELM), a comprehensive framework tailored for agents' understanding of driving scenes with large spatial and temporal spans.
FusionVision: A comprehensive approach of 3D object reconstruction and segmentation from RGB-D cameras using YOLO and fast segment anything
Therefore, this paper introduces FusionVision, an exhaustive pipeline adapted for the robust 3D segmentation of objects in RGB-D imagery.
One model to use them all: Training a segmentation model with complementary datasets
In this work, we propose a method to combine multiple partially annotated datasets, which provide complementary annotations, into one model, enabling better scene segmentation and the use of multiple readily available datasets.
Swin3D++: Effective Multi-Source Pretraining for 3D Indoor Scene Understanding
Data diversity and abundance are essential for improving the performance and generalization of models in natural language processing and 2D vision.
Semantically-aware Neural Radiance Fields for Visual Scene Understanding: A Comprehensive Review
This review thoroughly examines the role of semantically-aware Neural Radiance Fields (NeRFs) in visual scene understanding, covering an analysis of over 250 scholarly papers.