Scene Understanding

516 papers with code • 3 benchmarks • 43 datasets

Scene Understanding is the task of interpreting a scene as a whole: recognizing what it contains and how its parts relate to one another. For instance, the iPhone has a feature that helps people with visual impairments take photos by describing what the camera sees. This is an example of Scene Understanding.

Libraries

Use these libraries to find Scene Understanding models and implementations
See all 5 libraries.

Volumetric Environment Representation for Vision-Language Navigation

defaultrui/vln-ver 21 Mar 2024

To achieve a comprehensive 3D representation with fine-grained details, we introduce a Volumetric Environment Representation (VER), which voxelizes the physical world into structured 3D cells.

7 stars
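The snippet above describes VER only as voxelizing the physical world into structured 3D cells; it does not say how features are placed into those cells. As a minimal sketch of the general voxelization idea (not VER's actual construction), the code below assumes the input is a set of 3D points and maps each one to the integer index of the cubic cell that contains it. The function name `voxelize`, the `origin` parameter, and the sparse dictionary output are illustrative assumptions.

```python
import math
from collections import defaultdict


def voxelize(points, voxel_size, origin=(0.0, 0.0, 0.0)):
    """Hypothetical helper: map each 3D point to the cubic cell containing it.

    Returns {(i, j, k): number_of_points_in_cell}. Only occupied cells are
    stored, so this is a sparse voxel grid rather than a dense 3D array.
    """
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    counts = defaultdict(int)
    for p in points:
        # floor((coordinate - origin) / voxel_size) is the cell index on each axis
        key = tuple(math.floor((c - o) / voxel_size) for c, o in zip(p, origin))
        counts[key] += 1
    return dict(counts)


cells = voxelize(
    [(0.1, 0.2, 0.3), (0.4, 0.1, 0.2), (1.2, 0.0, 0.0), (-0.1, 0.0, 0.0)],
    voxel_size=0.5,
)
print(cells)  # {(0, 0, 0): 2, (2, 0, 0): 1, (-1, 0, 0): 1}
```

A learned representation would store a feature vector per cell instead of a point count, but the indexing step is the same.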

Hierarchical Spatial Proximity Reasoning for Vision-and-Language Navigation

18979705623/hspr 18 Mar 2024

Most Vision-and-Language Navigation (VLN) algorithms tend to make decision errors, primarily due to a lack of visual common sense and insufficient reasoning capabilities.

3 stars

GroupContrast: Semantic-aware Self-supervised Representation Learning for 3D Understanding

dvlab-research/groupcontrast 14 Mar 2024

To address this issue, we propose GroupContrast, a novel approach that combines segment grouping and semantic-aware contrastive learning.

32 stars
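The snippet names two ingredients, segment grouping and contrastive learning, without giving GroupContrast's loss. The sketch below is a generic stand-in, not the paper's semantic-aware objective. It averages per-point features within each segment and then applies a standard InfoNCE loss, so that matching segments from two augmented views are pulled together and the other segments act as negatives. The function names, the temperature value, and the assumption that segment ids are shared across views are all illustrative.

```python
import math


def segment_means(features, segment_ids):
    """Average per-point feature vectors within each segment (the grouping step)."""
    sums, counts = {}, {}
    for f, s in zip(features, segment_ids):
        if s not in sums:
            sums[s] = [0.0] * len(f)
            counts[s] = 0
        sums[s] = [a + b for a, b in zip(sums[s], f)]
        counts[s] += 1
    return {s: [x / counts[s] for x in sums[s]] for s in sums}


def _normalize(v):
    n = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / n for x in v]


def info_nce(anchors, positives, temperature=0.1):
    """Standard InfoNCE: anchors[i] should match positives[i];
    every positives[j] with j != i acts as a negative for anchors[i]."""
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # cosine similarity to every candidate, scaled by the temperature
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        m = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_sum_exp = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_sum_exp - logits[i]  # equals -log softmax(logits)[i]
    return total / len(a)


# Two hypothetical augmented views of the same points, with shared segment ids.
view1 = segment_means([[1.0, 0.1], [0.9, 0.0], [0.0, 1.0]], [0, 0, 1])
view2 = segment_means([[0.8, 0.2], [0.1, 0.9], [0.0, 1.1]], [0, 1, 1])
ids = sorted(view1.keys() & view2.keys())
loss = info_nce([view1[s] for s in ids], [view2[s] for s in ids])
```

Contrasting segment-level averages rather than individual points is one simple way to make the loss respect grouping; the paper's semantic-aware weighting would sit on top of something like this.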

MoAI: Mixture of All Intelligence for Large Language and Vision Models

ByungKwanLee/MoAI 12 Mar 2024

Therefore, we present a new LLVM, Mixture of All Intelligence (MoAI), which leverages auxiliary visual information obtained from the outputs of external segmentation, detection, SGG, and OCR models.

246 stars

Stealing Stable Diffusion Prior for Robust Monocular Depth Estimation

hitcslj/ssd 8 Mar 2024

This paper introduces a novel approach named Stealing Stable Diffusion (SSD) prior for robust monocular depth estimation.

20 stars

Embodied Understanding of Driving Scenarios

opendrivelab/elm 7 Mar 2024

Hereby, we introduce the Embodied Language Model (ELM), a comprehensive framework tailored for agents' understanding of driving scenes with large spatial and temporal spans.

92 stars

FusionVision: A comprehensive approach of 3D object reconstruction and segmentation from RGB-D cameras using YOLO and fast segment anything

safouaneelg/FusionVision 29 Feb 2024

Therefore, this paper introduces FusionVision, an exhaustive pipeline adapted for the robust 3D segmentation of objects in RGB-D imagery.

23 stars
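Any pipeline that segments objects in RGB-D imagery in 3D needs some way to lift the pixels of a 2D object mask into 3D using the aligned depth image. The sketch below shows the standard pinhole-camera back-projection for that step; it is not taken from the FusionVision code. The intrinsics `fx`, `fy`, `cx`, `cy`, the `depth_scale` factor (for example 0.001 for depth stored in millimetres), and the function name are assumed inputs for illustration.

```python
def backproject_masked(depth, mask, fx, fy, cx, cy, depth_scale=1.0):
    """Hypothetical helper: turn masked depth pixels into 3D camera-frame points.

    depth: 2D list of raw depth values, indexed depth[v][u] (row v, column u).
    mask:  2D list of booleans with the same shape (e.g. one object's mask).
    Returns a list of (X, Y, Z) tuples using the pinhole model:
        Z = d * depth_scale, X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy
    """
    points = []
    for v, (depth_row, mask_row) in enumerate(zip(depth, mask)):
        for u, (d, keep) in enumerate(zip(depth_row, mask_row)):
            if not keep or d <= 0:  # skip pixels outside the mask or with no depth reading
                continue
            z = d * depth_scale
            points.append(((u - cx) * z / fx, (v - cy) * z / fy, z))
    return points


pts = backproject_masked(
    depth=[[1000, 1000], [0, 2000]],          # millimetres
    mask=[[True, False], [True, True]],       # the zero-depth pixel is dropped
    fx=100.0, fy=100.0, cx=0.5, cy=0.5,
    depth_scale=0.001,
)
```

The resulting point list could then be voxelized, filtered, or meshed to obtain a 3D object reconstruction.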

One model to use them all: Training a segmentation model with complementary datasets

nct_tso_public/dsad-segmentation 29 Feb 2024

In this work, we propose a method to combine multiple partially annotated datasets, which provide complementary annotations, into one model, enabling better scene segmentation and the use of multiple readily available datasets.

0 stars
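When datasets provide complementary annotations, an image from one dataset carries labels only for some classes, and an unlabeled class must not be treated as "absent". One common way to handle this, shown here as an assumption rather than the authors' method, is to compute a per-class binary cross-entropy only over the classes that the image's source dataset annotates. The function name and the plain-list data layout are hypothetical.

```python
import math


def masked_bce(pred, target, annotated, eps=1e-7):
    """Mean binary cross-entropy over pixels, restricted to annotated classes.

    pred:      per-pixel lists of per-class probabilities.
    target:    per-pixel lists of per-class 0/1 labels.
    annotated: set of class indices labeled by the dataset this image came from;
               all other classes are treated as unknown and contribute no loss.
    """
    total, n = 0.0, 0
    for p_pix, t_pix in zip(pred, target):
        for c in annotated:
            p = min(max(p_pix[c], eps), 1.0 - eps)  # clamp to avoid log(0)
            t = t_pix[c]
            total += -(t * math.log(p) + (1 - t) * math.log(1 - p))
            n += 1
    return total / n if n else 0.0


pred = [[0.9, 0.5, 0.1]]   # one pixel, three classes
target = [[1, 0, 0]]
loss_partial = masked_bce(pred, target, annotated={0})       # class 1 and 2 ignored
loss_full = masked_bce(pred, target, annotated={0, 1, 2})    # all classes supervised
```

With the mask in place, a model can be trained on the union of the datasets while each image only supervises the classes it actually labels.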

Swin3D++: Effective Multi-Source Pretraining for 3D Indoor Scene Understanding

microsoft/swin3d 22 Feb 2024

Data diversity and abundance are essential for improving the performance and generalization of models in natural language processing and 2D vision.

171 stars

Semantically-aware Neural Radiance Fields for Visual Scene Understanding: A Comprehensive Review

abourki/sota-semantically-aware-nerfs 17 Feb 2024

This review thoroughly examines the role of semantically-aware Neural Radiance Fields (NeRFs) in visual scene understanding, covering an analysis of over 250 scholarly papers.

12 stars