Scene Understanding

513 papers with code • 3 benchmarks • 43 datasets

Scene Understanding is the task of interpreting the content of a scene, such as the objects it contains and how they relate to one another. For instance, the iPhone has a feature that helps people with visual impairments take photos by describing what the camera sees. This is an example of Scene Understanding.


AccidentBlip2: Accident Detection With Multi-View MotionBlip2

yihuajerry/accidentblip2 18 Apr 2024

We also extend our approach to a multi-vehicle cooperative system by deploying Motion Qformer on each vehicle and simultaneously inputting the inference-generated query into the MLP for autoregressive inference.

Stars: 1

ECLAIR: A High-Fidelity Aerial LiDAR Dataset for Semantic Segmentation

sharpershape/eclair-dataset 16 Apr 2024

We introduce ECLAIR (Extended Classification of Lidar for AI Recognition), a new outdoor large-scale aerial LiDAR dataset designed specifically for advancing research in point cloud semantic segmentation.

Stars: 8

Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation

faceonlive/ai-research 5 Apr 2024

In this work, we introduce Sigma, a Siamese Mamba network for multi-modal semantic segmentation, utilizing the Selective Structured State Space Model, Mamba.

Stars: 144

VSRD: Instance-Aware Volumetric Silhouette Rendering for Weakly Supervised 3D Object Detection

skmhrk1209/VSRD 2 Apr 2024

In the auto-labeling stage, we represent the surface of each instance as a signed distance field (SDF) and render its silhouette as an instance mask through our proposed instance-aware volumetric silhouette rendering.

Stars: 18

Improving Visual Recognition with Hyperbolical Visual Hierarchy Mapping

kwonjunn01/hi-mapper 1 Apr 2024

Visual scenes are naturally organized in a hierarchy, where a coarse semantic concept is recursively composed of several finer details.

Stars: 5

GOV-NeSF: Generalizable Open-Vocabulary Neural Semantic Fields

wangys16/gov-nesf 1 Apr 2024

Recent advancements in vision-language foundation models have significantly enhanced open-vocabulary 3D scene understanding.

Stars: 2

Object Pose Estimation via the Aggregation of Diffusion Features

tianfu18/diff-feats-pose 27 Mar 2024

To achieve this, we propose three distinct architectures that can effectively capture and aggregate diffusion features of different granularity, greatly improving the generalizability of object pose estimation.

Stars: 10

Calib3D: Calibrating Model Preferences for Reliable 3D Scene Understanding

ldkong1205/calib3d 25 Mar 2024

Safety-critical 3D scene understanding tasks necessitate not only accurate but also confident predictions from 3D perception models.

Stars: 40
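A model is well calibrated when its confidence matches how often it is actually right. One widely used way to quantify miscalibration is the expected calibration error (ECE): predictions are grouped into confidence bins, and the gap between accuracy and mean confidence in each bin is averaged, weighted by bin size. The sketch below is a generic Python implementation with equal-width bins, offered only as an illustration of the concept. It is not taken from the Calib3D repository, and the function name and the default of 10 bins are assumptions.

```python
from typing import List, Sequence, Tuple


def expected_calibration_error(
    confidences: Sequence[float], correct: Sequence[bool], n_bins: int = 10
) -> float:
    """Equal-width-bin ECE (hypothetical helper, not from Calib3D).

    ECE = sum over bins of (bin_size / N) * |accuracy(bin) - mean_confidence(bin)|
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    bins: List[List[Tuple[float, bool]]] = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Map a confidence in [0, 1] to a bin; conf == 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for b in bins:
        if not b:
            continue  # empty bins contribute nothing
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        mean_conf = sum(c for c, _ in b) / len(b)
        ece += (len(b) / n) * abs(accuracy - mean_conf)
    return ece
```

For example, a model that reports 0.95 confidence on four predictions but gets only two of them right has an ECE of about 0.45, signalling strong overconfidence.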

DOCTR: Disentangled Object-Centric Transformer for Point Scene Understanding

saitpublic/doctr 25 Mar 2024

In this work, we propose a novel Disentangled Object-Centric TRansformer (DOCTR) that explores object-centric representation to facilitate learning with multiple objects for the multiple sub-tasks in a unified manner.

Stars: 1

Volumetric Environment Representation for Vision-Language Navigation

defaultrui/vln-ver 21 Mar 2024

To achieve a comprehensive 3D representation with fine-grained details, we introduce a Volumetric Environment Representation (VER), which voxelizes the physical world into structured 3D cells.

Stars: 6
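The basic geometric step behind "voxelizing the physical world into structured 3D cells" is assigning each 3D point to the cubic cell that contains it. The Python sketch below shows only that step for a point set, plus a simple per-cell summary (the centroid). It does not reproduce VER's pipeline, which builds learned representations rather than raw point groups; the function names, the grid origin, and the voxel size are hypothetical choices for illustration.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Point = Tuple[float, float, float]
VoxelIndex = Tuple[int, int, int]


def voxelize(
    points: Iterable[Point], voxel_size: float, origin: Point = (0.0, 0.0, 0.0)
) -> Dict[VoxelIndex, List[Point]]:
    """Group 3D points into cubic cells of edge length voxel_size (hypothetical helper)."""
    if voxel_size <= 0:
        raise ValueError("voxel_size must be positive")
    grid: Dict[VoxelIndex, List[Point]] = defaultdict(list)
    for p in points:
        # math.floor (not int truncation) so negative coordinates fall in the correct cell.
        idx = (
            math.floor((p[0] - origin[0]) / voxel_size),
            math.floor((p[1] - origin[1]) / voxel_size),
            math.floor((p[2] - origin[2]) / voxel_size),
        )
        grid[idx].append(p)
    return dict(grid)


def voxel_centroids(grid: Dict[VoxelIndex, List[Point]]) -> Dict[VoxelIndex, Point]:
    """Summarize each occupied cell by the mean of its points (an assumed, simple feature)."""
    return {
        idx: (
            sum(p[0] for p in pts) / len(pts),
            sum(p[1] for p in pts) / len(pts),
            sum(p[2] for p in pts) / len(pts),
        )
        for idx, pts in grid.items()
    }
```

Storing only occupied cells in a dictionary keeps memory proportional to the number of non-empty voxels rather than to the full bounding volume, which is a common choice for sparse outdoor or indoor scenes.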