Scene Understanding

510 papers with code • 3 benchmarks • 43 datasets

Scene Understanding is the task of interpreting what a scene contains. For instance, the iPhone has a feature that helps visually impaired users take a photo by describing what the camera sees. This is an example of Scene Understanding.

Benchmarks

These leaderboards track progress in Scene Understanding.

Dataset	Best Model
ADE20K val	CPN (ResNet-101)
Semantic Scene Understanding Challenge (passive actuation & ground-truth localisation)	ACRV Baseline
Semantic Scene Understanding Challenge (active actuation & ground-truth localisation)	ACRV Baseline

Libraries

Use these libraries to find Scene Understanding models and implementations.

Library	Papers	GitHub stars
osmr/imgclsmob	4	2,916
Pointcept/Pointcept	4	1,097
PaddlePaddle/PaddleDetection	2	12,012
open-mmlab/mmdetection3d	2	4,766

See all 5 libraries.

Datasets

Subtasks

Road Scene Understanding

Monocular Cross-View Road Scene Parsing (Road)

Outdoor Light Source Estimation

Latest papers with no code

Depth Estimation using Weighted-loss and Transfer Learning

no code yet • 11 Apr 2024

The optimized loss function is a weighted combination of losses that enhances robustness and generalization: Mean Absolute Error (MAE), Edge Loss, and Structural Similarity Index (SSIM).
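
The listing gives only the three loss terms, not the paper's exact formulation, so the following is a minimal NumPy sketch of how such a weighted combination might look. The function names, the weights `w_mae`, `w_edge`, `w_ssim`, the gradient-based edge term, and the single-window (global) SSIM are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def mae_loss(pred, target):
    """Mean absolute error between two depth maps."""
    return float(np.mean(np.abs(pred - target)))


def edge_loss(pred, target):
    """Mean absolute difference of image gradients (assumed edge term)."""
    dy_p, dx_p = np.gradient(pred)
    dy_t, dx_t = np.gradient(target)
    return float(np.mean(np.abs(dx_p - dx_t) + np.abs(dy_p - dy_t)))


def ssim_loss(pred, target, data_range=1.0):
    """1 - SSIM, using one global window instead of local windows (simplification)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_p, mu_t = pred.mean(), target.mean()
    var_p, var_t = pred.var(), target.var()
    cov = np.mean((pred - mu_p) * (target - mu_t))
    ssim = ((2 * mu_p * mu_t + c1) * (2 * cov + c2)) / (
        (mu_p ** 2 + mu_t ** 2 + c1) * (var_p + var_t + c2)
    )
    return float(1.0 - ssim)


def combined_loss(pred, target, w_mae=1.0, w_edge=1.0, w_ssim=1.0):
    """Weighted sum of the three terms; the weights here are hypothetical."""
    return (
        w_mae * mae_loss(pred, target)
        + w_edge * edge_loss(pred, target)
        + w_ssim * ssim_loss(pred, target)
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    target = rng.random((8, 8))          # stand-in for a ground-truth depth map
    pred = target + 0.1                  # prediction with a constant offset
    print(combined_loss(pred, target))
```

A constant offset between prediction and target leaves the edge term near zero while the MAE and SSIM terms still penalise it, which illustrates why the terms might be combined rather than used alone.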

Mitigating Object Dependencies: Improving Point Cloud Self-Supervised Learning through Object Exchange

no code yet • 11 Apr 2024

Subsequently, we introduce a context-aware feature learning strategy, which encodes object patterns without relying on their specific context by aggregating object features across various scenes.

Gaga: Group Any Gaussians via 3D-aware Memory Bank

no code yet • 11 Apr 2024

We introduce Gaga, a framework that reconstructs and segments open-world 3D scenes by leveraging inconsistent 2D masks predicted by zero-shot segmentation models.

O2V-Mapping: Online Open-Vocabulary Mapping with Neural Implicit Representation

no code yet • 10 Apr 2024

Online construction of open-ended language scenes is crucial for robotic applications, where open-vocabulary interactive scene understanding is required.

Incorporating Explanations into Human-Machine Interfaces for Trust and Situation Awareness in Autonomous Vehicles

no code yet • 10 Apr 2024

In this sense, explainability of real-time decisions is a crucial and natural requirement for building trust in autonomous vehicles.

QueSTMaps: Queryable Semantic Topological Maps for 3D Scene Understanding

no code yet • 9 Apr 2024

Understanding the structural organisation of 3D indoor scenes in terms of rooms is often accomplished via floorplan extraction.

DaF-BEVSeg: Distortion-aware Fisheye Camera based Bird's Eye View Segmentation with Occlusion Reasoning

no code yet • 9 Apr 2024

We implement a baseline by applying cylindrical rectification on the fisheye images and using a standard LSS-based BEV segmentation model.

Panoptic Perception: A Novel Task and Fine-grained Dataset for Universal Remote Sensing Image Interpretation

no code yet • 6 Apr 2024

Experimental results on FineGrip demonstrate the feasibility of the panoptic perception task and the beneficial effect of multi-task joint optimization on individual tasks.

You Only Scan Once: A Dynamic Scene Reconstruction Pipeline for 6-DoF Robotic Grasping of Novel Objects

no code yet • 4 Apr 2024

In the realm of robotic grasping, achieving accurate and reliable interactions with the environment is a pivotal challenge.

NeRF-MAE : Masked AutoEncoders for Self Supervised 3D representation Learning for Neural Radiance Fields

no code yet • 1 Apr 2024

Given the capabilities of neural fields in densely representing a 3D scene from 2D images, we ask the question: can we scale their self-supervised pretraining, specifically using masked autoencoders, to generate effective 3D representations from posed RGB images?
