Scene Understanding
513 papers with code • 3 benchmarks • 43 datasets
Scene Understanding is the task of perceiving and interpreting the contents of a scene. For instance, the iPhone includes an accessibility feature that helps visually impaired users take photos by describing what the camera sees; this is an example of Scene Understanding in practice.
Benchmarks
These leaderboards are used to track progress in Scene Understanding
Libraries
Use these libraries to find Scene Understanding models and implementations
Datasets
Subtasks
Latest papers
AccidentBlip2: Accident Detection With Multi-View MotionBlip2
We also extend our approach to a multi-vehicle cooperative system by deploying Motion Qformer on each vehicle and simultaneously inputting the inference-generated query into the MLP for autoregressive inference.
ECLAIR: A High-Fidelity Aerial LiDAR Dataset for Semantic Segmentation
We introduce ECLAIR (Extended Classification of Lidar for AI Recognition), a new outdoor large-scale aerial LiDAR dataset designed specifically for advancing research in point cloud semantic segmentation.
Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation
In this work, we introduce Sigma, a Siamese Mamba network for multi-modal semantic segmentation, utilizing the Selective Structured State Space Model, Mamba.
VSRD: Instance-Aware Volumetric Silhouette Rendering for Weakly Supervised 3D Object Detection
In the auto-labeling stage, we represent the surface of each instance as a signed distance field (SDF) and render its silhouette as an instance mask through our proposed instance-aware volumetric silhouette rendering.
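To make the signed distance field (SDF) idea concrete, here is a minimal, illustrative sketch (not the paper's implementation) using a hypothetical unit-sphere instance: the SDF is negative inside the surface, and an orthographic silhouette mask marks each pixel whose viewing ray passes through a point with non-positive SDF.

```python
import math

def sphere_sdf(x, y, z, radius=1.0):
    """Signed distance to a sphere centered at the origin:
    negative inside the surface, positive outside."""
    return math.sqrt(x * x + y * y + z * z) - radius

def silhouette_mask(width, height, radius=1.0, extent=2.0):
    """Orthographic silhouette: a pixel belongs to the mask if any
    sample along its viewing ray (the z axis here) has SDF <= 0."""
    mask = []
    for j in range(height):
        y = -extent + 2 * extent * (j + 0.5) / height
        row = []
        for i in range(width):
            x = -extent + 2 * extent * (i + 0.5) / width
            inside = any(
                sphere_sdf(x, y, -extent + 2 * extent * k / 63, radius) <= 0
                for k in range(64)
            )
            row.append(1 if inside else 0)
        mask.append(row)
    return mask
```

The actual method renders per-instance silhouettes differentiably for auto-labeling; this sketch only shows how an SDF sign test yields a binary mask.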
Improving Visual Recognition with Hyperbolical Visual Hierarchy Mapping
Visual scenes are naturally organized in a hierarchy, where a coarse semantic is recursively comprised of several fine details.
GOV-NeSF: Generalizable Open-Vocabulary Neural Semantic Fields
Recent advancements in vision-language foundation models have significantly enhanced open-vocabulary 3D scene understanding.
Object Pose Estimation via the Aggregation of Diffusion Features
To achieve this, we propose three distinct architectures that can effectively capture and aggregate diffusion features of different granularity, greatly improving the generalizability of object pose estimation.
Calib3D: Calibrating Model Preferences for Reliable 3D Scene Understanding
Safety-critical 3D scene understanding tasks necessitate not only accurate but also confident predictions from 3D perception models.
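Calibration here means that a model's confidence should match its empirical accuracy. A standard way to quantify miscalibration (not specific to Calib3D) is the Expected Calibration Error, sketched below in plain Python:

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Bin predictions by confidence, then compare each bin's average
    confidence to its empirical accuracy; the ECE is the gap weighted
    by the fraction of samples in each bin."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / n) * abs(avg_conf - accuracy)
    return ece
```

A model predicting 80% confidence and being right 80% of the time scores an ECE of zero; an overconfident model scores higher.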
DOCTR: Disentangled Object-Centric Transformer for Point Scene Understanding
In this work, we propose a novel Disentangled Object-Centric TRansformer (DOCTR) that explores object-centric representation to facilitate learning with multiple objects for the multiple sub-tasks in a unified manner.
Volumetric Environment Representation for Vision-Language Navigation
To achieve a comprehensive 3D representation with fine-grained details, we introduce a Volumetric Environment Representation (VER), which voxelizes the physical world into structured 3D cells.
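The core voxelization step, turning an unstructured point set into structured 3D cells, can be sketched as follows (an illustrative stand-in, not the VER implementation): each point is assigned to the cell indexed by the floor of its coordinates divided by the cell size.

```python
import math

def voxelize(points, cell_size=0.5):
    """Group 3D points into structured voxel cells keyed by integer
    (i, j, k) indices; each cell collects the points that fall in it."""
    cells = {}
    for x, y, z in points:
        key = (math.floor(x / cell_size),
               math.floor(y / cell_size),
               math.floor(z / cell_size))
        cells.setdefault(key, []).append((x, y, z))
    return cells
```

Per-cell features (occupancy, mean color, learned embeddings) can then be aggregated over each cell's point list.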