125 papers with code • 8 benchmarks • 13 datasets
Monocular depth estimation is the task of estimating scene depth from a single image.
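As a minimal sketch of how such models are commonly evaluated (the page does not fix a protocol; the function name and metric choice are assumptions), the standard AbsRel, RMSE, and δ < 1.25 accuracy metrics over valid ground-truth pixels can be computed as:

```python
import numpy as np

def depth_metrics(pred, gt):
    """Common monocular-depth metrics: AbsRel, RMSE, and delta < 1.25 accuracy."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    mask = gt > 0                      # evaluate only pixels with valid ground truth
    pred, gt = pred[mask], gt[mask]
    abs_rel = np.mean(np.abs(pred - gt) / gt)
    rmse = np.sqrt(np.mean((pred - gt) ** 2))
    ratio = np.maximum(pred / gt, gt / pred)
    delta1 = np.mean(ratio < 1.25)     # fraction of pixels within 25% of ground truth
    return {"abs_rel": abs_rel, "rmse": rmse, "delta1": delta1}
```

A perfect prediction yields AbsRel = 0, RMSE = 0, and δ < 1.25 accuracy of 1.0.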
Using our approach, existing monocular depth estimation techniques can be applied effectively to dual-pixel data, and much smaller models can be built that still infer high-quality depth.
Per-pixel ground-truth depth data is challenging to acquire at scale.
Accurate depth estimation from images is a fundamental task in many applications including scene understanding and reconstruction.
Ranked #8 on Monocular Depth Estimation on NYU-Depth V2
We introduce dense vision transformers, an architecture that leverages vision transformers in place of convolutional networks as a backbone for dense prediction tasks.
Ranked #1 on Monocular Depth Estimation on NYU-Depth V2 (using extra training data)
In particular, we propose a robust training objective that is invariant to changes in depth range and scale, advocate the use of principled multi-objective learning to combine data from different sources, and highlight the importance of pretraining encoders on auxiliary tasks.
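One common way to make a training objective invariant to depth range and scale (a sketch of the general idea, not necessarily the exact objective this paper uses) is to align the prediction to the ground truth with a least-squares scale and shift before measuring error:

```python
import numpy as np

def ssi_loss(pred, gt):
    """Scale-and-shift-invariant loss: fit s, t minimizing ||s*pred + t - gt||^2,
    then return the mean squared error of the aligned prediction."""
    pred = np.asarray(pred, dtype=np.float64).ravel()
    gt = np.asarray(gt, dtype=np.float64).ravel()
    A = np.stack([pred, np.ones_like(pred)], axis=1)   # design matrix [pred, 1]
    (s, t), *_ = np.linalg.lstsq(A, gt, rcond=None)    # closed-form scale and shift
    return np.mean((s * pred + t - gt) ** 2)
```

Because the alignment absorbs any affine transform of the prediction, predictions that differ only by scale and shift incur identical loss, which is what lets training mix datasets with incompatible depth conventions.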
Apart from these, several image manipulation techniques using these plugins have been compiled and demonstrated on the YouTube channel (https://youtube.com/user/kritiksoman) to illustrate use cases for machine-learning-based image modification.
Instead of using semantic labels and proxy losses in a multi-task approach, we propose a new architecture leveraging fixed pretrained semantic segmentation networks to guide self-supervised representation learning via pixel-adaptive convolutions.
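The core idea of a pixel-adaptive convolution is that fixed spatial filter weights are modulated, per pixel, by an affinity between guidance features (here, from a pretrained segmentation network) at the center and at each neighbour. A naive single-channel sketch with a Gaussian affinity (function name, padding choices, and affinity form are assumptions for illustration):

```python
import numpy as np

def pixel_adaptive_conv(x, guide, weights, sigma=1.0):
    """Naive pixel-adaptive convolution on a 2D array.

    The spatial kernel `weights` (k x k) is reweighted at every pixel by a
    Gaussian affinity between the guidance feature at the center and the
    guidance features in the neighbourhood, so filtering adapts to content."""
    H, W = x.shape
    k = weights.shape[0]
    r = k // 2
    xp = np.pad(x, r)                     # zero-pad the input
    gp = np.pad(guide, r, mode="edge")    # edge-pad the guidance features
    out = np.zeros((H, W), dtype=np.float64)
    for i in range(H):
        for j in range(W):
            patch = xp[i:i + k, j:j + k]
            gpatch = gp[i:i + k, j:j + k]
            aff = np.exp(-0.5 * ((gpatch - guide[i, j]) / sigma) ** 2)
            out[i, j] = np.sum(aff * weights * patch)
    return out
```

With a constant guidance map the affinity is 1 everywhere and the operation reduces to an ordinary convolution; real implementations vectorize this and use multi-channel learned guidance features.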
Although cameras are ubiquitous, robotic platforms typically rely on active sensors like LiDAR for direct 3D perception.