TASK	DATASET	MODEL	METRIC NAME	METRIC VALUE	GLOBAL RANK
Monocular Depth Estimation	KITTI Eigen split	EVP	absolute relative error	0.048	# 9
Monocular Depth Estimation	KITTI Eigen split	EVP	RMSE	2.015	# 13
Monocular Depth Estimation	KITTI Eigen split	EVP	Sq Rel	0.136	# 19
Monocular Depth Estimation	KITTI Eigen split	EVP	RMSE log	0.073	# 10
Monocular Depth Estimation	KITTI Eigen split	EVP	Delta < 1.25	0.980	# 8
Monocular Depth Estimation	KITTI Eigen split	EVP	Delta < 1.25^2	0.998	# 1
Monocular Depth Estimation	KITTI Eigen split	EVP	Delta < 1.25^3	1.000	# 1
Monocular Depth Estimation	NYU-Depth V2	EVP	RMSE	0.224	# 6
Monocular Depth Estimation	NYU-Depth V2	EVP	absolute relative error	0.061	# 6
Monocular Depth Estimation	NYU-Depth V2	EVP	Delta < 1.25	0.976	# 6
Monocular Depth Estimation	NYU-Depth V2	EVP	Delta < 1.25^2	0.997	# 3
Monocular Depth Estimation	NYU-Depth V2	EVP	Delta < 1.25^3	0.999	# 4
Monocular Depth Estimation	NYU-Depth V2	EVP	log 10	0.027	# 6
Depth Estimation	NYU-Depth V2	EVP	RMS	0.224	# 1
Referring Expression Segmentation	RefCOCO testA	EVP	Overall IoU	78.75	# 1
Referring Expression Segmentation	RefCOCO testB	EVP	Overall IoU	72.94	# 1
Referring Expression Segmentation	RefCoCo val	EVP	Overall IoU	76.35	# 2
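
The depth rows above use the metric names that are standard in monocular depth estimation. As a rough illustration of what each number measures, here is a minimal sketch of the usual definitions in plain Python. The function name `depth_metrics` and the dictionary keys are hypothetical, and the sketch assumes invalid pixels have already been filtered out. It does not reproduce the evaluation protocol (crops, depth caps) behind the reported values, which this page does not specify.

```python
import math


def depth_metrics(pred, gt):
    """Common depth-estimation metrics over paired positive depth values.

    pred, gt: equal-length sequences of positive depths for valid pixels.
    (Hypothetical helper for illustration; not taken from the EVP code.)
    """
    if len(pred) != len(gt) or len(gt) == 0:
        raise ValueError("pred and gt must be non-empty and of equal length")
    n = len(gt)
    pairs = list(zip(pred, gt))

    # Mean of |pred - gt| / gt ("absolute relative error").
    abs_rel = sum(abs(p - g) / g for p, g in pairs) / n
    # Mean of (pred - gt)^2 / gt ("Sq Rel").
    sq_rel = sum((p - g) ** 2 / g for p, g in pairs) / n
    # Root mean squared error in depth units ("RMSE" / "RMS").
    rmse = math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n)
    # RMSE of natural-log depths ("RMSE log").
    rmse_log = math.sqrt(
        sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n
    )
    # Mean absolute error of base-10 log depths ("log 10").
    log10 = sum(abs(math.log10(p) - math.log10(g)) for p, g in pairs) / n

    # Threshold accuracy: fraction of pixels with max(pred/gt, gt/pred)
    # below 1.25, 1.25^2 and 1.25^3 ("Delta < 1.25^k").
    ratios = [max(p / g, g / p) for p, g in pairs]
    deltas = {k: sum(r < 1.25 ** k for r in ratios) / n for k in (1, 2, 3)}

    return {
        "abs_rel": abs_rel,
        "sq_rel": sq_rel,
        "rmse": rmse,
        "rmse_log": rmse_log,
        "log10": log10,
        "delta1": deltas[1],
        "delta2": deltas[2],
        "delta3": deltas[3],
    }
```

Lower is better for the error metrics, and higher is better for the Delta accuracies, so a Delta < 1.25^3 value of 1.000 means that, to three decimals, every evaluated pixel falls within a factor of 1.25^3 of the ground truth.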

Badge	Markdown
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/evp-enhanced-visual-perception-using-inverse/depth-estimation-on-nyu-depth-v2)](https://paperswithcode.com/sota/depth-estimation-on-nyu-depth-v2?p=evp-enhanced-visual-perception-using-inverse)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/evp-enhanced-visual-perception-using-inverse/referring-expression-segmentation-on-refcoco-8)](https://paperswithcode.com/sota/referring-expression-segmentation-on-refcoco-8?p=evp-enhanced-visual-perception-using-inverse)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/evp-enhanced-visual-perception-using-inverse/referring-expression-segmentation-on-refcoco-9)](https://paperswithcode.com/sota/referring-expression-segmentation-on-refcoco-9?p=evp-enhanced-visual-perception-using-inverse)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/evp-enhanced-visual-perception-using-inverse/referring-expression-segmentation-on-refcoco-7)](https://paperswithcode.com/sota/referring-expression-segmentation-on-refcoco-7?p=evp-enhanced-visual-perception-using-inverse)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/evp-enhanced-visual-perception-using-inverse/monocular-depth-estimation-on-nyu-depth-v2)](https://paperswithcode.com/sota/monocular-depth-estimation-on-nyu-depth-v2?p=evp-enhanced-visual-perception-using-inverse)`
	`[![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/evp-enhanced-visual-perception-using-inverse/monocular-depth-estimation-on-kitti-eigen)](https://paperswithcode.com/sota/monocular-depth-estimation-on-kitti-eigen?p=evp-enhanced-visual-perception-using-inverse)`

EVP: Enhanced Visual Perception using Inverse Multi-Attentive Feature Refinement and Regularized Image-Text Alignment

13 Dec 2023 · Mykola Lavreniuk, Shariq Farooq Bhat, Matthias Müller, Peter Wonka

This work presents the network architecture EVP (Enhanced Visual Perception). EVP builds on the previous work VPD, which paved the way for using the Stable Diffusion network in computer vision tasks. We propose two major enhancements. First, we develop the Inverse Multi-Attentive Feature Refinement (IMAFR) module, which enhances feature learning capabilities by aggregating spatial information from higher pyramid levels. Second, we propose a novel image-text alignment module for improved feature extraction of the Stable Diffusion backbone. The resulting architecture is suitable for a wide variety of tasks, and we demonstrate its performance in two settings: single-image depth estimation, with a specialized decoder using classification-based bins, and referring segmentation, with an off-the-shelf decoder. Comprehensive experiments conducted on established datasets show that EVP achieves state-of-the-art results in single-image depth estimation for indoor (NYU Depth v2, 11.8% RMSE improvement over VPD) and outdoor (KITTI) environments, as well as in referring segmentation (RefCOCO, 2.53 IoU improvement over ReLA). The code and pre-trained models are publicly available at https://github.com/Lavreniuk/EVP.
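
The abstract mentions a depth decoder that uses classification-based bins but gives no further detail. As a hedged illustration of the general idea only, the sketch below shows one common formulation of bin-based depth prediction: the depth range is split into bins, a per-pixel score is predicted for each bin, and the final depth is the probability-weighted average of the bin centers. The uniform bin layout, the function names, and the softmax weighting are all assumptions made for this example and are not taken from the EVP paper or code.

```python
import math


def uniform_bin_centers(d_min, d_max, num_bins):
    """Centers of num_bins equal-width bins on [d_min, d_max] (assumed layout)."""
    width = (d_max - d_min) / num_bins
    return [d_min + width * (i + 0.5) for i in range(num_bins)]


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def depth_from_bins(logits, centers):
    """Depth for one pixel: expected bin center under the softmax distribution."""
    if len(logits) != len(centers):
        raise ValueError("need exactly one logit per bin center")
    probs = softmax(logits)
    return sum(p * c for p, c in zip(probs, centers))


# Usage: five bins over a hypothetical 0-10 m range.
centers = uniform_bin_centers(0.0, 10.0, 5)
confident_near = depth_from_bins([8.0, 0.0, 0.0, 0.0, 0.0], centers)
undecided = depth_from_bins([0.0, 0.0, 0.0, 0.0, 0.0], centers)
```

Treating depth as a soft classification over bins lets the output remain continuous while the network learns a distribution rather than a single regressed value. How EVP places its bins and combines them is described in the paper, not here.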

Code

lavreniuk/evp official

Tasks

Depth Estimation

Monocular Depth Estimation

Referring Expression Segmentation

Datasets

KITTI

NYUv2

RefCOCO

Results from the Paper

Ranked #1 on Referring Expression Segmentation on RefCOCO testB
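
The referring segmentation results on this page are reported as Overall IoU. As a minimal sketch of how that metric is usually defined in the referring segmentation literature, the code below accumulates intersection and union pixel counts over the whole evaluation set and divides once at the end, which differs from averaging per-sample IoU. The function name and the flat 0/1 mask representation are assumptions for illustration, and the sketch is not taken from the EVP evaluation code.

```python
def overall_iou(pred_masks, gt_masks):
    """Overall IoU: total intersection over total union across all samples.

    pred_masks, gt_masks: sequences of binary masks, each a flat sequence
    of 0/1 values with matching lengths (hypothetical representation).
    """
    if len(pred_masks) != len(gt_masks):
        raise ValueError("need one ground-truth mask per prediction")
    total_inter = 0
    total_union = 0
    for pred, gt in zip(pred_masks, gt_masks):
        if len(pred) != len(gt):
            raise ValueError("mask sizes must match")
        for p, g in zip(pred, gt):
            total_inter += 1 if (p and g) else 0
            total_union += 1 if (p or g) else 0
    # Define the score as 0.0 when every mask is empty.
    return total_inter / total_union if total_union else 0.0


# Usage: two toy samples with four pixels each.
preds = [[1, 1, 0, 0], [1, 0, 0, 0]]
gts = [[1, 0, 0, 0], [1, 1, 1, 0]]
score = overall_iou(preds, gts)  # 2 intersecting pixels / 5 union pixels = 0.4
```

Because counts are pooled before dividing, samples with large objects weigh more than samples with small ones. Leaderboards typically report the value as a percentage, so 0.4 here would be listed as 40.00.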

Methods

Diffusion • Linear Layer • Multi-Head Attention • Scaled Dot-Product Attention • Transformer
