What Does Stable Diffusion Know about the 3D Scene?

10 Oct 2023 · Guanqi Zhan, Chuanxia Zheng, Weidi Xie, Andrew Zisserman

Recent advances in generative models such as Stable Diffusion enable the generation of highly photo-realistic images. Our objective in this paper is to probe the diffusion network to determine to what extent it 'understands' different properties of the 3D scene depicted in an image. To this end, we make the following contributions: (i) We introduce a protocol to evaluate whether features of an off-the-shelf diffusion model encode a number of physical 'properties' of the 3D scene, by training discriminative classifiers on the features for these properties. The probes are applied to datasets of real images annotated for each property. (ii) We apply this protocol to properties covering scene geometry, scene material, support relations, lighting, and view-dependent measures. (iii) We find that features from Stable Diffusion support discriminative learning of a number of properties, including scene geometry, support relations, shadows and depth, but are less performant for occlusion and material. (iv) We also apply the probes to other networks trained at large scale, including DINO, CLIP and VQGAN, and find that DINOv2 performs comparably to Stable Diffusion, while outperforming DINOv1, CLIP and VQGAN.
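To make the protocol concrete, below is a minimal sketch of one way such a probe could be set up, assuming the Hugging Face diffusers library. The checkpoint, the choice of UNet block (`up_blocks[1]`), the noise timestep (`t=100`), the spatial mean-pooling, and the logistic-regression probe are illustrative assumptions, not the paper's exact configuration.

```python
import torch
from diffusers import StableDiffusionPipeline
from sklearn.linear_model import LogisticRegression

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float32
).to(device)
vae, unet, scheduler = pipe.vae, pipe.unet, pipe.scheduler

# Capture an intermediate UNet activation via a forward hook.
features = {}

def hook(module, args, output):
    features["feat"] = output

# Hypothetical layer choice: an intermediate decoder (up) block.
unet.up_blocks[1].register_forward_hook(hook)

@torch.no_grad()
def extract_feature(image, t=100):
    """image: (1, 3, H, W) tensor in [-1, 1]; t: assumed noise timestep."""
    latents = vae.encode(image.to(device)).latent_dist.mean
    latents = latents * vae.config.scaling_factor
    timestep = torch.tensor([t], device=device)
    noisy = scheduler.add_noise(latents, torch.randn_like(latents), timestep)
    # Unconditional (empty-prompt) text embedding for the UNet pass.
    ids = pipe.tokenizer(
        "", padding="max_length",
        max_length=pipe.tokenizer.model_max_length, return_tensors="pt",
    ).input_ids.to(device)
    text_emb = pipe.text_encoder(ids)[0]
    unet(noisy, timestep, encoder_hidden_states=text_emb)
    # Pool the hooked (B, C, H, W) activation into one vector per image.
    return features["feat"].mean(dim=(2, 3)).squeeze(0).cpu().numpy()

# X_train, y_train, X_test, y_test: features and binary property labels
# (e.g. "is this region in shadow?") from an annotated real-image dataset.
# probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
# print("probe accuracy:", probe.score(X_test, y_test))
```

A real evaluation would likely sweep over several layers and timesteps and keep the best combination per property, since different properties may be best encoded at different depths of the network.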


Datasets

Introduced in the Paper: VGG Physical Property

Used in the Paper: ImageNet, ScanNet, NYUv2, SOBA
