Despite recent advancements in latent diffusion models that generate high-dimensional image data and perform various downstream tasks,
there has been little exploration of perceptual consistency within these models for the task of No-Reference Image Quality Assessment (NR-IQA). In this paper, we hypothesize that latent diffusion models implicitly exhibit perceptually consistent local regions within the data manifold. We leverage this insight to guide on-manifold
sampling using perceptual features and input measurements. Specifically, we
propose Perceptual Manifold Guidance (PMG), an algorithm that uses pretrained latent diffusion models and perceptual quality metrics to obtain perceptually consistent multi-scale, multi-timestep feature maps (hyperfeatures) from the denoising U-Net. We empirically demonstrate that these hyperfeatures exhibit high
correlation with human perception in IQA tasks. Our method can be applied
to any existing pretrained latent diffusion model and is straightforward to integrate. To the best of our knowledge, this paper is the first work to explore
Perceptual Consistency in Diffusion Models (PCDM) and apply it to the NR-IQA problem in a zero-shot setting. Extensive experiments on IQA datasets
show that our method, PCDM, achieves state-of-the-art performance, underscoring the superior zero-shot generalization capabilities of diffusion models for NR-IQA tasks.
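Although the abstract only names the ingredients, the following minimal sketch (not the authors' released implementation) illustrates one way multi-scale, multi-timestep feature maps could be collected from a pretrained latent diffusion U-Net using Hugging Face diffusers; the checkpoint name, the choice of hooked up-blocks, the noising timesteps, and the zero-filled text context are all illustrative assumptions.

```python
import torch
from diffusers import AutoencoderKL, UNet2DConditionModel, DDPMScheduler

model_id = "runwayml/stable-diffusion-v1-5"  # assumed checkpoint; any latent diffusion model works
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae")
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")
scheduler = DDPMScheduler.from_pretrained(model_id, subfolder="scheduler")

features = []

def save_feature(module, inputs, output):
    # Keep each decoder (up) block's output as one scale of the hyperfeature.
    features.append(output[0] if isinstance(output, tuple) else output)

handles = [blk.register_forward_hook(save_feature) for blk in unet.up_blocks]

@torch.no_grad()
def hyperfeatures(image, timesteps=(50, 200, 500)):
    """image: (B, 3, H, W) tensor scaled to [-1, 1].
    Returns a list of U-Net feature maps across scales and timesteps."""
    latents = vae.encode(image).latent_dist.mean * vae.config.scaling_factor
    # Placeholder context: a full pipeline would use CLIP embeddings of an empty prompt.
    context = torch.zeros(latents.shape[0], 77, unet.config.cross_attention_dim)
    collected = []
    for t in timesteps:
        features.clear()
        t_batch = torch.full((latents.shape[0],), t, dtype=torch.long)
        noise = torch.randn_like(latents)
        noisy = scheduler.add_noise(latents, noise, t_batch)
        unet(noisy, t_batch, encoder_hidden_states=context)  # hooks fill `features`
        collected.extend(f.clone() for f in features)
    return collected

# Example: features for a random 512x512 image; a perceptual quality score
# would then be derived from (or regressed against) these maps.
maps = hyperfeatures(torch.randn(1, 3, 512, 512).clamp(-1, 1))
```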