Vision Research

Volume 91, 18 October 2013, Pages 62-77

What stands out in a scene? A study of human explicit saliency judgment

https://doi.org/10.1016/j.visres.2013.07.016

Open archive, under an Elsevier user license.

Highlights

  • Distinguishing saliency from interest, importance, memory, or object labeling.

  • Hypothesis: low-level image features influence human explicit saliency judgment.

  • Showing that people look at and choose the same locations or objects as salient.

  • Showing that the second saccade correlates most strongly with the selected object.

  • Assessing two types of saliency models and providing challenging new data for models.

Abstract

Eye tracking has become the de facto standard measure of visual attention in tasks that range from free viewing to complex daily activities. In particular, saliency models are often evaluated by their ability to predict human gaze patterns. However, fixations are not only influenced by bottom-up saliency (computed by the models), but also by many top-down factors. Thus, comparing bottom-up saliency maps to eye fixations is challenging and has required minimizing top-down influences, for example by focusing on early fixations on a stimulus. Here we propose two complementary procedures to evaluate visual saliency. We ask whether humans have explicit, conscious access to the saliency computations believed to contribute to guiding attention and eye movements. In the first experiment, 70 observers were asked to choose which object stands out the most based on its low-level features in 100 images, each containing only two objects. Using several state-of-the-art bottom-up visual saliency models that measure local and global spatial image outliers, we show that the maximum saliency inside the selected object is significantly higher than inside the non-selected object and the background. Spatial outliers are thus a predictor of human judgments, and the performance of this predictor is boosted by including object size as an additional feature. In the second experiment, observers were asked to draw a polygon circumscribing the most salient object in cluttered scenes. For each of 120 images, we show that a map built from the annotations of 70 observers explains the eye fixations of another 20 observers freely viewing the images, significantly above chance (dataset by Bruce and Tsotsos (2009); shuffled AUC score 0.62 ± 0.07, chance 0.50, t-test p < 0.05). We conclude that fixations agree with saliency judgments, and that classic bottom-up saliency models explain both. We further find that computational models specifically designed for fixation prediction slightly outperform models designed for salient object detection over both types of data (i.e., fixations and objects).
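The shuffled AUC figure quoted above can be reproduced with a short computation. The sketch below, in Python with NumPy/SciPy, is an illustration under stated assumptions rather than the authors' implementation: the function name shuffled_auc and the toy inputs are hypothetical. Positives are the map's values at the image's own fixations; negatives are its values at fixation locations borrowed from other images, which discounts the center bias shared by human gaze.

    # Minimal sketch of the shuffled AUC metric (illustrative, not the paper's code).
    import numpy as np
    from scipy.stats import rankdata

    def shuffled_auc(sal_map, fixations, other_fixations):
        """sal_map: 2-D saliency (or annotation) map.
        fixations: (N, 2) array of (row, col) fixations on this image.
        other_fixations: (M, 2) array of fixations pooled from other images."""
        pos = sal_map[fixations[:, 0], fixations[:, 1]]              # scores at true fixations
        neg = sal_map[other_fixations[:, 0], other_fixations[:, 1]]  # scores at shuffled fixations
        ranks = rankdata(np.concatenate([pos, neg]))                 # average ranks handle ties
        n_pos, n_neg = len(pos), len(neg)
        # Mann-Whitney U of the positives, normalized to [0, 1], equals the ROC AUC.
        u = ranks[:n_pos].sum() - n_pos * (n_pos + 1) / 2
        return u / (n_pos * n_neg)

    # Toy usage: a random map should score near chance (0.50).
    rng = np.random.default_rng(0)
    sal = rng.random((480, 640))
    fix = rng.integers(0, [480, 640], size=(20, 2))
    other = rng.integers(0, [480, 640], size=(200, 2))
    print(shuffled_auc(sal, fix, other))

Because AUC is invariant to monotone rescaling, the map needs no normalization; 0.50 indicates chance, while a score such as the 0.62 reported above indicates that fixations land on higher-valued map locations than shuffled fixations do.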

Keywords

Explicit saliency judgment
Space-based attention
Eye movements
Bottom-up saliency
Free viewing
Object-based attention
