Foraging with the Eyes: Dynamics in Human Visual Gaze and Deep Predictive Modeling
Panchagnula
Animals often forage via Lévy walks, stochastic trajectories with heavy-tailed step lengths optimized for sparse resource environments. We show that human visual gaze follows similar dynamics when scanning images. While traditional models emphasize image-based saliency, the underlying spatiotemporal statistics of eye movements remain underexplored. Understanding these dynamics has broad applications in attention modeling and vision-based interfaces. In this study, we conducted a large-scale human subject experiment involving 40 participants viewing 50 diverse images under unconstrained conditions, recording over 4 million gaze points using a high-speed eye tracker. Analysis of these data shows that the gaze trajectory of the human eye also follows a Lévy walk akin to animal foraging. This suggests that the human eye forages for visual information in an optimally efficient manner. Further, we trained a convolutional neural network (CNN) to predict fixation heatmaps from image input alone. The model accurately reproduced salient fixation regions across novel images, demonstrating that key components of gaze behavior are learnable from visual structure alone. Our findings present new evidence that human visual exploration obeys statistical laws analogous to natural foraging and open avenues for modeling gaze through generative and predictive frameworks.
academic
This study reports that human gaze trajectories follow Lévy walk patterns similar to those seen in animal foraging: random trajectories with heavy-tailed step-length distributions, which are optimal for searching sparse-resource environments. In a large-scale experiment, 40 participants viewed 50 different images while an eye tracker recorded over 4 million gaze points. The authors interpret the Lévy-like statistics of these trajectories as evidence that the eye forages for visual information with optimal efficiency. They also trained a convolutional neural network to predict gaze heatmaps from the image alone. The model accurately reproduced salient gaze regions, suggesting that key components of gaze behavior can be learned solely from visual structure.
Traditional visual attention models primarily focus on image-based saliency prediction, treating gaze behavior as a static prediction problem while ignoring the spatiotemporal dynamics of eye movements. Existing research exhibits the following limitations:
Missing Temporal Information: Most models collapse gaze point sequences into static heatmaps, discarding the order and timing of the gaze samples.
Short Exposure Bias: Standard 2-3 second free-viewing protocols favor early saliency-driven fixations and undersample exploratory gaze behavior.
Lack of Statistical Physics Perspective: Existing work overlooks the statistical regularities and optimization principles that eye movements may follow.
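The first limitation above can be made concrete with a small sketch. It assumes gaze samples are (x, y) pixel coordinates and uses a hypothetical helper, gaze_heatmap, with an arbitrary 32-pixel grid cell; none of these names or values come from the paper. Two toy scanpaths that visit the same locations in a different order produce identical heatmaps, even though their step lengths differ.

```python
import math
from collections import Counter

def gaze_heatmap(points, width, height, cell=32):
    """Count gaze samples per grid cell; the order of `points` is discarded."""
    counts = Counter()
    for x, y in points:
        if 0 <= x < width and 0 <= y < height:
            counts[(int(y // cell), int(x // cell))] += 1
    return counts

def step_lengths(points):
    """Euclidean distance between consecutive gaze samples."""
    return [math.dist(p, q) for p, q in zip(points, points[1:])]

# Two toy scanpaths (illustrative values only) over the same four locations.
scanpath_a = [(10, 10), (300, 200), (12, 14), (310, 190)]  # alternates regions
scanpath_b = [(10, 10), (12, 14), (300, 200), (310, 190)]  # one long jump

heat_a = gaze_heatmap(scanpath_a, width=640, height=480)
heat_b = gaze_heatmap(scanpath_b, width=640, height=480)

same_heatmap = heat_a == heat_b          # the static view cannot tell them apart
steps_a = step_lengths(scanpath_a)       # three long jumps
steps_b = step_lengths(scanpath_b)       # two short steps and one long jump
```

Any statistic built on step lengths, such as the step-length distribution, therefore needs the ordered sequence and cannot be recovered from the heatmap alone.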
Work in movement ecology and statistical physics has found that both human movement patterns and animal foraging behavior exhibit Lévy walk characteristics, with power-law step-length distributions. This prompted the authors to ask whether visual exploration follows similar statistical regularities.
First Confirmation that Human Gaze Trajectories Follow Lévy Walk Patterns: Analysis of the large-scale eye-tracking data finds that the step-length distributions for individual images exhibit power-law decay with exponents in the range 1 < μ ≤ 3
Construction of Large-Scale, High-Quality Eye-Tracking Dataset: 40 subjects × 50 images × 30-second viewing time, totaling over 4 million gaze points
Proposed MobileNetV2-Based Gaze Prediction Model: Capable of accurately predicting gaze heatmaps with robust performance across various image types
Revealed Optimization Principles of Visual Information Foraging: The results suggest that the human eye employs an optimal foraging strategy when searching for visual information
Discovered Correlation Between Image Entropy and Lévy Parameters: High-entropy images tend to produce larger values of the step-length exponent μ
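To illustrate what a power-law exponent in the range 1 < μ ≤ 3 means in practice, here is a minimal sketch of one common way to estimate μ from step lengths: the continuous maximum-likelihood estimator μ = 1 + n / Σ ln(ℓ_i / ℓ_min). This is an assumption for illustration, not necessarily the fitting procedure used in the paper; the function names, the cutoff l_min, and the synthetic data are all hypothetical.

```python
import math
import random

def step_lengths(points):
    """Euclidean distance between consecutive (x, y) gaze samples."""
    return [math.dist(p, q) for p, q in zip(points, points[1:])]

def fit_power_law_exponent(lengths, l_min):
    """Continuous power-law MLE for P(l) ~ l^(-mu), using steps l >= l_min.

    mu_hat = 1 + n / sum(ln(l_i / l_min)); l_min is an assumed tail cutoff.
    """
    tail = [l for l in lengths if l >= l_min]
    log_sum = sum(math.log(l / l_min) for l in tail)
    if not tail or log_sum == 0.0:
        raise ValueError("need steps strictly above l_min to estimate mu")
    return 1.0 + len(tail) / log_sum

def sample_power_law(mu, l_min, n, rng):
    """Draw n step lengths from P(l) ~ l^(-mu), l >= l_min (inverse transform)."""
    return [l_min * (1.0 - rng.random()) ** (-1.0 / (mu - 1.0)) for _ in range(n)]

# Synthetic check: steps drawn with a known exponent should be recovered.
rng = random.Random(0)
synthetic_steps = sample_power_law(mu=2.0, l_min=1.0, n=20000, rng=rng)
mu_hat = fit_power_law_exponent(synthetic_steps, l_min=1.0)
is_levy_range = 1.0 < mu_hat <= 3.0  # the range reported for individual images
```

For real recordings, step_lengths would be applied to each image's ordered gaze samples before fitting. The estimate is sensitive to the choice of l_min, so a careful analysis would also compare the power-law fit against alternatives such as an exponential distribution.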
The analysis finds a weak positive correlation between image entropy and the exponent μ, with high-entropy images tending to produce larger step lengths, possibly because their information is more widely distributed.
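The summary does not say how image entropy was computed. A minimal sketch, assuming Shannon entropy of the grayscale intensity histogram and a Pearson correlation across images, is shown here; the helper names and the toy pixel lists are hypothetical and carry no data from the paper.

```python
import math
from collections import Counter

def shannon_entropy(pixels):
    """Entropy in bits of the intensity histogram of a flat list of pixels."""
    n = len(pixels)
    counts = Counter(pixels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Toy images (illustrative only): uniform, two-level, and four-level intensities.
flat_image = [128] * 16
two_level_image = [0, 255] * 8
four_level_image = [0, 85, 170, 255] * 4

entropies = [shannon_entropy(img)
             for img in (flat_image, two_level_image, four_level_image)]

# With one fitted exponent per image collected in a list `mus`,
# the correlation would be computed as: r = pearson_r(entropies, mus)
```

A histogram entropy of this kind ignores spatial layout, so two images with the same intensity distribution but different structure receive the same value. Whether the paper uses this definition or a spatially aware one is not stated here.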
Despite good heatmap prediction performance, the model cannot capture the heavy-tailed jump features observed in human data, highlighting limitations of current saliency learning frameworks.
The work is presented as the first systematic application of Lévy walk theory to human visual attention modeling, bridging the gap between static saliency models and dynamic gaze behavior.
The paper cites 13 important references, covering:
Classical attention models: Judd et al. (2009), Xu et al. (2014)
Lévy walk theory: Viswanathan et al. (1996, 2000, 2008)
Human movement patterns: Brockmann et al. (2006)
Eye movement physiology: Martinez-Conde et al. (2013)
Information theory foundations: Attneave (1954), Wu et al. (2013)
Evaluation metrics: Bylinskii et al. (2018)
Overall Assessment: This interdisciplinary paper has both theoretical and practical value. By bringing biological foraging theory into visual attention modeling, it offers the field a new research perspective. Despite limitations in temporal modeling and in the analysis of individual differences, its statistical findings and modeling framework lay a foundation for future research. The experimental design and data analysis lend credibility to its conclusions, and the work has application prospects in both academia and industry.