Active vision as sequential decision-making under uncertainty
Author: Florian Kadner
Referees:
Prof. Constantin Rothkopf, Ph.D.
Prof. Mary M. Hayhoe, Ph.D.
Defense: 23.01.2024
Abstract:
Interacting with our visual environment can be challenging because of its highly dynamic nature and its rich, complex interrelationships. Because the human visual system has only a narrow field of high resolution, we must actively shift our attention between different regions of the visual field to acquire the information we need to accomplish our tasks. Extracting this task-relevant information from our environment is challenging, and the difficulty is amplified by the inherently probabilistic nature of our world. Sensory perception is often ambiguous: the same stimulus can yield different measurements, and different stimuli can yield the same measurement. Similarly, the consequences of our actions are usually uncertain, owing to a range of internal and external factors. Finally, the importance of completing a particular task, and even the definition of the task and its associated costs, vary considerably across individuals. Uncertainty is thus a fundamental factor at multiple stages of our interaction with the visual environment. Because sensory perception, decision-making, and action are inseparably intertwined, it is all the more critical that we deal with the resulting uncertainties and develop strategies to reduce them as far as possible. Computationally, this corresponds to the concept of planning. In this thesis, we investigate the active nature of visual planning as a probabilistic process of decision-making under uncertainty. We designed various experimental paradigms to quantify sensory uncertainty, action variability, and behavioral costs in human sequential visual tasks. For this purpose, we use the framework of Partially Observable Markov Decision Processes (POMDPs), which allows us to model decision-making normatively while incorporating different sources of uncertainty.
Using three case studies, we demonstrate the framework's application, advantages, and potential, starting with the simplest visual action: blinking. Even this simple action has to be planned, since every blink briefly interrupts the stream of visual information. We then move on to more complex visual actions such as saccades and gaze selection. First, we consider one-step-ahead predictions in the context of free viewing and saliency models, before turning to a complex gaze-contingent task in which not only the observations but also the rewards are dynamic and uncertain. Last, we present two further studies that are less tied to the experimental setting and use more naturalistic stimuli. We investigate how humans navigate mazes and which eye-movement planning strategies they use to find the solution. In addition, we designed a reading experiment with an adaptive font system that maximizes each subject's individual reading speed and thereby reduces the underlying internal behavioral costs. Taken together, our results indicate that human visual behavior should be seen as an active, sequential decision process under uncertainty, and that POMDPs provide a powerful tool for modeling it.