Inverse Reinforcement Learning for Human Decision-Making Under Uncertainty

Author: Matthias Schultheis

Referees:
Prof. Dr. techn. Heinz Koeppl
Prof. Constantin A. Rothkopf, Ph.D.

Defense: 14.04.2025

Abstract:

Human decision-making in the real world is characterized by uncertainty, continuous learning, and adaptation. Reinforcement learning and stochastic optimal control have been widely used as normative frameworks to model, reproduce, and predict human behavior. However, interpreting observed behavior requires inverse approaches to infer the underlying decision-making mechanisms. Existing inverse approaches, such as inverse reinforcement learning and inverse optimal control, commonly rely on assumptions such as full knowledge of the environment and stationary policies, which often do not hold for human behavior in real-world scenarios. This dissertation introduces novel inverse approaches for sequential decision-making that account for the adaptive and dynamic nature of human behavior arising from uncertainty. The contributions are organized into three main parts:

First, we address the problem of inferring the local knowledge of human subjects in navigation tasks. Seemingly suboptimal routes taken by humans can be explained by incomplete knowledge of the environment, offering insights into their knowledge and beliefs. We describe a Bayesian inference method for systematically inferring a subject's knowledge of the environmental structure from their navigation behavior. For efficient inference, the approach combines approximate sampling methods with a navigation model that reduces route planning to a shortest-path problem with an additional cost for uncertainty. We evaluate the approach using both simulated data and real human trajectories collected in an online experiment.
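
To make the idea concrete, the following is a minimal, self-contained sketch of this kind of inference, not the dissertation's implementation: all names (edges, observed_route, beta, unknown_penalty) and values are hypothetical, the knowledge hypotheses are enumerated rather than sampled (a real environment would require the approximate sampling mentioned above), and the route likelihood is reduced to a simple match/mismatch term.

    import heapq
    import itertools
    import math

    def plan_route(edges, known, start, goal, unknown_penalty=2.0):
        # Dijkstra over the full graph; edges the subject does not know
        # receive an extra cost, modeling a penalty for uncertainty.
        adj = {}
        for (u, v), length in edges.items():
            cost = length + (0.0 if (u, v) in known or (v, u) in known else unknown_penalty)
            adj.setdefault(u, []).append((v, cost))
            adj.setdefault(v, []).append((u, cost))
        best = {start: 0.0}
        heap = [(0.0, start, [start])]
        while heap:
            d, node, path = heapq.heappop(heap)
            if node == goal:
                return path
            if d > best.get(node, math.inf):
                continue
            for nxt, c in adj.get(node, []):
                nd = d + c
                if nd < best.get(nxt, math.inf):
                    best[nxt] = nd
                    heapq.heappush(heap, (nd, nxt, path + [nxt]))
        return []

    # Toy environment (edge -> length) and an observed, seemingly suboptimal route.
    edges = {("A", "B"): 1.0, ("B", "D"): 1.0, ("A", "C"): 1.0, ("C", "D"): 1.5}
    observed_route = ["A", "C", "D"]
    beta = 3.0  # assumed choice precision

    # Enumerate knowledge hypotheses (which edges the subject knows) and weight
    # each by how well its planned route explains the observation (uniform prior).
    weights = {}
    for r in range(len(edges) + 1):
        for known in itertools.combinations(edges, r):
            planned = plan_route(edges, set(known), "A", "D")
            mismatch = 0.0 if planned == observed_route else 1.0
            weights[known] = math.exp(-beta * mismatch)
    total = sum(weights.values())
    for known, w in sorted(weights.items(), key=lambda kv: -kv[1])[:3]:
        print(sorted(known), round(w / total, 3))

Hypotheses whose planned route reproduces the observed detour receive high posterior weight, so the seemingly suboptimal choice is explained by which parts of the environment the subject plausibly knows.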

Second, we consider the problem of inferring time-varying preferences in the form of discount functions, which arise when individuals face uncertainty about risks. Such time-varying preferences can be explained by individuals adapting their beliefs about risk over time, and they manifest as preference inconsistencies and hyperbolic discounting. We derive a normative model of hyperbolic discounting for the discrete-time setting and discuss how beliefs about risk can be inferred in a human discounting experiment. Additionally, we extend this analysis to continuous-time stochastic optimal control, for which we define a formulation with non-exponential discounting, and present an approach to infer the discount function from observed decision data.
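
As a brief illustration of the mechanism (a standard continuous-time argument in the spirit of this part, not the dissertation's discrete-time derivation): suppose a decision-maker discounts exponentially with a constant but unknown hazard rate \(\lambda\) and holds a Gamma belief with shape \(\alpha\) and rate \(\beta\) over it. Marginalizing out \(\lambda\) turns exponential survival into a generalized hyperbolic discount function:

    \[
      D(t) \;=\; \mathbb{E}_{\lambda}\!\left[e^{-\lambda t}\right]
           \;=\; \int_0^{\infty} e^{-\lambda t}\,
                 \frac{\beta^{\alpha}}{\Gamma(\alpha)}\,
                 \lambda^{\alpha-1} e^{-\beta\lambda}\,\mathrm{d}\lambda
           \;=\; \left(1 + \frac{t}{\beta}\right)^{-\alpha}.
    \]

For a known hazard rate, discounting remains exponential and time-consistent; averaging over the belief yields the hyperbolic form, and updating that belief as time passes without the risk occurring produces the preference inconsistencies described above.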

Finally, we address the problem of inferring latent quantities in sensorimotor control tasks, which can be formulated as partially observable stochastic optimal control problems. In these formulations, subjects receive only partial, noisy observations of their state and are uncertain about the future evolution of the stochastic environment. The inverse problem is particularly challenging, as the subjects' beliefs and control signals are usually latent in the observed trajectory data. For linear-quadratic-Gaussian (LQG) systems with multiplicative noise, we derive an approximate likelihood using an assumed-density approach to find the most likely parameters given the observed data. Additionally, for general non-linear stochastic systems, we introduce a linearization-based approximation to enable efficient parameter inference. The methods are evaluated on a range of simulated tasks and on animal reaching data.
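
To illustrate the likelihood-based inference at a glance, here is a deliberately simplified sketch under strong assumptions that the dissertation does not make: a scalar, fully observed system with additive noise, a subject acting as an exact LQR controller, and a single unknown parameter (the control-cost weight r, recovered by grid search over a closed-loop Gaussian likelihood). All names and values are hypothetical; the partially observed, multiplicative-noise setting requires the assumed-density and linearization-based approximations described above.

    import numpy as np

    A, B, Q, W = 1.0, 0.5, 1.0, 0.05   # dynamics, state cost, process noise variance

    def lqr_gain(r, n_iter=500):
        # Iterate the scalar discrete-time Riccati equation for cost Q*x^2 + r*u^2.
        P = Q
        for _ in range(n_iter):
            L = (B * P * A) / (r + B * P * B)
            P = Q + A * P * A - A * P * B * L
        return L

    def neg_log_likelihood(r, xs):
        # Negative log-likelihood of xs under x' = (A - B*L(r)) * x + Gaussian noise.
        L = lqr_gain(r)
        a_cl = A - B * L
        resid = xs[1:] - a_cl * xs[:-1]
        return 0.5 * np.sum(resid**2 / W + np.log(2 * np.pi * W))

    # Simulate a "subject" with true control cost r_true, then recover it by grid search.
    rng = np.random.default_rng(0)
    r_true = 2.0
    L_true = lqr_gain(r_true)
    xs = [3.0]
    for _ in range(200):
        xs.append((A - B * L_true) * xs[-1] + rng.normal(0.0, np.sqrt(W)))
    xs = np.array(xs)

    candidates = np.linspace(0.1, 10.0, 200)
    nlls = [neg_log_likelihood(r, xs) for r in candidates]
    print("inferred r:", round(candidates[int(np.argmin(nlls))], 2), "(true:", r_true, ")")

In the dissertation's actual setting, the subject's state estimate and control signal are latent, so the exact closed-loop likelihood above is unavailable and is replaced by the approximate likelihoods derived in this part.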