Approximate Bayesian Reinforcement Learning for System Identification

Author: Matthias Schultheis

Supervisors: Hany Abdulsamad; Boris Belousov; Prof. Jan Peters, Ph.D.

Submission: July 2019

Abstract:

The application of machine learning techniques to the field of robotics has led to considerable progress towards the aim of realising autonomous robots. An autonomous robot that is not currently engaged in a specific task should not remain idle but should prepare for upcoming tasks. One natural form of preparation is to explore the environment and to accumulate knowledge in the form of a model of the world. The robot can use such a model to predict the consequences of future actions, which helps it solve subsequent tasks efficiently.

The central question addressed in this thesis was how to determine the actions a robot should execute in order to learn something new about the world and to improve its model optimally. First, existing work on exploration strategies and optimal decision making for information gain from several subfields of machine learning was reviewed. Then, as the main contribution of this thesis, a novel approach based on insights from active learning was presented. In this formulation, the world is modelled as a Bayesian model and the optimal actions are determined as the solution of an optimisation problem. In order to plan actions that are expected to improve the model, the expected model variance and the uncertainty of the resulting trajectory were considered as objectives, leading to two different problem formulations. Since an optimal solution to these problems is intractable, an approximation was introduced that can be solved efficiently using gradient-based trajectory planning methods.
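To make the optimisation-based formulation more concrete, the following is a minimal illustrative sketch, not the implementation from the thesis: it pairs a Bayesian linear dynamics model with the expected-model-variance objective and optimises an action sequence by gradient ascent, here with finite-difference gradients. The feature map, the prior over parameters, and all constants are assumptions made for this example.

```python
# Minimal sketch (illustrative, not the thesis code): exploration as an
# optimisation problem over an action sequence, maximising the expected
# model variance along the predicted trajectory.
import numpy as np

rng = np.random.default_rng(0)

def features(s, a):
    # Hypothetical feature map for a 1D state and 1D action.
    return np.array([s, a, np.sin(s), 1.0])

# Bayesian linear model of the dynamics: s' ~ N(w^T phi(s, a), noise),
# with a Gaussian posterior over w (values here are assumed).
mean_w = np.array([0.9, 0.5, 0.1, 0.0])  # illustrative posterior mean
cov_w = np.eye(4)                        # illustrative posterior covariance

def model_variance(s, a):
    # Epistemic predictive variance phi^T Sigma phi at (s, a).
    phi = features(s, a)
    return phi @ cov_w @ phi

def objective(u, s0):
    # Expected model variance accumulated along the mean rollout,
    # with actions squashed to [-1, 1] to keep them bounded.
    s, total = s0, 0.0
    for ut in u:
        a = np.tanh(ut)
        total += model_variance(s, a)
        s = mean_w @ features(s, a)  # mean prediction as the rollout
    return total

def plan(s0, horizon=10, iters=200, lr=0.1, eps=1e-4):
    # Gradient-based trajectory planning with finite-difference gradients.
    u = rng.normal(scale=0.1, size=horizon)
    for _ in range(iters):
        grad = np.zeros(horizon)
        for t in range(horizon):
            up, um = u.copy(), u.copy()
            up[t] += eps
            um[t] -= eps
            grad[t] = (objective(up, s0) - objective(um, s0)) / (2 * eps)
        u += lr * grad
    return np.tanh(u)

print(plan(s0=0.0))  # planned exploratory actions for one episode
```

In the actual method, the posterior over the model parameters would be updated after every executed episode before replanning, and the trajectory-uncertainty objective mentioned above would replace the variance term in the second problem formulation.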

Figure 1: The developed exploration method applied to the underactuated pendulum. The executed trajectories for exploration are shown as red dots and the contour plot visualizes the entropy of the learned models in the respective episodes.

The resulting algorithms were compared to state-of-the-art exploration methods. Thanks to its natural representation of model uncertainty and its use of trajectory planning, the proposed method of this thesis was found to be significantly faster while achieving lower model errors, and the learned models could be used to solve future tasks more reliably.

This thesis resulted in a conference publication at the Conference on Robot Learning 2019, which is available here.