Learning from Imperfect Human Input in Interactive Machine Learning
Author: Lisa Katharina Kempf, née Scherf
Referees:
Dr.-Ing. Dorothea Koert
Prof. Iolanda Leite, Ph.D.
Prof. Dr. Georgia Chalvatzaki
Defense: 28.02.2025
Abstract:
From early childhood, humans collect a lifetime of experiences and are, therefore, a valuable source of knowledge. Learning from human teachers in robotics can transfer this prior knowledge to the robot through human input, e.g., in the form of task demonstrations, advice, or feedback on robotic actions. This way, instead of being limited to a pre-defined set of programmed skills, a robot's skill set can be extended and personalized by a human user. Involving humans in the learning process and adapting the robot's skill set to the respective wishes and needs of the user can additionally enable learning from non-experts and increase the acceptance of intelligent robots. However, in real application scenarios in everyday life or industry with inexperienced users, this user input is prone to error, as these complex environments exhibit a high degree of uncertainty and users often have to make decisions based on limited information. To enable learning from humans in such real, unstructured environments, it is crucial to detect and handle erroneous or imperfect human input and to develop methods that allow robots to exploit input from human teachers despite its flawed nature.
This work approaches learning from imperfect human input from three different angles. First, we analyze human behavioral data corresponding to human uncertainty with the goal of detecting uncertain user input. Existing studies already point to a correlation between human uncertainty and the correctness of responses. This uncertainty is communicated verbally and non-verbally when interacting with another human. Humans are surprisingly good at assessing this communicated uncertainty and use this information to evaluate the content of conversations. We propose a model that recognizes human uncertainty in a similar way based on multi-modal behavioral data, such as speech, eye and head movements, response time, and facial behavior. In contrast to existing methods in the literature, we focus on multi-modal behavioral cues related to self-reported human uncertainty in human-robot interaction (HRI), with the goal of learning a human uncertainty classifier applicable in HRI settings as a potential indicator of incorrect human input. In experiments with human participants, we collect multi-modal behavioral data in decision-making tasks involving uncertainty and analyze behavioral differences between human-human and human-robot interactions. Evaluations of the developed multi-modal human uncertainty classifier trained on the collected dataset show that it significantly outperforms third-person annotators in accuracy and F1 score. While humans report feeling less observed when responding to a robot compared to a human, these behavioral differences did not significantly affect the performance of our proposed uncertainty classification.
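The exact feature extraction and model architecture of the classifier are described in the thesis; as a minimal sketch of the general idea, assuming the multi-modal cues have already been aggregated into one feature vector per response (feature names, the synthetic data, and the random-forest model below are illustrative assumptions, not the method used in the thesis), a supervised uncertainty classifier could be trained and evaluated with accuracy and F1 score as follows:

```python
# Minimal sketch of a supervised human-uncertainty classifier on multi-modal
# behavioral features. Feature semantics, data, and model choice are
# illustrative assumptions, not the classifier developed in the thesis.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, f1_score

rng = np.random.default_rng(0)
n = 400  # hypothetical number of annotated responses

# Placeholder per-response features: response time, speech/prosody score,
# gaze-aversion ratio, head-movement energy, facial action-unit activation.
X = rng.normal(size=(n, 5))
# Binary label: 1 = participant self-reported being uncertain (synthetic here).
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=n) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
clf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)
print("accuracy:", accuracy_score(y_te, pred), "F1:", f1_score(y_te, pred))
```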
Second, we consider learning from sub-optimal human input in the context of Interactive Reinforcement Learning (IRL). Classical Reinforcement Learning enables intelligent agents such as robots to learn new skills by interacting with the environment and maximizing rewards through appropriate actions. Human input in the form of feedback on actions or direct action advice in IRL can accelerate learning by benefiting from prior human knowledge of actions and the environment. Related methods often simulate these inputs and assume their optimality. Moreover, most approaches that account for inaccurate advice compute trust in human action advice independently of the state. In reality, however, a high degree of uncertainty or a limited understanding of the task can result in incorrect human input. In particular, human input might be inaccurate only in some states while still being useful in others. Therefore, we present LUNAA, an IRL method that enables learning from partially incorrect human action advice by computing a state-dependent measure of trust in human advice. This allows LUNAA to discard incorrect advice for particular states while still profiting from correct advice in other states. Here, we combine three different indicators for potentially incorrect human action advice. Since human uncertainty is related to answer correctness, as described above, we consider behavioral cues related to human uncertainty as one indicator. We combine this human uncertainty estimate with the consistency of action advice and a reward-based indicator, retrospective optimality. The resulting trust measure determines whether the human action advice is accepted. In addition, we propose LUNAA-TIP, which extends LUNAA by introducing a state-dependent trust measure in the policy in addition to the state-dependent trust in human advice. This allows LUNAA-TIP to distinguish between states with a high policy certainty, e.g., based on a high number of state visits, and states with a low certainty where it might be more beneficial to follow human advice. Moreover, LUNAA-TIP utilizes a real-time implementation of the previously proposed multi-modal human uncertainty classifier based on behavioral cues as an indicator of unreliable human action advice. Evaluations in gridworld environments with simulated advice show that LUNAA and LUNAA-TIP outperform a state-independent baseline that computes no trust in either the human advice or the policy. In addition, LUNAA-TIP outperforms LUNAA, confirming the benefit of an additional state-dependent trust in the policy. In robotic experiments with advice from human participants, we confirm the usefulness of behavioral cues related to human uncertainty as an indicator of unreliable advice and show that LUNAA-TIP is more robust to incorrect human advice than a state-independent computation of trust in the policy.
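The precise trust computation of LUNAA is defined in the thesis; the sketch below only illustrates the general idea of state-dependent advice acceptance in tabular Q-learning, where the three indicators (behavioral uncertainty estimate, advice consistency, retrospective optimality) are fused by a simple weighted average and compared against a threshold. All function names, weights, and the threshold are assumptions, not the LUNAA formulation itself.

```python
# Illustrative sketch of state-dependent trust in human action advice within
# tabular Q-learning. The fusion rule (weighted average + threshold) and all
# names/weights are assumptions, not the LUNAA method itself.
from collections import defaultdict
import random

Q = defaultdict(float)              # Q[(state, action)]
trust = defaultdict(lambda: 0.5)    # state-dependent trust in human advice
ALPHA, GAMMA, EPS, THRESH = 0.1, 0.95, 0.1, 0.5

def combine_indicators(uncertainty, consistency, retro_optimality):
    """Fuse the three indicators into one trust value in [0, 1]: low behavioral
    uncertainty, consistent advice, and retrospectively optimal advice all
    increase trust (weights are placeholders)."""
    return 0.4 * (1.0 - uncertainty) + 0.3 * consistency + 0.3 * retro_optimality

def select_action(state, actions, advice=None, indicators=None):
    """Follow human advice only if the state-dependent trust exceeds a
    threshold; otherwise fall back to epsilon-greedy selection on Q."""
    if advice is not None and indicators is not None:
        trust[state] = combine_indicators(*indicators)
        if trust[state] > THRESH:
            return advice
    if random.random() < EPS:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, actions):
    """Standard Q-learning update; LUNAA-TIP would additionally maintain a
    state-dependent policy certainty, e.g., derived from visit counts."""
    target = reward + GAMMA * max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (target - Q[(state, action)])
```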
Third, we look at Learning from Demonstration (LfD), particularly learning robotic skills in the form of Behavior Trees (BTs) from a potentially incomplete small set of human video demonstrations. Behavior Trees are a possible representation of robot capabilities that map a state to an action. They are characterized by high modularity, reactivity, and interpretability and are, therefore, a well-suited skill representation for LfD approaches. Demonstrating a task is a very natural way of teaching for a human instructor and is inspired by how humans learn by observing and imitating others. In human-robot interactions, this type of learning from human demonstrations is particularly suitable for non-experts, as it requires no prior knowledge and is less time-consuming than other BT-based LfD approaches that require a step-by-step specification of the necessary actions. We, therefore, automatically learn a Behavior Tree with action conditions from a limited number of human video demonstrations. In contrast to existing methods, we automatically compute continuous pre- and post-conditions based on visual features and use these to build a reactive BT. While this teaching approach is very intuitive and requires only a few task demonstrations, learning from this small set of natural and potentially sub-optimal demonstrations poses additional challenges. In a preliminary study, we therefore investigate how non-experts demonstrate tasks and vary their demonstrations. We identify three common failure cases of a BT learned from only a few potentially incomplete demonstrations. The pre- and post-conditions allow the robot to recognize these failure cases during the execution of the BT and to resolve them by interacting with the human user. Here, the robot explicitly asks for additional input and improves or extends the initial BT and the learned action conditions accordingly. In contrast to existing methods, failure cases caused by imperfect or incomplete demonstrations can in this way be resolved interactively, without repeating the entire teaching process. We evaluate the method in a sorting task with a robot arm and human participants and show that it is possible to learn a reactive BT from just a few video demonstrations and to interactively resolve failure cases during runtime.
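The learned BT structure and the continuous, vision-based conditions are detailed in the thesis; as a rough illustration of the execution-time idea only, an action node could check a learned pre-condition before acting and a post-condition afterwards, and request additional human input when a condition cannot be satisfied. Class and method names below are hypothetical, and boolean callables stand in for the continuous visual conditions of the actual method.

```python
# Rough sketch of a BT action node with learned pre-/post-conditions.
# Names are hypothetical; the thesis computes continuous conditions from
# visual features rather than the boolean callables used here.
from typing import Callable

class ConditionedAction:
    def __init__(self, name: str,
                 precondition: Callable[[dict], bool],
                 action: Callable[[dict], None],
                 postcondition: Callable[[dict], bool]):
        self.name = name
        self.precondition = precondition
        self.action = action
        self.postcondition = postcondition

    def tick(self, state: dict) -> str:
        """Return a BT status; on a detected failure case, ask the user for
        additional input instead of aborting the whole task."""
        if not self.precondition(state):
            return "FAILURE"            # the parent BT may try another branch
        self.action(state)
        if not self.postcondition(state):
            self.request_user_input(state)
            return "RUNNING"
        return "SUCCESS"

    def request_user_input(self, state: dict) -> None:
        # Placeholder: in the thesis, the robot explicitly asks the human
        # teacher for input and refines the BT and its learned conditions.
        print(f"[{self.name}] post-condition not met, asking the user for help")
```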
In summary, this thesis contributes methods to detect, learn from, and overcome problems arising from imperfect human input. We show that taking the sub-optimality of human input into account enables a more effective exploitation of its potential and improves learning in the context of LfD and IRL. The contributed methods and the robotic experiments with participants with mostly little to no robotics experience provide insights into the nature of human input and into the challenges, but also opportunities, of learning from human teachers for future work.