Learn more efficiently, save resources

Research team develops method to accelerate reinforcement learning

2025/12/05

Robots can learn to perform tasks. However, this learning process often requires large amounts of data and computing time. Researchers at TU Darmstadt have now developed an algorithm that works efficiently even for complex tasks. The research is part of the Cluster of Excellence ‘Reasonable Artificial Intelligence (RAI)’.

Daniel Palenicek (left) and Theo Gruner.

Similar to humans, robots can learn through trial and error – they try things out and receive feedback: correct decisions lead to a reward, wrong ones to a punishment. In this way, they find a strategy that maximises rewards and leads to continuous improvement. With this method, known as reinforcement learning, robots can learn to solve tasks independently.
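The trial-and-error loop described above can be illustrated with a toy example. This is not the team's algorithm, just a minimal reward-driven learner: a two-armed bandit that keeps a running reward estimate per action and gradually favours the action that pays off more (all numbers here are hypothetical).

```python
import numpy as np

# Toy illustration of reward-driven trial and error (not the paper's
# method): two actions with unknown payoffs, learned from noisy feedback.
rng = np.random.default_rng(1)
true_reward = [0.2, 0.8]          # hypothetical expected rewards per action
estimates = np.zeros(2)           # the agent's running reward estimates
counts = np.zeros(2)

for step in range(2000):
    # Explore occasionally; otherwise exploit the current best estimate.
    a = rng.integers(2) if rng.random() < 0.1 else int(np.argmax(estimates))
    r = float(rng.random() < true_reward[a])   # reward or "punishment" (0/1)
    counts[a] += 1
    # Incremental average: nudge the estimate towards the observed reward.
    estimates[a] += (r - estimates[a]) / counts[a]

print(int(np.argmax(estimates)))  # the agent settles on the better action
```

Note how many interactions even this trivial problem consumes; for a physical robot with high-dimensional states, the count quickly reaches the millions mentioned below.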

One disadvantage of this type of learning is the very large number of interactions the system must collect in order to learn, often several million. This is time-consuming and expensive, causes wear and tear on the robots, and limits their use for complex tasks.

Presentation at AI conference

Researchers led by Daniel Palenicek from the Intelligent Autonomous Systems (IAS) group in the Department of Computer Science at TU Darmstadt have developed an algorithm that stabilises and accelerates this complex training process. They achieved this by addressing a common problem known as ‘loss of plasticity’. Much as a person who has become ‘stuck’ in a certain way of thinking can no longer absorb new information, intensive training can make an AI ‘resistant to learning’: shaped by its early experiences, it becomes unable to learn from new data. To counteract this and preserve the AI's ability to learn, the researchers integrated a combination of two normalisation methods, batch normalisation and weight normalisation. Together, these regulate and stabilise training, maintain the ability to learn, and ultimately increase data efficiency significantly across a wide range of tasks.
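The paper's title names the two methods, batch and weight normalisation. A minimal NumPy sketch of what each one does is shown below; the layer sizes and the combination into a single hidden layer are hypothetical, not the paper's architecture.

```python
import numpy as np

def weight_norm(v, g):
    # Weight normalisation: reparameterise each weight row as
    # w_i = g_i * v_i / ||v_i||, decoupling direction from magnitude.
    norms = np.linalg.norm(v, axis=1, keepdims=True)
    return g[:, None] * v / norms

def batch_norm(x, eps=1e-5):
    # Batch normalisation: standardise each feature over the batch,
    # keeping activation statistics stable during long training runs.
    mean = x.mean(axis=0, keepdims=True)
    var = x.var(axis=0, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def layer_forward(x, v, g):
    # One hidden layer of a hypothetical network combining both
    # normalisations, followed by a ReLU.
    w = weight_norm(v, g)
    h = x @ w.T
    return np.maximum(batch_norm(h), 0.0)

rng = np.random.default_rng(0)
x = rng.normal(size=(32, 8))   # a batch of 32 inputs with 8 features
v = rng.normal(size=(16, 8))   # unnormalised weight directions
g = np.ones(16)                # learnable per-unit magnitudes
h = layer_forward(x, v, g)
print(h.shape)                 # (32, 16)
```

Intuitively, both schemes keep the scale of weights and activations under control as training progresses, which is the regulating effect the researchers exploit to preserve plasticity.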

The approach of reducing data volumes and interactions for reinforcement learning is an essential component of the RAI Cluster of Excellence. Researchers here are working on developing a new generation of AI systems based, among other things, on sensible use of resources and continuous improvement. ‘We are trying to reduce the amount of data required by designing our algorithms,’ says Palenicek. ‘This saves interactions with the real system as well as time, computing power and, ultimately, energy and CO2.’

The study ‘Scaling Off-Policy Reinforcement Learning with Batch and Weight Normalisation’ will be presented on 5 December at the renowned Conference on Neural Information Processing Systems (NeurIPS) in San Diego (USA). cst

The publication

Daniel Palenicek, Florian Vogt, Joe Watson, Jan Peters: “Scaling Off-Policy Reinforcement Learning with Batch and Weight Normalisation”, in: Advances in Neural Information Processing Systems 38 (NeurIPS 2025)

About RAI

The Cluster of Excellence RAI under the leadership of the Technical University of Darmstadt is dedicated to the development of a new generation of AI systems based on the rational use of resources, data protection and continuous improvement. With four research areas, multidisciplinary teams are working on shaping the future of AI.