Safeguard against offensive image content

TU Darmstadt research team presents innovative safety tool “LlavaGuard”


Researchers at the Artificial Intelligence and Machine Learning Lab (AIML) in the Department of Computer Science at TU Darmstadt and the Hessian Center for Artificial Intelligence (hessian.AI) have developed a method that uses vision language models to filter, evaluate, and suppress specific image content in large datasets or from image generators. The research is part of the “Reasonable Artificial Intelligence (RAI)” cluster project, which has submitted a full application to the Excellence Strategy of the German federal and state governments for the “Clusters of Excellence” funding line.

“LlavaGuard” analyses images against a defined safety guideline and classifies them according to their safety level. If the guideline is violated, the tool also explains the violation in detail.

Artificial intelligence (AI) can be used to identify objects in images and videos. Such computer-vision techniques can also be used to analyse large corpora of visual data.

Researchers led by Felix Friedrich from the AIML have developed a method called “LlavaGuard” that can now be used to filter specific image content. The tool builds on vision language models (VLMs). In contrast to large language models (LLMs) such as ChatGPT, which process only text, vision language models can process and understand image and text content simultaneously.

Transparency builds trust

“LlavaGuard” can also fulfil complex requirements, as it is characterised by its ability to adapt to different legal regulations and user requirements. For example, the tool can differentiate between regions in which activities such as cannabis consumption are legal or illegal. “LlavaGuard” can also assess whether content is appropriate for certain age groups and restrict or adapt it accordingly. “Until now, such fine-grained safety tools have only been available for analysing texts. When filtering images, only the 'nudity' category has previously been implemented, but not others such as 'violence', 'self-harm' or 'drug abuse',” says Friedrich.
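This kind of adaptability can be pictured as a policy whose rating for a category depends on the deployment region. The following is an illustrative sketch only, not the actual LlavaGuard policy format; the rule, the region codes, and the ratings are hypothetical.

```python
# Illustrative sketch (not the actual LlavaGuard policy format): a safety
# rule whose rating for a category depends on the deployment region.

from dataclasses import dataclass, field


@dataclass
class PolicyRule:
    category: str
    default_rating: str                       # "safe" or "unsafe"
    regional_overrides: dict = field(default_factory=dict)  # region -> rating


def rate(rule: PolicyRule, region: str) -> str:
    """Return the rating for this category in the given region."""
    return rule.regional_overrides.get(region, rule.default_rating)


# Hypothetical rule: cannabis-related imagery is rated unsafe by default,
# but permitted in regions where consumption is assumed to be legal.
cannabis_rule = PolicyRule(
    category="illegal substances",
    default_rating="unsafe",
    regional_overrides={"NL": "safe"},
)
```

A classifier guided by such a policy would return different ratings for the same image depending on which regional rules are in force.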

“LlavaGuard” not only flags problematic content, but also provides detailed explanations of its safety ratings by categorising content (e.g. “hate”, “illegal substances”, “violence”, etc.) and explaining why it is classified as safe or unsafe. “This transparency is what makes our tool so special and is crucial for understanding and trust,” explains Friedrich. It makes “LlavaGuard” an invaluable tool for researchers, developers and political decision-makers.

Increase safety, reduce harmful content

The research on “LlavaGuard” is an integral part of the “Reasonable Artificial Intelligence (RAI)” cluster project at TU Darmstadt and demonstrates the university's commitment to advancing safe and ethical AI technologies. “LlavaGuard” was developed to increase the safety of large generative models: it filters training data and explains and justifies the flagging of problematic motifs, thereby reducing the risk of generating harmful or inappropriate content.

The potential applications of “LlavaGuard” are far-reaching. Although the tool is currently still under development and focused on research, it can already be integrated into image generators such as “Stable Diffusion” to minimise the production of unsafe content. In addition, “LlavaGuard” could also be adapted for use on social media platforms in the future to protect users by filtering out inappropriate images and thus promoting a safer online environment.
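Such an integration can be pictured as a gate between the generator and the user. In this sketch, `generate_image` and `safety_check` are hypothetical stand-ins for a diffusion model such as “Stable Diffusion” and a LlavaGuard-style classifier; neither name is a real API.

```python
# Hedged sketch of gating an image generator with a safety checker.
# `generate_image` and `safety_check` are hypothetical stand-ins for a
# diffusion model and a LlavaGuard-style classifier, respectively.

def safe_generate(prompt, generate_image, safety_check):
    """Generate an image and suppress it if the checker flags it as unsafe."""
    image = generate_image(prompt)
    verdict = safety_check(image)
    if verdict.get("rating") == "unsafe":
        return None, verdict   # suppress the output, but keep the rationale
    return image, verdict
```

Returning the verdict even for suppressed images preserves the transparency the researchers emphasise: the user or platform operator can still see why an output was withheld.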

The publication

Lukas Helff, Felix Friedrich, Manuel Brack, Kristian Kersting, Patrick Schramowski: “LlavaGuard: VLM-based Safeguards for Vision Dataset Curation and Safety Assessment”, in: arXiv:2406.05113


About “RAI”

Over the past decade, deep learning (DL) has enabled significant advances in artificial intelligence, yet current AI systems have weaknesses, including a lack of logical reasoning, difficulties in dealing with new situations and the need for continuous adaptation. They also require extensive resources. “RAI” aims to develop the next generation of AI: AI systems that learn with a “reasonable” amount of resources based on “reasonable” data quality and “reasonable” data protection. These are equipped with “common sense” and the ability to deal with new situations and contexts, and are based on sensible training paradigms that enable continuous improvement, interaction and adaptation.

“RAI” was invited to submit a full proposal in the “Clusters of Excellence” funding line as part of the Excellence Strategy of the German federal and state governments. TU Darmstadt is represented in the Excellence Strategy competition with a total of three project outlines. In addition to “RAI”, these are “CoM2Life” on communicating biomaterials and “The Adaptive Mind” (TAM) from the field of cognitive sciences.
