
Security vulnerability in AI image recognition: semantic watermarks easy to manipulate

Cybersecurity researchers from Bochum uncover weaknesses in the recognition of AI-generated images – with far-reaching consequences for digital image authentication.


The image on the left is a real picture of a dog and a cat. The researchers incorporated a watermark into the image on the right to make it look like it was generated by a machine learning model. The watermark was incorporated with virtually no visible impact on the image; the manipulated version shows slightly shifted edges and minimal blurring compared to the original image. © MS COCO Dataset

From online dating and social media to e-commerce platforms, images play a central role in our daily digital experience. Today, it’s nearly impossible to imagine the internet without them. But with the rise of advanced AI technologies like latent diffusion models (LDMs), the boundary between real and synthetic images is rapidly vanishing - posing significant risks. Deepfakes, for example, can be weaponized to defame individuals or spread disinformation with alarming precision.

To counter such misuse, many platforms rely on digital watermarks - visible or invisible markers embedded in image files - to verify whether an image was generated by AI. A particularly promising approach involves semantic watermarks, which are embedded deep within the image generation process itself. These marks, considered especially robust, have been at the forefront of AI image authentication research and are regularly showcased at leading tech conferences.
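To illustrate the basic idea, the sketch below shows – in deliberately simplified form, not tied to any particular published scheme – how such a watermark can live inside the generation process rather than in the finished pixels: the initial noise the model starts from is derived from a secret key, and detection compares the noise recovered by inverting a candidate image against that key pattern. The function names, key handling, and detection threshold are illustrative assumptions.

```python
import torch

# Minimal conceptual sketch of a "semantic" watermark: the mark lives in the
# initial noise the diffusion model starts from, not in the finished pixels.
# Key handling and the detection threshold are illustrative assumptions.
def watermarked_initial_noise(secret_key: int, shape=(1, 4, 64, 64)):
    """Sample the starting latent noise from a key-dependent generator."""
    gen = torch.Generator().manual_seed(secret_key)
    return torch.randn(shape, generator=gen)

def detect(inverted_noise: torch.Tensor, secret_key: int, threshold: float = 0.1):
    """Compare noise recovered by inverting an image against the key pattern."""
    reference = watermarked_initial_noise(secret_key, inverted_noise.shape)
    score = torch.nn.functional.cosine_similarity(
        inverted_noise.flatten(), reference.flatten(), dim=0
    )
    return score.item() > threshold
```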

Researchers from Bochum find vulnerabilities

But new research from cybersecurity experts at Ruhr University Bochum challenges that confidence. In their paper “Black-Box Forgery Attacks on Semantic Watermarks for Diffusion Models,” presented at this year’s Conference on Computer Vision and Pattern Recognition (CVPR) on June 15 in Nashville, Tennessee, the team reveals fundamental security flaws in these supposedly resilient watermarking techniques.

“We demonstrated that attackers could forge or entirely remove semantic watermarks using surprisingly simple methods,” says Andreas Müller, who co-authored the study alongside Dr. Denis Lukovnikov, Jonas Thietke, Prof. Asja Fischer, and Dr. Erwin Quiring (CASA / Horst Görtz Institute for IT Security at the Faculty of Computer Science, Ruhr University Bochum).

Real images disguised as AI fakes

Their research introduces two novel attack strategies. The first, the imprinting attack, operates on latent representations – the underlying digital ‘signature’ of an image that AI image generators work with. The latent representation of a real image is deliberately modified until it resembles that of a watermarked reference image, which transfers the watermark onto any real photo even though only the reference was AI-generated. An attacker can therefore deceive an AI provider by making any image appear watermarked – and thus seemingly AI-generated – effectively making real images look fake.
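The sketch below shows what such an imprinting-style optimization can look like in toy form; it is not the authors’ implementation. It assumes the diffusers library with a Stable Diffusion autoencoder (stabilityai/sd-vae-ft-mse as an example) and lets the VAE latent stand in for the full inversion step a real attack would target: a small pixel-level change to the real image is optimized so that its latent moves toward the latent of a watermarked reference while staying visually close to the original.

```python
import torch
from diffusers import AutoencoderKL

# Toy sketch of an imprinting-style attack (not the authors' implementation).
# A latent diffusion VAE stands in for the inversion step targeted in practice;
# the model name below is an example assumption.
vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse").eval()

def imprint(real_img, watermarked_ref, steps=300, lr=1e-2, lam=1.0):
    """Nudge real_img so its latent approaches the watermarked reference latent.

    real_img, watermarked_ref: tensors of shape (1, 3, H, W), scaled to [-1, 1].
    lam trades off latent similarity against visible change to the image.
    """
    with torch.no_grad():
        z_ref = vae.encode(watermarked_ref).latent_dist.mean  # "watermarked" latent
    x_adv = real_img.clone().requires_grad_(True)
    opt = torch.optim.Adam([x_adv], lr=lr)
    for _ in range(steps):
        z = vae.encode(x_adv.clamp(-1, 1)).latent_dist.mean
        latent_loss = torch.nn.functional.mse_loss(z, z_ref)        # carry the mark
        pixel_loss = torch.nn.functional.mse_loss(x_adv, real_img)  # stay close to the original
        loss = latent_loss + lam * pixel_loss
        opt.zero_grad()
        loss.backward()
        opt.step()
    return x_adv.clamp(-1, 1).detach()
```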

“The second method, the reprompting attack, exploits the ability to map a watermarked image back to the latent space and then regenerate it with a new prompt. The result is arbitrary, newly generated images that carry the same watermark,” explains co-author Dr. Erwin Quiring.
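The following sketch illustrates the reprompting idea in simplified form, as an assumption-laden illustration rather than the authors’ code: it assumes the diffusers library and a Stable Diffusion checkpoint (runwayml/stable-diffusion-v1-5 as an example), encodes the watermarked image, walks it back to its initial noise with a DDIM inverse scheduler (here without classifier-free guidance, a simplification), and reuses that noise to generate an image from an arbitrary new prompt.

```python
import torch
from diffusers import StableDiffusionPipeline, DDIMScheduler, DDIMInverseScheduler

# Simplified reprompting-style sketch (illustrative only, not the authors' code);
# the model id and prompt are example assumptions.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")
pipe.scheduler = DDIMScheduler.from_config(pipe.scheduler.config)
inverse = DDIMInverseScheduler.from_config(pipe.scheduler.config)

@torch.no_grad()
def invert_to_noise(latents, num_steps=50):
    """DDIM-invert image latents back towards the initial noise (no guidance)."""
    # unconditional (empty-prompt) text embedding for the noise predictions
    ids = pipe.tokenizer(
        "", padding="max_length", max_length=pipe.tokenizer.model_max_length,
        return_tensors="pt",
    ).input_ids.to(pipe.device)
    uncond = pipe.text_encoder(ids)[0]
    inverse.set_timesteps(num_steps, device=pipe.device)
    for t in inverse.timesteps:
        noise_pred = pipe.unet(latents, t, encoder_hidden_states=uncond).sample
        latents = inverse.step(noise_pred, t, latents).prev_sample
    return latents

@torch.no_grad()
def reprompt(watermarked_img, new_prompt, num_steps=50):
    """Regenerate an arbitrary image that inherits the reference watermark."""
    # watermarked_img: tensor of shape (1, 3, 512, 512), scaled to [-1, 1]
    z0 = pipe.vae.encode(watermarked_img.to(pipe.device, torch.float16)).latent_dist.mean
    z0 = z0 * pipe.vae.config.scaling_factor
    z_T = invert_to_noise(z0, num_steps)      # noise that carries the watermark
    return pipe(new_prompt, latents=z_T, num_inference_steps=num_steps).images[0]
```

Because the watermark travels with the recovered noise rather than with the prompt, any image generated from that noise inherits it – which is exactly what the researchers exploit.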

Alarmingly, both attacks require just a single reference image containing the target watermark and can be executed across different model architectures - from legacy UNet-based systems to newer diffusion transformers. This cross-model flexibility makes the vulnerabilities especially concerning.

Call for new developments

The implications are far-reaching. Currently, there are no effective defenses against these types of attacks. “This calls into question how we can securely label and authenticate AI-generated content moving forward,” Müller warns. The researchers argue that the current approach to semantic watermarking must be fundamentally rethought to ensure long-term trust and resilience.

Their findings have drawn significant attention: out of thousands of submissions, their paper was selected not only for poster presentation but also for an oral spotlight session at CVPR – a distinction reserved for the most impactful contributions.

Original publication

Andreas Müller, Denis Lukovnikov, Jonas Thietke, Asja Fischer, Erwin Quiring: Black-Box Forgery Attacks on Semantic Watermarks for Diffusion Models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, USA, 2025.

Press contact

Andreas Müller

andreas.mueller-t1x(at)rub.de
