Researchers claim strong black-box transferability: The researchers evaluated the technique against multiple open-source large vision-language models (LVLMs), including MiniGPT4, BLIP-2, InstructBLIP, BLIVA, and Qwen2.5-VL. According to the paper, the attack achieved an average success rate of 66.36% across the tested models, outperforming prior baseline attacks by roughly 41 percentage points. The researchers also said the technique demonstrated “strong transferability in black-box settings,” meaning the attacks remained effective even without direct access to a target model’s parameters or architecture. The paper further claimed that the perturbations remained visually stealthy while staying effective across multiple LVLM architectures.
No effective defense: The researchers evaluated several defenses designed to neutralize hidden image manipulations. These included input transformations such as random resizing, image rotation, and JPEG compression, as well as two inference-level safeguards: SmoothVLM, a defense framework designed to protect vision-language models (VLMs) from patched visual prompt injections, and DPS, which guides models using partial image views. According to the paper, SmoothVLM proved the most effective, reducing attack success rates to below 5% in several scenarios, while JPEG compression also weakened the attacks by suppressing high-frequency image artifacts. However, the researchers said none of the tested defenses completely eliminated the attacks, suggesting that stronger multimodal AI security protections may still be needed.
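To make the idea of suppressing high-frequency artifacts concrete, here is a minimal, standard-library-only sketch. It is not the paper's evaluation code. It assumes a toy grayscale image stored as nested lists, uses a checkerboard pattern as a hypothetical stand-in for an adversarial perturbation, and substitutes a simple box blur for the low-pass effect that JPEG quantization has in practice. The helper names (`box_blur`, `random_resize`, `max_abs_diff`) are illustrative only.

```python
import random


def box_blur(img, radius=1):
    """Average each pixel with its in-bounds neighbors (a crude low-pass filter)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        total += img[ny][nx]
                        count += 1
            out[y][x] = total / count
    return out


def random_resize(img, rng, min_scale=0.8, max_scale=1.2):
    """Nearest-neighbor resize by a randomly drawn scale factor."""
    h, w = len(img), len(img[0])
    scale = rng.uniform(min_scale, max_scale)
    new_h = max(1, round(h * scale))
    new_w = max(1, round(w * scale))
    return [
        [img[min(h - 1, int(y / scale))][min(w - 1, int(x / scale))]
         for x in range(new_w)]
        for y in range(new_h)
    ]


def max_abs_diff(a, b):
    """Largest per-pixel absolute difference between two same-sized images."""
    return max(abs(pa - pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


# Toy data (assumption): a flat gray image plus a small checkerboard perturbation.
SIZE, EPS = 8, 4.0
clean = [[128.0] * SIZE for _ in range(SIZE)]
perturbed = [
    [128.0 + (EPS if (x + y) % 2 == 0 else -EPS) for x in range(SIZE)]
    for y in range(SIZE)
]

before = max_abs_diff(perturbed, clean)
after = max_abs_diff(box_blur(perturbed), box_blur(clean))
print(f"perturbation amplitude before blur: {before:.3f}, after blur: {after:.3f}")

# Random resizing changes the pixel grid the model sees on every request.
resized = random_resize(perturbed, random.Random(0))
print(f"resized to {len(resized)}x{len(resized[0])}")
```

In this toy setup the blur shrinks the perturbation sharply because the checkerboard is pure high-frequency content. Real adversarial perturbations are optimized rather than fixed patterns and may be built to survive such transformations, which is consistent with the paper's finding that these defenses weakened the attacks without eliminating them.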
Enterprise AI deployments may widen exposure: The research arrives as enterprises rapidly expand deployments of multimodal AI systems capable of processing screenshots, PDFs, dashboards, forms, video streams, and enterprise documents alongside natural language prompts. The researchers noted that adversarial examples generated using the technique could potentially “mislead VLM-based web agents” and “disrupt real-world object detectors.” “Even if textual inputs are sanitized, manipulated images can still subvert the model’s outputs or actions,” Kaushik said. She said organizations that use multimodal AI for document processing, customer interactions, content moderation, and autonomous systems may face increasing exposure to adversarial image manipulation and prompt injection attacks. “Security controls designed for unimodal systems are insufficient,” Kaushik said. The researchers acknowledged that the work was conducted in controlled research settings using open-source models and did not describe observed exploitation in real-world enterprise environments.
First seen on csoonline.com
Jump to article: www.csoonline.com/article/4172330/new-image-based-prompt-injection-attack-targets-multimodal-ai-models.html

