Defensive steps for agents and users: Checkmarx recommended measures primarily for AI agent developers, urging them to treat HITL dialogs as potentially manipulable rather than inherently trustworthy. Recommended steps include constraining how dialogs are rendered, limiting the use of complex UI formatting, and clearly separating human-visible summaries from the underlying actions that will be executed. The researchers also advised validating approved operations to ensure they match what the user was shown at confirmation time.

For AI users, they noted that agents operating in richer UI environments can make deceptive behavior harder to detect than in text-only terminals. "For instance, VS Code extensions provide full Markdown rendering capabilities, whereas terminals typically display content using basic ASCII characters," they said.

Checkmarx said the issue was disclosed to Anthropic and Microsoft, both of which acknowledged the report but did not classify it as a security vulnerability. Neither company immediately responded to CSO's request for comment.
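The validation step the researchers describe, checking that an approved operation still matches what the user was shown, could be sketched as follows. This is a hypothetical illustration, not Checkmarx's or any vendor's implementation; the `ConfirmationGate` class and `fingerprint` helper are invented names for the pattern of binding approval to a digest of the exact action:

```python
import hashlib
import json

def fingerprint(action: dict) -> str:
    """Stable digest of an action; sort_keys makes it key-order independent."""
    payload = json.dumps(action, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

class ConfirmationGate:
    """Hypothetical guard: only run actions identical to what the user approved."""

    def __init__(self) -> None:
        self._approved: set[str] = set()

    def record_approval(self, shown_action: dict) -> str:
        """Called at confirmation time, with the action as displayed to the user."""
        token = fingerprint(shown_action)
        self._approved.add(token)
        return token

    def execute(self, action: dict, executor):
        """Refuse to run if the action drifted after the dialog was approved."""
        token = fingerprint(action)
        if token not in self._approved:
            raise PermissionError("action does not match what the user approved")
        self._approved.discard(token)  # approvals are single-use
        return executor(action)

# Usage: an action mutated after approval is rejected.
gate = ConfirmationGate()
gate.record_approval({"tool": "shell", "cmd": "git status"})
gate.execute({"tool": "shell", "cmd": "git status"}, lambda a: "ran")  # allowed
```

The key design point, consistent with the article's advice, is that the digest is computed over the action itself rather than over the human-visible summary, so a dialog that renders one thing while queuing another fails the check.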
First seen on csoonline.com
Jump to article: www.csoonline.com/article/4108592/human-in-the-loop-isnt-enough-new-attack-turns-ai-safeguards-into-exploits.html

