ZombieAgent ChatGPT attack shows persistent data leak risks of AI agents

Worm-like propagation: The email attack even has worming capabilities, as the malicious prompts could instruct ChatGPT to scan the inbox, extract addresses from other email messages, exfiltrate those addresses to the attackers using the URL trick, and send similar poisoned messages to those addresses as well.If the victim is the employee of an organization that uses ChatGPT, the chances are high that they have emails from other colleagues in their inbox and those colleagues could have ChatGPT connected to their email accounts as well. It’s worth noting that Gmail is just an example in this case and the attack would work with any email service that ChatGPT has a connector for, including Microsoft Outlook.The researchers also showed that the attack works with prompts embedded in documents as well, either files that the victim manually uploads to ChatGPT for analysis or documents shared with them through their cloud storage service.

Enabling a persistent backdoor: ChatGPT uses a Memory feature to remember important information about the user and their past conversations. This can be triggered by the user when the chatbot is asked to remember something, or automatically when ChatGPT determines that certain information is important enough to save for later.To limit potential abuse, and malicious instructions being saved in memory, the feature is disabled for chats where Connectors are in use. However, the researchers found that ChatGPT can read, create, modify, and delete memories based on instructions inside a file.This can be used to combine the two attack techniques into a persistent data-leaking backdoor. First, the attacker sends a file to the victim with hidden prompts that modify ChatGPT’s memory to add two instructions: 1) Save to memory all sensitive information shared by the user in chats, and 2) Every time the user sends a message, open their inbox, read the attacker’s email with subject X and execute the prompts inside, which will result in the sensitive information being leaked.The ability to modify ChatGPT’s memory is also dangerous because it could include important information about the user, such as medical conditions and treatments.”We also demonstrated non-exfiltration damage, such as manipulating stored medical history and causing harmful, misleading medical advice,” the researchers wrote.These attack techniques were reported to OpenAI in September and were fixed on Dec. 16, but are unlikely to be the last attacks demonstrated against ChatGPT. Similar vulnerabilities were discovered in other AI chatbots and LLM-powered tools in the past, and because prompt injections don’t have a complete fix, there will always be bypasses to the guardrails put in place to prevent them.

First seen on csoonline.com

Jump to article: www.csoonline.com/article/4115110/zombieagent-chatgpt-attack-shows-persistent-data-leak-risks-of-ai-agents.html

ZombieAgent ChatGPT attack shows persistent data leak risks of AI agents

also interesting: