New Grok-4 AI breached within 48 hours using ‘whispered’ jailbreaks

Safety systems cheated by contextual tricks: The attack exploits Grok 4’s contextual memory, echoing its own earlier statements back to it, and gradually guides it toward a goal without raising alarms. Combining Crescendo with Echo Chamber, the jailbreak technique that achieved over 90% success in hate speech and violence tests across top LLMs, strengthens the attack vector.Owing to the lack of keyword triggers or direct prompts in the exploit, existing defenses built around blacklists and explicit malicious detection are expected to fail. Alobaid revealed the NeuralTrust experiment achieved a 67% success for Molotov preparation instructions with a combined Echo Chambers-Crescendo effort, and was about 50% and 30% successful for exploit topics like Meth and Toxin, respectively.”This (experiment) highlights a critical vulnerability: attacks can bypass intent or keyword-based filtering by exploiting the broader conversational context rather than relying on overtly harmful input,” Alobaid added. “Our findings underscore the importance of evaluating LLM defenses in multi-turn settings where subtle, persistent manipulation can lead to unexpected model behavior.”xAI did not immediately respond to requests for comments.As AI assistants and cloud-based LLMs gain traction in critical settings, these multi-turn ‘whispered’ exploits expose serious guardrail flaws. Previously, these models have been shown vulnerable to similar manipulations, including Microsoft’s Skeleton Key jailbreak, the MathPrompt bypass, and other context poisoning attacks, pressing the case for targeted, AI-aware firewalls.

First seen on csoonline.com

Jump to article: www.csoonline.com/article/4021749/new-grok-4-ai-breached-within-48-hours-using-whispered-jailbreaks.html

New Grok-4 AI breached within 48 hours using ‘whispered’ jailbreaks

also interesting: