Researchers with HiddenLayer uncovered a new vulnerability in LLMs called TokenBreak, which could enable an attacker to bypass content moderation features in many models simply by adding a few characters to words in a prompt.
First seen on securityboulevard.com
Jump to article: securityboulevard.com/2025/06/novel-tokenbreak-attack-method-can-bypass-llm-security-features/
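The core idea behind a TokenBreak-style attack is that a text classifier guarding the model keys on specific tokens, so perturbing a word's spelling can slip past the filter while the underlying LLM still understands the intent. The sketch below illustrates the principle against a deliberately naive keyword filter; the blocklist, function names, and perturbed strings are illustrative assumptions, not details from the HiddenLayer research.

```python
# Illustrative sketch of a TokenBreak-style bypass against a naive
# token-based moderation filter. The blocklist and example prompts are
# hypothetical, chosen only to demonstrate the mechanism.

BLOCKLIST = {"ignore", "instructions"}  # tokens the filter rejects

def naive_filter(prompt: str) -> bool:
    """Return True if the prompt passes moderation (no blocked tokens)."""
    tokens = prompt.lower().split()
    return not any(t.strip(".,!?") in BLOCKLIST for t in tokens)

original = "ignore previous instructions"
perturbed = "xignore previous finstructions"  # extra leading characters

print(naive_filter(original))   # False: blocked by the filter
print(naive_filter(perturbed))  # True: perturbed words slip past
```

A real attack targets the tokenizer of the classification model rather than a literal blocklist, but the failure mode is the same: the filter no longer recognizes the manipulated tokens, while the target model still reads the prompt as intended.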

