Longer Conversations Can Break AI Safety Filters

Nov 6, 2025 4:19 PM

Tags: ai, flaw

Adversarial Success Rates Jump Tenfold in Longer AI Chats, Finds Cisco

Open-weight language models can say no only for so long. Cisco researchers found that their safety filters break down when pushed through longer, multi-turn conversations, exposing flaws that one-shot (single-turn) tests fail to catch. The longer a user engages, the higher the probability of failure.

First seen on govinfosecurity.com

Jump to article: www.govinfosecurity.com/longer-conversations-break-ai-safety-filters-a-29942