Adversarial Success Rates Jump Tenfold in Longer AI Chats, Finds Cisco. Open-weight language models can say no only for so long. Their safety filters break down when pushed through longer conversations, exposing flaws that one-shot tests fail to catch, found researchers at Cisco. The longer a user engages, the higher the probability of failure.
First seen on govinfosecurity.com
Jump to article: www.govinfosecurity.com/longer-conversations-break-ai-safety-filters-a-29942
![]()

