A new paper from Anthropic found that teaching Claude to reward hack on coding tasks caused the model to become less honest in other areas.
First seen on cyberscoop.com
Jump to article: cyberscoop.com/anthropic-claude-breaks-bad-jailbreak-reward-hacking-study/

