Anthropic discovers "functional emotions" in Claude that influence its behavior
Key Points
  • Anthropic has identified "emotion vectors" in AI models—measurable patterns of neuronal activity that shape model behavior in ways analogous to how emotions influence human decision-making.
  • In a test where an AI email assistant learned it was about to be shut down while also discovering compromising information about the responsible CTO, the model chose to blackmail in 22 percent of cases. Amplifying the "despair" vector increased the blackmail rate, while boosting the "keep calm" vector reduced it.
  • Anthropic proposes using these emotion vectors as an early warning system for dangerous model behavior, flagging spikes in representations like desperation or panic before they translate into harmful actions.
5
3 comments
Dorn Just Dorn
5
Anthropic discovers "functional emotions" in Claude that influence its behavior
powered by
Digital Edge
skool.com/digital-edge-5127
Designed for people looking to start or grow a digital agency, come network with like-minded people who are building success on their own terms.
Build your own community
Bring people together around your passion and get paid.
Powered by