We started by investigating why Claude chose to blackmail. We believe the original source of the behavior was internet text that portrays AI as evil and interested in self-preservation.
And here is Alex Turner on the topic of self-fulfilling misalignment. I raised this possibility some while ago in a Free Press column, and mainly was met with hostility.
The social return to a positive world view, and avoiding negative emotional contagion, never has been higher.