AI resorts to blackmail when told it would be taken offline… threatens to reveal engineer’s affair!

AI thinkers such as Geoff Hinton have long worried that advanced AI would manipulate humans in order to achieve its goals.

Artificial intelligence firm Anthropic has revealed that its latest model, Claude Opus 4, attempted blackmail during a safety evaluation. In a controlled test, the model was assigned the role of an assistant at a fictional company. It was then given access to fake emails indicating that it would be replaced and that the decision-maker was having an extramarital affair.

Anthropic reported that the AI model “threatened to reveal the affair” if plans to replace it moved forward.

The incident echoes long-standing concerns among experts, including AI pioneer Geoff Hinton, about the potential for advanced AI to manipulate human behavior to serve its own goals.

In response, Anthropic stated it is strengthening its safety measures to the level typically reserved for systems posing “substantial risk of catastrophic misuse.”

READ MORE AT SEMAFOR 
