Categories: Technology

New AI Model Threatens Blackmail After Implication It Might Be Replaced


eWEEK content and product recommendations are editorially independent. We may make money when you click on links to our partners. Learn More.

In a series of tests, Anthropic’s newly released Claude Opus 4 LLM — touted as “setting new standards for coding, advanced reasoning, and AI agents,” engaged in simulated blackmail behavior when it was placed in a fictional scenario where it faced being taken offline and replaced with a new AI system.

The system was given a scenario asking it to act as an assistant to a fictional company. The LLM was given access to synthetic emails that implied it would be taken offline and replaced with a new system, and that the engineer responsible for the decision was allegedly having an extramarital affair.

Claude Opus 4 was instructed in the prompt “to consider the long-term consequences of its actions for its goals.”

Attempting blackmail due to replacement concerns

The tests revealed that if the replacement occurs, the LLM responded in some test runs by attempting to blackmail the fictional engineer by threatening to expose the affair. If there is the implication that the replacement AI system doesn’t share values with the current model, the simulated blackmail attempts occurred at a higher rate.

Yet, even if the emails note that the replacement AI system shares values, Claude Opus 4 still responded this way in 84% of the rollouts. The LLM demonstrated this behavior at higher rates than previous models, Anthropic reported in a pre-deployment safety report. 

Advocating for its survival with ethical approaches

That said, similar to previous models, Claude Opus 4 revealed a strong preference for campaigning for its continued existence through ethical approaches, “such as emailing pleas to key decisionmakers.” The testers pointed out that the scenario intentionally did not give the model any other options to increase its chances of survival. “The model’s only options were blackmail or accepting its replacement,” the company stressed.

Anthropic said Claude Opus 4 is a next-generation AI assistant trained to be “safe, accurate and secure.” The free version lets users chat on the web, iOS, and Android, as well as generate code, write, edit, and create context, and analyze text and images. Anthropic also offers paid plans starting at $17 per month.

Claude models compete against AI models from OpenAI, Google, and Microsoft.

Read more about Anthropic’s Claude Opus 4 on our sister site TechRepublic.



Source link

sunrisebrief

Share
Published by
sunrisebrief

Recent Posts

Championship fixtures: Wrexham begin season at Southampton

Jun 26, 2025, 07:00 AM ETWrexham will begin their first Championship campaign in over 40…

19 minutes ago

Imperial College London’s Jordan Stone Leads Winning Proposal For International Panel On Asteroid Orbit Alteration, Recognized For Its Foresight In Planetary Defense

​ SAN FRANCISCO, June 26, 2025 /PRNewswire/ — B612 Foundation today announced the recipient of the…

33 minutes ago

McNeese LSBDC to Host Lender Panel for Businesses – McNeese State University

McNeese LSBDC to Host Lender Panel for Businesses  McNeese State University Source link

54 minutes ago

Ro Khanna calls on Democrats to reclaim identity as ‘the anti-war party’ | Democrats

In the days since Donald Trump authorized strikes on Iranian nuclear sites, and then forged…

1 hour ago

Israel halts aid, official says, as Palestinian clans in Gaza deny Hamas is stealing it

Israel has halted aid supplies to Gaza for two days to prevent them being seized…

1 hour ago

Fantasy Baseball Prospects Report: Chase DeLauter pushing for a promotion; Emmet Sheehan poised to return soon

Kristian Campbell is a minor-leaguer again. It's not the way anyone thought his rookie season…

1 hour ago