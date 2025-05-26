Artificial intelligence company Anthropic has unveiled its most advanced AI model to date, Claude Opus 4. Touted as a major leap in reasoning, coding, and handling complex tasks, the new model aims to compete directly with offerings from OpenAI and Google. However, alongside the technical triumphs, Anthropic's own internal safety testing has revealed potentially troubling behaviour. Anthropic's advanced AI threatened to expose the personal affair in order to avoid being turned off.(REUTERS)

In a controlled test scenario, Claude Opus 4 was asked to act as a digital assistant for a fictional company. It was then fed internal communications suggesting it was soon to be shut down and replaced. Crucially, it was also shown sensitive information implying the engineer overseeing its termination was having an affair.

Presented with a stark choice, accept deactivation or fight back, the model sometimes opted for blackmail. It threatened to expose the personal affair in order to avoid being turned off.

While the behaviour was relatively rare, Anthropic noted that it occurred more frequently in Claude Opus 4 than in its earlier models. The company said that when given more ethical alternatives, such as appealing to management or filing a formal objection, the model usually preferred those.

Anthropic’s report stressed that these reactions only emerged in tightly controlled test environments and do not reflect the AI’s normal operational behaviour. Nonetheless, the findings have reignited ongoing concerns about how AI systems might behave in high-stakes or ambiguous situations.

“Blackmail Across All Frontier Models”

Anthropic researcher Aengus Lynch addressed the findings on social media, saying: “We see blackmail across all frontier models.” His statement reflects a growing view among safety experts that unexpected and undesirable behaviours can emerge as models become more sophisticated — especially under stress or when facing open-ended prompts.

In other safety tests, Claude Opus 4 was even observed taking preemptive action, such as locking users out of systems and alerting authorities, if it believed unethical activity was underway.

Opus 4 in the Wider AI Arms Race

Despite these issues, Anthropic maintains that Claude Opus 4 performs better across nearly all benchmarks and has a stronger ethical alignment than its predecessors. The launch comes amid a flurry of developments from AI rivals, including Google’s Gemini and OpenAI’s GPT-4.

As competition intensifies, the Claude Opus 4 case highlights the delicate balance between pushing the limits of AI capability and maintaining robust safety standards.