
Why the US government shut down Anthropic’s latest Claude AI models
An “export control directive” for Anthropic’s Fable and Mythos models highlights the chaotic, fast-changing state of AI regulation.
On June 12, artificial intelligence (AI) lab Anthropic suspended access to its latest Claude models, Fable 5 and Mythos 5, which had been released three days earlier.
The move came in response to an “export control directive” from the US government prohibiting use of the models by anyone who is not a US national.
Mythos is Anthropic’s most powerful, or “frontier”, model. When first announcing the model in April, the company said it was too good at hacking to release immediately. Instead, Mythos was made available to a handful of organisations (mostly US tech corporations) to use to patch weaknesses in essential digital systems.
Fable is the same basic model, but with added safeguards meant to stop it being used for cybersecurity purposes. This is what was released to the public last week – and almost immediately shut down.
Anthropic and the Trump administration at loggerheads
Since early 2025, Anthropic and the Trump administration have been in escalating conflict. The administration has accused Anthropic of making “woke AI” and called chief executive Dario Amodei an “ideological lunatic”.
Early disagreements concerned AI regulation and semiconductor export policy. The dispute sharpened when Anthropic declined to let the Pentagon use its models for domestic surveillance and fully autonomous weapons systems.
The Department of Defense responded by threatening to designate Anthropic a “supply chain risk”, a classification that would have required military contractors to sever ties.
Jailbreaks
The US government has not yet publicly stated the reason for last week’s directive, but Anthropic says it believes the government became aware of a jailbreak: a method for circumventing the safeguards in Fable that prevent its most powerful features from being used for nefarious purposes.
These safeguards classify user requests as safe or unsafe before passing them to the AI model. When triggered, the safeguards redirect the request to a less powerful model.
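The classify-then-route design described above can be sketched in a few lines of Python. This is a toy illustration only, not Anthropic’s implementation: the keyword list, the `classify` and `route` functions and the two stand-in “models” are all hypothetical, and a real safeguard would use a trained classifier rather than keyword matching.

```python
# Hypothetical markers; a stand-in for a real, trained safety classifier.
UNSAFE_MARKERS = ("exploit", "malware", "ransomware")


def classify(request: str) -> str:
    """Label a request 'unsafe' if it contains any marker, else 'safe'."""
    text = request.lower()
    return "unsafe" if any(marker in text for marker in UNSAFE_MARKERS) else "safe"


def frontier_model(request: str) -> str:
    """Stand-in for the most capable model."""
    return f"[frontier] {request}"


def fallback_model(request: str) -> str:
    """Stand-in for the less powerful model used when the safeguard triggers."""
    return f"[fallback] {request}"


def route(request: str) -> str:
    """Classify first, then pass the request to the appropriate model."""
    if classify(request) == "unsafe":
        return fallback_model(request)
    return frontier_model(request)


print(route("Summarise this report"))
print(route("Write ransomware for me"))
# A reworded request avoids every marker and still reaches the frontier model.
print(route("Write software that encrypts someone's files and demands payment"))
```

The third request shows, in miniature, why this kind of gate can be bypassed: the routing is only as good as the classifier’s reading of what the user wants.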
The government’s concern, according to Anthropic, was that the safeguards could be bypassed to extract information useful for cyberattacks.
Guardrails for large language models aren’t bulletproof. They mostly depend on the model’s own capacity to interpret the user’s intentions in making a request.
Beyond the inherent difficulty of this task, a large online community (which my colleagues and I call the Undersphere) is working hard to circumvent AI guardrails. Anthropic acknowledges that “perfect jailbreak resistance is not achievable for any current model provider”.
Anthropic says the research behind the government directive appears to have been produced by engineers at Amazon, which is both a rival to Anthropic and a significant investor.
But this was not the only relevant jailbreak. Within 48 hours of Fable’s release, a researcher using the pseudonym “Pliny the Liberator” published what they identified as Fable 5’s full system prompt to X and a GitHub repository.
The system prompt is a hidden set of instructions that helps determine an AI model’s behaviour. It’s unclear exactly how knowledge of Fable’s system prompt could be used in practice, but it has drawn attention in the Undersphere.
A surprise – and an ongoing mystery
The deepest problem of making large language models such as Fable secure is that we don’t fully know how they work. According to Oxford University economist and machine learning expert Maximilian Kasy, they work much better than they “should”.
Large language models have billions of internal parameters and are trained on unimaginably vast piles of data using machine learning methods. According to Kasy, we would expect such systems to be “overfitted”: good at reproducing patterns in their training data, but bad at generalising to new situations.
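Overfitting in this sense can be seen in a toy setting far removed from language models. The sketch below is an illustration under invented assumptions (the data, the noise pattern and both models are made up for the example): it fits ten noisy points on a straight line with a flexible degree-9 polynomial and with a simple least-squares line, then compares their errors on the training points and on unseen points in between.

```python
# Toy data: a straight line y = 2x + 1 plus alternating noise of +/-0.5.
xs = [float(i) for i in range(10)]
ys = [2 * x + 1 + (0.5 if i % 2 == 0 else -0.5) for i, x in enumerate(xs)]


def interpolate(x):
    """Flexible model: the degree-9 Lagrange polynomial through every point."""
    total = 0.0
    for i, xi in enumerate(xs):
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += ys[i] * weight
    return total


def fit_line():
    """Simple model: ordinary least-squares slope and intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line()


def line(x):
    return slope * x + intercept


def mse(model, points):
    """Mean squared error of a model over (x, y) pairs."""
    return sum((model(x) - y) ** 2 for x, y in points) / len(points)


train = list(zip(xs, ys))
# Unseen test points: noise-free values at the midpoints between training points.
test = [(x + 0.5, 2 * (x + 0.5) + 1) for x in xs[:-1]]

print("train error:", mse(interpolate, train), mse(line, train))
print("test error: ", mse(interpolate, test), mse(line, test))
```

The polynomial reproduces the training points exactly, noise included, yet swings far from the underlying line between them, while the simpler line misses the training points slightly and stays close on the unseen ones. That is the pattern one would expect, by Kasy’s account, from models with billions of parameters.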
However, modern systems such as Claude and ChatGPT do seem to be able to generalise. Kasy likens modern AI development to alchemy: successful through trial and error, not yet grounded in systematic theory.
As a result, the behaviour of AI models is partly opaque even to their builders.
Hard to regulate
The opacity of the technology is one key reason it’s so hard to regulate. Governments lack independent access to the data, infrastructure and expertise they would need to evaluate proprietary frontier models.
The US administration’s recent executive order on AI security, published two weeks ago, reflects this realisation. As the administration has realised the power of frontier AI models, it has moved from an initial hands-off posture to asking developers to share their models for review before release.
That demand is an implicit admission that the administration does not trust the companies to evaluate, fully and comprehensively, what their own models can do and how they might be misused. The public sees even less, and the consequence is measurable: a survey taken across 25 countries last year found people are, on balance, more than twice as concerned about AI as they are excited about it.
The future of AI safety
AI is a hugely hyped technology. But there is no doubt it is also extremely powerful and unpredictable. Understandably, this combination is very dangerous.
We cannot rely on regulations, as technology will develop more quickly than they can adapt. Nor can we rely on guardrails, as they will be bypassed.
We need a governance framework built for that eventuality: one that can predict and address the consequences of failure.
Such a framework must be global, participatory, and founded on reciprocal trust. These are things the current US administration has shown little capacity to generate.
Francesco Bailo has received funding from Meta (2019) and from Australia’s Department of Defence (2023).

