Capturing and Countering Threats to National Security: A Blueprint for an Agile AI Incident Regime

As of January 2025, frontier AI systems possess some capabilities that threaten national security, with OpenAI designating its Deep Research AI system as posing a “medium” cyber and chemical, biological, radiological, and nuclear (CBRN) risk. Although the capabilities of frontier AI systems have advanced significantly over the past five years, it is unclear precisely how these dangerous capabilities will develop in the future and, correspondingly, what impacts they may yield. Even so, impacts from frontier AI systems with capabilities that threaten national security would likely severely undermine existing preparedness. It is therefore prudent for states to build the capacity to robustly (i) track the national security threats that future frontier AI systems may pose and (ii) swiftly execute countermeasures to contain and neutralize those threats.

In a new research paper, we propose a three-pronged approach to an AI incident regime that supports the establishment and implementation of such a state capacity.

We justify our AI incident regime proposal by demonstrating that each component mirrors US domestic incident regimes in three other sectors that could pose extreme risks to national security: nuclear power, aviation, and life science dual-use research of concern.  

Figure A: A summary of our proposal from our recent paper ‘AI threats to national security can be countered by an incident regime’

Preparatory Steps Before An Incident Occurs

In the first part of the proposal, we focus on preparatory steps that lay the foundation for quick identification and remediation should an incident occur. In doing so, we make a novel contribution to the debate around how ‘incident’ should be defined, tying it to pre-deployment AI safety cases. We suggest that an incident is defined as ‘any situation that invalidates the AI safety case corresponding to the AI system in question by weakening a claim made in the safety case.’

First, we suggest that AI developers create a ‘national security case’ for any frontier system they deploy. A national security case adapts the concept of an AI safety case: it is a structured argument, built from a series of claims, demonstrating that a given AI system does not threaten national security.

Second, we propose using national security cases in a novel way to determine what counts as an incident: an event should be considered an incident if it weakens a claim made in a national security case. This operationalization captures the most extreme risks posed by AI systems while offering a more precise algorithm for determining whether a given event is an incident than an existing definition in the literature. As such, it allows a government agency to be notified about incidents more quickly.
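To make this operationalization concrete, below is a minimal, purely illustrative Python sketch (not from the paper): it models a national security case as a structured set of claims, each paired with a hypothetical check of whether an observed event weakens it, and treats an event as an incident exactly when at least one claim is weakened. All identifiers, the example claim, and the event format are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Claim:
    """One claim inside a national security case."""
    claim_id: str
    statement: str
    # Hypothetical check: returns True if the observed event weakens this claim.
    is_weakened_by: Callable[[Dict], bool]


@dataclass
class NationalSecurityCase:
    """A structured argument: a series of claims about one AI system."""
    system_name: str
    claims: List[Claim] = field(default_factory=list)


def weakened_claims(case: NationalSecurityCase, event: Dict) -> List[Claim]:
    """Claims the event weakens; a non-empty result means the event is an incident."""
    return [claim for claim in case.claims if claim.is_weakened_by(event)]


# Illustrative case with a single (made-up) claim about bioweapon misuse.
case = NationalSecurityCase(
    system_name="frontier-model-x",
    claims=[
        Claim(
            claim_id="C1",
            statement="Guardrails block requests for bioweapon uplift.",
            is_weakened_by=lambda event: event.get("type") == "guardrail_bypass"
            and event.get("domain") == "bio",
        )
    ],
)

event = {"type": "guardrail_bypass", "domain": "bio"}
hits = weakened_claims(case, event)
print("incident" if hits else "no incident", [c.claim_id for c in hits])
```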

We argue that this operationalization of an ‘AI incident’ threatening national security helps tread a narrow corridor between missed threats and unnecessary alarms. As an analogy, consider a company handling cybersecurity incidents: too narrow a definition and the company will miss relevant cyberattacks; too wide and it will create unnecessary work for its cybersecurity team; too complicated and it will take longer to determine whether an event counts as an incident, giving attackers more time to cause harm.

‘Rapid Response Phase’: Immediate Containment Action

The second part of our proposal concerns the actions AI developers and governments should take immediately after an incident is discovered. We suggest that, after discovering an incident, AI developers notify the relevant government agency within 24 hours and that, where suitable, this agency coordinates a cross-government containment response. 

For example, suppose an AI developer discovers that a malicious actor is using their AI system to assist in creating a bioweapon. In that case, the AI developer should report this to a government agency. If necessary, this government agency will coordinate containment and response between different government departments or teams—e.g., pulling together different teams with expertise in AI security and safety, bioweapons, and pandemics.
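As a purely illustrative sketch (not part of the proposal’s text), the reporting rule and cross-government coordination could be modelled as a 24-hour notification deadline plus a simple mapping from an incident’s domain to the teams the agency would pull together; all names and the team mapping below are assumptions.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical 24-hour reporting window from the moment the incident is discovered.
REPORTING_WINDOW = timedelta(hours=24)

# Hypothetical mapping from incident domain to relevant government teams.
RESPONSE_TEAMS = {
    "bio": ["AI security & safety", "bioweapons", "pandemic preparedness"],
    "cyber": ["AI security & safety", "national cyber defence"],
}


def report_deadline(discovered_at: datetime) -> datetime:
    """Latest time at which the developer should notify the agency."""
    return discovered_at + REPORTING_WINDOW


def route_incident(domain: str) -> list:
    """Teams the agency would pull together for this incident domain."""
    return RESPONSE_TEAMS.get(domain, ["AI security & safety"])


discovered = datetime(2025, 1, 10, 9, 0, tzinfo=timezone.utc)
print("Notify by:", report_deadline(discovered).isoformat())
print("Teams:", route_incident("bio"))
```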

‘Hardening Defenses Phase’: Countering Future Threats to National Security

Finally, the third part of the proposal concerns countering future threats to national security, drawing on lessons from investigating the incident. We suggest that a government agency should have the authority to investigate the cause of an incident where it deems this appropriate, along with the power to request whatever information is necessary to complete the investigation – e.g., grey-box and some white-box access to the problematic AI system. We also suggest that, after the investigation, the government agency should, where appropriate, make recommendations for changes to AI providers’ security and safety procedures. Further, we argue that the agency should have the authority to require AI providers to implement these recommendations in order to counter future threats to national security.

For example, in the aforementioned bioweapon incident, the government agency may demand documentation and access to the AI system from the developer to determine how the malicious actor overcame the safety guardrails. Following this investigation, the agency may require the developer to amend its jailbreak classifiers to prevent similar threats to national security from arising in the future.


Read the full paper here.
