How the UK AI Safety Summit Can Make AI Safer: A Single Policy Proposal
On 1-2 November 2023, the UK government will host the first global AI Safety Summit at Bletchley Park. The government has stated that its goal for the summit is to “focus on risks created or significantly exacerbated by the most powerful AI systems, particularly those associated with the potentially dangerous capabilities of these systems” and to “inform rapid national and international action at the frontier of Artificial Intelligence (AI) development”.
Several countries and the EU are already taking action to regulate AI, especially in light of growing expert consensus on the catastrophic risks that some AI development can pose. In September 2023, the EU Commission explicitly recognised AI as an extinction risk. The US government has taken initial steps by holding Senate hearings with key researchers and CEOs of leading AI companies to collect evidence about extreme risks from advanced AI. In these hearings, Anthropic CEO Dario Amodei warned that in the medium-term future, AI could “[enable] many more actors to carry out large-scale biological attacks” by helping non-experts access bioweapons production knowledge, which he called a “grave threat to U.S. national security”. In the same hearing, Yoshua Bengio, renowned computer scientist and deep-learning pioneer, said he was worried by the unexpectedly fast progress of current frontier AI models, and called for international collaboration to prevent humans losing control over AI. The UK even created a taskforce purely focused on the safe development of “frontier” AI models, with the goal to “steer the responsible and ethical development of cutting-edge AI solutions”. The Frontier AI Taskforce is headed by tech entrepreneur Ian Hogarth, who urged governments in the Financial Times to “slow down the race to god-like AI”.
Considering the summit’s goals and the magnitude and urgency of the risks posed by some advanced AI development, the UK AI Safety Summit is a unique opportunity to capitalise on this momentum and lay the groundwork for international coordination to regulate AI.
At this point, in early October 2023, the UK has not publicly announced any concrete goal or intended outcome of the Summit. Nonetheless, it is likely the Summit will produce a consensus statement on taking action to mitigate the risks from frontier AI systems, signed by some or all participants, akin to those produced by past international summits on issues such as climate change. Such a consensus statement will be crucial in shaping the trajectory of future international AI safety measures. To make the most of this unique opportunity and set international AI safety collaboration off to a good start, what should this consensus statement contain? In this piece, I recommend three key principles, in increasing order of ambition.
Principle 1: Signatories of the consensus statement should agree on the shared principle of monitoring and regulating the use of computational resources (compute) for advanced AI development. Tracking and controlling access to compute is a practical policy tool for reducing risk from rapidly increasing AI capabilities: compute is a limited resource with a highly specialised, international production chain. It can be quantified by counting the number of floating-point operations (FLOP) used to train a given model. Its use can be tracked and supervised through various measures, including requesting information directly from cloud providers, tracking sales data of GPU units, and requiring know-your-customer (KYC) measures for cloud compute services. Lastly, the number of FLOP used to train an AI model is a reasonable proxy for how powerful the model is, and therefore for how likely it is to exhibit unexpected dangerous capabilities.
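To illustrate how training compute can be quantified in practice, the sketch below uses a widely cited rule of thumb (not from this article): training a dense model takes roughly 6 FLOP per parameter per training token. The parameter and token counts are illustrative figures for a GPT-3-scale model, not official numbers.

```python
def estimated_training_flop(n_params: float, n_tokens: float) -> float:
    """Rule-of-thumb estimate: ~6 FLOP per parameter per training token."""
    return 6 * n_params * n_tokens

# Illustrative figures for a GPT-3-scale model:
# ~175 billion parameters trained on ~300 billion tokens.
flop = estimated_training_flop(175e9, 300e9)
print(f"Estimated training compute: {flop:.2e} FLOP")  # on the order of 10^23
```

Because the estimate depends only on two numbers that labs already know before a training run starts, a regulator could in principle require disclosure of planned compute before training begins, rather than after the fact.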
It is important to note here that even the builders of ever-larger AI models like GPT-4 have no way of knowing in advance what new dangerous capabilities their models will have. And as models grow more powerful, they exhibit more unexpected capabilities, including dangerous ones. By limiting access to compute, we can limit the development of unprecedentedly large models until we better understand how to build and control them safely. If the Summit participants can agree on this first principle of monitoring and regulating the use of compute, they will have taken a significant step towards making AI safe by slowing down the rapid race towards ever more powerful, but poorly understood, AI systems.
Principle 2: Signatories should agree on a concrete compute threshold (measured in FLOP) above which AI development should be highly restricted, monitored, or even prohibited. Adding a concrete FLOP threshold to the commitment to regulate compute access would send a strong signal that prohibiting dangerous AI development is feasible, encouraging more countries and organisations to take action. Furthermore, establishing a concrete number that signatories have agreed on would pave the way towards a binding international treaty on compute control. This threshold could, for example, be set at 10^24 FLOP per model training run. That amount of compute falls between what OpenAI used to train GPT-3 [1], a ChatGPT precursor, and GPT-4 [2], the model behind ChatGPT Plus. Such a threshold would affect fewer than 10 AI providers in the world, and only their largest and riskiest projects.
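As a concrete sketch of how such a threshold rule would classify existing models, the snippet below compares the article's example threshold of 10^24 FLOP against the rough order-of-magnitude training-compute estimates given in the footnotes. The function name and the exact figures are illustrative, not part of any official proposal.

```python
# The article's example threshold: 10^24 FLOP per model training run.
THRESHOLD_FLOP = 1e24

# Rough order-of-magnitude estimates cited in the article's footnotes.
estimated_flop = {
    "GPT-3": 1e23,
    "GPT-4": 1e25,
}

def requires_restriction(training_flop: float) -> bool:
    """Return True if a training run meets or exceeds the agreed threshold."""
    return training_flop >= THRESHOLD_FLOP

for model, flop in estimated_flop.items():
    status = "restricted" if requires_restriction(flop) else "below threshold"
    print(f"{model}: {flop:.0e} FLOP -> {status}")
```

Under this rule, a GPT-3-scale run (around 10^23 FLOP) would fall below the threshold, while a GPT-4-scale run (around 10^25 FLOP) would be subject to restriction, which matches the article's claim that the threshold sits between the two.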
Principle 3: Ideally, signatories would go even further and enshrine in the consensus statement binding next steps towards a multilateral international treaty on monitoring and prohibiting the use of compute above a certain threshold for training frontier AI models.
To achieve any of these goals, from the least to the most ambitious, the UK and other summit participants should focus on building an Alliance of the Willing rather than prioritising bringing everyone into the fold at the expense of committing to concrete measures. This way, safety-minded participating countries can ensure that this exceptional opportunity to build international cooperation on global AI safety is not wasted on a watered-down, directionless consensus statement, as has happened at other international summits in the past. Leading global AI powers like the US and the UK, as well as global regulatory powerhouses like the EU with its AI Act, have already shown that they want to regulate AI to make it safer. The Summit also presents a window of opportunity to bring China into the fold. China has recently adopted strict regulations for its domestic AI industry, far stricter than what is in place in the UK or US, indicating that it may be possible to find common ground on committing to regulation that reduces catastrophic risks from AI.
In short, the governments of global AI leaders should use the Summit to forge an Alliance of the Willing, which should craft and sign a consensus statement that commits to tracking and regulating compute use for frontier AI training runs. Ideally, this consensus statement should include a concrete upper compute threshold (e.g. 10^24 FLOP), and signatories should commit to tangible first steps towards a legally binding international treaty on regulating access to compute for training runs of large frontier AI models.
Image credit: Unsplash
[1] GPT-3 was released in 2020 and trained on approximately 10^23 FLOP. The original ChatGPT was based on the model GPT-3.5, an improved version of GPT-3. The number of FLOP used to train GPT-3.5 is not publicly available.
[2] GPT-4 is OpenAI’s most capable publicly released language model. It was released in March 2023 and trained on approximately 10^25 FLOP, according to an estimate by Epoch AI.