Our current policy positions

The governance team at Apollo Research communicates our technical work to governments (e.g., on evaluations, AI deception, and interpretability) and develops recommendations for international organisations and individual governments around our core research areas.

Over the past six months, we have compiled a portfolio of policy submissions responding to requests for information, tailored recommendations, and advice to ensure the robustness of international governance frameworks underpinned by evaluations. In doing so, we have met and engaged with various senior decision-makers, including individuals at the EU AI Office, the UK AI Safety Institute, Singapore’s Infocomm Media Development Authority, the UN’s High-Level AI Advisory Body, and a range of relevant US institutions.

Below, we are sharing five high-level policy positions that have been important recurrent themes in our thinking and conversations.

  1. Evaluations alone are insufficient. We emphasise that evaluations alone are insufficient to assure the safety of an AI system and that a rigorous ‘defence-in-depth’ framework needs to be implemented. This builds on our thinking in ‘We need a Science of Evals’ and includes communicating on, e.g.:

    • Limitations of currently available evaluation regimes; 

    • Frameworks for repeatable, robust and independent verification and assurance of evaluations and their results; 

    • Apollo Research’s mechanistic interpretability research and the importance of interpretability-based evals.

  2. Conduct evaluations across an AI system’s lifecycle. We emphasise that evaluations and other safety interventions should be leveraged at all risk-relevant stages of an AI system’s lifecycle to continuously assure its safety. This builds on our thinking in ‘A Causal Framework for AI Regulation and Auditing’ and includes communicating on, e.g.:

    • The importance of safety cases and evaluations prior to internal deployment, which can include, e.g., (i) running adequate evaluations before, during, and after training, (ii) detailing implemented and planned restrictions or guardrails, and (iii) plans for monitoring the AI system; 

    • Risk assessments and evaluations that take into account ‘available affordances’ which would change the original risk profile of the AI system.

  3. Connect evaluation protocols and their results to incident databases. We emphasise the benefit of connecting pre-deployment evaluation protocols and their results with structured data collection on incidents and harms arising from the same AI systems once deployed. This includes communicating on, e.g.:

    • Mechanisms for governments to continue investment in effective risk assessment regimes and decommission ineffective interventions (including specific evaluations or evaluators);

    • Using incident data to inform funding and research decisions to close the ‘evaluation gap’ between the pace of capability progress and the development of evaluations to detect those capabilities;

    • Establishing whistleblower channels and bounty schemes.

  4. Build and support a flourishing third-party evaluation ecosystem. We emphasise that the third-party evaluation ecosystem is still in its early stages of development. This makes the current moment opportune to consciously shape the ecosystem’s incentive structure and strengthen its ability to meaningfully support governance frameworks tied to evaluations. This is the backbone of some of our public writing and includes communicating on, e.g.:

    • Professionalisation of the field, including through (i) establishing a code of conduct for evaluators, (ii) developing shared norms for evaluation methods, (iii) setting up a network of selected evaluators, and eventually (iv) certifying third-party evaluators;

    • Mechanisms to oversee and ‘evaluate the evaluators’.

  5. Leverage evaluations and capability monitoring to foresee and assess threats that necessitate timely and coordinated action at an international level. We emphasise the importance of global coordination and robust information exchange to monitor AI capability progress and identify emerging threats. This builds on our submission to the UN High-Level Advisory Body on AI and includes communicating on, e.g.:

    • Monitoring and tracking efforts that could be leveraged to (i) inform internationally relevant threat modelling and assessment work, (ii) feed into appropriate governance interventions such as cross-border licensing regimes or international treaties, and (iii) enable rapid, coordinated responses to catastrophic vulnerabilities or critical incidents;

    • Foresight opportunities and interventions, such as developing clearer ‘red lines’ on an international scale.


Between now and the end of the year, we plan to expand our research capacity on a selection of the topics above, including through collaboration with other organisations in the field. This will support our future engagements and, where appropriate, result in reports and other publications.

About Us: Apollo Research is a non-profit research organisation specialising in evaluations for dangerous capabilities. Our current focus is on evaluating the capability of AI systems to evade human oversight and control, for example by deceiving their users or designers, and the prerequisites to this capability, such as situational awareness.

We conceptualise deception as a horizontal layer underpinning other risks and capabilities an AI system may have, amplifying and obfuscating them. Concretely, an AI system with deceptive abilities could behave deceptively while being tested for other capabilities, such as cyber-offensive capabilities, thereby compromising the efficacy and reliability of safety assessments. This makes our work sector-agnostic and salient across a range of risk scenarios.

Please contact us if you are interested in learning more: info@apolloresearch.ai
