Blog

Marius Hobbhahn 03/07/2025 Marius Hobbhahn 03/07/2025

Research Note: Our scheming precursor evals had limited predictive power for our in-context scheming evals

We tested whether our precursor evaluations were predictive of our scheming evaluations. We found that they were not sufficiently predictive for high-stakes situations

Marius Hobbhahn 19/06/2025 Marius Hobbhahn 19/06/2025

More capable models are better at in-context scheming

We evaluate models for in-context scheming using the suite of evals presented in our in-context scheming paper (released December 2024) with the most capable new models.

Marius Hobbhahn 17/03/2025 Marius Hobbhahn 17/03/2025

Claude Sonnet 3.7 (often) knows when it’s in alignment evaluations

We evaluate whether Claude Sonnet 3.7 and other frontier models know that they are being evaluated.

Marius Hobbhahn 24/02/2025 Marius Hobbhahn 24/02/2025

Forecasting Frontier Language Model Agent Capabilities

We present a new forecasting technique to predict frontier LM agent capabilities ahead of time.

Marius Hobbhahn 23/01/2025 Marius Hobbhahn 23/01/2025

Demo example - Scheming reasoning evaluations

A brief demonstration video show-casing a representative example of our in-context scheming evaluations.

Marius Hobbhahn 13/12/2024 Marius Hobbhahn 13/12/2024

Apollo 18-month update

Apollo Research is now 18 months old. You can read our latest update here.

Marius Hobbhahn 13/11/2024 Marius Hobbhahn 13/11/2024

Apollo is adopting Inspect

Apollo is adopting Inspect as its evals framework. We will contribute features and potentially example agent evals to Inspect and look forward to work with the Inspect community

Marius Hobbhahn 11/11/2024 Marius Hobbhahn 11/11/2024

The Evals Gap

The quality and quantity of evals required to make rigorous safety statements could outpace available evals. We explain “the evals gap” and what would be required to close it.

Marius Hobbhahn 15/10/2024 Marius Hobbhahn 15/10/2024

An Opinionated Evals Reading List

A long reading list of evals papers with recommendations and comments by the evals team.

Ch Stix 21/06/2024 Ch Stix 21/06/2024

Our current policy positions

In this post, we share five high-level policy positions that have been important recurrent themes in our thinking and conversations with international decision-makers.

Marius Hobbhahn 29/05/2024 Marius Hobbhahn 29/05/2024

The first year of Apollo Research

A summary of what we have achieved in our first year and what we plan to do in the future.

Marius Hobbhahn 04/04/2024 Marius Hobbhahn 04/04/2024

Black-Box Access is Insufficient for Rigorous AI Audits

We were delighted to collaborate on the paper “Black-Box Access is Insufficient for Rigorous AI Audits.”

Guest User 21/03/2024 Guest User 21/03/2024

Our work advancing scientific understanding to foster an effective international evaluation ecosystem

Evaluations of AI systems for dangerous capabilities and misalignment with human intentions prove increasingly central to emerging international governance regimes. In this post, we discuss how we aim to support this through knowledge-sharing and upskilling among evaluation stakeholders.

Marius Hobbhahn 22/01/2024 Marius Hobbhahn 22/01/2024

We need a Science of Evals

In this post, we argue that if AI model evaluations (evals) want to have meaningful real-world impact, we need a “Science of Evals”, i.e. the field needs rigorous scientific processes that provide more confidence in evals methodology and results.

Chris Akin 08/01/2024 Chris Akin 08/01/2024

A starter guide for Evals

This is a starter guide for model evaluations (evals). Our goal is to provide a general overview of what evals are, what skills are helpful for evaluators, potential career trajectories, and possible ways to start in the field of evals.

Chris Akin 13/11/2023 Chris Akin 13/11/2023

Theories of Change for AI Auditing

In this post, we present a theory of change for how AI auditing could improve the safety of advanced AI systems. We describe what AI auditing organizations would do; why we expect this to be an important pathway to reducing catastrophic risk; and explore the limitations and potential failure modes of such auditing approaches.

Chris Akin 11/10/2023 Chris Akin 11/10/2023

Recommendations for the next stages of the Frontier AI Taskforce

We recently composed and shared two policy recommendations with the Frontier AI Taskforce (“Taskforce”), in light of their mission to “ensure the UK’s capability in this rapidly developing area is built with safety and reliability at its core”. Our recommendations address the role of the Taskforce as a potential future (1) Regulator for AI Safety, and encourage the Taskforce to (2) Focus on National Security and Systemic Risk.

Chris Akin 04/10/2023 Chris Akin 04/10/2023

The UK AI Safety Summit - our recommendations

Apollo Research is an independent third-party frontier AI model evaluations organisation. We focus on deceptive alignment, where models appear aligned but are not. We believe it is one of the most important components of many extreme AI risk scenarios. In our work, we aim to understand and detect the ability of advanced AI models to evade standard safety evaluations, exhibit strategic deception and pursue misaligned goals.

Chris Akin 25/09/2023 Chris Akin 25/09/2023

Understanding strategic deception and deceptive alignment

We want AI to always be honest and truthful with us, i.e. we want to prevent situations where the AI model is deceptive about its intentions to its designers or users. Scenarios in which AI models are strategically deceptive could be catastrophic for humanity, e.g. because it could allow AIs that don’t have our best interest in mind to get into positions of significant power such as by being deployed in high-stakes settings. Thus, we believe it’s crucial to have a clear and comprehensible understanding of AI deception.

Chris Akin 26/07/2023 Chris Akin 26/07/2023

Security at Apollo Research

A brief overview of Security at Apollo Research