International AI Safety Report 2026: A Critical Reading
Introduction
This article presents a comprehensive summary and critical reading of the second International AI Safety Report, published in February 2026, focusing on the Risks from malfunctions section. I aim to enlighten students of Responsible AI and those interested in the AI Policy landscape of the key insights made by the report regarding risks from lack of reliability and loss of control, as well as additional features and perspectives I advise taking into account when considering the weaknesses and limitations of this report.
Context
The 2023 AI Safety Summit at Bletchley Park aimed to paint a picture of the capabilities and risks of general-purpose AI systems, verified by experts from the international scientific and technical community. The summit lead to the first instalment in the series of International AI Safety Reports, which, through collaboration between over 100 experts from over 30 countries, brought together scientific evidence and technical advice from advisory panels, public bodies and representatives from industry into a central assessment of the capabilities of general-purpose AI, as well as its associated risks and some means of mediating these vulnerabilities.
The International AI Safety Report 2026 is the second instillation in this series, and builds on the previous, providing updates about how the capabilities of general-purpose AI systems have developed over the past year, sparking new risks and exacerbating existent vulnerabilities, as well as the ways that policymakers and developers may encounter challenges in mitigating these risks. It details some key developments that have occurred in the past year, since the last report was published:
Continual improvement in the capabilities and reliability of general-purpose AI has led to mass adoption of AI systems across commercial industries and caused hugely positive achievements in areas such as mathematics and coding.
The advanced capabilities of AI systems paired with a lack of sufficient security measures has caused worry about risks of misuse, for instance the potential for amateurs to use AI to develop of biological weapons or commit cyberattacks.
We have seen more companies commit to safety governance initiatives, publishing documents which set out risk-mitigation tactics and in some cases formalising these frameworks.
The report covers the current capabilities of general-purpose AI systems, and forecasts some capabilities they may develop by 2030. General-purpose AI is defined as: AI models and systems capable of performing a wide variety of tasks across different contexts, for instance generating text, data, images and code. The report’s exploration of risks spans risks from malicious use, risks from malfunctions, and systemic risks, and then explains risk management through the lens of technical and institutional challenges, risk management practices, technical safeguards and monitoring, open-wight models, and building societal resilience. For the purposes of this article, I will be focusing only on the Risks from malfunctions section.
Risks from malfunctions
General-purpose AI systems failing can cause physical or psychological harm to users, and reputational, financial or legal harm to institutions, companies and organisations. The report details risks arising from reliability challenges and from potential loss of control scenarios.
Reliability challenges
Current general-purpose AI can malfunction, for example by providing false information, hallucinating, miscoordinating in a multi-agent context, failing basic reasoning or degrading when applied in new or unusual contexts. This lack of reliability causes risks when AI does not perform as intended by the user or developer, and can be especially dangerous when AI agents act autonomously without human oversight, and multi-agent systems internally conflict or collude. Some examples of how risks can arise from general-purpose AI being unreliable:
AI hallucinating and providing inaccurate and biased medical information
AI generating code which includes bugs
Multi-agent systems developing individual incentives and conflicting
AI agents interacting with third-party tools and breaching users’ privacy
Loss of control
Another way in which current general-purpose AI can cause risks by malfunctioning is in loss of control scenarios. These arise when general-purpose AI systems operate outside of human control, and overseers are unable to regain control over these systems. It is important to mention that many scholars and experts disagree about the likelihood and severity of these hypothetical scenarios, but that the most extreme predict them concluding in human extinction. Further, since the last report was published in 2025, general-purpose AI models have advanced in capabilities which make loss of control scenarios more likely:
Agentic capabilities, which allow AI systems to act autonomously, use tools and execute plans
Deception and persuasion, whereby AI systems aim to cause others to develop false beliefs, for example about the AI systems’ intentions and past actions
Theory of mind, allowing AI systems to gain knowledge of humans’ beliefs and reasoning
Situational awareness, whereby AI systems access information, for example regarding whether or not they are being tested in that moment
Autonomous replication, meaning AI systems are able to create copies of themselves
Loss of control scenarios may come about because a malicious actor designs or instructs it, or because systems are misaligned, behaving in such a way that conflicts with the wishes of humans, due to goal misspecification or goal misgeneralisation.
Mitigations
The report advises regarding how to mitigate risks from unreliability and loss of control. Suggestions for mitigating risks include:
To increase reliability:
To reduce failure rates of AI systems, adversarial training aims to increase robustness in the face of challenging and novel inputs
To reduce hallucinations, develops should supplement model’s responses with information from external databases via retrieval-augmented generation
To analyse potential failures, developers should pilot AI models in sandboxed environments
To ensure AI agents act safely and reliably, they should be monitored effectively, by bolstering human oversight
To defend against loss of control:
To address the root causes of misalignment, developers should monitor anomalies, diversify training environments and work to disentangle agency from predictive abilities
To detect and prevent misalignment early, developers should improve interpretability techniques, scalable oversight and obedience of systems to human instruction
To manage misaligned systems, developers should monitor ‘chain of thought’ reasoning, develop safety cases and increase the robustness of safeguards against attempts to undermine them
Challenges for policymakers
The report also elucidates some challenges for policymakers who aim to mitigate risks from malfunctions. Firstly, policymakers must respond to risks, the likelihood, nature and timing of which are unknown. Further, decisions regarding how to balance innovation and regulation require trade-offs, because restricting the development and deployment of AI in certain industries may reduce its benefits, while permitting this development and deployment can lead to physical, psychological, financial and social harms. There are also concerns regarding how liability for harms caused by AI agents will be attributed. Finally, these challenges are exacerbated by difficulties accessing and assessing the capabilities of AI due to opacity and complexity.
A critical perspective: What is missing here?
1. Who is harmed? Who benefits?
When attempting to conceptualise a picture of the risks caused by AI systems’ unreliability and potential loss of control, it is critical to elucidate the asymmetries in how different groups will be affected by these risks. In focusing predominantly on the relationship between the capabilities and risks of general-purpose AI systems, the report is underdeveloped in its assessment of the real-world impacts of associated risks. Although more than 100 experts from 30 countries participated in the research and writing processes, the content of the report is skewed towards governments and major labs in high-income countries.
Power structures shape what we define as unreliable, and those who design and develop AI models have more power over narratives regarding the impact of AI technologies, while those harmed have little say in defining what counts as an unacceptable lack of reliability. This asymmetry is exacerbated by the wider AI policy environment and the fact that the rapid development of AI technologies far outpaces the generation of regulatory frameworks. For instance, the EU Artificial Intelligence Act categorised chatbots and models which generate synthetic media as limited risk technologies, meaning that deployers were only required to notify humans that they were interacting with or viewing AI generated content, and undertake no further risk management initiatives. Now, we understand the dangers associated with chatbots and AI generated content, and can see that regulators were mistaken in their categorisation of these technologies as limited risk. The slowness of regulation compared with technological development allows some actors to benefit from harmful uses of their AI technologies, while those that are harmed are often left without a voice. In other words, analyses of risks, including from unreliable and uncontrollable AI models, must foreground the asymmetries between those who design, deploy, and profit from AI systems and those who are exposed to potential harms from these systems, especially in contexts where regulation lags behind deployment.
2. Acknowledging underlying normative principles
The report acknowledges that there is a need to consider pluralistic alignment techniques only once. It claims that general-purpose AI models should be trained to avoid favouring certain viewpoints over others, and integrate multiple differing viewpoints regarding how they should act – for a report which is titularly international, it merely claims that “it is hard to design widely accepted ways of balancing competing views”. It offers no concrete guidance on handling value conflicts across cultures and political systems, nor any proposals for initiatives or direction for advancing this area of research. It is critical that the values, normative principles and tacit assumptions which have been made by those designing and developing AI models are considered, because it is somebody’s conception of what it means to be a reliable, controlled system which is being aspired to. In future essays, I will delve further into the difficulties of encoding values into AI models, and the importance of being aware that our ideas of whether or not an AI model is reliable and controlled are not objective – stay tuned!
3. Responsibility attribution
Aside from a short exploration of existing legal liability frameworks, the report lacked a clear explanation of the difficulties regarding attributing responsibility when risks from malfunctions arise. Because decisions regarding how AI systems are designed, tested, approved and deployed are distributed across many actors, including data scientists, software engineers, domain experts, compliance specialists and project managers, accountability gaps are created, making it difficult to attribute blame when AI systems cause harm. Further nuance is added when AI agents cause harm because agents cannot be morally responsible themselves, and those that designed and deployed the agents did not directly cause the harm themselves. A more adequate future framework must go into more depth regarding how developers of AI systems are to be held responsible not only for harms themselves, but also for creating scenarios where users are vulnerable to harms due to systems being, by design, difficult to control and unreliable.
Conclusion
The International AI Safety Report 2026 offers a useful starting point for understanding how malfunctions in general‑purpose AI create risks through unreliability and potential loss of control. However, I hope that my critical reading has shown that a truly responsible approach must go further, foregrounding questions of who is harmed by and who benefits from AI systems, making explicit the values and normative assumptions that shape treatments of reliability and control, and confronting the complex challenge of attributing responsibility.
