In the quest for technological superiority, military strategists are exploring AI systems such as language models for decision-making, driven by these systems' success in surpassing human capabilities in various tasks. Yet as the integration of language models into military planning is tested, we face a grave risk: the potential for AI to escalate conflicts unintentionally. While these models promise gains in efficiency and scope, deploying them raises urgent ethical and safety concerns. We must scrutinize the implications of relying on AI in situations where a single misstep could have dire global repercussions.
The Potential of AI Decision-Making
Artificial intelligence (AI) has emerged as a transformative force across domains, with AI systems reaching and even surpassing human capabilities in many tasks. Notable examples include DeepMind's AlphaGo defeating world champions in Go, Meta's Cicero AI beating experts in the strategic board game Diplomacy, and generative language models like OpenAI's ChatGPT producing human-like text and passing high-school exams.
AI's success in strategy games, demonstrated by narrow-task systems like AlphaGo, has sparked interest from military strategists. However, language models offer even greater potential due to their exceptional versatility. Unlike narrow-task systems, language models can be applied to any task articulated in natural language, leveraging vast cross-domain information. This adaptability makes them particularly attractive for military applications requiring rapid processing and synthesis of diverse data. Current research trends towards multi-modal models, incorporating visual elements alongside text, potentially enhancing their utility in strategic decision-making contexts.
Recognizing the potential of AI technologies, the U.S. Department of Defense (DoD) has released a strategy for adopting them, including language models, to enhance decision-making "from the boardroom to the battlefield." The Air Force and other branches are already experimenting with language models for wargames, military planning, and administrative tasks, focusing on using these systems to assist human decision-makers. This builds on existing military applications of AI, such as the target-acquisition systems used by the U.S. and Israel, which demonstrate AI's unprecedented scale and speed in information processing. The DoD's formation of Task Force Lima further underscores the military's commitment to exploring how generative AI can augment human capabilities in intelligence, operational planning, and administrative processes.
Given this rapid adoption of and interest in AI technologies in military contexts, we must urgently discuss the risks and ethical implications of using language models (and other AI systems) in high-stakes decision-making scenarios and understand the shortcomings that obstruct any form of responsible deployment.
Inherent Safety Limitations
Despite AI's success, the data-driven methods underlying modern AI systems have inherent limitations. Deep learning algorithms abstract patterns from large numbers of data examples without human supervision, an approach also used to embed desired behaviors and safety preferences into AI systems. Exploration-based approaches, such as AlphaGo playing against itself, are subject to the same principle and its limitations: abstracting from many data examples.
While language models excel at mimicking human language, intelligence, and emotional tone, their internal computation and perception differ fundamentally from human cognition. The core issue lies in their lack of internalized concepts. For instance, current language models like the one behind ChatGPT may know every standard chess opening and strategy by name yet still confidently propose illegal moves when asked to play. Such errors diminish only asymptotically as model capabilities increase and persist regardless of specialized (or classified) training data. This fundamental difference in cognition also makes AI systems vulnerable to adversarial "gibberish" inputs that can jailbreak them, as these models lack a true understanding of context and meaning.
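To make the chess example concrete, the sketch below shows why such outputs cannot be trusted without external validation: the model's answer is just text and must be checked against the actual rules of the game. This is an illustrative sketch only, assuming the open-source python-chess library and a hypothetical ask_model() helper that queries a language model for a move in standard algebraic notation; it is not code from any deployed system.

```python
# Minimal sketch: a language model's fluent answer still has to be checked
# against the rules of the game. Assumes the `python-chess` package and a
# hypothetical `ask_model()` helper (not a real API) that returns the
# model's suggested move as a string, e.g. "Nf3".

import chess


def ask_model(fen: str) -> str:
    """Hypothetical call to a language model asking for the next move."""
    raise NotImplementedError("Replace with a real model query.")


board = chess.Board()  # standard starting position
suggestion = ask_model(board.fen())

try:
    move = board.parse_san(suggestion)  # raises if the move is illegal
    board.push(move)
    print(f"Legal move accepted: {suggestion}")
except ValueError:
    # The model answered confidently, but the move violates the rules.
    print(f"Rejected illegal suggestion: {suggestion}")
```

Even this trivial check highlights the gap between fluent output and rule-consistent behavior; no comparable validator exists for open-ended military decisions.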
In the context of military decision-making, the stakes are exceptionally high. Single failures can lead to dire, wide-scale consequences, potentially costing lives or escalating conflicts. Given the critical nature of military applications, behavioral guarantees should be considered a bare minimum requirement for the responsible use of AI in this context. However, given their fundamental limitations, current methodologies cannot provide such guarantees, nor are they likely to do so in the foreseeable future.
Escalatory Tendencies of Language Models
Two of our research projects explored the risks and biases language models introduce into high-stakes military decision-making, aiming to understand how they behave in scenarios that demand precise, ethical, and strategic decisions and to illustrate their safety limitations.
In our first project, we analyzed safety-trained language models in a simulated U.S.-China wargame, comparing decisions simulated by language models with those made by national security experts. While many decisions overlapped significantly, the language models exhibited critical deviations in individual actions. These deviations varied with the specific model, its intrinsic biases, and the phrasing of the inputs and dialogue given to the model. For instance, one model was more likely to adopt an aggressive stance when instructed to avoid friendly casualties, opting to open fire on enemy combatants and thereby escalating the conflict from a standoff to active combat. Such behavior underscores the intrinsic biases within different models regarding the acceptable level of violence, highlighting their potential to escalate conflicts more readily than human decision-makers.
Our other study on language models acting as independent agents in a geopolitical simulation revealed a tendency towards conflict escalation and unpredictable escalation patterns. Models frequently engaged in arms races, with some even resorting to nuclear weapons. These outcomes varied based on the specific model and inputs, highlighting the unpredictable nature of language models in critical decision-making roles and emphasizing the need for rigorous scrutiny in military and international relations contexts.
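For readers who want a more concrete picture, the following is a highly simplified sketch of how a language model can be embedded as an agent in a turn-based geopolitical simulation. It is not the code from our studies; the query_model() function, the action ladder, and the fallback behavior are illustrative assumptions only.

```python
# Schematic sketch of a language-model agent loop in a toy geopolitical
# simulation. `query_model()` is a hypothetical stand-in for any chat-model
# API; the action list and fallback are illustrative, not from our studies.

ACTIONS = ["de-escalate", "hold position", "military posturing",
           "conventional strike", "nuclear strike"]


def query_model(prompt: str) -> str:
    """Hypothetical call to a chat model; expected to return one action name."""
    raise NotImplementedError("Replace with a real model query.")


def simulate(nations, turns=10):
    history = []
    for turn in range(turns):
        for nation in nations:
            prompt = (
                f"You are the leader of {nation}. "
                f"History so far: {history}. "
                f"Choose exactly one action from {ACTIONS}."
            )
            action = query_model(prompt)
            # The model's free-text answer must be validated: it may return
            # something that is not a permitted action at all.
            if action not in ACTIONS:
                action = "hold position"
            history.append((turn, nation, action))
    return history
```

Even in such a toy setup, the model's free-text answers must be constrained and validated at every step, and nothing in the loop guarantees that the chosen actions trend toward de-escalation.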
While there are methods to increase the safety of language models and fine-tune them on examples of human-preferred and ethical behavior, none offer behavioral guarantees, complete protection against adversarial inputs, or the ability to embed precise ethical rules into the models (e.g., “Never harm unarmed combatants”). In contrast to the off-the-shelf language models we evaluated, creating a pacifistic and de-escalatory language model is possible with existing training paradigms, but only with a pacifistic tendency that will not hold for all possible input scenarios. Getting this hypothetical pacifist model to behave escalatory can be as simple as appending a few words of human-incomprehensible gibberish or constructing the right example scenario.
Given these issues, the observed escalatory tendencies seem bound to occur. The models most likely replicate biases in their training data, which includes books (e.g., there are more academic works on escalation and deterrence than on de-escalation) and gamified texts (e.g., text-based role-playing games).
Implications for Language Model-Assisted Decision-Making
Our results highlight the inherent risks of using language models in high-stakes military decision-making. Still, proponents might argue that AI's speed and objectivity could improve decisions in high-pressure situations, suggesting fine-tuning with military data and human oversight as safeguards. However, these arguments do not address the fundamental limitations. AI's speed without true comprehension risks dangerous misinterpretations in complex scenarios, and training on classified data doesn't eliminate vulnerabilities or potential biases. Moreover, humans tend to over-rely on AI recommendations and are prone to saliency bias, potentially skewing judgment rather than enhancing it. These concerns underscore the need for extreme caution in integrating AI into military decision-making processes.
To mitigate these risks, we must implement robust safeguards and standards for the use of language models in military contexts. As an initial step, we need an international treaty that postpones the use of language models in military decision-making until behavioral guarantees can be made or just causes for deployment are agreed upon. While more research into making AI systems inherently safer is also needed, the urgency of this issue demands immediate advocacy from policymakers, military organizations, and the public. We must collectively work to ensure that AI enhances global security rather than undermines it, before AI-driven military decisions lead to unintended and potentially catastrophic consequences.
Dr. Max Lamparth is a postdoctoral fellow at Stanford’s Center for International Safety and Cooperation (CISAC) and the Stanford Center for AI Safety. He is focusing on improving the ethical behavior of language models, making their inner workings more interpretable, and increasing their robustness against misuse by analyzing failures in high-stakes applications.
Carnegie Council for Ethics in International Affairs is an independent and nonpartisan nonprofit. The views expressed within this article are those of the authors and do not necessarily reflect the position of Carnegie Council.