During the Carnegie Ethics Accelerator’s November 8, 2023 convening on AI in diplomacy, participants co-created potential scenarios, set within the next five years, arising from the use of AI for translation, research, and ideation tasks. The scenarios below are inspired by that working session and were developed in conjunction with a communiqué: The Trade-offs of AI in Diplomacy.
Drivers are key technological, economic, societal, and cultural factors that increase the likelihood that a scenario will become a reality. Confidence scores reflect the estimated likelihood of a scenario: a score of 0 indicates that the scenario is impossible, a score of 0.5 suggests that it is just as likely to occur as not, and a score of 1 indicates that it is certain to occur.
Translation & Interpretation Scenarios
Commentary by Mucktarr Darboe MY
Scenario 1
Tess Baker, a diplomat from the Netherlands, is tasked with leading a multilateral negotiation aimed at finalizing a trade agreement between her country and Greece. During the negotiation, a large language model (LLM) is used to translate the discussion in real time. Due to several translation errors and miscategorizations caused by the system, confusion about the stipulations of the agreement spreads through the room, prompting further debate and shaking stakeholders’ confidence in the process.
Drivers:
- Complex language and terminology
- Contextual and sociocultural sensitivities
- Technological limitations (real-time system constraints)
- Negotiation text ambiguity
- Lack of feedback loops
Comments:
Tess Baker's scenario highlights the challenges that can arise when using LLMs for real-time translation in multilateral negotiations. Complex language and terminology, contextual and cultural nuances, and the pressure of real-time constraints and system limitations are all barriers that contribute to potential errors and misunderstandings. Additionally, ambiguity in the negotiation text and the lack of feedback loops for translation improvement have further complicated the process, leading to confusion among stakeholders. Still, with careful consideration of these barriers and proactive measures, such as using trained translators and providing contextual information to the translation system, the risks associated with real-time translation using LLMs can be mitigated.
Confidence Score: 0.75
Scenario 2
Selam Hailu is a young translator working at Canada’s diplomatic mission in Ethiopia. With a background in software engineering, he oversees the rollout of a project that uses LLMs to translate thousands of documents from Ethiopian media, news, and intelligence sources each day for Canadian diplomats’ daily briefs. In testing, the system achieves 99.5 percent translation accuracy. Hailu’s team members, who are proficient in Amharic, Oromo, French, and English, then manually review outputs and correct the errors they find. According to a three-month study by an external audit firm, the new workflow frees up 30 percent more time and resources for Hailu’s team and increases overall mission performance.
Drivers:
- Background in software engineering
- Manual review by proficient team members
- External audit firm study
- Use of LLMs for daily translation
- Increased efficiency and resource allocation
Comments:
Hailu's software engineering background enables him to oversee the LLM project effectively, ensuring its seamless integration into the mission's workflow. Manual review by his proficient team members adds a critical layer of quality control, augmenting the accuracy of translations. The external audit firm's study provides objective evidence of the project's success, boosting stakeholders' confidence. The decision to use LLMs for daily translation significantly improves efficiency, as demonstrated by the high translation accuracy achieved. This, coupled with the manual review process, frees up 30 percent more time and resources for Hailu's team, leading to overall improvements in mission performance.
Confidence Score: 0.85
Research Scenarios
Commentary by Eduardo Albrecht
Scenario 1
Two countries agree to engage in nuclear negotiations with the goal of systematically reducing their stockpiles. One of the countries deploys an LLM to summarize previous international negotiations and agreements and to predict possible outcomes of different proposed negotiation tactics. The model hallucinates, generating fictional information about tactics and those tactics’ causal relationships to successes from the Strategic Arms Limitation Talks (SALT I and II) and the Joint Comprehensive Plan of Action (JCPOA) negotiations. Those recommended tactics are employed, causing negotiations to fail.
Drivers:
- Automation bias, the tendency to trust computer output more than human judgement
- Careless, disjointed, and disorganized use of LLMs among diplomatic staff
- Poor understanding of why a model needs to hallucinate to be effective
Comments:
On driver three, it is important to understand that LLMs are purposefully built with the express intention to hallucinate. The idea is for them to mimic human behavior and creativity, and therefore they must have a propensity for imprecision; the closer we get to artificial general intelligence (AGI), the more apparent this will become. For example, we want AI to be able to write poetry, and the burden will be on us to accept the imprecision that entails. I do not see this scenario as likely, as there are plenty of safeguards in place before such errors could cause negotiations to fail.
Confidence Score: 0.25
Scenario 2
Multiple states in Oceania enter environmental negotiations with the aim of preventing biodiversity loss. Charlotte Wilson leads a research team that’s responsible for analyzing vast amounts of data and policy documentation on species populations, habitat loss, climate patterns, ecosystem changes over time, and environmental legislation. She uses an LLM to examine a dataset compiled and inspected by members of her team, which identifies a previously overlooked correlation between subsidizing environmental education and growing regional biodiversity. This insight provides the basis for a new agreement centered around educational reform.
Drivers:
- The emergence of domain-specific LLM tools specialized in certain knowledge areas; these could come from the academic community and/or the private sector
- The fusion of LLMs with other types of statistical computing and machine learning (ML) approaches that can analyze nontextual data (e.g., biodiversity statistics) alongside textual data
- Novel agentic AI interfaces that permit the interaction of non-data science experts with complex statistical systems
Comments:
To make this work, several moving parts must come together. First, to identify meaningful correlations, LLMs need to become better honed on specific domain knowledge, that is, comprehension of the concepts and theories of a particular field, such as climate science in this case. Second, to mine correlations, LLMs will need to be integrated with other tools that are better at that task and that rely on structured numerical data for pattern extraction. Third, LLMs will not be effective as stand-alone solutions; to tease out correlations like the one above, they will need a smooth interface with humans, as would be provided by a tailored AI agent.
Confidence Score: 0.95
Ideation & Prediction Scenarios
Commentary by Pavlina Ittelson and Sorina Teleanu
Scenario 1
A federal government licenses an AI system to simulate the economic effects of policy actions. The model is trained on data from fiscal policy documents, trade and tariff policies, labor market directives, R&D briefs, and intellectual property patents. After conducting its analysis, the system returns a scenario in which tighter border restrictions lead to lower unemployment and increased job growth. A jingoistic wing of the legislative body uses the simulation as the basis for enacting a new restrictive immigration policy just weeks before one of its neighboring countries experiences a refugee crisis. This leads to chaos at the border and reinforces growing tensions between states in the region.
Drivers:
- Federal government licensing of the AI system. The federal government sets the boundaries for the design, use, and implementation of the AI system in simulating the economic effects of policy actions. The scenario does not outline which economic effects of policy actions are to be simulated, or to what extent policy actions are to be informed by the AI simulations.
- AI system. The scenario does not include AI training on all relevant data sets covering employment, migration, available skill sets, and so on; the AI system accounts only for national-level indicators and policies. Information is missing on the link between automation and prediction, the correlation between border restrictions and employment, and cross-border policy impacts.
- Jingoistic wing of the legislative body and its ability to implement restrictive immigration policy based on AI simulations.
Confidence Score: 0.5
Scenario 2
A multinational South American nonprofit has been using an AI system to forecast potential storms and climate-related disasters. The system, supported by LLMs repurposed for time-series forecasting, draws on satellites, weather sensors, and data from other monitoring devices to track weather conditions, rainfall, sea surface temperatures, cloud formations, and more. Using predictive analysis, the model issues a high-confidence prediction of flooding in heavily populated regions and generates a list of policy recommendations to mitigate the flood’s impact. The nonprofit compiles a report for local governments and briefs officials, who come to an agreement that preparations should be undertaken. Collective resources are then allocated to address increased climate refugee movement across borders. As a result of this agreement, loss of life and property during the weather event is minimized significantly.
Drivers:
- Multinational nonprofit based in South America. It is unknown how this nonprofit operates at a multinational level and how it connects to local governments. The unknowns include legal, cultural, and intergovernmental aspects of nonprofit operation that would have an impact on the outcome of this scenario.
- Relationship between government and other stakeholders. The scenario assumes a high level of trust between the local government and multinational nonprofit organizations. Additionally, it assumes that there is a channel for inputting nonprofit findings into policymaking procedures, to the extent that the government would implement policy changes based on the nonprofit's recommendations.
- Allocation and availability of financial and human resources. The scenario assumes that the financial and other resources to be allocated to combat flooding and related migration are available and that there is an overall international agreement on how such resources should be allocated.
Comments:
Assuming all enabling elements are in place (trust, resources, etc.), our confidence score would be closer to 1. AI can certainly make such predictions, and if all other elements are in place, the scenario is highly likely.
Confidence Score: 0.5