Responsible AI & the Ethical Trade-offs of Large Models, with Sara Hooker

Jul 31, 2024 64 min listen

In this episode of the Artificial Intelligence & Equality podcast, Senior Fellow Anja Kaspersen speaks with Sara Hooker, head of Cohere for AI, to discuss her pioneering work on model design, model bias, and data representation. She highlights the importance of understanding the ethical trade-offs involved in building and using large models and addresses some of the complexities and responsibilities of modern AI development.


ANJA KASPERSEN: As artificial intelligence (AI) continues to reshape narratives and paradigms, I am pleased to welcome Sara Hooker. Sara is the founder, vice-president of research, and head of Cohere for AI, a research lab dedicated to solving complex machine learning (ML) problems and conducting fundamental research that explores the unknown. A link to Sara’s impressive bio, her significant body of research, and her podcast exploring the known and unknowns of machine learning can also be found in the transcript of the podcast.

Welcome, Sara. It is a pleasure to have this conversation with you.

SARA HOOKER: It’s lovely to be here.

ANJA KASPERSEN: Before we dive into the complexities—as you would say, the “known and unknowns of machine learning and AI”—and maybe dismantle a few misconceptions along the way, could you share what inspired you to enter this field? What pivotal moments or key influences brought you here and also to adopt this very specific outlook that I was just referring to?

SARA HOOKER: My career has always been driven by answering interesting questions. I am a computer scientist by training, but I would say a lot of even Cohere for AI is based on this idea of being at the frontier of ideas.

Typically, you have to have different ways of looking at existing components, and Cohere for AI is a good example of that. We are a hybrid lab. We have a large industry lab that is very typical of how progress is made these days, especially in building large language models (LLMs), where we have a lot of compute and full-time staff, but we also have a hybrid open-science component where we collaborate a lot with different experts across different fields. I have always been equally interested in what leads to breakthroughs and maybe what combinations of people and spaces lead to interesting ideas.

We released a multilingual model earlier this year that serves 101 languages, and that was almost entirely the product of 3,000 researchers across the world working together. That is really interesting because it is what I would call a very big science project, where you have to have a common consensus amongst researchers that this is an important question.

ANJA KASPERSEN: You have written about what could lead to good research, and we will come back to that.

I want to start us off on a slightly different topic. I was reading a research paper you contributed to that came out last year about transparency, an area in which you have been quite influential and contributed in various capacities as a scientific advisor and reviewer to some of the government initiatives that are happening right now as well.

The paper you contributed to states: “Foundation models have rapidly permeated society, catalyzing a wave of generative AI applications spanning enterprise and consumer-facing contexts. While the societal impact of foundation models is growing, transparency is on the decline, mirroring the opacity that has plagued past digital technologies, for example, social media. Reversing this trend is essential.”

Can you talk more to this work and also what your personal experiences have been looking at the field changing, who the actors are, and this notion around how do we assess transparency, how do we measure it, how do we enforce it?

SARA HOOKER: That work was done with a lot of really great collaborators across different institutions. I would say it’s really interesting.

We have always wanted transparency out of algorithms. This is not unique to large models. I worked on this for a lot of my career, so even simple models for which we have complete feature sets sometimes do things that are unpredictable, and the question is how we better understand them.

What has made this more difficult is both the volume of data that these models learn from and the fact that they are large models. Whenever you have larger models there are more combinations, and it is harder to extract individual meaning.

What does that mean for society? I think that typically when we want interpretability we want it in a few ways: we want to be able to predict what could go wrong, and we want to retroactively understand what went wrong. Both of these are complicated by a lack of transparency. Sometimes you can overcome this idea of understanding retroactively what went wrong with enough historical data. So one way we try to overcome the lack of interpretability is we just have enough data over the years that we don’t have to precisely know how something works but we know what input leads to an output.

The real difficulty is also predicting what could go wrong. I think the last part of that phrase you quoted gets at this. I build models, but models interact with a wider infrastructure, so model interpretability is one thing, but there are also the dynamics of a model within a system.

You pointed out social media. Social media is a perfect example of this. My main concern right now frankly is things like misinformation because I build models which are very good, they are very performant, and they can produce generations which are indistinguishable from a human.

In isolation this is not a high-risk issue, but combined with a system it is a very high-risk issue, and that is the dynamic that we have to understand better: How do models amplify existing problems and existing trends that we see, things like lack of traceability amongst different social platforms? I think that is really critical.

There transparency remains an issue mainly because we do not have good techniques for understanding which generation came from which model. It is something we are working on. This is what we call watermarking, but so far it is very easy to evade. These are like Band-Aid stickers for the issue.

Just to frame this, watermarking is the idea that we can trace what output came from what model. Why it is exciting, and I think why a lot of people at that particular meeting were excited about it, is that it would be a way to understand whether an image attached to a news story is not real, or whether text talking about a person’s biography was generated.
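To make the watermarking idea concrete, here is a minimal illustrative sketch in Python. It is a toy version of statistical text watermarking, not Cohere’s or any production scheme: a keyed hash of the previous token defines a “green” subset of the vocabulary that generation favors, and a detector later measures how often tokens fall in that subset. The vocabulary, key, and sampling logic are all invented for illustration.

```python
import hashlib
import random

VOCAB = ["the", "a", "model", "data", "learns", "language", "output",
         "system", "risk", "trust", "scale", "text", "research", "open"]
SECRET_KEY = "demo-key"      # hypothetical key shared by generator and detector
GREEN_FRACTION = 0.5         # half the vocabulary is "green" at each step


def green_list(prev_token: str) -> set[str]:
    """Derive a pseudo-random 'green' subset of the vocabulary from the previous token."""
    seed = hashlib.sha256((SECRET_KEY + prev_token).encode()).hexdigest()
    rng = random.Random(seed)
    k = int(len(VOCAB) * GREEN_FRACTION)
    return set(rng.sample(VOCAB, k))


def generate_watermarked(length: int, start: str = "the") -> list[str]:
    """Toy 'model': sample each next token, always preferring the green list."""
    tokens = [start]
    rng = random.Random(0)
    for _ in range(length):
        greens = green_list(tokens[-1])
        # A real model would bias its logits; here we simply sample from the green set.
        tokens.append(rng.choice(sorted(greens)))
    return tokens


def detect(tokens: list[str]) -> float:
    """Fraction of tokens that fall in the green list implied by their predecessor."""
    hits = sum(1 for prev, tok in zip(tokens, tokens[1:]) if tok in green_list(prev))
    return hits / max(len(tokens) - 1, 1)


if __name__ == "__main__":
    marked = generate_watermarked(50)
    rng = random.Random(1)
    unmarked = [rng.choice(VOCAB) for _ in range(50)]
    print("watermarked green-rate:", detect(marked))    # close to 1.0
    print("ordinary green-rate:  ", detect(unmarked))   # close to GREEN_FRACTION
```

The evasion problem described in the conversation shows up immediately in a sketch like this: paraphrasing or swapping even a fraction of tokens pushes the detection rate back toward the random baseline.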

The difficulty is that these techniques are very immature. What this means is that they can be easily evaded and they can be broken, so there is a difference between techniques that raise the barrier slightly for misuse, which might take out the people who are not intentionally doing harm, and techniques that will actually address large-scale actors who are intentionally doing this harm.

Definitely watermarking is promising for helping solve this problem for people who may be intentionally trying to harm. For example, if a teenager wants to generate a fake photo of someone they know, watermarking as it stands may catch that kind of behavior, where the person in question does not really know how to remove or evade the watermark, and it might already be quite good at catching some of this middle behavior.

But I think the behavior that people are most concerned about in these settings is typically large-scale malicious actors who want to do more aggressive campaigns, and watermarking for those types of sophisticated actors is not really a solution because it is very easy to break and evade.

ANJA KASPERSEN: That is interesting, Sara. I attended the AI for Good Summit, which happened a couple of weeks ago, and there were some interesting thoughts around notions of content provenance, watermarking, what do we do; discussions around what is mutable data, do we want to make sure it is immutable data, especially as we use technologies to help us demonstrate and prove the provenance.

And then, of course, we had an interview with Geoff Hinton during the summit, which was also quite interesting because he went down the pathway of: Do we need a certain amount of disinformation to inoculate minds so that they are more resilient and able to identify what is disinformation, or misinformation in this case? What are your thoughts about this?

SARA HOOKER: Geoffrey Hinton is always very fun with this. He is coming to it with the view of a computer scientist. To some degree I think this is a super-interesting question: How sensitive are we to this type of misinformation? If you anchor it with a bad example, do we become more calibrated for what it looks like to have bad misinformation?

The tricky thing is that misinformation is rarely just the content itself. It is also really how the content is shared. We tend to attach more information to people we trust who share information with us.

It also depends on the medium. There are very interesting examples that do not involve AI, of people being creative, where voters receive phone calls about a changed location of where they vote. This does not currently involve any type of generated script, but you can understand that in that case it is not just the information; it is the fact that someone is calling from a position of authority.

To be cheeky like Geoffrey Hinton, just seeing bad information is not going to solve those issues because we are actually very tolerant of bad examples of misinformation if they come from a trusted source. So it is not just a quality issue, but it is still an important step in this puzzle of understanding, firstly: How good are our current models and how calibrated are they? Can we understand when we see misinformation?

It is still a valid experiment, it is just not the whole piece of the puzzle, but I am glad he is throwing out ideas that make people think and that they have divided opinions about, because that is one thing that I think is important about this moment. At least misinformation is getting much more momentum around it, and there is a real consensus that there has to be some active dialogue around this along with some protocols to try to mitigate it, because misinformation has existed for a while but the urgency now is in the lowering of thresholds to participate.

With regard to data provenance, it is interesting. What do we mean by “mutable” and “immutable”? This idea that certain information should be fixed and other information should be allowed to change is very difficult for our current generation of models. Typically we have these large models and they train on all the data up until a certain point, and then when they don’t know an answer they typically interpolate between different previous historical data points. That means that sometimes we get amazing creativity, and this is why we love the models and love using them; but the flip side is that sometimes they are not anchored to the historical record if the information is not covered in the training data, or, as time changes, they might not update. They certainly do not update unless they have something like retrieval augmentation, which is a new technique where you have a combination of a large language model that is trained on historical data and a database that you connect to it.
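As a rough illustration of the retrieval augmentation technique described above, here is a minimal sketch. The keyword-overlap retriever and the document list are stand-ins invented for illustration; a real system would use a vector database and send the assembled prompt to a hosted model.

```python
# Minimal sketch of retrieval-augmented generation (RAG):
# an up-to-date "database" is searched first, and the retrieved facts are
# prepended to the prompt so the model is anchored to current information.

DOCUMENTS = [
    "Sara Hooker leads Cohere for AI, a hybrid research lab.",
    "Aya is a multilingual model covering 101 languages.",
    "Retrieval augmentation pairs a language model with an external database.",
]


def retrieve(query: str, docs: list[str], top_k: int = 2) -> list[str]:
    """Toy retriever: rank documents by word overlap with the query."""
    q_words = set(query.lower().split())
    scored = sorted(docs, key=lambda d: len(q_words & set(d.lower().split())), reverse=True)
    return scored[:top_k]


def build_prompt(query: str) -> str:
    """Prepend retrieved context so the answer comes from current facts, not stale training data."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, DOCUMENTS))
    return f"Use the context below to answer.\nContext:\n{context}\n\nQuestion: {query}\nAnswer:"


if __name__ == "__main__":
    print(build_prompt("How many languages does Aya cover?"))
    # The assembled prompt would then be sent to a language model API of your choice.
```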

To be concrete, what it means is, let’s say that I decide to change my name in the future for whatever reason. If a model has been trained on my historical data, if you ask it based on the new name it will not associate it with my old name and it won’t bring up my biography, so this is a very good example.

There are certain societal reasons where we want data to be fixed. For example, the details of your birth certificate, things like that, those are fixed; they do not change over time. These models still have a significant gap with that.

For me, in terms of how this problem is approached, it is almost also a question of: When do you use models for certain things and when do you use existing technology for certain things? To be a bit cheeky, even though I build these models, I don’t think these models should be used for certain things.

I think there are repositories of truth that we need for certain aspects, and definitely over time these models are getting much more nimble to update. Retrieval-augmented generation is a good example where you can add in a formal database and combine it with a model. This means it can have some access to up-to-date information, but it is not there yet.

So we should be realistic. Sometimes what gets conflated in these conversations is a sense that we are all going to be using a large language model or a large multimodal model no matter what, but I think it is important to understand that as a society we can pick and choose the use cases. We have very good existing technology for some use cases, and we have these new models for others.

The real conundrum comes when so many people are just using large models for everything that it is not possible to steer them to another source when the information is sensitive, and that is a legitimate concern. Will someone no matter what ask a large language model for my bio and then have the incorrect linkage? This is valid.

Where would I place an ordering of things? I think in some ways the progress on making sure models are up to date is moving quite quickly, but also we will probably enforce certain hard constraints on these models depending on regulation. So I see it as a very solvable constraint, but I understand why it is a concern and it is top of mind.

ANJA KASPERSEN: We are going to get to the hardware issue in a second because you have written a lot about it and I think this is probably one of the most underappreciated and under-communicated aspects of AI.

But following up on what you just said, there is a lot of terminology, and you have certainly been a very strong advocate for the need for a more shared nomenclature of what we are actually talking about. There are so many different interpretations of the same things, which makes measuring any form of robustness or reliability in systems quite difficult, but there is also a lot of confusion about what we are actually talking about with these large language models.

I mentioned the interview with Geoff Hinton, and he was talking about how we came to this point and his realization not that many years ago that, rather than trying to mimic and emulate the human brain, we should move to something entirely different, and that is what actually allowed us to make the breakthroughs that are now the foundation for these new models.

For the benefit of our listeners, could you talk a little bit about what are these models and the techniques that are currently deployed?

SARA HOOKER: It is interesting. Maybe I will frame it in terms of what perspective Geoff Hinton is probably coming from. Geoffrey Hinton worked a lot on this early idea of not having symbolic systems. Symbolic systems basically are rule-based, so they will say, “Oh, if we both find ourselves in this audio recording today, that means that we likely know each other in some way.” So it was very codified knowledge, it was a way for a model to traverse a body of knowledge, but it has to be specified how.

Geoffrey Hinton worked for many decades on this idea that, instead of specifying everything, we teach the model how to learn, so we basically let the model decide what is important and extract meaning from different features. This is a profound shift because it means that we give up the specification. We talked about transparency earlier. We have to relinquish transparency for this because we are in some ways trusting the model to learn. What happened with this latest shift? There are a few things.

One is that we scaled our models a lot. We did stick with deep neural networks; all these modern transformer networks are still deep neural networks, so it is still trusting the model to learn, but we made two big bets and then a few optimization things.

One is we said: “Let’s see. Let’s go bigger. Let’s see how we do once we scale our models.” It turns out this was very important for extracting more meaning from language.

But also we took a bet which at the time was quite controversial. For a long time, the whole philosophy of machine learning has been that you train on something that is similar to the distribution you want to model when you deploy your model. This is the idea that you have to train on data that is representative of what you actually want to do.

That is totally different from the Internet. In fact, most of what we want to use these models for is quite curated; it is, “Give me an itinerary.” The internet is huge and it is full of a lot of bad-quality data. For a long time there was, firstly, not a lot of optimism that you could have high-quality models from something like the internet; and, secondly, there was not a lot of optimism around unsupervised learning techniques.

What do I mean by this? I mean extracting knowledge when it is not labeled. A lot of the computer vision era focused on labels for everything—“This is the cat; this is the dog”—and when you learn from the internet you are just trying to predict the next word; so you create a fake label that did not really exist, which makes it an unsupervised learning technique.
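A minimal sketch of what “creating a fake label” means in practice: unlabeled text is turned into (context, next word) training pairs, so the text supplies its own supervision. The tokenization and window size below are simplifications for illustration.

```python
# Minimal sketch: unlabeled text becomes supervised-looking training pairs,
# where the "label" for each position is simply the next word.

def next_word_pairs(text: str, context_size: int = 3):
    """Slide a window over the tokens; the word after the window is the target."""
    tokens = text.split()
    pairs = []
    for i in range(len(tokens) - context_size):
        context = tokens[i:i + context_size]
        target = tokens[i + context_size]
        pairs.append((context, target))
    return pairs


if __name__ == "__main__":
    for context, target in next_word_pairs("the model learns to predict the next word"):
        print(context, "->", target)
    # ['the', 'model', 'learns'] -> to
    # ['model', 'learns', 'to'] -> predict
    # ... no human labeling was needed: the text supplies its own targets.
```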

This turned out to be probably one of the most interesting developments that does not get talked about today, which is that in 2015 people started training on larger and larger parts of the internet in an unsupervised way, and then with the transformer in 2017 this got combined with a powerful new architecture.

These are the key core trends—more data, more scale. The big optimization trends that are really interesting to think about are: (1) instruction fine-tuning data, which is the post-training step; and then (2) the preferences, so aligning these models with user preferences. This combination is what led to these models being more performant.
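As a rough sketch of what those two post-training stages consume, the records might look something like the following. These are invented examples, not any particular dataset, and the field names are illustrative.

```python
# Illustrative data formats for the two post-training steps mentioned above.

# (1) Instruction fine-tuning: (instruction, desired response) pairs.
instruction_examples = [
    {"instruction": "Summarize the main risks of watermarking.",
     "response": "Watermarks can be evaded by sophisticated actors..."},
]

# (2) Preference alignment: the same prompt with a preferred and a rejected answer,
#     used to teach the model which outputs people rate higher.
preference_examples = [
    {"prompt": "Explain retrieval augmentation in one sentence.",
     "chosen": "It pairs a language model with an external database of current facts.",
     "rejected": "It is a way to make models bigger."},
]

# A training pipeline would fine-tune on instruction_examples first, then optimize
# against preference_examples (for example with RLHF or a direct preference objective).
```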

ANJA KASPERSEN: You wrote an excellent paper a few years ago that I alluded to earlier addressing what I believe is one of the most under-communicated and perhaps underappreciated aspects of AI, notably that of hardware. You titled the paper, “The Hardware Lottery.” In it you discuss the trifecta needed to advance in any AI field—compute, reliable data sources to underpin models, and of course good algorithms.

You write: “Scientific progress often occurs when a confluence of factors allows scientists to overcome the stickiness of the existing paradigm. The speed at which paradigm shifts have occurred in AI research has been disproportionately determined by the alignment between hardware, software, and algorithms.”

Can you tell us more about how this alignment in your view—or the lack thereof—has historically influenced AI development?

SARA HOOKER: This idea of the hardware lottery is based on the fact that our field has mainly developed when there has been hardware that is compatible with the algorithm. The best example, of course, is deep neural networks.

I alluded to this when I talked about symbolic approaches dominating for so long. Symbolic approaches came in after World War II, and we had several AI winters where there was not much progress or funding, and it was only with hardware basically 50 years later, when graphics processing units (GPUs) started to be repurposed for machine-learning workloads, that this changed. Now of course we all know about GPUs and we talk a lot about Nvidia and the type of GPU and things like that, but at the time GPUs were used only for video games; they were graphics card accelerators, and they were used essentially for a lot of the game engines, which needed these more powerful matrix multipliers.

It was only in the 2000s that we started to slowly repurpose these GPUs, and it was pretty painful for about ten years making them work for machine-learning workloads, and typically it happened in very small research groups that took the time to adapt them.

This turned out to be crucial, so in 2012, when we had convolutional neural networks, it was GPUs that allowed for the empirical success of those networks, and overnight everyone switched to GPUs, so it was really profound. In 2012 what happened was GPUs allowed for both deeper networks, more depth, which turned out to be crucial, as well as better distribution. These two factors are what led to really the first big wave of neural networks, and that was just a bit over a decade ago.

Hardware now is just as critical because we are leaning more and more into certain types of hardware. Because we have had the success of neural networks, basically everything has become just a neural network accelerator. What that means is that even for something like Nvidia all their chips have implicitly been trying to accelerate matrix multipliers, which is what dominates neural networks.

What does this mean for the next big idea? It means that we are likely throttling the next big idea. It is very difficult to stray from the hardware path if hardware does not work well for your idea. So it means that as we get toward domain-specialized hardware and more and more specialized hardware for a certain type of idea it becomes harder to prove the validity of the next thing.

This is important because typically hardware bets are not well placed in the private sector, they are too risky, so for future innovation governments are going to have to have some fraction of resources dedicated to hardware space discovery for the next generation of models. It takes political stamina to decide this is important, but it is critical because so much of it, even with something like quantum, depends on hardware. Jurisdictions that are very excited by quantum, which is going to be a leap in the available compute power, need to also back hardware development, and this is very interesting to think about.

ANJA KASPERSEN: There is a lot of talk about the AI “race.” In your view it is actually less of a race and more of a lottery.

SARA HOOKER: It is both. The race actually amplifies the lottery because the race has resulted in hardware which is more and more particular to one type of machine-learning workload. But no one who has worked in machine learning, no one who has serious experience, would say, “Oh, the transformer is the architecture we are going to be using for the rest of this field.” There are sharp limitations to transformers and there are clear things that could be done better.

The race has compounded this issue of putting all the chips on one part of the table, and really that is the hardware lottery. What it means is that we are missing out on the next big idea.

ANJA KASPERSEN: You stated in the same paper—and this really resonated with me because I think it is something that is common in most fields now, not just in the machine learning community but also in policy communities—that: “Machine learning researchers do not spend much time talking about how hardware chooses which ideas succeed and which fail. This is primarily because it is hard to quantify the cost of being concerned.” What do you mean by this phrase?

SARA HOOKER: I mean that it is very hard to articulate the counterfactual. For example, we actually see this right now in the wider AI discourse where people who are very concerned about the future—you mentioned Geoffrey Hinton, and I think he is a good example of a researcher who is very concerned about a type of long-term risk.

There are other researchers, like myself, who have articulated: “Well, we currently have a lot of issues with how models are currently deployed. We can talk about long-term risks, but it is hard to articulate. What is the probability that we should attach to it? Where should we place emphasis amongst all these potential problems?” That is one good example of the cost of being concerned with hardware.

The main difficulty of justifying to a government or a cross-institutional organization that you need to care about a different form of hardware is that it is hard to convince someone to deviate from the path when it is hard to articulate, “Well, what is the sequence of choices we need to unlock the next wave of innovation?”

This is traditionally the difficulty with science. It is why some of the most productive periods in science come from, really unfortunately, places of great urgency, national urgency, where scientists are given blank checks and few restrictions, and that typically leads to a lot of very interesting innovations.

I think COVID-19 is a great example of this. A huge amount of resources was given to researchers to work on vaccine development, and you saw a huge amount of innovation come out of that. The same was true for the moon race; that was fueled by very nationalistic concerns, but because of the blank check it had a lot of ripple effects for many different areas of science that were not specifically about the moon race, and it trickled down.

What I mean by that is that typically profound changes in science come from having arenas that have some constraints, like an urgency to make progress, but a freedom in how you get there. That is what is interesting. That allows for much more willingness to explore the counterfactual.

Whenever science is quite rigid or we see it as, “This is the only thing we can do and we can’t have a risky portfolio,” you tend to not explore the counterfactual, you only exploit what has already been proven out, so you cannot really measure what could have been, and that is the difficulty.

ANJA KASPERSEN: I heard you talk in a podcast about the difficulties of actually measuring robustness, measuring the reliability of the system, and measuring its output because—again this issue of nomenclature—we do not have a shared understanding of what we are talking about so it becomes quite difficult.

SARA HOOKER: This idea of momentum is very interesting. It comes in many forms. Typically the main drivers of momentum around who builds a technology are the huge costs of entry. We now have the massive compute infrastructure that is geared around these models. I work a lot on efficiency. I think efficiency is very important for these models in terms of how we make them smaller, but undoubtedly this infrastructure decreases the points of entry. The other thing is the specialization needed to build these models.

I think you are talking about something which is equally important and interesting, which is this idea of: How do we frame how we want as a society to interact with these types of models and what do we want out of this conversation?

There I see two separate trends. I see everyone racing to be the first to define the framing. This brings its own issues because it means that you have a plurality of different views and different considerations. I understand the urge to be the first because it becomes the default framing of the conversation. This is quite a powerful thing. It means that when you engage with something you have to engage with it through a sort of framing. I understand that urge.

I suspect regardless that this is a valuable moment. I think often about climate researchers who have been working for decades and have shown exhaustively the impact of climate change and they never get access to policymakers or to these mechanisms of change in such a fundamental way. And then AI researchers build this model and in some ways it is a profound moment to take a wider consideration of not just these types of models but how we want AI and algorithms interacting in society.

I think we would be wrong to miss that opportunity because it is rare that there is a consensus amongst so many different levels of change makers, like policymakers and citizens, to reflect for a moment on how we want these algorithms to interact with us. That is something that other very deserving research communities never really get. Climate researchers have to articulate consistently all the evidence and all the reasons why climate change should be paid attention to, and still they don’t get the type of momentum that currently is placed on AI policy.

In some ways I see it as a good thing. It is just quite messy right now. There are a lot of different frameworks and there are a lot of different groups that all want to have a say. This is overall a good symptom of the amount of energy at the moment we are in, the opportunity to do something beneficial for people. I generally am a fan of a much wider view: When you have momentum for something like this we should not just make it specific about these models but have a wider view of how AI should interact in our society.

ANJA KASPERSEN: A lot of your work has been around the hardware issue that we just talked about. Of course there is a different component to the hardware, which is energy access, because these are highly energy-intensive computational systems to run. There has been a lot of concern, which I know you have been involved with, around the environmental impacts of building and running these big systems, how to mitigate the negative impacts, and how to measure them, so going back to the measurement side of things again.

Where do you see that discussion evolving? I know some of your work is around making smaller, reliable models that are less compute-intensive, but of course who has the energy and access to energy is also becoming quite a defining parameter of who holds the power in this space, because fundamentally AI has become almost a synonym for a new class of power—who holds it, who weighs it, who wields it, and who is accountable for it?

SARA HOOKER: I do think this question of energy is profound. It is also very related to the question of hardware. These are two separate trends.

In the medium term I would say the most pronounced trend is the hardware trend—who has access to hardware—especially when hardware is increasingly seen as strategic and some countries are buying vast inventories of hardware. There is a super-interesting dynamic there of what countries have national policies and have clear compute agendas, what countries don’t and who has access to those GPUs. This is fascinating.

The energy question is a bit more medium- to long-term, but it is equally important. Right now we train with a lot of energy—that is one component; how can we make the models smaller?—but the real energy cost is in deployment because every time you release a model at scale and billions of people end up using it that is a lot of energy, that is a massive AI workload, and if it is continuously used that is a lot of energy.

Right now it is in no way comparable to the energy from agriculture or the energy from flying, for example, but I think the valid conversation that is being had is that it is expected to grow considerably as AI becomes a much bigger component of everybody’s workload.

I work a lot on how we make these models more efficient. Also, increasingly I am very interested in how we communicate energy costs to consumers. I think in the same way when you look up flights or when you have the chance to choose a product there should be some communication of a common unit of energy that was used or an estimation of how much energy it will be. I am interested in the same for models: Can we try to convey to users that this model was made with this much energy and this is the performance, or you could choose this one and this is the performance?

It is at least a start. It is not perfect, the same way the estimation of carbon emissions is not perfect for airplane flights, but it is a starting point for people to gauge this as part of their consumer purchase. That is something I am interested in working on. I am working right now with collaborators on whether we can communicate this in a better way to end-users.
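A back-of-the-envelope version of the kind of energy label being discussed might look like the sketch below. All of the numbers (power draw, GPU count, overhead factor, grid carbon intensity) are assumptions for illustration; real accounting would need measured values.

```python
# Rough sketch: estimate energy and emissions for serving a model,
# in the spirit of a consumer-facing "energy label." All numbers are illustrative.

GPU_POWER_WATTS = 400          # assumed average draw per accelerator
NUM_GPUS = 8                   # assumed serving footprint
PUE = 1.3                      # assumed data-center overhead factor
GRID_KG_CO2_PER_KWH = 0.4      # assumed grid carbon intensity


def serving_energy_kwh(hours: float) -> float:
    """Energy used to keep the deployment running for `hours`."""
    return GPU_POWER_WATTS * NUM_GPUS * hours * PUE / 1000.0


def emissions_kg(hours: float) -> float:
    return serving_energy_kwh(hours) * GRID_KG_CO2_PER_KWH


if __name__ == "__main__":
    day = 24.0
    print(f"~{serving_energy_kwh(day):.0f} kWh/day, ~{emissions_kg(day):.0f} kg CO2/day")
    # ~100 kWh/day, ~40 kg CO2/day under these assumptions
```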

ANJA KASPERSEN: It is the work that is ongoing on creating something similar to nutrition labels, model cards for AI systems, which I know Meg Mitchell and others have been working on.

SARA HOOKER: Yes. This is with Sasha and Meg, and there are a few people involved across different institutions, but this one is very specific to energy: How do we communicate compactly so that users are aware when they make choices about models?

The first round will be imperfect, but it is an important part of starting to convey to a general public some of the costs involved in making these models because ultimately more and more people are going to use this as part of their daily workflow.

ANJA KASPERSEN: Which brings me to another important question that you have been working on, that of bias and toxicity in data, and that there are so many ways that these systems have the potential to be used nefariously or used to exacerbate certain patterns in societies that are already troubling and exacerbate inequalities that may already be shaping the human narrative, and how to prevent that from happening, and how do you protect data, how do you assess it. Can you talk more to this?

SARA HOOKER: There are a few parts. And, by the way, I do think it is important we consider the full spectrum of risk. There is a caution, especially when you are not sure about introducing a new technology, and it is good that some people are concerned about existential risks and are doing important work there. My main concern is that it has just dominated the conversation and that maybe we are placing too many resources there. That is my main concern.

But it is not that we should not have any resources there. It is like earthquake research: it is a very important field, but we don’t have as many people studying earthquakes as we do people working in other types of core biology or medical sciences. It is very interesting. As a society we should have coverage for all risks, but my main grumpiness is that we have really overfit to certain types of risks.

When you think about the threat model, I am not quite clear what it will look like. My favorite example of this is bio risk: There is considerable discussion around bio risk, but if you think about bio risk in terms of LLMs, you have to stretch to understand what a viable threat model is, for various reasons. Most of bio risk is in access to labs, and it is also in access to supplies and to technical knowledge. It is not in access to a particular combination of how to create a bio threat. That knowledge exists. It is comparably available on the internet and in some of these language models, but bio risk has dominated the conversation. It is very possible that in the future models will get better than the internet at surfacing the problematic bio threats, but the majority of the risk is still: How do you actually make this feasible in a lab and how do you get the supplies?

So, for me, when we talk about where you prioritize risk, I always think, Well, what is a viable threat model? It is very surreal to be a researcher who has worked on this and now these models are deployed and being used everywhere. So how do I grapple with threat models? Where should we spend our time?

I find much more convincing the threat models about how our current models can cause very clear harm every day through things like lack of representation, lack of access, and the propensity for misinformation. A lot of my work, including on privacy, is actually about understanding how our models and our choices about model design can amplify or mitigate some of these issues.

For a long time there was this reluctance in my field to even care about data. It was impossible to get researchers to look at data. The common expression was: “Oh, algorithmic bias is in the data; we can’t do anything about it because we only control the model; it is in the data.” This was a very common expression for a lot of my early career, and it was hard to get people to understand that they equally have choices about how they reflect the data in the algorithm.

What I mean by that is that even though with much larger data sets we don’t have a good understanding of what is in the data and there may be many different patterns, remember that the data reflects humans as they are on the internet; it does not reflect the humanity that we want to be, so there are very big gaps there. This is a very big difference.

But we also have a lot of control over how we choose to represent that data and where we decide that the model should be expressive and where we constrain the model. This is one of my big things, that this is not just about the data, it is about the model and it is about how we design a combination. Now we call them “guardrails,” but when I first started working on this, it was this idea of: How do we vary the expressivity of the model so that it learns the things that we think are important but also does not learn certain parts of distribution? This is very important, and this is now a growing shift, which I am quite proud of, where people now recognize that these algorithms are equally important for how the bias is represented.

ANJA KASPERSEN: When you talk about data bias versus model bias, it has a lot to do with the contextuality of bad information. Do I understand correctly?

SARA HOOKER: It does, but it also concretely has to do with how you choose to represent it. Do you use a massive model? If you use a large model, it has benefits, like it can learn very rare parts of your distribution, which means that sometimes for underrepresented groups you serve them better; so typically when you have a larger model you do better at different languages, for example.

But there is a trade-off. It also means that there is a tension where you may be learning much more problematic toxicity patterns. This is something we have noticed: the larger you go, the more toxicity patterns you might learn. Understanding that tradeoff and understanding how you can constrain the model to maybe avoid some of these things is super-important, and I think that is what sometimes gets missed, that it is not just the data but how we represent it and how we choose the models that we train.

ANJA KASPERSEN: Which brings me to this issue which I know has also been something you have been focusing on in building these models, which is that of interpretability. It has come up at most conferences I have been to lately on AI—interestingly enough, it is an old issue, but it has come up much more in terms of what the next advances are and where we are going with these large models. What are your thoughts on this?

SARA HOOKER: I hate to dash a good dream, but I think it is very difficult. Interpretability is like a shiny gem from afar, and then you get into it and it is like a swamp, I will just put it that way. What I mean by that is that we all want it, but if we ask everyone what that means to them it means something different. It is very context-specific.

Also, remember that for a long time—for example, after the right to explanation in Europe—there was a whole wave of funding for work that was meant to give you an individual explanation, your right to understand why a model made a particular prediction for you as an individual.

Typically these methods failed—I don’t want to be too grumpy because it was interesting—and many of the techniques that emerged were unreliable because it was a difficult problem. You are essentially trying to train a model which is being trained without any interpretability constraints and then afterward you are trying to layer on this requirement that you explain a model decision for a single example, even though it was not trained on that example alone but on many millions and billions of data points. So with these issues you would end up with some type of saliency heat map or something like that, but it would not make much actionable sense.
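For readers unfamiliar with what such a per-example explanation looks like, here is a minimal post-hoc sketch using occlusion-style attribution on a toy scoring function. Real tools operate on a trained network’s gradients or attention; the model and features here are invented for illustration.

```python
# Minimal sketch of a post-hoc, per-example "saliency" explanation:
# perturb each input feature and record how much the model's score changes.
# The toy scoring function below stands in for a trained, opaque model.

def toy_model(features: dict) -> float:
    # Stand-in for an opaque model: an arbitrary nonlinear score.
    return 2.0 * features["income"] - 0.5 * features["debt"] + 0.1 * features["age"] ** 0.5


def occlusion_saliency(features: dict) -> dict:
    """Importance of each feature = change in score when that feature is zeroed out."""
    baseline = toy_model(features)
    saliency = {}
    for name in features:
        perturbed = dict(features)
        perturbed[name] = 0.0
        saliency[name] = abs(baseline - toy_model(perturbed))
    return saliency


if __name__ == "__main__":
    example = {"income": 3.0, "debt": 4.0, "age": 49.0}
    print(occlusion_saliency(example))
    # roughly {'income': 6.0, 'debt': 2.0, 'age': 0.7}
```

As the discussion notes, an output like this satisfies the letter of “an explanation” without necessarily being actionable.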

What I think people sometimes miss about interpretability is that really for humans what matters at the end of the day is actionable interpretability. Typically where these models fail is that they fulfill the letter of the requirement, like providing some type of explanation for your prediction, but it is not actionable, and largely that is because a lot of how we actually arrive at interpretability is relative.

For example, during COVID-19 we all accepted staying at home with restrictions because we knew it was uniformly placed on us as a society. If I had just been told to stay at home and I did not have a relative understanding that this was a shared burden that we were all getting through together, I would not have felt that that was interpretable and I would not have felt that it was fair. So it has to be actionable and it has to be a relative understanding.

In terms of how this applies to current models, I think we need to be much more concrete about what we actually expect as an actionable output of some of these interpretability tools because otherwise they end up being very pretty but not very useful. I think that is one of the main issues where they struggle to be adopted naturally as tools.

You can have a revolutionary model, but if it is not useful to people it is never going to be revolutionary, and the same way with interpretability tools. We can talk a lot about interpretability, but unless we create an interpretability tool that feels actionable, it can be enforced through regulation but it won’t be adopted intuitively. I think that is the gap I sometimes see with discussions of interpretability.

ANJA KASPERSEN: Which relates to the discussions on governance, because what I am hearing from you is that you can do all the acrobatics you want at the end stages of it, but it fundamentally is not going to help you to get where you need to be. It has to be at the outset of designing the model itself.

SARA HOOKER: Yes. I think people miss this. We are always shocked that these models are not interpretable, but the truth is there is no reason these models should be. None of the objectives that we use in training modern deep neural networks have a constraint that demands interpretability.

It is the opposite. We have moved away from symbolic approaches. We are training models just to minimize loss, and we don’t care how they do it. The fundamental disconnect is that after training them we go through all these acrobatics trying to get back interpretability, but it is almost like your degrees of freedom by the time you get to that point are quite narrow.

I think there are two options. One is that I do not discount that there are relative techniques that can give you a good intuition. A lot of my work now on interpretability is: How do we surface parts of the distribution that the model is not good at? That is important because that way you get much more of a relative sense of the distribution.

There is another option. I think typically the way we build trust with anything, any tool that we use, is we see some early adopters use it, for example, driverless cars right now. I have been given access. I have not taken a driverless car yet. I am looking out and seeing people I know who are using them. Typically after a few times you get a sense of the behavior, then you yourself adopt it.

Another way that we don’t talk about as much to build interpretability is repeat use. The difficulty with this is that it does not account for the extremes of behavior. It gives you trust in the middle, the most common behavior patterns.

So there is a valid question of: What are our tools for understanding the most extreme issues, and does that always have to be retroactive, or can we have some precautionary interpretability tools? Because I am in so deep with interpretability, maybe I am not as rosy-cheeked as some of the people you talked to previously, but it is a formidable problem.

ANJA KASPERSEN: I think you are making a very strong point. We treat AI as a monolith, and it is all these different things that you have been talking about, so our governance of it has to be equally adaptive. You speak about adaptive learning. Governance also needs to be adaptive to respond to all of these different facets of what machine learning represents in the different applications, but also that we are so focused on the median that we are not sufficiently picking up on the anomalies and the outliers.

To me one very difficult concept—and I often get pushback on this—is the use of the word “trustworthy.” With my background, I am deeply uncomfortable with trustworthy being used about machine-based computational systems. For folks who may not know, these are common terms used in engineering and technical fields—interoperability, interruptibility, and interpretability—but in the policy and society domain it almost puts the onus of responsibility onto the user to have the necessary technical knowledge to know that there are these anomalies and outliers and that they should not dominate your experience with the system in question.

So I often use words like “reliable” and “robust” because those are metrics you can actually test, but once you embed trust into it, especially because we are not really addressing those anomalies and those outliers, it becomes problematic. You build these models. What are your thoughts on this? I know you also run this big initiative around trustworthy machine learning, so I am sure it is something that you have thought very deeply about.

SARA HOOKER: That is funny. I actually do agree that “trustworthy” is like “data for good.” What does it mean? I know you were just at AI for Good, but it is a similarly ambiguous term. What does it imply? Does it imply that some techniques are not for good? Does it imply that some techniques are not trustworthy? What does that actually say about the criteria of becoming prepared or becoming trustworthy?

ANJA KASPERSEN: And we also get caught up in these binary propositions—good/evil, safe/unsafe.

SARA HOOKER: Yes. And it is rarely a single finish line where you actually sign off on something as trustworthy—“Okay, we’re done; it’s trustworthy.”

That is not how technology works. I share the thought that this is a catchall term. Sometimes it is good for getting people in the same place. I think it is okay that the conference was called AI for Good, that “trustworthy” now is a way to get people who care about this term in the same place. But, yes, as a technical term to measure progress, I find it deeply unsatisfying, and I completely agree with your concerns about it as well.

I think about this a lot. I just worked on this project which is about robots.txt. Few people know about it. Robots.txt is a file that goes on every website. It is a gentlemen’s agreement. When the internet first started you had a robots.txt file that could tell people, “Don’t scrape my website.”

We did this fascinating study—we are going to be releasing it soon—which shows how the use of robots.txt has changed over time. What we have noticed is that a lot more people are using robots.txt. That is because I think there is implicitly a changing anxiety around the notion of consent for data. People want to express that they don’t want their data scraped, so more and more websites are using robots.txt.

Here is the difficulty with robots.txt, and it gets to the issue that you are talking about. Robots.txt puts a lot of pressure on the user. First, they have to know that it exists; second, they have to specify every bot that they don’t want. Most companies have a very specific bot. Google has a certain bot.

Cohere has a certain bot that we use, and we always respect robots.txt; if someone disallows us, we don’t scrape. This is an important part of our commitment. We need ways that people can signal consent to us about whether their data should be used.
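For reference, the mechanism looks roughly like the sketch below: a site publishes per-crawler rules, and a well-behaved crawler checks them before fetching. The bot name here is hypothetical; real crawler names vary by company. Python’s standard library ships a parser for this format.

```python
# Sketch of the robots.txt "gentlemen's agreement": a site publishes rules,
# and a well-behaved crawler checks them before fetching anything.
from urllib import robotparser

EXAMPLE_ROBOTS_TXT = """\
User-agent: hypothetical-ai-bot
Disallow: /

User-agent: *
Allow: /
"""

parser = robotparser.RobotFileParser()
parser.parse(EXAMPLE_ROBOTS_TXT.splitlines())

# A polite crawler asks before scraping; the named AI bot is refused, others are not.
print(parser.can_fetch("hypothetical-ai-bot", "https://example.com/article"))  # False
print(parser.can_fetch("some-other-bot", "https://example.com/article"))       # True
```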

But that structure of expecting a user to know what the bots are called and expecting a user to know that robots.txt exists is a perfect example of what you described with trustworthy ML putting all the pressure on the user.

We need more viable consent systems because the notion of how people use the internet has changed, how people want to have agency over their data has changed, so for me that is a great example of another shortcoming of how we think about concepts like trustworthy. The burden cannot be placed on the user to arrive at trustworthiness. It also cannot be placed on the user to know how to ensure privacy. There need to be tools somewhere in between. That is much more interesting to me.

Also we need more precise terms. You have seen my papers. I am very much a fan of reliability, understanding, and anchoring what is our definition because the key thing is it is very hard to make progress unless we have shared vocabulary.

Otherwise—and I now see this increasingly; maybe you saw it last week—people talk about “agents” endlessly. What does an “agent” mean, what does it mean to you and me, and what is our actual threat model? I think this is very important because otherwise we could be talking about it all day without having a tangible way to mitigate that threat model or to understand progress on it.

ANJA KASPERSEN: Of course, one of the most talked-about terms these days is the notion of “reasoning.” I have heard you speak about it a few times. But what are we actually seeing these models doing? They are obviously not trained to reason, and they do not represent anything that resembles our human ways of looking at consciousness and intelligence; they represent something entirely different. That requires us to engage with them differently. There is the realization that language is so representative of how our mental formations take shape, and I think that has also been very difficult for people to grapple with.

SARA HOOKER: I share the view that this is intertwined with society. It is the same view that technology—you can build very beautiful models, but it is not the models or how we measure them in isolation that matters; it is how people end up using them.

I grew up in Africa. It is very interesting. Even with something like Uber in Africa you immediately see the difference in how people use technology. I was in Ghana when I was with Google. We were starting an engineering lab there, and I remember I would call Ubers in Ghana, and Uber drivers did not want to be paid by card. They just don’t want to. They don’t want the interruption to their earnings because they would have to wait until the end of the week to go to the Uber office to pick up their earnings. So they would just let you wait. They didn’t want to cancel because it would impact their score. It took me ages to figure this out. It took me a whole month. I was like, “Why does everyone wait me out and then I have to cancel on everyone?”

This is a great example of how technology has to interact with the people it serves. If you apply the same interface in different parts of the world and expect it to work the same way, you are always going to see sharp contrasts, but it also means that over time it changes behavior on both sides. This gets to the point that the way technology interacts with us is a reflection of how we coexist with each other but also of our cultures and current institutions.

Uber drivers in Ghana did not want to do cards because earnings there and the regularity of earnings are much more critical; they don’t want to wait until the end of the week to get their earnings from the Uber office. Also, perhaps there is a skepticism about delayed earnings or being dependent on a third-party payer when you can just earn because you just gave a ride. These are dynamics that we have to understand because the dynamics inform how technology is used but also how technology evolves.

I see the opposite trend in some cases, like even with image systems. One of my main concerns is that when image systems first were released you would see all of these creative use cases of really interesting art and super-fascinating things, aspects of how people played with them. Part of it was that the models were quite bad at the time. When DALL-E was released, it was bad enough that you had to use your creativity to work around it; but as image models got better and better everything became like pinup photos and things like that.

You see this regression. It is like the more we are given in terms of tools and technology the less creative we become. If you ever look at some of the ways people put these images together, there is less sense that people are using these for creativity, and maybe there are more concerns about how women are portrayed or how these tools are being used in ways that may disproportionately impact the way that women are viewed, or even recent issues with child sexual abuse material (CSAM). There is this real understanding that we have to be aware of the ways that technology reflects our humanity but also the extremes. This is important too.

I think diversity approaches are beneficial. There is a lot we don’t know about modeling languages. Here’s the thing: Basically all optimization techniques have been set around English and Chinese. These are the only two languages that are served well right now by models, and that is largely because of very strategic bets that we want the best models in English and the best models in Chinese.

This is kind of absurd when you think about it because the truth is if we consider this as a powerful technology—and we are probably talking today in part because we both think this is a powerful technology—it has to serve more than two languages. A lot of my work on multilingual is this belief that it also is core to the field of machine learning, that this idea of how we adapt these massive models to fit new distributions and to fit patterns that are changing over time is really important.

I think a lot of it is let’s create these models that are better used for different types of scripts, better used for different ways that people want to talk; because it is not just languages, it is also local preferences. For example, if a model is only trained on U.S. data, it is probably not very good in the United Kingdom, Kenya, or India because it is not really serving local preferences, even though these are all to some degree English-speaking countries.

I think it is a very good endeavor in part because the biggest difficulty with training these models is expertise. You have to have a huge degree of expertise as there are very few people in the world who know how to train these models. You may think of it as just building a language model for your country, but it is also investing in your technical infrastructure. That is why I think it’s okay. I think the diversity approach is meaningful and that we will learn a lot through this.

I already see some governments taking a very proactive approach here. Singapore has AI Singapore. They have been focused on a regional model that not only connects them with building up their own technical expertise but connects them with their neighbors and also makes sure that they are collaborating and pooling resources. So it can be quite strategic for a government to provide subsidies and support for companies that are doing this.

ANJA KASPERSEN: Because you actually host a podcast, which I really like the name of, Underrated Issues in Machine Learning, my last question to you is: What in your view are the underrated and perhaps underappreciated issues in machine learning, and are there any of them that give you hope?

SARA HOOKER: There are a few.

One that I care about a lot—I mentioned I work on efficiency—is how do we avoid going bigger? Why I call it underrated is that the easiest thing to do is to keep growing bigger because it is the most predictable way to get performance. But the core of the problem is: How do we do more with less? This for me is super-interesting. Part of it is optimization at the hardware level, which is even more underrated.

The other thing that I feel is super-important is: How do we create new spaces for research? Most of the breakthroughs—why we talked about Geoffrey Hinton on this channel, why we talked about a lot of these events happening and dynamics involving the United States, Europe, and China—are because of a strategic bet to build up resources there.

I think a lot about how we can create different patterns of collaboration that can help bridge the gap, because even in Europe there are only a few key places where there is sufficient technical talent to build these models. That is a gap. We should not be restricted in our ability to make breakthroughs by where we are in the world. I think this is very underrated because most researchers just want to get on and build the model.

Aya was a great example of this. It involved 3,000 researchers around the world in Africa, Latin America, and Asia. For me this is very critical because this creates ecosystems that go beyond a single model, and you end up collaborating in very rich ways, but it is also fundamental to this question of what leads to breakthroughs.

ANJA KASPERSEN: Which is a really interesting answer because what I am gathering from you is that, going back to your phrase, “the cost of being concerned,” how do we quantify it? Without having sufficient technical expertise globally, it is hard to have also a deep technical conversation about what to be concerned about and how to respond to it.

SARA HOOKER: I completely agree with that. I think this is one thing that is very important for policymakers as well. Traditionally it has been quite hard to attract technical talent within government, and now we are seeing for the first time this real opening where governments can create AI safety institutes and have a chance of attracting people who really want to be involved with thinking about the responsibilities of this technology to work within governments. They shouldn’t lose that chance.

I think this is very important because one of the most important ingredients for being able to question and ask, “Should this be better?” is the ability to have a technical anchoring to it. A very important new opportunity within a lot of different institutions is to build up that technical talent, which is critical.

ANJA KASPERSEN: Thank you so much, Sara. This has been a very insightful conversation spanning many fields on a topic as daunting as it is important.

To our listeners, thank you for joining us, and a special shout-out to the dedicated team at the Carnegie Council for making this podcast possible. For more on ethics and international affairs, connect with us on social media @CarnegieCouncil.

I am Anja Kaspersen, and I truly hope this discussion has been worth your time. Thank you.
