Linguistics, Automated Systems, & the Power of AI, with Emily M. Bender

Jun 17, 2024 47 min listen

In this AI & Equality podcast, guest host and AIEI board advisor Dr. Kobi Leins is joined by University of Washington’s Professor Emily Bender for a discussion on systems, power, and how we are changing the world, one technological decision at a time. With a deep expertise in language and computers, Bender brings her perspective on how language and systems are being perceived and used—and changing us through automated systems and AI.

Why do words and linguistics matter when we are thinking about these emerging technologies? How can we more thoughtfully automate the use of AI?

KOBI LEINS: As artificial intelligence (AI) continues to reshape narratives and paradigms, I am pleased to welcome Dr. Emily Bender, a professor of linguistics with deep knowledge and experience in multilingual grammar engineering. It is not that complicated, but we will explain it as we talk. A link to her incredible bio and body of work can be found in the transcript of this podcast, as well as links to the topics that we discuss along the way.

Welcome, Emily. It is such an honor to have this conversation with you. The first question I want to ask you is about linguistics, which is not usually the framing for large language models (LLMs). How did you get into it and why?

EMILY BENDER: I got into linguistics because it is cool. I had not ever heard of linguistics before I got to university, but someone gave me the very wise advice in the summer before I started to look through the course catalog and just look at anything that sounded interesting. There was this class called “An Introduction to Language.” I circled it, and in my second term I was looking for a distribution credit, and that fulfilled one of them. I went, sure, I’ll give it a try, and I was hooked on the first day.

The funny thing is that the first day was actually about animal communication systems, like how bees do their bee dance to indicate where the sources of pollen are and stuff like that, but even so, this thinking about communication systems was exactly what I had always been interested in. I just did not know there was a whole field of study until I took that class.

All my degrees are in linguistics. I moved into computational linguistics sort of in grad school. My research was not computational, but I was doing research assistantship work on grammar engineering, which is basically getting computers to diagram sentences.

KOBI LEINS: Grammar engineering sounds like something we are studying again now, but it must seem very old to you when people are talking about prompt engineering and other kinds of controlling or managing language.

EMILY BENDER: Yes, although prompt engineering and grammar engineering are very different things.

KOBI LEINS: In what way?

EMILY BENDER: Prompt engineering is figuring out strings to put into the LLM so that you like what comes back out; grammar engineering is creating hand-built models of the grammars of languages that are both human- and machine-readable so that you can design systems that map from surface strings to semantic representations. It is very much about the details of how the language works rather than poking at an LLM.

KOBI LEINS: How do I get a thing to come out?

EMILY BENDER: Yes.
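
To make the distinction Bender just drew concrete, here is a minimal, hypothetical Python sketch of the grammar-engineering idea: a tiny hand-built lexicon and two hand-built rules (not Bender’s Grammar Matrix, and far simpler than any real engineered grammar) that map a few surface strings to predicate-argument semantic representations, rather than scoring strings the way a language model does.

```python
# Toy illustration of grammar engineering: hand-built rules map surface strings
# to semantic representations. A hypothetical sketch, not a real system.

# Hand-written lexicon: word -> (syntactic category, semantic contribution)
LEXICON = {
    "kim": ("NP", "kim"),
    "sandy": ("NP", "sandy"),
    "sleeps": ("IV", "sleep"),      # intransitive verb
    "sees": ("TV", "see"),          # transitive verb
}

def parse(sentence):
    """Map a surface string to a predicate-argument structure using two
    hand-built rules: S -> NP IV and S -> NP TV NP."""
    entries = [LEXICON.get(w) for w in sentence.lower().rstrip(".").split()]
    if None in entries:
        return None  # word not covered by this tiny lexicon
    cats = [cat for cat, _ in entries]
    sems = [sem for _, sem in entries]
    if cats == ["NP", "IV"]:
        return f"{sems[1]}({sems[0]})"
    if cats == ["NP", "TV", "NP"]:
        return f"{sems[1]}({sems[0]}, {sems[2]})"
    return None  # string not licensed by the grammar

print(parse("Kim sleeps"))       # -> sleep(kim)
print(parse("Kim sees Sandy"))   # -> see(kim, sandy)
print(parse("Sleeps Kim"))       # -> None: the form matters, not just the words
```

The point of the toy is that every mapping is written by hand and is inspectable, which is what separates grammar engineering from prompting an LLM.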

KOBI LEINS: The bridge between that and LLMs and where we are now, you are an incredible voice in this space. I enjoy your cynicism. Your podcast with Alex Hanna, for those of our listeners who do not know it, is one of the best podcasts in this field. It is a lonely endeavor trying to challenge and take down some of the notions of the hype that we are hearing.

You know how these things work, which is a hard and lonely place to be. How do you manage that in an ecosystem where so much of what is said does not actually reflect what the systems can do? Here is this lover of language. We know these systems are not new; they have been around for a really long time, and part of the hype is the claim that this is something new we have never seen before. When you situate yourself in all of that noise, when you have such a good understanding—

EMILY BENDER: So basically the connection—you asked, what’s the bridge? I am a computational linguist, so I study language and how language works, and I use computers to do so. That means that in my academic life I share space with people who are doing natural language processing, so interested in how do we build language technology and how do we make it work well.

Prior to the hype that engulfed the whole world when OpenAI released ChatGPT I was already in conversation with people who were claiming that large language models understand, and I was saying: “Hang on. From a linguist’s perspective that cannot possibly be true,” because as linguists we know that languages are systems of signs, so there is always the form and the meaning, and what a language model is is a system for modeling the distribution of word forms in text, a very useful technology for things like automatic transcription, machine translation, autocorrect, and even before that spellcheckers. If you remember the T9 interface to cellphones, all of that uses a language model.

KOBI LEINS: I am feeling very old.

EMILY BENDER: It is cases where what you need is a system that can say: “This string is more plausible than that string. Compared to the corpus that I have been trained on, this looks more like the corpus than that.” So you have the output of the acoustic model in an automatic transcription system for English that takes in the sounds. The input might be “it is difficult to recognize speech” or it might be “it is difficult to wreck a nice beach,” two strings that are very similar acoustically. One of those two is going to be more common in the training corpus.

That sort of language model is very useful, and there is a lot of interesting work on making them more and more effective over time, but as you said it is old technology, so the ideas go back in one sense to the work of Claude Shannon in the 1940s. Old stuff. We are talking 80 years at this point.
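
To ground what “more plausible relative to a training corpus” means, here is a minimal, hypothetical Python sketch of an n-gram language model in the Shannon tradition: a bigram model with add-one smoothing trained on a toy corpus, scoring the two acoustically similar candidates from the example above. It illustrates the scoring idea only, not any production transcription system.

```python
import math
from collections import Counter

# Toy training corpus standing in for real text data (hypothetical example only).
corpus = ("it is difficult to recognize speech . "
          "speech is difficult to recognize . "
          "it is difficult to park near the beach .").split()

unigram_counts = Counter(corpus)
bigram_counts = Counter(zip(corpus, corpus[1:]))
vocab_size = len(unigram_counts)

def log_prob(sentence):
    """Log-probability of a word sequence under a bigram model with add-one smoothing."""
    words = sentence.lower().split()
    total = 0.0
    for prev, word in zip(words, words[1:]):
        total += math.log(
            (bigram_counts[(prev, word)] + 1) / (unigram_counts[prev] + vocab_size)
        )
    return total

# Two acoustically similar candidate transcriptions from the example above:
for candidate in ("it is difficult to recognize speech",
                  "it is difficult to wreck a nice beach"):
    print(f"{log_prob(candidate):8.2f}  {candidate}")
# The candidate whose word sequences look more like the training corpus scores higher,
# which is how the language model helps the acoustic model pick a transcription.
```

A real system makes the same kind of comparison with vastly larger corpora and, these days, with neural rather than count-based models, but the underlying job is still ranking strings by plausibility.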

KOBI LEINS: People are horrified. They don’t like us talking about this. They like pretending that this is shiny and new. As soon as you say it is old I can see people’s eyes glaze over a little bit and then go, “Oh, no, you’re getting historical on me.” It is very relevant to these conversations.

Do you speak other languages, just as side note?

EMILY BENDER: I do. I had the good fortune to study abroad twice. I lived in France for a year in high school and I lived in Japan for a year in university, so I am quite comfortable in both French and Japanese. This is a dangerous question to ask a linguist. I can give you a five-minute answer talking about how I have studied other things, but let’s just leave it at “competent” in French and Japanese.

KOBI LEINS: The reason I ask that is that I think there is also a framing around these conversations of those who speak other languages and understand the nuances and complexity. You just mentioned that single-sentence string and the challenge of transcribing that, but once you speak or understand other languages you understand the complexity of language and the complexity of the context of language.

I am half-German, so I speak German and I behave differently, have a different tone, and, as you know as a linguist, we have different ways of moving in the world with different languages, which is one of the issues I also have with these systems that represent the world as a single, uniform system where everyone communicates the same way. Anyone who speaks other languages knows that is not the case and that there are so many other signals that are not picked up in these systems. That is just one minor point, but I do think those of us who speak multiple languages are naturally more critical of these systems because we understand how language works in a different way.

EMILY BENDER: Exactly. I think even the experience of studying another language—people who had the good fortune of being brought up bilingual have access to two languages in a way that is different to what happens if you try to do it through study at school.

KOBI LEINS: Oh, the humility of learning another language is horrible. You have to go back to ground.

EMILY BENDER: Also when we are using languages that we are competent in it is very hard to perceive the form of the language without also perceiving the meaning because we do that reflexively, we do it instantly, and the only way to turn it off for a language you speak really well is that desensitization thing where you repeat a word over and over and over again until it means nothing. Have you had that experience?

KOBI LEINS: Yes.

EMILY BENDER: But if you are working on a new language, especially a new language with a different writing system, then you have a lot of experience with: “Okay, I know that is language, I know it means something, but I don’t have access to it so I can look at the form of it and see that the form and the meaning are not the same thing.” Because we so quickly and reflexively interpret language when we encounter it, that is part of where the illusion comes from that the large language models are understanding.

A side point that I think is important and another one of the ways that linguistics informs this conversation is what we know about what we do when we understand language. We are not just picking up something that has meaning in it and unpacking it. Instead, we are saying, “Okay, what is everything I know about the person who wrote or said this,” which might be very little or a lot, “What do I think they believe about our common ground,” and given that plus the words they said what could they possibly have been trying to communicate to me using those words?

Or to somebody else. Maybe I am an overhearer and not the direct addressee, but it works the same way.

Enmeshed in all of that is that in order to interpret what somebody else has said we have to imagine the mind behind the text. That is how we interpret language. So when we are playing with ChatGPT and out comes some text that does not come from a mind, that does not represent communicative intent, in order to interpret it we still have to do that same thing and then we have to remember, “But there is no mind there,” and that last step is difficult.

KOBI LEINS: Some people listening to this podcast might wonder why we are talking so much about language and why does it matter. The context for me is important in this broader conversation of these systems and how, not speaking to a mind, but also how they are shaping us and the behaviors I am seeing in the corporate world of how people are affected by this beyond the hype, the people who are actually engaging with these systems.

I have a colleague who said she found the communication challenging. She is on the spectrum and said: “It is not how I communicate. I don’t understand it. I cannot get it to write how I want to.” She gave it a persona and said, “I have autism,” and the language model just spat out, “I have autism,” and wrote as it normally does.

How do you have these representations in the world, and what does that mean more broadly for a societal, ethical positioning, where you have again this homogenous, mindless language word salad that is being sold as a persona? What do you think it is doing to us? Where are you looking with most intrigue?

EMILY BENDER: I feel like a big part of it is connected to this issue of automation bias, this notion that computers are math and math is objective, so whatever the computer is doing is unbiased and we should believe it. If you apply that way of thinking to any machine-learning system, which is pattern recognition over data, then you are going to be reproducing the patterns of whatever was in the data, and that can never be unbiased. There is this bias-washing thing going on because the computer is doing it now.

So we have this one homogenized thing that reflects some sort of average of the points of view represented in the data that it was trained on, which we do not know anything about because OpenAI is not open about that, and it comes back all shiny and polished: “It’s a computer; it must be right.” It also has this layer of what they call “reinforcement learning from human feedback,” which was done through incredibly exploitative labor practices, making people look at awful stuff for poor pay in Kenya.

KOBI LEINS: Kenya is one of them but a number of countries.

EMILY BENDER: I think Kenya was where Karen Hao and Billy Perrigo did the reporting.

So you have this shiny, corporate, it’s-a-computer thing, and it is reflecting back one view of the world which not everybody fits into and which is not the one true objective view of the world, because that does not exist, and it is going to more closely mimic the hegemonic view of the world. So the more you have experienced marginalization or neurodivergence, the less you are going to fit with what it is presenting, but because it is a computer it gets to skate past some of the safeguards that we have been trying to put in place to address those systems of oppression, and I think that is a big issue.

KOBI LEINS: Have you given much thought to Karel Čapek and R.U.R. and how these systems ultimately always push us toward a more authoritarian and homogenous way of being? Is that a position that you hold, or do you think there is a different way that we could be doing this, especially given what we have found out in the last few weeks about how this was released, the lack of controls, and the safety measures that were removed? Could it be done better or differently?

EMILY BENDER: I think there is liberatory potential in data collection and in doing automatic pattern matching over data, but in order to realize that potential the data collection has to be done by and for the interests of the people who are liberating themselves, and it has to be done in ways that people have real consent over what is happening with their data and in ways that are not universalizing so we are not making one chatbot that works for everybody.

Rather, a community might say, “We are collecting data because we are interested in these patterns” or “We are designing”—the Māori have done excellent work designing pattern-recognition tools for their language and saying, “This data is our data, we are the stewards of it, and we are not sharing it with the broader corporate interests.” They have produced a nice model for how to think about that.

They do not call it a license. I think it is “stewardship” or some sort of statement of how people relate to the data, a thinking-through that makes sense in their own context, and the folks at Te Hiku Media are not saying everyone should use this license-like agreement but rather: “We are putting it out there so other people can learn from what we have done. Please make something that works in your context.”

I don’t think that computers, computation, or automation are necessarily going to take us to authoritarianism, but there is a huge risk, and that risk has to do with these notions of scale, one-size-fits-all, and someone is controlling the system and everyone else has to use the system.

KOBI LEINS: So it is about democratic process and safeguards along the way. You have mentioned unfair labor practices, which is one major part that I think most people are not aware of or think about in general circles.

The other thing they are not necessarily thinking about is the environmental impact, which I am insistent on raising at every single turn: Who makes those decisions about what kind of energy we should be consuming for whose purpose, which again comes back to those models for groups I suppose in a way.

Do you have any thoughts on that from where you sit? You speak with a lot of people who think about these things all the time. Climate change is such a front-of-mind issue. How should we be joining these conversations? How should these systems be measured in terms of value for society more broadly?

EMILY BENDER: In the context of thinking about environmentalism and the environmental impacts I like the way that Alan Borning, Batya Friedman, and Nick Logler pushed back against the metaphor of the cloud. They have this article called “The ‘Invisible’ Materiality of Information Technology” that came out in 2020 in Communications of the ACM.

They point out that when we talk about cloud computing, clouds are light, fluffy, harmless little bits of water vapor, at least until you are somewhere where you experience tornadoes, hurricanes, or other severe weather events; but think of the pretty, fluffy, white clouds in a blue sky on a not-too-hot day, and that is the metaphor: “The data is just somewhere out there, and we do not have to worry about it.”

In fact, cloud computing is driven by these massive data centers that require a lot of energy, that require rare earth minerals to create, and that require clean water for cooling, which is a lot of environmental impact, and it is done in a way where, for the people who are making decisions about whether or not to use it, about asking Stable Diffusion to create an image for you or asking ChatGPT to do something, it is all invisible. You do not actually know what exactly the environmental impact is.

People like to say, “Oh, well these are going to be situated in places where water and electricity are plentiful.” It is not the case. They are getting situated in places where the regulations are lax and they can have access to water and electricity, but it is not the same thing as it being plentiful.

KOBI LEINS: It is also one of those areas I am finding overlapping with arable land. Singapore has just put a ban on more data centers because that requirement of clean water and energy is actually what is required to produce food as well, so we are hitting up against some of these tensions. I look forward to reading that article; we will link to it as well. I think there is a lot more that we need to think about in this area.

How do we present these systems responsibly? We both follow the lives of the characters who build, sell, and market these tools, but what could we do differently? To your point, do the models need to be smaller? Do we need to have greater civil engagement? How do we make sure that these tools are creating stronger democracies and not the opposite? And is it too late?

EMILY BENDER: I never think it is too late. One of the metaphors I find hopeful is, remember the ozone hole? It was a big deal. I think it is not a big deal anymore. We as a globe banded together, passed some regulations, and that issue is in the rearview mirror at this point. Some of the younger listeners probably have no idea what I am talking about.

KOBI LEINS: I remember going to Europe. I live in Australia, and they were like, “How do you have so many freckles?”

We were like, “Ozone.” It is fascinating because it was largely impacting Australia, but that is a side story. People who don’t know can look it up.

EMILY BENDER: All of that is to say that you can have global-scale problems, even if they impact Australia before Europe, and global-scale pushes for regulation can actually address them. So it is not too late.

What do we need? I think we need transparency and accountability. There should absolutely be transparency for any applications of machine learning—what the data was, how it was collected, how the thing was evaluated—and one of the places where the way people are using large language models now completely falls down is that they are what Timnit Gebru calls “everything machines.” They are not for anything, so they cannot actually be evaluated to see how good they are at everything. That is not something you can test.

When we are doing automation, which can be valuable, we should be asking: “Okay, what is being automated? Why are we automating that task in particular? What is the input? What is the output? What was it trained on? How was it evaluated? Who is benefiting from automating this task, and, when it works well, who is being harmed? When it fails, how does it fail, and who is harmed by that?” If we asked those questions every single time about every kind of automation, I think we would be in a better place, and I think some systems would not get off the ground. So, what is this for?

Also, transparency around things like the environmental impacts: How much electricity is this using, how much water is it using? It is a little bit harder to do a calculation of carbon footprint because you have to know where the electricity came from. Also it is a little bit harder to allocate the mining that goes into building the chips to specific uses, but you can make that transparent, so we are starting to see some airline sites that say, “This is the emissions for this trip.”

If you had that front and center, it would be like, “Okay, maybe I won’t ask ChatGPT.” Or, if you had a choice between search engines and there is the version that was just going for an index of pages versus the version that was doing an AI summary but you could see the environmental difference between them, you might have more informed choices.

All that said, I feel myself going down the problematic path of making this an individual consumer problem.

KOBI LEINS: It is easy to solve that way.

EMILY BENDER: Certainly regulation that pushes it back onto the corporations offering this to actually deal with the environmental impact rather than just making it a choice for consumers is probably a good thing.

Finally, I want accountability. We have not talked about this at all, but there is the issue of the metaphorical pollution of the information ecosystem as we have all this synthetic media going out into the world and it is not watermarked except incidentally. People have searched in Google Books for phrases like “As an AI language model” or “As of my last knowledge update,” which are sort of accidental watermarks of ChatGPT output and they are finding it, but it should actually be more robustly watermarked so that it can be cleaned up and moved out of the way.
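
As a small illustration of what searching for those accidental watermarks can look like, here is a hypothetical Python sketch that scans a piece of text for the telltale phrases mentioned above; robust watermarking, as Bender notes, would have to go well beyond simple string matching.

```python
import re

# Phrases mentioned in the conversation as accidental "watermarks" of ChatGPT output.
TELLTALE_PHRASES = (
    "as an ai language model",
    "as of my last knowledge update",
)

def find_accidental_watermarks(text):
    """Return (phrase, character offset) pairs for telltale phrases found in the text."""
    lowered = text.lower()
    hits = []
    for phrase in TELLTALE_PHRASES:
        for match in re.finditer(re.escape(phrase), lowered):
            hits.append((phrase, match.start()))
    return hits

sample = ("Parrots have been studied for centuries. As an AI language model, "
          "I cannot browse the internet, but here is a summary of parrot cognition.")
print(find_accidental_watermarks(sample))  # prints each phrase found and its offset
```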

I think that anybody who is running one of these systems and opening a tap to spill synthetic information should be accountable for making sure it is watermarked so it can be cleaned up, but also accountable for the output itself—imagine if OpenAI actually were on the hook for everything that came out of ChatGPT? If it outputs libel, OpenAI is liable; if it outputs medically dangerous information, OpenAI is liable. That would be a better world.

KOBI LEINS: As a legally trained person, I don’t see how it is not liable. I am waiting for all of these cases to come and they don’t, and even with the pollution of the internet ecosystem, to your point, there is this question around how you can do this and not be held responsible, and yet still the pushback is on regulation.

When I said it is “simple” for individuals I was being slightly sarcastic, but individuals cannot fix this. An individual cannot sit back and make those calculations. It is a little bit like environmental consumption as well. It has to be a societal approach, but how do you push that back? How do you make sure that these companies are accountable?

To your point, we still do not even know what is in the data sets of these large companies. They are not being transparent about where things have come from. They are not saying or showing. I am just seeing things going wrong in so many places, and I am astonished at how—litigation is always slow and it is not my preferred method in the toolbox of responding, but it is going to come.

EMILY BENDER: Yes, and hopefully it will be done skillfully and be heard by judges who are not—one of the issues here is that every time this is hyped as artificial intelligence people set aside their critical thinking about it, it seems. “Oh, well, it’s AI.”

KOBI LEINS: “This is a thing that is smart and people should know that and they should just,” yes.

The Canadian airline chatbot case was an interesting one because it was so simple and so obvious. I love that as a precedent. I think we are going to need some more precedents that are also fairly straightforward because the other issue I think you have alluded to is that judges do not necessarily understand the technologies. Beyond magical thinking there is a lack of skills around being able to interrogate and challenge what is being generated and how, which we are seeing in the pollution of the ecosystem in the last two weeks. There have been some hilarious and tragic examples.

EMILY BENDER: There was a blog post that came out from Google just yesterday I think, basically saying: “Yes, we know it is making some mistakes, but it is mostly good, and we are adding some guardrails so that it surfaces full-on satire less often, and also a lot of those things you are seeing circulating are from someone who is basically fishing for a funny answer,” like they are asking questions that nobody would ask.

One of the examples I saw that I appreciated because it was local to me here in Seattle was: The query was, “Mount Rainer eruption prediction.” We live near volcanoes here.

KOBI LEINS: It’s a good question to ask.

EMILY BENDER: One of them has erupted in my lifetime. Mount Saint Helens erupted in 1980. Mount Saint Helens is not as close to a population center as Mount Rainier is, so someone typed in “Mount Rainier eruption prediction” and what came back was this thing that said something like, “According to a 2022 study by Washington State geologists Mount Rainier is unlikely to erupt in anyone’s lifetimes except perhaps the very end,” meaning that if it blew a bunch of people would die.

You click through, and that came from a site called The Needling, which is a Seattle-specific satire site.

KOBI LEINS: The Onion articles are great too.

EMILY BENDER: The Onion is widely known. The Needling is local to Seattle. There are a few things I like about this example. One is, it is not someone fishing for a bad answer. That is a reasonable query.

The other thing is, and these are points that I make in some work together with Chirag Shah. We have a pair of papers, one called “Situating Search” from 2022, and one from 2024 that has a longer title so I cannot do it off the top of my head, but it is something about information access and a healthy information ecosystem.

One of the points we make is that even if the chatbots were able to return only “correct” answers—and I am putting scare quotes there because what does that mean?—it is still a bad idea to use a chatbot for information access because when you are doing information access you are doing two things at once. You are both getting an answer to your immediate question and you are building up your sense of the landscape of information around you.

I give an example: Imagine you put in a medical query and you got back answers from the site WebMD and maybe the Mayo Clinic, which is a famous medical center in the United States, maybe Dr. Oz, who is a famous I would say charlatan in terms of the way he presents medical information, and then as a fourth link you got back a link to a forum where people with similar medical questions were talking to each other. You could then situate each of those pieces of information according to what you thought about those sources, but if what came back was just an answer from a chatbot you have lost that context and the ability to connect it to the context and the ability to build up over time what you think of WebMD, Dr. Oz, and the Mayo Clinic. Finally, you lose the ability to connect with the other people in that forum. So, chatbots as a replacement for search is just a fantasy.

Coming back to The Needling example, I found it to be a useful test case to show how the connection between the text and the context is where a lot of the interpretation happens.

KOBI LEINS: I love the article that said that poorer schools could not afford the past tense. I think that was my favorite. It was pulled from The Onion and presented as fact. I wonder where they made it, but I will definitely be looking it up.

Let’s go back to words. As a linguist you raise this interesting point, which I do not think we talk about enough, about clouds being white, fluffy things. You also touched on hallucinations. I find “hallucinations” as a term problematic. “Cookies” is another one. It is this thing that trawls the web and collects all this data about you but is presented as something that has chocolate chips that everyone snuck out of tins as children. How is the language we are using around these systems—and prompt engineering, which we touched on at the very beginning—itself shaping our understanding of them, and what would you like to change? Do we need a new glossary?

EMILY BENDER: I think we could use a new glossary. I think it is worth tracking the way people are talking about things and asking about what unmerited assumptions are those metaphors bringing in. Cookies sound fun and pleasant, sure—“Here, have a cookie. It wants a cookie? Okay, yes, it can have the cookie, or it is giving me a cookie? Excellent. I will accept the cookies”—when in fact it is a privacy nightmare. That cookie that you are swallowing has little microphones in it that are now recording everything about you.

I am glad you brought up the word “hallucinations,” which I think has a couple of types of issues with it. One is that if it is being used tongue in cheek then it is making light of what can be a serious symptom of mental health issues. That is not nice, but also hallucinations suggest experiencing things that are not there, so you are having an experience and it does not match what other people around you would be seeing; it does not match what is in the outside world.

If you talk about LLMs having hallucinations or doing hallucinations, you are suggesting that they are having experiences, and they don’t. I think the term “artificial intelligence” itself is problematic. I try to avoid terms like “recognition,” but I know I have used it at least once already in this podcast, so talking about “pattern matching” instead of “pattern recognition,” talking about “automatic transcription” instead of “automatic speech recognition,” all of these things where when we talk about the systems as if they had minds we are reinforcing this idea that there is a mind there, so keeping an eye on the metaphors is important.

Also, keeping an eye on how broadly we use given terms, so when the term “artificial intelligence” is used to refer to automatic transcription, protein folding, synthetic text coming out of LLMs, license plate readers, or whatever, that sounds like there is one thing that can do all of these things, so if I am impressed by ChatGPT or I am impressed by Stable Diffusion then I am more likely to accept something like ShotSpotter, which is a system that is supposed to detect gunfire and send an alert to the police.

Not heard of that one? That one is pretty awful.

KOBI LEINS: “I think there’s another solution to that problem,” she says from Australia.

EMILY BENDER: Yes, and the marketing copy—the company has rebranded it as “SoundThinking,” which is also awful. They have never presented any evidence that their system works as advertised, and what they are basically set up to do is send the cops into a situation where the cops believe they are encountering someone with a gun, which is just a recipe for police violence the way things work in the United States. Their advertising copy says that they are trying to save lives because, “We are going to detect when a gun has been fired, and we are going to get the police there before the person bleeds out.” If you actually cared about somebody bleeding out, it is an ambulance that you would send, medics, not cops.

KOBI LEINS: I think you have just encapsulated what I see day to day in the commercial review of these projects, the level of hype. If someone says “AI,” with a sort of jazz hands and a bit of hip movement, people look away, when in fact we need to look more closely. Often, to your point, much of what is being sold does not match what is actually under the hood. Is it even capable of doing that thing, and if it does, is it done ethically and in a way that is sustainable and does not exacerbate the problem it is supposed to be managing, like, let me say it out loud, gun control? You are then diverting from other problems, not to mention the money that is being spent on these systems when the actual problems could be solved better.

I think the words that we use are important because they do shape softly the way that we think about these systems. Maybe an alternative glossary is something we could potentially think about doing.

EMILY BENDER: I am super-intentional when I am talking about ChatGPT to resist saying things like, “I asked ChatGPT and it told me.” I actually never use it. And “so-and-so asked ChatGPT and it told them”? No. Actually what the person did was input a prompt into ChatGPT and then interpret the output that came back. It gets a little bit cumbersome, but it feels like a good practice to resist this idea that this is a thing you can ask questions of and that it is answering you. I think that some of that resistance can be linguistic.

KOBI LEINS: Listeners who are less academic may wonder why words matter—those of us who pay attention to words understand—so let’s round this conversation out with why this matters for a more ethical and fairer society, which is the mandate of this particular podcast and group, and who has the power to make these decisions. We have talked about how the models should be more locally based, more individually controlled, and more personalized to meet people’s needs.

Unfortunately, neoliberal capitalism does not necessarily work that way. What could we do or what should we be doing—solve all of it, Emily. I hate these questions when I’m a guest, but from where you sit what could we do other than managing our language more carefully, holding to account, and all of those sorts of things? Where do you see the need or possibility for doing this differently?

EMILY BENDER: Picking up a couple of threads from earlier in the conversation, we were talking some about transparency, and I think it pairs nicely with questions of privacy.

This stuff is working the way it works right now, and when I say “this stuff” I mean the large language models—Gemini, ChatGPT, etc.—because some companies have felt entitled to just grab a bunch of data. Not only do we not know what is in it, so we do not know what it is good for, but we also do not know what information about us is included in it. There is a privacy angle that I think dovetails nicely. I think that on an individual level we can decide not to go with the flow and decide not to agree that: “Oh, that’s just the way things are, I have put it on the web so therefore anybody can use it.”

That is not what you put it on the web for. If you were having a conversation on social media, you were doing that because you were talking to people and you wanted other people to be able to hear. You were not agreeing to have it scooped up into these large data sets. You were not agreeing to being able to have other entities be able to make inferences about you by combining your data across lots and lots of sites.

I think on an individual level pairing issues of privacy with these questions of over-centralization makes sense, and I think that back to these questions about transparency, even though we do not want to make environmental problems something where it is an individual-level solution as we were talking about before—and I did catch your sarcasm; I hope the listeners did too.

KOBI LEINS: Sometimes it is quite subtle, but, yes, excellent.

EMILY BENDER: I think that transparency can help because if we are going to make policy changes we need broad support for those policy changes, so if everyone sees the use of ChatGPT as the same as just pouring out clean drinking water in the desert—

KOBI LEINS: Visualizations really help, right? This is a bottle of drinking water every time you are doing this.

EMILY BENDER: “Here is a nice glass of water that I am pouring out on the desert, and I am going to take one video of it, and then you are going to replay that video a bunch of times using energy.”

But, yes, the association that this is not clean, this is not white, fluffy clouds, and the more hard information we have about it I think the more political will there will be to come in with regulations. I think there is a lot of common purpose between protecting privacy, protecting the environment, and pushing back against the centralization of power. I think all of those point in the same direction, and the more that we can get people accustomed to thinking about each of those pieces the easier it is going to be to build the political will.

KOBI LEINS: I have to say that I cannot see them not being aligned if you want to do it properly. Circling back to the Māori example, I was fascinated a year ago when I was at a conference to hear the Māori talking about the Declaration on the Rights of Indigenous Peoples, which I was in the room for when it was adopted and helped to lobby for in New York, and to see that declaration being used in that way was fascinating because it certainly was not envisaged to be used for data at the time, but of course it applies because there are rights around data as well.

I am just thinking out loud, but in a lot of these areas some of us have been saying for years that the law applies. Back to your point about cases, I am not sure why there have not been cases on the right to be forgotten, for example, or on misrepresentation of people in these systems—the way that people are being portrayed is not accurate. There will be cases, but that is slow, and I’m thinking about all the other tools that are available in the meantime to challenge and hold to account. I think the more powerful actors, who are pushing forward without any regard for broader humanity, are something we are all keeping an eye on.

EMILY BENDER: I am not a legally trained person, but it seems to me that some of those cases should be brought by regulatory bodies. I think the U.S. Federal Trade Commission is doing a nice job. There was a moment where the Italian data protection agency—I don’t know the name of it—was pushing back against ChatGPT, and then they folded. The grounds there, as I understood it, were that OpenAI was not General Data Protection Regulation-compliant in the way they handled data, and then somehow the Italian regulator was mollified by giving people an opt-out from having their data collected from what they were putting in as queries, which is ridiculous because how is the thing outputting any Italian at all? Most of that is going to be from Italian speakers in Italy. That is data they have collected.

My feeling is that the more the public sees through the hype and the more the public understands this stuff to be not any kind of intelligence—even intelligence as a concept is problematic—and to be polluting both metaphorically and literally and to be something that is a power grab by people who already have a lot, then the more regulators are going to feel empowered to actually implement the regulations. I think these things are connected.

KOBI LEINS: In 2018 or 2019 I remember reading about judges in France prohibiting the analysis of judicial decisions, one of the early use cases of some of these systems. The French pushed back and said: “Not on our watch. We don’t want you to know that we make poorer decisions just before lunchtime.” It is interesting when the groups in power who are impacted also say, “No, not with us,” which obviously the judiciary can influence fairly heavily, but I think it matters for groups that are impacted to raise questions, engage, and raise profiles as well on what is going on and where things are broken. There is that registry as well of things that are going on. I will have to look up the link.

EMILY BENDER: The AI Incident Database?

KOBI LEINS: Yes. I have not been following it very closely, but I think it would be interesting to keep an eye on that as well.

EMILY BENDER: They are probably getting completely snowed under right now with the Google AI overviews. It is shocking to me how much people have jumped on this bandwagon.

KOBI LEINS: Also, it was released without board approval. This week just hurt my head.

EMILY BENDER: The whole OpenAI thing, part of what is going on there is that their notions of safety are not actually grounded in the real world. They are down in that TESCREAL—transhumanism, extropianism, singularitarianism, cosmism, rationalism, effective altruism, and longtermism—rabbit hole. TESCREAL is this acronym that Émile Torres and Timnit Gebru coined for this bundle of ideologies which are not concerned with what is happening to individual people’s rights, not concerned with the information ecosystem, not concerned with misinformation, disinformation, or the impact on democracy, the impact on public health, and all these things. They are concerned with these weird long-term fantasies.

When I read those stories that it was released without board approval, some of the subtext that I am reading is that the board or maybe OpenAI are actually really only concerned with what they call “long-term risks.”

KOBI LEINS: Longtermism.

EMILY BENDER: But they are not long-term risks; they are fantasies.

KOBI LEINS: Again, the words matter here.

EMILY BENDER: Exactly. I can be very pedantic about it because I am a linguist. It is shocking that they did not, but also I think that probably one or both sides of that were really just concerned with: “Well, is this going to lead to human extinction? Probably not. We will put it out there and never mind every individual person who is going to be adversely impacted by this.”

KOBI LEINS: I don’t know about you, but I am clinging onto my books harder than I ever have before. I am certainly not giving my publishing rights to—I have also been around the traps, being asked to submit academic publications for use in AI, which I think is highly problematic, but I think those ground truths and sources in the literary sense are actually going to matter more and more as we go forward to be able to verify.

EMILY BENDER: Authenticity, provenance. Where did this come from?

I get so frustrated when I see people blithely “asking” ChatGPT, putting a prompt into ChatGPT and taking its output as if it were interesting about anything.

I was just looking at this academic paper about how norms get set. The author was asking questions about, “Given how fast AI is evolving”—first of all it is not fast; secondly it is not evolution because it is not biological.

Yes, we are going to see some changes in norms or some need for adaptation of norms or reassertion of norms around the use of these kinds of automation, and then apparently from the abstract part of this article involved “a conversation with ChatGPT” about the topic. Why would anybody think that is worthy of an academic publication?

KOBI LEINS: Yet so many conferences you go to, people say, “I looked myself up.” I think there is still that magical sort of shine to it even though we know it is not new and we know it is clunky, broken, and constrained in what it can achieve, and yet the fascination that people have with quoting it like it is a thing—

EMILY BENDER: The phrase “stochastic parrots” was selected as the American Dialect Society’s AI-Related Word of the Year at the most recent meeting. I was there. It is a really fun meeting. There was this moment where you have the four or five candidates, and people get to stand up and speak about them.

As the coiner of that phrase, I got to speak about it. One of the things that makes the thing a lot of fun is one of the people running the show—I think it is Grant Barrett—runs a projector and is doing running commentary in a text format bouncing off what people are saying, and it is really funny. While I was speaking, up popped this interaction with ChatGPT. I said: “Oh, that is ChatGPT output. I make a habit of not reading any of that.” What I learned afterward was that apparently Grant Barrett had prompted ChatGPT to describe who I am. That is only interesting as a reflection of what is in its corpus, but we don’t actually know what the corpus is, so I don’t care.

KOBI LEINS: That is so interesting. I was actually going to start this talk with stochastic parrots, and I was so excited to see you that I forgot, so I will make sure we put the link into the chat as well for those who are not familiar with your work because I think it is one of the most groundbreaking papers in this field in the last decade to actually reframe these conversations, and many of us have leant heavily on your work to have better and more thoughtful conversations.

We look up to you, and you tower above and around us, providing us with language and context, and a sense of humor as well. I also love your website, where you require people to mention certain words in their headings if they have read your website to approach you. I have actually copied a number of your approaches.

EMILY BENDER: You mentioned humor, and that is certainly part of what I am doing together with Alex Hanna on the Mystery AI Hype Theatre 3000 podcast. One of our catchphrases due to Alex is “Ridicule as praxis, honestly.”

KOBI LEINS: Which is wonderful. My honors thesis was on dictatorships and humor. I was looking at literature in Germany and how conflict was resolved through humor.

EMILY BENDER: Oh, amazing.

KOBI LEINS: We also have an AI Australia podcast here which is similarly disrespectful in a good way, and I love what you do, so thank you so much for putting that in the world and giving voice to so many who otherwise feel very alone in this work.

EMILY BENDER: I appreciate the way it leads to community building. That has been a great outcome of the podcast.

KOBI LEINS: Dictators don’t like humor.

EMILY BENDER: Or community.

KOBI LEINS: Thank you for being such an incredible role model and thanks for sharing your time with us on this podcast. This is such an important topic, and I could talk to you for hours, but I am also conscious of your time, so thank you.

EMILY BENDER: I appreciate that. I just want to say about the stochastic parrots paper it is interesting to hear it described as groundbreaking because it was a survey paper. We were pulling together work that had already happened. From what you said basically it was answering the question, and the question came from Timnit Gebru. She approached me and asked, “What are the things we should be concerned with as we push to make bigger and bigger language models, and has anyone written a paper?”

I said, “Not to my knowledge, but here are the things that I think would be on that list.”

She said, “Hey, that sounds like a paper outline. Shall we write it?”

So it was her wisdom to be asking that question, and then the shared effort of actually seven of us—and there is a whole story that people can look up about how come it has fewer authors—pulling together what we had already read and putting it into the answer to her question there of what should we be concerned with. I think that is really what you are reacting to in saying: “We don’t have to just follow the hype. It is worth stopping and asking as we are choosing research directions and as we are choosing directions for investment what is known about the dangers down those paths,” and that is what that paper was.

KOBI LEINS: I think it was the first paper to talk about environmental impact in the way that it did. I had not seen anything at that time and was looking, so I was delighted, and I think it opened that gate much to the chagrin of certain characters involved.

EMILY BENDER: Yes and no. Again, we were surveying, so we surveyed several papers about environmental impacts, and the one thing we brought together with those empirical results was the point about environmental racism and that when people talk about cost-benefit analysis it is not enough to weigh costs against benefits. You have to think about who is getting the benefits and who is paying the costs.

KOBI LEINS: Similar to AI for Good, which I have trouble with, because it is always who is good in what context and at what time.

EMILY BENDER: I have a hard time with that too. I think part of it is that it feels like a dodge from actually dealing with the negative impacts of stuff to say, “No, we’re doing it for good, so it is all okay,” where a much clearer view would be to ask: “What are the downsides, who is paying, and what does justice ask for in this moment?”

KOBI LEINS: Complexity is something I think the world is less and less willing to engage with, unfortunately, preferring simplified systems and boiling things down to a unitary factor, to your point, so keeping this complexity in our conversations and having these more nuanced conversations about words and uses of these tools is important.

Thank you so much for your time and for doing this today. It has been such a joy.

EMILY BENDER: My pleasure. Thank you for some great questions.

KOBI LEINS: Thank you so much, Emily. This has been an enlightening conversation spanning many different fields. To our listeners, thank you for joining us, and a special shout-out to the dedicated team at the Carnegie Council for making this podcast possible across continents and time zones. For more on ethics and international affairs connect with us on social media. I am Kobi Leins, an advisory board member of the Artificial Intelligence & Equality Initiative, and I hope this discussion has prompted some new thoughts and has been worth your time. Thank you.

Carnegie Council for Ethics in International Affairs is an independent and nonpartisan nonprofit. The views expressed within this podcast are those of the speakers and do not necessarily reflect the position of Carnegie Council.
