When it comes to machine learning, are safety and ethics the same thing?

KUSH VARSHNEY: My name is Kush Varshney. I'm with IBM Research at our T. J. Watson Research Center, which is a few miles north of here. I'm an electrical engineer by training (not an ethicist, not a philosopher, not a lawyer), so this has already been a very good learning experience for me to see what the different perspectives are. I'll give some of my perspectives. There will be some math in the slides as well, so hopefully that's not going to be too off-putting. Let me get started.

Professor Wallach mentioned this nice thought about outward- and inward-looking actions. One thing that I do at IBM is work on our Science for Social Good initiative, in which we try to work with NGOs and other partner organizations to do exactly the sort of things that he mentioned. On the inward-looking side, relating to yama/niyama that he also mentioned, it's more of the yama side. We have this Sanskrit quotation, Ahimsa paramo dharmaha, which is saying that the first thing that we should be concerned about is the non-harm aspect of things. What does that mean for me as a machine learning researcher?

What I'm going to try to say in this presentation is that we should really have a common framework to understand many of the desiderata that we've already heard throughout the day so far. It's all about having considerations beyond accuracy in machine learning. So it could be interpretability, causality, fairness, various types of robustness, open data, and so forth. If we are dealing with socio-technical systems, which we are (and we've heard some really nice examples already of all the different ways that AI is affecting our daily lives), then we should start questioning whether they are safe, and what we even mean by "safe." Hopefully, through the rest of today and tomorrow, I will myself learn more about whether safety and ethics are the same thing. Hopefully, you all can help me learn as well.
"Safety" is a commonly used term across engineering disciplines. It is usually used to connote the absence of failures or other things that render a system dangerous. So we can talk about safe neighborhoods, safe foods, safe toys, all sorts of things. But oftentimes when used in those particular fields there is no common definition that could be applicable to a new field. So it's a very specific thing, like "the road shouldn't curve more than this much," or "the toy shouldn't have lead paint on it" or whatever.

But there are some pieces of work that try to define safety based on a very precise decision-theoretic definition that can be applied more broadly. That definition is based on harm, risk, and epistemic uncertainty. Harm in particular is when you have an outcome, that outcome is undesired, and its cost, when measured and quantified by society, is above some threshold. If that is true, then you have a harm. If the unwanted outcome is of small enough severity, then maybe it's not a harm and it's not a safety issue. If you get recommended the wrong movie, maybe that's not such a big issue; it's maybe not a safety concern.

Then there is risk. We are always in the situation where we don't know what the outcome will be, but with risk we know its probability distribution, and we can calculate the expected value of its cost. The risk is then the expected value of the cost of harm. In contrast, with epistemic uncertainty, we are still in the situation where we don't know what the outcome will be, but here we don't know its probability distribution either. It is a lack of knowledge, and because of it we cannot assign any sort of expected cost. If you are a Bayesian sort of person, then you might argue that there is no distinction between risk and uncertainty, but for the purposes of this talk let's say that those are two different things. So then what is safety?
Safety is the reduction or minimization of both risk and epistemic uncertainty of harmful events. Again, costs have to be sufficiently high in some human sense for the events to be harmful. Safety involves reducing both of these things, the probability of expected harms and the possibility of unexpected harms.

Now moving to machine learning, don't get bothered by a lot of math. The key point to note is that the founding principle of machine learning as it's practiced today, and of all the theory, really is risk minimization. So you have these probability distributions, you have functions, and you have objective functions through losses, and you are trying to find some function to minimize the risk. The reason why machine learning even exists as a field is because you never actually have access to those probability distributions. You only have a finite number of samples from which you want to estimate things. You can only then empirically minimize that risk, and it is not always the case that the thing that you would do with infinite data is the same thing that you would do with finite data. In particular, among different functions that you can use you want to restrict the complexity of those functions to be within some hypothesis space, and that helps you generalize better for these new unseen examples. This is all risk, and that's what everyone doing anything machine learning-related is working on.

That particular way of thinking about things does not actually capture uncertainty. The main source of uncertainty is that the training samples may not be from the true underlying probability distribution. If you cannot even know what distribution they came from, then you cannot do things like domain adaptation and other things that people have come up with. If your samples are from a different distribution, there can really be a lot of harm.
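As a sketch of the risk minimization setup described above, it might be written as follows. The slides are not reproduced in this transcript, so the notation here (features X, labels Y, joint distribution P_{X,Y}, loss L, hypothesis space H, and m training samples) is an assumption rather than the speaker's own.

```latex
% Assumed notation (not from the talk): features X, labels Y, joint
% distribution P_{X,Y}, loss L, hypothesis space \mathcal{H}, and m training
% samples (x_1, y_1), ..., (x_m, y_m).
\begin{align}
  R(h) &= \mathbb{E}_{(X,Y) \sim P_{X,Y}} \bigl[ L(h(X), Y) \bigr]
    && \text{(risk: expected loss under the true distribution)} \\
  \hat{R}_m(h) &= \frac{1}{m} \sum_{i=1}^{m} L\bigl(h(x_i), y_i\bigr)
    && \text{(empirical risk on the finite sample)} \\
  \hat{h} &= \operatorname*{arg\,min}_{h \in \mathcal{H}} \hat{R}_m(h)
    && \text{(empirical risk minimization over } \mathcal{H}\text{)}
\end{align}
```

The empirical risk only stands in for the risk when the samples really are drawn from P_{X,Y}; nothing in this objective accounts for not knowing that distribution.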
Another source of uncertainty is that even if the samples that you are basing your machine learning on are from the actual distribution, they might be very under-sampled in certain parts of the feature space. So you might be a very rare example and you have no other examples like yourself, so the machine learning algorithm just has to fill in things there without any information. A third one is that even if you've done everything optimally, and you have the best possible classifier or the best possible regressor, you are only going to get a handful of test samples as well. So those might actually just by chance be in a part of the space where we are not doing the best job possible. With all of these things we have to start thinking about what the considerations beyond risk are.

Then, finally, with the loss functions, as I showed earlier, there is a loss function that is defined and you have to optimize it in a typical machine learning formulation. Usually, it's just on these labels (what movie, what diagnosis, whatever, these sorts of things), but oftentimes there is actually a human cost, which makes it such that different parts of the feature space actually have different costs. Say you got an email from your boss telling you to do something important and it got mislabeled as spam, versus some other email that is not so important and that also got mislabeled as spam. If you are using a typical machine learning formulation, both would have the same cost to you, but there actually will be different costs depending on the semantics or the features of the actual sample. So some parts of the feature space might be such that they are harmful and safety issues and some are not.

We've already heard about value alignment. Obtaining that loss function is actually a nontrivial and context-dependent process. It really is about eliciting the values of society and encapsulating morals. There might be different views on this.
My view is that, from a practitioner's perspective, what we're really trying to do is encode all of the morality or all of the values into this loss function so that the algorithms can then use those to move forward. Again, it can be context-dependent, so the loss doesn't have to be constant across all situations.

Moving to how we might achieve safety, the engineering literature has identified four categories. We'll see how they might apply to the machine learning context.

  • The first is inherently safe design. An example from engineering is not putting hydrogen in a blimp, putting helium there, so there is no chance of it igniting.
  • Another category is safe fail: a system remaining safe even if it is not able to do its intended operation. Train engineers have these dead-man switches; you let go and the train just stops, it doesn't continue into some sort of unsafe mode.
  • A third category is safety reserves. These are margins or multiplicative factors that let you be in situations where you aren't over-stressing things. A boiler having a wall thicker than it needs to be is a safety factor here.
  • The final category is procedural safeguards. These are things like audits, training, and posted warnings.

Now let's try to see if these can apply to machine learning as well. First of all, inherently safe design: We've already talked quite a bit about explanation and interpretability of models. These are examples of how to introduce inherently safe design into machine learning. Suppose we want robustness against the particular uncertainty of the training set not being sampled from the test distribution. If you are using some really complicated deep learning model, it's actually not possible to know all of the corner cases and to see whether you actually have that robustness.

But if you have an interpretable model, something that people can understand, then you can actually just directly test, look at it and see if it has that robustness or not. Similarly, if you have causality of your predictors, rather than just correlation-based things, then you can also ensure much more inherently safe design. Both of these things introduce extra constraints on this hypothesis space that I mentioned before.

It might lead to worse accuracy, or higher risk when measured in the traditional sense, but because of the reduction in uncertainty, it actually increases safety.

In terms of the second category, safe fail, there are ways to do this as well. There is a reject option approach, where if the algorithm is uncertain and not confident, then it can ask for human intervention. Similarly, if it is operating in a part of the feature space where there are very few samples on which it is basing its decision, then it can also ask for more human intervention.
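A minimal sketch of such a reject option might look like the following. The function name, the two thresholds, and the idea of passing in a count of nearby training samples are all hypothetical choices made for illustration; the talk does not specify how confidence or sample density would be measured.

```python
from typing import Optional, Sequence


def predict_or_defer(
    class_probs: Sequence[float],
    nearby_training_samples: int,
    min_confidence: float = 0.8,  # hypothetical confidence threshold
    min_support: int = 5,         # hypothetical minimum number of nearby samples
) -> Optional[int]:
    """Return the predicted class index, or None to ask for human intervention."""
    if not class_probs:
        raise ValueError("class_probs must not be empty")

    # Defer when the query sits in a sparsely sampled part of the feature space.
    if nearby_training_samples < min_support:
        return None

    # Defer when the model is not confident in its most probable class.
    best_class = max(range(len(class_probs)), key=lambda i: class_probs[i])
    if class_probs[best_class] < min_confidence:
        return None

    return best_class
```

Here a return value of None stands for "hand this case to a person," so the system fails safe instead of guessing.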

With safety reserves, we can look at two different types. One is related to uncertainty. The second I'll get to is related to loss functions.

There are ways of developing robust formulations for machine learning that are different from what is used in practice right now. For example, you can minimize the maximum ratio between the risk you see and the optimal risk, or the maximum difference between them.
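One way to picture a minimax-ratio formulation is as a choice among a few candidate models whose risks are known under several plausible data distributions. This is only an illustrative sketch: the function name, the model names, the scenarios, and the risk numbers are all made up, and the "optimal risk" in each scenario is assumed to be the lowest risk any candidate achieves there.

```python
from typing import Dict, List


def choose_minimax_ratio(risks: Dict[str, List[float]]) -> str:
    """Pick the model whose worst-case ratio of risk to optimal risk is smallest.

    risks[name][s] is the (assumed known) risk of model `name` in scenario s.
    """
    n_scenarios = len(next(iter(risks.values())))
    # Assumption: the optimal risk in a scenario is the best any candidate achieves.
    optimal = [min(r[s] for r in risks.values()) for s in range(n_scenarios)]
    if any(o <= 0 for o in optimal):
        raise ValueError("optimal risks must be positive to form a ratio")

    def worst_ratio(name: str) -> float:
        return max(risks[name][s] / optimal[s] for s in range(n_scenarios))

    return min(risks, key=worst_ratio)


# Hypothetical risks of three models under two plausible distributions.
example_risks = {
    "A": [0.10, 0.40],
    "B": [0.15, 0.20],
    "C": [0.12, 0.25],
}
robust_choice = choose_minimax_ratio(example_risks)
average_choice = min(example_risks, key=lambda name: sum(example_risks[name]))
```

With these made-up numbers, the model with the lowest average risk and the model with the smallest worst-case ratio are different, which is the kind of margin a safety reserve is meant to buy.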

Another way of looking at safety reserves is through the loss function. So another thing we've already heard a little bit about today is algorithmic fairness. In terms of disparate impact and things of that sort, it really is a type of safety reserve where you want a particular ratio of conditional probabilities or conditional risks to be close.

Then, finally, in terms of procedural safeguards, there is a lot of bad stuff that can happen if you define training sets and evaluation procedures in bad ways. So user experience design can actually be used to guide and warn you and increase safety. In addition, open source software and open data, which I think is the more important thing now, actually allow for the possibility of public audit and the identification of safety hazards through public examination.

In terms of application domains, again Professor Wallach kind of was asking this question: Can we say which ones we need to be more worried about and which ones less? I would categorize one group as decision sciences, which are the more high-stakes sorts of things: medical diagnosis, prison sentencing, loan approval, and things of that sort. Data products are another category. This would be like a video streaming company deciding on the compression level of video packets based on machine learning, or a web portal deciding which news story to show you, or classifications within transcription systems and so forth.

What's common about the decision sciences applications is that they have high human costs of errors; they also have a lot of uncertainty about whether the training set is representative; and few predictions are made. That's why they are safety issues and we do need to consider strategies for achieving safety beyond risk minimization. In the data products category, usually the costs are only a matter of quality of service rather than something more safety-relevant.
They often actually also have large training sets, large test sets, and the ability to explore the feature space. That leaves very little epistemic uncertainty, and because of that they actually are not safety issues and the focus can squarely be on risk minimization. I guess we also heard earlier that if you are sleep-deprived, with lots of testosterone, working on these data products things, then you should actually be in a good position to work on risk minimization squarely, and if you just focus on that you're good. But maybe if you're worried more about the decision sciences things, you should sleep more and have a more diverse team.

The third category is cyber-physical systems: things like self-driving cars, robotic surgery, autonomous weapons, and so forth. One thing to note about those is that they have humongous state spaces. So the value alignment problem is the crux of it. Really focusing in on that is quite key.

We started with a very basic definition of safety in terms of harm, risk, and uncertainty, and pointed out that the minimization of epistemic uncertainty is what's missing from standard modes of machine learning, which have been fully predicated on risk minimization throughout the field. A research agenda for people like me, people who develop the actual math and the algorithms, should be to actually focus on the uncertainty minimization aspect and use that to inspire new and exciting technical problems for us. I gave some examples of strategies for increasing safety, but it wasn't really a comprehensive list in any sense, just a starting point for that conversation.

To end, let me ask the same question I asked at the beginning. Please teach me. Are safety and ethics the same thing or am I missing the boat? Thanks.
