In this "Artificial Intelligence & Equality" podcast, Senior Fellow Anja Kaspersen sits down with Caltech's Professor Anima Anandkumar, also director of machine learning research at NVIDIA, for a captivating conversation. They discuss the "Trinity of AI" (data, algorithms, and infrastructure), Anandkumar's work on tensor algorithms, and the state of AI research, including the critical importance of diversity in the field.
ANJA KASPERSEN: Anima, a huge welcome to you and thank you for taking the time to be with us today and share your insights. Your work to date has been very impressive and your personal journey an interesting one.
For those of our listeners who are not familiar with who you are and your work, allow me to first do a very quick introduction. Anima holds dual positions in academia and industry, and she is the Bren Professor at the California Institute of Technology, where she is also co-leading the AI4science initiative. Additionally, she is a director of machine learning (ML) research at NVIDIA, a multinational technology company producing graphics processing units (GPUs) and leading efforts to develop next-generation artificial intelligence algorithms. I cannot wait to learn more about your pioneering work as I am sure all of our listeners are, too.
One thing I really appreciated about you when we met some months back at a joint event at the European Organization for Nuclear Research (CERN) was how eloquently you were able to bring rather abstract concepts and discussions on AI research and AI more broadly, including its applications, which are often not accessible to people outside of the research community, down to very practical considerations. I think this speaks volumes about your dedication to the field and also your willingness and eagerness to make sure that people are empowered to engage in this discussion with the knowledge required.
I thought it might be interesting to ask you to cast your mind back to growing up in India and later moving to the United States. How did it all start? I have heard you speak about the importance of curiosity and how your family played a very key part in encouraging you to be inquisitive about nature, about being human, mathematics, and even the quantum bits of our existence.
ANIMA ANANDKUMAR: Thank you, Anja. This is such a pleasure, and it is so great to reconnect again after our epic meeting at CERN, which to me is like a pilgrimage we take to celebrate the advances in the sciences.
Yes, growing up back in India I was really lucky to have a family that valued science and technology. Both my parents are engineers. My grandfather was a math teacher. All this made it very natural for me to explore math and sciences as a child.
Especially, I think, some of the struggles my mother had to go through to get into engineering have always inspired me. She in fact had to go on a hunger strike for three days to make the point that she really wanted to do engineering, because back then there was a notion that if women got too educated, maybe it would be hard to find someone suitable to marry them and so on, so there were concerns in the traditional community that she grew up in. Hearing her talk of her journey and her story of what got her so fascinated with engineering, a lot of that came naturally into my childhood experiences. I would say that shaped me a lot, and I was lucky to have that.
It was also having my family encourage my curiosity and inquisitiveness. I loved climbing as a kid, so they made these contraptions where I could climb around, and we also had this mango tree in the backyard where I would be climbing all the time and spending time there. I think having that kind of place to explore nature and explore science made it a fun and pleasurable thing, rather than something that was only about homework, only a chore that is not fun. I think that attitude was an important aspect.
ANJA KASPERSEN: And your journey into AI came later?
ANIMA ANANDKUMAR: Yes. I would say that growing up, generally I was interested in math and sciences, but I was also curious about my own mind, like how am I just able to grasp things. I would be able to be like, "Uh huh, I see this mathematical concept," but I wouldn't be able to fully explain how I got there. I would arrive at the answer, but it would take me much longer to explain how I arrived there. Indeed, sometimes I was wrong, but many times I was also correct. I thought, How did I get these intuitions? I was curious about my own mind and how it worked.
I clearly remember this childhood incident. I was maybe three or four, somewhere around that age, and I was playing, and all of a sudden I had this realization come to me that my identity was forming. Until that moment, I had not questioned: What is me? Suddenly, it's like there is this concept called "me," and somehow I had this notion.
Also I remember telling myself, "Remember this moment, and don't forget this because this seems to be a powerful moment where I am suddenly wondering: What do I like? Who am I?" That is just etched in my memory.
To me, now that I study AI and think about agents that maybe one day will become conscious—although the definition of consciousness is up for debate—what would it be like to suddenly have that awareness of self and of the relationship of self to the environment?
ANJA KASPERSEN: Very interesting. We are going to revisit the issue of consciousness very soon.
First, I wanted to ask you, because you have been straddling both industry and academic research for some time now, with interests embedded in both fields daily, and also working across many different fields of AI: Can you tell us more about working at the interface between industry and academia? I think you have said a few times yourself that one of your core purposes in managing these two roles is to try to bridge these two domains better than is currently happening.
ANIMA ANANDKUMAR: Certainly, Anja. I think the most fun thing about being in AI is that the boundaries and the barriers no longer matter. You cannot be successful in achieving great AI by putting it into silos. Back in my high school days, it was hard for me to pick one specific area of science or math because I loved all of it, but now, in a way, you do need concepts from all the different areas. Neuroscience is so important: understanding our own brain, and also mammalian brains or even insect brains, how they function, what the key mechanisms or high-level abstractions are, and how we build those capabilities into AI. That is of strong interest to me.
But the other aspect is also computation. With the scales that we are looking at today, this can no longer stay in the theoretical realm. Unfortunately, this does require very heavy computation, and so much of this behavior only emerges at scale. We wouldn't be able to run a small test in some cases and say, "We already understand the behavior, and this is how we would expect it to behave as it scales." That is where there is a strong need for very efficient hardware for AI, and being at NVIDIA has enabled me to explore that side and also contribute to designing better hardware to use for all kinds of beneficial applications that incorporate AI. These aspects have to come together.
At Caltech there are such strong foundations in the sciences. That is where I founded the AI4science initiative to build AI into all these different scientific domains. We have to work closely with the domain experts to understand how to integrate AI into also existing work flows, or maybe disrupt and build new ones, but we have to understand the domain quite well to do that and closely work with domain experts.
Then, on the industry side, having NVIDIA as a partner and having large-scale computation as a means to achieve these goals is really important, too. These are all pieces of the puzzle, and we need to bring that together to make it successful for AI to have a strong impact.
ANJA KASPERSEN: Often the debate on AI focuses on the software component, but you also speak about the hardware component of AI.
ANIMA ANANDKUMAR: In fact, when you say "software," there are different levels of the stack. You can write your program in Python, or, if you really worry about getting a very efficient hardware implementation, you may go even to a lower level. There are different levels of abstraction that are incorporated into our programming frameworks, and then ultimately it has to run on some hardware. The question is how to make this run very efficiently on that hardware so that we are energy-efficient and are able to also meet sustainability goals as we deploy these AI frameworks.
At the same time, we need it to be easy to program. In fact, there are all these notions of "code-free" environments, where people without a programming background could easily specify what kinds of AI capabilities they would like—how they can easily bring in the data, often as a first step, and specify the tasks that they would like AI to do. So there is one end of the spectrum: low-level languages, like the CUDA framework, to specify efficient computations on GPUs.
On the other end, we have code-free frameworks where you could, even visually or through some easier means of interface, specify what you would like AI to do, and then the question is how to build layers of abstraction to connect these extreme ends. The ideal case is that we have both a very efficient implementation and an intuitive, natural-to-use interface that ideally would not even require programming, so that people in different domains across the world can easily use AI in their work flows.
ANJA KASPERSEN: I have heard you speak about your research career, Anima, and how some of your groundbreaking research goes to a very practical question: How do we aggregate intelligence, supervised and unsupervised? I was hoping you could contextualize this statement, and maybe add a little bit of AI history to it as well for our listeners.
ANIMA ANANDKUMAR: Certainly, Anja. To me, intelligence is the ability to learn from experiences and adapt. Both these aspects are important. You need to be able to not just experience the world but learn from it, then, based on that, can you adapt and change behavior? We see that with almost every living thing. In fact, we are seeing that with the coronavirus too, so there is intelligence there.
But if you see popular cases, like, say, some of the Boston Dynamics robots, they do impressive moves; to many people that seems [like] intelligence because it is able to do back flips, it is able to do all of these impressive moves. But to me that is not intelligence because it is all completely coded by hand beforehand by some very smart people, but it's not adapting, it is preplanned and doing that maneuver. That is, I think, one thing for the public to keep in mind: what is intelligent versus what is not?
Also, if you look at Deep Blue, the chess-playing program that was constructed many years ago, that does not involve intelligence because it is like brute force, searching through everything and coming up with a move. But we humans don't do it that way. We don't look at all possible chess moves. We have some intuitions and are going by making those. I think that is something for the public to keep in mind. There may be lots of very impressive outcomes in some cases, but that doesn't mean it is intelligent.
And, vice versa, there could be some very mundane things that require a huge amount of intelligence. In fact, at NVIDIA the robotics group in Seattle has a kitchen environment, and robots should ultimately be able to autonomously cook something or do other kitchen tasks. For many of us, it's like, "Oh, this is so trivial, we do it without thinking." But just the aspects of being able to grasp different kinds of instruments, understand what to grasp at which point, and relate that to a high-level goal—if I want to make an omelet, what are the subtasks and how do I break them down?—are still extremely hard. It is still extremely hard for our AI agents to understand how to break down tasks—that's the notion of compositionality and how to easily generalize to new tasks. If you have been cooking many different dishes and you suddenly come across a new one or want to get introduced to a new cuisine, sure there is some learning, but you would learn faster than somebody who doesn't have that ability or background. That capability is still not fully there in AI agents. We are exploring these directions, and they have been getting better.
I wanted to give that high-level overview of what intelligence means and how we assess it. We shouldn't just go by our intuitions, because our bias is toward what we would find impressive in a fellow human being. Most people are not that great at chess, so if a fellow human being is amazing at it, we credit them with superhuman ability. On the other hand, it can be very easy—easy in the sense that if you program it well and give it enough memory—for a computer to do well. So it's not a level playing field. That's why we need to carefully assess: Is this impressive as a display of intelligence, or is this impressive because of the computation, memory, and other capabilities that computers have?
ANJA KASPERSEN: It brings me to this term that I have heard you use quite a bit—and you have been giving talks about this as well—the "Trinity of AI." Can you explain to our listeners what it means and why it is important?
ANIMA ANANDKUMAR: Yes. I mentioned that intelligence is the ability to learn from experiences and adapt.
- For the learning aspect the most important thing is data: How do we have enough data and maybe domain knowledge or other kind of prior information to first start learning?
- Then, to do this learning we of course need computation, like I mentioned, in some cases maybe large-scale computation.
- Of course, then what are the algorithms to enable this learning?
So the trinity is data, computation, and algorithms. We need to think of all these three facets.
Depending on the background from which people are coming, you may think a lot about algorithms if you are an AI researcher. The goal is to come up with new algorithms, but in so many practical applications the bottleneck would be data. We see that in so many domains, like in health care there is not enough data or there are lots of constraints to getting the data that we need, or in cases like sciences there may not be a ground truth. If you talk about imaging a black hole, we don't have a ground truth of what a black hole looks like to give as an example. Or like understanding earthquakes and their seismic activity deep underground, we don't have the measurements, so we can only come up with that indirectly. We don't have those direct observations. So data tends to be a big constraint in so many practical and real-world applications.
Computation, people mostly don't think about: "Okay, I have quite a cluster, and I am going to run this." But it matters. It matters at both extremes. At one extreme, even the enormous clusters are not good enough if you are thinking about, say, understanding a kitchen environment and all possible tasks, and being able to do these maneuvers but also generalize to new ones. You may want a big model to learn these skills, and that requires large-scale computation.
On the other end of the spectrum is tiny ML or human-centered AI or the Internet of Things—all these different words we use for small devices with only limited capabilities, limited memory, and limited batteries. In those cases we cannot afford to have a large model, so then we have to be very mindful of what is the computational budget we have and how do we design the right algorithms. So we can't think of them in isolation.
In so many applications we have to jointly take into consideration all three facets, the data and the computation, and design the AI algorithms based on these constraints and availability.
ANJA KASPERSEN: I know you spearheaded development of tensor algorithms, which, if I understand correctly, are central to effectively processing multidimensional and multimodal data and for achieving massive parallelism in large-scale AI applications. This may seem awfully technical, especially to a layperson like myself, and I am sure some of our listeners as well, but I was wondering if you could explain a little more in detail what these are and the importance that they hold for future applications of AI.
ANIMA ANANDKUMAR: I will allude to a book from the 19th century by Edwin A. Abbott, called Flatland: A Romance of Many Dimensions, to give a notion of why more dimensions matter. The story is about a fictional two-dimensional world where people are represented as different geometric shapes. They could be polygons, they could be circles. It is also, I think, a commentary on the class system of the Victorian age. That was one of the notions with which the book was written.
However, the aspect that I wanted to highlight here is that in this two-dimensional world there is now a three-dimensional alien that visits, and as it is going through this two-dimensional world it is shape shifting rapidly, and that is scary for everyone in this two-dimensional world. I think that is why for a lot of us thinking of more dimensions is weird and not intuitive because we try hard to visualize everything, but we can't.
But data lives in a lot of dimensions. We are not just limited to what we can visualize. It is again the same notion that if you are only in a two-dimensional world, this three-dimensional object could be a nice geometric shape, but you cannot fully understand it because it is rapidly changing and there is only limited information at each point of time as it is changing.
It is again the aspect of the elephant and the blind men. If the blind men are only touching one part of the elephant, they can't see the whole picture. That to me is the notion: instead of trying to visualize everything in your mind, if you can lift your data into many dimensions to represent it and process it, that can be much more effective, because the true nature of the underlying phenomenon emerges; it lives in more dimensions than we can visualize.
Hopefully, that was not too abstract. I think philosophically it makes sense, that there is so much that lies beyond what we can see through our own eyes.
I think mathematically what is interesting about this is that this naturally extends our current operation of neural networks. In neural networks each layer is represented with a matrix, which consists of rows and columns, so you can think of that as a two-dimensional object.
But the reason why this happened is historical. We have lots of linear algebra—matrix multiplication is the most commonly implemented routine, and it can be highly parallelized—so it is more for historical reasons that we design all our neural networks on this matrix foundation.
But we can easily extend to more. We can make these layers do multidimensional operations. So a tensor—think of a third-order tensor as a cube—has not just rows and columns but a third dimension. We could think of multiplying across all these dimensions, instead of just multiplying two matrices as two-dimensional objects.
To me, what this opens up is a much richer set of primitive functions on what each layer can do. When we can give them much richer processing, the idea is that they are more capable of understanding the underlying data sets better and seeing what phenomena emerge. We have seen better capabilities using multidimensional processing in our neural networks.
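To make that contrast concrete, here is a minimal NumPy sketch, purely illustrative and not from the conversation: an ordinary dense layer whose weights form a matrix, next to a layer whose weights form a third-order tensor that contracts against a multidimensional input in one step. All shapes and variable names are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ordinary dense layer: the weights form a matrix (2-D), so the layer
# mixes a single flat vector of input features.
x = rng.standard_normal(8)            # input vector, 8 features
W = rng.standard_normal((16, 8))      # weight matrix: 16 outputs x 8 inputs
y_matrix = W @ x                      # standard matrix-vector product -> shape (16,)

# "Tensorized" layer: the input itself is multidimensional (e.g. a small
# 2-D patch), and the weights form a third-order tensor that contracts
# against both input dimensions at once.
X = rng.standard_normal((8, 5))       # multidimensional input: 8 x 5
T = rng.standard_normal((16, 8, 5))   # third-order weight tensor
y_tensor = np.einsum('oij,ij->o', T, X)  # contract over both input axes -> shape (16,)

print(y_matrix.shape, y_tensor.shape)  # (16,) (16,)
```

The point of the sketch is only that the layer's primitive operation becomes a richer multidimensional contraction rather than a single matrix multiply.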
ANJA KASPERSEN: So, understanding the nuances and the context and the symbolics?
ANIMA ANANDKUMAR: Essentially, like giving more flexibility for neural networks to learn multidimensional data sets.
To give you a more intuitive example, before deep learning got very popular, I was working on probabilistic models where the goal was: Can we extract topics from documents? So you have millions of documents without any labels, can you tell me what are the topics in each document? In this case, the challenge is that each document can have multiple topics instead of just one, so you can't just cluster it out and say each document is exclusively about one topic.
In these cases what we gave was an intuitive algorithm using tensors that looks at co-occurrence relationships in the data. To give you an example, if a document had the word "apple" a lot, and that's all I told you about the document, you could be like, "Okay, it's probably about fruit." But Apple is also a company. So a word can be used in many contexts. Just saying that a particular word occurs is not enough.
But if I told you that there is co-occurrence happening, the words "apple" and "orange" go together, you are more certain that it is a fruit. And if I further told you it is "apple," "orange," "banana," and "pomelo," then you are even more certain. So this co-occurrence of multiple words occurring together is giving you more information about what the document is talking about, and that can be represented as a tensor because now for a pair-wise relationship you are saying "count all the documents where different pairs of words occur together."
Now if I want to count all triplets and say how many times they occur in different documents, the object that represents it is a tensor, so the way I process it is by using tensor algebraic methods. But of course we don't want to write out that big tensor because if I say "all triplets," that is too large in memory.
I designed efficient algorithms that mathematically process these co-occurrence relationships but without being memory-intensive, so making that efficient and also giving theoretical guarantees as well as practical use cases where this is very effective.
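As a toy illustration of the co-occurrence idea described here, the following Python sketch uses a made-up vocabulary and corpus and only builds the raw pair and triple co-occurrence counts. The actual methods she refers to avoid materializing the full triple tensor and instead factor it implicitly with efficient tensor decompositions, which this sketch does not attempt.

```python
import numpy as np
from itertools import combinations, permutations

# Toy corpus and vocabulary, invented for illustration.
vocab = ["apple", "orange", "banana", "iphone", "stock"]
index = {w: i for i, w in enumerate(vocab)}
docs = [
    ["apple", "orange", "banana"],   # fruit-like document
    ["apple", "iphone", "stock"],    # company-like document
    ["orange", "banana", "apple"],
]

V = len(vocab)
pair_counts = np.zeros((V, V))        # 2nd-order moment: word pairs per document
triple_counts = np.zeros((V, V, V))   # 3rd-order moment: word triples per document

for doc in docs:
    ids = [index[w] for w in set(doc)]
    for i, j in combinations(ids, 2):
        pair_counts[i, j] += 1
        pair_counts[j, i] += 1
    for triple in combinations(ids, 3):
        for a, b, c in permutations(triple):   # keep the tensor symmetric
            triple_counts[a, b, c] += 1

# "apple" alone is ambiguous, but its co-occurrence counts with
# "orange"/"banana" versus "iphone"/"stock" separate the two senses.
print(pair_counts[index["apple"]])
# In the unsupervised setting, latent topics are recovered by factoring
# such moment tensors, without ever using document labels.
```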
ANJA KASPERSEN: So allowing you to infer certain causalities and not just correlating the data you have?
ANIMA ANANDKUMAR: We have further works where we can also talk about causality using these methods.
The first work where I talked about co-occurrence is still correlation-based, but it is unsupervised learning. It is saying, "I don't know absolutely anything about what the topics are because there are no labels, but I am trying to posit that these topics are related to co-occurrence of words." By that hypothesis I am uncovering the topics.
ANJA KASPERSEN: That's a very good and detailed description of your work on tensor algorithms.
I am curious. In this kind of multidimensional world where you are operating, what are you working on these days and what excites you particularly about your research right now, be that in academia or in industry?
ANIMA ANANDKUMAR: It's like asking "Who is your favorite child?"
ANJA KASPERSEN: Everything and everyone.
ANIMA ANANDKUMAR: I think what I am so thankful for right now about this stage of my career is my ability to work with groups of scientists and engineers from many different backgrounds and many different interests. At Caltech having the AI4science initiative has allowed me to work in this area of designing machine-learning methods for learning scientific simulations or any complex scientific phenomenon—like, say, turbulent fluid flows, or climate and weather models, or how materials tear—and all these right now require large-scale supercomputers. If I want to tell you what the weather is going to be in the next few days, that is a very large-scale numerical calculation.
But the question is: Can machine learning now augment, or even replace, these methods and get orders of magnitude speed-ups? In so many scenarios this is the only way we can make progress on new discoveries or come up with better predictions.
As an example, climate change is something that all of us are highly concerned about. There is so much "now or never," which is very important to guide our politicians and others to make better policies. The question is: How can we come up with models of climate for the next decades but also have the right uncertainties? If I can lower the uncertainties and give the right uncertainties, that will really help us inform policymaking.
The other important thing is also the resolution. If I say, "Globally it is going to be 2°C or 1.5°C," it doesn't help in terms of what happens in the Middle East or what happens in Southern India, where I grew up. You really want fine-scale models, as much as possible, that can tell you what's going to happen as we go through multiple decades.
I think a back-of-the-envelope calculation shows that with the current models, if I try to get that down to 1-km resolution for the next ten years, being able to do that requires 10¹¹ more compute than what we have today. Even all of our powerful supercomputers put together are not enough for what we are asking.
Similarly, in terms of understanding molecules to discover new drugs, if I want to go all the way to the quantum level and understand their properties very precisely, we know the Schrödinger equation is the fundamental equation that governs all molecules, but the challenge is that if I have to do that precisely in a brute-force way, without using any approximation methods, it would take longer than the age of the universe on current supercomputers for just a 100-atom molecule.
That is, I think, the challenge in so many of these domains. We have physical laws, we have some understanding of the domain, but the computations required are so enormous that we can't do them exactly. We can't do it brute force. So the question is: Can machine learning learn good approximations, or can they learn it for one domain of molecules rather than all of chemistry? In terms of the climate and weather models, can we incorporate physical constraints, like the conservation of mass and so on, but be able to also use the historical data where there wasn't climate change, and how do we then adapt it to the case now where there are higher emissions than the historical data?
There are all these challenges in terms of computation and in terms of generalizing into new scenarios and extrapolating beyond what we have seen, and those are very fascinating.
What we have seen is machine learning gives us very promising results. Here I should say that this is not standard machine learning in the sense of taking a standard package and running it, but better-designed algorithms that take into account this domain knowledge and constraints give us a lot of promise in being able to speed up from a thousand times, even up to hundreds of thousands of times, over the traditional numerical methods or have the ability to generalize from being trained on small molecules directly to large molecules. Machine learning has shown promise when we work closely with domain experts and build it in the right way. That is where I feel there is so much potential, that we have barely gotten started there.
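As a toy illustration of the surrogate idea, and not of the specific methods her group develops, here is a hypothetical Python sketch: a cheap function stands in for an expensive numerical solver, a small neural network is trained on its input-output pairs, and the learned model then answers new queries much faster, at the cost of some approximation error. All names, sizes, and timings are placeholders.

```python
import time
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

def expensive_simulator(x):
    """Stand-in for a costly numerical solver (illustrative only)."""
    time.sleep(0.001)                      # pretend each call is expensive
    return np.sin(3 * x) + 0.5 * x ** 2

# 1. Run the "simulator" offline to collect training data.
X_train = rng.uniform(-2, 2, size=(500, 1))
y_train = np.array([expensive_simulator(x) for x in X_train.ravel()])

# 2. Train a cheap learned surrogate on those runs.
surrogate = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000, random_state=0)
surrogate.fit(X_train, y_train)

# 3. At deployment, the surrogate answers new queries far faster than
#    re-running the solver, with some approximation error.
X_test = rng.uniform(-2, 2, size=(200, 1))
t0 = time.perf_counter()
y_true = np.array([expensive_simulator(x) for x in X_test.ravel()])
t_sim = time.perf_counter() - t0
t0 = time.perf_counter()
y_pred = surrogate.predict(X_test)
t_ml = time.perf_counter() - t0
print(f"speed-up ~{t_sim / t_ml:.0f}x, mean abs error {np.abs(y_true - y_pred).mean():.3f}")
```

The real work she describes differs in that the surrogates are built with physics-informed architectures and constraints rather than a generic regressor, but the train-once, query-cheaply pattern is the same.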
ANJA KASPERSEN: Which goes back to your point about the trinity and how important it is to get the trinity right if we are to advance on that research.
Which brings me to—and this is one of the points that you made in the introduction of this podcast—consciousness. I am wondering, in your view can consciousness emerge out of the trinity, or do you see limits on the extent to which the trinity of AI can give rise to consciousness?
ANIMA ANANDKUMAR: That's a really tough question. That is of course what everyone wonders: What is going to happen as we start scaling up these models? The question is: Are our current algorithms enough or do we need some new paradigms?
Our current hardware, to me at least, is not enough. We need a lot more computations, not just for emergence of consciousness but also all of these scientific domains that I talked about. That is becoming more and more of a bottleneck as we are finishing the "low-hanging fruits," so to say, in the use of machine learning. So the harder problems remain.
In my mind there are lots of different definitions of consciousness, but one that is mathematically well-posed is awareness of uncertainties and limitations, knowing what you don't know. If you are thinking of that kind of more concrete definition, rather than self-actualization, which is really hard, I think that is nearer to it. So we want AI agents that understand whether something is safe or not: in safety-critical applications, how safe is this to explore?
For instance, we have looked at drones. Can they automatically decide whether it is safe enough to land? Can they keep increasing speed but make sure they are safe enough? We are seeing many cases where this awareness of the uncertainty of the environment around it is emerging.
I think the harder one is the awareness of their own limitations, knowing what you don't know. Standard neural networks trained with standard supervised learning are not good at this because what they are encouraged to do is become overconfident. It is part of the learning objective that they become overconfident, and we have seen that with a lot of examples, especially related to fairness issues. If they see data at test time that was underrepresented in training, mistakes can be made with high confidence. You see these terrible examples of black female celebrities being misgendered but with high confidence, or face recognition going wrong, like members of Congress being classified as criminals but with high confidence. All these are examples where there isn't the awareness of their own limitations.
Has this trained model seen enough training data of a certain nature, and does it have that understanding of what it has and has not seen? At test time it still makes these mistakes, but with high confidence.
We have developed methods to try to limit those mistakes and to have good uncertainties, so that if the model is encountering entirely new situations, it is also much more uncertain. Having it be correct on new scenarios is almost impossible; you have to give it some other domain knowledge or other ways for it to be correct. But an easier thing is to ask: Can you at least be humble and say, "I don't know," or have lower confidence here? We are seeing that emerge.
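Here is a minimal sketch of the "can it say I don't know" behavior described above, using illustrative numbers only: threshold the model's top softmax probability and abstain below it. In practice, raw softmax scores are often themselves overconfident, which is exactly the problem she points to, so real systems combine abstention with proper calibration or other uncertainty estimates; the threshold and logits here are arbitrary placeholders.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def predict_or_abstain(logits, threshold=0.8):
    """Return (predicted class, confidence), or (None, confidence) meaning
    "I don't know" if the top softmax probability is below the threshold.
    The 0.8 threshold is an arbitrary placeholder."""
    probs = softmax(np.asarray(logits, dtype=float))
    top = int(np.argmax(probs))
    conf = float(probs[top])
    return (top, conf) if conf >= threshold else (None, conf)

# Confident prediction when one logit clearly dominates...
print(predict_or_abstain([4.0, 0.5, -1.0]))   # -> (0, ~0.96)
# ...and an abstention when the logits are nearly flat, as we would hope
# to see on data unlike anything in training.
print(predict_or_abstain([0.2, 0.1, 0.0]))    # -> (None, ~0.37)
```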
I think the foundation of consciousness is the awareness of self and awareness of surrounding. To begin with, the uncertainty or the limitations of what the trained agent can do, and also understanding the risks in the environment is the beginning, the underpinnings, of consciousness. That we are beginning to see, but in fairly limited environments, in fairly synthetic environments, or limited real-world scenarios.
ANJA KASPERSEN: You alluded to this, Anima, but I am wondering: It is said that it takes a lifetime to learn how to operate in these liminal spaces as a human. Do we have the right scientific models to code and train an algorithm to operate in liminal spaces?
ANIMA ANANDKUMAR: It depends. I think that is where it is hard to judge what is difficult for AI versus humans. If the limitation is one of memory or computation, there are a lot of gains we can make, and we have seen that happen.
Even in computer vision, if you look at the history, there were all these hand-engineered features people crafted. If you used these kinds of scale-invariant features, that would be good, or Fourier features, that would be good; people had different intuitions about what would be good for recognizing objects in images, but letting neural networks figure out and learn the features on their own was very powerful.
We have also seen that in some of the scientific applications we are currently working on, humans have made all kinds of approximations. For the quantum chemistry I mentioned, the basic Schrödinger equation is so enormously intensive to compute that people have made different approximations. There is now a lot of evidence emerging that machine learning could come up with better approximations, but you have to build in the right domain knowledge and constraints; you have to specify it the correct way.
I do think in that sense AI is promising, but, as always, the devil is in the details in how we harness that power of AI.
ANJA KASPERSEN: I heard you quote the late Richard Feynman, the [American] physicist, who posited, "What I cannot create, I do not understand." To draw a parallel to current realities, the COVID-19 pandemic has arguably accelerated how we adopt and embed AI systems and algorithmic technologies, technologies where the gap between those who create them and those who understand them is widening, causing very significant tradeoffs in my view and also gray areas of ethics.
I know you are deeply concerned with this issue around ethics and AI, and you have been a very strong advocate for addressing the implicit tensions tied to the choices we make about how we embed AI in our day-to-day lives, also calling for better and more innovative ways of mitigating any potential harms. Do you worry? What are your serious concerns about AI?
ANIMA ANANDKUMAR: To me, it is very important, as with any technology, to have the right safeguards. I know it's a cop-out to say, "Yes, this applies to all technologies," but I think it's even more important with AI because of this exponential power: the biased decisions humans were making at a small scale, we are now automating at planet scale, and we are doing that very quickly, and so many of the changes we are bringing about are potentially irreversible. I think that is the aspect that we need to be mindful of.
What happened historically with technology was that there was the hype, there was all the promise of the new technology, and then people also saw the downsides, and many scientists became extremely worried about its destructive potential.
I see a similar trend also happening with AI, but now I think the difference is that this touches human lives across the world in very intimate ways because of the data collection that we are seeing at enormous scale and the ways in which we are potentially changing behavior and influencing people on such a large scale and at such a fast speed.
I think that is where it is very important to think about having trustworthy AI and having safeguards and regulations in place in terms of all aspects—how does this AI perform on different skin tones or different genders or people who identify as non-binary? It is all these aspects. It could be, say, for 99 percent of the population it's okay, but for the 1 percent it's terrible. Do we want such an AI?
For me answering such questions is hard. We need to make tradeoffs, but the first step is transparency. If we test this out on enough data sets and release that before launching it at scale, then we can make the decisions, but right now that is not how we are proceeding. That is, I think, a place for regulation and for nonprofits and agencies that think about the impact of technology on the public to have a say.
ANJA KASPERSEN: Two questions come out of that. Once an AI agent is embedded, can it be discontinued? Can you safely interrupt an AI agent once it is embedded into a system, and will this agent also be able to interoperate with other systems that may have been created and trained in a different type of environment? What are your views on that?
ANIMA ANANDKUMAR: I think a lot of awareness has been raised, and a lot of people have worked very hard to do that, sometimes even going against Big Tech. But I think now there is awareness across all different areas that we need to put a mechanism in place for transparency: Can we enable good disclosure, and what should that disclosure be?
We already have licenses for code. We have notions of how code should be used and of the ethical uses in which it can be deployed. But can we go further in terms of specifying on what types of data this would work well and where it doesn't work? I think this is the aspect that is always hard.
I think we were talking about it before the podcast. The incentive is to hype up and to say "this is always great," but no technology can be perfect. So I think disclosing the failure modes is the most important starting point.
ANJA KASPERSEN: I wanted to just touch on the issue of AI and power, and you alluded to it earlier. I remember that in a previous talk you spoke about the transformative power of AI, as with AI4science, to amplify other sciences, to help other sciences advance in quite unique ways. But there is also the story of AI as shifting power: Who holds the power? Who defines the rules of the game, if you will, on how AI is being advanced?
There is also this misconception, which I know has been a concern of yours as well, where we speak about AI systems as apolitical, which obviously they are not. Once they are embedded into a system, you give them direction.
This brings us back to your point about the trinity and how that trinity looks; each part of that trinity will have its own unique power dynamics and power mechanisms attached to it. In your view, keeping that in mind, how do we design AI ethically, addressing those inevitable tradeoffs, to ensure that it benefits humanity at large?
ANIMA ANANDKUMAR: This is where there are several stakeholders to enable so that we can harness the beneficial uses of AI and minimize its harmful effects.
To begin with, I think what we need are more research resources for academia. For instance, in the United States we have the government pushing for the National AI Research Resource. Can we have enough computation to enable these large-scale models to be done openly or in academia? So much of that is locked up with big companies that researchers cannot carefully understand the potential harmful effects or what is the bias that is there or how do we mitigate it, how do we use this in downstream tasks knowing that these issues exist, and can we still separate them or sandbox them to make use of them in a beneficial way? I think that is one big bottleneck. There is a big gap between industry and academia with respect to resources. Helping bridge that is important.
The other important thing is the role of regulatory agencies. When it comes to health care versus transportation versus different areas, there cannot be "one size fits all" in terms of even what to disclose and how should we test these AI systems. It has to be domain-specific. For safety-critical domains like health care or transportation, there have to be even more stringent requirements of how good the AI is. The aspect of how we can build it in a way that is equitable and fair, how we can put this in place, and how we can have companies disclose this to the public, I think, is a good starting point.
ANJA KASPERSEN: I like what you said, if I heard you correctly, that collaboration could be a very important tool to actually implement some of those checks and balances for responsible science, which is taking a page out of the playbook from CERN, bringing together researchers from different domains and different countries and instilling checks and balances, a responsible science approach.
ANIMA ANANDKUMAR: Absolutely. This requires an interdisciplinary approach, and this is where the people from the humanities and social sciences have a lot to add.
For instance, I work with Mike Alvarez here at Caltech on understanding conversations that happen on Twitter: How did the Me Too movement unfold? What were the counter-movements? We use machine learning tools to extract the insights, but then interpreting them, and understanding what they tell us about the social discourse, comes from collaborating with domain experts.
Similarly, when it comes to AI trust and ethics, what are the legal frameworks? What are the ethical frameworks? How do I map all the stakeholders? How do we understand historical biases?
All this I think is needed. It cannot be done in isolation.
ANJA KASPERSEN: In 2018, Anima, you won The New York Times Good Tech Award for your long-term efforts to promote diversity in the tech field at large with a specific focus on making the AI research community more inclusive. Can you speak more about this?
Also, I am wondering, given your courage in speaking truth to power about the limitations or the importance of investing in the trinity of AI and also about diversity inclusion, have you experienced pushback for some of your work and for speaking up?
ANIMA ANANDKUMAR: Thanks, Anja. I think what is important to keep in mind is that diversity and inclusion is about seeing people from different backgrounds and areas come together and giving everyone a platform to speak and to contribute.
What we see in so many AI areas is that there is a huge underrepresentation of many communities, of women, and that hopefully will change with these efforts. But we also want to make sure this is not seen as a zero-sum game, as something against the majority; that is a misconception that I think a lot of people hold. What we want is to give opportunities to people across different areas and to do so in an inclusive and healthy way. Creating a healthy environment will enable everyone to thrive, including the majority.
I think the core issue here is: What are some ways that are limiting minorities from pursuing AI? What is responsible for the leaky pipeline of women dropping out at different levels of their careers?
These kinds of underlying toxic issues also affect others, so it's really about having a healthy environment that enables people to thrive and lets people with different capabilities and different backgrounds contribute in creative ways to building AI. AI cannot be solved with just one set of tools or one-dimensional thinking. Bringing that community together is so important if we want to see AI being successful.
I think a lot of the pushback that I receive, especially online, speaks more to the structure of social media, which amplifies the marginal, extremist views rather than the moderate views. The majority of people, when I talk to them in person, are highly supportive, and they are aware of some of the issues that we face at conferences in terms of harassment and other problems. Most people want that to change, but I think the nature of social media does not encourage such good discourse, and that creates trolling and all these issues, and also misinformation. People who misunderstand what this is about feel insecure and think it is something against them, and that gets them into this mindset.
I see this as a broader issue of how social discourse through social media gets distorted, and I think we need to find other venues to do that. In a lot of the efforts on diversity and inclusion what I have found most effective is talking to people in person or virtually, but person-to-person, and thinking of concrete scenarios to improve that, such as the wonderful organizations like AI4All doing that at the high school level, or the WAVE program at Caltech bringing members from underrepresented communities in for summer research. To me focusing on concrete efforts rather than social media will give us a lot of concrete gains.
ANJA KASPERSEN: Do you feel that the very polarized social media or online discussions on this are preempting an honest scientific discourse on these issues because people are afraid to speak up?
ANIMA ANANDKUMAR: Yes, that's right. It also gets misconstrued on both sides. I think that leaves a lot of people afraid because they may worry that anything they say may make them targets. I think we need a better venue to discuss that.
For instance, I participate in admissions of students and faculty at Caltech, and the notion that we would have a different bar based on which background people come from is definitely misinformed. Instead, what we want to do is evaluate people as a whole, and hopefully we want to find people who positively impact the community.
One of the recent discussions that happened on Twitter over the last month was somebody alleging—I won't repeat it because it is inflammatory—that so many people in academia are there but shouldn't be because they are not qualified and came through these programs, which is highly misinformed. I think those kinds of extreme personalities talking about it is more about building a mob than about having healthy discussions.
But I think we need much safer environments for people if they have questions, because a lot of others may be having questions—is this really happening? How do we address that in an environment that makes it possible for those who are in good faith wanting to have these arguments to do that? There is always a faction of the community that is so polarized and so extreme and doing it in bad faith that we don't want to engage with. We have to, I think, have that differentiation and then that can lead to productive discussions.
ANJA KASPERSEN: We need greater anthropological and scientific intelligence not just in the development and use of AI but also in how we discuss it and encourage diversity of views.
ANIMA ANANDKUMAR: And so much of the design of social media is one-dimensional, and that clearly shows the lack of diversity in the people who designed it.
ANJA KASPERSEN: They need more tensors.
ANIMA ANANDKUMAR: Yes, tensors of people. I will use that. That's good.
ANJA KASPERSEN: Indeed, tensors of people to discuss AI in the way it deserves to be discussed and ought to be discussed.
Let's shift to a different topic, Anima, and one that I know is near and dear to your heart. You state in your bio that you grew up in Mysore, and for any yoga practitioners or those interested in yoga listening to this podcast this will be a very familiar name as it is known to be one of the hubs of yoga in India. I practiced there myself many moons ago and definitely have developed a great fondness for this place.
I know that you yourself are a very dedicated yoga practitioner and are also active in many types of sports, in climbing, and hiking. I know this is something that is important to you, not just yoga for the physical discipline, the asanas, but also yoga more importantly for its philosophical orientation.
One of the fascinating aspects of your work and your interests—and I have seen this featured in some of your work and in your public speeches—is your interest in Advaita Vedanta philosophy. For those of our listeners who are not aware of what this is, it is a tradition whose texts are more than 2,000 years old. I am curious: What is it about Advaita philosophy and its tradition that interests you, and do you see a connection to it, if any, when you talk about the ethical issues of AI?
ANIMA ANANDKUMAR: Yes, those are fascinating. I recall how we talked about Mysore. I have such great memories. It is such a wonderful place to grow up, and for me to really discover yoga during the pandemic was such a bright spot when there are so many disturbing things happening around the world and all our worldviews are challenged and upended.
At that place of uncertainty I took the time to delve deeper because I had practiced on and off but never had enough time in our busy lives to do that. I realized how much time I had spent on flights, how much time I was sleep-deprived, jet-lagged, and going from that to where I use yoga to just calm my mind and be in a place where I can listen to my inner self, I felt that was missing for so long.
There was so much of me just functioning in the external world, but what about my own internal world, and how do I understand my emotional state and my stress levels? I think that aspect was important, and the physical and the mental are so connected. We talk about embodiment, like how when I am stressed my muscles are also sore, and how I work out better when I have the motivation.
I think that also informs me about embodied AI. We need to bring the mind and the body together, like the saying, "A healthy mind in a healthy body." I discovered that was missing in my own life because I was so focused on my career, and I was passionate about AI and about research, but I had forgotten the ability to take care of my own mental, spiritual, and physical health, all that integrated and put together.
That, I think, helped me do even better research and have better creativity and better ways to connect with everyone, and so that is what I hope others can also explore because this is such a wonderful mechanism to experience that.
Regarding Advaita philosophy, I think that is also intimately connected with yoga and its practice, because what it says is also to look inwards. The famous saying there is aham brahmaasmi, or "I am the god," essentially, if you translate it directly. What it is really saying is that the god is within you: experience the power that is within you; you don't need to go out looking for it.
So if all of us take the time—which a lot of the world around us prevents us from doing, with social media always being on or us always being connected—to disconnect and do some internal introspection, just seeing the beauty and the power we have within ourselves is, I think, wonderful. I don't know if AI will be able to do that one day, but if it can, that would also be an amazing feat.
ANJA KASPERSEN: Being at the vanguard of change requires a lot of courage, and we have heard this from other speakers on this podcast as well. This is not an easy path to choose, especially since, as you alluded to yourself, the pushback, warranted or unwarranted, can be quite soul-destroying.
What are the one or two insights that you can share with those listeners who are or will find themselves being at the vanguard of change?
ANIMA ANANDKUMAR: The one thing I have learned so much about is the positive aspects of humanity. There were stages where I was really sad or lost some hope in humanity, but ultimately it has come around. You have to have the patience to work through issues. I think, except for a tiny sliver of the population, most people want the common good. Most of us want to be part of a community. Most of us want the planet to thrive. I think identifying that, and being able to connect people who may look like they are from different backgrounds or on different sides of the political spectrum, seeing what connects us, what is common to us, and allowing that to be celebrated, will bring us closer. That is what I have learned.
Yes, there was a lot of pushback online, but the online environment also amplifies that. A few extreme views get a lot of traction, and then you think, Oh, my god, is everybody thinking this way? But then you talk to actual people and it's so different. Even people who misunderstood what they were supporting come around, and then it is different.
For instance, the movement to change the abbreviation of the Conference on Neural Information Processing Systems, the most important or most popular machine learning conference, from NIPS to one that doesn't refer to a body part had so much pushback. For me it was really shocking, because many women were feeling uncomfortable about the name. I somehow made peace with it because I had been in the community long enough, but so many women felt that it was not welcoming and that it related to how they were treated at these events and how these events had become so toxic and unhealthy.
But a lot of people online only see that as, "Oh, there is this mob of women trying to change the name of the conference, and this is useless, this is a waste of time." But in person, when you tell them, "Look, this is the experience we had at these conferences. It is not just a name change for nothing. It symbolizes all the struggles we have and how this name became an avenue for harassment, because people used it jokingly and in ways that further created an unhealthy environment," then they get it, that it is much more than just the name.
But it is really hard for people to get this context online. We even published a paper on it, but people don't have the attention span. They are quickly liking things. I feel that the whole interface has to change and, until that changes, we can't have healthy discourse.
ANJA KASPERSEN: How do we ensure that everybody in the community, meaning all of us, is able to get the best benefit from current AI research and future AI research, but also to be able to contribute in a meaningful way? How do we empower people to engage, for those who may not have the insight and the literacy to fully grapple with what AI symbolizes or means to them in their day-to-day lives? Where do we go from here? What is your final insight on that? What do you recommend people to do?
ANIMA ANANDKUMAR: I have a lot of hope about this aspect because we have seen so much AI research in the open. It has only grown to be even more open.
If you see traditionally what has happened with technological development, it has been highly closed. Think of an extreme example like nuclear weapons, which is really closed off, and along with it particle physics and our understanding of the basic nature of atoms. A lot of the population cannot participate in that even if they are interested.
On the other hand, with AI so much of it is out in the open, there is excellent course material available, there is lots of code. The next question is: How do we go further from there? Can you do it in a code-free manner? Can you specify your knowledge in more intuitive ways rather than through program code? That is still in its infancy, but I have good hope that we will make further progress there.
ANJA KASPERSEN: But you remain a positivist and an optimist that we are going to get it right.
ANIMA ANANDKUMAR: Absolutely, absolutely. I think there is so much good to come out of this.
That is the beauty of life. We see all these concepts that connect and so much of scientific discovery is making those connections that seem unrelated. That always fascinates me.
ANJA KASPERSEN: Thank you so much, Anima, for sharing your time, stories from your growing up, what inspired you, your deep expertise, and honest reflections about the state of AI research. This has been a thought-provoking and marvelous conversation.
Thank you to all of our listeners for tuning in, and a special thanks to the team at the Carnegie Council for hosting and producing this podcast.
For the latest content on ethics in international affairs, be sure to follow us on social media @carnegiecouncil.
My name is Anja Kaspersen, and I hope we earned the privilege of your time. Thank you.