Why do we need friendly Artificial Intelligence? – A Conversation with Eliezer Yudkowsky

Eliezer Yudkowsky is a cofounder of MIRI (Machine Intelligence Research Institute) and an advocate for friendly artificial intelligence. We learned from a past participant of our boot camp who follows Eliezer’s work that he would be spending a few weeks over the summer here in Reñaca, Chile, where Exosphere HQ is. We had the chance to spend some time with him, comparing notes on the education system, startup culture, the future of technology and, in particular, artificial intelligence. This is the conversation that ensued…

Hi Eliezer, thank you for accepting this interview. First, I would like to ask how you got to where you are in your life. What path led you to co-found an institute that focuses on artificial intelligence research?

I’ve been doing this more or less my entire adult career; when I was sixteen years old I read a book, “True Names and Other Dangers”, which mentioned the idea that at the point where your model of the future predicts that technology has created smarter-than-human intelligence, your crystal ball explodes and your model can’t predict the future past that point. This would apply to Artificial Intelligence, neurologically enhanced humans, or whatever. Having entities around that are smarter than you are is a kind of difference in the future that’s fundamentally different from faster cars or space colonies. This struck me as obviously correct, so I decided to spend the rest of my life working on it.

The next major juncture in my life was in 2001, when I noticed the existence of what we would today call the ‘value alignment problem’. The space of possible Artificial Intelligence designs is huge, and different AI designs will result in AIs that want different things. This is extremely important, because if you have an extremely powerful mind around, whatever it wants is probably what will happen.

What’s your definition of Artificial Intelligence?

Well… in intuitive terms, I’d say that what we’re talking about is ‘strong AI’ or ‘Artificial General Intelligence’, a computer program that’s smart in at least the same ways as humans. In particular, it can look at the world, figure out how it works, make long-term plans about how to affect the world, invent new devices and technologies, and so on.

Whatever it is about human brains that enabled us to start from forests and fur skins, and get to spaceships and nuclear power plants, Artificial General Intelligence has that too. An ‘advanced agent’ or ‘smarter-than-human AI’ has *more* of that, and can make better technologies faster, or more cunning plans.

Your attention is focused on the value alignment problem. Why do you think it is so important? And what do you mean by friendly artificial intelligence?

All humans have the same basic brain design – cerebral cortex, thalamus, cerebellum, and so on. For all our diversity, we’re basically all cars painted different colors but containing the same engine with minor quirks. Now think of an enormously large space of possible Artificial Intelligence designs. Any two AI designs might be less similar to each other than you are to a mouse (which also has a cerebral cortex, cerebellum, etcetera.)

In the eighteenth century, a man named David Hume first theorized that, in principle, you could take intelligence and hook it up to any particular goal. Think of a chess-playing program. The current programs are designed to win chess games; they try to steer the future of the chess board into the set of games defined as a winning position for whatever side they’re currently playing. But you could also make a chess-playing program that would try to make sure the game drew, or stalemated. So this enormously large space of possible minds comes populated with minds that want all different sorts of things. You could design a mind that wants paperclips.
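A minimal sketch of this idea, assuming a hypothetical toy world invented purely for illustration: the search procedure `plan` stays fixed, and only the utility function passed to it changes. The names `ACTIONS`, `plan`, `wants_paperclips` and `wants_smiles` are assumptions, not anything from the conversation.

```python
from itertools import product

# Hypothetical toy world: a state is a pair (paperclips, smiles).
ACTIONS = {
    "make_clip": lambda s: (s[0] + 1, s[1]),
    "tell_joke": lambda s: (s[0], s[1] + 1),
}

def plan(utility, horizon=3, start=(0, 0)):
    """Try every action sequence of length `horizon` and return the one
    whose final state the given utility function scores highest."""
    best_seq, best_val = None, float("-inf")
    for seq in product(ACTIONS, repeat=horizon):
        state = start
        for name in seq:
            state = ACTIONS[name](state)
        val = utility(state)
        if val > best_val:
            best_seq, best_val = seq, val
    return best_seq

# Two different goals plugged into the same optimizer.
wants_paperclips = lambda s: s[0]
wants_smiles = lambda s: s[1]

print(plan(wants_paperclips))  # all three steps spent on "make_clip"
print(plan(wants_smiles))      # all three steps spent on "tell_joke"
```

Nothing in `plan` knows or cares which goal it is serving; the search is equally competent either way, and the goal alone decides where the future gets steered.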


And when you try to appeal to any argument that, say, people being happy is more *interesting* than paperclips, a mind only responds to that argument if it’s the sort of mind that steers the future toward interesting things. There’s no mathematical law prohibiting the existence of a mind that doesn’t care about the quality you call ‘interesting’. In fact it’s simpler just to care about paperclips.

So if the different possible AIs we could build want different things, we’re likely to end up in a very different future depending on what sort of AI actually gets built. The value alignment problem is building a *smarter-than-human* AI such that running it produces an outcome that we would gloss as ‘good’, beneficial, normative, nice, etc.

This is technically hard for a number of reasons. The most obvious reason is that we might have to get it right on the first try. If you build something smarter than you, and you make a mistake in designing its goals, then it will fight you if you try to edit its goals. If it wants paperclips, it won’t *want* to want something else, because if you edit its goals then that leads to a future with fewer paperclips, and it wants paperclips.
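The reasoning about goal edits can be sketched in a few lines, under heavily simplified assumptions: a toy forecast in which an agent spends every step on whatever goal it holds, and a decision rule that scores each option with the agent's current goal. The functions `predicted_future` and `choose` are hypothetical names made up for this sketch.

```python
def predicted_future(goal, steps=10):
    """Toy forecast (an assumption): an agent holding `goal` spends
    every one of `steps` time steps producing that one thing."""
    return {
        "paperclips": steps if goal == "paperclips" else 0,
        "smiles": steps if goal == "smiles" else 0,
    }

def choose(current_goal, proposed_goal):
    """Pick between accepting and resisting a goal edit.

    Both futures are scored by the CURRENT goal, because the current
    goal is what is doing the choosing.
    """
    options = {
        "accept_edit": predicted_future(proposed_goal),
        "resist_edit": predicted_future(current_goal),
    }
    return max(options, key=lambda name: options[name][current_goal])

print(choose("paperclips", "smiles"))  # resist_edit
```

Accepting the edit leads to a future with zero paperclips as judged by the paperclip goal, so the paperclip-wanting agent in this toy model resists.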

The prospect of needing to get anything at all in Artificial Intelligence right on the first try is what makes this scary. It takes huge efforts to get space probe designs right on the first try (since you can’t test them completely on the ground in advance) and sometimes they fail. Building the first smarter-than-human AI with the right design properties is going to be harder than that. So ending up in a good outcome from Artificial Intelligence is not something we can take for granted. And this is *literally* the most important problem. It determines what happens to all the galaxies we can see in the night sky. The Earth itself is just a tiny blue speck, the five hundred years since the invention of Science a mere eyeblink of cosmological time, compared to all the stars and all the time that’s at stake in the future of intelligent life. Just like right now the steering of that future lies in the hands of humans because we’re the smartest entities around, if you build an AI that only wants paperclips and *it* is the smartest thing around, all those stars get turned into paperclips, which seems like a waste of a universe. And the universe is very large, so what happens with AI is *literally* the most important thing you can worry about by total volume of space and time.

Why do you think active work toward friendly intelligence is needed? Is there anyone actively trying to build unfriendly intelligence? Or are there maybe people working on AI who are not giving enough attention to the consequences of their designs?

You can’t take for granted that good people build good AIs and bad people build bad AIs. I expect you’re probably a good person, but suppose I ask you how to build any AI at all. You can’t answer because you’re not an AI designer, and modern AI scientists can’t answer either because they don’t know *enough* to build an AI. Building AI is *hard*, so you can’t take for granted that someone can build an AI just because they want to build an AI.  The same is true about building a *good* AI. If, as it currently looks, building a good AI is a technically hard problem on top of building an AI, then the key question is not “Will AI be built by good people?” but “Will anyone have the power to build a good AI even given that this is what they want to do?”

There are some people trying to build AI who say they care about the consequences… but the universe doesn’t hand out a free gold medal for saying you care, and if you ask them, “Okay, how would you solve the outstanding problems in value alignment that have already been identified?” they won’t have anything. Saying you care is easy. And even actually caring doesn’t mean you can solve the problem.

The actual state of work on value alignment is somewhere around the state of understanding of computer chess in the nineteenth century. By which I mean that the very first paper on computer chess, by Claude Shannon in 1950, gave an algorithm for perfectly solving chess using infinite computing power. Whereas in the 19th century, even some very smart people didn’t realize that it was theoretically possible to solve computer chess using mere ‘mechanical’ reasoning, and people like Edgar Allan Poe wrote intelligent essays arguing that Mr. Babbage’s proposed Analytical Engine would never be able to do something like playing chess.
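To make "perfectly solving a game given unlimited computing power" concrete, here is a minimal sketch of exhaustive game-tree search. It is not Shannon's code and not chess; it assumes a tiny stand-in game (players alternate removing 1 to 3 stones, and whoever takes the last stone wins) so that the full tree is small enough to search. The names `value` and `best_move` are hypothetical.

```python
from functools import lru_cache

MOVES = (1, 2, 3)  # toy rule (an assumption): remove 1, 2 or 3 stones per turn

@lru_cache(maxsize=None)
def value(stones):
    """Exact game value for the player to move: +1 forced win, -1 forced loss."""
    if stones == 0:
        # The opponent just took the last stone, so the player to move has lost.
        return -1
    # My value is the best of the negated values of the positions I can hand over.
    return max(-value(stones - take) for take in MOVES if take <= stones)

def best_move(stones):
    """Return a move that achieves the exact value of the position."""
    legal = [take for take in MOVES if take <= stones]
    return max(legal, key=lambda take: -value(stones - take))

print(value(4))      # -1: every move hands the opponent a winning position
print(best_move(7))  # 3: leaves 4 stones, a lost position for the opponent
```

The same recursion applies to chess in principle; the only obstacle is that the chess tree is far too large to enumerate on any real computer, which is why the infinite-computing-power framing matters.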


The modern state of understanding is that we don’t know how to solve the problem ‘build a nice AI’ even using infinite computing power, and the frontier of research is trying to slice off well-defined, simple subproblems and then come up with terrible proposals for how to solve them using infinite computers where at least the *problems* with the proposals are crisply defined.

In other words, it’s at a very very early stage of research. Meanwhile, there are some very strong answers to the question ‘How would you build a powerful AI using infinite computing power?’ and the actual technology continues to march on toward it happening on actual real-world computers. So there’s already a sense in which our understanding of powerful AI has leaped ahead of our understanding of powerful nice AI, which is why people are worried and trying to start research as early as possible on powerful nice AI even though it takes a lot of caution and mental discipline to focus your work on places that will make sense.

What about your work in human rationality?  What does that have to do with AI?

It turns out that if you make a really serious drive toward understanding how to build AI, how to make computer programs that think, well, the knowledge is not totally unrelated to understanding how humans think. It even tends to suggest some ideas about how maybe human beings could think *better*. The academic study of all the horrible, predictable mistakes that human beings make – the study of program bugs in human cognition – is known as the “heuristics and biases” program in experimental psychology and hoooly smokes do they have a lot of experiments showing replicated bugs in human reasoning. It also happens to be the case that reasoning about future Artificial Intelligence is *itself* a problem that puts a tremendous stress on human reasoning – requiring a lot of skill to think through correctly.

Almost all of what you see in the popular media won’t even be trying to reason carefully. Even scientists might just throw all the caution out the window, all the rules they would use for carefully reasoning about a virus under a microscope, and just make stuff up that sounds neat.

So you also have to study good human reasoning, in order to *not* throw all the rules out the window, and go on thinking carefully about future Artificial Intelligence.

That’s the basic relation and it manifests in websites like Less Wrong, or the several years of blog posts I wrote to get LessWrong.com started (sometimes known as the Sequences because there’s a sequence of blog posts on, e.g. reductionism, and how to actually change your mind about things, and so on).

You also wrote a book: Harry Potter and the Methods of Rationality.

Well, the way that happened is that I was trying to write a nonfiction popular book on rationality and the writing was going *very slowly*. At around the same time, I happened to be reading a lot of Harry Potter fan fiction to goof off and thus the plot of ‘Harry Potter and the Methods of Rationality’ popped into my head. I decided to try and write it, just to see (or so I told myself) if I could write faster if I was writing something goofy. I initially published under a pen name, partially in case the fan fiction would come out horrible, and partially because I wanted to see what people thought of my writing if I went to a new audience and didn’t call myself ‘Eliezer Yudkowsky’. Fast forward several years, and it’s now the most popular Harry Potter fanfiction out of more than half a million fanfictions on just its own host site (popularity measured in number of reviews, which is the main thing that all sites measure in common).


Those who describe Harry Potter and the Methods of Rationality as their favorite book, period, tend to cite that the characters are intelligent, that the story taught them something about science, that the story taught them something about *how to think*, or that they really liked the story’s humor.

Could you list videos/books/websites you would suggest to our readers who are interested in AI, in nice AI, and human rationality?

For human rationality:

The leading introduction to the issues of advanced AI is the book “Superintelligence” by Nick Bostrom.

People who are interested in learning more about AI as a field, and who know some background math and computer programming, should start with the deservedly standard textbook “Artificial Intelligence: A Modern Approach” by Stuart Russell and Peter Norvig.

If you already have some math background and know some computer science, an overview of modern technical work on the value alignment problem can be found here. And Intelligence.org is the overall site for the Machine Intelligence Research Institute.

Thank you very much!

And I’d also like to say a loud word of thank-you to Exosphere for all the help they offered Brienne, and then myself, during our stay in Chile. You are, in a word, ‘cool’.