AZEEM AZHAR: Hello, I’m Azeem Azhar. Eight years ago I started Exponential View to explore the underlying forces driving the development of exponential technologies. I’m a child of the exponential age, born the year after Intel released the 4004, the world’s first single-chip microprocessor. I got my first computer in 1981. I’m holding it as I speak. And as I moved into adulthood, the internet was growing exponentially, transforming from a tool of academia to the worldwide utility it is today. I, at least, did not grow exponentially with it. Now, back in 2015 when I started writing Exponential View, I’d noticed something curious was going on. You could feel the pace picking up. New technologies were clearly on some kind of accelerating trend. One of these technologies is artificial intelligence, so today, I’m going to bring back a conversation with one of the original AI pioneers, Jürgen Schmidhuber. Jürgen is one of the most influential figures in AI research, having made groundbreaking contributions to the field over three decades. In fact, many of the most advanced AI systems that we use today rely on techniques that Schmidhuber developed. One of these is long short-term memory, a type of artificial neural network that has found use in a wide range of applications including speech recognition, natural language processing, and robotics. Jürgen is known for his emphasis on the importance of developing AI systems that can learn and adapt on their own without relying on human input or supervision. He believes that this type of self-improving AI has the potential to revolutionize many industries and solve some of the world’s most pressing problems. Jürgen, it is a pleasure to have you on Exponential View.
JÜRGEN SCHMIDHUBER: Azeem, the pleasure is mine.
AZEEM AZHAR: Now, you are clearly someone who is driven by a great passion for your work and not by any of the accolades that may come your way. When did this passion for artificial intelligence first grab you?
JÜRGEN SCHMIDHUBER: That happened during teenager times when I was a boy. First, I wanted to become a physicist because it seemed to be the most fundamental science, understanding the nature of the universe. Until I realized no, there’s something even grander. I could try to build something that is going to learn to become much smarter than myself, such that it can solve all the problems that I cannot solve myself. Such that it can become a much better scientist than I ever could hope to be, and such that I can retire.
AZEEM AZHAR: Did you, during your school days, have access to early computers? And were you able to play around with some of these early ideas of artificial intelligence?
JÜRGEN SCHMIDHUBER: Yeah, we had access to tiny little personal computers like the ZX81, if you remember that one.
AZEEM AZHAR: I have one on my desk, Jürgen, the ZX81. I’m really holding it as you talk about it.
JÜRGEN SCHMIDHUBER: That’s incredible. That’s an incredible coincidence. Yes, and on that thing, I programmed little video games for playing tennis and stuff like that. And since then, we have greatly profited from the fact that every five years, computing is getting 10 times cheaper.
AZEEM AZHAR: That is the impact of Moore’s law, I suppose.
JÜRGEN SCHMIDHUBER: It’s much older than Moore’s law because Moore’s law is about microchips, which have existed since the ’60s, and it basically says that every 18 months you can pack twice as many transistors on a chip. That law has apparently broken down, but there’s an older one, which goes back at least to 1941, when Konrad Zuse built the first working program-controlled computer, which could do roughly one operation per second. And since then, every five years, computing has become 10 times cheaper, which means that today, almost 80 years later, we can do more than a million billion operations per second for the same price.
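The arithmetic behind this claim is easy to check. A minimal sketch, assuming the stylized law as stated (a 10x cost drop every five years, starting from Zuse’s roughly one operation per second in 1941):

```python
# Stylized scaling law from the conversation: computing gets ~10x cheaper
# every 5 years, starting from Konrad Zuse's machine (1941) at roughly one
# operation per second for a given price.
def ops_per_second(year, base_year=1941, base_ops=1.0):
    """Operations per second affordable at a fixed price, per the 10x/5yr law."""
    return base_ops * 10 ** ((year - base_year) / 5)

# Almost 80 years later, the same price buys:
print(f"{ops_per_second(2019):.1e}")  # → 4.0e+15, i.e. more than a million billion ops/s
```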
AZEEM AZHAR: Absolutely. So, today’s chips are running at hundreds of billions or even trillions of operations per second, compared to the first computer from Konrad Zuse, which was running at one operation per second. Is that correct?
JÜRGEN SCHMIDHUBER: We can keep this law going only because we are making more parallel computers where you have lots of parallel processors computing something at the same time, which is a little bit more like what the brain seems to be doing.
AZEEM AZHAR: We’re going to return to the relationship between parallel computers and the brain later in our conversation. Perhaps, let’s start with the particular type of neural network that is most associated with your research, and that’s so extensively used by the world’s largest internet companies. Can you give us a flavor of what it is and where it’s in use?
JÜRGEN SCHMIDHUBER: I guess you are referring to the long short-term memory, which is a recurrent artificial neural network that was developed and improved by my brilliant students in Munich and in Switzerland. What is it about? It’s a little bit inspired by the human brain. In your brain you’ve got about 100 billion little processors, which are called neurons, and each of them is connected to maybe 10,000 other neurons on average. Some of these neurons are input neurons, where video is coming in through the cameras, through the eyes, or audio signals are coming in through the microphones, through the ears, and the tactile sensors, and so on. And some of these neurons are output neurons, and when you activate them, you move your finger muscles, or your speech muscles, and so on. Each of these connections has a strength, and the strength says, “How much does this neuron over here influence that other neuron over there at the next time step?” In the beginning, the network is dumb. It knows nothing. It is dumb like a little baby because all these little connection strengths are random. But then, over time, through learning, through lots of training examples, the network makes some of these connections stronger and others weaker, such that over time it learns to do something reasonable. For example, recognizing speech, or translating from one language to another, or driving a car.
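The learning process described here, random connection strengths nudged stronger or weaker by training examples, can be sketched with a single artificial neuron. This is a toy illustration with a made-up task, not one of Schmidhuber’s actual networks:

```python
import numpy as np

# A single neuron: weighted input connections plus a sigmoid activation.
# It starts "dumb" with random connection strengths; each training example
# then strengthens or weakens the connections in proportion to the error.
rng = np.random.default_rng(0)
weights = rng.normal(size=3)                   # random strengths: knows nothing yet

def neuron(x):
    return 1 / (1 + np.exp(-x @ weights))      # activation between 0 and 1

# Toy task: the output should be high exactly when the first input is active.
inputs  = np.array([[1., 0., 1.], [0., 1., 1.], [1., 1., 0.], [0., 0., 1.]])
targets = np.array([1., 0., 1., 0.])

for _ in range(2000):                          # lots of training examples over time
    for x, t in zip(inputs, targets):
        weights += 0.1 * (t - neuron(x)) * x   # logistic-regression gradient step

print(np.round([neuron(x) for x in inputs], 2))  # close to the targets 1, 0, 1, 0
```

The update rule is the standard logistic-regression gradient: each connection is adjusted in proportion to the error and to how active its input was, which is the "stronger and weaker" process in miniature.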
AZEEM AZHAR: Let’s look at those long short-term memory networks. And I think for ease in our conversation, we’ll call them LSTMs. If we look at LSTMs in industrial use, the numbers are quite remarkable in terms of the number of humans they reach. What’s one of your favorite examples of an LSTM being used in the wild?
JÜRGEN SCHMIDHUBER: I think the first time that it was spread widely was in 2015 when Google started using LSTM for all of its speech recognition. And this is now on 2 billion smartphones. It’s much better than what Google had before for speech recognition, and it’s used by many, many people, every single day. Then one year later, Google also used an LSTM-based system to translate from one language to another through Google Translate. Until November 2016, for example, the Chinese, they laughed about the translations from English to Mandarin and back. That is the most important language pair in the world. But after November 2016, they didn’t laugh anymore. It’s much better now. And then half a year later, Facebook rolled it out as well for all its billions of users. And by 2017, Facebook did about 30 billion translations of messages per week using an LSTM-based system.
AZEEM AZHAR: Those numbers are incredible. Over four billion translations a day on Facebook using the LSTM work that your lab pioneered. How does that make you feel when you think that this work is so extensively used?
JÜRGEN SCHMIDHUBER: LSTM is a gift to mankind. Everybody can use it, it was funded through taxpayers’ money, and there are lots of open source implementations. You can download it today and use it for predicting time series, or the stock market, or healthcare data and whatever. So, it’s free for everybody.
AZEEM AZHAR: Why do LSTMs work so well for the things that they do?
JÜRGEN SCHMIDHUBER: The LSTM overcomes a problem that was present in earlier networks. Standard neural networks are of the feedforward type, which means that the network can only see an input and translate it into an output, and that’s it. It doesn’t have a memory of things that happened at earlier points in time. It cannot watch a video, for example, and memorize certain events that happened maybe half an hour ago, 10 minutes ago, and so on. But the LSTM overcomes these issues, and it is able to learn to put the important stuff into memory, and to ignore the noise and the unimportant stuff. And that’s how it can learn all these interesting real-world tasks such as speech recognition, but also interacting in video games with a partially observable world, where you have to memorize things that happened in the past in order to make a successful output action at a much later point in time.
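The gating mechanism that lets an LSTM put important stuff into memory and ignore the rest can be sketched for a single cell. This is a hedged, minimal illustration: one scalar memory cell with hand-picked weights, whereas real LSTMs learn these weights and use many cells:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W):
    """One time step of a single-unit LSTM cell."""
    v = np.array([x, h_prev, 1.0])     # current input, previous output, bias
    i = sigmoid(W["i"] @ v)            # input gate: store this observation?
    f = sigmoid(W["f"] @ v)            # forget gate: keep the old memory?
    o = sigmoid(W["o"] @ v)            # output gate: reveal the memory?
    g = np.tanh(W["g"] @ v)            # candidate content to store
    c = f * c_prev + i * g             # memory cell: mix of old and new
    h = o * np.tanh(c)                 # visible output
    return h, c

# Hand-picked (not learned) weights: store large inputs, otherwise hold memory.
W = {k: np.array(w) for k, w in {
    "i": [4., 0., -2.], "f": [0., 0., 4.], "o": [0., 0., 4.], "g": [2., 0., 0.],
}.items()}

h = c = 0.0
for x in [1.0, 0.0, 0.0, 0.0]:         # one important event, then nothing
    h, c = lstm_step(x, h, c, W)
print(h)                               # still clearly positive: the cell remembered
```

The forget gate staying near 1 is what preserves the memory of the early event across the later empty steps; a plain feedforward net, seeing only the current input, would have no trace of it.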
AZEEM AZHAR: One of the things that strikes me about these networks is the speed with which they learn. I read recently that a bot that could play a particular video game learnt through 450,000 or 500,000 different plays of the game. And that was a vast improvement on where some of the board-game-playing bots had been, where they’d been playing 10, 20, 30 million times. Yet, when we look at the original neural network, the one in our brain, we learn far more quickly, or seemingly more quickly. Why is that?
JÜRGEN SCHMIDHUBER: Yes and no. Don’t forget that you have learned to learn quickly because you have decades of experience with videos coming in every day. You have had billions and billions of training examples that tell you something about the environment. And from there, you learned a lot about vision and, in particular, you learned to learn faster. And this is actually related to a field that we have been pursuing since 1987: meta-learning, where the goal is to learn from previous experiments to solve new things, new problems, faster.
AZEEM AZHAR: Meta-learning is this capability of learning to learn, and that could certainly explain why humans and other animals seem to be able to learn without a hundred thousand experiences of avoiding the lion chasing us. And I think we also feel that within these biological brains, there’s a certain amount of what we’d call transfer learning, which means we can learn in one domain, and we can generalize it and apply it to another domain. But as an outsider, it seems to me that neither of those yet adequately explains just how quickly the baby giraffe can learn to stand up, or how quickly the young oryx can run down the almost vertical cliff face, or how quickly a little child can recognize its mother or father’s voice. Do we have any understanding of those mechanisms and how they might pertain to what we might build in an artificial neural network?
JÜRGEN SCHMIDHUBER: We know little about what the human brain does exactly. On the other hand, we have artificial neural networks that in many ways seem to be very similar to the human brain, at least in terms of what they are doing. For example, you mentioned transfer learning. If you train a deep neural network on 100 different databases of images, it’s going to learn a lot about vision in general. There, it has learned to learn new things faster, because what it has learned from the first 100 databases already contains a lot of basic knowledge about how to process images through the first 20 layers, say, and then only the top layer has to be adjusted a little bit, which can be done really quickly. Transfer learning in artificial networks and in human biological brains seems to yield similar results.
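The transfer-learning recipe described here, reuse the lower layers and refit only the top, can be sketched in a few lines. Everything below is a stand-in: random frozen weights play the role of lower layers pretrained on the “100 databases”, and a least-squares fit plays the role of quickly adjusting the top layer:

```python
import numpy as np

rng = np.random.default_rng(1)
W_frozen = rng.normal(size=(4, 8))            # "pretrained" lower layers, frozen

def features(x):
    return np.maximum(0, x @ W_frozen)        # generic learned representation (ReLU)

# A new task with only a few examples: label depends on the first input feature.
X_new = rng.normal(size=(20, 4))
y_new = (X_new[:, 0] > 0).astype(float)

# Adjust only the top layer: a quick least-squares fit on the frozen features.
F = features(X_new)
top, *_ = np.linalg.lstsq(F, y_new, rcond=None)

preds = (F @ top > 0.5).astype(float)
print("training accuracy:", (preds == y_new).mean())
```

Fitting 8 top-layer parameters by least squares is nearly instantaneous, which is the point: most of the "knowledge" sits in the frozen lower layers and never needs retraining.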
AZEEM AZHAR: When we think about where that artificial intelligence takes place, today, it really mostly happens in what we call the cloud, which is sort of back in someone’s data center in Dublin or the East Coast of the US, rather than in our homes or on our devices. What happens to our experience when we can move that intelligence to the devices themselves?
JÜRGEN SCHMIDHUBER: As computers are getting cheaper and as AI, as a consequence, is getting cheaper, it’s going to be more and more local. And this is a tendency that will actually prevent single big players from dominating everything. It’s going to be an extremely distributed sort of future, where AI is locally available on billions of different devices without a need to contact a remote server.
AZEEM AZHAR: Oh, fantastic. That’s a vision of the future. Now, if the AI systems are dependent on compute and data, which are closely related, we know a couple of things about what’s happening there. Their capabilities are growing exponentially, and certainly, it’s as true in data as it is in compute. There is lots of new data that’s multimodal of different types. What kind of things happen to these AI systems over the next few years in an environment of this continued exponential growth of both compute and data? What will they become capable of?
JÜRGEN SCHMIDHUBER: At the moment, most of the profits in AI are really in marketing and selling ads, and that’s what the major platform companies on the Pacific Rim are doing with AI: Alibaba and Amazon, Google and Tencent and Facebook. They are trying to use AI to predict what you are going to click on next, given the context of your previous interactions with the internet. And that’s good enough to make these companies pretty much the most valuable companies in the world, ignoring Saudi Aramco and companies like it. However, it’s just a tiny little bit of the world economy, and the next AI wave is going to be much bigger than that. It’s going to be about machines that not only passively recognize data, the way your smartphone recognizes your speech and your fingerprints and so on. No, it’s going to be machines that actively shape the data that they perceive through their own actions. Robots, machines that make shoes and T-shirts, and machines that make other machines, controlled by machines that learn to control these processes in a better way, to make them more efficient and so on. So, this next wave of AI is going to affect pretty much every single part of the economy. It’s going to change everything around the world, and it’s going to be about smart, active AIs rather than simple passive AIs that just do pattern recognition but don’t act on the perceived patterns.
AZEEM AZHAR: So, how will that manifest itself?
JÜRGEN SCHMIDHUBER: I guess, in the not so distant future, for the first time, we are going to have something that we don’t have at the moment, which is a little robot kid. A little machine that learns a little bit like a human kid just by watching and by listening to imitate some complex procedure to learn some complex process such as manufacturing a smartphone, or making a shoe, or a T-shirt. At the moment, humans are much better at doing that than robots, and they do it much more economically. And that’s going to change.
AZEEM AZHAR: In a world where these AI systems become more powerful, are better able to interact with the real world to create their own processes, it looks like we’re heading towards what researchers have called artificial general intelligence. How should we think about artificial general intelligence, or AGI?
JÜRGEN SCHMIDHUBER: AGI is in many ways really different from the current commercial applications of AI, which are all about supervised learning, where you have some platform company that is collecting a lot of data, and in the data there are patterns, and the platform companies recognize those patterns and use them to sell you ads. Active AI is quite different from that. How does a baby learn to become smart? A baby doesn’t become smart by downloading data from Facebook or something. No, it’s actively shaping its data through its own actions by playing around with its toys. It invents its own experiments and gets results from them, data from which it learns something about the properties of the world. It’s learning to predict the consequences of its actions. It’s a little scientist: it is setting itself its own goals and trying to figure out what happens if it does this and that. And in the process, it’s becoming a more and more general problem solver. That’s quite different from what we have in current commercial AI. However, we have had similar things in the lab for decades. In particular, we have had something that I like to call artificial curiosity, where you also have active, unsupervised agents that, a little bit like babies, learn to invent their own goals and their own experiments to figure out how the world works, how the world reacts to their own actions, how they can interact with the world to solve certain self-invented problems, and so on. Artificial curiosity is something that we, in principle at least, have understood.
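A drastically simplified sketch of that curiosity loop: the agent keeps a model of what each action does and keeps choosing the experiment whose outcome last looked most surprising, until everything becomes boring. The five-action toy world and table-based model are assumptions for illustration; Schmidhuber’s own formulation rewards the model’s learning progress rather than raw prediction error:

```python
# Toy world: each of five actions has a hidden numeric outcome.
world = {a: a * 0.5 for a in range(5)}
model = {a: 0.0 for a in range(5)}       # the agent's predictions, initially naive
surprise = {a: 1.0 for a in range(5)}    # optimism: everything seems interesting

for step in range(30):
    # Curiosity: self-invent the experiment that last looked most surprising.
    action = max(surprise, key=surprise.get)
    outcome = world[action]                          # run the experiment
    surprise[action] = abs(outcome - model[action])  # how wrong were we?
    model[action] += 0.5 * (outcome - model[action]) # learn from the result
    # Actions with low surprise stop being chosen: the agent moves on.

print(max(abs(world[a] - model[a]) for a in world))  # near zero: world figured out
```

The agent revisits an action only while its outcome is still poorly predicted, which is the self-invented-experiment behavior in miniature: boredom with the mastered, attention to the not-yet-understood.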
AZEEM AZHAR: Is a lot of human engineering demanded in making these particular types of networks?
JÜRGEN SCHMIDHUBER: AI research has a long history of pre-wiring certain things that people thought could not be learned. And if you look back through the decades, you see lots of AI systems that crucially depend on pre-wired components containing lots of prior knowledge about the world. In recent decades, however, systems have emerged that need less and less pre-wired knowledge, and that learn more and more by themselves. 30 years ago, compute was a million times more expensive than today, and we did have to pre-wire a couple of things to make it work on little toy experiments. But now, we are greatly profiting from the fact that every five years, compute is getting 10 times cheaper, and lots of things that required lots of labor back then now work quickly. The basic principles of curiosity are still the same, and they’re really, really simple.
AZEEM AZHAR: Could you help us understand the scale of these networks? How big was the first recurrent neural network that you built? And how big are the ones that are being used in industry today? I hear numbers suggesting that they sometimes have tens or hundreds of millions, maybe even billions, of individual parameters that need to be learnt or tuned.
JÜRGEN SCHMIDHUBER: When we started this work on recurrent networks in the late eighties, our recurrent networks had maybe three or four neurons, and maybe 20 or 40 weights, and 1,000 weights was rather impressive already. But now, as compute has become a million times cheaper, we have much larger networks. A large LSTM of today, of the type that is used in Google Translate or in Facebook Translate, has at least hundreds of millions of connections, maybe 1 billion connections. Let’s quickly compare that to the human brain, which has a million times more. In your brain, you’ve got maybe a million billion connections. However, this factor of a million translates into just three decades, because every five years, compute is still getting 10 times cheaper. So, in 30 years, we are going to gain a factor of a million.
AZEEM AZHAR: 30 years from now seems like an interesting date to watch. Do you think that we need significant theoretical breakthroughs in order to get to the stage of having smart robots that are in the real world? Or do we have a sufficient theory upon which to build those technologies today?
JÜRGEN SCHMIDHUBER: I think that the basic puzzle pieces that we need to build something like that are already known. Artificial curiosity; how to deal with sequences of inputs such as videos and speech; how to build robots that set themselves their own goals such that they can become more general problem solvers on their own. We already know all of that in principle. However, these puzzle pieces still have to fall into place in an elegant and simple way. I guess in hindsight, it’s going to be really simple, and people will say, “Why didn’t we think of that 30 years ago?” So, they still have to fall into place in a good way that takes into account all these little issues. And how long is that going to take? I think it may be just a few years, maybe less than a decade. On the other hand, I’ve been wrong before.
AZEEM AZHAR: Now, if we look at AI in the world, many of the examples we’ve described have been really benign: speech recognition, speech translation, and so on. But we also know that there have been unintended consequences of the rapid rollout of information technology, in terms of monopoly market power, in terms of the upsetting of certain types of political discourse, and also in the naive and simplistic way that it has sometimes reinforced structural inequalities across societies. The next 10 to 30 years will see another million-fold improvement in computational power. They’ll see these artificial neural networks spread even more widely, into the physical world, and outside the narrow domain of internet marketing. If that’s going to happen very quickly, we need to, in a sense, have some cognizance and awareness of the risks, the second-order and third-order consequences of these technologies rapidly making their way into the world. What do you think the best stance is to take towards that? And if you were advising a policymaker, what advice would you give them?
JÜRGEN SCHMIDHUBER: First of all, I would point out that there is incredible commercial pressure towards good AI. Almost all of the AI research of our company, Nnaisense, and of many other companies, is about making human lives easier and longer and healthier. Why? Because people will buy only stuff that they think will make their lives better. So, there’s intense commercial pressure towards good AI. On the other hand, of course, maybe 5% of all AI research is about weapons, military research. And this is also unstoppable, because the Chinese will say, “If we don’t do it, the Americans will do it.” And the Americans will say, “If we don’t do it, then the Russians or the Israelis will do it,” and so on. AI has these two sides, a little bit like fire. When controlled fire was invented, about 700,000 years ago, maybe people also realized it has pros and cons. You can cook with it, you can use it to keep warm at night, but you can also use it to kill other people. And fire even has this AI-like quality of spreading without further human ado, in the form of a wildfire. Back then, people realized there are pros and cons, but they also realized the pros outweigh the cons so much, so let’s keep developing fire. And we are in a similar situation today.
AZEEM AZHAR: It’s such an amazing technology. I will remind myself when I speak to Siri today that Jürgen, it’s you and your collaborators who gave us those breakthroughs that have enabled such technology as speech recognition. Thank you for taking the time today to speak with me.
JÜRGEN SCHMIDHUBER: Azeem, it was my pleasure.
AZEEM AZHAR: Well, thanks for listening to my conversation with Jürgen Schmidhuber on artificial intelligence. To stay in touch, follow me on Twitter. I’m @azeem. And subscribe to my weekly newsletter, Exponential View, at www.exponentialview.co. That’s www.exponentialview.co. I’m your host, Azeem Azhar. This podcast was produced by Marija Gavrilov and Fred Casella. Bojan Sabioncello is the sound editor. And Exponential View is a production of E to the Pi I Plus One, Limited.