
MIT Sloan: Intro to Machine Learning (in 360/VR)


Chapters

0:00 Intro
0:51 Course Overview
2:29 How Powerful is Artificial Intelligence
3:38 Supervised Learning
4:55 Augmented Learning
6:03 Machine Learning
9:34 Questions
12:33 Artificial Neuron
13:54 Building an Artificial Neuron
15:32 Neural Networks
20:00 Representation
24:28 General Intelligence
29:13 Data Representation
30:43 Pattern Recognition
33:53 Machine Learning Examples
35:10 How to Detect Traffic Lights
38:08 How to Generate Text
39:54 End-to-End Approach
41:23 What Can't We Do
43:08 The Pipeline
44:07 The Machine Learning
46:36 The Open Questions
47:41 Pong
49:14 The Mars Paradox
49:55 Neural Networks vs Natural Selection

Transcript

The video you're watching now is in 360. The resolution is not great, but we wanted to try something different. If you're on a desktop or laptop, you can pan around with your mouse; if you're on a phone or tablet, you should be able to just move your device to look around.

Of course, it's best viewed with a VR headset. The video that follows is a guest lecture on machine learning that I gave in an MIT Sloan course on the business of artificial intelligence. The lecture is non-technical and intended to build intuition about these ideas among the business students in the audience.

The room was a half circle, so we thought, why not film the lecture in 360? We recorded a screencast of the slides and pasted it into the video so that the slides are crisper. Let me know what you think, and remember it's an experiment.

So this course is about the broad context: the global impact of artificial intelligence, and the business, which is what happens when you have to take the fun research ideas I'll talk about today and bring them into the world.

A lot of them are cool on toy examples, but when you bring them to reality you face real challenges, which is what I'd really like to highlight today. That's the business part: making real impact, making these technologies a reality. So I'll talk about how amazing the technology is for a nerd like me, but also about the challenges you face when you take it into the real world.

So: machine learning, the technology at the core of artificial intelligence. We'll talk about the promise, the excitement I feel about it, and the limitations; we'll bring it down a little bit. What are the real capabilities of the technology? For the first time, we as a civilization are really exploring the meaning of intelligence.

If you pause for a second and think: maybe many of you want to make money from this technology, and many of you want to save lives and help people, but on the philosophical level we also get to explore what makes us human. So while I'll talk about the low-level technologies, also think about the incredible opportunity here: we get to almost psychoanalyze ourselves by trying to build versions of ourselves in the machine.

All right, so here's the open question: how powerful is artificial intelligence? How powerful is the machine learning that lies at its core? Is it simply a helpful tool, a special-purpose tool to help you solve simple problems? That is what it currently is. Currently, machine learning and artificial intelligence work if you can formally define the problem, formally define the tools you're working with, and formally define the utility function, that is, what you want to achieve with those tools.

As long as you can define those things, we can come up with algorithms that solve them, provided you have the right kind of data, which is what I'll talk about. Data is key. The question for the future is whether we can break past this very narrow definition of what machine learning can give us, solving specific problems, toward something bigger: the general intelligence we exhibit as human beings.

When we're born, we know nothing, and we learn quickly from very little data. The right answer is that we don't know. We don't know what the limitations of the technology are. What kinds of machine learning are there? There are several flavors, and the first is what has achieved success today.

Supervised learning. What I'm showing on the left of the slide are the teachers, the data fed to the system, and on the right are the students, the machine learning system itself. Whenever people talk about machine learning today, for the most part they're referring to supervised learning, which means every single piece of data used to train the model is seen by human eyes, and those human eyes, with an accompanying brain, label that data in a way that makes it useful to the machine.

This is critical because, first, the blue box, the human, is really costly. When every single piece of data used to train the machine needs to be seen by a human, you need to pay for that human. And second, you're limited by time.

The amount of data necessary to label what it means to exist in this world is humongous. Augmented supervised learning is when you get the machine to help you a little bit. There are a few tricks there, but they're still only tricks; the human is still at the core of it. The promise of future research, which I'm pursuing and which some of the speakers here may discuss in applications, is in semi-supervised and reinforcement learning, where the human plays a smaller and smaller role in how much of the data they have to annotate.

And the wizards of the dark arts of deep learning are all excited about the dream of unsupervised learning. It has very few actual successes in real-world applications today, but the idea that you can build a machine that doesn't require a human teacher fills us artificial intelligence researchers with excitement.

There's a theme here: machine learning is really simple. There's the learning system in the middle, and there's a training stage where you teach it something. All you need is input data and the correct output for that input data. So you have to have a lot of pairs of input data and correct output.

There'll be a theme of cats throughout this presentation. If you want to teach a system the difference between a cat and a dog, you need a lot of images of cats, and you need to tell it that this is a cat: this bounding box here in the image is a cat.

You have to give it a lot of images of dogs and tell it, "Okay, well, in these pictures there are dogs." Then (there's a spelling mistake on the slide) the second stage is the testing stage, when you give it new input data it has never seen before and hope that, having been given enough cat-versus-dog data, it can guess whether this new image is a cat or a dog.
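To make the two stages concrete, here is a minimal sketch. It assumes each animal has already been turned into two made-up numbers (say, ear pointiness and snout length), and it uses a 1-nearest-neighbor rule, one of the traditional algorithms mentioned later in this lecture, rather than a neural network.

```python
import math

# Training stage: pairs of (input numbers, correct label).
# The two features are hypothetical, e.g. [ear pointiness, snout length].
training_data = [
    ([0.9, 0.2], "cat"),
    ([0.8, 0.3], "cat"),
    ([0.3, 0.9], "dog"),
    ([0.2, 0.7], "dog"),
]

def predict(features):
    """Testing stage: guess the label of an input the system has never seen
    by copying the label of the closest training example (1-nearest neighbor)."""
    _, label = min(training_data, key=lambda pair: math.dist(pair[0], features))
    return label

print(predict([0.85, 0.25]))  # cat
print(predict([0.25, 0.8]))   # dog
```

The learning here is trivial (it just memorizes the pairs), but the contract is the same one the lecture describes: labeled pairs in, guesses on unseen inputs out.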

Now, one of the open questions to keep in mind is: what in this world can we not model in this way? What activity, what task, what goal? I offer to you that there's nothing you can't model in this way. So let's think about what machine learning can do, starting small.

What can be modeled in this way? First, on the bottom left of the slide is one-to-one mapping, where the input is an image of a cat and the output is a label that says cat or dog. You can also do one-to-many, where the input is an image of a cat and the output is a story about that cat, a caption for the image.

You can also go the other way, many-to-one mapping, where you give it a story about a cat and it generates an image. There's many-to-many: this is Google Translate, where we translate a sentence from one language to another, and there are various flavors of that. Again, the same theme: provide input data with the correct output, then let it go into the wild, where it runs on input data it hasn't seen before and provides guesses.

And it's as simple as this: whatever you can convert into one of the following four things: a number; a vector of numbers, so a bunch of numbers; a sequence of numbers where the temporal dynamics matter, like audio or video, where the ordering matters; or a sequence of vectors of numbers.

I propose to you that there's nothing you can't convert into numbers. If you can convert it to numbers, you can have a system learn to do it. The same goes for the output: generate numbers, vectors of numbers, sequences of numbers, or sequences of vectors of numbers.
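As a minimal sketch of converting something into numbers, here is one common way (one-hot encoding, assumed here rather than described in the lecture) to turn a sentence into a sequence of vectors of numbers. The five-word vocabulary is hypothetical.

```python
# A hypothetical tiny vocabulary; real systems use tens of thousands of words.
vocabulary = ["the", "cat", "sat", "dog", "ran"]

def one_hot(word):
    """Turn one word into a vector of numbers: 1 at the word's index, 0 elsewhere."""
    vector = [0] * len(vocabulary)
    vector[vocabulary.index(word)] = 1
    return vector

def encode_sentence(sentence):
    """Turn a sentence into a sequence of vectors; the order of the vectors matters."""
    return [one_hot(word) for word in sentence.lower().split()]

print(encode_sentence("The cat sat"))
# [[1, 0, 0, 0, 0], [0, 1, 0, 0, 0], [0, 0, 1, 0, 0]]
```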

First, are there any questions at this point? Well, we have a lot of fun slides to get through, but I'll pause every once in a while to make sure we're on the same page. So what kind of input are we talking about? Just to fly through it: images, such as faces, or medical applications that look at scans of different parts of the body to diagnose medical conditions.

Text: conversations, your texts, articles, blog posts, for sentiment analysis and for question answering, where you ask it a question and the output, you hope, is an answer. Sound: voice recognition, anything you could tell from audio. Time series data: financial data, the stock market; you can use it to predict anything you want about the stock market, including whether to buy or sell.

If you're curious, that doesn't work very well as a machine learning application. The physical world: cars, or any kind of object or robot that exists in this world. So the location of where I am, the location of other things, the actions of others; all of that could be input.

All of it can be converted to numbers. And the correct output, same thing: a bunch of numbers. Classification is saying whether this is a cat or a dog; regression is saying to what degree I turn the steering wheel; and sequence outputs mean generating audio, video, stories, captions, text, images, anything you could think of as numbers.
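The two kinds of output can be sketched like this, assuming some model has already produced raw scores (the numbers here are made up). Softmax is one standard way, not the only one, to turn scores into class probabilities; the regression output is just a weighted sum with no squashing.

```python
import math

def softmax(scores):
    """Classification output: turn raw scores into probabilities that sum to 1."""
    biggest = max(scores)
    exps = [math.exp(s - biggest) for s in scores]  # subtracting the max avoids overflow
    total = sum(exps)
    return [e / total for e in exps]

def regress(features, weights, bias):
    """Regression output: a single unbounded number, such as a steering angle."""
    return sum(w * x for w, x in zip(weights, features)) + bias

labels = ["cat", "dog"]
probabilities = softmax([2.0, 0.5])  # hypothetical raw scores from some model
print(labels[probabilities.index(max(probabilities))])  # cat
print(regress([1.0, 2.0], weights=[0.5, -1.0], bias=0.1))  # about -1.4
```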

At the core of it is a set of data-agnostic machine learning algorithms. There are traditional ones: nearest neighbors, Naive Bayes, support vector machines. A lot of them are limited, and I'll describe how. And then there are neural networks. There's nothing special or new about neural networks. I'll describe exactly the very subtle thing about them that is powerful, which has been there all along and which certain recent developments have now unlocked.

But it's still just a flavor of machine learning algorithm. And the inspiration for neural networks, as Jonathan showed last time, is the human brain. That's perhaps why the media and the hype are captivated by neural networks: you immediately jump to the feeling that there's a mysterious structure to them that scientists don't understand.

I'm referring to both artificial neural networks and biological ones. We don't understand them, the similarity captivates our minds, and we think this approach is perhaps as limitless as our own human mind. But the comparison ends there. In fact, artificial neurons are much simpler computational units than biological ones.

At the core of everything is this neuron, a computational unit that does two very simple operations. On the left side, it takes a set of numbers as inputs, applies weights to those inputs, sums them together, adds a small bias, and then produces an output somewhere between 0 and 1.

So you can think of it as a computational entity that gets excited when it sees certain inputs and is totally turned off by other kinds of inputs. So maybe this neuron, with weights of 0.7, 0.6, and 1.4, gets really excited when it sees pictures of cats and totally doesn't care about dogs.
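Here is that neuron as a few lines of Python. The weights 0.7, 0.6, and 1.4 come from the slide; the bias of -2.0, the inputs, and the choice of a sigmoid as the squashing function are assumptions for illustration.

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: a weighted sum of the inputs plus a bias,
    squashed by a sigmoid into a number between 0 and 1."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-total))

weights = [0.7, 0.6, 1.4]  # from the slide
bias = -2.0                # hypothetical
print(round(neuron([1.0, 1.0, 1.0], weights, bias), 3))  # 0.668: fairly "excited"
print(round(neuron([0.0, 0.0, 0.0], weights, bias), 3))  # 0.119: mostly "off"
```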

Some of us are like that. So this neuron's job is to detect cats. Now, the way you build an artificial neural network, the way you release the power I'll talk about in the following slides on applications, is just by stacking a bunch of these together.

Think about it: this is an extremely simple computational unit. So whenever I show one of the following slides saying neural networks are amazing, pause and think back to this slide: everything is built on top of these really simple multiply-and-add operations, with a simple nonlinear function applied at the end.

Just a tiny math operation. We stack them together in a feed-forward way, so there are a bunch of layers, and when people talk about deep neural networks, it means there are many of those layers. Then there are recurrent neural networks, a special flavor that's able to have memory.
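Stacking can be sketched by treating a layer as a list of neurons that all read the same inputs, and a network as a list of layers applied in order. The 3 -> 2 -> 1 shape and all the weights below are hypothetical.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def layer(inputs, weight_rows, biases):
    """A layer is many neurons reading the same inputs; each row holds one neuron's weights."""
    return [sigmoid(sum(x * w for x, w in zip(inputs, row)) + b)
            for row, b in zip(weight_rows, biases)]

def feed_forward(inputs, layers):
    """Push the numbers through each layer in turn; 'deep' just means many layers."""
    activations = inputs
    for weight_rows, biases in layers:
        activations = layer(activations, weight_rows, biases)
    return activations

# Hypothetical weights for a tiny 3 -> 2 -> 1 network.
network = [
    ([[0.7, 0.6, 1.4], [-0.5, 0.2, 0.1]], [-1.0, 0.0]),  # hidden layer: 2 neurons
    ([[1.5, -2.0]], [0.3]),                               # output layer: 1 neuron
]
print(feed_forward([1.0, 0.5, 0.2], network))
```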

So as opposed to just pushing input directly to output, it can also process things internally in a loop, where it remembers things. This is useful for natural language processing and audio processing, whenever the length of the sequence is not fixed in advance. Okay, slide number one on why neural networks are amazing.
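A minimal sketch of the loop that gives a recurrent network its memory, assuming a single hidden unit with a tanh nonlinearity and hand-picked weights (real networks use many units and learned weights).

```python
import math

def rnn(sequence, w_input, w_hidden, bias):
    """A one-unit recurrent network. The hidden state is fed back in at every
    step, so it carries memory, and the same three weights handle a sequence
    of any length."""
    hidden = 0.0
    for x in sequence:
        hidden = math.tanh(w_input * x + w_hidden * hidden + bias)
    return hidden  # a summary of everything seen so far

# Hypothetical weights. Same numbers in a different order give a different result:
print(rnn([1.0, 0.0, 0.0], w_input=1.0, w_hidden=0.9, bias=0.0))
print(rnn([0.0, 0.0, 1.0], w_input=1.0, w_hidden=0.9, bias=0.0))
print(rnn([0.5] * 20, w_input=1.0, w_hidden=0.9, bias=0.0))  # a longer sequence works too
```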

This is perhaps for the math nerds, but I also want you to use your imagination. There's a universality to neural networks. On the left is the input, and on the right is the output of this network. In between is a single hidden layer; it's called a hidden layer because it sits between the input and output layers.

A single hidden layer with enough nodes can represent any function. Any function (strictly, it can approximate any continuous function on a bounded domain to any desired accuracy). That means anything you want to build in this world, everyone in this room, can be represented by a neural network with a single hidden layer. And this is just one hidden layer; the power of these things is limitless.
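The single-hidden-layer claim can be illustrated for one simple case, f(x) = x² on [0, 1]. This sketch hand-builds the network rather than training it: each hidden neuron is a very steep sigmoid that acts like a step, and the output neuron adds the steps up into a staircase that follows the curve. It is a standard textbook-style illustration, not a proof; the function names and constants are assumptions.

```python
import math

def sigmoid(z):
    z = max(-60.0, min(60.0, z))  # clamp so math.exp never overflows
    return 1 / (1 + math.exp(-z))

def build_network(f, n_hidden=50, lo=0.0, hi=1.0, steepness=1000.0):
    """Hand-build a one-hidden-layer network whose output approximates f on [lo, hi].
    Hidden neuron i is a steep sigmoid that switches on halfway between grid
    points x[i-1] and x[i]; its output weight is the jump f(x[i]) - f(x[i-1])."""
    xs = [lo + (hi - lo) * i / n_hidden for i in range(n_hidden + 1)]
    hidden = [((xs[i - 1] + xs[i]) / 2, f(xs[i]) - f(xs[i - 1]))
              for i in range(1, n_hidden + 1)]
    output_bias = f(xs[0])

    def network(x):
        # Output layer: a weighted sum of the hidden neurons plus a bias.
        return output_bias + sum(w * sigmoid(steepness * (x - t)) for t, w in hidden)

    return network

approx_square = build_network(lambda x: x * x)
worst = max(abs(approx_square(k / 200) - (k / 200) ** 2) for k in range(201))
print(f"largest error on [0, 1]: {worst:.3f}")
```

More hidden neurons (with correspondingly steeper steps) make the staircase finer and the error smaller. Note that this shows such a network exists; it says nothing about how to find one by training, which is exactly the problem raised next.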

The problem, of course, is how you find the network. How do you build a network that is as clever as many of the people in this room? But the fact that such a network can exist is incredible, amazing. I want you to think about that.

Here's the way you train a network. It's born as a blank slate, with random weights assigned to the edges. The numbers at the core of a network, its parameters, are the numbers on each of those arrows, each of those edges.

You start knowing nothing; this is a baby network. And the way you teach it something is currently, unfortunately, as I said, through supervised learning: you have to give it pairs of input and output. You have to give it pictures of cats with labels saying they're cats.

The basic fundamental operation of learning is computing a measure of error and backpropagating it through the network. Let me explain; everything is easier with cats. I apologize, too many cats. So the input here is a cat, and the neural network we're training is just guessing; it doesn't know.

Say it guesses cat. Well, it happens to be right, so the measure of error says: yes, you got it right. And you have to backpropagate that error; you have to reward the network for doing a good job. What I mean by reward is this: there are weights on each of those edges, and the individual neurons that were responsible, leading back from that cat neuron, need to be rewarded for seeing the cat.

So you just increase the weights on the neurons that were associated with producing the correct answer. Now you give it a picture of a dog, and the neural network says cat. That's an incorrect answer, so there's a high error, which needs to be backpropagated through the network.

So the weights that were responsible for classifying this picture as a cat need to be punished: they need to be decreased. Simple. And you repeat this process over and over. This is what we do as kids when we're first learning. For the most part, we're also supervised learning machines, in the sense that we have our parents and the environment, the world, teaching us what's correct and what's incorrect.
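The reward-and-punish loop can be sketched for the simplest possible case, a single neuron learning cat versus dog from two hypothetical features. Each update moves every weight in proportion to the error times that weight's input: a confident correct answer changes little, a wrong answer changes a lot. For one neuron this is plain gradient descent on the logistic (cross-entropy) loss; full backpropagation applies the same idea through many layers via the chain rule. The features and learning rate are assumptions.

```python
import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

# Hypothetical features [ear pointiness, snout length]; label 1 = cat, 0 = dog.
examples = [([0.9, 0.2], 1), ([0.8, 0.3], 1), ([0.3, 0.9], 0), ([0.2, 0.7], 0)]

weights, bias = [0.0, 0.0], 0.0  # a "baby" network that knows nothing
learning_rate = 1.0

for epoch in range(1000):  # repeat the process over and over
    for features, label in examples:
        guess = sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)
        error = label - guess  # positive pushes toward "cat", negative away from it
        # Weights on inputs that contributed are increased (rewarded) or
        # decreased (punished) in proportion to the error.
        weights = [w + learning_rate * error * x for w, x in zip(weights, features)]
        bias += learning_rate * error

def is_cat(features):
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias) > 0.5

print(is_cat([0.85, 0.25]), is_cat([0.25, 0.8]))  # True False
```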

And we backpropagate this error and reward through our brains to learn. The difference is that, as human beings, we don't need many examples, and I'll talk about some of the drawbacks of these approaches. You fall off your bike once or twice and you learn how to ride it.

Unfortunately, neural networks need to fall off the bike tens of thousands of times in order to learn not to. That's one of the limitations. And one key thing I didn't mention here: when we refer to input data, we usually mean sensory data, raw data.

We have to represent that data in some clever way, some deeply clever way, so that we can reason about it, whether in our brains or in a neural network. Here's a very simple example to illustrate why the representation of data matters. The way you represent the data can make discriminating one class from another, cat versus dog, either incredibly difficult or incredibly simple.

Here is a visualization of the same data in Cartesian coordinates and in polar coordinates. On the right, you can just draw a simple line to separate the two classes. What you want is a system that's able to learn the polar coordinate representation, as opposed to the Cartesian one, automatically. This is where deep learning has stepped in and revealed the incredible power of this approach; deep learning is the smallest circle there.
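The Cartesian-versus-polar point can be checked with a small sketch, assuming made-up data: one class on a circle of radius 1, the other on a surrounding circle of radius 3. In (x, y) coordinates no straight line separates them, but after converting to (radius, angle) the straight line radius = 2 does. Here the change of representation is written by hand; the point of the lecture is that deep learning can learn such representations on its own.

```python
import math

# Hypothetical data: class A on a circle of radius 1, class B on a circle of
# radius 3 around it. In (x, y) coordinates no straight line separates them.
angles = [2 * math.pi * k / 12 for k in range(12)]
class_a = [(math.cos(t), math.sin(t)) for t in angles]
class_b = [(3 * math.cos(t), 3 * math.sin(t)) for t in angles]

def to_polar(x, y):
    """Change of representation: Cartesian (x, y) -> polar (radius, angle)."""
    return math.hypot(x, y), math.atan2(y, x)

def classify(x, y):
    """In polar coordinates the straight line radius = 2 separates the classes."""
    radius, _angle = to_polar(x, y)
    return "A" if radius < 2 else "B"

print(sum(classify(x, y) == "A" for x, y in class_a), "of", len(class_a))  # 12 of 12
print(sum(classify(x, y) == "B" for x, y in class_b), "of", len(class_b))  # 12 of 12
```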

It's a type of representation learning. Machine learning is the second-biggest circle. This class is about the biggest circle, AI, which includes robotics and all the fun things built on learning. And I'll discuss why I think machine learning will close this entire circle into one.

But for now, AI is the biggest circle, a subset of that is machine learning, and a smaller subset of that is representation learning. So deep learning is not only able, given examples of cats and dogs, to discriminate between a cat and a dog; it's able to represent what it means to be a cat.

So it's able to automatically determine the fundamental units at the low level and at the high level. This is getting very Platonic: what it means to represent a cat, from the whiskers, to the high-level shape of the head, to the fuzziness and the deformable aspects of the cat.

I'm not a cat expert, but I hear these are the features of a cat, versus a dog, that are essential for discriminating between the two. It learns those features, as opposed to needing experts. That was the drawback of the systems Jonathan talked about from the 80s and 90s, where you had to bring in experts for any specific domain you tried to solve.

You had to have them encode that information. This is simply the one big difference between deep learning and other methods: it learns the representation for you. It learns what it means to be a cat. Nobody has to step in and help it figure out that cats have whiskers and dogs don't.

What does this mean? Because it can learn these features, these whisker features, instead of relying on five or ten or a hundred or five hundred features hand-encoded by brilliant engineers with PhDs, it can find hundreds of thousands, millions, even hundreds of millions of features automatically.

So it finds things that can't be put into words or described. In fact, one of the limitations of neural networks is that they find so many fundamental things about what it means to be a cat that you can't visualize what they really know. They just seem to know stuff, and they find that stuff automatically.

What does this mean? The critical thing is that because it can automatically learn those hundreds of millions of features, it's able to make use of data. The diminishing returns don't hit until, well, we don't know when they hit. With classical machine learning algorithms, you start hitting a wall when you have tens of thousands of images of cats.

With deep learning, you get better and better with more data. Neural networks are amazing, slide two. Here's a game, a simple arcade game, where two paddles bounce a ball back and forth. Okay, great, you can build an artificial intelligence agent that plays this game. Not even that well at first; it kind of learns to do all right, and eventually it wins.

Here's the fascinating thing. With deep learning, as opposed to encoding the position of the paddles and the position of the ball, or having an expert in this game come in and encode its physics, the input to the neural network is the raw pixels of the game.

So it's learning in the following way. You give it the evolution of the game as a bunch of pixels. Images are built up of pixels, which are just numbers from 0 to 255. So there's an array of numbers representing each image, and you give it tens of thousands of images that represent a game.

So you have this stack of images representing a game, this giant stack of numbers, and the only thing you know is whether you won or lost at the end. That's it. Based on that, you have to figure out how to play the game.
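The setup can be sketched in two parts, both assumptions for illustration rather than details from the lecture: the game as a stack of pixel grids flattened into one list of numbers, and one common trick from policy-gradient methods for learning from nothing but the final outcome, which spreads that single win or loss back over every frame with a discount factor gamma. The frames and gamma are hypothetical.

```python
# Each frame is a grid of grayscale pixel values from 0 to 255.
# These tiny 2x3 frames are hypothetical; a real game frame has thousands of pixels.
frames = [
    [[0, 0, 255], [0, 0, 0]],
    [[0, 255, 0], [0, 0, 0]],
    [[255, 0, 0], [0, 0, 0]],
]
# The network only ever sees this stack of numbers.
flat_game = [pixel for frame in frames for row in frame for pixel in row]

def discounted_returns(num_frames, final_reward, gamma=0.99):
    """Spread one end-of-game outcome (+1 win, -1 loss) back over every frame,
    so moves closer to the end receive more of the credit or blame."""
    returns = [0.0] * num_frames
    running = final_reward
    for t in reversed(range(num_frames)):
        returns[t] = running
        running *= gamma
    return returns

print(len(flat_game))  # 18
print([round(r, 3) for r in discounted_returns(len(frames), -1.0, gamma=0.9)])
# [-0.81, -0.9, -1.0]
```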

You know nothing about games; you know nothing about colors or balls or paddles or winning or anything. So why is this amazing? That it works at all, and it does work: it wins. It's amazing because that's exactly what we do as human beings. This is general intelligence.

So I need you to pause and think about this. We'll talk about special intelligence and its usefulness, the cool tricks here and there that can give your high-frequency trading system an edge, but this is general intelligence. General intelligence is the same intelligence we use as babies when we're born.

What we get is sensory input. Right now, most of us are seeing, hearing, and feeling with touch, and that's the only input we get. We know nothing, and with that input we have to learn something. Nobody is pre-teaching us, and this is an example of that: a trivial example, but one of the first where this truly works.

I'm sorry to linger on this, but it's a fundamental fact. That we now have systems that outperform human beings in these simple arcade games is incredible. That's the research side of things, but let me step back. Again, the takeaways: that previous slide is why I think machine learning is limitless in the future.

Currently, it's limited. Again, the representation of the data matters, and if you want to have impact, we can currently only tackle the small problems. What are those problems? Image recognition: given an entire image of a leopard, a boat, or a mite, we can classify what's in that image with pretty good accuracy.

That's image classification. What else? We can find exactly where in the image each individual object is. That's called image segmentation. Again, the process is the same: the learning system in the middle is a neural network, and as long as you give it a set of numbers as input and the correct set of labels as output, it learns to do that for data it hasn't seen in the past.
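Where classification outputs one label per image, segmentation outputs one label per pixel. A minimal sketch, assuming a network has already produced a score per class for every pixel (the scores and class names here are made up):

```python
# Hypothetical per-pixel class scores for a tiny 2x3 image, as a segmentation
# network might output them: scores[row][column] holds one score per class.
classes = ["road", "car", "vegetation"]
scores = [
    [[0.1, 0.2, 0.7], [0.2, 0.7, 0.1], [0.1, 0.8, 0.1]],
    [[0.8, 0.1, 0.1], [0.9, 0.05, 0.05], [0.6, 0.3, 0.1]],
]

def segment(scores):
    """Segmentation: label every pixel with its highest-scoring class."""
    return [[classes[s.index(max(s))] for s in row] for row in scores]

for row in segment(scores):
    print(row)
# ['vegetation', 'car', 'car']
# ['road', 'road', 'road']
```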

Let me pause a second in case you have any questions. Does anyone have questions about the techniques of neural networks? Yes. So that's a great question, and in a couple of slides I'll get to exactly that. I'll elaborate on data representation in a bit, but loosely, the data representation for a neural network is in the weights on each of those arrows that connect the neurons.

That's where the representation is. I'll show an example to really clarify what that means. The Cartesian versus polar coordinates example is just a very simple visualization of the concept. You want to be able to represent the data in an arbitrary way, with no limits on the representation.

It could be highly nonlinear, highly complex. Any other questions? >> Generally speaking, in our current state, when we talk about machine learning or AI, it's simply statistical models that recognize patterns, where they're not necessarily thinking but simply recognizing. So I'm a little confused about how the current system differs from deep learning, and whether you think there's a possibility of transitioning from recognizing to actually thinking.

>> I have a couple of slides almost asking this question, because there are no good answers. But one could argue, and I think somebody in the last class brought this up, that machine learning is just pattern recognition. It's possible that reasoning, thinking, is also just pattern recognition, and I'll describe an intuition behind that.

We tend to hold thinking in high regard because, as human beings, we learned to do it only recently in our evolutionary time, so we think it's somehow different from, for example, perception. We've had visual perception for several orders of magnitude longer in our evolution as a living species. We started to learn to reason, I think, about a hundred thousand years ago.

So we think it's somehow different from the mechanism we use for seeing things. Perhaps it's exactly the same thing. Perception is pattern recognition; perhaps reasoning is just a few more layers of that. That's the hope, but it's an open question. Yes. >> The concept of a neural network itself is not very new.

So is there any technical innovation or breakthrough that expanded the use of neural networks, or is it just the result of increased computational power? >> Yes, that's a great question. There have been very few breakthroughs in neural networks through the AI winters we've discussed, through the spurts of excitement, and even recently there have been very few algorithmic innovations.

The big gains came from compute: improvements in GPUs and better, faster computers. You also shouldn't underestimate the power of community: the ability to share code over the internet, and to communicate and work on code together. And then the digitization of data: the ability to have large datasets that are easily accessible and downloadable.

All of those little things. But I think the future of deep learning and machine learning rides on compute: continued bigger and faster computers. That doesn't necessarily mean Moore's Law and making smaller and smaller chips. It means getting clever in different directions.

Massive parallelization. Coming up with super efficient, power-efficient implementations of neural networks, and so on. So let me just fly through a few examples of what we can do with machine learning, to give you a flavor. In future lectures, different speakers may discuss specific applications and really dig into them.

As opposed to working with just images, you can work with videos and segment those. I mentioned image segmentation; we can also do video segmentation, separating the different parts of a scene that matter to a particular application. Here in driving, you can segment the road from cars, vegetation, and lane markings.

You can also, and this is a subtle but important point... >> Just go back one slide. How do they see the light? It's such a critical piece. The more I listen to you and read your stuff, the more it seems like these very small pieces of information are the ones we know are important.

Like, there's a red light: I have to stop, I have to slow down. How does it filter that out and pick it out? >> It's got to be 100% reliable on that, right? >> Hard question. The question was how you detect traffic lights. First of all, how do we do it as human beings?

Let's start there. We do it through the knowledge we bring to the table. We know what it means to be on the road. There's a huge network of knowledge you come with, and that makes the perception problem much easier. This, on the other hand, is pure perception.

You take an image and separate its different parts based purely on tiny patterns of pixels. First, it finds all the edges. It learns that traffic lights have certain kinds of edges around them. Then it zooms out a little: they have a certain collection of edges that make up this black, rectangle-type shape.

It's all about shapes. It builds up knowledge of the shape structure of things. It's purely a perception problem. One of the things I argue is that if it's purely a perception approach, and you bring no knowledge to the table about the physics of the world, the three-dimensional physics and the temporal dynamics, you're not going to achieve near-100% accuracy on some of these systems.

That's exactly the right question. For all of these things, think about how you as a human being would solve these problems. What is lacking in the machine learning approach? What data is it missing to achieve the same kind of results, the same kind of reasoning you would use as a human?

There's also image detection, and this is a subtle but important point. The stuff I mentioned before, image classification, is: given an image, you say whether this image is of a cat or not. Detection, or localization, is when you actually find where in the image the cat is.
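The difference between the two kinds of answer can be sketched on a hypothetical binary mask marking where a model thinks cat pixels are. Real detectors typically predict boxes directly; deriving the box from a mask here is only a simplification to show what a localization answer looks like next to a yes/no classification answer.

```python
# Hypothetical 5x6 mask: 1 where a model thinks cat pixels are, 0 elsewhere.
mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

def contains_cat(mask):
    """Classification answers yes or no: is there a cat anywhere in the image?"""
    return any(any(row) for row in mask)

def bounding_box(mask):
    """Localization answers where: (top, left, bottom, right) of the cat, or None."""
    cells = [(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not cells:
        return None
    rows = [r for r, _ in cells]
    cols = [c for _, c in cells]
    return min(rows), min(cols), max(rows), max(cols)

print(contains_cat(mask), bounding_box(mask))  # True (1, 1, 3, 3)
```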

That problem is much harder, but it's also doable with machine learning, with deep neural networks. Now, as I said, inputs and outputs can be anything. The input can be video; the output can be video. And you can do anything you want with those videos. You can colorize video: take an old black-and-white film and produce color images.

Again, in terms of having an impact in the world with these applications, you have to ask: this is a cool demonstration, but how well does it actually work in the real world? Translation, whether text to text or image to image: here you can translate "dark chocolate" from one language to another.

This class's title, Global Business of Artificial Intelligence, is the example here, and there's a reference below it, so you can go and generate your own text. You can generate handwriting: type in some text and, given different styles it has learned from other handwriting samples, it can render any text as handwriting.

Again, the input is language, and the output is a sequence of pen movements on the screen. You can also complete sentences. This is a fun one: you generate language where you first feed the system some starting input.

So in black there it says, "Life is," and then the neural network completes the sentences: "Life is about kids." "Life is about the weather." There's a lot of knowledge being conveyed here, I think. You can also start the sentence with "The meaning of life is": "The meaning of life is literary recognition." True for us academics.
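To show the prompt-then-continue mechanics, here is a sketch that swaps the neural network for a much simpler bigram model: it records which word follows which in a tiny hypothetical text, then extends a prompt one sampled word at a time. The training text, the seed, and the function names are all assumptions, and the output will be far cruder than the lecture's examples.

```python
import random
from collections import defaultdict

# A tiny hypothetical training text; real systems learn from vastly more text.
corpus = ("life is short . life is about people . "
          "the weather is nice today . people talk about the weather .").split()

# Record which words follow which (a bigram model, a far simpler stand-in
# for the neural network behind the lecture's examples).
next_words = defaultdict(list)
for current, following in zip(corpus, corpus[1:]):
    next_words[current].append(following)

def complete(prompt, max_words=10, seed=0):
    """Feed in a prompt, then keep sampling a plausible next word until a period."""
    rng = random.Random(seed)
    words = prompt.lower().split()
    for _ in range(max_words):
        options = next_words.get(words[-1])
        if not options:
            break
        words.append(rng.choice(options))
        if words[-1] == ".":
            break
    return " ".join(words)

print(complete("Life is"))
```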

Or, "The meaning of life is the tradition of ancient human production." Also true. But these are all generated by a computer. You can also caption; caption generation has become very popular recently. The input is an image, and the output is a piece of text that captures the content of the image.

You find the different objects in the image. That's a perception problem. Once you find the different objects, you stitch them together into a sentence that makes sense: you generate a bunch of sentences and classify which one is most likely to fit the image. And certainly, I try to avoid mentioning driving too much, because it is my field and it is what I'm excited about.

But the moment I start talking about driving, it'll all be about driving. I should mention, of course, that deep learning is critical to driving applications, both for perception and, what is really exciting to us now, for the end-to-end approach. Whenever you say end-to-end in any application, it means you start from the very raw inputs the system gets and produce the very final output that's expected of the system.

So in the self-driving car case, as opposed to breaking the car's task down into the individual components of perception, localization, mapping, planning, and control, you take the whole stack, ignore all the super complex problems in the middle, take the external scene as input, and produce steering, acceleration, and braking commands as output.

And so in this way, taking as input the image of the external world, in this case from a Tesla, we can generate steering commands for the car. Again, the input is a bunch of numbers, which is just the images. The output is a single number that gives you the steering of the car. Okay, so let's step back for a second and think about what we can't do with machine learning.
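To make "a bunch of numbers in, a single number out" concrete, here is a minimal sketch, assuming a tiny grayscale image and a hypothetical linear model with hand-picked weights. It is not how the Tesla work was done; real end-to-end systems learn a deep convolutional network from recorded driving. The function name `predict_steering` and the sign convention (negative means steer left) are illustrative assumptions.

```python
from typing import List


def predict_steering(image: List[List[float]],
                     weights: List[List[float]],
                     bias: float = 0.0) -> float:
    """Map raw pixel intensities directly to one steering value.

    Hypothetical linear "end-to-end" model: a weighted sum of every pixel
    plus a bias. No separate perception, mapping, or planning stages.
    """
    total = bias
    for row_pixels, row_weights in zip(image, weights):
        for pixel, weight in zip(row_pixels, row_weights):
            total += pixel * weight
    return total


# Hypothetical 2x3 grayscale "camera frame" with intensities in 0..1.
image = [[0.9, 0.5, 0.1],
         [0.8, 0.5, 0.2]]
# Hand-picked weights (assumption): left-side brightness pushes the output
# negative, right-side brightness pushes it positive.
weights = [[-1.0, 0.0, 1.0],
           [-1.0, 0.0, 1.0]]
steering = predict_steering(image, weights)  # about -1.4
```

In a learned system the weights would come from training on pairs of camera frames and the human driver's steering angle, rather than being picked by hand.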

We talked about how you can map numbers to numbers. Let's think about what we can't do. At the core of artificial intelligence, in terms of making an impact on this world, is robotics. So what can't we solve in robotics and artificial intelligence with a machine learning approach? Let's break down what artificial intelligence means.

Here's a stack. At the very top is the environment, the world that you operate in. There are sensors that sense that world. There's feature extraction and learning from that data. There's some reasoning and planning, and there are effectors, the ways you manipulate the world. What can't we learn in this way?

We've had a lot of success, as Jonathan talked about, in the history of AI with formal tasks: playing games, solving puzzles. Recently we're having a lot of breakthroughs in medical diagnosis. We're still struggling with, but very excited about, the more mundane tasks in the robotics space: walking, basic perception, natural language, written and spoken.

And then there are the human tasks, which are perhaps completely out of reach of this pipeline at the moment: cognition, imagination, subjective experience. High-level reasoning, not just common sense, but human-level reasoning. So let's fly through this pipeline. There are sensors: cameras, LIDAR, audio. There's communication, wireless or wired.

There's the IMU, measuring the movement of things. So think about it: that's the way we as human beings, and any kind of system that you design, measure the world. You don't just get an API to the world. You need to somehow measure aspects of this world. That's how you get the data.

That's how you convert the world into data you can play with. Once you have the data, this is the representation side. You have to convert the raw data, raw pixels, raw audio, raw LIDAR data, into data that's useful for the intelligent system, for the learning system to use to discriminate between one thing and another.

For vision, that's finding edges, corners, object parts, and entire objects. Then there's the machine learning that I've talked about: different kinds of mapping from the representation you've learned to actual outputs. And once you have this, and this goes to maybe a little bit of Simon's question, there's reasoning.

This is something that's out of reach of machine learning at the moment. This goes to your question. We can build a world-class machine learning system for taking an image and classifying that it's a duck. I wonder if this will work. Wake you up. This is a well-studied, exceptionally well-studied problem.

We can take an audio sample of a duck and tell that it's a duck, in fact what species of bird. It's incredible how much research there is in bird species classification. And we can look at video and do action recognition: it's swimming. But what we can't do with learning now is reason.

That if it looks like a duck, it swims like a duck, and quacks like a duck, it's very likely to be a duck. This is the reasoning problem. This is the task that I personally am obsessed with and that I hope that machine learning can close. And then there is the planning action and the effectors.

So this is another place where machine learning has not made many strides. There are mechanical issues here that are incredibly difficult. There are the degrees of freedom of all the actuators involved, and just the ability to localize every part of yourself in this dynamic space, where things are constantly changing, where there's uncertainty, where there's noise.

Just that basic problem is exceptionally difficult. So let me just pose this question. We talked about what machine learning can do with the cats and the duck. Given a representation, it can predict what's in the image. And deep learning has been able to do the feature extraction, the representation learning.

This is the big breakthrough that everybody's excited about. But can it also reason? These are the open questions. Can it reason? Can it do the planning and action? And as human beings do, can it close the loop entirely from sensors to effectors? So learn not only the brain, but the way you sense the world and the way you affect the world.

About that Pong game: does the neural network get punished when it doesn't detect the ball and the ball goes off the map? Is that how it learns? So the question was about the Pong game. Thank you. I get to talk about it for a little longer. It doesn't get punished when it doesn't detect the ball.

This is the beautiful thing. It gets punished only at the very end of the game for losing the game and gets rewarded for winning the game. So it knows nothing about that ball and it learns about that ball. That's something you need to really sit and think about. Because as human beings, imagine if you're playing with a physical ball.
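One common way a learner turns a single end-of-game reward into a signal for every earlier move is the discounted return used in policy-gradient methods. The lecture does not say which method the Pong agent used, so treat this as a minimal sketch under that assumption; the function name and the discount factor `gamma` are illustrative choices.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 0.99) -> List[float]:
    """Spread a sparse end-of-game reward back over every earlier step.

    rewards[t] is the reward received after step t. With reward only at
    the end, it is 0 everywhere except the last step (+1 win, -1 loss).
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step accumulates the discounted future reward.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# A hypothetical 4-step episode that ends in a loss.
episode_rewards = [0.0, 0.0, 0.0, -1.0]
credit = discounted_returns(episode_rewards, gamma=0.5)
# credit == [-0.125, -0.25, -0.5, -1.0]
```

Every move in the lost game is nudged to be less likely, with later moves blamed more. Nothing in this signal mentions the ball; any notion of "ball" has to emerge from many such games.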

How do you learn what a ball is? You get hurt by it, you squeeze it, you throw it, you feel the dynamics of it, the physics of it. And nobody tells you about what a ball is. You're just using the raw sensory input. We take it for granted, and maybe this is what I can end on.

This is something Jonathan brought up. We take the simplicity of this task for granted, because we've had eyes: broadly speaking, as living species on planet Earth, eyes have been evolving for 540 million years. So we have 540 million years of data. We've been walking for close to that: bipedal mammals.

We have been thinking only very recently, so 100,000 years versus 100 million years. That's why, with some of these problems we're trying to solve, you can't take for granted how difficult they actually are. For example, this is Moravec's paradox that Jonathan brought up: the easy problems are hard.

The things we think are easy are actually really hard. This is a state-of-the-art robot on the right playing soccer, and that was a state-of-the-art human on the left playing soccer. I'll give it a second. The question was, you know, there's a fundamental difference between the way we train neural networks and the way biological neural networks have been trained through evolution, by natural selection discarding a bunch of the neural networks that didn't work so well.

So first of all, careful here: the role of evolution in the evolution of our cognition, of our intelligence, is, I think, not well understood. So this is an open question. But to clarify this point: artificial neural networks are, for the most part, fixed in size.

This is exactly right. It's like a single human being that gets to learn. We don't have mechanisms of modifying or evolving those neural networks yet. Although you could think of researchers as doing exactly that. You have grad students working on different neural networks, and the ones that don't do a good job don't get promoted and get a good job.

There is a natural selection there, but other than that, it's an open question. It's a fascinating one. So Lex is going to come back. He's not available next week, but he's going to come back the week after. So we can pick up many of these points here. Are there any last final takeaways you want to emphasize?

Stay tuned and keep your head up, because the future, I believe, is really promising. And the slides will be made available for sure. We're going to take a five-minute break.

I think a lot of the exploration of what it means to build an intelligent machine has been in sci-fi movies.

We're now beginning to actually make it a reality. This is 2001: A Space Odyssey, to keep with the theme of the previous lecture. This is as opposed to the dreamlike monolith view, where the astronaut is gazing out into the open sky at the stars. We're going to look at the practice of AI today.

If you're familiar with the movie: how we go from the moment this new technology appeared before our eyes, full of excitement, to actual practical impact on our lives. To quickly review what we talked about last time, I presented the technology and asked whether it merely serves a special purpose, answering specific tasks that can be formalized, or whether, by transferring the knowledge learned in one domain, an intelligent system trained in a small domain can generalize to general intelligent tasks like the ones we do as human beings.

This is the stack of artificial intelligence, going from the very top, the environment, the world, down through the sensors and the data to the intelligent system and the way it perceives this world. Once you've converted the world into numbers, you're able to extract some representation of it, and this is where machine learning starts to come into play.

And then there's the part I will raise again today: can machine learning also do the following steps that we do very well as human beings? The reasoning step. You can tell the difference between a cat and a dog, but can you now start to reason about what it means to be alive, what it means to be a cat, a living creature, what it means to be this or that kind of physical object, and take what's called common sense, the things we take for granted, and start to construct models of the world through reasoning?

Descartes: "I think, therefore I am." We want our neural networks to come up with that on their own. And once you do that, there's action: you go right back into the world and start acting in it. So the question is: can this be learned from data by machine learning, or do experts need to encode the knowledge of reasoning, the knowledge of actions, the set of actions?

That's kind of the open question I raised. It continues throughout the talk today. And so as we start to think about how artificial intelligence, especially machine learning, as it realizes itself through robotics, gets to impact the world, we start thinking about what are the easy problems and what are the hard problems.

And it seems to us that vision and movement, walking, are easy because we've been doing them for millions of years, hundreds of millions of years, and that thinking is hard, reasoning is hard. I propose to you that it's perhaps because we've only been doing it for a short time, and so we think we're quite special because we're able to think.

So we have to question what is easy and what is hard, because when we start to develop some of these systems, we start to realize that all of these problems are equally hard. Take the problem of walking that we take for granted: the actuation, the ability to recognize where you are in physical space, to sense the world around you, to deal with the uncertainty of the perception problem.

All of these robots, by the way, are from the most recent DARPA Robotics Challenge, which MIT was also part of. So what are these robots doing? They only have sparse communication with human beings on the periphery, so most of the stuff they have to do autonomously, like getting inside a car.

This is an MIT robot, unfortunately. They have to get in the car and, the hardest task, they have to get out of the car. That's walking. So this raises the very real aspect here: you want to build applications that actually work in the real world.

And that's the first challenge and opportunity here. Many of the technologies we talked about currently crumble under the reality of our world when we transfer them from a small data set in the lab to the real world. Computer vision is perhaps one of the best illustrations of this.

Computer vision is the task, as we talked about, of interpreting images. There have been a lot of great accomplishments in interpreting images, cats versus dogs. Now, when you try to create a system like the Tesla vehicle that we work with and that I always talk about, it's a vision-based robot, right?

It has radar for basic obstacle avoidance, but most of its understanding of the world comes from a single monocular camera. Now they've expanded the number of cameras, but for most of that time there have been 100,000 vehicles driving on the roads with, essentially, a single webcam. So when you start to do that, you have to perform all of this extraction of texture, color, optical flow.

That's the movement through time, the temporal dynamics of the images. You have to construct these patterns, construct the understanding of objects and entities and how they interact. And from that, you have to act in this world. And that's all based on this computer vision system. So it's no longer cats versus dogs.

It's detection of pedestrians, where the wrong classification, the wrong detection, is the difference between life and death. So let's look at cats, where things are a little more comfortable. I would like to illustrate to you why computer vision is such a hard task. As we talked about, we've been doing it for 500 million years, so we think it's easy.

Computer vision is actually incredible. All you're getting with your human eyes is essentially pixels in. There's light coming into your eyes, and all you're getting is the light reflected from the different surfaces in here. And there are sensors inside your eyes converting that into numbers.

It's really very similar to this: numbers. In the case of what we use with computers, RGB images, each pixel's red, green, and blue values are numbers from 0 to 255, so 256 possible values, and there's just a bunch of them. That's all we get: a collection of numbers that are spatially connected.

The ones that are close together are part of the same object, so cat pixels are all connected together. That's the only thing we have to help us, but the rest of it is just numbers, intensity numbers. And we have to use those numbers to classify what's in the image.
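To see just how little a vision system is handed, here is a minimal sketch of an RGB image as nothing but integers from 0 to 255. The 2x2 image and its colors are made up for illustration, and `flatten_rgb` is a hypothetical helper, not part of any library.

```python
from typing import List, Tuple

# One pixel = (red, green, blue); each channel is an integer from 0 to 255.
Pixel = Tuple[int, int, int]


def flatten_rgb(image: List[List[Pixel]]) -> List[int]:
    """Return the raw intensity numbers of an RGB image, row by row.

    Flattening keeps every value but drops the explicit 2-D neighborhood
    structure, which is the one extra clue mentioned above: nearby pixels
    tend to belong to the same object.
    """
    numbers: List[int] = []
    for row in image:
        for pixel in row:
            for value in pixel:
                if not 0 <= value <= 255:
                    raise ValueError(f"channel value out of range: {value}")
                numbers.append(value)
    return numbers


# A hypothetical 2x2 image: two orange-ish "fur" pixels on top, two gray below.
tiny_image = [
    [(200, 120, 40), (198, 118, 42)],
    [(90, 90, 90), (88, 91, 89)],
]
numbers = flatten_rgb(tiny_image)          # 12 numbers: 4 pixels x 3 channels
values_per_channel = len(range(256))       # 256 possible values per channel
colors_per_pixel = values_per_channel ** 3 # 16,777,216 possible colors
```

A modest 224x224 camera frame is already 150,528 such numbers, and the task is to say "cat" or "pedestrian" from them alone.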

And if you really think about it, this is a really difficult task. All you get is these numbers. How the heck are you supposed to form a model of the world with which you can detect pedestrians with 99.99999% accuracy? Think of the pedestrians, or the cars and cyclists in the driving context, or any kind of application you're looking at. Even if your job on the factory floor is to detect the defective gummy bears flying past at 100 miles an hour, your task is to not let that bad gummy bear get by, or your product and your brand will be damaged.

Whether your application is serious or not, you have to have a computer vision system that deals with all of these aspects. Viewpoint variation and scale variation: no matter the size of the object, it's still the same object; no matter the viewpoint from which you look at it, it's still the same object.

Then there's lighting. We have consistent lighting here because we're indoors, but when you're outdoors, or you're moving, or the scene is moving, the complexity of the lighting variation is incredible, from the illumination to just the movement of the different objects in the scene. Now that we've had these conversations, I think about this every time I drive.

I think about you and this point, and how hard it is to see these things. Particularly when I'm driving at night, and particularly at twilight when the light is changing, I think almost every time I drive there are one or two things I see where I'm drawing on 200 million years of evolution in order to figure them out.

It's a guy who's opened his car door and I can't see him, but I can just see the light doesn't look quite right on that side of the road. And somehow I know in my mind it's a person. But it seems like an almost impossible problem for the machines to get right with sufficient accuracy.

I will argue that the pure perception task is too hard. That you come to the table as human beings with all this huge amount of knowledge. That you're not actually interpreting all the complex lighting variations that you're seeing. You actually know enough about the world, enough about your commute home, enough about the kinds of things you would see in this world, about Boston, about the way pedestrians move, the certain light of day.

You bring all that to the table, and that makes the perception task doable. That's one of the big missing pieces in the technology. As I'll talk about, that's the open problem of machine learning: how to first build all that knowledge, and then bring it to the table.

As opposed to starting from scratch every time. And so, cats. I promised cats. Okay, so to me, and for most of the computer vision community, occlusion is one of the biggest challenges, and it really highlights how far we are from being able to reason about this world. An occlusion is when the object you're trying to detect or classify is partially blocked by another object in front of it.

This is something you perhaps think is trivial; you don't even really think about it, because we reason in a three-dimensional way. But occlusion makes perception incredibly difficult. So think about this: this image is converted into numbers, and the task is detecting whether there is a cat in this image, yes or no.

You have to be able to reason about this image with that object in the scene. Most of us are able to very easily detect that there's a cat in this image. Now think about this: there's a single eye and there's an ear.

So think about what part of our brain allows us to understand, to suppose with some high degree of accuracy, that there's a cat in this picture. The degree of occlusion here is immense. And as promised, this one is for most of you.

Some of you will think this is in fact a monkey eating a banana, but I would venture to say that most of us are able to tell it's nevertheless a cat. You'll watch this for hours. Now let me give you another example. This is an often-cited paper, or set of papers, that illustrates how difficult computer vision is and how thin the line is that we're walking with all of the impressive results the machine learning community has been able to show recently.

In this case, the "Deep Neural Networks Are Easily Fooled" paper, a seminal paper at this point, shows that when you take a network trained on ImageNet, so basically trained to detect cats versus dogs or other categories in images, you can find an arbitrary number of images that look like noise, in the top row, where the algorithm confidently says, with 99.6% confidence or above, that it's seeing a robin or a cheetah or an armadillo or a panda in that noise.

So it's confidently saying, given this noise, that that's obviously a robin. So you have to realize that the process it's using to understand what's in the image is purely a collection of patterns it has been able to extract from other images that have been annotated by humans.

And that perhaps is very limiting when trying to create a system that's able to operate in the real world. This is a very clean illustration of that concept. In the same way, it makes confident predictions on the images below, which are not even noise but strong patterns that have nothing to do with the entities being detected.

Again, that same algorithm confidently sees a penguin, a starfish, a baseball, and a guitar in those patterns. And more serious for people designing robots like myself, on the sensor side, you can flip that around: I can take an image and distort it with a very small amount of noise.

And if that noise is applied to the image, I can completely change the confident prediction about what's in it. To explain what's being shown: in the column on the left, the same kind of neural network is able to predict, accurately and confidently, that there is a dog in the image.

But if we apply just a little bit of noise to that image, noise imperceptible to our human eyes, so you can't tell the difference between the two, the same algorithm confidently says that there is an ostrich in that image. That's another thing to really think about: how noise can have such a significant impact on the predictions of these algorithms.
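Why can such tiny noise flip a confident prediction? A minimal sketch, assuming a hypothetical linear classifier rather than a deep network, shows the arithmetic: nudging every input by a small `epsilon` against the sign of its weight (the idea behind the fast gradient sign method) shifts the score by `epsilon` times the sum of the absolute weights, so small per-pixel changes add up across many pixels. The labels "dog" and "ostrich" simply echo the example above; the weights and input are made up.

```python
from typing import List


def score(weights: List[float], x: List[float]) -> float:
    """Hypothetical linear classifier: positive means "dog", negative "ostrich"."""
    return sum(w * xi for w, xi in zip(weights, x))


def sign(v: float) -> int:
    return (v > 0) - (v < 0)


def adversarial_perturb(weights: List[float], x: List[float],
                        epsilon: float) -> List[float]:
    """Fast-gradient-sign-style step for a linear model.

    The gradient of the score with respect to the input is just `weights`,
    so moving each input by epsilon against its weight's sign lowers the
    score by epsilon * sum(|w|).
    """
    return [xi - epsilon * sign(w) for w, xi in zip(weights, x)]


# Hypothetical 1000-"pixel" input on a 0..1 scale and hand-picked weights.
n = 1000
weights = [0.01 if i % 2 == 0 else -0.01 for i in range(n)]
image = [0.51 if i % 2 == 0 else 0.49 for i in range(n)]
adversarial = adversarial_perturb(weights, image, epsilon=0.015)

original_label = "dog" if score(weights, image) > 0 else "ostrich"
attacked_label = "dog" if score(weights, adversarial) > 0 else "ostrich"
largest_change = max(abs(a - b) for a, b in zip(adversarial, image))
```

Here no input moves by more than 0.015 (about 4 out of 255), yet the score drops from about +0.1 to about -0.05 and the label flips. Real attacks on deep networks use the network's actual gradient, but the accumulation effect is the same.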

Quite honestly, out of all the things I'll say today, and that I'm aware of, one of the biggest challenges of applying machine learning in the real world is robustness. How much noise can you add to the system before everything falls apart? So how do you validate sensors?

So say a car company has to produce a vehicle and it has sensors in that vehicle. How do you know that those sensors will not start generating slight noise due to interference of various kinds? And because of that noise, instead of seeing a pedestrian, it will see nothing or the opposite.

It will see pedestrians everywhere. Of course, the most dangerous case, for cars, is when it doesn't see an object and collides with it. There's also spoofing, which, as always with security, a lot of people are really concerned about, and perhaps people here are too.

I think this is a really important issue. Because you can apply noise and convince the system that it's seeing an ostrich when there's in fact no ostrich, you can do the same thing as an attack. You can attack the sensors of a car, spoofing LIDAR, radar, or ultrasonic sensors, to make it believe it's seeing pedestrians when they're not there, or the opposite: hide pedestrians, making them invisible to the sensor when they're in fact there.

So whenever you have intelligent systems operating in this world, they become susceptible to this, because so much of the work is done in software and based on sensors. At any point in the chain, if there's a failure, you have to be able to detect that failure.

And right now we have no mechanisms for automatically detecting that failure. On the data side, one challenge we're constantly dealing with is that the machine learning algorithms we're using need labeled data, and we have very little labeled data. Labeled data, again, is pairs of input data and the ground truth: the true label, the annotation, the class or concept that the input belongs to.

And it doesn't have to be an image; it could be any source of data. Labeling is a really costly process. Every breakthrough we've had so far relies on labeled data, and because of its cost, we don't have much of it.
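As a concrete picture of "pairs of input data and ground truth," here is a minimal sketch using a 1-nearest-neighbor classifier, chosen because it is the most literal form of learning by memorizing labeled pairs; it is not the deep-network method the lecture discusses. The two-number features and their meanings are hypothetical.

```python
from math import dist
from typing import List, Tuple

# A labeled example is a pair: (input features, ground-truth label from a human).
LabeledExample = Tuple[List[float], str]


def nearest_neighbor_label(query: List[float],
                           labeled: List[LabeledExample]) -> str:
    """Return the label of the closest labeled example (1-nearest-neighbor).

    This learner only memorizes the annotated pairs: it can never output a
    label that no human provided, and it needs examples near every query.
    """
    _, label = min(labeled, key=lambda example: dist(example[0], query))
    return label


# Hypothetical features, e.g. (ear pointiness, snout length), scaled to 0..1.
training_pairs: List[LabeledExample] = [
    ([0.9, 0.2], "cat"),
    ([0.8, 0.3], "cat"),
    ([0.3, 0.9], "dog"),
    ([0.2, 0.8], "dog"),
]
prediction = nearest_neighbor_label([0.85, 0.25], training_pairs)  # "cat"
```

Every one of those pairs had to be labeled by someone, which is exactly the cost described above; with four examples the model knows two concepts and nothing else.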

So the problems that come from data can either be solved by having a lot more of this data, which I believe, and most people believe, is too challenging, because it's too costly to have human beings annotate huge amounts of data. Or we have to develop algorithms that are able to do something with unlabeled data.

That's unsupervised, semi-supervised, sparsely supervised, and reinforcement learning. As we talked about last time, and I'll mention it again here, one way you understand something about data when you don't have labels is to reason about it. All you're given is a few facts. When you're a baby, your parents give you a few facts, and you go into this world with those facts, and you grow your knowledge graph, your knowledge base, your understanding of the world from those few facts.

We don't have a good method of doing that in an automated, unrestricted way. Our learners are inefficient: the machine learning algorithms I've talked about, neural networks, need a lot of examples of every single concept in order to learn anything about it. Thousands, tens of thousands of cats are needed to learn the spatial patterns at every level, the visual representation of a cat.

We can't do anything with a single example. There are a few approaches, but nothing quite robust yet. And we haven't come up with a way, though this is also possible, to make annotation, this labeling process, very cheap. One direction is leveraging what's been called human computation, though that term has fallen out of favor a little bit.

One of my big passions is human computation: using something about our behavior, something about what we do in this world, online or in the real world, to annotate data automatically. For example, everybody has to drive, and as you drive, we can collect data about your driving in order to train self-driving vehicles to drive.

That's free annotation. So here are the annotated data sets we have, the supervised learning data sets. There are many, but these are some of the more famous ones, from toy data sets like MNIST to large data sets of broad, arbitrary categories of images, which is what ImageNet is.

And there are data sets in health care, in audio, in video; there's a huge number of data sets now, but each one is usually on the scale of hundreds of thousands, millions, or tens of millions of examples, not billions or trillions, which is what we need to create systems that operate in the real world.

And again, these are the kinds of machine learning algorithms we have. There are five listed here. The teachers on the left are the input the system requires in order to be trained. Supervised learning, at the very top, is where we have all of our successes, and everything else is where the promise lies.

The semi-supervised, the reinforcement, or the fully unsupervised learning, where the input from the human is very minimal. Another way to think about this: whenever somebody talks about machine learning today, what they're talking about is systems that memorize patterns. This is one of the big criticisms of current machine learning approaches: all they're doing is memorizing, so they're only as good as the human-annotated data they're provided.

We don't have mechanisms for actually understanding. You can pause and think about this. In order to create an intelligent system, it shouldn't just memorize. It should understand the representations inside that data in order to operate in that world. And that's the open question, one of them. And one of the challenges and opportunities for machine learning researchers today is to extend machine learning from memorization to understanding.

This is that duck, the reasoning. If you get information from the perception systems that it looks like a duck, from the audio processing that it quacks like a duck, and then from video classification, the activity recognition that it swims like a duck, the reasoning step is how to connect those facts to then say that it is in fact a duck.

Okay, so that's the algorithm side and the data side. Now, computational power, computational hardware, is at the core of the success of machine learning. Our algorithms have been the same since the '60s, since the '80s or '90s, depending on how you're counting.

The big breakthroughs came in compute. There's Moore's Law. Most of you know that a single CPU, for the most part, executes a single action at a time, in sequence. So it's sequential, very different from our brain, which is a massively parallel system.

Because it's sequential, the clock speed matters, because that's essentially how fast those instructions can be executed. And we're leveling off: physics is stopping us from continuing Moore's Law. Intel and AMD are aggressively pushing Moore's Law forward, and there's some promise that it will actually continue for another 10 or 15 years.

Then there's another form of parallelism, massive parallelism: the GPU. This is essential for neural networks. It's essential to the recent success of neural networks: the ability to utilize the inherently parallel architectures of graphics processing units, GPUs, the same things used for video games. GPUs are the reason NVIDIA stock is doing extremely well.

So it's the parallelism of basic computational processes that makes machine learning work on a GPU. One of the limitations of GPUs, one of the challenges in scaling them and bringing them into real-world applications, is power consumption. So there are a lot of specialized chips, specialized just for neural network architectures, coming out: from Google with their Tensor Processing Unit, from IBM, Intel, and so on.

It's unclear how far this goes. This is the direction of trying to design an electronic brain that has the brain's efficiency. Our human brain is exceptionally efficient at running the neural networks in our heads, orders of magnitude more efficient than our computers are. And this is trying to design systems that are able to move toward that efficiency.

Why do you care about efficiency? For several reasons. One, of course, as I'm sure we'll talk about throughout this class, is battery usage in things like our smartphones. And then there's the big one: community. I think the big breakthroughs in machine learning recently could be attributed to it.

In the last decade, compute is important and algorithm development is important, but it's the global community of nerds. This is global artificial intelligence, and I will show in several ways why global is essential here. It's tens of thousands, hundreds of thousands, millions of programmers and mechanical engineers building robots, building intelligent systems, building machine learning algorithms.

The exciting growth of the community is perhaps the key to unlocking the power of machine learning in the future. So this is just one example. GitHub is a repository for code. This chart starts at the bottom left in 2008, when GitHub first opened.

And it goes up to 2012: quick, near-exponential growth in the number of users participating and the number of repositories. These repositories are standalone, unique projects hosted on GitHub. That's one example. Now I'll show you a competition that we've been running recently, and I'll challenge people here to participate in it, if you dare.

This is a chance for you to build a neural network in your browser. So you can do this on your phone later tonight, of course. You can specify various parameters of the neural network: the number of layers and the depth of the network, the number of neurons, the type of layers.

It's pretty self-explanatory. It's super easy to tweak little things. And remember, machine learning is to a large degree an art at this point, perhaps more an art than a well-understood, theoretically bounded science, which is one of the challenges. But it's also an opportunity.

DeepTraffic is a chance. We've all been stuck in traffic. There you go: Americans spend eight billion hours stuck in traffic every year. That's our pitch for this competition: deep neural networks can help. You have a neural network that drives that little red car with an MIT logo on this highway and tries to weave in and out of traffic to get to its destination.

It tries to achieve a speed of 80 miles an hour, which is the physical speed limit of the car. Of course, the actual speed limit of the road is 65 miles an hour, but we don't care about that. We just want to get to work, or home, as quickly as possible.

I want to explain the basic structure of this game a little bit, and then tell you how incredibly popular it's gotten and how incredibly powerful the networks are that people from all over the world have built. The community that has grown around it over a single month is incredible.

And this happens for thousands of projects out there. Now, another challenging opportunity. OK, so you may have seen this. This is a kind of ethics. Most engineers, and I personally, don't like it. I love the philosophy, but this construction of ethics that's often presented is usually not a concern for engineering.

So what is this question? When you have a car and a bunch of pedestrians, do you hit the larger group of pedestrians or the smaller group? Do you avoid the pedestrians but put yourself in danger? These are the kinds of ethical questions posed to an intelligent system.

It's a very interesting question. It's one that we can debate, and quite honestly there's no good answer. It's a problem that both humans and machines struggle with, and so it's not interesting on the engineering side. We're interested in problems that we can solve on the engineering side.

The kind of problem I'm obsessed with and very interested in is the real-world problem of controlling a vehicle through this space. It happens in a few seconds here. This is an intersection in Manhattan, New York, right? These are pedestrians walking perfectly legally. I think they have a green light.

Of course, there's a lot of jaywalking as well. Well, this car isn't really part of the point. But yes, exactly, there's an ambulance. And there's another car that starts making a left turn in a little bit. I missed it. Hopefully not. So, yeah.

And then there's another car after that, too, which illustrates what happens when you design an algorithm that's supposed to move through this space. Watch this car: the aggression it shows. Now, this isn't a toy example for those who try to build robots. The real question is: how do you design a system that's able to do this?

You have to define reward functions, objective functions, utility functions under which the system performs its planning. A car like that has several thousand candidate trajectories it can take through that intersection. It can take a trajectory where it speeds up to 60 miles an hour, doesn't stop, and just swerves and hits everything. OK, that's a bad trajectory, right?

Then there is the trajectory that most companies take, including Google's self-driving car and every company that is concerned about PR: whenever there's any kind of obstacle, any reasonable risk that you might even touch an obstacle, you're not going to take that trajectory. What that means is you're going to navigate through this intersection at 10 miles an hour and let people abuse you by walking in front of you, because they know you're going to stop. And in the middle there are hundreds, thousands of trajectories that are ethically questionable, in the sense that you're putting other human beings at risk in order to safely and successfully navigate through an intersection.
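The spectrum of trajectories described above can be sketched as a weighted objective function. This is a hypothetical illustration, not how Google or any other company actually plans: the `Trajectory` fields, the four candidates, the 50-meter intersection, and the cost formula (travel time plus a weighted risk term that grows as the car gets closer to people) are all assumptions. A hard constraint rules out any trajectory that touches a pedestrian, and the single `risk_weight` knob slides the choice between aggressive and timid behavior.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    name: str
    speed_mph: float  # average speed through the intersection
    min_gap_m: float  # closest approach to any pedestrian, in meters (0 = contact)


def cost(t, risk_weight, distance_m=50.0):
    # Time penalty: seconds needed to cross a hypothetical 50 m intersection.
    travel_time_s = distance_m / (t.speed_mph * 0.44704)  # mph -> m/s
    # Risk penalty: grows sharply as the car gets closer to people.
    risk = 1.0 / max(t.min_gap_m, 0.01)
    return travel_time_s + risk_weight * risk


def choose(candidates, risk_weight):
    # Hard constraint: never accept a trajectory that touches a pedestrian.
    safe = [t for t in candidates if t.min_gap_m > 0.0]
    # Among the rest, pick the one with the lowest weighted cost.
    return min(safe, key=lambda t: cost(t, risk_weight))


# Hypothetical candidates, loosely following the examples in the lecture.
CANDIDATES = [
    Trajectory("reckless", 60.0, 0.0),   # speeds up, swerves, hits everything
    Trajectory("timid", 10.0, 5.0),      # stops for every possible risk
    Trajectory("nudge", 20.0, 1.0),      # gently nudges through the crowd
    Trajectory("assertive", 30.0, 0.3),  # passes close to pedestrians
]

if __name__ == "__main__":
    for weight in (0.0, 1.0, 100.0):
        print(weight, choose(CANDIDATES, weight).name)
```

With these made-up numbers, a zero risk weight picks the assertive path, a weight of 1 picks the nudging path, and a large weight picks the timid 10-mile-an-hour path. The hard question is choosing that weight, not writing the code.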

And the design of those objective functions is the kind of question you have to ask for intelligent systems, for cars. It's not "there's a grandma and a few children, and you have to choose who gets to die." Those are very, very difficult problems, of course. But the problem I'm very interested in is the streets of Boston, the streets of New York: how to gently nudge yourself through a crowd of pedestrians, the way we all actually do when we drive in New York, in order to safely navigate these environments.

And these questions come up in health care. They come up in factories, in robotic arms and humanoid robots that operate with other human beings. That's one of the big challenges. Another fun illustration that folks at OpenAI often use, well, let me just pause for a second, is the gamified version of this.

There's a game called CoastRunners where you're racing against other boats along this track. There's your score here at the bottom left, the number of laps, your time, and your job is to get to the destination as quickly as possible while also collecting funky little things along the way, like these little green things.

Okay, so what they've done is build an intelligent system, the general-purpose kind we talked about last time, that learns how to navigate successfully through the space by trying to maximize the reward. And what this boat learns to do, instead of finishing the race, is find a loop where it can keep going around and around collecting those green dots, because it learns that they regenerate with time.

So it learns to maximize the score by going around and around. This is the big challenge of reward functions, of designing systems, of specifying what you want your system to achieve. Not only are the ethical questions difficult, but so is just avoiding the pitfalls of local optima: finding something really good in the short term, the greedy option. It's like those psychology experiments where the kid eats the marshmallow and can't delay gratification.
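Both failure modes can be sketched with the discounted return used in reinforcement learning, G = r0 + γ·r1 + γ²·r2 + ..., where the discount factor γ (gamma) controls how much later rewards count. The lecture doesn't mention γ; it is the standard device, used here as an assumption. The reward streams are invented: one pays a single bonus for finishing the race, one pays for a regenerating target every other step, and the marshmallow test compares one treat now against two after waiting.

```python
def discounted_return(rewards, gamma):
    # G = r_0 + gamma*r_1 + gamma^2*r_2 + ...; gamma < 1 makes later rewards count less.
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def prefers(a, b, gamma):
    # Return "a" or "b", whichever reward stream has the higher discounted return.
    return "a" if discounted_return(a, gamma) > discounted_return(b, gamma) else "b"


# Hypothetical reward streams, loosely modeled on the boat-race example.
finish_race = [0] * 9 + [100]  # nothing until a big bonus for finishing at step 10
loop_targets = [10, 0] * 50    # a regenerating target every other step, for 100 steps

# Marshmallow test: one treat now, or two if you wait four steps.
eat_now = [1]
wait = [0, 0, 0, 0, 2]

if __name__ == "__main__":
    # Reward hacking: by the score alone, circling forever beats finishing.
    print(prefers(loop_targets, finish_race, gamma=1.0))
    # An impatient agent (small gamma) grabs the immediate treat.
    print(prefers(eat_now, wait, gamma=0.5))
    # A patient agent (gamma close to 1) waits for the bigger reward.
    print(prefers(eat_now, wait, gamma=0.99))
```

The looping boat isn't malfunctioning; it is maximizing exactly the score it was given. The fix has to come from the reward design, not the optimizer.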

The idea of delayed gratification, in the case of designing intelligent systems, is a huge and serious problem, and this is a good illustration of that. So, we flew through a few concepts here. Are there any questions about the compute and algorithm side we talked about today?

So the question was: you highlighted some of the limitations of computer vision and machine learning algorithms, but you haven't highlighted the limitations of human beings. If you put those in two columns and compare them, are machines doing better overall? Or is there any way to compare them?

There is actually interesting work on this with ImageNet. ImageNet is a categorization task where you have to classify images. You can ask the question: when I present you with images of cats and dogs, when are machines better than humans and when are they not? So you can compare where machines do better, what the failure points are for machines, and what they are for humans.

And there are a lot of interesting visual perception questions there. Overall, it's certainly true that machines fail differently than human beings. But to make an artificial intelligence system that's usable, that could make you a lot of money, and that people would want to use, it has to be better at that particular task in every single way.

For you to want to use the system, it has to be superior to human performance, and usually far superior. So on the philosophical level, it's interesting to compare what we're good at and what we're not. But if you're using Amazon Echo, voice recognition, any kind of natural language system, chatbots, or a car, you're not going to say, "Well, this car is not so good with pedestrians, but I appreciate the fact that it can stay in the lane."

Fortunately, you have a very high standard for every single thing that you're good at, and the system has to be superior to that. I think maybe that's unfair to the robots, but I'm more of the nerd that makes the technology happen. Certainly, on the self-driving car side, policy is probably the biggest challenge.

And I don't think there are good answers there. Some of those ethical questions come up. We work a lot with Tesla, so I'm driving a Tesla around every day, playing around with it and studying human behavior inside Teslas. And it seems like there's so much hunger amongst the media to jump on something.

It feels like we're all walking on very shaky PR terrain, very shaky policy terrain, because we have no idea how to coexist with intelligent systems. And then, of course, government is nervous, because how do we regulate this shaky terrain? Everybody's nervous, and excited. So I'm not sure. There's no great answer.

That's a perfect transition point, if that's okay. We'll put the same kind of question to Jason in a moment. Thanks a lot, Lex, for another great session.