Deep Learning State of the Art (2019)

00:00:00.000 | The thing I would very much like to talk about today is

00:00:02.860 | the state of the art in deep learning.

00:00:05.600 | Here we stand in 2019

00:00:08.400 | really at the height of some of the great accomplishments that have happened

00:00:12.900 | but also stand at the beginning.

00:00:14.900 | And it's up to us to define where

00:00:17.500 | this incredible data-driven technology takes us.

00:00:20.500 | And so I'd like to talk a little bit about

00:00:23.200 | the breakthroughs that happened in 2017 and 2018 that take us to this point.

00:00:29.600 | So this lecture is not on

00:00:34.600 | the state of the art results on main machine learning benchmarks.

00:00:38.600 | So the various image classification, object detection,

00:00:43.100 | or the NLP benchmarks, or the GAN benchmarks.

00:00:47.400 | This isn't about the cutting edge algorithm

00:00:51.100 | that's available on GitHub that performs best on a particular benchmark.

00:00:56.800 | This is about ideas.

00:00:58.800 | Ideas and developments that are at the cutting edge

00:01:02.800 | of what defines this exciting field of deep learning.

00:01:06.100 | And so I'd like to go through a bunch of different areas

00:01:09.600 | that I think are really exciting.

00:01:11.200 | Now of course this is also not a lecture that's complete.

00:01:15.200 | There's other things that I may be totally missing

00:01:17.700 | that happened in 2017 and 2018 that are

00:01:20.200 | particularly exciting to people here, people beyond.

00:01:24.200 | For example, medical applications of deep learning

00:01:27.400 | is something I totally don't touch on.

00:01:30.400 | And protein folding and all kinds of applications

00:01:34.200 | that there has been some exciting developments

00:01:36.400 | from DeepMind and so on that don't touch on.

00:01:39.400 | So forgive me if your favorite developments are missing,

00:01:43.200 | but hopefully this encompasses some of the really

00:01:46.600 | fundamental things that have happened,

00:01:48.600 | both on the theory side, on the application side,

00:01:51.600 | and on the community side of all of us being able to work together

00:01:54.600 | on these kinds of technologies.

00:01:57.600 | I think 2018, in terms of deep learning,

00:02:01.200 | is the year of natural language processing.

00:02:03.600 | Many have described this year as the ImageNet moment

00:02:08.400 | in 2012 for computer vision when AlexNet

00:02:12.400 | was the first neural network that really gave that big jump

00:02:16.200 | in performance on computer vision.

00:02:17.800 | It started to inspire people what's possible

00:02:20.400 | with deep learning, with purely learning-based methods.

00:02:23.200 | In the same way, there's been a series of developments

00:02:26.600 | from 2016, '17, and led up to '18,

00:02:31.400 | with the development of BERT that has made on benchmarks

00:02:39.200 | and in our ability to apply NLP to solve various NLP tasks,

00:02:44.200 | natural language processing tasks, a total leap.

00:02:48.000 | So let's tell the story of what takes us there.

00:02:50.400 | There's a few developments.

00:02:52.200 | I've mentioned a little bit on Monday

00:02:54.600 | about the encoder-decoder recurrent neural networks.

00:02:59.000 | So this idea of recurrent neural networks

00:03:04.000 | encode sequences of data and output something.

00:03:09.000 | Output either a single prediction or another sequence.

00:03:13.000 | When the input sequence and the output sequence

00:03:16.000 | are not necessarily the same size,

00:03:19.200 | they're like in machine translation.

00:03:21.200 | We have to translate from one language to another.

00:03:24.600 | The encoder-decoder architecture

00:03:29.000 | takes the following process.

00:03:31.000 | It takes in the sequence of words

00:03:33.400 | or the sequence of samples as the input

00:03:36.600 | and uses the recurrent units,

00:03:38.800 | whether it's LSTM or GRUs or beyond,

00:03:42.400 | and encodes that sentence into a single vector.

00:03:47.200 | So forms an embedding of that sentence

00:03:50.600 | of what it represents, a representation of that sentence.

00:03:54.600 | And then feeds that representation

00:03:58.600 | in the decoder recurrent neural network

00:04:01.600 | that then generates the sequence of words

00:04:07.600 | that form the sentence in the language

00:04:10.800 | that's being translated to.

00:04:13.000 | So first you encode by taking a sequence

00:04:16.000 | and mapping it to a fixed size vector representation

00:04:20.400 | and then you decode by taking that fixed size vector representation

00:04:23.600 | and unrolling it into the sentence

00:04:26.400 | that can be of different length than the input sentence.

00:04:29.000 | Okay, that's the encoder-decoder structure

00:04:31.400 | for recurrent neural networks.

00:04:32.800 | It's been very effective for machine translation

00:04:36.400 | and dealing with arbitrary length input sequences,

00:04:40.000 | arbitrary length output sequences.

00:04:43.000 | Next step, attention.

00:04:46.200 | What is attention?

00:04:47.200 | Well, it's the next step beyond.

00:04:49.600 | It's an improvement on the encoder-decoder architecture.

00:04:56.400 | It provides a mechanism

00:05:00.200 | that allows to look back at the input sequence.

00:05:02.400 | So as opposed to saying

00:05:04.600 | that you have a sequence that's the input sentence

00:05:08.000 | and that all gets collapsed

00:05:09.600 | into a single vector representation,

00:05:11.600 | you're allowed to look back at the particular samples

00:05:14.400 | from the input sequence

00:05:17.000 | as part of the decoding process.

00:05:19.000 | That's attention.

00:05:20.400 | And you can also learn

00:05:23.000 | which aspects are important

00:05:26.000 | for which aspects of the decoding process,

00:05:28.600 | which aspects of the input sequence

00:05:30.600 | are important to the output sequence.

00:05:35.000 | Visualize another way.

00:05:37.600 | And there's a few visualizations here

00:05:40.600 | that are quite incredible

00:05:42.000 | that are done by Jay Alomar.

00:05:45.600 | I highly recommend you follow the links

00:05:49.800 | and look at the further details

00:05:52.200 | of these visualizations of attention.

00:05:54.600 | So if we look at neural machine translation,

00:05:57.200 | the encoder RNN takes a sequence of words

00:06:00.200 | and throughout, after every sequence,

00:06:03.200 | forms a set of hidden representations,

00:06:07.200 | a hidden state that captures the representation

00:06:10.200 | of the words that followed.

00:06:13.200 | And those sets of hidden representations

00:06:16.200 | as opposed to being collapsed

00:06:17.200 | to a single fixed size vector

00:06:18.800 | are then all pushed forward to the decoder

00:06:22.600 | that are then used by the decoder to translate,

00:06:25.800 | but in a selective way

00:06:27.800 | where the decoder,

00:06:29.400 | here visualized on the y-axis,

00:06:32.200 | the input language and on the x,

00:06:34.000 | the output language.

00:06:37.600 | The decoder weighs the different parts

00:06:40.800 | of the input sequence,

00:06:43.000 | differently in order to determine

00:06:45.800 | how to best translate,

00:06:47.400 | generate the word that forms a translation

00:06:50.000 | in the full output sentence.

00:06:52.000 | Okay, that's attention.

00:06:54.200 | Allowing, expanding the

00:06:56.000 | encoder-decoder architecture

00:06:58.600 | to allow for selective attention

00:07:04.000 | to the input sequence

00:07:05.200 | as opposed to collapsing everything down

00:07:06.800 | into a fixed representation.

00:07:08.800 | Okay, next step, self-attention.

00:07:12.600 | In the encoding process,

00:07:15.200 | allowing the encoder

00:07:18.400 | to also selectively look

00:07:22.000 | in forming the hidden representations

00:07:24.200 | at other parts of the input sequence

00:07:27.600 | in order to form those representations.

00:07:30.000 | It allows you to determine

00:07:33.600 | for certain words

00:07:36.000 | what are the important relevant aspects

00:07:38.200 | of the input sequence that can help you

00:07:40.000 | encode that word the best.

00:07:43.200 | So it improves the encoder process

00:07:45.200 | by allowing it to look

00:07:46.600 | at the entirety of the context.

00:07:48.200 | That's self-attention.

00:07:52.600 | Building on that, transformer.

00:07:55.600 | It's using the self-attention mechanism

00:07:59.000 | in the encoder

00:08:01.000 | to form these sets of representations

00:08:03.200 | on the input sequence.

00:08:04.800 | And then as part of the decoding process,

00:08:07.200 | follow the same but in reverse

00:08:09.400 | with a bunch of self-attention

00:08:11.400 | that's able to look back again.

00:08:13.400 | So it's self-attention on the encoder,

00:08:15.400 | attention on the decoder,

00:08:17.200 | and that's where the magic,

00:08:18.800 | that's where the entirety of the magic is

00:08:21.200 | that's able to capture the rich context

00:08:24.400 | of the input sequence

00:08:26.000 | in order to generate

00:08:27.800 | in a contextual way the output sequence.

00:08:30.600 | So let's take a step back then

00:08:32.800 | and look at what is critical

00:08:35.200 | to natural language

00:08:38.000 | in order to be able to reason about words,

00:08:40.600 | construct a language model

00:08:42.600 | and be able to reason about the words

00:08:45.200 | in order to classify a sentence,

00:08:46.800 | to translate a sentence

00:08:48.600 | or compare two sentences and so on.

00:08:52.000 | The sentences are

00:08:54.600 | collections of words or characters

00:08:57.000 | and those characters and words have to

00:08:59.200 | have an efficient representation

00:09:01.000 | that's meaningful for that kind of understanding.

00:09:03.200 | And that's what the process of embedding is.

00:09:05.600 | We talked a little bit about it on Monday

00:09:07.600 | and so the traditional Word2Vec process of embedding

00:09:11.400 | is you use some kind of trick

00:09:13.600 | in an unsupervised way to map words

00:09:16.600 | into a compressed representation.

00:09:22.200 | So language modeling is the process

00:09:26.200 | of determining which words

00:09:28.000 | follow each other usually.

00:09:29.400 | So one way you can use it

00:09:31.600 | in a skip-gram model

00:09:33.600 | taking huge data sets of words,

00:09:36.800 | you know there's writing all over the place,

00:09:38.600 | taking those data sets

00:09:40.200 | and feeding a neural network

00:09:42.200 | that in a supervised way

00:09:45.000 | looks at which words

00:09:47.200 | are usually follow the input.

00:09:50.000 | So the input is a word,

00:09:51.600 | the output is which word

00:09:53.200 | are statistically likely to follow that word

00:09:55.400 | and the same with the preceding word.

00:09:57.200 | And doing this kind of unsupervised learning

00:10:00.600 | which is what Word2Vec does,

00:10:02.800 | if you throw away the output and the input

00:10:05.200 | and just take in the hidden representation form in the middle

00:10:08.000 | that's how you form this compressed embedding

00:10:11.000 | a meaningful representation

00:10:13.600 | that when two words are related

00:10:15.400 | in a language modeling sense

00:10:17.400 | two words are related they're going to be

00:10:19.800 | in that representation close to each other

00:10:21.600 | and when they're totally unrelated

00:10:23.000 | have nothing to do with each other they're far away.

00:10:25.600 | ELMo is the approach of using

00:10:28.200 | bi-directional LSTMs

00:10:30.400 | to learn that representation.

00:10:32.200 | And what bi-directional, bi-directionally

00:10:34.400 | so looking not just the sequence that led up to the word

00:10:36.800 | but in both directions

00:10:38.200 | the sequence that followed the sequence that before.

00:10:41.200 | And that allows you to learn

00:10:44.600 | the rich full context of the word.

00:10:48.600 | In learning the rich full context of the word

00:10:51.800 | you're forming representations

00:10:53.200 | that are much better able to represent

00:10:55.200 | the statistical language model

00:10:58.800 | behind the kind of corpus of language

00:11:02.200 | that you're looking at.

00:11:04.200 | And this has taken a big leap

00:11:08.400 | in ability to then further algorithms

00:11:13.600 | that then with the language model

00:11:15.600 | a reasoning about doing things like

00:11:18.000 | sentence classification, sentence comparison, so on

00:11:20.400 | translation, that representation is much more effective

00:11:24.400 | for working with language.

00:11:26.000 | The idea of the OpenAI transformer

00:11:30.000 | is the next step forward

00:11:33.200 | is taking the same transformer

00:11:36.000 | that I mentioned previously

00:11:37.600 | the encoder with self-attention

00:11:39.400 | decoder with attention looking back at the input sequence

00:11:43.400 | and using it

00:11:46.600 | taking the language learned by the decoder

00:11:50.600 | and using that as a language model

00:11:54.600 | and then chopping off layers

00:11:56.000 | and training on a specific language task

00:11:59.800 | like sentence classification.

00:12:02.200 | Now BERT is the thing that

00:12:05.200 | that did the big leap in performance.

00:12:08.200 | With the transformer formulation there's always

00:12:11.400 | there's no bi-directional element

00:12:13.400 | there is, it's always moving forward

00:12:15.800 | so the encoding step and the decoding step

00:12:18.600 | with BERT is, it's richly bi-directional

00:12:23.600 | it takes in the full sequence of the sentence

00:12:29.400 | and masks out

00:12:33.400 | some percentage of the words

00:12:35.000 | 15% of the words

00:12:36.400 | 15% of the samples, the tokens from the sequence

00:12:40.400 | and tasks the entire encoding

00:12:46.400 | self-attention mechanism to predict

00:12:50.200 | the words that are missing.

00:12:52.400 | That construct and then you stack a ton of them together

00:12:56.600 | a ton of those encoders

00:12:58.600 | self-attention feed-forward network

00:13:00.000 | self-attention feed-forward network together

00:13:02.000 | and that allows you to learn the rich context

00:13:05.200 | of the language to then at the end

00:13:07.600 | perform all kinds of tasks.

00:13:09.800 | You can create first of all like ELMo

00:13:13.400 | and like Word2Vec

00:13:14.800 | create rich contextual embeddings

00:13:17.600 | take a set of words and represent them

00:13:21.400 | in the space that's very efficient to reason with.

00:13:24.200 | You can do language classification

00:13:26.200 | you can do sentence pair classification

00:13:29.400 | you could do the similarity of two sentences

00:13:31.600 | multiple choice question answering

00:13:33.200 | general question answering

00:13:34.800 | tagging of sentences.

00:13:37.200 | Okay, I lingered on that one a little bit too long

00:13:41.200 | but it is also the one I'm really excited about

00:13:45.800 | and really if there's a breakthrough this year

00:13:48.400 | it's thanks to BERT.

00:13:51.200 | The other thing I'm very excited about

00:13:54.200 | is totally jumping away from the

00:13:58.200 | NeurIPS, the theory

00:14:03.200 | those kind of academic developments in deep learning

00:14:06.400 | and into the world of applied deep learning.

00:14:10.200 | So Tesla has a system called Autopilot

00:14:15.200 | where the hardware version 2 of that system

00:14:19.200 | is a implementation of the NVIDIA Drive PX2 system

00:14:29.200 | which runs a ton of neural networks.

00:14:32.200 | There's eight cameras on the car

00:14:35.600 | and a variant of the Inception network

00:14:40.600 | is now taking in all eight cameras

00:14:45.600 | at different resolutions as input

00:14:47.600 | and performing various tasks

00:14:50.600 | like drivable area segmentation

00:14:53.600 | like object detection

00:14:55.600 | and some basic localization tasks.

00:14:58.600 | So you have now a huge fleet of vehicles

00:15:03.600 | where it's not engineers

00:15:05.600 | some I'm sure are engineers

00:15:07.600 | but it's really regular consumers

00:15:09.600 | people that have purchased the car

00:15:12.600 | have no understanding in many cases

00:15:14.600 | of what a neural networks limitations

00:15:16.000 | and capabilities are so on.

00:15:17.600 | Now it has a neural network

00:15:19.600 | is controlling the well-being

00:15:21.600 | its decisions, its perceptions

00:15:25.600 | and the control decision based on those perceptions

00:15:27.600 | are controlling the life of a human being.

00:15:30.600 | And that to me is one of the great

00:15:32.600 | sort of breakthroughs of 17 and 18

00:15:35.600 | in terms of the development

00:15:39.600 | of what AI can do in a practical sense

00:15:42.600 | in impacting the world.

00:15:43.600 | And so one billion miles

00:15:46.600 | over one billion miles have been driven in autopilot.

00:15:49.600 | Now there's two types of systems

00:15:51.600 | currently operating in Tesla's

00:15:53.600 | there's hardware version one

00:15:55.600 | hardware version two.

00:15:56.600 | Hardware version one was Intel Mobileye

00:15:59.600 | monocular camera perception system

00:16:01.600 | as far as we know that was not using a neural network

00:16:04.600 | and it was a fixed system

00:16:05.600 | that wasn't learning at least online learning in the Tesla's.

00:16:08.600 | The other is hardware version two

00:16:10.600 | and it's about half and half now

00:16:12.600 | in terms of the miles driven.

00:16:14.600 | The hardware version two has a neural network

00:16:16.600 | that's always learning.

00:16:17.600 | There's weekly updates.

00:16:18.600 | It's always improving the model

00:16:20.600 | shipping new weights and so on.

00:16:22.600 | That's the exciting set of breakthroughs.

00:16:25.600 | In terms of AutoML

00:16:27.600 | the dream of automating

00:16:30.600 | some aspects or all aspects

00:16:32.600 | or as many aspects as possible

00:16:33.600 | of the machine learning process

00:16:35.600 | where you can just

00:16:37.600 | drop in a data set that you're working on

00:16:41.600 | and the system will automatically determine

00:16:45.600 | all the parameters

00:16:47.600 | from the details of the architectures

00:16:49.600 | the size of the architecture

00:16:51.600 | the different modules in that architecture

00:16:53.600 | the hyper parameters

00:16:55.600 | used for training the architecture

00:16:58.600 | running that, doing inference, everything.

00:17:00.600 | All is done for you

00:17:01.600 | all you just feed it is data.

00:17:03.600 | So that's been the success

00:17:06.600 | of the neural architecture search

00:17:08.600 | in '16 and '17

00:17:10.600 | and there's been a few ideas

00:17:12.600 | with Google AutoML

00:17:13.600 | that's really trying to almost create an API

00:17:15.600 | where you just drop in your data set

00:17:17.600 | and it's using reinforcement learning

00:17:19.600 | and recurring neural networks

00:17:21.600 | to given a few modules

00:17:24.600 | stitch them together in such a way

00:17:26.600 | where the objective function is optimizing

00:17:28.600 | the performance of the overall system

00:17:30.600 | and they showed a lot of exciting results

00:17:32.600 | Google showed and others

00:17:34.600 | that outperform state of the art systems

00:17:36.600 | both in terms of efficiency

00:17:37.600 | and in terms of accuracy.

00:17:39.600 | Now in '18

00:17:41.600 | there have been a few improvements

00:17:43.600 | on this direction

00:17:44.600 | and one of them is Attenet

00:17:46.600 | where it's now using

00:17:49.600 | the same reinforcement learning

00:17:51.600 | AutoML formulation

00:17:52.600 | to build ensembles on neural networks

00:17:54.600 | so in many cases

00:17:56.600 | state of the art performance

00:17:57.600 | can be achieved

00:17:58.600 | by as opposed to taking

00:18:00.600 | a single architecture

00:18:02.600 | is building up a multitude

00:18:04.600 | an ensemble, a collection of architectures

00:18:06.600 | and that's what is doing here

00:18:08.600 | is given a candidate architectures

00:18:11.600 | stitching them together

00:18:12.600 | to form an ensemble

00:18:13.600 | to get state of the art performance

00:18:15.600 | now that state of the art performance

00:18:17.600 | is not a leap

00:18:20.600 | a breakthrough leap forward

00:18:22.600 | but it's nevertheless a step forward

00:18:24.600 | and it's a very exciting field

00:18:27.600 | that's going to be receiving

00:18:28.600 | more and more attention

00:18:30.600 | there's an area of machine learning

00:18:33.600 | that's heavily understudied

00:18:35.600 | and I think it's extremely exciting area

00:18:37.600 | and if you look at 2012

00:18:43.600 | with AlexNet

00:18:45.600 | achieving the breakthrough performance

00:18:48.600 | of showing that

00:18:49.600 | what deep learning networks are capable of

00:18:52.600 | from that point on

00:18:54.600 | from 2012 to today

00:18:56.600 | there's been non-stop

00:18:57.600 | extremely active developments

00:18:59.600 | of different architectures

00:19:00.600 | that even on just ImageNet alone

00:19:02.600 | on doing the image classification task

00:19:04.600 | have improved performance

00:19:08.600 | over and over and over

00:19:09.600 | with totally new ideas

00:19:11.600 | now on the other side

00:19:12.600 | on the data side

00:19:14.600 | there's been very few ideas

00:19:17.600 | about how to do data augmentation

00:19:19.600 | so data augmentation

00:19:22.600 | is the process of

00:19:25.600 | you know it's what

00:19:27.600 | kids always do

00:19:28.600 | when you learn about an object right

00:19:30.600 | is you look at an object

00:19:31.600 | and you kind of like twist it around

00:19:34.600 | is taking the raw data

00:19:39.600 | and messing it with such a way

00:19:41.600 | that it can give you

00:19:42.600 | much richer representation

00:19:44.600 | of what this data can look like

00:19:47.600 | in other forms

00:19:48.600 | in other contexts in the real world

00:19:51.600 | there's been very few developments

00:19:54.600 | I think still

00:19:55.600 | and there's this auto-augment

00:19:57.600 | is just a step

00:19:58.600 | a tiny step into that direction

00:20:00.600 | that I hope that we as a community

00:20:03.600 | invest a lot of effort in

00:20:05.600 | so what auto-augment does

00:20:07.600 | is it says

00:20:08.600 | okay so there's these

00:20:11.600 | data augmentation methods

00:20:13.600 | like translating the image

00:20:14.600 | sharing the image

00:20:16.600 | doing color manipulation

00:20:17.600 | like color inversion

00:20:19.600 | let's take those as basic actions

00:20:21.600 | you can take

00:20:22.600 | and then use reinforcement learning

00:20:23.600 | and an RNN again construct

00:20:26.600 | to stitch those actions together

00:20:28.600 | in such a way

00:20:29.600 | that can augment data

00:20:32.600 | like on ImageNet

00:20:34.600 | to when you train on that data

00:20:37.600 | it gets state-of-the-art performance

00:20:40.600 | so mess with the data

00:20:41.600 | in a way that optimizes

00:20:44.600 | the way you mess with the data

00:20:46.600 | so and then they've also showed

00:20:49.600 | that given that

00:20:51.600 | the set of data augmentation policies

00:20:55.600 | that are learned to optimize

00:20:57.600 | for example for ImageNet

00:20:59.600 | given some kind of architecture

00:21:01.600 | you can take that

00:21:03.600 | learned set of policies

00:21:05.600 | for data augmentation

00:21:06.600 | and apply it to

00:21:08.600 | a totally different data set

00:21:10.600 | so there's the process of transfer learning

00:21:14.600 | so what is transfer learning?

00:21:16.600 | So we talked about

00:21:17.600 | transfer learning

00:21:18.600 | you have a neural network

00:21:19.600 | that learns to do

00:21:20.600 | cat versus dog

00:21:21.600 | or no

00:21:22.600 | learns to do a thousand class

00:21:24.600 | classification problem on ImageNet

00:21:26.600 | and then you transfer

00:21:28.600 | you chop off few layers

00:21:29.600 | and you transfer on the task

00:21:30.600 | of your own data set

00:21:31.600 | of cat versus dog

00:21:33.600 | what you're transferring

00:21:34.600 | is the weights

00:21:36.600 | that are learned

00:21:38.600 | on the ImageNet

00:21:39.600 | classification task

00:21:40.600 | and now you're then

00:21:42.600 | fine-tuning those weights

00:21:44.600 | on the specific

00:21:48.600 | personal cat versus dog data set

00:21:50.600 | you have

00:21:52.600 | now you could do the same thing here

00:21:55.600 | you can transfer

00:21:57.600 | as part of the transfer learning process

00:21:59.600 | take the data augmentation policies

00:22:02.600 | learned on ImageNet

00:22:04.600 | and transfer those

00:22:05.600 | you can transfer both the weights

00:22:06.600 | and the policies

00:22:08.600 | that's a really super exciting idea

00:22:12.600 | I think

00:22:13.600 | it wasn't quite demonstrated

00:22:14.600 | extremely well here

00:22:16.600 | in terms of performance

00:22:18.600 | so it got an improvement

00:22:20.600 | in performance and so on

00:22:21.600 | but it kind of inspired an idea

00:22:24.600 | that's something

00:22:25.600 | that we need to really think about

00:22:26.600 | how to augment data

00:22:28.600 | in an interesting way

00:22:29.600 | such that

00:22:31.600 | given just a few samples

00:22:33.600 | of data

00:22:35.600 | we can generate huge data sets

00:22:37.600 | in a way that you can then form

00:22:39.600 | meaningful, complex, rich representations from

00:22:43.600 | I think that's really exciting

00:22:44.600 | and one of the ways that

00:22:46.600 | you break open the problem

00:22:47.600 | of how do we learn a lot from a little

00:22:50.600 | training deep neural networks

00:22:53.600 | with synthetic data

00:22:54.600 | this

00:22:56.600 | also really an exciting topic

00:22:58.600 | that a few groups

00:23:01.600 | but especially NVIDIA

00:23:02.600 | has invested a lot in

00:23:03.600 | and here's a

00:23:04.600 | from a CVPR 2018

00:23:06.600 | probably my favorite work on this topic

00:23:08.600 | is

00:23:09.600 | they really went crazy

00:23:12.600 | and said okay let's mess

00:23:13.600 | with synthetic data

00:23:15.600 | in every way we could possibly can

00:23:18.600 | so on the left there

00:23:19.600 | shown a set of backgrounds

00:23:21.600 | then there's also set of artificial objects

00:23:23.600 | and you have a car

00:23:24.600 | or some kind of object

00:23:25.600 | that you're trying to classify

00:23:27.600 | so let's take that car

00:23:28.600 | and mess with it

00:23:29.600 | with every way possible

00:23:31.600 | apply lighting variation to it

00:23:33.600 | with every way possible

00:23:35.600 | rotate everything that is crazy

00:23:37.600 | so

00:23:38.600 | what NVIDIA

00:23:40.600 | is really good at

00:23:41.600 | is creating realistic scenes

00:23:43.600 | and they said okay

00:23:44.600 | let's create realistic scenes

00:23:46.600 | but let's also go way above board

00:23:48.600 | and not do realistic at all

00:23:50.600 | do things that can't possibly happen in reality

00:23:53.600 | and so generally these huge data sets

00:23:55.600 | I want you to train

00:23:56.600 | and again achieve quite interesting

00:23:58.600 | quite good performance

00:24:02.600 | on image classification

00:24:04.600 | of course

00:24:05.600 | they're trying to apply to ImageNet

00:24:06.600 | and so on these kinds of tasks

00:24:08.600 | you're not going to outperform

00:24:10.600 | networks that were trained on ImageNet

00:24:12.600 | but they show that

00:24:13.600 | with just a small sample from

00:24:15.600 | from those real images

00:24:18.600 | they can fine tune this network train

00:24:20.600 | on synthetic images

00:24:21.600 | totally fake images

00:24:22.600 | to achieve state of the art performance

00:24:25.600 | again another way to generate

00:24:27.600 | to get to learn a lot from very little

00:24:31.600 | by generating fake worlds

00:24:33.600 | synthetically

00:24:37.600 | the process of annotation

00:24:39.600 | which for supervised learning

00:24:41.600 | is what you need to do

00:24:43.600 | in order to

00:24:45.600 | train the network

00:24:46.600 | you need to be able to provide ground truth

00:24:47.600 | you need to be able to label

00:24:49.600 | whatever the entity that is

00:24:51.600 | being learned

00:24:52.600 | and so for image classification

00:24:54.600 | that's saying what is going on in the image

00:24:56.600 | and part of that was done

00:24:58.600 | on ImageNet

00:24:59.600 | by doing a Google search

00:25:00.600 | for creating candidates

00:25:02.600 | now

00:25:03.600 | saying what's going on in the image

00:25:05.600 | is a pretty easy task

00:25:06.600 | then there is the

00:25:08.600 | object detection task

00:25:09.600 | of detecting the bounty box

00:25:11.600 | and so saying

00:25:12.600 | drawing the actual bounty box

00:25:14.600 | is a little bit more difficult

00:25:16.600 | but it's a couple of clicks and so on

00:25:18.600 | then if we take the final

00:25:20.600 | the

00:25:22.600 | probably one of the higher

00:25:24.600 | complexity tasks

00:25:26.600 | of perception

00:25:28.600 | of image understanding

00:25:29.600 | is segmentation

00:25:30.600 | is actually drawing

00:25:32.600 | either pixel level or polygons

00:25:34.600 | the outline of a particular object

00:25:36.600 | now if you have to annotate that

00:25:38.600 | that's extremely costly

00:25:39.600 | so the work with

00:25:41.600 | polygon RNN

00:25:42.600 | is to use recurring neural networks

00:25:44.600 | to make suggestions for polygons

00:25:46.600 | it's really interesting

00:25:48.600 | there's a few tricks

00:25:50.600 | to form these high resolution polygons

00:25:52.600 | so the idea is

00:25:53.600 | it drops in a single point

00:25:55.600 | you draw a

00:25:57.600 | you draw a bounty box around an object

00:25:59.600 | you use

00:26:01.600 | convolution neural networks to drop

00:26:03.600 | the first point

00:26:04.600 | and then you use recurring neural networks

00:26:05.600 | to draw around it

00:26:07.600 | and the performance

00:26:09.600 | is really good

00:26:10.600 | there's a few tricks

00:26:11.600 | and this tool is available online

00:26:12.600 | it's a really interesting idea

00:26:14.600 | again

00:26:15.600 | the dream

00:26:16.600 | with AutoML

00:26:17.600 | is to remove the human from the picture

00:26:19.600 | as much as possible

00:26:20.600 | with data augmentation

00:26:22.600 | remove the human from the picture

00:26:23.600 | as much as possible

00:26:24.600 | for menial data

00:26:25.600 | automate the boring stuff

00:26:27.600 | and in this case

00:26:28.600 | the act of drawing a polygon

00:26:30.600 | try to automate it

00:26:31.600 | as much as possible

00:26:33.600 | the

00:26:35.600 | interesting

00:26:37.600 | other dimension

00:26:39.600 | along which

00:26:41.600 | deep learning is

00:26:43.600 | recently been trying to be optimized

00:26:45.600 | is

00:26:46.600 | how

00:26:47.600 | do we make

00:26:48.600 | deep learning accessible

00:26:50.600 | fast

00:26:51.600 | cheap

00:26:52.600 | accessible

00:26:53.600 | so the Dawn Bench

00:26:54.600 | from Stanford

00:26:55.600 | the benchmark

00:26:56.600 | the Dawn Bench

00:26:57.600 | benchmark

00:26:58.600 | from Stanford

00:26:59.600 | asked

00:27:00.600 | formulated an interesting competition

00:27:02.600 | which got a lot of attention

00:27:04.600 | and a lot of progress

00:27:06.600 | it's saying

00:27:07.600 | if we want to achieve

00:27:09.600 | 93% accuracy on ImageNet

00:27:11.600 | and 94% on CIFAR-10

00:27:13.600 | let's now

00:27:15.600 | compete

00:27:16.600 | that's like the requirement

00:27:17.600 | let's now compete

00:27:19.600 | how you can do it

00:27:20.600 | in the least amount of time

00:27:21.600 | and for the least amount of dollars

00:27:24.600 | do the training in the least amount of time

00:27:27.600 | and the training in the least amount of dollars

00:27:29.600 | like literally dollars

00:27:30.600 | you're allowed to spend

00:27:31.600 | to do this

00:27:33.600 | and Fast.ai

00:27:35.600 | you know

00:27:36.600 | it's a renegade

00:27:37.600 | awesome

00:27:38.600 | renegade group of deep learning researchers

00:27:40.600 | have been able to train

00:27:42.600 | on ImageNet in 3 hours

00:27:44.600 | so this is for training process

00:27:46.600 | for 25 bucks

00:27:48.600 | so training a network

00:27:50.600 | that achieves 93% accuracy

00:27:53.600 | for 25 bucks

00:27:55.600 | and 94% accuracy

00:27:57.600 | for 26 cents

00:27:59.600 | on CIFAR-10

00:28:01.600 | so the key idea

00:28:02.600 | that they were playing with

00:28:04.600 | is quite simple

00:28:05.600 | but really it boils down to

00:28:07.600 | messing with the learning rate

00:28:09.600 | throughout the process of training

00:28:11.600 | so the learning rate is

00:28:13.600 | how much you based on the loss

00:28:15.600 | based on the error the neural network observes

00:28:17.600 | how much do you adjust the weights

00:28:19.600 | so they found

00:28:23.600 | that if they

00:28:24.600 | crank up

00:28:25.600 | the learning rate

00:28:27.600 | while decreasing

00:28:28.600 | the momentum

00:28:30.600 | which is a parameter of the optimization process

00:28:32.600 | where they do it that jointly

00:28:34.600 | they're able

00:28:35.600 | to make the network learn

00:28:37.600 | really fast

00:28:39.600 | that's really exciting

00:28:40.600 | and the benchmark itself

00:28:41.600 | is also really exciting

00:28:43.600 | because that's exactly

00:28:44.600 | for people sitting in this room

00:28:46.600 | that opens up the door

00:28:48.600 | to doing all kinds of

00:28:50.600 | fundamental deep learning problems

00:28:52.600 | without the resources

00:28:54.600 | of Google DeepMind

00:28:56.600 | or OpenAI

00:28:57.600 | or Facebook

00:28:58.600 | or so on

00:28:59.600 | without computational resources

00:29:01.600 | that's important for academia

00:29:02.600 | that's important for independent researchers

00:29:04.600 | and so on

00:29:05.600 | so GANs

00:29:07.600 | there's been a lot of work

00:29:09.600 | on generative adversarial neural networks

00:29:11.600 | and

00:29:13.600 | in some ways there's not been

00:29:15.600 | breakthrough

00:29:17.600 | ideas

00:29:19.600 | in GANs for quite a bit

00:29:21.600 | and I think

00:29:23.600 | BigGAN from

00:29:25.600 | from Google DeepMind

00:29:27.600 | the ability to generate

00:29:29.600 | incredibly high resolution

00:29:31.600 | images

00:29:33.600 | and

00:29:34.600 | it's the same GAN technique

00:29:36.600 | so in terms of breakthroughs innovations

00:29:38.600 | but scaled

00:29:40.600 | so increase the model capacity

00:29:42.600 | and increase the batch size

00:29:44.600 | the number of images that are

00:29:46.600 | fed

00:29:48.600 | to the network

00:29:50.600 | it produces incredible images

00:29:52.600 | I encourage you to go online

00:29:54.600 | and look at them

00:29:55.600 | it's hard to believe that they're

00:29:57.600 | generated

00:29:59.600 | so that was

00:30:01.600 | 2018 for GANs

00:30:03.600 | was a year

00:30:05.600 | of scaling

00:30:07.600 | and parameter tuning

00:30:09.600 | as opposed to breakthrough new ideas

00:30:11.600 | Video to Video Synthesis

00:30:15.600 | this work is from

00:30:17.600 | NVIDIA

00:30:19.600 | is looking at the problem

00:30:21.600 | so there's been a lot of work on

00:30:23.600 | going from image to image

00:30:25.600 | so from a particular

00:30:27.600 | image generating another image

00:30:29.600 | so whether it's colorizing an image

00:30:31.600 | or just traditionally

00:30:33.600 | defined GANs

00:30:35.600 | the idea with

00:30:37.600 | Video to Video Synthesis that a few people have been

00:30:39.600 | working on but NVIDIA took a good

00:30:41.600 | step forward

00:30:43.600 | is to

00:30:45.600 | make

00:30:47.600 | the video to make

00:30:49.600 | the temporal consistency, the temporal dynamics

00:30:51.600 | part of the optimization process

00:30:53.600 | so make it look not jumpy

00:30:55.600 | so if you look here at the

00:30:57.600 | comparison for this particular

00:30:59.600 | so the input is the labels in the top

00:31:03.600 | left and the output of the

00:31:05.600 | NVIDIA approach is on the bottom right

00:31:11.600 | see it's very

00:31:13.600 | temporally consistent. If you look at

00:31:15.600 | the image to image

00:31:17.600 | mapping that's

00:31:19.600 | state of the art, pix2pix HD

00:31:21.600 | it's very jumpy

00:31:23.600 | it's not temporally consistent at all

00:31:25.600 | and there's some naive

00:31:27.600 | approaches for trying to maintain temporal

00:31:29.600 | consistency

00:31:31.600 | that's in the bottom left

00:31:33.600 | so you can apply this to all

00:31:35.600 | kinds of tasks, all kinds of video to video

00:31:37.600 | mapping. Here is mapping it to

00:31:39.600 | face edges

00:31:41.600 | edge detection on faces, mapping it to

00:31:43.600 | faces

00:31:45.600 | generating faces from just edges

00:31:47.600 | you can look at

00:31:51.600 | body pose

00:31:53.600 | to actual images

00:31:55.600 | as input to the network you can take

00:31:57.600 | the pose of the person

00:31:59.600 | and generate the

00:32:01.600 | video of the person

00:32:03.600 | okay, semantic segmentation

00:32:13.600 | the problem of perception

00:32:17.600 | began with AlexNet and ImageNet

00:32:19.600 | and then further and further developments

00:32:21.600 | where the input, the problem is

00:32:23.600 | of basic image classification where the input is an image

00:32:25.600 | and the output is a classification of what's going on

00:32:27.600 | in that image and the fundamental

00:32:29.600 | architecture can be reused for

00:32:31.600 | more complex tasks like detection

00:32:33.600 | like segmentation and so on

00:32:35.600 | interpreting what's going on in the image

00:32:37.600 | so these large networks

00:32:39.600 | from VGGNet, GoogleNet,

00:32:41.600 | ResNet,

00:32:43.600 | SCNet, DenseNet

00:32:45.600 | all these networks are forming rich

00:32:47.600 | representation that can then be used for all kinds

00:32:49.600 | of tasks, whether that task is object

00:32:51.600 | detection, this here

00:32:53.600 | shown is the region based methods

00:32:55.600 | where the neural network

00:32:57.600 | is tasked, the convolutional layers

00:32:59.600 | make

00:33:01.600 | region proposals

00:33:03.600 | so a bunch of candidates to be considered

00:33:05.600 | and then there's a step that's

00:33:07.600 | determining what's in those

00:33:09.600 | different regions and forming bounding boxes

00:33:11.600 | around them in a for loop way

00:33:13.600 | and then there is the one shot method

00:33:15.600 | single shot method where

00:33:17.600 | in a single pass all of the

00:33:19.600 | bounding boxes in their classes

00:33:21.600 | are generated and there has been

00:33:23.600 | a tremendous amount of work

00:33:25.600 | in the space of object detection

00:33:27.600 | some are single

00:33:29.600 | shot methods

00:33:31.600 | some are region

00:33:33.600 | based methods and there's been a lot

00:33:35.600 | of exciting work

00:33:37.600 | but not

00:33:39.600 | I would say breakthrough

00:33:41.600 | ideas and then

00:33:43.600 | we take it to the highest level of

00:33:45.600 | perception which is semantic segmentation

00:33:47.600 | there's also been

00:33:49.600 | a lot of work there, the state of

00:33:51.600 | the art performance is

00:33:53.600 | at least for the open source systems

00:33:55.600 | is DeepLab

00:33:57.600 | V3+

00:33:59.600 | on the PASCAL

00:34:01.600 | VLC challenge

00:34:03.600 | so semantic segmentation to catch it

00:34:05.600 | all up started in 2014 with fully

00:34:07.600 | convolutional neural networks

00:34:09.600 | chopping off the fully connected

00:34:11.600 | layers and then

00:34:13.600 | outputting

00:34:15.600 | the heat map, very grainy

00:34:17.600 | very

00:34:19.600 | low resolution

00:34:21.600 | then improving that with segnet

00:34:23.600 | performing max pooling

00:34:25.600 | with

00:34:27.600 | a breakthrough idea that's reused in a lot

00:34:29.600 | of cases is dilated convolution

00:34:31.600 | atrius convolutions

00:34:33.600 | having some spacing

00:34:35.600 | which increases the

00:34:37.600 | field of view of the convolutional

00:34:39.600 | filter, the key idea

00:34:41.600 | behind DeepLab V3+

00:34:43.600 | that I

00:34:45.600 | is the state of the art is

00:34:47.600 | the multi-scale processing

00:34:49.600 | without

00:34:51.600 | increasing the parameters, the

00:34:53.600 | multi-scales achieved by

00:34:55.600 | quote-unquote the atrius

00:34:57.600 | rate, so taking those atrius convolutions

00:34:59.600 | and increasing the spacing

00:35:01.600 | and you can think of increasing that

00:35:03.600 | spacing

00:35:05.600 | by enlarging the

00:35:07.600 | model's field of view, so you

00:35:09.600 | can consider all these different

00:35:11.600 | scales of processing

00:35:13.600 | looking at the

00:35:15.600 | layers of features

00:35:17.600 | so

00:35:19.600 | allowing you to be able

00:35:21.600 | to grasp the greater

00:35:23.600 | context as part of the

00:35:25.600 | upsampling deconvolutional step

00:35:27.600 | and that's what's producing the state of the art performances

00:35:29.600 | and that's where we have the

00:35:31.600 | notebook

00:35:33.600 | tutorial

00:35:35.600 | on github

00:35:37.600 | showing this

00:35:39.600 | DeepLab

00:35:41.600 | architecture trained on

00:35:43.600 | Cityscapes, so Cityscapes

00:35:45.600 | is a driving segmentation

00:35:47.600 | data set

00:35:49.600 | that is

00:35:51.600 | that is

00:35:55.600 | one of the most commonly used for the task of

00:35:57.600 | driving scene segmentation

00:35:59.600 | Okay, on the deep reinforcement

00:36:03.600 | learning front

00:36:05.600 | So this is touching a bit

00:36:07.600 | a bit on the 2017

00:36:09.600 | but I think

00:36:11.600 | the excitement really settled in

00:36:13.600 | in 2018

00:36:15.600 | is the work from Google

00:36:17.600 | and from OpenAI

00:36:19.600 | DeepMind, so it started

00:36:21.600 | in DQN paper from Google

00:36:23.600 | DeepMind where they beat a bunch of

00:36:25.600 | a bunch of Atari

00:36:29.600 | games achieving superhuman

00:36:31.600 | performance with

00:36:33.600 | deep reinforcement

00:36:35.600 | learning methods that are taking in just the raw

00:36:37.600 | pixels of the game, so this same

00:36:39.600 | kind of architecture is able to learn

00:36:41.600 | how to beat these games

00:36:43.600 | super exciting idea

00:36:45.600 | that kind of has echoes

00:36:47.600 | of what general intelligence is

00:36:49.600 | taking in the raw

00:36:51.600 | information and being able to

00:36:53.600 | understand the game

00:36:55.600 | the sort of physics of the game sufficiently to be able

00:36:57.600 | to beat it. Then in 2016

00:36:59.600 | AlphaGo

00:37:01.600 | with some supervision

00:37:03.600 | and some playing against itself

00:37:05.600 | self play

00:37:07.600 | some supervised learning

00:37:09.600 | on expert world champ players

00:37:11.600 | and some self play

00:37:13.600 | where it plays against itself

00:37:15.600 | was able to beat the top of the world champion

00:37:17.600 | at Go

00:37:19.600 | and then 2017 AlphaGo Zero

00:37:21.600 | a specialized

00:37:23.600 | version of AlphaZero

00:37:25.600 | was able to beat

00:37:27.600 | the AlphaGo

00:37:29.600 | with just a few days

00:37:31.600 | of training and zero

00:37:33.600 | supervision from expert games

00:37:35.600 | so through the process of self play

00:37:37.600 | again this is kind of

00:37:39.600 | getting

00:37:41.600 | the human out of the picture

00:37:43.600 | more and more and more which is why AlphaZero

00:37:45.600 | is probably or this

00:37:47.600 | AlphaGo Zero was a demonstration

00:37:49.600 | of the cleanest

00:37:53.600 | demonstration of all the nice

00:37:55.600 | progress in deep reinforcement learning

00:37:57.600 | and I think if you look at the history of AI

00:37:59.600 | when you're sitting on a

00:38:01.600 | porch 100 years from now

00:38:03.600 | sort of reminiscing back

00:38:05.600 | AlphaZero

00:38:07.600 | will be a thing that people

00:38:09.600 | will remember as an

00:38:11.600 | interesting moment in time

00:38:13.600 | as a key moment in time

00:38:15.600 | and AlphaZero

00:38:17.600 | was applied in 2017

00:38:19.600 | to beat

00:38:21.600 | AlphaZero paper

00:38:23.600 | was in 2017 and it was

00:38:25.600 | this year played Stockfish

00:38:27.600 | in chess which is

00:38:29.600 | the best engine

00:38:31.600 | chess playing engines

00:38:33.600 | was able to beat it with just 4 hours of training

00:38:35.600 | of course the 4 hours

00:38:37.600 | is caveats because 4 hours

00:38:39.600 | for Google DeepMind is highly distributed

00:38:41.600 | training so it's not 4 hours

00:38:43.600 | for an

00:38:45.600 | undergraduate student sitting in their dorm room

00:38:47.600 | but meaning

00:38:49.600 | it was

00:38:51.600 | able through self play to

00:38:53.600 | very quickly learn to beat the state of the art

00:38:55.600 | chess engine and

00:38:57.600 | learn to beat the state of the art shogi

00:38:59.600 | engine ELMO

00:39:01.600 | and the interesting

00:39:03.600 | thing here is

00:39:05.600 | with perfect information games

00:39:07.600 | like chess you have a tree

00:39:09.600 | and you have all the decisions you could

00:39:11.600 | possibly make and so the farther along you look

00:39:13.600 | at along that tree presumably

00:39:15.600 | the better you do

00:39:17.600 | that's how Deep Blue beat

00:39:19.600 | Kasparov in the

00:39:21.600 | 90's is you just look as far

00:39:23.600 | as possible down the tree

00:39:25.600 | to determine which is the action is the most optimal

00:39:27.600 | if you look at the way

00:39:29.600 | human

00:39:31.600 | grandmasters think it certainly

00:39:33.600 | doesn't feel like they're like looking down a tree

00:39:35.600 | there's something like

00:39:37.600 | creative intuition there's something like you could

00:39:39.600 | see the patterns in the board you can do

00:39:41.600 | a few calculations but really

00:39:43.600 | it's in the order of hundreds it's not

00:39:45.600 | on the order of millions

00:39:47.600 | or billions which is kind

00:39:49.600 | of the

00:39:51.600 | the stockfish the state of the art chess engine

00:39:55.600 | approach and AlphaZero is moving

00:39:57.600 | closer and closer and closer towards the human

00:39:59.600 | grandmaster concerning very few

00:40:01.600 | future moves it's able through

00:40:03.600 | the neural network estimator that's

00:40:05.600 | estimating the quality of the move and the quality

00:40:07.600 | of the different the quality of the

00:40:09.600 | current quality of the board and the quality

00:40:11.600 | of the moves that follow

00:40:13.600 | it's able to do much much less

00:40:15.600 | look ahead so the

00:40:17.600 | neural network learns the fundamental

00:40:19.600 | information just like when a grandmaster looks

00:40:21.600 | at a board they can tell how

00:40:23.600 | good that is

00:40:25.600 | that's again interesting it's a step

00:40:27.600 | towards

00:40:29.600 | at least echoes

00:40:31.600 | of what human intelligence is in

00:40:33.600 | this very structured formal constrained world

00:40:35.600 | of chess and Go and Shogi

00:40:37.600 | and then there's

00:40:39.600 | the other side of the world that's messy

00:40:41.600 | it's still games

00:40:43.600 | it's still constrained in that way

00:40:45.600 | but OpenAI

00:40:47.600 | has taken on the challenge of

00:40:49.600 | playing games that are much messier

00:40:51.600 | that have this

00:40:53.600 | semblance

00:40:55.600 | of the real world and the fact that

00:40:57.600 | you have to do teamwork you have to

00:40:59.600 | look at long time horizons

00:41:01.600 | with huge amounts of imperfect

00:41:03.600 | information hidden information

00:41:05.600 | uncertainty

00:41:07.600 | so within that world they've

00:41:09.600 | taken on the challenge of a popular game

00:41:11.600 | Dota 2

00:41:13.600 | on the human side of that

00:41:15.600 | there's the competition

00:41:17.600 | the international hosted every year

00:41:19.600 | where in 2018 the winning

00:41:21.600 | team gets 11 million dollars

00:41:23.600 | it's a very popular very active competition

00:41:25.600 | that's been going on for

00:41:27.600 | a few years

00:41:29.600 | they've been

00:41:31.600 | improving and achieved a lot of interesting

00:41:33.600 | milestones in 2017

00:41:35.600 | they were 1v1 bot

00:41:37.600 | beat the top professional Dota 2 player

00:41:39.600 | the way you achieve great

00:41:41.600 | things is

00:41:43.600 | you try

00:41:45.600 | in 2018 they tried to go

00:41:47.600 | 5v5

00:41:49.600 | the OpenAI 5 team lost 2 games

00:41:51.600 | against the top 2

00:41:53.600 | the top Dota 2 players

00:41:55.600 | at the 2018 international

00:41:57.600 | and of course their ranking

00:41:59.600 | here the MMR ranking in Dota 2

00:42:01.600 | has been increasing over and over

00:42:03.600 | but there's a lot of challenges

00:42:05.600 | here that make it extremely difficult

00:42:07.600 | to beat the human players

00:42:09.600 | and

00:42:11.600 | this is you know in every story

00:42:13.600 | Rocky or whatever

00:42:15.600 | you think about losing is

00:42:17.600 | an essential element of a story

00:42:19.600 | that leads to then a movie

00:42:21.600 | and a book and greatness

00:42:23.600 | so you better believe that they're coming back

00:42:25.600 | next year and there's going to be

00:42:27.600 | a lot of exciting developments there

00:42:29.600 | it also Dota 2

00:42:31.600 | this particular video game makes it

00:42:33.600 | currently

00:42:35.600 | there's really 2 games

00:42:37.600 | that have the public eye

00:42:39.600 | in terms of AI taking on as benchmarks

00:42:41.600 | so we solved GO

00:42:43.600 | incredible accomplishment but what's next

00:42:45.600 | so last year

00:42:47.600 | the

00:42:49.600 | associate with the best paper

00:42:51.600 | in Europe's

00:42:53.600 | was the heads up

00:42:55.600 | Texas no limit hold'em

00:42:57.600 | AI was able to

00:42:59.600 | beat the top level players

00:43:01.600 | what's completely current well not completely

00:43:03.600 | but currently out of reach is the

00:43:05.600 | general not heads up 1 vs 1

00:43:07.600 | but the general team

00:43:09.600 | Texas no limit hold'em

00:43:11.600 | and on the gaming

00:43:13.600 | side this dream

00:43:15.600 | of Dota 2 now that's the benchmark

00:43:17.600 | that everybody's targeting and it's actually

00:43:19.600 | incredibly difficult one and some people think

00:43:21.600 | it'll be a long time before we can

00:43:23.600 | win and on

00:43:25.600 | the more

00:43:27.600 | practical side of things

00:43:29.600 | the

00:43:31.600 | 2018

00:43:33.600 | starting 2017 has been a year

00:43:35.600 | of

00:43:37.600 | the frameworks

00:43:39.600 | growing up

00:43:41.600 | of maturing and creating ecosystems

00:43:43.600 | around them with

00:43:45.600 | TensorFlow

00:43:47.600 | with the history there

00:43:49.600 | dating back a few years has really

00:43:51.600 | with TensorFlow 1.0

00:43:53.600 | as

00:43:55.600 | come to be sort of a mature

00:43:57.600 | framework

00:43:59.600 | PyTorch 1.0 came out in 2018

00:44:01.600 | is matured as well and now

00:44:03.600 | the really exciting developments in

00:44:05.600 | TensorFlow with eager execution

00:44:07.600 | and beyond that's

00:44:09.600 | coming out in TensorFlow 2.0

00:44:11.600 | in 2019 so

00:44:13.600 | really those two

00:44:15.600 | those two players have

00:44:17.600 | made incredible

00:44:19.600 | leaps in

00:44:21.600 | standardizing deep

00:44:23.600 | learning and

00:44:25.600 | the fact that a lot of the ideas I talked

00:44:27.600 | about today and Monday and we'll keep talking

00:44:29.600 | about are all

00:44:31.600 | have a GitHub repository

00:44:33.600 | with implementations in TensorFlow

00:44:35.600 | and PyTorch making it extremely

00:44:37.600 | accessible and that's really exciting

00:44:39.600 | it's probably best to

00:44:41.600 | quote Jeff Hinton

00:44:43.600 | the quote unquote godfather of deep learning

00:44:45.600 | one of the key people

00:44:47.600 | behind backpropagation

00:44:49.600 | said recently of backpropagation

00:44:51.600 | is my view is throw it all away

00:44:53.600 | and start again he

00:44:55.600 | believes backpropagation is

00:44:57.600 | totally broken and an idea that

00:44:59.600 | is ancient and

00:45:01.600 | needs to be completely revolutionized

00:45:03.600 | and the practical protocol

00:45:05.600 | for doing that is he said

00:45:07.600 | the future depends on some graduate student

00:45:09.600 | who's deeply suspicious of everything I've said

00:45:11.600 | that's probably a good

00:45:13.600 | way to

00:45:15.600 | to end the discussion

00:45:17.600 | about what the state

00:45:19.600 | of the art in deep learning

00:45:21.600 | holds because everything we're doing

00:45:23.600 | is fundamentally based on

00:45:25.600 | ideas

00:45:27.600 | from the 60s and the 80s

00:45:29.600 | and really in terms of new ideas

00:45:31.600 | there's not been many new ideas

00:45:33.600 | especially the

00:45:35.600 | state of the art results that I've mentioned

00:45:37.600 | are all based on

00:45:39.600 | fundamentally

00:45:41.600 | on stochastic gradient

00:45:43.600 | descent and backpropagation

00:45:45.600 | it's ripe for

00:45:47.600 | totally new ideas so it's

00:45:49.600 | up to us to define the

00:45:51.600 | real breakthroughs and the real

00:45:53.600 | state of the art 2019

00:45:55.600 | and beyond. So with that

00:45:57.600 | I'd like to thank

00:45:59.600 | you and the stuff is on the

00:46:01.600 | website deeplearning.mit.edu

00:46:03.600 | [applause]

00:46:05.600 | [silence]

00:46:07.600 | [silence]

00:46:09.600 | [silence]

00:46:11.600 | [silence]

00:46:13.600 | [silence]

00:46:15.600 | [silence]

00:46:17.600 | [silence]

00:46:19.600 | [silence]

00:46:21.600 | [silence]

00:46:23.600 | [silence]

Deep Learning State of the Art (2019) - MIT

Chapters