
TensorFlow Tutorial (Sherry Moore, Google Brain)


Chapters

0:00 Introduction
1:10 Thank you
2:00 What is TensorFlow
5:50 How TensorFlow works
7:45 Frontend libraries
8:50 Portability
9:50 How we use it
10:35 Smart Reply
11:25 Games
11:55 Models
12:20 High-level libraries
13:50 Linear Regression
14:20 Mystery Equation
15:35 Jupyter Notebook
19:55 Variable Objects
22:20 Training Graph
25:20 Results
26:35 Other optimizers
28:35 Learn something new
29:50 What is MNIST
31:20 What is important when building a network
35:30 Train graphs
36:25 Placeholders
37:45 Saver
38:45 Reduce Loss
42:20 ls
43:10 Checkpoint
43:45 Return
46:45 Exercises
48:00 Training Evaluations
49:45 TensorFlow is for Machine Learning
50:05 Questions
50:55 Peachy
54:10 Tori
55:05 Load your own data
57:50 Writing TensorFlow in Python

Whisper Transcript

00:00:00.000 | So I'm going to take a picture so I remember how many of you
00:00:02.760 | are here.
00:00:05.560 | Smile.
00:00:06.060 | Like Sami says, my name is Sherry Moore.
00:00:12.240 | I work in the Google Brain team.
00:00:15.640 | So today I'll be giving a tutorial on TensorFlow.
00:00:18.800 | First I'll talk a little bit about what TensorFlow is
00:00:22.120 | and how it works, how we use it at Google.
00:00:26.400 | And then the important part is that I'm
00:00:28.440 | going to work with you together to build a couple models
00:00:33.440 | to solve the most classic machine learning problems,
00:00:37.960 | so-called get your feet wet for those of you from New Zealand.
00:00:41.840 | Anybody from New Zealand?
00:00:45.240 | So hopefully at the end, you'll be going home
00:00:48.440 | with all the tools that you have to build
00:00:51.400 | all the wonderful things that you have watched today,
00:00:53.840 | like all the image recognition, the training
00:00:57.800 | of different colors, arts, making music.
00:01:00.520 | So that's the goal.
00:01:02.800 | So before I go any further, has everybody installed TensorFlow?
00:01:11.440 | Yay, brilliant.
00:01:12.960 | Thank you.
00:01:13.520 | And I would like to acknowledge--
00:01:15.280 | so I know the link here says Sherry-Ann,
00:01:17.040 | but if you have Wolf G, TF tutorial is perfectly fine.
00:01:21.520 | Wolf is actually my colleague who
00:01:23.800 | spent all the time verifying installation
00:01:26.160 | on every single platform.
00:01:27.360 | So I would really like to thank him.
00:01:29.160 | Thanks, Wolf, if you're watching.
00:01:31.200 | And also, I have my wonderful product boss or product manager
00:01:34.800 | in the audience somewhere.
00:01:35.920 | So if you guys have any request for TensorFlow,
00:01:38.840 | make sure that you go find him and tell him
00:01:41.360 | why TensorFlow must support this feature.
00:01:45.200 | Or is Zach somewhere?
00:01:48.000 | All right, so there he is.
00:01:50.480 | So with that, we can move forward
00:01:53.720 | to talk about TensorFlow.
00:01:56.640 | So what exactly is TensorFlow?
00:01:58.000 | TensorFlow is a machine learning library
00:02:00.320 | that we developed at Google.
00:02:03.120 | And we open sourced it last November.
00:02:06.400 | And ever since then, we have become the most, most popular
00:02:10.800 | machine learning library on GitHub.
00:02:13.400 | How do we know?
00:02:15.280 | Because we have over 32,000 stars.
00:02:20.560 | Those of you who track GitHub, you
00:02:22.200 | know how hard it is to get one of those acknowledgments.
00:02:25.000 | And we also have over 14,000 forks.
00:02:28.040 | And we have over 8,000 contributions
00:02:32.480 | from 400 individual developers.
00:02:37.280 | And we designed this specifically
00:02:40.360 | for machine learning.
00:02:41.520 | However, as you'll see later, because
00:02:43.960 | of its really flexible data flow infrastructure,
00:02:47.960 | it makes it really suitable for pretty much any application
00:02:51.640 | that can fit into that model.
00:02:52.840 | Basically, if your model can be asynchronous and fire
00:02:58.600 | when data is ready, it can probably use TensorFlow.
00:03:04.040 | Originally, we worked alongside with other researchers.
00:03:07.240 | As a matter of fact, I was really fortunate.
00:03:09.240 | When I joined the team, I sat right next to Alex,
00:03:12.440 | the person who invented AlexNet.
00:03:14.360 | So that's how closely we worked together.
00:03:17.400 | As we developed TensorFlow, they would tell us, no,
00:03:20.080 | this is not how we use it.
00:03:21.840 | Yes, when you do this, it makes our lives a lot easier.
00:03:24.880 | And this is why we believe that we have developed
00:03:27.280 | an infrastructure that will work really well for researchers.
00:03:30.640 | And also, being Google, we also always
00:03:32.800 | have in mind that we would like to take from research
00:03:36.240 | to prototyping to a production in no time.
00:03:38.720 | We don't want you to write all the code that's typically
00:03:42.200 | just throwing away.
00:03:43.680 | We want you to write code that can literally cut and paste
00:03:46.080 | and save in a file and productize it immediately.
00:03:49.760 | So TensorFlow is really designed with that in mind.
00:03:55.440 | So we are halfway into your deep learning school.
00:03:59.080 | So can anybody tell me, if you want to build a neural net,
00:04:03.640 | what must you have?
00:04:05.400 | What are the primitives?
00:04:07.680 | What are the-- yeah, primitive, I think,
00:04:10.720 | is the word I'm looking for.
00:04:12.160 | What must you have to build a neural net?
00:04:16.680 | Anybody?
00:04:17.640 | What is in the neural net?
00:04:21.320 | [INAUDIBLE]
00:04:23.280 | That's a very good answer.
00:04:25.560 | So in a neural net, you have neurons.
00:04:27.600 | That's right.
00:04:32.240 | So all these neurons, what do they operate on?
00:04:38.280 | What do all these neurons do?
00:04:41.880 | They process data, and they operate on data.
00:04:45.120 | And they do something, such as convolution, matrix
00:04:48.800 | multiplication, max pooling, average pooling, dropout,
00:04:52.240 | whatever that is.
00:04:53.640 | So in TensorFlow, all the data is held
00:04:57.640 | in something called a tensor.
00:05:00.520 | Tensor is nothing more than a multidimensional array.
00:05:03.960 | For those of you who are familiar with NumPy arrays,
00:05:07.040 | it's very similar to the ND array.
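The tensor-equals-multidimensional-array point can be made concrete with plain NumPy (these example shapes are illustrative, not from the lecture notebook):

```python
import numpy as np

# A tensor is just a multidimensional array, distinguished by its rank.
scalar = np.array(3.0)              # rank 0, shape ()
vector = np.array([1.0, 2.0, 3.0])  # rank 1, shape (3,)
matrix = np.ones((2, 3))            # rank 2, shape (2, 3)
images = np.zeros((32, 28, 28, 1))  # rank 4: a batch of 32 grayscale
                                    # 28x28 images, one channel
print(images.ndim)  # 4
```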
00:05:10.320 | And the graph, I think one of the gentlemen
00:05:12.920 | earlier this morning described, there
00:05:15.000 | is this concept of the graph, which
00:05:16.920 | is a composition of all these neurons that
00:05:21.560 | do different functions.
00:05:23.640 | And all these neurons are connected to each other
00:05:26.640 | through their inputs and outputs.
00:05:28.280 | So as data become available, they would fire--
00:05:32.160 | by fire, I mean they do what they're
00:05:33.840 | designed to do, such as doing matrix
00:05:35.800 | multiplication or convolution.
00:05:38.160 | And then they will produce output
00:05:39.520 | for the next computation node that's
00:05:42.280 | connected to the output.
00:05:44.160 | So by doing this--
00:05:46.560 | so I don't know how many of you can actually see this animation.
00:05:51.720 | Yeah?
00:05:53.360 | So this is to really visualize how TensorFlow works.
00:05:57.200 | All these nodes, the oval ones are computation.
00:06:00.560 | The rectangle ones are stateful nodes.
00:06:03.840 | So all these nodes, they would generate output,
00:06:07.160 | or they take input.
00:06:08.320 | And as soon as all the inputs for a particular node
00:06:11.360 | are available, it would do its thing, produce output.
00:06:16.000 | And then the tensor, all the data,
00:06:20.240 | which are held in tensors, will flow through your network.
00:06:24.200 | Therefore, tensor flow.
00:06:26.360 | Yeah?
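That fire-when-inputs-are-ready behavior can be sketched in a few lines of plain Python. This toy executor is not TensorFlow code; the node names and operations are made up to illustrate the dataflow idea:

```python
# Toy dataflow executor: a node "fires" as soon as all of its inputs
# have produced values, the way the lecture describes TensorFlow nodes.
def run_graph(graph, feeds):
    """graph: name -> (fn, [input names]); feeds: name -> value."""
    values = dict(feeds)
    pending = dict(graph)
    while pending:
        for name, (fn, inputs) in list(pending.items()):
            if all(i in values for i in inputs):              # inputs ready?
                values[name] = fn(*(values[i] for i in inputs))  # fire
                del pending[name]
    return values

# A tiny graph: out = relu(x * w + b)
graph = {
    "mul":  (lambda x, w: x * w,    ["x", "w"]),
    "add":  (lambda m, b: m + b,    ["mul", "b"]),
    "relu": (lambda a: max(0.0, a), ["add"]),
}
result = run_graph(graph, {"x": 2.0, "w": -3.0, "b": 1.0})
print(result["relu"])  # 0.0  (2 * -3 + 1 = -5, clipped by relu)
```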
00:06:26.860 | So everybody's like, wow, this sounds like magic.
00:06:32.920 | How does it work?
00:06:34.520 | So who said-- is it Arthur C. Clarke who says,
00:06:38.600 | any sufficiently-- what's the word?
00:06:41.400 | Any sufficiently advanced technology
00:06:43.360 | is indistinguishable from magic.
00:06:45.520 | So that's what this is.
00:06:48.680 | It's just really awesome.
00:06:52.240 | Excuse me for a second.
00:06:53.440 | I know I want to get through this as quickly as possible
00:06:59.040 | so we can actually do the lab that you're all dying to do.
00:07:02.880 | So as any good infrastructure-- so this is--
00:07:05.240 | I want to give you a little image of how
00:07:07.720 | we design this tensor flow.
00:07:11.040 | Just like any well-designed infrastructure,
00:07:13.160 | it has to be really modular.
00:07:15.000 | Because being modular allows you to innovate, to upgrade,
00:07:19.080 | to improve, to modify, to do whatever you want with any
00:07:23.000 | piece, as long as you keep the APIs consistent.
00:07:26.320 | Everybody can work in parallel.
00:07:27.840 | It's really empowering.
00:07:29.280 | I think that's one of the wonderful things that's
00:07:31.360 | done at Google.
00:07:32.080 | Pretty much any infrastructure at Google is really modular.
00:07:35.080 | They talk really well to each other.
00:07:37.040 | All you need to maintain is the API stability.
00:07:42.280 | So in this case, we have a front end.
00:07:45.440 | I think you guys must have seen some examples of how
00:07:48.160 | you construct a graph.
00:07:49.360 | So we have the front end libraries
00:07:52.080 | written in your favorite language.
00:07:53.720 | And if C++ and Python is not your favorite language,
00:07:56.920 | feel free to contribute.
00:07:58.080 | We always welcome contribution.
00:08:00.200 | So you construct your graph in your favorite language.
00:08:03.760 | And this graph will be sent to-- we
00:08:06.840 | call it the core TensorFlow execution system.
00:08:10.400 | That's your runtime.
00:08:11.320 | And that's what you all will be running today on your laptop
00:08:14.560 | when you open your Python notebook or Jupyter notebook.
00:08:20.280 | So the execution runtime, depending
00:08:23.400 | on where you are going to run this application,
00:08:27.640 | it will send the kernel to the corresponding device.
00:08:31.400 | So it could be a CPU, could be a GPU,
00:08:33.000 | could be a phone, could be TPU.
00:08:36.520 | Anybody knows what TPU is?
00:08:40.000 | Brilliant, very nice.
00:08:42.280 | I was at Strata.
00:08:43.320 | I said, anybody knows what TPU is?
00:08:44.920 | And everybody's like, hmm, translation?
00:08:48.600 | So this is good.
00:08:50.160 | So just to highlight our portability,
00:08:53.760 | today you'll be running TensorFlow on your laptop.
00:08:56.320 | We run it in our data center.
00:08:59.040 | Everybody can run it on your iPhone, your Android phone.
00:09:04.700 | I would love to see people putting it on Raspberry Pi,
00:09:07.280 | because can you imagine, you can just
00:09:09.040 | write your own TensorFlow application.
00:09:11.680 | It could be your security system,
00:09:13.920 | because somebody just stole my bike and my security camera
00:09:17.340 | captured all this grainy stuff that I cannot tell.
00:09:20.040 | Wouldn't it be nice if you do machine learning on this thing
00:09:24.680 | and they just start taking high-resolution pictures when
00:09:28.360 | things are moving, rather than constantly capturing
00:09:31.480 | all those grainy images, which is totally useless?
00:09:34.520 | So I think the application-- literally,
00:09:36.160 | applications are limitless.
00:09:39.120 | Your imagination is the limit.
00:09:42.600 | So we talked about what TensorFlow is, how it works.
00:09:48.920 | How do we use it at Google?
00:09:50.480 | We use it everywhere.
00:09:51.500 | I think you have seen some of the examples.
00:09:53.800 | We use it to recognize pictures.
00:09:55.960 | This is actually done with Inception.
00:09:57.560 | They can recognize out of the box 1,000 images.
00:10:03.480 | You have to retrain it if you want it to recognize, say,
00:10:05.920 | all your relatives or your pets.
00:10:08.720 | But it's not difficult. And I have links for you to--
00:10:11.960 | actually, if you want to train on your own images,
00:10:14.160 | it's really easy.
00:10:15.120 | They should totally try it.
00:10:16.600 | Wouldn't it be fun if you go to your 40-year reunion
00:10:20.160 | and you just go, I know who you are.
00:10:23.840 | Just show off a little.
00:10:25.160 | It would be brilliant.
00:10:27.040 | And we also use it to do Google Voice Search.
00:10:31.720 | This is one that's super awesome.
00:10:33.680 | So how many of you use Smart Reply?
00:10:37.240 | Have you ever used Smart Reply?
00:10:38.480 | Yeah, yeah, this is awesome, especially for those of you
00:10:42.240 | who are doing what you're not supposed to do--
00:10:44.840 | texting while driving, you saw an email coming in,
00:10:47.720 | and you can just say, oh, yes, I'll be there.
00:10:50.740 | So based on the statistics that we collected in February,
00:10:55.440 | over 10% of all the responses sent on mobile
00:10:58.920 | are actually done by our Smart Reply.
00:11:01.840 | I believe if we have--
00:11:03.480 | maybe Zach can collect some stats for me later.
00:11:06.440 | And maybe by now, it'll be like 80%.
00:11:10.520 | It's actually really funny.
00:11:11.640 | At the very beginning, when we train it,
00:11:13.800 | the first answer is always, I love you.
00:11:16.880 | We're like, that's probably not the right answer.
00:11:23.080 | We also play games.
00:11:25.080 | All of you, I'm sure, have followed this.
00:11:27.720 | There are all kinds of games that are being developed.
00:11:30.360 | It's really fun to watch if you watch it.
00:11:32.480 | Literally, come up with scenarios for you
00:11:34.680 | to play as well.
00:11:35.800 | It not only learns to play the game,
00:11:38.000 | but learns how to make a game for you.
00:11:40.640 | It's fascinating.
00:11:42.720 | And of course, art.
00:11:43.960 | I think many of you have done this deep dream.
00:11:46.320 | If we have time in the end of the lab, we can try this.
00:11:50.800 | So if we are super fast, we can all try to make some art.
00:11:55.280 | And all those, what I just talked about, of course,
00:11:57.960 | Google being this wonderful, generous company,
00:12:00.960 | wants to share our knowledge.
00:12:02.680 | So we have actually published all our models.
00:12:05.760 | So if you go to that link, you'll
00:12:07.480 | find all these inception and captioning, language
00:12:12.360 | model on a billion words, the latest
00:12:14.760 | ResNet on CIFAR-10, sequence to sequence,
00:12:17.120 | which I think Quoc will be talking about tomorrow.
00:12:20.320 | And we have many other high-level libraries.
00:12:23.920 | So today, my lab, the lab that we will do,
00:12:27.880 | will be on the core TensorFlow APIs.
00:12:30.560 | But there are tons of new higher-level APIs,
00:12:33.560 | such as some of the mentioned Keras.
00:12:36.000 | And we have SLIM.
00:12:38.200 | We have PrettyTensor.
00:12:39.600 | We have TF Learn.
00:12:41.320 | We have many libraries that's developed on top
00:12:44.040 | of the core TensorFlow APIs.
00:12:46.160 | Then we encourage people to do so.
00:12:47.660 | If whatever is out there does not fit your needs perfectly,
00:12:51.080 | go for it.
00:12:51.600 | Develop your own.
00:12:52.360 | And we welcome the contribution.
00:12:54.000 | And we published a lot of that here.
00:12:56.200 | I might have blurred some of the boundaries.
00:12:58.040 | But these are basically all the models and libraries
00:13:01.280 | that we have produced.
00:13:02.600 | And we really love contribution.
00:13:05.440 | If you have developed a really cool model,
00:13:07.520 | please do send to us.
00:13:09.280 | And we will showcase your work.
00:13:14.960 | So that's the introduction of TensorFlow.
00:13:17.820 | How does everybody feel?
00:13:19.580 | Are you all ready to get started?
00:13:23.460 | All right, so OK, before you bring up your Python notebook,
00:13:27.620 | I want to say what we are going to do first.
00:13:30.540 | So as I mentioned, there are two classic machine learning
00:13:33.820 | problems that everybody does.
00:13:35.900 | One is linear regression.
00:13:38.180 | The other is classification.
00:13:39.440 | So we are going to do two simple labs to cover those.
00:13:43.580 | I do have a lot of small exercises you can play with.
00:13:46.140 | I encourage you to play with it to be a lot more comfortable.
00:13:49.460 | So the first one is linear regression.
00:13:52.940 | So I'm sure it has been covered, yeah, in today's lectures.
00:13:55.820 | Somebody must have covered linear regression.
00:13:58.700 | Can anybody give me a one-line summary?
00:14:02.340 | What is a linear regression problem?
00:14:05.820 | Anybody?
00:14:06.320 | The professors--
00:14:10.180 | [LAUGHTER]
00:14:12.620 | [LAUGHS]
00:14:14.540 | Well, if you don't know, go Google it.
00:14:18.940 | So I didn't know the audience when
00:14:22.860 | Sammy asked me to do this.
00:14:24.740 | So I wrote this for one of the high schools.
00:14:28.620 | So I think it still kind of makes sense, right?
00:14:30.780 | Because all of us have played this game
00:14:33.340 | at one point of our lives.
00:14:35.140 | Like, if you tell me 5 or tell you 10,
00:14:39.860 | and you try to guess what the equation is,
00:14:42.420 | we must have all done this.
00:14:44.260 | I think my friends are still doing on Facebook saying, oh,
00:14:47.620 | only genius can solve this kind of equation.
00:14:50.340 | And then they would be like, yeah, I solved it.
00:14:52.860 | I was like, my god, if anybody--
00:14:54.860 | I will unfriend you guys if you click on another one of those.
00:14:59.020 | But basically, this is what we are
00:15:00.660 | trying to do in the first lab.
00:15:02.820 | So we will have a mystery equation.
00:15:04.820 | It's really simple.
00:15:05.980 | It's just a linear--
00:15:07.180 | literally a line.
00:15:08.660 | And then I will tell you that this is the formula.
00:15:12.460 | But I'm not going to give you a weight, w and b.
00:15:15.580 | All of you have learned by now, w stands for weight and b
00:15:19.180 | stands for bias.
00:15:20.580 | So the idea is that if you are given enough samples,
00:15:24.780 | if you are given enough x and y values,
00:15:27.940 | you should be able to make a pretty good guess what w and b are.
00:15:32.940 | So that's what we are going to do.
00:15:35.420 | So now you can bring up your Jupyter Notebook
00:15:39.660 | if you don't have it up already.
00:15:41.100 | Yeah, everybody have it up?
00:15:48.300 | Can I see a show of hands, everybody?
00:15:51.700 | Those of-- yeah, brilliant.
00:15:53.340 | All right.
00:15:55.060 | So for pretty much any models, these
00:15:59.620 | are going to come up over and over again.
00:16:01.380 | And just to make sure that you're all paying attention,
00:16:05.260 | I do have--
00:16:07.980 | I asked Sammy if I was supposed to bring Shrek, and he said no.
00:16:11.540 | But I do have a lot of TensorFlow stickers,
00:16:13.420 | and I have all kinds of little toys.
00:16:16.340 | So later, I'm going to ask this question.
00:16:18.140 | Whoever can answer will get some mystery present.
00:16:22.900 | So really pay attention, OK?
00:16:24.860 | So pretty much whenever you build any model,
00:16:27.620 | there are, I would say, four things that you will need.
00:16:30.380 | You need input.
00:16:31.700 | You need data.
00:16:32.700 | So you're going to see in both labs,
00:16:34.380 | we're going to be defining some data.
00:16:36.580 | You're going to be building an inference graph.
00:16:39.220 | I think in other lectures, it's also called a forward graph,
00:16:42.980 | up to the point that it produces logits, the logistic outputs.
00:16:47.780 | And then you're going to have training operations, which
00:16:52.540 | is where you would define a loss, an optimizer.
00:16:57.220 | And I think that's pretty much it.
00:17:00.220 | Hang on.
00:17:01.020 | And there's a fourth thing.
00:17:04.260 | Yeah, and then you will basically run the graph.
00:17:06.260 | So the three important things, OK, you'll
00:17:07.980 | always have your data, your inference graph.
00:17:10.500 | You always have to define your loss and your optimizer.
00:17:15.060 | And the training is basically to minimize your loss.
00:17:18.020 | So I'm going to be asking that later.
00:17:20.660 | All right.
00:17:21.140 | So now we know what we're going to do.
00:17:23.020 | So you can go to that lab.
00:17:27.180 | Yeah, everybody have it?
00:17:28.140 | So Shift, Return.
00:17:31.460 | We'll run the first one.
00:17:32.900 | You say, I have no idea what's happening.
00:17:35.220 | Here, we turn again.
00:17:36.020 | Still nothing.
00:17:37.060 | However, let's see what we are producing here.
00:17:40.700 | So you can also do the same on your laptop.
00:17:42.860 | You can uncomment that plot.
00:17:46.900 | You're going to say, so you know what kind of data
00:17:49.020 | you're generating.
00:17:49.940 | So in this case, when here we turn, what are we seeing?
00:17:54.980 | This is your input data.
00:17:56.140 | This is when you try to make a guess,
00:17:58.100 | when your friend tell me, oh, give me x and y.
00:18:01.380 | So this is when your x is 0.2, your y is 0.32.
00:18:06.380 | So this is basically your input data.
00:18:09.620 | Yeah, everybody following?
00:18:11.980 | If at any point you're kind of lost, raise your hand,
00:18:14.980 | and your buddy next to you will be able to help you.
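For reference, the quoted point (x = 0.2, y = 0.32) is consistent with the classic getting-started values w = 0.1 and b = 0.3, so the notebook's data cell presumably looks something like this (a guess, not the actual lab code):

```python
import numpy as np

# Assumed parameters of the mystery line: 0.1 * 0.2 + 0.3 = 0.32,
# matching the point shown on screen.
w_true, b_true = 0.1, 0.3
x_data = np.random.uniform(0.0, 1.0, 100).astype(np.float32)
y_data = w_true * x_data + b_true
```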
00:18:18.020 | So now-- oh, OK, I want to say one more thing.
00:18:25.540 | So today, the labs are all on really core TensorFlow APIs.
00:18:30.980 | The reason I want to do that--
00:18:32.420 | I know there are a lot of people who
00:16:33.880 | use Keras or another thing that we heavily advertise,
00:16:38.380 | which is contrib TF, contrib TF Learn.
00:18:41.900 | So I feel like I'm giving you all the ingredients.
00:18:45.620 | So even though you could go to Whole Foods
00:18:49.300 | and buy the package meal, maybe one day you
00:18:52.500 | don't like the way they cook it.
00:18:53.860 | So I'm giving you all your lobsters, your Kobe beef,
00:18:58.580 | so that you can actually assemble whatever
00:19:00.900 | you want to build yourself.
00:19:03.460 | So this next one is very key.
00:19:06.500 | It's a very key concept.
00:19:08.540 | Here you'll see variables.
00:19:10.620 | So variable in TensorFlow is how--
00:19:13.780 | it's corresponding to the square.
00:19:16.460 | Any of you remember this slide?
00:19:18.060 | OK, I'm going to switch quickly.
00:19:20.500 | Don't freak out.
00:19:21.220 | So actually, I wanted you all to commit this little graph
00:19:28.340 | to your memory, because you'll be seeing this over and over
00:19:32.700 | again.
00:19:33.200 | And it makes a lot more sense when you have
00:19:35.000 | this visual representation.
00:19:37.540 | So in TensorFlow, the way we hold all the data,
00:19:42.140 | the weights and the biases associated with your network
00:19:46.660 | is using something called variable.
00:19:48.620 | It's a stateful operation.
00:19:51.740 | I'm going to switch back, OK?
00:19:54.820 | So this is what we are doing in section 1.3.
00:19:57.820 | We are building those square nodes in your network
00:20:02.860 | to hold these weights and variables.
00:20:04.580 | And they are the ones when you train.
00:20:07.100 | That's where the gradients will be applied to,
00:20:09.380 | so that they will eventually resemble the target network
00:20:14.100 | that you are trying to train for.
00:20:17.500 | So now you have built it.
00:20:18.500 | Wonderful.
00:20:19.500 | So you can shift return.
00:20:20.980 | Do you see anything?
00:20:23.980 | Nope.
00:20:26.060 | So exactly what have we built?
00:20:27.820 | Let's uncomment it.
00:20:28.540 | Take a look.
00:20:31.220 | So these are called the variable objects.
00:20:33.660 | So at the bottom of the slide for this lab,
00:20:36.980 | I have a link, which is our Google 3 docs, the API docs,
00:20:42.620 | which is available in GitHub.
00:20:45.780 | I think you should always have that up,
00:20:48.420 | so whenever you want to do something,
00:20:50.260 | you would know what kind of operations
00:20:52.300 | are possible with this object.
00:20:54.840 | For example, I can say here, what's the name of this?
00:21:01.740 | Oh, it's called variable 6.
00:21:02.860 | Why is it called variable 6?
00:21:04.300 | Oh, it's because when I create this variable,
00:21:07.700 | I didn't give it a name.
00:21:08.660 | So I can say Sherry's--
00:21:11.660 | Sherry weight.
00:21:14.660 | I hope that's not--
00:21:15.420 | but so see, now my variable is called Sherry weight.
00:21:22.860 | Same thing with my--
00:21:25.480 | so this would be a good practice, because later--
00:21:25.480 | Sherry bias-- oh, because I ran this so many times.
00:21:45.400 | Every single time you run, if you don't restart,
00:21:47.680 | that is going to continue to grow your current path.
00:21:51.000 | So to avoid that confusion, let me restart it.
00:21:56.120 | Restart.
00:21:56.620 | I had to wait.
00:22:06.480 | Sorry.
00:22:20.640 | So now, so we have done--
00:22:22.480 | built our input, built our inference graph.
00:22:24.600 | Now we can actually build our training graph.
00:22:27.920 | And as you have all learned, we need to define a loss function.
00:22:31.920 | We need to define an optimizer.
00:22:34.600 | I think it's also called something else--
00:22:37.000 | regularizer, maybe some other terms.
00:22:39.560 | And your ultimate goal is to minimize your loss.
00:22:42.760 | So I'm not going to do it here, but you
00:22:44.520 | can do it at your leisure.
00:22:46.440 | You can uncomment all these things that you have created
00:22:51.880 | and see what they are.
00:22:53.280 | And I can tell you these are different operations.
00:22:55.840 | So that's how you actually get to learn about the network
00:22:59.100 | that you have built really well.
00:23:00.680 | In the next line, I'm also not going to uncomment,
00:23:03.200 | but you should at one point.
00:23:05.740 | This is how you can see what you have built.
00:23:08.920 | So actually, why don't we do that?
00:23:10.680 | Because this is really critical.
00:23:12.080 | And as you debug, this would become--
00:23:14.480 | so this is the network that you have built.
00:23:20.680 | They have names, different names.
00:23:23.480 | They have inputs and outputs.
00:23:24.720 | They have attributes.
00:23:25.560 | And this is how we connect all these nodes together.
00:23:29.920 | This is your neural net.
00:23:31.480 | So what you're seeing right now is your neural net
00:23:34.560 | that you have just built. Yeah?
00:23:37.560 | Everybody following?
00:23:39.560 | So now, the next step-- now you're done.
00:23:41.760 | You build your network.
00:23:43.680 | You build all your training.
00:23:44.960 | Now, let's do some training.
00:23:47.920 | So in TensorFlow, do you remember in the architecture
00:23:50.920 | that I showed, you have the front end, C++ and Python
00:23:53.800 | front end.
00:23:54.300 | You use that to build your graphs.
00:23:56.400 | And then you send a graph to your runtime.
00:23:59.040 | And this is exactly what we're doing here.
00:24:01.480 | This is how we talk to the runtime.
00:24:03.240 | We create something called a session.
00:24:05.320 | You get a handle to the session.
00:24:07.400 | And then when you say run, you're
00:24:09.280 | basically sending this session, your graph.
00:24:12.680 | So this is different from the other machine learning
00:24:15.840 | libraries.
00:24:16.720 | I forgot which one.
00:24:17.840 | Those are so-called imperative.
00:24:19.120 | It happens as you type.
00:24:20.880 | TensorFlow is different.
00:24:21.960 | You have to construct your graph.
00:24:23.640 | And then you create a session to talk to your runtime
00:24:26.500 | so that it knows how to run on your different devices.
00:24:29.440 | That's a very important concept because people constantly
00:24:34.200 | compare.
00:24:34.760 | And it's just different.
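The construct-then-run distinction can be miniaturized like this; the closures stand in for graph nodes and calling the finished node stands in for session.run (an analogy, not the TensorFlow API):

```python
# Imperative style: the multiplication happens as soon as the line runs.
a = 2.0 * 3.0

# Graph style: first describe the computation as connected nodes,
# then hand the finished graph to a "runtime" to execute.
def constant(v):
    return lambda: v

def multiply(x, y):
    return lambda: x() * y()  # nothing computed yet, just wiring

node = multiply(constant(2.0), constant(3.0))  # build phase
result = node()                                # the "session.run" phase
print(result)  # 6.0
```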
00:23:38.000 | So now you can also uncomment to see what the initial values are.
00:24:45.280 | But we're not going to do that.
00:24:46.640 | We're just going to run it.
00:24:48.560 | And now we're going to train.
00:24:49.720 | The data is not so--
00:24:56.200 | what do you think of the data?
00:24:58.760 | Did we succeed in guessing?
00:25:01.000 | Is everybody following what we are trying to do?
00:25:07.000 | [CHUCKLES]
00:25:08.720 | Yeah?
00:25:12.160 | So what was our objective before I started the lab?
00:25:15.320 | What did I say our objective was?
00:25:18.760 | Find the mic.
00:25:19.920 | Yes, to guess the mystery function.
00:25:21.720 | So have we succeeded?
00:25:24.360 | It's really hard to tell.
00:25:26.040 | All right, so now all of you can go to the end
00:25:29.240 | and comment this part.
00:25:31.360 | Let's see how successful we are.
00:25:33.840 | [PAUSE]
00:25:36.280 | So the green line was what we have initialized our weight
00:25:46.400 | and bias to.
00:25:49.000 | Yeah?
00:25:49.560 | The blue dots were the initial value, the target values.
00:25:55.880 | And the red dots is our trained value.
00:26:00.240 | Make sense?
00:26:01.840 | So how successful are we?
00:26:04.120 | Great big success?
00:26:05.800 | Yeah, I would say so.
00:26:07.600 | So any questions?
00:26:11.120 | Any questions so far?
00:26:12.720 | So what are the things?
00:26:13.680 | So everybody should play with this.
00:26:15.120 | You're not going to break it.
00:26:16.320 | This is a notebook, Python notebook.
00:26:18.400 | The worst that happens is they would just say, OK, clear all.
00:26:21.160 | Like, well, I just did, and change it.
00:26:23.240 | So what can you play with?
00:26:24.320 | Since today you learned all these concepts
00:26:26.600 | about different loss functions, different optimizers,
00:26:29.640 | all this crazy different inputs, different data.
00:26:32.000 | So now you can play with it.
00:26:33.880 | How about instead of--
00:26:36.600 | let's pick one.
00:26:39.120 | So instead of gradient descent, what are the other optimizers?
00:26:47.160 | How can you find out?
00:26:48.080 | I guess that's a better question.
00:26:49.960 | If I want to know what other optimizers are
00:26:51.760 | available in TensorFlow, how can I find out?
00:26:55.480 | Very good.
00:26:56.440 | Yes, the GitHub, Google 3, the G3 doc link with the APIs.
00:27:03.240 | I'm going to switch one more tab.
00:27:05.240 | Bear with me.
00:27:06.280 | So this is-- when you go there, this is what you can find.
00:27:10.720 | You can find all the--
00:27:12.440 | let me make it bigger.
00:27:15.800 | So you can find all the different optimizers.
00:27:17.840 | So you can play with that.
00:27:19.400 | So maybe gradient descent is not the best optimizer you can use.
00:27:24.340 | So you go there and say, what are the other optimizers?
00:27:28.400 | And then you can literally come here and search optimizer.
00:27:32.600 | Well, you can say, wow, I have Adadelta, Adagrad,
00:27:38.200 | Adam.
00:27:39.400 | I'm sure there are more-- a momentum.
00:27:43.800 | If you don't like any of these, please do go contribute.
00:27:48.040 | A new optimizer, send a pull request.
00:27:50.120 | We would love to have it.
00:27:51.640 | So I would like to say this over and over again.
00:27:53.960 | We love contribution.
00:27:55.000 | It's an open source project.
00:27:56.680 | So keep that in mind.
00:27:57.660 | We would love to see your code or your models on GitHub.
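To see what swapping the optimizer actually changes, here are two update rules side by side on the toy loss (w - 3)^2: plain gradient descent versus momentum. The learning rate and momentum coefficient are illustrative choices, not recommendations:

```python
# Gradient of the toy loss f(w) = (w - 3)^2.
def grad(w):
    return 2.0 * (w - 3.0)

lr = 0.1

# Plain gradient descent: step directly down the gradient.
w_gd = 0.0
for _ in range(300):
    w_gd -= lr * grad(w_gd)

# Momentum: accumulate a velocity so consecutive steps in the same
# direction reinforce each other (mu is the momentum coefficient).
w_mom, v, mu = 0.0, 0.0, 0.9
for _ in range(300):
    v = mu * v - lr * grad(w_mom)
    w_mom += v

print(round(w_gd, 4), round(w_mom, 4))  # both converge to 3.0
```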
00:28:03.160 | So back to this one.
00:28:09.960 | How is everybody feeling?
00:28:11.120 | This is too simple?
00:28:12.320 | Yeah?
00:28:13.240 | Should we go faster?
00:28:15.880 | [INAUDIBLE]
00:28:23.360 | Can I say that one?
00:28:26.760 | Yeah.
00:28:27.260 | I mean, the gist is not good.
00:28:29.720 | The optimization rule says that it's
00:28:33.720 | up to all the [INAUDIBLE]
00:28:37.160 | Oh, is that right?
00:28:39.480 | Hit Tab to see all the other optimizers, you mean?
00:28:43.080 | Oh, brilliant.
00:28:43.960 | See, I didn't even know that.
00:28:45.400 | Learn something new every day.
00:28:47.400 | Let me go there.
00:28:51.280 | Here?
00:28:53.080 | Oh, yay.
00:28:56.160 | So this is even easier.
00:28:58.480 | Thank you.
00:28:59.640 | Clearly, I don't program in Notebook
00:29:01.720 | as often as I should have.
00:29:04.640 | So this is where you can-- all the wonderful things
00:29:07.120 | that you can do.
00:29:07.920 | Thank you.
00:29:08.440 | This is probably a little too low level.
00:29:14.560 | I think it has everything.
00:29:17.800 | But that's a very good tip.
00:29:18.960 | Thank you.
00:29:20.760 | So anything else you would like to see with linear regression?
00:29:23.640 | It's too simple.
00:29:24.320 | You guys all want to recognize some digits.
00:29:28.960 | All right.
00:29:29.840 | So that sounds like a consensus to me.
00:29:33.960 | So let's move.
00:29:34.680 | If you just go to the bottom, you can say--
00:29:37.520 | click on this one.
00:29:38.320 | [VIDEO PLAYBACK]
00:29:51.240 | So this is our MNIST model.
00:29:53.960 | So before we start the lab, so once again,
00:29:56.080 | what are we trying to do?
00:29:58.720 | So we have all these handwritten digits.
00:30:00.640 | What does MNIST stand for?
00:30:02.280 | Does anybody know?
00:30:03.280 | What does MNIST stand for?
00:30:04.360 | [INAUDIBLE]
00:30:12.240 | Very good.
00:30:12.760 | See, somebody can Google.
00:30:15.280 | Very good.
00:29:16.920 | So it stands for, I think, Modified National Institute
00:29:20.360 | of Standards and Technology, something like that.
00:30:24.040 | So they have this giant collection of digits.
00:30:26.520 | So if you go to the post office, you already
00:29:30.080 | know that it's trivial.
00:30:31.280 | It's a solved problem.
00:30:32.200 | But I don't know if they actually
00:30:33.840 | use machine learning.
00:30:35.040 | But our goal today is to build a little network using TensorFlow
00:30:40.920 | that can recognize these digits.
00:30:44.280 | Once again, we will not have all the answers.
00:30:47.040 | So all we know is that the input
00:30:50.760 | will give us a 1,
00:30:52.320 | and then the network will say it's a 9.
00:30:54.880 | And then we have the so-called ground truth.
00:30:59.560 | And then they will look at it and say, no, you're wrong.
00:31:01.880 | And then we'll have to say, OK, fine.
00:31:03.440 | This is the difference.
00:31:04.520 | We are going to train the network that way.
00:31:06.840 | So that's our goal.
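The training signal described above can be sketched in a few lines: the network guesses a digit for each image, the ground-truth label says whether it was right, and the difference is what training tries to shrink. Toy data only; no real model here:

```python
# Hypothetical guesses from a network, against the dataset's ground-truth labels.
predictions = [9, 3, 5, 1, 7]
labels      = [9, 5, 5, 1, 2]

# Count the disagreements; training drives this number down.
errors = sum(1 for p, y in zip(predictions, labels) if p != y)
accuracy = 1.0 - errors / len(labels)
print(errors, accuracy)  # 2 wrong out of 5 -> accuracy 0.6
```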
00:31:09.840 | Yeah?
00:31:10.520 | Everybody see the network on the slide?
00:31:12.140 | So now we can go to the lab.
00:31:14.600 | So can anybody tell me what are the three or four things that's
00:31:23.280 | really important whenever you build a network?
00:31:25.640 | What's the first one?
00:31:28.040 | Your data.
00:31:28.720 | Second one?
00:31:31.640 | Inference graph.
00:31:32.560 | Third one?
00:31:35.600 | Your train graph.
00:31:36.680 | And with this lab, I'm going to teach you a little bit more.
00:31:40.320 | They are like the rock.
00:31:42.400 | Like when you go to a restaurant, I not only give you
00:31:44.560 | your lobster or your Kobe beef, I'm
00:31:46.720 | also going to give you a little rock so you can cook it.
00:31:50.000 | So in this lab, I'll also teach some absolutely critical
00:31:54.480 | additional infrastructure pieces,
00:31:56.480 | such as how to save a checkpoint,
00:31:59.200 | how to load from a checkpoint, and how
00:32:01.200 | do you evaluate your network.
00:32:03.160 | I think somebody at one point asked,
00:32:05.160 | how do you know the network is good enough?
00:32:07.040 | You evaluate it to see if it's good enough.
00:32:09.520 | So those are the three new pieces of information
00:32:12.560 | that I'll be teaching you.
00:32:14.560 | And also, I'll teach you a really, really useful concept.
00:32:18.000 | It's called placeholder.
00:32:20.600 | That was requested by all the researchers.
00:32:22.840 | We didn't use to have it, but they all came to us and said,
00:32:26.520 | when I train, I want to be able to feed my network any data
00:32:29.240 | I want.
00:32:29.760 | So that's a really key concept that's
00:32:31.440 | really useful for any practical training.
00:32:34.600 | Whenever you start writing real training code,
00:32:37.280 | I think that will come in handy.
00:32:39.040 | So those are the, I think, four concepts now
00:32:41.880 | that I will introduce in this lab that's
00:32:44.080 | slightly different from the previous one--
00:32:46.240 | how to save checkpoint, how to load from checkpoint,
00:32:48.800 | how to run evaluation, and how to use placeholders.
00:32:51.440 | I think the placeholder is actually
00:32:52.960 | going to be the first one.
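The placeholder idea above, build the graph once and feed it different data each run, can be sketched in plain Python. This mirrors `tf.placeholder` plus a feed dict in spirit only; it is not the TensorFlow API, and the names are illustrative:

```python
# The "graph" is built once as a function of unspecified inputs;
# the actual data arrives later through a feed dict, per run.

def build_graph():
    def run(feed):
        labels = feed["labels"]
        # stand-in "inference": guess the most common label (deliberately silly)
        guess = max(set(labels), key=labels.count)
        return sum(1 for y in labels if y == guess) / len(labels)
    return run

run = build_graph()                                # build once...
print(run({"images": [], "labels": [1, 1, 2]}))    # ...feed it training data
print(run({"images": [], "labels": [3, 3, 3]}))    # ...feed it eval data, same graph
```

The payoff is exactly what the researchers asked for: training, inference, and evaluation can all reuse one graph, just with different feeds.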
00:32:54.160 | So once again, we have our typical boilerplate stuff.
00:32:57.760 | So you hit Return, you import a bunch of libraries.
00:33:02.920 | The second one, this is just for convenience.
00:33:06.840 | I define a set of constants.
00:33:10.040 | Some of them you can play with, such as the maximum number
00:33:13.000 | of steps, where you're going to save all your data,
00:33:16.320 | how big the batch sizes are, but some other things
00:33:19.320 | that you cannot change because of the data
00:33:21.320 | that I'm providing you.
00:33:22.400 | For example, the MNIST pictures.
00:33:26.840 | Any questions so far?
00:33:27.880 | So now we'll read some data.
00:33:35.000 | Is everybody there in 2.3?
00:33:36.840 | I'm at 2.3 right now.
00:33:38.680 | So now I use--
00:33:41.120 | if you don't have /tmp, it might be an issue,
00:33:43.520 | but hopefully you do.
00:33:46.000 | If you don't have /tmp, change the directory name.
00:33:52.360 | So the next one is where we build inference.
00:33:54.600 | So can anybody just glance and then
00:33:56.360 | tell me what we're building?
00:33:58.840 | What kind of network?
00:33:59.760 | How many layers am I building?
00:34:01.120 | I have two hidden layers.
00:34:07.480 | You have all learned hidden layers today.
00:34:12.080 | And I also have a linear layer, which will produce logits.
00:34:19.520 | That's correct.
00:34:20.520 | So that's what all the inference graphs will always do.
00:34:23.840 | They always construct your graph,
00:34:28.520 | and they produce logit outputs.
00:34:28.520 | So once again, here you can uncomment it and see
00:34:31.480 | what kind of graph you have built.
00:34:34.440 | Once you have done the whole tutorial by yourself,
00:34:38.880 | you can actually run TensorBoard,
00:34:41.480 | and you can actually load this graph that you have saved.
00:34:44.920 | And you can visualize it, like what I have shown in the slide.
00:34:48.600 | I didn't draw that slide by hand.
00:34:51.200 | It's actually produced by TensorBoard.
00:34:53.160 | So you can see the connection of all your nodes.
00:34:56.200 | So I feel that that visual representation
00:34:58.840 | is really important.
00:34:59.680 | Also, it's very easy for you to validate
00:35:01.560 | that you have indeed built the graph that you intended.
00:35:04.040 | Sometimes people call something repeatedly,
00:35:06.480 | and they have generated this gigantic graph.
00:35:08.480 | They're like, oh, that wasn't what I meant.
00:35:10.400 | So being able to visualize is really important.
00:35:14.560 | Any questions so far?
00:35:16.760 | See here, I have good habits.
00:35:18.280 | I actually gave all my variables names.
00:35:20.640 | Once again, the hidden layer 1, hidden layer 2.
00:35:22.960 | They all have weights and biases, weights and biases,
00:35:26.440 | et cetera.
00:35:28.040 | So now we're going to build our train graph.
00:35:30.160 | So here is-- actually, here, there's no new concept.
00:35:35.800 | Once again, you define the loss function.
00:35:38.720 | We once again pick gradient descent as our optimizer.
00:35:41.920 | We added a global step variable.
00:35:44.560 | That's what we will use later when we save our checkpoints.
00:35:48.000 | So you actually know at which point, what checkpoint
00:35:52.480 | this corresponds to.
00:35:53.480 | Otherwise, if you always save it to the same name,
00:35:55.960 | then later you say, wow, this result is so wonderful.
00:36:01.880 | But how long did it take?
00:36:03.240 | You have no idea.
00:36:04.040 | So that's a training concept that we introduced.
00:36:07.480 | It's called global step, basically
00:36:08.940 | how long you have trained.
00:36:10.360 | And we usually save that with the checkpoint
00:36:13.080 | so you know which checkpoint has the best information.
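A minimal sketch of the global-step idea: a counter that ticks once per training step and gets baked into each checkpoint's name, so you can tell later how long a given checkpoint had trained. The naming convention and helper below are illustrative, not TensorFlow's exact behavior:

```python
# Tag each saved checkpoint with the global step at which it was written.

def checkpoint_name(base, global_step):
    return "%s-%d" % (base, global_step)

global_step = 0
saved = []
for step in range(500):
    global_step += 1              # incremented once per training step
    if global_step % 100 == 0:    # save every 100 steps
        saved.append(checkpoint_name("model.ckpt", global_step))

print(saved)  # five checkpoints, from model.ckpt-100 through model.ckpt-500
```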
00:36:18.080 | Yeah, everybody is good at 2.5?
00:36:21.040 | So now the next one is the additional stuff
00:36:23.440 | that I just mentioned.
00:36:26.720 | That piece of rock that I'm giving you now
00:36:28.520 | to cook your stuff.
00:36:29.880 | So one is a placeholder.
00:36:33.040 | So we are going to define two, one to hold your image
00:36:36.080 | and the other to hold your labels.
00:36:39.760 | We build it this way so that we only
00:36:41.740 | need to build a graph once.
00:36:43.880 | And we will be able to use it for training, inference,
00:36:47.960 | and evaluation later.
00:36:49.560 | It's very handy.
00:36:51.320 | You don't have to do it this way.
00:36:52.880 | And one of the exercises I put in my slide
00:36:55.080 | is to try to do it differently.
00:36:57.240 | But this is a very handy way and get you
00:36:59.000 | very far with minimum work.
00:37:01.040 | So as I said in the slides, I know
00:37:05.280 | I don't have any highlighters or laser beams.
00:37:09.120 | But you see there it says, after you create your placeholders,
00:37:13.440 | I said, add to collection and remember this op.
00:37:17.680 | And later we'll see how we're going to call this op
00:37:19.920 | and how we're going to use it.
00:37:23.160 | And the next one, we're going to call our inference,
00:37:26.360 | build our inference.
00:37:29.480 | Is everybody following this part OK?
00:37:32.200 | And once again, we remember our logits.
00:37:33.920 | And then we create our train op and our loss op,
00:37:39.880 | just like with linear regression.
00:37:44.000 | Just like with the linear regression,
00:37:45.540 | we're going to initialize all our variables.
00:37:48.080 | And now at the bottom of this cell,
00:37:52.560 | that's the second new concept that I'm introducing,
00:37:55.160 | which is the saver.
00:37:56.760 | This is what you will use to do checkpoints,
00:37:59.720 | to save the states of your network
00:38:01.640 | so that later you can evaluate it.
00:38:03.800 | Or if your training was interrupted,
00:38:06.480 | you can load from a previous checkpoint
00:38:08.280 | and continue training from there,
00:38:10.120 | rather than always reinitialize all your variables
00:38:13.840 | and start from scratch.
00:38:15.200 | When you're training really big networks, such as Inception,
00:38:17.820 | it's absolutely critical.
00:38:19.500 | Because I think when I first trained Inception,
00:38:23.540 | it took probably six days.
00:38:26.060 | And then later, when we have 50 replicas,
00:38:28.240 | it still took-- like, state of the art is still 2 and 1/2
00:38:30.920 | days.
00:38:31.520 | You don't want to have to start from scratch every single time.
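The save-and-resume workflow described above can be sketched with the standard library. TensorFlow's `Saver` stores variables in its own format; here a plain dict and `pickle` stand in for the real thing:

```python
# Save the training state to disk, then restore it, so an interrupted run
# can resume from where it left off instead of starting over.
import os
import pickle
import tempfile

path = os.path.join(tempfile.gettempdir(), "toy.ckpt")

state = {"weights": [0.1, 0.2], "global_step": 300}
with open(path, "wb") as f:
    pickle.dump(state, f)          # stand-in for saver.save(...)

with open(path, "rb") as f:
    restored = pickle.load(f)      # stand-in for saver.restore(...)

# Training can now continue from step 300 rather than step 0.
print(restored["global_step"])
```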
00:38:36.140 | So yeah, everybody got that?
00:38:38.860 | The placeholder and the saver.
00:38:42.780 | So now it's 2.7.
00:38:45.340 | We're going to go to 2.7.
00:38:51.300 | Lots of code.
00:38:52.620 | Can anybody tell me what it's trying to do?
00:38:54.420 | So this is an-- yes.
00:39:07.300 | So it's trying to minimize loss.
00:39:10.820 | We can actually see this.
00:39:14.340 | So we'll run it once, OK?
00:39:16.300 | Where did I go?
00:39:24.740 | Very fast.
00:39:25.580 | It's done.
00:39:28.500 | But what if I really want to see what it's doing?
00:39:32.580 | So Python is wonderful.
00:39:35.620 | So I would like to actually see--
00:39:38.100 | did somebody show how you know your training
00:39:40.660 | is going well?
00:39:41.420 | They show the loss going down, going down.
00:39:43.460 | Oh, I think my training is going really well.
00:39:46.060 | So we're going to do something similar.
00:39:48.420 | Sorry.
00:39:49.620 | So I'm going to create a variable.
00:39:52.860 | What do you call it?
00:39:53.700 | Losses?
00:39:56.340 | Which is just an array.
00:40:00.060 | So here, I'm actually going to remember it.
00:40:06.300 | Append.
00:40:06.800 | So what am I collecting?
00:40:17.260 | Matplotlib.
00:40:25.280 | Anybody remember this?
00:40:35.380 | It's a plot.
00:40:38.500 | Let's try this.
00:40:39.260 | Oh, look at that.
00:40:49.100 | Now, do you see your loss going down?
00:40:51.820 | So as you train, your loss actually goes down.
00:40:55.900 | So this is how, when you do large-scale training,
00:40:59.660 | this is what we typically do.
00:41:00.820 | We have a gazillion of these jobs running.
00:41:04.060 | In the morning, we would just glance at it,
00:41:06.420 | and we know, oh, which one is doing really, really well.
00:41:09.660 | So of course, that's just when you are prototyping.
00:41:12.820 | That's a really, really handy tool.
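The loss-watching trick demonstrated above is just this: append the loss at every step to a list, then plot it with matplotlib or eyeball the trend. The "training" below is a toy gradient-descent loop, not the MNIST model:

```python
# Collect the loss at each step; a healthy run shows it shrinking toward zero.

def loss(w):
    return w * w

losses = []
w = 4.0
for step in range(50):
    losses.append(loss(w))
    w -= 0.1 * 2.0 * w    # one gradient-descent step on loss(w) = w^2

print(losses[0], losses[-1])  # the loss goes down as training proceeds
# With matplotlib available you would follow with: plt.plot(losses)
```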
00:41:14.580 | But I'm going to show you something even better.
00:41:17.660 | Oh, that's part of the exercise.
00:41:20.260 | Man, I don't have it.
00:41:21.700 | So as one of the exercises, I also
00:41:24.460 | put the answers in the backup slides
00:41:27.580 | that you guys are welcome to cut and paste into a cell.
00:41:30.860 | Then you can actually run all the evaluation
00:41:34.940 | sets against your checkpoint so that you know
00:41:38.100 | how well you're performing.
00:41:39.300 | So you don't have to rely on your eyes,
00:41:42.300 | glancing, oh, my loss is going down,
00:41:44.620 | or relying on validating a single image.
00:41:48.700 | But see, this is how easy it is.
00:41:50.540 | This is how easy the prototype.
00:41:52.260 | And you can learn it.
00:41:53.700 | Very often, our researchers will cut and paste their Colab code
00:41:58.380 | and put it in a file, and that's basically their algorithm.
00:42:02.260 | And they will publish that with their paper.
00:42:04.740 | They would send it to our data scientists or production
00:42:09.420 | people.
00:42:10.060 | We would actually prototype some of their research.
00:42:13.300 | This is how easy, literally, from research to prototyping
00:42:16.460 | to production.
00:42:17.740 | Really streamlined, and you can do it in no time.
00:42:22.140 | So for those of you who have run this step,
00:42:25.340 | can you do an LS in your data path,
00:42:27.980 | wherever you saved that, wherever you declare
00:42:32.140 | your trainer to be?
00:42:33.980 | What do you see in there?
00:42:34.980 | Checkpoints.
00:42:39.380 | That's right.
00:42:40.460 | That's the money.
00:42:42.460 | That's after all this work, all this training
00:42:46.940 | on all these gazillion machines.
00:42:48.580 | That's where all your weights, your biases are stored,
00:42:52.180 | so that later you can load this network up and do your inference
00:42:58.540 | to recognize images, to reply to email, to do art,
00:43:03.900 | et cetera, et cetera.
00:43:04.980 | So that's really critical.
00:43:06.340 | But how do we use it?
00:43:09.100 | Have no fear.
00:43:10.700 | All right, let's move on to 2.8, if you are not already there.
00:43:15.140 | So can somebody tell me what we are trying to do first?
00:43:22.060 | That's right.
00:43:22.620 | First, we load the checkpoint.
00:43:24.140 | And you remember all the things that we told our program
00:43:28.700 | to remember, the logits, and the image placeholder,
00:43:33.380 | and the label placeholder.
00:43:34.580 | How are we going to use it now?
00:43:36.900 | We're going to feed it some images from our evaluation
00:43:40.260 | and see what it thinks.
00:43:41.980 | So now if you hit Return, what's the ground truth?
00:43:50.340 | Five.
00:43:51.180 | What's our prediction?
00:43:52.180 | Three.
00:43:55.500 | What's the actual image?
00:43:56.580 | Could be three, could be five.
00:44:02.580 | But so the machine is getting pretty close.
00:44:05.660 | I would say that's a three.
00:44:07.020 | OK, let's try a different one.
00:44:13.100 | So you can hit Return again in the same cell.
00:44:16.340 | Oh, I need to somehow move this.
00:44:18.140 | So what's the ground truth this time?
00:44:19.660 | [INAUDIBLE]
00:44:21.540 | Yeah, I got it right.
00:44:23.100 | So you can keep hitting.
00:44:24.500 | You can keep hitting Return and see how well it's doing.
00:44:29.540 | But instead of validating, instead
00:44:31.940 | of hitting Return 100 times and count how many times
00:44:34.860 | it has gotten it wrong, as I said in one of the exercises,
00:44:38.460 | and I also put the answer in the slides,
00:44:41.500 | so you can cut and paste and actually
00:44:42.980 | do a complete validation on the whole validation set.
00:44:48.660 | But what do you think?
00:44:50.980 | So you can actually handwrite a different digit.
00:44:54.460 | But the trick is that a lot of people actually tried that
00:44:56.780 | and told me it doesn't seem to work.
00:44:59.460 | So remember on the slide, I said this is what the machine sees.
00:45:03.300 | This is what your eye sees, and this is what the machine sees.
00:45:06.420 | So in the MNIST data set, all the numbers
00:45:09.100 | are between 0 and 1, I believe.
00:45:11.820 | I could be wrong, but I believe it's between 0 and 1.
00:45:14.300 | So if you just use a random tool like your phone,
00:45:17.100 | you write a number and you upload it, number one,
00:45:20.100 | the picture might be too big and you need to scale it down.
00:45:25.060 | Number two, it might have a different representation.
00:45:27.900 | Sometimes it's from 0 to 255, and you
00:45:30.420 | need to scale it to the range that MNIST uses.
00:45:34.460 | That's how you have trained your network.
00:45:36.500 | If you train your network with those data,
00:45:38.580 | and then it should be able to recognize
00:45:40.220 | the same set of data, just like when we teach a baby, right?
00:45:43.900 | If you have never been exposed to something,
00:45:46.860 | you are not going to be able to recognize it.
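The preprocessing pitfall described above is easy to sketch: MNIST training images are floats in roughly [0, 1], so a digit you capture yourself as 8-bit 0-255 integers has to be rescaled (and resized to 28x28) before the network can make sense of it:

```python
# Map 8-bit pixel values into the [0, 1] range the network was trained on.

raw_pixels = [0, 64, 128, 255]             # a few pixel values as captured
scaled = [p / 255.0 for p in raw_pixels]   # divide by the 8-bit maximum

print(scaled)  # values now lie between 0.0 and 1.0
```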
00:45:49.580 | Just like with the captioning demo-- one of our colleagues
00:45:54.460 | wrote that captioning program a while ago.
00:46:02.100 | have anybody played with that captioning software?
00:46:05.460 | It's super fun.
00:46:07.460 | So you can take a picture and say,
00:46:09.620 | two people eating pizza or dog surfing.
00:46:14.740 | But any time it sees something that it has never
00:46:17.380 | been trained on, it would say, man talking on a cell phone.
00:46:21.540 | So for a while, we had a lot of fun with it.
00:46:23.980 | We would put a watermelon on the post,
00:46:26.060 | and it would say, man talking on a cell phone.
00:46:28.140 | You put a bunch of furniture in the room with nothing,
00:46:31.220 | and it would say, man talking on a cell phone.
00:46:33.140 | So it was really fun.
00:46:34.100 | But just like with your numbers, if you have never
00:46:36.860 | trained it with that style--
00:46:40.100 | like if I write Chinese characters here,
00:46:42.420 | it's never going to recognize it.
00:46:44.100 | But this is pretty fun, so you can play with it.
00:46:46.900 | You can see how well--
00:46:48.780 | see every time.
00:46:49.620 | See, so far, it's 100% other than the first one,
00:46:52.100 | which I cannot tell either.
00:46:54.740 | So what are some of the exercises that we can do here?
00:46:57.900 | What do you want to do with this lab?
00:47:00.220 | It's too easy, huh?
00:47:01.580 | Because I made this so easy, because I
00:47:03.260 | didn't know that you guys are all experts by now.
00:47:05.860 | Otherwise, I would have done a much harder lab.
00:47:09.660 | Let me see what things we can do.
00:47:12.620 | So you can uncomment all the graphs.
00:47:16.940 | Oh, so here's one.
00:47:18.900 | Actually, you already see it.
00:47:20.700 | So try this.
00:47:21.820 | Can you guys try saving the checkpoint, say, every 100
00:47:26.460 | steps?
00:47:26.960 | And you're going to have a gazillion,
00:47:32.380 | but they're tiny, tiny checkpoints, so it's OK.
00:47:34.860 | And try the run evaluation with a different checkpoint
00:47:37.700 | and see what you get.
00:47:38.580 | Do you know how to do that?
00:47:39.740 | Yeah, everybody know how to do that?
00:47:41.260 | So the idea is that when you run the evaluation,
00:47:52.100 | it's very similar.
00:47:54.700 | So we typically run training and evaluation
00:47:57.060 | in parallel or validation.
00:47:59.380 | So as it trains, every so often, say, every half an hour,
00:48:05.140 | depending on your problem, so with the inception,
00:48:08.220 | every 10 minutes, we would also run evaluation
00:48:11.220 | to see how well our model is doing.
00:48:13.300 | So if our model gets to, say, 78.6%, which I believe
00:48:16.900 | is the state of the art, it would be like, oh,
00:48:18.860 | my model's done training.
00:48:20.260 | So that's why you want to save checkpoints often and then
00:48:25.140 | validate them often.
00:48:26.100 | If you're done with that already,
00:48:30.140 | this is the last thing I want to show you.
00:48:32.980 | If you're done with that already,
00:48:34.900 | did you notice anything?
00:48:35.980 | If you try to load from a really early checkpoint,
00:48:41.620 | how good is it when it tries to identify the digits?
00:48:48.500 | Just take a wild guess.
00:48:49.820 | Yeah, very bad.
00:48:53.060 | Maybe every other one is wrong.
00:48:55.980 | But this MNIST is such a small data set.
00:48:58.140 | It's very easy to train.
00:48:59.500 | And we have such a deep network.
00:49:01.420 | If you only have one layer, maybe it won't get it right.
00:49:03.740 | So another exercise-- I think all these you
00:49:10.420 | can do after this session--
00:49:13.940 | is really try to learn to run evaluation from scratch
00:49:18.980 | rather than--
00:49:20.580 | actually, another part-- but run evaluation
00:49:22.900 | on the complete validation set.
00:49:25.380 | That's a really necessary skill to develop
00:49:28.420 | as you build bigger models and you need to run validation.
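Evaluating on the complete validation set, rather than hitting Return one image at a time, boils down to the loop below. The model here is a hypothetical stub standing in for restoring a checkpoint and running inference; every name is illustrative:

```python
# Run every validation example through the model, count the matches,
# and report a single accuracy number for this checkpoint.

def model(image):
    # stub inference: in the real lab this would feed the image placeholder
    # and read the argmax of the logits
    return image % 10

validation_set = [(i, i % 10) for i in range(100)]   # (image, label) pairs

correct = sum(1 for image, label in validation_set if model(image) == label)
accuracy = correct / len(validation_set)
print(accuracy)
```

Running this same loop against checkpoints saved at different global steps is exactly the exercise above: early checkpoints score badly, later ones converge.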
00:49:31.940 | [END PLAYBACK]
00:49:34.420 | So I think this is the end of my lab.
00:49:39.540 | I do have bonus labs.
00:49:42.700 | But I want to cover this first.
00:49:44.500 | The bottom line is that TensorFlow is really--
00:49:47.620 | it's for machine learning.
00:49:48.700 | It's really from research to prototyping to production.
00:49:51.860 | It's really designed for that.
00:49:53.540 | And I really hope everybody in the audience can give it a try.
00:49:58.340 | And if there are any features that you find it lacking
00:50:01.700 | that you would like to see implemented,
00:50:04.860 | either send us pull requests.
00:50:06.780 | We always welcome contribution.
00:50:08.980 | Or talk to my wonderful product manager, Zach,
00:50:11.660 | sitting over there.
00:50:12.540 | He is taking requests for features.
00:50:16.620 | So with that, yeah, thanks and have fun.
00:50:19.580 | [APPLAUSE]
00:50:21.980 | Thank you, Sherry.
00:50:27.660 | We have time for questions for those who actually tried it.
00:50:31.820 | See, it's so well done.
00:50:35.380 | Everybody feel like they're experts.
00:50:36.900 | They're all ready to go make arts now, right?
00:50:39.420 | Go deep dream.
00:50:40.260 | Cool.
00:50:45.580 | If there are no questions--
00:50:49.140 | oh, there's one question, I think,
00:50:50.500 | someone who's trying desperately.
00:50:56.220 | Hi, my name is Pichin Lo.
00:50:57.940 | And first of all, thank you for introducing TensorFlow
00:51:01.860 | and for designing it.
00:51:04.020 | I have two questions.
00:51:04.980 | So the first question is, I know that TensorFlow
00:51:08.860 | have C++ API, right?
00:51:11.620 | So let's say if I use Keras or any of the Python front end,
00:51:15.780 | I train a model.
00:51:16.620 | Does TensorFlow support that I can pull out the C++ model
00:51:24.060 | of it and then just use that?
00:51:26.260 | Yes, you can.
00:51:27.340 | So even if I use, for example, Keras custom layer
00:51:31.060 | that I code using Python, I still can get those things?
00:51:33.660 | That's correct.
00:51:34.460 | Oh, there it goes.
00:51:34.940 | It's just the front end that's different,
00:51:36.660 | how you construct the graph.
00:51:38.020 | Nice.
00:51:38.540 | But we are not as complete on our C++ API design.
00:51:43.740 | For example, a lot of the training libraries
00:51:46.660 | are not complete yet.
00:51:48.380 | But for the simple models, yes, you can do it.
00:51:52.300 | Well, let's say-- not the training,
00:51:54.100 | but let's say if I just want the testing part.
00:51:56.500 | Because I don't need to do--
00:51:57.660 | I mean, the training I can always do in Python.
00:51:59.660 | We do have that already.
00:52:01.180 | Actually, if you go to our website,
00:52:03.140 | there's a label images, .cc.
00:52:05.820 | I think that's literally just loading from Checkpoint
00:52:08.620 | and run the inference in C. That's all written in C++.
00:52:13.140 | So that's a good example to follow.
00:52:16.140 | A second one.
00:52:16.900 | So another thing that I noticed that you support almost
00:52:20.100 | everything except Windows.
00:52:22.860 | Everything except what?
00:52:25.220 | I mean, iOS, Android, everything.
00:52:26.940 | Oh, have no fear.
00:52:27.900 | Actually, we are actively doing that.
00:52:30.620 | But when I first joined the team,
00:52:32.260 | I think there were 10 of us.
00:52:33.620 | And we have to do everything.
00:52:35.340 | Like before open sourcing, all of us
00:52:37.740 | were in the conference room together.
00:52:40.940 | We're all writing docs.
00:52:41.900 | We're fixing everything.
00:52:43.100 | So now we have more people.
00:52:45.620 | That's like top of our list.
00:52:47.220 | We would love to support it.
00:52:48.820 | So I'm just curious, because I mean,
00:52:50.780 | when I look at the roadmap, I didn't see a clear timeline
00:52:54.420 | for Windows.
00:52:55.980 | But the thing is, I know that the reason why you cannot
00:52:58.540 | support Windows is because of Bazel.
00:53:00.700 | Bazel doesn't support Windows.
00:53:02.620 | So let's say, theoretically-- I mean, what do you think--
00:53:06.020 | I know that Bazel
00:53:08.060 | will get Windows support at some point in November.
00:53:11.820 | That is what they say.
00:53:12.860 | So once Bazel can run in Windows,
00:53:15.700 | can I expect like just like immediately do TensorFlow,
00:53:18.580 | or do you foresee some other problem?
00:53:20.980 | Maybe Zach would like to take that question.
00:53:22.940 | [LAUGHTER]
00:53:25.340 | Offline.
00:53:26.340 | [LAUGHTER]
00:53:27.500 | OK, that's it.
00:53:28.100 | So yeah, let's talk offline.
00:53:29.340 | Yeah, sure.
00:53:29.860 | Thank you very much.
00:53:32.380 | Hi, great presentation and session.
00:53:35.020 | My name is Yuri Zifoysh.
00:53:36.060 | I have a question about TPUs.
00:53:37.660 | Are they available right now for testing and playing
00:53:40.980 | for non-Google employees?
00:53:47.780 | Are we--
00:53:48.740 | Is TPU-- are TPUs available outside?
00:53:52.020 | I don't think so at the moment.
00:53:54.940 | Do you know when it might be available in the Google Cloud?
00:53:57.460 | Zach, would you like to take that one?
00:53:59.080 | [LAUGHTER]
00:54:00.060 | It might be afterwards.
00:54:01.020 | [LAUGHTER]
00:54:04.180 | I'm so glad we have a product boss here so that he can--
00:54:08.380 | I'm sorry.
00:54:08.880 | OK, thank you.
00:54:11.620 | Hi, nice tutorial.
00:54:12.700 | I have a question.
00:54:13.860 | Are there any plans to integrate TensorFlow
00:54:18.620 | with open source frameworks, like Mesos and HDFS,
00:54:24.340 | to make the distributed TensorFlow run--
00:54:26.900 | Easy.
00:54:27.980 | So there are definitely plans.
00:54:30.500 | We are also always actively working on new features.
00:54:33.540 | But we cannot provide a solid timeline right now.
00:54:38.140 | So we do have plans.
00:54:42.460 | We do have projects in progress.
00:54:45.500 | But we cannot commit on a timeline.
00:54:49.180 | So I cannot give you a time saying, yes, by November,
00:54:51.620 | you have what.
00:54:52.980 | So thank you.
00:54:54.340 | But if you have this type of question,
00:54:57.340 | I think Zach is the best person to answer.
00:54:59.340 | Oh, hi.
00:55:06.060 | I was wondering, does TensorFlow have any examples
00:55:08.260 | to load your own data?
00:55:10.100 | Of what-- which data?
00:55:12.860 | So the current example has a MNIST data set.
00:55:15.900 | Are there examples out there to load your own data set?
00:55:19.300 | Yes, definitely.
00:55:20.380 | I think we have two.
00:55:21.980 | One is called TensorFlow for Poets.
00:55:24.860 | I think that one-- that example shows you
00:55:27.140 | how you can load your own data set.
00:55:28.700 | I think-- is there another one?
00:55:35.300 | Zach, are you aware of another one
00:55:36.740 | that might be loading your own data set?
00:55:38.700 | I know we have retraining model.
00:55:41.220 | If you go to TensorFlow, we have an example to do retraining.
00:55:45.020 | Those you can download from anywhere.
00:55:46.780 | So in our example, we just downloaded
00:55:48.580 | a bunch of flowers.
00:55:50.420 | So you can definitely download whatever pictures
00:55:53.260 | that you want to retrain.
00:55:56.140 | Thank you.
00:55:56.640 | Hello.
00:56:00.620 | Thank you for your presentation.
00:56:02.300 | I have a question concerning the training.
00:56:05.140 | You can train using TensorFlow in any--
00:56:08.820 | virtually in any system, like Android.
00:56:12.340 | And what about the model?
00:56:16.340 | Do you provide anything to move the model to Android?
00:56:20.220 | Because generally, you program in Java there.
00:56:24.420 | So that's a beautiful thing.
00:56:25.460 | You remember the architecture that I showed?
00:56:27.700 | You build a model, and then just send it to the runtime.
00:56:30.180 | It's the same model running on any of the different platforms.
00:56:34.740 | It can be a laptop, Android.
00:56:36.340 | Do you have your own specific format for the model?
00:56:40.900 | Or it's just--
00:56:42.740 | You build the same model.
00:56:43.860 | Because the model is just a bunch of matrices and values.
00:56:48.340 | Is there any special format for your model?
00:56:53.220 | Because sometimes it is bigger.
00:56:56.340 | So I would not recommend training, say,
00:56:58.340 | Inception on your phone, because all the convolution
00:57:00.940 | and the backprop will probably kill it 10 times over.
00:57:04.700 | So definitely-- so there will be that type of limitation.
00:57:10.260 | I think you guys talked about the number of parameters.
00:57:12.500 | If it blows the memory footprint on your phone,
00:57:15.100 | it's just not going to work.
00:57:16.900 | And if the compute--
00:57:18.620 | especially for convolution, it uses a lot of compute.
00:57:21.440 | That's for training.
00:57:22.540 | But--
00:57:23.060 | But for inference, you can run it anywhere.
00:57:25.900 | Thank you.
00:57:26.380 | It's the same model.
00:57:27.220 | You just restore it.
00:57:28.060 | There actually are examples like label_image.
00:57:31.740 | That's the C++ version.
00:57:33.420 | I think I also wrote one.
00:57:34.540 | It's called classify_image.
00:57:35.660 | It's in Python.
00:57:36.820 | That's also-- you can run it on your phone.
00:57:39.380 | So any of these, you can write your own and load a checkpoint
00:57:43.460 | and run it on your phone as well.
00:57:45.100 | So definitely, I encourage you to do that.
00:57:47.820 | Thank you.
00:57:48.380 | Cool.
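(A minimal sketch of the "train once, restore anywhere" workflow Sherry describes, written in TF1-style Python via `tf.compat.v1`. The toy model, checkpoint path, and values are illustrative, not from the talk.)

```python
# Minimal sketch of "train once, restore anywhere": save a checkpoint,
# then restore it into a fresh session and run inference only.
# Toy model; checkpoint path and names are illustrative.
import os
import tempfile

import tensorflow as tf

tf.compat.v1.disable_eager_execution()

# Build a toy graph: y = x @ W + b.
x = tf.compat.v1.placeholder(tf.float32, shape=[None, 1], name="x")
W = tf.Variable([[2.0]], name="W")
b = tf.Variable([1.0], name="b")
y = tf.add(tf.matmul(x, W), b, name="y")

saver = tf.compat.v1.train.Saver()

# "Training" phase: initialize the variables and save a checkpoint.
with tf.compat.v1.Session() as sess:
    sess.run(tf.compat.v1.global_variables_initializer())
    ckpt = saver.save(sess, os.path.join(tempfile.mkdtemp(), "model.ckpt"))

# Inference phase: the same graph restored into a new session;
# on a phone you would do the equivalent through the C++/Java runtime.
with tf.compat.v1.Session() as sess:
    saver.restore(sess, ckpt)
    result = sess.run(y, feed_dict={x: [[3.0]]})

print(result)  # -> [[7.]]
```

The point Sherry makes is that nothing about the checkpoint is platform-specific: the graph plus the restored values is the whole model.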
00:57:51.780 | I have a question related to TensorFlow Serving.
00:57:54.620 | So I went through the online documentation
00:57:57.900 | and currently, I think it requires some coding in C++
00:58:02.220 | and then combined with Python.
00:58:04.740 | Is there going to be a Python-only solution
00:58:07.700 | provided?
00:58:08.660 | Or is it always going to be--
00:58:10.860 | I think you need to do some first step to create a module
00:58:13.740 | and then just import it into Python.
00:58:17.660 | I am actually surprised to hear that because I'm pretty sure
00:58:21.340 | that you can write the model in just Python or just C++.
00:58:25.860 | You don't have to write it in one way or the other.
00:58:28.140 | They might have a special exporter tool.
00:58:31.300 | At one point, that was the case.
00:58:32.660 | They wrote their exporter in C++.
00:58:35.660 | I think that's probably what you were talking about.
00:58:37.980 | But you don't have to build it in any specific way.
00:58:42.780 | The model is just--
00:58:43.980 | you can write in whatever language you like,
00:58:45.820 | as long as it produces that graph.
00:58:48.780 | And that's all it needs.
00:58:52.020 | So the TensorFlow Serving tutorial, actually,
00:58:55.220 | if you go on the site, it had those steps.
00:58:58.900 | OK, so I will look into that.
00:59:00.580 | So maybe you can come find me later,
00:59:02.180 | and I'll see what the situation is.
00:59:04.220 | I do know that at one point, they were writing the exporter
00:59:06.900 | in C++ only.
00:59:08.540 | But that should have changed by now
00:59:10.300 | because we are doing another version of TensorFlow Serving.
00:59:16.420 | And is there any plan to provide APIs for other languages?
00:59:22.260 | Like MXNet has something called MXNetJS.
00:59:27.100 | You mean the front end?
00:59:28.500 | Front end, yes.
00:59:29.140 | Yeah, yeah, yeah.
00:59:29.860 | We have Go.
00:59:30.620 | I think we have Go.
00:59:31.540 | We have some other languages.
00:59:33.180 | Maybe Zach can speak more to it.
00:59:35.420 | And once again, if your favorite language isn't there,
00:59:38.780 | please do contribute.
00:59:40.780 | And if you would like us to do it, talk to Zach.
00:59:43.680 | And maybe he can put that--
00:59:45.180 | maybe-- I don't know.
00:59:46.620 | Because as somebody asked, for the Android,
00:59:48.900 | you need Java front end.
00:59:50.580 | So I think that's going to help out in integrating
00:59:52.860 | these models with--
00:59:54.340 | Yeah, that's great feedback.
00:59:55.900 | We'll definitely take note.
00:59:57.980 | Thank you.
00:59:58.480 | Thank you.
00:59:59.540 | I have a question.
01:00:01.780 | I'm having an embedded GPU board, the X1,
01:00:04.500 | which is an ARM processor.
01:00:06.420 | And I really wanted to work with the TensorFlow,
01:00:08.780 | but I got to know that it can only run on x86 boards.
01:00:13.140 | So when can we expect the TensorFlow
01:00:16.740 | can support ARM processors?
01:00:20.260 | We will have to get back to you after I have
01:00:23.100 | consulted with my product boss, see when we can add that
01:00:27.220 | support.
01:00:29.740 | Thank you.
01:00:30.340 | Sorry.
01:00:30.980 | One last question.
01:00:32.940 | Thanks for the presentation, Sherry.
01:00:34.700 | I have a question regarding the--
01:00:36.980 | when you have the model and you want to run inference,
01:00:41.380 | is it possible to make an executable out of it
01:00:44.460 | so they can drop it into a container
01:00:47.060 | or run it separately from serving?
01:00:49.820 | Is that something that you guys are looking into?
01:00:52.540 | Just run the inference?
01:00:54.580 | Yeah, just have it as a binary.
01:00:56.980 | Yeah, you can definitely do that.
01:00:59.300 | Right now you can?
01:01:00.420 | Yeah, you are always able to do that.
01:01:04.020 | You mean just save the--
01:01:08.660 | you want to--
01:01:09.180 | What I mean is that if you can package it
01:01:11.500 | into a single binary so that you can just pass it around.
01:01:15.460 | Yes, yes.
01:01:16.500 | We actually do that today.
01:01:17.820 | That's how the label image works.
01:01:20.300 | It's just its own individual binary.
01:01:23.060 | It actually converted all the checkpoints into constants.
01:01:26.060 | So it doesn't even need to do the slow restore, et cetera.
01:01:29.220 | It just reads a bunch of constants and runs it.
01:01:31.180 | So it's super fast.
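(The "checkpoints converted into constants" trick Sherry mentions is graph freezing. A hedged sketch in TF1-style Python using `tf.compat.v1.graph_util.convert_variables_to_constants`; the toy graph and node names are illustrative, not the label_image code itself.)

```python
# Sketch of freezing a graph: fold every trained variable into the
# GraphDef as a Const node, so a standalone binary can load that one
# file and run inference with no checkpoint restore at all.
# Toy graph; node names are illustrative.
import tensorflow as tf

tf.compat.v1.disable_v2_behavior()

g = tf.Graph()
with g.as_default():
    x = tf.compat.v1.placeholder(tf.float32, [None, 1], name="x")
    W = tf.Variable([[2.0]], name="W")
    y = tf.matmul(x, W, name="y")

    with tf.compat.v1.Session() as sess:
        sess.run(tf.compat.v1.global_variables_initializer())
        # Replace the variables reachable from "y" with constants
        # holding their current (trained) values.
        frozen = tf.compat.v1.graph_util.convert_variables_to_constants(
            sess, g.as_graph_def(), output_node_names=["y"])

# No variable ops survive: the weights are now Const nodes, so loading
# the serialized GraphDef is enough to run inference.
ops = {node.op for node in frozen.node}
print(ops)
```

The frozen `GraphDef` can then be serialized (`frozen.SerializeToString()`) and shipped as a single file next to the binary, which is the shape of deployment the answer describes.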
01:01:33.460 | Thank you.
01:01:34.020 | Cool.
01:01:34.540 | You're welcome.
01:01:35.380 | Thanks, Sherry, again.
01:01:36.460 | [APPLAUSE]
01:01:41.580 | We're going to take a short break of 10 minutes.
01:01:43.580 | Let me remind you, for those who haven't noticed yet,
01:01:46.420 | but all the slides of all the talks
01:01:48.260 | will be available on the website.
01:01:49.900 | So do not worry.
01:01:51.100 | They will be available at some point, as soon as we
01:01:53.380 | get them from the speakers.
01:01:55.180 | Oh, I forgot to ask my bonus question.
01:01:57.260 | But in any case, I have a lot of TensorFlow stickers up here.
01:02:00.020 | If you would like one to proudly display on your laptop,
01:02:03.300 | come get it.