Cristos Goodrow: YouTube Algorithm | Lex Fridman Podcast #68
Chapters
0:00 Introduction
3:26 Life-long trajectory through YouTube
7:30 Discovering new ideas on YouTube
13:33 Managing healthy conversation
23:02 YouTube Algorithm
38:00 Analyzing the content of video itself
44:38 Clickbait thumbnails and titles
47:50 Feeling like I'm helping the YouTube algorithm get smarter
50:14 Personalization
51:44 What does success look like for the algorithm?
54:32 Effect of YouTube on society
57:24 Creators
59:33 Burnout
63:27 YouTube algorithm: heuristics, machine learning, human behavior
68:36 How to make a viral video?
70:27 Veritasium: Why Are 96,000,000 Black Balls on This Reservoir?
73:20 Making clips from long-form podcasts
78:07 Moment-by-moment signal of viewer interest
80:04 Why is video understanding such a difficult AI problem?
81:54 Self-supervised learning on video
85:44 What does YouTube look like 10, 20, 30 years from now?
00:00:00.000 |
The following is a conversation with Cristos Goodrow, 00:00:15.120 |
and every day people watch over 1 billion hours 00:00:24.120 |
For many people, it is not only a source of entertainment, 00:00:27.280 |
but also how we learn new ideas from math and physics videos, 00:00:37.120 |
on some of the most tense, challenging, and impactful 00:00:44.880 |
receive criticism from both viewers and creators, 00:00:48.120 |
as they should, because the engineering task before them 00:00:54.640 |
and the impact of their work is truly world-changing. 00:00:58.680 |
To me, YouTube has been an incredible wellspring 00:01:04.680 |
of lectures that change the way I see many fundamental 00:01:07.760 |
ideas in math, science, engineering, and philosophy. 00:01:16.840 |
we take in each of our online educational journeys 00:01:36.520 |
recommendation systems will be one of the most impactful 00:02:06.560 |
I'll do one or two minutes after introducing the episode 00:02:22.560 |
I personally use Cash App to send money to friends, 00:02:32.400 |
You can buy fractions of a stock, say $1 worth, 00:02:37.440 |
Brokerage services are provided by Cash App Investing, 00:02:45.960 |
to support one of my favorite organizations called FIRST. 00:02:49.360 |
Best known for their FIRST Robotics and Lego competitions. 00:02:52.880 |
They educate and inspire hundreds of thousands of students 00:02:58.000 |
and have a perfect rating on Charity Navigator, 00:03:04.680 |
When you get Cash App from the App Store or Google Play 00:03:16.080 |
that I've personally seen inspire girls and boys 00:03:22.000 |
And now here's my conversation with Cristos Goodrow. 00:03:25.680 |
YouTube is the world's second most popular search engine, 00:03:31.400 |
We watch more than 1 billion hours of YouTube videos a day, 00:03:35.480 |
more than Netflix and Facebook video combined. 00:03:38.600 |
YouTube creators upload over 500,000 hours of video 00:03:53.280 |
is just enough for a human to watch in a lifetime. 00:03:56.200 |
So let me ask an absurd philosophical question. 00:04:01.560 |
and there's many people born today with the internet, 00:04:15.200 |
or maybe education or my growth as a human being? 00:04:32.680 |
that YouTube has been really great for my kids, 00:04:38.320 |
she's been watching YouTube for several years. 00:04:42.040 |
She watches Tyler Oakley and the Vlogbrothers. 00:05:00.200 |
and watch professional dancers do that same routine 00:05:10.800 |
And then even my son is a sophomore in college. 00:05:24.200 |
but in a way that would be very hard for anyone to do 00:05:35.160 |
And so I can imagine really good trajectories 00:05:39.920 |
do you think broadly about that trajectory over a period? 00:05:47.640 |
you just kind of gave a few anecdotal examples. 00:05:51.320 |
But I used to watch certain shows on YouTube. 00:05:59.600 |
from YouTube's perspective, to stay on YouTube, 00:06:03.740 |
So you have to think not just what makes them engage today, 00:06:10.280 |
or this month, but also over a period of years. 00:06:25.240 |
And so I think we've been working on this problem, 00:06:35.280 |
and introduce people who are watching one thing 00:07:05.200 |
and develop a new interest is very, very low. 00:07:19.240 |
but also something that you might be likely to watch. 00:07:25.840 |
between those two things is quite challenging. 00:07:29.520 |
- So the diversity of content, diversity of ideas, 00:07:36.160 |
it's a thing that's almost impossible to define, right? 00:07:51.560 |
I wasn't even aware of a channel called Veritasium, 00:07:54.720 |
which is a great science, physics, whatever channel. 00:08:07.120 |
- Okay, so you're a person who's watching some math channels 00:08:19.000 |
some things from other channels that are related, 00:08:29.440 |
So that's the, maybe the first kind of diversity 00:08:43.680 |
is we basically cluster videos and channels together, 00:08:48.640 |
We do every, almost everything at the video level. 00:08:58.960 |
what is the likelihood that users who watch one cluster 00:09:03.800 |
might also watch another cluster that's very distinct. 00:09:06.640 |
So we may come to find that people who watch science videos 00:09:16.880 |
And so, because of that relationship that we've identified 00:09:25.640 |
and then the measurement of the people who watch both, 00:09:28.480 |
we might recommend a jazz video once in a while. 00:09:31.560 |
- So there's this clustering in the embedding space 00:09:36.560 |
And so you kind of try to look at aggregate statistics 00:09:39.960 |
where if a lot of people that jump from science cluster 00:09:44.960 |
to the jazz cluster tend to remain as engaged 00:09:54.840 |
they should hop back and forth and they'll be happy. 00:09:59.480 |
that a person who's watching science would like jazz 00:10:06.120 |
I don't know, backyard railroads or something else, right? 00:10:08.640 |
And so we can try to measure these likelihoods 00:10:11.840 |
and use that to make the best recommendation we can. 00:10:16.440 |
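Below is a minimal sketch of the cross-cluster likelihood idea described above; the cluster names, watch histories, and function are invented for illustration and are not YouTube's actual data or code.

```python
# Hypothetical watch histories: user -> set of content clusters they watch.
# Purely illustrative; not YouTube's data, features, or code.
watch_histories = {
    "user_1": {"science", "jazz"},
    "user_2": {"science", "jazz", "backyard_railroads"},
    "user_3": {"science"},
    "user_4": {"jazz", "cooking"},
}

def cluster_transition_likelihood(histories, src, dst):
    """Estimate P(user also watches dst | user watches src) from aggregate behavior."""
    src_watchers = [h for h in histories.values() if src in h]
    if not src_watchers:
        return 0.0
    return sum(1 for h in src_watchers if dst in h) / len(src_watchers)

# If this number is high, a science watcher might occasionally get a jazz
# recommendation mixed in, to encourage discovering a new interest.
print(cluster_transition_likelihood(watch_histories, "science", "jazz"))  # ~0.67
```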
- So, okay, so we'll talk about the machine learning of that, 00:10:19.360 |
but I have to linger on things that neither you 00:10:24.320 |
There's gray areas of truth, which is, for example, 00:10:29.320 |
now I can't believe I'm going there, but politics. 00:10:33.140 |
It happens so that certain people believe certain things 00:10:40.240 |
Let's move outside the red versus blue politics 00:10:43.040 |
of today's world, but there's different ideologies. 00:10:46.100 |
For example, in college, I read quite a lot of Ayn Rand. 00:10:50.200 |
I studied, and that's a particular philosophical ideology 00:10:57.100 |
I've kind of moved on from that cluster intellectually, 00:11:00.300 |
but it nevertheless is an interesting cluster. 00:11:06.800 |
of political ideology that's really interesting to explore. 00:11:20.280 |
are often advocating that this is how we achieve utopia 00:11:23.920 |
in this world, and they're pretty certain about it. 00:11:37.860 |
in terms of filtering what people should watch next, 00:11:40.400 |
and in terms of also not letting certain things 00:11:45.900 |
This is an exceptionally difficult responsibility. 00:12:05.160 |
doesn't mean that you can literally say anything. 00:12:07.760 |
We as a society have accepted certain restrictions 00:12:14.720 |
There are things like libel laws and things like that. 00:12:17.600 |
And so where we can draw a clear line, we do, 00:12:22.440 |
and we continue to evolve that line over time. 00:12:25.260 |
However, as you pointed out, wherever you draw the line, 00:12:39.400 |
but we will try to reduce the recommendations of them 00:12:43.360 |
or the proliferation of them by demoting them, 00:12:49.820 |
try to raise what we would call authoritative 00:13:24.000 |
that the people who are expressing those points of view 00:13:28.160 |
and offering those positions are authoritative and credible. 00:13:39.400 |
You heard me, I don't care if you leave comments on this. 00:13:44.080 |
But sometimes they're brilliantly funny, the trolls. 00:14:16.840 |
that this is really important that you're trying to solve it. 00:14:20.040 |
But how do you reduce the meanness of people on YouTube? 00:14:25.040 |
- I understand that anyone who uploads YouTube videos 00:14:30.660 |
has to become resilient to a certain amount of meanness. 00:14:43.080 |
comment ranking, allowing certain features to block people, 00:14:52.360 |
or that trolling behavior less effective on YouTube. 00:15:02.120 |
but it's something that we're gonna keep having to work on. 00:15:10.800 |
where people don't have to suffer this sort of meanness 00:15:16.960 |
I hope we do, but it just does seem to be something 00:15:27.080 |
so you mentioned two things that I kind of agree with. 00:15:41.720 |
Then the other is almost an interface question 00:15:53.820 |
the users of YouTube manage their own conversation? 00:16:04.340 |
without sort of attaching, even like saying that people, 00:16:07.700 |
like what do you mean limiting, sort of curating speech? 00:16:39.520 |
she says something that I personally find very inspiring, 00:16:48.100 |
in a manner so that people 20 and 30 years from now 00:16:55.020 |
They really found a way to strike the right balance 00:16:58.340 |
between the openness and the value that the openness has, 00:17:02.620 |
and also making sure that we are meeting our responsibility 00:17:07.940 |
- So the burden on YouTube actually is quite incredible. 00:17:12.220 |
And the one thing that people don't give enough credit to 00:17:16.460 |
is the seriousness and the magnitude of the problem, I think. 00:17:28.940 |
So it's, besides of course running a successful company, 00:17:32.620 |
you're also curating the content of the internet 00:17:44.220 |
is how much of it can be solved with pure machine learning. 00:18:04.540 |
sitting and thinking about what is the nature of truth? 00:18:08.300 |
What is, what are the ideals that we should be promoting? 00:18:36.140 |
And so for instance, when we're building a system 00:18:43.780 |
that are misinformation or borderline policy violations, 00:18:54.120 |
about which of those videos are in which category. 00:19:08.620 |
or apply it to the entire set of billions of YouTube videos. 00:19:13.620 |
And we couldn't get to all the videos on YouTube well 00:19:19.480 |
without the humans, and we couldn't use the humans 00:19:24.380 |
So there's no world in which you have only one 00:19:40.180 |
trying to figure out what are the right policies? 00:19:43.360 |
What are the outcomes based on those policies? 00:19:53.840 |
or build some consensus around what the policies are, 00:19:58.440 |
to implement those policies across all of YouTube. 00:20:10.200 |
And then once we get a lot of training data from them, 00:20:13.580 |
then we apply the machine learning techniques 00:20:17.600 |
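A schematic sketch of the label-then-scale loop described here, using a generic classifier and made-up feature names (not YouTube's real signals, policies, or pipeline):

```python
# Sketch: a small set of human judgments trains a model that then scales to a
# corpus far too large for humans to review. Features and labels are made up.
from sklearn.linear_model import LogisticRegression

def features(video):
    # Hypothetical per-video signals (placeholders, not YouTube's real features).
    return [video["source_credibility"], video["user_report_rate"]]

# 1) Videos that human evaluators reviewed against the written policies.
human_labeled = [
    ({"source_credibility": 0.9, "user_report_rate": 0.0}, 0),  # fine
    ({"source_credibility": 0.8, "user_report_rate": 0.1}, 0),
    ({"source_credibility": 0.2, "user_report_rate": 0.6}, 1),  # borderline
    ({"source_credibility": 0.1, "user_report_rate": 0.8}, 1),
]
X = [features(v) for v, _ in human_labeled]
y = [label for _, label in human_labeled]
model = LogisticRegression().fit(X, y)

# 2) The trained model generalizes those judgments to the rest of the corpus.
def score(video):
    return model.predict_proba([features(video)])[0][1]

print(score({"source_credibility": 0.15, "user_report_rate": 0.7}))  # high score
```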
- Do you have a sense that these human beings 00:20:24.180 |
Sort of, I mean, that's an interesting question. 00:20:30.160 |
and computer vision in general a lot of annotation, 00:20:32.760 |
and we rarely ask what bias do the annotators have? 00:20:49.140 |
at annotating segmentation, at segmenting cars in a scene 00:20:57.560 |
You know, there's specific mechanical reasons for that, 00:21:07.000 |
people are just terrible at annotating trees. 00:21:11.720 |
do you think of, in terms of people reviewing videos 00:21:17.460 |
is there some kind of bias that you're aware of 00:21:38.500 |
That's something that we instruct them to do. 00:21:42.340 |
We ask them to have a bias towards demonstration 00:21:46.740 |
of expertise or credibility or authoritativeness. 00:21:50.620 |
But there are other biases that we wanna make sure 00:22:06.220 |
Another is that you make sure that the people 00:22:13.580 |
and different areas of the United States or of the world. 00:22:29.760 |
because maybe the training data itself comes in 00:22:47.480 |
or has involved some protected class, for instance. 00:22:55.840 |
I'm sure there's a few more that we'll jump back to, 00:23:09.560 |
to make recommendation for what to watch next? 00:23:11.760 |
And it's from a machine learning perspective. 00:23:32.680 |
Even I observe that it's improved quite a bit. 00:23:40.280 |
YouTube uses the best technology we can get from Google 00:23:50.720 |
And of course, the very first things that one thinks about 00:23:54.560 |
is, okay, well, does the word occur in the title, 00:24:04.680 |
where we're mostly trying to do some syntactic match 00:24:16.640 |
For instance, maybe is this video watched a lot 00:24:27.000 |
And then as a result, make sure that that document 00:24:44.700 |
And probably the first real attempt to do that well 00:25:02.660 |
is we observe which videos get watched close together 00:25:15.700 |
where the videos that get watched close together 00:25:18.980 |
by the most people are sort of very close to one another 00:25:24.580 |
close together by the same person or the same people 00:25:34.940 |
that basically represents videos that are very similar 00:25:47.460 |
that are in the same language together, for instance. 00:25:50.140 |
And we didn't even have to think about language. 00:25:55.300 |
And it puts all the videos that are about sports together, 00:25:57.940 |
and it puts most of the music videos together, 00:26:00.020 |
and it puts all of these sorts of videos together 00:26:08.300 |
- So that already cleans up a lot of the problem. 00:26:45.280 |
at recommending videos well to people who are bilingual. 00:26:54.880 |
and I said, "Well, can you give me an example 00:26:56.420 |
"of what problem do you think we have on YouTube 00:26:59.980 |
And so she said, "Well, I'm a researcher in the US, 00:27:12.500 |
and then looked at the Watch Next suggestions, 00:27:18.020 |
"YouTube must think that I speak only English." 00:27:20.980 |
And so she said, "Now, I'm actually originally from Turkey, 00:27:27.380 |
"I really like to watch videos that are in Turkish." 00:27:30.060 |
And so she searched for a video about making the baklava, 00:27:35.980 |
and the Watch Next recommendations were in Turkish. 00:27:38.120 |
And she just couldn't believe how this was possible. 00:27:47.240 |
And it's just sort of an outcome of this related graph 00:27:51.260 |
that's created through collaborative filtering. 00:28:03.340 |
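A toy illustration of the co-watch "related graph" idea, with invented session data; the real system learns an embedding at much larger scale rather than keeping raw counts, but the principle is similar.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Invented watch sessions (videos watched close together by the same person).
sessions = [
    ["calculus_lecture", "linear_algebra_lecture", "veritasium_balls"],
    ["calculus_lecture", "linear_algebra_lecture"],
    ["baklava_recipe_tr", "istanbul_travel_vlog"],
]

# Count how often two videos are watched close together.
co_watch = defaultdict(Counter)
for session in sessions:
    for a, b in combinations(session, 2):
        co_watch[a][b] += 1
        co_watch[b][a] += 1

def related(video_id, k=3):
    """The most co-watched videos -- a crude version of the 'related graph'."""
    return [v for v, _ in co_watch[video_id].most_common(k)]

# Turkish videos end up related to other Turkish videos, math to math, and so
# on, without any explicit language or topic features -- just viewer behavior.
print(related("calculus_lecture"))
print(related("baklava_recipe_tr"))
```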
to discover what individual people wanna watch next. 00:28:19.260 |
Just what I've searched, for many, many years. 00:28:22.620 |
And it's a fascinating picture of who I am, actually. 00:28:31.580 |
A summary of who I am as a person on the internet, to me. 00:28:42.060 |
you know, that's actually quite revealing and interesting. 00:28:49.620 |
but not really, it's the number of cat videos I've watched. 00:29:16.100 |
sort of, as opposed to just using this information 00:29:22.040 |
here are the clusters you've loved over the years, 00:29:33.020 |
to see what it is that you've been watching on YouTube. 00:29:57.860 |
the way the recommendation system of YouTube sees a user 00:30:10.840 |
or any user on YouTube as kind of like a DNA strand 00:30:35.660 |
which other vectors are close to me, to my vector? 00:30:43.540 |
that we generate some diverse recommendations 00:30:50.480 |
with respect to the videos they've watched on YouTube, 00:30:59.400 |
That could be an opportunity to make a good recommendation. 00:31:04.400 |
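A toy sketch of the vector-similarity view described here, using invented per-cluster watch fractions in place of a learned embedding:

```python
import numpy as np

# Toy user-vector similarity: represent each user by a vector (here, watch-time
# fractions per cluster -- an invented stand-in for a learned embedding) and
# look for nearby users whose histories suggest good recommendations.
users = {
    "lex":    np.array([0.7, 0.2, 0.1, 0.0]),   # science, jazz, cooking, gaming
    "user_a": np.array([0.6, 0.3, 0.1, 0.0]),
    "user_b": np.array([0.0, 0.1, 0.2, 0.7]),
}

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

me = users["lex"]
neighbors = sorted(
    ((cosine(me, vec), name) for name, vec in users.items() if name != "lex"),
    reverse=True,
)
print(neighbors)  # user_a is the closest vector, user_b is far away
```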
I'm gonna ask for things that are impossible, 00:31:05.920 |
but I would love to cluster the human beings. 00:31:09.560 |
Like I would love to know who has similar trajectories as me 00:31:13.240 |
'cause you probably would wanna hang out, right? 00:31:17.920 |
Like actually finding some of the most fascinating people 00:31:48.720 |
So the measure of quality, is it just something? 00:31:52.880 |
Yeah, how do you know that something is good? 00:32:13.080 |
relies on having a journalism department, right? 00:32:29.800 |
quality has a lot to do with the authoritativeness 00:32:36.480 |
Now, if you think about the other end of the spectrum, 00:32:41.120 |
you know, what is the highest quality prank video? 00:32:43.560 |
Or what is the highest quality Minecraft video, right? 00:32:48.200 |
That might be the one that people enjoy watching the most 00:32:53.920 |
Or it might be the one that when we ask people the next day 00:32:58.920 |
after they watched it, were they satisfied with it? 00:33:04.120 |
And so we, especially in the realm of entertainment, 00:33:09.240 |
have been trying to get at better and better measures 00:33:21.960 |
the first approximation is the one that gets more views. 00:33:33.680 |
especially if people are clicking on something 00:33:35.720 |
and then immediately realizing that it's not that great 00:33:47.080 |
with the premise that like, you know, in some sense, 00:33:50.080 |
the time that someone spends watching a video 00:33:54.040 |
is related to the value that they get from that video. 00:33:59.200 |
but it has something to say about how much value they get. 00:34:09.040 |
clicking through channels on television late at night 00:34:18.200 |
are you glad that you watched that show on TV last night? 00:34:22.400 |
I'd say, yeah, I wish I would have gone to bed 00:34:24.760 |
or read a book or almost anything else, really. 00:34:27.760 |
And so that's why some people got the idea a few years ago 00:34:35.440 |
And so we get feedback data from those surveys 00:34:40.440 |
and then use that in the machine learning system 00:34:56.640 |
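A small sketch of how survey feedback could complement watch time as a training target; the field names and star-to-target mapping are placeholders, not YouTube's actual schema.

```python
# Watch time is a noisy proxy for value (late-night channel surfing racks up
# hours without satisfaction), so explicit survey answers can serve as a
# better training target. Field names and the mapping are illustrative only.
watch_events = [
    {"video": "economic_machine",  "watch_fraction": 0.95, "survey_stars": 5},
    {"video": "late_night_filler", "watch_fraction": 0.90, "survey_stars": 2},
    {"video": "prank_compilation", "watch_fraction": 0.15, "survey_stars": None},
]

def training_target(event):
    """Prefer explicit satisfaction when we have it; fall back to watch time."""
    if event["survey_stars"] is not None:
        return (event["survey_stars"] - 1) / 4.0   # map 1-5 stars onto 0-1
    return event["watch_fraction"]

for e in watch_events:
    print(e["video"], round(training_target(e), 2))
# 'late_night_filler' gets lots of watch time but a low target -- exactly the
# gap the surveys are meant to close.
```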
what are the signals from a machine learning perspective 00:35:00.160 |
So you mentioned just clicking on the video views, 00:35:02.880 |
the time watched, maybe the relative time watched, 00:35:15.380 |
And then the one I wasn't actually quite aware of, 00:35:20.680 |
is a survey afterwards, which is a brilliant idea. 00:35:31.880 |
- Well, you mentioned commenting, also sharing the video. 00:35:39.320 |
- Within YouTube or outside of YouTube as well? 00:35:44.680 |
- Yeah, like and dislike, how important is that? 00:36:15.680 |
We would ask some users who didn't subscribe very much, 00:36:33.480 |
like actually it doesn't cost anything, it's free, 00:36:35.680 |
it just helps us know that you are very interested 00:36:44.960 |
and don't really watch any of the videos from those channels 00:36:49.040 |
and we say, "Well, why did you subscribe to this 00:36:52.240 |
"if you weren't really interested in any more videos 00:37:00.120 |
"and I just wanted to kind of give him a high five." 00:37:05.440 |
I actually subscribe to channels where I just, 00:37:11.360 |
I like this person, but then I like this person 00:37:19.520 |
Even though I may never actually want to click 00:37:24.960 |
And it's maybe outside of my interest area and so on, 00:37:29.200 |
which is probably the wrong way to use the subscribe button. 00:37:36.600 |
with all the space of people that see the subscribe button 00:37:46.840 |
"We're not gonna pay attention to what you've done." 00:37:51.800 |
in which all the different people in the world 00:37:53.600 |
use the subscribe button or the like and the dislike button. 00:37:57.720 |
- So in terms of signals of machine learning, 00:38:00.400 |
using for the search and for the recommendation, 00:38:13.560 |
So maybe you can speak to the value of those things 00:38:24.200 |
trying to understand what's happening in the video. 00:38:28.780 |
in the machine learning, computer vision world, 00:38:35.580 |
how much are you playing with that currently? 00:38:38.760 |
of being able to analyze the content of the video itself? 00:38:46.160 |
- Analyzing the content-- - Analyzing the content 00:38:54.340 |
our ability to do it well is still somewhat crude. 00:39:04.620 |
we can probably tell you that people are playing soccer. 00:39:07.420 |
We probably can't tell whether it's Manchester United 00:39:17.680 |
and using them, we can use them in some ways. 00:39:21.160 |
So for instance, we use that kind of information 00:39:24.360 |
to understand and inform these clusters that I talked about. 00:39:28.280 |
And also maybe to add some words like soccer, 00:39:34.240 |
if it doesn't occur in the title or the description, 00:39:43.760 |
is please help us out with the title and the description. 00:40:02.360 |
but if you typed World of Warcraft in search, 00:40:16.680 |
- Being literal on the internet is actually very uncool, 00:40:27.560 |
I mean, there's a humor to just being indirect, 00:40:34.360 |
machine learning algorithms want you to be literal, right? 00:40:38.160 |
You just wanna say what's in the thing, be very, very simple. 00:40:42.660 |
And in some sense, that gets away from wit and humor. 00:40:53.040 |
the content of the description, the actual text 00:40:55.800 |
is one of the best ways for the algorithm to find your video 00:41:03.820 |
And I would go further and say that if you want people, 00:41:26.560 |
well, maybe this isn't, somehow search made a mistake. 00:41:31.400 |
So it's important, not just for the machine learning systems 00:41:37.960 |
They get a clue that it's what they're looking for 00:41:47.560 |
So I think from the algorithm perspective, yes, 00:41:52.280 |
and saw a video with the title simply winning, 00:41:57.280 |
and the thumbnail has like a sad orc or something, 00:42:03.800 |
I think that's much, it gets your curiosity up. 00:42:11.620 |
And then if they could trust that the algorithm 00:42:15.820 |
that this is indeed a World of Warcraft video, 00:42:18.200 |
that would have created the most beautiful experience. 00:42:20.720 |
I think in terms of just the wit and the humor 00:42:23.280 |
and the curiosity that we human beings naturally have. 00:42:26.080 |
But you're saying, I mean, realistically speaking, 00:42:28.600 |
it's really hard for the algorithm to figure out 00:42:52.020 |
I think is what you're really working on and hoping. 00:43:09.660 |
So the way that they're probably gonna do that 00:43:15.380 |
It's like ant colonies, that's how they find stuff. 00:43:17.940 |
So, I mean, what degree for collaborative filtering 00:43:23.100 |
in general is one curious ant, one curious user essential? 00:43:35.220 |
In your sense, how many people are just like watching 00:43:38.520 |
the same thing over and over and over and over? 00:43:44.340 |
and then help the other ant in the ant's colony 00:44:09.320 |
Because our systems rely on kind of a faithful amount 00:44:15.500 |
Like, and there are people who try to trick us, right? 00:44:26.200 |
but they're trying to get that association made 00:44:34.040 |
to that sort of attempt at gaming the systems. 00:44:37.620 |
- So speaking to that, there's a lot of people that, 00:44:42.260 |
I don't like it, but, like, want to try to game the system, 00:45:05.580 |
where basically describes that it seems what works 00:45:08.860 |
is to create a high quality video, really good video, 00:45:12.100 |
where people would want to watch it once they click on it, 00:45:17.540 |
to get them to click on it in the first place. 00:45:26.420 |
And you will enjoy my videos once you click on them. 00:45:28.940 |
- So in what sense do you see this kind of click-bait style 00:45:33.940 |
attempt to manipulate, to get people in the door, 00:45:39.940 |
or play with the algorithm, or game the algorithm? 00:45:47.300 |
but even if you were to take the algorithm out of it 00:45:57.180 |
about which one to put at the top or the bottom, 00:46:03.580 |
And I'll tell you the same thing that I told Derek is, 00:46:09.100 |
and they have two kinds of books on them, science books. 00:46:12.480 |
I have my math books from when I was a student, 00:46:21.100 |
They're all yellow, they're all from Springer, 00:46:29.300 |
On the other hand, I have other more pop science type books, 00:46:33.580 |
and they all have very interesting covers, right? 00:46:35.860 |
And they have provocative titles and things like that. 00:46:40.100 |
I mean, I wouldn't say that they're clickbaity, 00:46:48.180 |
but you know, that's just a decision you have to make, 00:46:54.900 |
"Classical Recursion Theory" by Pierrotti-Freddie, 00:46:57.820 |
he was fine with the yellow title and nothing more. 00:47:08.380 |
understand that they need to have a compelling cover 00:47:41.820 |
And so for the users who are offended by that, 00:47:46.460 |
we will then depress or suppress those videos. 00:47:52.060 |
there's also another signal where users can say, 00:47:59.740 |
something like, I don't want to see this video anymore, 00:48:10.580 |
It's like, I don't want to, I don't want this. 00:48:14.740 |
To be like, I don't, that's not, that's not for me. 00:48:23.780 |
Right, we don't want to make a recommendation 00:48:30.140 |
that particular one makes me feel good as a user in general, 00:48:35.300 |
'cause I feel like I'm helping the algorithm. 00:48:37.660 |
My interactions on YouTube don't always feel like 00:48:46.860 |
Elon Musk create a feeling for their customers, 00:48:51.660 |
that they're helping the algorithm of the Tesla vehicle. 00:48:51.660 |
that you're helping the algorithm get smarter. 00:49:07.900 |
They're all together creating a beautiful thing. 00:49:15.500 |
This conversation is reminding me of that, but. 00:49:22.260 |
I'm not sure I really thought about it that way, 00:49:27.820 |
- It's an interesting question of personalization 00:49:30.820 |
that I feel like when I click like on a video, 00:49:40.900 |
It would make me personally, people are different, 00:49:43.940 |
if I was helping also the YouTube algorithm broadly 00:49:48.540 |
Like there's a, I don't know if that's human nature, 00:49:55.900 |
You want to help it get smarter and smarter and smarter 00:50:09.420 |
And I'm not sure how many people share that feeling. 00:50:14.060 |
But on that point, how much personalization is there 00:50:22.580 |
So is it kind of all really boiling down to clustering? 00:50:38.740 |
So your experience will be quite a bit different 00:50:43.020 |
from anybody else's who's watching that same video, 00:50:48.180 |
And the reason is that we found that users 00:50:48.180 |
Sometimes they want to keep watching more on that topic 00:51:07.100 |
and they're ready to move on to something else. 00:51:08.980 |
And so the question is, well, what is the something else? 00:51:12.780 |
And one of the first things one can imagine is, 00:51:16.300 |
well, maybe something else is the latest video 00:51:19.380 |
from some channel to which you've subscribed. 00:51:26.700 |
And even if it's not something that you subscribe to, 00:51:39.780 |
as well as the homepage of course, is quite personalized. 00:51:47.420 |
What does success look like in terms of the algorithm 00:51:49.900 |
creating a great long-term experience for a user? 00:51:53.380 |
Or put another way, if you look at the videos 00:51:59.820 |
how do you know the algorithm succeeded for me? 00:52:06.140 |
and watch more YouTube, then that's one indication 00:52:11.020 |
- So just the number of hours is a powerful indicator. 00:52:20.700 |
So that's probably the most simple indicator. 00:52:29.220 |
There's a lot of other things that they could do. 00:52:32.220 |
But like I said, I mean, ideally we would like everybody 00:52:44.660 |
And so that's why we survey them and ask them, 00:53:05.820 |
So it's hard to imagine that we would actually achieve that. 00:53:21.140 |
but that Ray Dalio has this video on the economic machine. 00:53:26.140 |
I forget what it's called, but it's a 30 minute video. 00:53:38.620 |
of how at least the American economic system works. 00:53:42.900 |
And I was just, I wanted to click on something 00:53:51.100 |
I mean, the views and the likes reflect its quality, 00:53:55.520 |
but I was almost upset that I hadn't found it earlier 00:53:55.520 |
that this is the best video I've ever watched. 00:54:08.620 |
the best experience is where every single video, 00:54:15.420 |
that actually helps me grow, helps me enjoy life, 00:54:27.980 |
that's one of the most beautiful and ambitious, 00:54:36.480 |
do you think of how YouTube is changing society 00:54:39.020 |
when you have these millions of people watching videos, 00:54:56.340 |
- Well, I mean, I think openness has had an impact 00:55:14.100 |
who decides before you can upload your video, 00:55:36.760 |
he wouldn't have had this opportunity to reach this audience 00:55:50.180 |
I know that there are people that I work with 00:55:54.980 |
especially from places where literacy is low. 00:55:59.980 |
And they think that YouTube can help in those places 00:56:03.900 |
because you don't need to be able to read and write 00:56:06.860 |
in order to learn something important for your life, 00:56:09.840 |
maybe how to do some job or how to fix something. 00:56:14.840 |
And so that's another way in which I think YouTube 00:56:21.460 |
So I've worked at YouTube for eight, almost nine years now. 00:56:40.820 |
well, what is it that you love about YouTube? 00:56:57.140 |
Is they immediately start talking about some channel 00:56:59.660 |
or some creator or some topic or some community 00:57:03.380 |
that they found on YouTube and that they just love. 00:57:19.100 |
and then everything else kind of gets out of the way. 00:57:27.520 |
What about the connection with just the individual creators 00:57:35.220 |
So like I gave the example of Ray Dalio video 00:57:42.500 |
but there's some people who are just creators 00:57:58.860 |
And then there's that genuineness in their growth. 00:58:01.780 |
So, you know, YouTube clearly wants to help creators 00:58:05.620 |
connect with their audience in this kind of way. 00:58:15.220 |
but the entirety of a creator's life on YouTube? 00:58:18.820 |
- Well, I mean, we're trying to help creators 00:58:21.500 |
find the biggest audience that they can find. 00:58:29.060 |
The reason why the creator channel is so important 00:58:29.060 |
and I have no concept of what the next viral video could be, 00:58:57.620 |
and the next day it's some children interrupting a reporter, 00:59:01.260 |
and the next day it's, you know, some other thing happening, 00:59:10.500 |
gosh, I really, you know, would like to see something 00:59:19.700 |
between fans and creators is so important for both, 00:59:24.700 |
because it's a way of sort of fostering a relationship 00:59:38.220 |
and again, a topic that you or nobody has an answer to, 00:59:45.500 |
you know, it gives us highs and it gives us lows 00:59:50.100 |
in the sense that sort of creators often speak 01:00:02.700 |
There's a momentum, there's a huge, excited audience 01:00:40.500 |
Is that something, how do we even think about that? 01:00:42.700 |
- Well, the first thing is we wanna make sure 01:00:52.920 |
to demonstrate that you can absolutely take a break. 01:00:57.860 |
If you are a creator and you've been uploading a lot, 01:01:01.060 |
we have just as many examples of people who took a break 01:01:04.300 |
and came back more popular than they were before 01:01:21.700 |
So in your sense that taking a break is okay. 01:01:34.700 |
of creators coming back very strong and even stronger 01:01:49.420 |
that your channel is gonna go down or lose views. 01:01:55.240 |
We know for sure that this is not a necessary outcome. 01:02:01.940 |
to make sure that they take care of themselves. 01:02:05.540 |
You have to look after yourself and your mental health. 01:02:08.340 |
And I think that it probably, in some of these cases, 01:02:14.980 |
contributes to better videos once they come back, right? 01:02:19.980 |
Because a lot of people, I mean, I know myself, 01:02:26.020 |
even though I can keep working until I pass out. 01:02:34.300 |
may even improve the creative ideas that someone has. 01:02:38.540 |
- Okay, I think it's a really important thing 01:02:51.380 |
Sorry, sorry if that sounds like a short time. 01:02:54.760 |
But even like email, just taking a break from email 01:03:00.660 |
especially when you're going through something psychologically 01:03:04.940 |
or really not sleeping much 'cause of work deadlines, 01:03:11.700 |
- And it was there when you came back, right? 01:03:14.280 |
And it looks different actually when you come back. 01:03:17.820 |
You're sort of brighter eyed with some coffee, everything. 01:03:22.200 |
So it's important to take a break when you need it. 01:03:25.100 |
So you've mentioned kind of the YouTube algorithm 01:03:34.180 |
It's potentially sort of more than a million lines of code. 01:03:49.180 |
on top of patches of heuristics and human experts 01:04:08.880 |
of what is the YouTube algorithm at this point? 01:04:15.760 |
- Well, we don't usually think about it as the algorithm 01:04:24.260 |
The other thing that I think people don't understand 01:04:26.900 |
is that what you might refer to as the YouTube algorithm 01:04:31.900 |
from outside of YouTube is actually a bunch of code 01:04:42.580 |
of all the people who come to YouTube every day. 01:04:44.740 |
- So the people part of the code, essentially. 01:04:47.440 |
Like if there were no people who came to YouTube tomorrow, 01:04:49.780 |
then the algorithm wouldn't work anymore, right? 01:04:56.760 |
well, the algorithm does this, the algorithm does that, 01:05:04.560 |
and the algorithm is mostly just keeping track 01:05:07.480 |
of what the viewers do and then reacting to those things 01:05:18.420 |
that the recommendation system and the search system 01:05:21.380 |
and probably many machine learning systems evolve 01:05:36.780 |
Well, how about the most popular ones, right? 01:05:52.180 |
that can actually learn what to do in different situations 01:05:55.700 |
based on some observations of those situations in the past. 01:05:59.620 |
And you keep chipping away at these heuristics over time. 01:06:03.600 |
And so I think that just like with diversity, 01:06:08.100 |
I think the first diversity measure we took was, 01:06:15.420 |
It's a pretty simple heuristic to encourage diversity, 01:06:20.700 |
Who needs to see four, five, six videos in a row 01:06:50.380 |
that this idea was a good one, this was not so good? 01:06:53.820 |
How do you measure that and across which timescale, 01:07:04.500 |
And so just about every single change we make to YouTube, 01:07:08.840 |
we do it only after we've run an A/B experiment. 01:07:08.840 |
for ultimately does this improve the experience for viewers? 01:07:41.980 |
because we can see certain things go up and down. 01:07:45.020 |
So for instance, if we noticed in the experiment, 01:07:52.460 |
or they're saying that they're more satisfied. 01:07:56.860 |
They're giving more videos five stars after they watch them. 01:08:04.380 |
that it's improving the situation for viewers. 01:08:26.060 |
So we get a sense for what the impact is gonna be, 01:08:30.680 |
but also to the different channels and all of that. 01:08:40.640 |
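A toy example of the A/B comparison mechanics described above, with simulated metric values standing in for real experiment data:

```python
import random
from statistics import mean

# Simulated A/B comparison: did a change move a viewer-satisfaction metric?
# The numbers below are generated, purely to show the mechanics.
random.seed(0)
control   = [random.gauss(3.80, 1.0) for _ in range(10_000)]  # e.g. survey stars
treatment = [random.gauss(3.85, 1.0) for _ in range(10_000)]

lift = mean(treatment) - mean(control)
print(f"observed lift: {lift:+.3f} stars")

# A real launch decision would look at many metrics at once (watch time,
# likes, dislikes, shares, survey responses, effects on channels) and check
# statistical significance before concluding the experience actually improved.
```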
but if I want to make a viral video, how do I do it? 01:08:47.920 |
I know that we have in the past tried to figure out 01:08:52.480 |
if we could detect when a video was going to go viral. 01:08:57.480 |
And those were, you take the first and second derivatives 01:09:01.040 |
of the view count and maybe use that to do some prediction. 01:09:06.040 |
But I can't say we ever got very good at that. 01:09:10.680 |
Oftentimes we look at where the traffic was coming from. 01:09:19.040 |
then maybe it has a higher chance of becoming viral 01:09:22.680 |
than if it were coming from search or something. 01:09:42.280 |
sort of ahead of time predicting is a really hard task. 01:09:49.920 |
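A minimal sketch of the first/second-derivative signal mentioned above, computed as finite differences over made-up hourly view counts:

```python
# Finite-difference version of the first/second-derivative signal: how fast
# are views growing, and is that growth itself accelerating? Counts are made up.
hourly_views = [100, 150, 240, 420, 800, 1600]

first_diff  = [b - a for a, b in zip(hourly_views, hourly_views[1:])]
second_diff = [b - a for a, b in zip(first_diff, first_diff[1:])]

print("growth rate:        ", first_diff)   # [50, 90, 180, 380, 800]
print("growth acceleration:", second_diff)  # [40, 90, 200, 420]

# Large, accelerating growth flags a possible viral video -- though, as noted
# above, this kind of prediction never worked especially well in practice.
```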
can you sometimes understand why it went viral 01:09:56.440 |
First of all, is it even interesting for YouTube 01:10:05.320 |
- Well, I think people expect that if a video is going viral 01:10:09.880 |
and it's something they would be interested in, 01:10:23.840 |
- Well, I mean, we want to meet people's expectations 01:10:45.720 |
"Why are 96 million black balls on this reservoir?" 01:11:02.960 |
this video and you want a particular video like it? 01:11:06.200 |
- I mean, we can surely see where it was recommended, 01:11:16.400 |
It is the video which helped me discover who Derek is. 01:11:26.440 |
boring MIT Stanford talks in my recommendation 01:11:40.880 |
So I clicked on it and watched the whole thing 01:11:43.440 |
But, and then a lot of people had that experience, 01:11:47.960 |
But they all, of course, watched it and enjoyed it, 01:11:56.160 |
that ultimately people enjoy after they click on it? 01:11:56.160 |
which is you show it to some people and if they like it, 01:12:09.720 |
can I find some more people who are a little bit like them? 01:12:15.040 |
Let me expand the circle some more, find some more people. 01:12:19.320 |
And you just keep going until you get some feedback 01:12:25.480 |
And so I think that's basically what happened. 01:12:28.240 |
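A toy sketch of the "expanding circles" rollout described here; the audience sizes and feedback function are simulated stand-ins, not the real system.

```python
import random

# Toy "expanding circles" rollout: show a video to a small seed audience and
# widen it only while feedback stays positive. Sizes and feedback are simulated.
random.seed(1)

def simulated_satisfaction(audience_size):
    """Stand-in for real feedback (likes, watch time, surveys) at this scale."""
    return random.uniform(0.6, 0.9)

audience = 1_000
while audience < 10_000_000:
    satisfaction = simulated_satisfaction(audience)
    if satisfaction < 0.5:   # feedback says it is not as good as we thought
        break
    audience *= 10           # expand the circle to more, slightly less similar users
    print(f"expanding to ~{audience:,} users (satisfaction {satisfaction:.2f})")
```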
Now, you asked me about how to make a video go viral 01:12:34.240 |
I don't think that if you or I decided to make a video 01:12:39.320 |
about 96 million balls, that it would also go viral. 01:12:42.600 |
It's possible that Derek made like the canonical video 01:13:03.960 |
and then figuring out that more and more people did enjoy it 01:13:16.000 |
That's a, I don't know, the dynamics of psychology, 01:13:19.960 |
And so what do you think about the idea of clipping? 01:13:24.200 |
Like too many people annoyed me into doing it, 01:13:29.720 |
They said it would be very beneficial to add clips 01:13:37.760 |
Like I'm re-uploading a video, like a short clip, 01:13:56.640 |
Or is that just on a long list of amazing features 01:14:00.600 |
- Yeah, I mean, it's not something that I think 01:14:09.360 |
and I think it's actually great for you as a creator. 01:14:13.720 |
If you think about, I mean, let's say the NBA is uploading 01:14:22.120 |
Well, people might search for Warriors versus Rockets 01:14:50.080 |
you wanna make clips and add titles and things like that 01:14:54.480 |
so that they can find it as easily as possible. 01:15:00.320 |
perhaps a distant future when the YouTube algorithm 01:15:04.000 |
figures that out, sort of automatically detects 01:15:08.280 |
the parts of the video that are really interesting, 01:15:14.080 |
and sort of clip them out in this incredibly rich space. 01:15:24.560 |
And there's a huge space of users that would find, 01:15:28.440 |
you know, 30% of those topics are really interesting. 01:15:33.240 |
It's something that's beyond my ability to clip out, right? 01:15:37.520 |
But the algorithm might be able to figure all that out, 01:15:43.480 |
Do you have a, do you think about this kind of thing? 01:15:46.080 |
Do you have a hope, a dream that one day the algorithm 01:15:48.440 |
will be able to do that kind of deep content analysis? 01:15:54.720 |
but it really does depend on understanding the video well. 01:16:17.960 |
You could probably find out where the goals were scored. 01:16:25.040 |
And that might require a human to do some annotation. 01:16:28.080 |
But I think that trying to identify coherent topics 01:16:32.880 |
in a transcript, like the one of our conversation, 01:16:42.440 |
- And I was speaking more to the general problem, 01:16:44.880 |
actually, of being able to do both a soccer match 01:16:49.800 |
sort of almost, my hope was that there exists 01:16:53.520 |
an algorithm that's able to find exciting things in video. 01:16:59.360 |
- So Google now on Google search will help you find 01:17:04.840 |
the segment of the video that you're interested in. 01:17:06.880 |
So if you search for something like how to change 01:17:13.360 |
then if there's a long video about your dishwasher, 01:17:15.600 |
and this is the part where the person shows you 01:17:17.760 |
how to change the filter, then it will highlight that area. 01:17:29.200 |
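A toy sketch of surfacing the relevant segment by matching a query against a timestamped transcript; the transcript data and scoring are invented, and the actual key-moments feature is surely more sophisticated.

```python
# Toy sketch: pick the transcript segment that best matches the query.
# Transcript text and scoring are invented; the real feature is more involved.
transcript = [
    (0,   "unboxing the dishwasher and checking the parts"),
    (120, "installing the racks and connecting the hose"),
    (300, "how to remove and change the filter at the bottom"),
    (420, "running a first cleaning cycle"),
]

def best_segment(query, segments):
    words = set(query.lower().split())
    scored = [(len(words & set(text.lower().split())), start, text)
              for start, text in segments]
    return max(scored)  # the segment sharing the most words with the query

print(best_segment("how to change the filter", transcript))
# -> (5, 300, 'how to remove and change the filter at the bottom')
```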
like what's the difference between showing the full video 01:17:32.640 |
Do you know how it's presented in search results? 01:17:36.200 |
And the other thing I would say is that right now 01:17:45.760 |
- But folks are working on the more automatic version. 01:17:49.880 |
It's interesting, people might not imagine this, 01:18:06.040 |
- And I wish, I know there's privacy concerns, 01:18:15.360 |
which is sort of putting a camera on the users 01:18:19.000 |
To study their, like I did a lot of emotion recognition work 01:18:23.200 |
and so on, to study actual sort of richer signal. 01:18:29.320 |
like VR video to YouTube, and I've done this a few times, 01:18:33.600 |
so I've uploaded myself, it's a horrible idea. 01:18:51.400 |
of the VR experience was, and it's interesting 01:18:54.400 |
'cause that reveals to you what people looked at. 01:19:00.560 |
- In the case of the lecture, it's pretty boring. 01:19:03.000 |
It is what we're expecting, but we did a few funny videos 01:19:10.360 |
In the beginning, they all look at the main person 01:19:15.120 |
It's fascinating, so that's a really strong signal 01:19:21.920 |
I don't know how you get that from people just watching, 01:19:34.080 |
Maybe comment, is there a way to get that signal 01:19:36.160 |
where this was like, this is when their eyes opened up 01:19:39.200 |
and they're like, for me with the Ray Dalio video, 01:19:42.880 |
at first I was like, oh, okay, this is another one of these 01:19:47.880 |
and then you start watching, it's like, okay, 01:19:50.160 |
there's a really crisp, clean, deep explanation 01:19:56.600 |
That moment, is there a way to detect that moment? 01:19:59.760 |
- The only way I can think of is by asking people 01:20:07.160 |
in terms of doing video analysis, deep video analysis. 01:20:21.880 |
- You never know, and the Wright brothers thought 01:20:30.600 |
So what are the biggest challenges, would you say? 01:20:34.840 |
Is it the broad challenge of understanding video, 01:20:38.640 |
understanding natural language, understanding the challenge 01:20:41.400 |
before the entire machine learning community, 01:20:47.800 |
that's even more challenging than understanding 01:20:52.840 |
what's your sense of what the biggest challenge is? 01:21:00.960 |
It's like, you're trying to classify something 01:21:12.160 |
at least from a machine learning perspective, 01:21:38.380 |
And that's just figuring out who's who, right? 01:21:46.500 |
Like, is that an interesting moment, as you said, 01:21:52.860 |
- So, okay, so Yann LeCun, I'm not sure if you're familiar 01:21:52.860 |
what he's referring to as self-supervised learning 01:22:12.140 |
is watching video and predicting the next frame. 01:22:20.820 |
but his thought is, because it's unsupervised, 01:22:31.740 |
you'll be able to learn about the nature of reality, 01:22:34.140 |
the physics, the common sense reasoning required 01:22:36.480 |
by just teaching a system to predict the next frame. 01:22:47.060 |
do you think an algorithm that just watches all of YouTube, 01:23:02.100 |
be able to do common sense reasoning and so on? 01:23:08.100 |
that already watch all the videos on YouTube, right? 01:23:10.780 |
But they're just looking for very specific things, right? 01:23:20.300 |
And I don't know if predicting the next frame 01:23:25.700 |
because I'm not an expert on compression algorithms, 01:23:30.700 |
but I understand that that's kind of what compression, 01:23:34.740 |
is they basically try to predict the next frame 01:23:37.580 |
and then fix up the places where they got it wrong. 01:23:49.900 |
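A toy version of "predict the next frame, then fix up what you got wrong", using tiny arrays; this illustrates the objective only, not any production codec or model.

```python
import numpy as np

# Toy version of "predict the next frame, then fix up what you got wrong".
# Frames are tiny random arrays; the predictor just repeats the last frame.
rng = np.random.default_rng(0)
prev_frame = rng.integers(0, 256, size=(4, 4))
next_frame = prev_frame + rng.integers(-2, 3, size=(4, 4))  # small change

prediction = prev_frame               # naive temporal prediction
residual   = next_frame - prediction  # what the prediction got wrong

# A codec stores the cheap residual; a self-supervised learner would instead
# treat the prediction error as the training signal for its model of the world.
print("mean absolute prediction error:", np.abs(residual).mean())
```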
that just being able to predict the next frame 01:23:56.140 |
and even a tiny bit of error on a per frame basis 01:24:04.100 |
the idea of compression is one way to do compression 01:24:07.900 |
is to describe through text what's contained in the video. 01:24:10.340 |
That's the ultimate high level of compression. 01:24:16.580 |
you're trying to maintain the same visual quality 01:24:24.300 |
from a bigger perspective of what compression is, 01:24:39.700 |
of actually understanding what's going on in the scene. 01:24:48.900 |
and maybe the content of what they're saying and so on. 01:24:57.260 |
'cause it's an interesting, compelling notion, 01:25:09.580 |
we have been working on trying to summarize videos 01:25:24.340 |
- So if you were to say the problem is 100% solved 01:25:32.560 |
where are we on that timeline, would you say? 01:25:46.220 |
what does YouTube look like 10, 20, 30 years from now? 01:26:00.580 |
and I watched a tremendous amount of television 01:26:24.420 |
It's more tailored to the things that my kids wanna watch. 01:26:30.820 |
that they would never have found on television. 01:26:39.100 |
that's where we're headed is that people watch YouTube 01:26:42.500 |
kind of in the same way that I watched television 01:26:46.120 |
- So from a search and discovery perspective, 01:26:49.220 |
what are you excited about in the five, 10, 20, 30 years? 01:27:01.900 |
So it's the task of search of typing in the text 01:27:06.460 |
or discovering new videos by the next recommendation. 01:27:09.480 |
I personally am really happy with the experience. 01:27:11.980 |
I continuously, I rarely watch a video that's not awesome 01:27:15.060 |
from my own perspective, but what else is possible? 01:27:26.100 |
is not only very important to YouTube and to creators, 01:27:30.540 |
but I think it will help enrich people's lives 01:27:34.500 |
because there's a lot that I'm still finding out 01:27:37.260 |
is available on YouTube that I didn't even know. 01:27:44.580 |
that I could watch USC football games from the 1970s. 01:27:49.580 |
Like I didn't even know that was possible until last year 01:28:01.060 |
that this stuff was already on YouTube even when I got here. 01:28:10.300 |
we wanna make sure that YouTube finds a way to ensure 01:28:15.300 |
that it's acting responsibly with respect to society 01:28:23.300 |
So we wanna take all of the great things that it does 01:28:44.860 |
Especially with live video, you get to watch events. 01:28:49.540 |
I mean, it's really, it's the way you experience 01:28:56.620 |
So do you see it becoming more than just video? 01:29:00.860 |
Do you see creators creating visual experiences 01:29:07.260 |
but sort of virtual reality and entering that space, 01:29:11.420 |
totally outside what YouTube is thinking about? 01:29:14.060 |
- I mean, I think Google is thinking about virtual reality. 01:29:17.060 |
I don't think about virtual reality too much. 01:29:20.740 |
I know that we would wanna make sure that YouTube is there 01:29:48.620 |
I'm really excited about what YouTube has in store for us. 01:29:52.220 |
It's one of the greatest products I've ever used 01:30:01.380 |
and thank you to our presenting sponsor, Cash App. 01:30:10.020 |
a STEM education nonprofit that inspires hundreds 01:30:12.820 |
of thousands of young minds to become future leaders 01:30:17.380 |
If you enjoy this podcast, subscribe on YouTube, 01:30:26.900 |
And now let me leave you with some words of wisdom 01:30:32.460 |
The real voyage of discovery consists not in seeking