
YouTube Algorithm Basics (Cristos Goodrow, VP Engineering at Google) | AI Podcast Clips


Chapters

0:00 YouTube Algorithm Basics
6:25 YouTube recommendation system
8:55 Quality of videos
12:31 Like dislike subscribe
15:41 Content analysis
19:34 Collaborative Filtering
22:42 Game the Algorithm
25:22 Helping the Algorithm
27:35 User Experience
29:02 Value
32:20 YouTube Algorithm
35:07 A/B Experiments

Transcript

- Maybe the basics of the quote unquote YouTube algorithm. What does the YouTube algorithm look at to make recommendations for what to watch next? And this is from a machine learning perspective. Or when you search for a particular term, how does it know what to show you next? 'Cause it seems to, at least for me, do an incredible job of both.

- Well, that's kind of you to say. It didn't used to do a very good job. (laughing) But it's gotten better over the years. Even I observe that it's improved quite a bit. Those are two different situations. Like when you search for something, YouTube uses the best technology we can get from Google to make sure that the YouTube search system finds what someone's looking for.

And of course, the very first thing that one thinks about is, okay, well, does the word occur in the title, for instance? But there are much more sophisticated things where we're mostly trying to do some syntactic match or maybe a semantic match based on words that we can add to the document itself.

For instance, maybe is this video watched a lot after this query? That's something that we can observe. And then as a result, make sure that that document would be retrieved for that query. Now, when you talk about what kind of videos would be recommended to watch next, that's something, again, we've been working on for many years.

And probably the first real attempt to do that well was to use collaborative filtering. So you-- - Can you describe what collaborative filtering is? - Sure, it's just, basically what we do is we observe which videos get watched close together by the same person. And if you observe that, and if you can imagine creating a graph where the videos that get watched close together by the most people are sort of very close to one another in this graph, and videos that don't frequently get watched close together by the same person or the same people are far apart, then you end up with this graph that we call the related graph that basically represents videos that are very similar or related in some way.
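
The related graph described above can be sketched in a few lines. This is only a toy illustration, not YouTube's implementation: the session data, the video names, and the helper functions `build_related_graph` and `related_videos` are all hypothetical, and "watched close together" is simplified to "appears in the same session".

```python
from collections import Counter
from itertools import combinations


def build_related_graph(sessions):
    """Count how often each pair of videos is watched in the same session."""
    co_watch = Counter()
    for videos in sessions:
        # Each unordered pair of distinct videos in a session counts once.
        for a, b in combinations(sorted(set(videos)), 2):
            co_watch[(a, b)] += 1
    return co_watch


def related_videos(co_watch, video, top_n=3):
    """Return the videos most often co-watched with `video`, strongest first."""
    neighbors = Counter()
    for (a, b), count in co_watch.items():
        if a == video:
            neighbors[b] = count
        elif b == video:
            neighbors[a] = count
    return [v for v, _ in neighbors.most_common(top_n)]


# Hypothetical sessions: each list is one viewer's videos watched close together.
sessions = [
    ["baklava_tr", "borek_tr", "kebab_tr"],
    ["baklava_tr", "borek_tr"],
    ["ml_lecture_en", "stats_lecture_en"],
    ["ml_lecture_en", "stats_lecture_en", "baklava_tr"],
]
graph = build_related_graph(sessions)
print(related_videos(graph, "baklava_tr"))
```

Nothing in the sketch knows about language or topic. Videos end up close together only because the same viewers watched them together.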

And what's amazing about that is that it puts all the videos that are in the same language together, for instance. And we didn't even have to think about language. It just does it, right? And it puts all the videos that are about sports together, and it puts most of the music videos together, and it puts all of these sorts of videos together just because that's sort of the way the people using YouTube behave.

- So that already cleans up a lot of the problem. It takes care of the lowest hanging fruit, which happens to be a huge one of just managing these millions of videos. - That's right. I remember a few years ago, I was talking to someone who was trying to propose that we do a research project concerning people who are bilingual.

And this person was making this proposal based on the idea that YouTube could not possibly be good at recommending videos well to people who are bilingual. And so she was telling me about this, and I said, "Well, can you give me an example of what problem do you think we have on YouTube with the recommendations?" And so she said, "Well, I'm a researcher in the US, and when I'm looking for academic topics, I wanna see them in English." And so she searched for one, found a video, and then looked at the Watch Next suggestions, and they were all in English.

And so she said, "Oh, I see. YouTube must think that I speak only English." And so she said, "Now, I'm actually originally from Turkey, and sometimes when I'm cooking, let's say I wanna make some baklava, I really like to watch videos that are in Turkish." And so she searched for a video about making the baklava, and then selected it, and it was in Turkish, and the Watch Next recommendations were in Turkish.

And she just couldn't believe how this was possible. And how is it that you know that I speak both these two languages and put all the videos together? And it's just sort of an outcome of this related graph that's created through collaborative filtering. - So for me, one of my huge interests is just human psychology, right?

And that's such a powerful platform on which to utilize human psychology to discover what individual people wanna watch next. But it'd also be just fascinating to me. You know, Google Search has the ability to look at your own history. And I've done that before, just what I've searched, three years, for many, many years.

And it's a fascinating picture of who I am, actually. And I don't think anyone's ever summarized, I personally would love that, a summary of who I am as a person on the internet, to me. Because I think it reveals, I think it puts a mirror to me or to others, you know, that's actually quite revealing and interesting.

You know, just maybe the number of, it's a joke, but not really, is the number of cat videos I've watched. Or videos of people falling, you know, stuff that's absurd, that kind of stuff. It's really interesting. And of course, it's really good for the machine learning aspect to show, to figure out what to show next.

But it's interesting. Hey, have you just, as a tangent, played around with the idea of giving a map to people? Sort of, as opposed to just using this information to show what's next, showing them, here are the clusters you've loved over the years, kind of thing. - Well, we do provide the history of all the videos that you've watched.

- Yes. - So you can definitely search through that and look through it and search through it to see what it is that you've been watching on YouTube. We have actually, in various times, experimented with this sort of cluster idea, finding ways to demonstrate or show people what topics they've been interested in or what clusters they've watched from.

It's interesting that you bring this up because in some sense, the way the recommendation system of YouTube sees a user is exactly as the history of all the videos they've watched on YouTube. And so you can think of yourself or any user on YouTube as kind of like a DNA strand of all your videos, right?

That sort of represents you. You can also think of it as maybe a vector in the space of all the videos on YouTube. And so, now, once you think of it as a vector in the space of all the videos on YouTube, then you can start to say, okay, well, which other vectors are close to me and to my vector?

And that's one of the ways that we generate some diverse recommendations is because you're like, okay, well, these people seem to be close with respect to the videos they've watched on YouTube, but here's a topic or a video that one of them has watched and enjoyed, but the other one hasn't.
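
As a rough sketch of the "user as a vector" idea, the snippet below represents each viewer as a sparse vector over videos and looks for the nearest other viewer. The choice of cosine similarity, the viewer names, the watch histories, and the function `recommend_from_nearest` are assumptions made for illustration; the conversation does not say which distance measure is actually used.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as {video: weight}."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend_from_nearest(user, histories):
    """Suggest videos the most similar other user watched that `user` has not."""
    target = histories[user]
    others = [name for name in histories if name != user]
    nearest = max(others, key=lambda name: cosine_similarity(target, histories[name]))
    return nearest, sorted(set(histories[nearest]) - set(target))


# Hypothetical watch histories; a weight of 1 means "watched".
histories = {
    "alice": {"chess_opening": 1, "chess_endgame": 1, "go_basics": 1},
    "bob": {"chess_opening": 1, "chess_endgame": 1, "poker_math": 1},
    "carol": {"cat_video": 1, "prank_video": 1},
}
print(recommend_from_nearest("alice", histories))
```

Here the two chess viewers are close, so the one video that only one of them has watched becomes a candidate recommendation for the other.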

That could be an opportunity to make a good recommendation. - I got to tell you, I mean, I know, I'm gonna ask for things that are impossible, but I would love to cluster the human beings. Like, I would love to know who has similar trajectories as me 'cause you probably would wanna hang out, right?

There's a social aspect there. Like, actually finding, some of the most fascinating people I find on YouTube have like no followers, and I start following them, and they create incredible content. And on that topic, I just love to ask, there's some videos that just blow my mind in terms of quality and depth, and just in every regard are amazing videos, and they have like 57 views.

Okay, how do you get videos of quality to be seen by many eyes? So the measure of quality, is it just something, yeah, how do you know that something is good? - Well, I mean, I think it depends initially on what sort of video we're talking about. So in the realm of, let's say, you mentioned politics and news.

In that realm, quality news or quality journalism relies on having a journalism department, right? Like you have to have actual journalists and fact checkers and people like that. And so in that situation, and in others, maybe science or in medicine, quality has a lot to do with the authoritativeness and the credibility and the expertise of the people who make the video.

Now, if you think about the other end of the spectrum, what is the highest quality prank video? Or what is the highest quality Minecraft video, right? That might be the one that people enjoy watching the most and watch to the end. Or it might be the one that when we ask people the next day after they watched it, were they satisfied with it?

And so we, especially in the realm of entertainment, have been trying to get at better and better measures of quality or satisfaction or enrichment since I came to YouTube. And we started with, well, the first approximation is the one that gets more views. But we both know that things can get a lot of views and not really be that high quality, especially if people are clicking on something and then immediately realizing that it's not that great and abandoning it.

And that's why we moved from views to thinking about the amount of time people spend watching it, with the premise that in some sense, the time that someone spends watching a video is related to the value that they get from that video. It may not be perfectly related, but it has something to say about how much value they get.

But even that's not good enough, right? Because I myself have spent time clicking through channels on television late at night and ended up watching "Under Siege 2" for some reason I don't know. And if you were to ask me the next day, are you glad that you watched that show on TV last night?

I'd say, yeah, I wish I would have gone to bed or read a book or almost anything else, really. And so that's why some people got the idea a few years ago to try to survey users afterwards. And so we get feedback data from those surveys and then use that in the machine learning system to try to not just predict what you're gonna click on right now, what you might watch for a while, but what when we ask you tomorrow, you'll give four or five stars to.

- So just to summarize, what are the signals from a machine learning perspective that a user can provide? So you mentioned just clicking on the video views, the time watched, maybe the relative time watched, the clicking like and dislike on the video, maybe commenting on the video. - All of those things.

- All of those things. And then the one I wasn't actually quite aware of, even though I might've engaged in it, is a survey afterwards, which is a brilliant idea. Are there other signals? I mean, that's already a really rich space of signals to learn from. Is there something else?

- Well, you mentioned commenting, also sharing the video. If you think it's worthy to be shared with someone else you know. - Within YouTube or outside of YouTube as well? - Either. Let's see, you mentioned like, dislike. - Yeah, like and dislike, how important is that? - It's very important, right?

It's predictive of satisfaction. But it's not perfectly predictive. Subscribe, if you subscribe to the channel of the person who made the video, then that also is a piece of information and it signals satisfaction. Although, over the years, we've learned that people have a wide range of attitudes about what it means to subscribe.

We would ask some users who didn't subscribe very much, but they watched a lot from a few channels, we'd say, "Well, why didn't you subscribe?" And they would say, "Well, I can't afford to pay for anything." And we tried to let them understand, like actually it doesn't cost anything, it's free.

It just helps us know that you are very interested in this creator. But then we've asked other people who subscribe to many things and don't really watch any of the videos from those channels. And we say, "Well, why did you subscribe to this if you weren't really interested in any more videos from that channel?" And they might tell us, "Well, I just, I thought the person did a great job and I just wanted to kind of give him a high five." - Yeah.

- Right? And so-- - Yeah, that's where I sit. I actually subscribe to channels where I just, this person is amazing. I like this person. But then I like this person, I really wanna support them. That's how I click subscribe. Even though I may never actually want to click on their videos when they're releasing it, I just love what they're doing.

And it's maybe outside of my interest area and so on, which is probably the wrong way to use the subscribe button. But I just wanna say congrats. This is great work. (both laughing) - Well, I mean-- - So you have to deal with all the space of people that see the subscribe button as totally different.

- That's right. - So we can't just close our eyes and say, "Sorry, you're using it wrong. We're not gonna pay attention to what you've done." We need to embrace all the ways in which all the different people in the world use the subscribe button or the like and the dislike button.

- So in terms of signals of machine learning, using for the search and for the recommendation, you've mentioned titles, so like metadata, like text data that people provide, description and title, and maybe keywords. So maybe you can speak to the value of those things in search and also this incredible fascinating area of the content itself.

So the video content itself, trying to understand what's happening in the video. So YouTube will release a data set that, in the machine learning computer vision world, this is just an exciting space. How much is that currently, how much are you playing with that currently? How much is your hope for the future of being able to analyze the content of the video itself?

- Well, we have been working on that also since I came to YouTube. - Analyzing the content. - Analyzing the content of the video, right? And what I can tell you is that our ability to do it well is still somewhat crude. We can tell if it's a music video, we can tell if it's a sports video, we can probably tell you that people are playing soccer.

We probably can't tell whether it's Manchester United or my daughter's soccer team. So these things are kind of difficult and using them, we can use them in some ways. So for instance, we use that kind of information to understand and inform these clusters that I talked about. And also maybe to add some words like soccer, for instance, to the video if it doesn't occur in the title or the description, which is remarkable that often it doesn't.

One of the things that I ask creators to do is please help us out with the title and the description. For instance, we were a few years ago having a live stream of some competition for World of Warcraft on YouTube. And it was a very important competition, but if you typed World of Warcraft in search, you wouldn't find it.

- World of Warcraft wasn't in the title? - World of Warcraft wasn't in the title. It was match 478, A team versus B team, and World of Warcraft wasn't in the title. Just like, come on, give me-- - Being literal on the internet is actually very uncool, which is the problem.

- Oh, is that right? - Well, I mean, in some sense, well, some of the greatest videos, I mean, there's a humor to just being indirect, being witty and so on, and actually being, machine learning algorithms want you to be literal. You just wanna say what's in the thing, be very, very simple.

And in some sense, that gets away from wit and humor, so you have to play with both. But you're saying that for now, sort of the content of the title, the content of the description, the actual text, is one of the best ways to, for the algorithm to find your video and put them in the right cluster.

- That's right, and I would go further and say that if you want people, human beings, to select your video in search, then it helps to have, let's say, World of Warcraft in the title, because why would a person, if they're looking at a bunch, they type World of Warcraft, and they have a bunch of videos, all of whom say World of Warcraft, except the one that you uploaded, well, even the person is gonna think, well, maybe this isn't, somehow search made a mistake.

This isn't really about World of Warcraft. So it's important, not just for the machine learning systems, but also for the people who might be looking for this sort of thing. They get a clue that it's what they're looking for by seeing that same thing prominently in the title of the video.

- Okay, let me push back on that. So I think from the algorithm perspective, yes, but if they typed in World of Warcraft and saw a video with the title simply winning, and the thumbnail has a sad orc or something, I don't know, right? Like, I think that's much, it gets your curiosity up.

And then if they could trust that the algorithm was smart enough to figure out somehow that this is indeed a World of Warcraft video, that would have created the most beautiful experience. I think in terms of just the wit and the humor and the curiosity that we human beings naturally have.

But you're saying, I mean, realistically speaking, it's really hard for the algorithm to figure out that the content of that video will be a World of Warcraft video. - And you have to accept that some people are gonna skip it. - Yeah. - Right? I mean, and so you're right.

The people who don't skip it and select it are gonna be delighted. But other people might say, yeah, this is not what I was looking for. - And making stuff discoverable, I think is what you're really working on and hoping, so yeah. So from your perspective, put stuff in the title and the description.

- And remember, the collaborative filtering part of the system starts by the same user watching videos together, right? So the way that they're probably gonna do that is by searching for them. - That's a fascinating aspect of it. It's like ant colonies. That's how they find stuff. So, I mean, what degree for collaborative filtering in general is one curious ant, one curious user essential?

So just the person who is more willing to click on random videos and sort of explore these cluster spaces. In your sense, how many people are just like watching the same thing over and over and over and over? And how many are just like the explorers that just kind of like click on stuff and then help the other ant in the ant's colony discover the cool stuff?

Do you have a sense of that at all? - I really don't think I have a sense of the relative sizes of those groups, but I would say that, people come to YouTube with some certain amount of intent. And as long as they, to the extent to which they try to satisfy that intent, that certainly helps our systems, right?

Because our systems rely on kind of a faithful amount of behavior, right? And there are people who try to trick us, right? There are people and machines that try to associate videos together that really don't belong together, but they're trying to get that association made because it's profitable for them.

And so we have to always be resilient to that sort of attempt at gaming the systems. - So speaking to that, there's a lot of people that in a positive way, perhaps, I don't know, I don't like it, but like to want to try to game the system, to get more attention.

Everybody, creators in a positive sense want to get attention, right? So how do you work in this space when people create more and more sort of click-baity titles and thumbnails? So Veritasium, Derek, has made a video where he basically describes that it seems what works is to create a high quality video, really good video, where people would want to watch it once they click on it, but have click-baity titles and thumbnails to get them to click on it in the first place.

And he's saying, "I'm embracing this fact and I'm just going to keep doing it. And I hope you forgive me for doing it. And you will enjoy my videos once you click on them." So in what sense do you see this kind of click-bait style attempt to manipulate, to get people in the door, to manipulate the algorithm or play with the algorithm or game the algorithm?

- I think that you can look at it as an attempt to game the algorithm, but even if you were to take the algorithm out of it and just say, "Okay, well, all these videos happen to be lined up, which the algorithm didn't make any decision about which one to put at the top or the bottom, but they're all lined up there.

Which one are the people going to choose?" And I'll tell you the same thing that I told Derek is, you know, I have a bookshelf and it has two kinds of books on it, science books. I have my math books from when I was a student and they all look identical except for the titles on the covers.

They're all yellow, they're all from Springer, and they're every single one of them, the cover is totally the same. - Yes. - Right? - Yeah. - On the other hand, I have other more pop science type books and they all have very interesting covers, right? And they have provocative titles and things like that.

I mean, I wouldn't say that they're click-baity because they are indeed good books. And I don't think that they cross any line, but, you know, that's just a decision you have to make. Right? Like the people who write "Classical Recursion Theory" by Piergiorgio Odifreddi, he was fine with the yellow title and nothing more.

Whereas I think other people who wrote a more popular type book understand that they need to have a compelling cover and a compelling title. And, you know, I don't think there's anything really wrong with that. We do take steps to make sure that there is a line that you don't cross.

And if you go too far, maybe your thumbnail is especially racy or, you know, it's all caps with too many exclamation points. We observe that users are kind of, you know, sometimes offended by that. And so for the users who are offended by that, we will then depress or suppress those videos.

- And which reminds me, there's also another signal where users can say, I don't know if it was recently added, but I really enjoy it. Just saying, something like, I don't want to see this video anymore or something like, like this is a, like there's certain videos that just cut me the wrong way.

Like just jump out at me. It's like, I don't want to, I don't want this. And it feels really good to clean that up. To be like, I don't, that's not, that's not for me. I don't know. I think that might've been recently added, but that's also a really strong signal.

- Yes, absolutely. Right, we don't want to make a recommendation that people are unhappy with. - And that makes me, that particular one makes me feel good as a user in general and as a machine learning person. 'Cause I feel like I'm helping the algorithm. My interactions on YouTube don't always feel like I'm helping the algorithm.

Like I'm not reminded of that fact. Like for example, Tesla and Autopilot and Elon Musk create a feeling for their customers, for people that own Teslas, that they're helping the algorithm of Tesla vehicle. Like they're all like a really proud, they're helping the fleet learn. I think YouTube doesn't always remind people that you're helping the algorithm get smarter.

And for me, I love that idea. Like we're all collaboratively, like Wikipedia gives that sense. They're all together creating a beautiful thing. YouTube doesn't always remind me of that. This conversation is reminding me of that, but. - Well, that's a good tip. We should keep that fact in mind when we design these features.

I'm not sure I really thought about it that way, but that's a very interesting perspective. - It's an interesting question of personalization that I feel like when I click like on a video, I'm just improving my experience. It would be great. It would make me personally, people are different, but make me feel great if I was helping also the YouTube algorithm broadly say something.

You know what I'm saying? Like there's a, I don't know if that's human nature, but you want the products you love, and I certainly love YouTube. You want to help it get smarter and smarter and smarter 'cause there's some kind of coupling between our lives together being better. If YouTube was better, then I will, my life will be better.

And there's that kind of reasoning. Not sure what that is. And I'm not sure how many people share that feeling. That could be just a machine learning feeling. But on that point, how much personalization is there in terms of next video recommendations? So is it kind of all really boiling down to clustering?

Like if I'm in the nearest clusters to me and so on, and that kind of thing, or how much is personalized to me the individual completely? - It's very, very personalized. So your experience will be quite a bit different from anybody else's who's watching that same video, at least when they're logged in.

And the reason is that we found that users often want two different kinds of things when they're watching a video. Sometimes they want to keep watching more on that topic or more in that genre. And other times they just are done and they're ready to move on to something else.

And so the question is, well, what is the something else? And one of the first things one can imagine is, well, maybe something else is the latest video from some channel to which you've subscribed. And that's gonna be very different for you than it is for me, right? And even if it's not something that you subscribe to, it's something that you watch a lot.

And again, that'll be very different on a person by person basis. And so even the watch next, as well as the homepage, of course, is quite personalized. - So what, we mentioned some of the signals, but what does success look like? What does success look like in terms of the algorithm creating a great long-term experience for a user?

Or put another way, if you look at the videos I've watched this month, how do you know the algorithm succeeded for me? - I think, first of all, if you come back and watch more YouTube, then that's one indication that you found some value from it. - So just the number of hours is a powerful indicator.

- Well, I mean, not the hours themselves, but the fact that you return on another day. So that's probably the most simple indicator. People don't come back to things that they don't find value in, right? There's a lot of other things that they could do. But like I said, I mean, ideally we would like everybody to feel that YouTube enriches their lives and that every video they watched is the best one they've ever watched since they've started watching YouTube.

And so that's why we survey them and ask them, like, is this one to five stars? And so our version of success is every time someone takes that survey, they say it's five stars. And if we ask them, is this the best video you've ever seen on YouTube? They say yes, every single time.

So it's hard to imagine that we would actually achieve that. Maybe asymptotically we would get there, but that would be what we think success is. - It's funny, I've recently said somewhere, I don't know, maybe tweeted, but that Ray Dalio has this video on the economic machine. I forget what it's called, but it's a 30 minute video.

And I said, it's the greatest video I've ever watched on YouTube. It's like, I watched the whole thing and my mind was blown. It's a very crisp, clean description of how at least the American economic system works. It's a beautiful video. And I was just, I wanted to click on something to say this is the best thing.

This is the best thing ever, please let me, I can't believe I discovered it. I mean, the views and the likes reflect its quality, but I was almost upset that I haven't found it earlier and wanted to find other things like it. I don't think I've ever felt that this is the best video I've ever watched.

And that was that. And to me, the ultimate utopia, the best experience, is where every single video, where I don't see any of the videos I regret and every single video I watch is one that actually helps me grow, helps me enjoy life, be happy and so on. - Well, so that's a heck of a, that's one of the most beautiful and ambitious, I think, machine learning tasks.

So you've mentioned kind of the YouTube algorithm that isn't E equals MC squared. It's not a single equation. It's potentially sort of more than a million lines of code. Sort of, is it more akin to what autonomous, successful autonomous vehicles today are, which is they're just basically patches on top of patches of heuristics and human experts really tuning the algorithm and have some machine learning modules?

Or is it becoming more and more a giant machine learning system with humans just doing a little bit of tweaking here and there? What's your sense? First of all, do you even have a sense of what is the YouTube algorithm at this point? And however much you do have a sense, what does it look like?

- Well, we don't usually think about it as the algorithm because it's a bunch of systems that work on different services. The other thing that I think people don't understand is that what you might refer to as the YouTube algorithm from outside of YouTube is actually a bunch of code and machine learning systems and heuristics, but that's married with the behavior of all the people who come to YouTube every day.

- So the people part of the code, essentially. - Exactly, right? Like if there were no people who came to YouTube tomorrow, then the algorithm wouldn't work anymore, right? So that's a critical part of the algorithm. And so when people talk about, well, the algorithm does this, the algorithm does that, it's sometimes hard to understand, well, it could be the viewers are doing that and the algorithm is mostly just keeping track of what the viewers do and then reacting to those things in sort of more fine-grained situations.

And I think that this is the way that the recommendation system and the search system and probably many machine learning systems evolve is you start trying to solve a problem and the first way to solve a problem is often with a simple heuristic, right? And you wanna say, what are the videos we're gonna recommend?

Well, how about the most popular ones, right? And that's where you start. And over time, you collect some data and you refine your situation so that you're making fewer heuristics and you're building a system that can actually learn what to do in different situations based on some observations of those situations in the past.

And you keep chipping away at these heuristics over time. And so I think that just like with diversity, I think the first diversity measure we took was, okay, not more than three videos in a row from the same channel, right? It's a pretty simple heuristic to encourage diversity, but it worked, right?
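
The "not more than three videos in a row from the same channel" rule is simple enough to sketch directly. The version below is one possible reading of that heuristic, not the actual YouTube code: the function name `limit_channel_runs`, the `(video_id, channel)` input format, and the decision to leave out videos that cannot be placed without breaking the rule are all assumptions.

```python
def limit_channel_runs(ranked, max_run=3):
    """Reorder a ranked list so no channel appears more than max_run times in a row.

    `ranked` is a list of (video_id, channel) pairs, best first. A video that
    would overextend the current run is held back until another channel has
    been shown; videos that can never be placed are left out.
    """
    pending = list(ranked)
    result = []
    while pending:
        # Length of the current run of same-channel videos at the end of result.
        run_channel = result[-1][1] if result else None
        run = 0
        for _, channel in reversed(result):
            if channel != run_channel:
                break
            run += 1
        # Take the best-ranked pending video that does not overextend the run.
        for i, (_, channel) in enumerate(pending):
            if channel != run_channel or run < max_run:
                result.append(pending.pop(i))
                break
        else:
            break  # Only over-limit videos remain; leave them out.
    return result


ranked = [("a1", "A"), ("a2", "A"), ("a3", "A"), ("a4", "A"), ("b1", "B"), ("a5", "A")]
print([video for video, _ in limit_channel_runs(ranked)])
```

A fixed rule like this treats every viewer the same way, which is the limitation that motivates replacing it with something learned per individual and per situation.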

Who needs to see four, five, six videos in a row from the same channel? And over time, we try to chip away at that and make it more fine-grained and basically have it remove the heuristics in favor of something that can react to individuals and individual situations. - So how do you, you mentioned, you know, we know that something worked.

How do you get a sense when decisions of the kind of A/B testing that this idea was a good one, this was not so good? How do you measure that and across which time scale, across how many users, that kind of thing? - Well, you mentioned A/B experiments.

And so just about every single change we make to YouTube, we do it only after we've run an A/B experiment. And so in those experiments, which run from one week to months, we measure hundreds, literally hundreds of different variables and measure changes with confidence intervals in all of them.
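
To make "measure changes with confidence intervals" concrete, here is a minimal sketch for a single metric. It assumes a simple normal approximation for the difference in mean survey rating between a control group and a treatment group; the ratings are made up, the function `diff_in_means_ci` is hypothetical, and the transcript does not say which statistical method is actually used.

```python
import math
from statistics import mean, variance


def diff_in_means_ci(control, treatment, z=1.96):
    """Approximate 95% confidence interval for mean(treatment) - mean(control).

    Uses a normal approximation with unpooled sample variances, which is only
    reasonable for large samples; z=1.96 corresponds to roughly 95% coverage.
    """
    diff = mean(treatment) - mean(control)
    stderr = math.sqrt(
        variance(control) / len(control) + variance(treatment) / len(treatment)
    )
    return diff - z * stderr, diff + z * stderr


# Made-up survey ratings (1 to 5 stars); a real experiment would have far more.
control = [3, 4, 3, 5, 4, 3, 4, 4, 3, 4]
treatment = [4, 5, 4, 5, 4, 4, 5, 4, 4, 5]
low, high = diff_in_means_ci(control, treatment)
print(f"difference in mean rating: [{low:.2f}, {high:.2f}]")
```

If the whole interval sits above zero, the change plausibly improved that metric. With hundreds of metrics measured per experiment, some intervals will exclude zero by chance alone, so a real analysis would need to account for that.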

Because we really are trying to get a sense for ultimately does this improve the experience for viewers? That's the question we're trying to answer. And an experiment is one way because we can see certain things go up and down. So for instance, if we noticed in the experiment, people are dismissing videos less frequently, or they're saying that they're more satisfied, they're giving more videos five stars after they watch them, then those would be indications that the experiment is successful, that it's improving the situation for viewers.

But we can also look at other things, like we might do user studies where we invite some people in and ask them, like, what do you think about this? What do you think about that? How do you feel about this? And other various kinds of user research. But ultimately, before we launch something, we're gonna wanna run an experiment.

So we get a sense for what the impact is gonna be, not just to the viewers, but also to the different channels and all of that.