Gustav Soderstrom: Spotify | Lex Fridman Podcast #29


Chapters

0:0
3:29 Purpose of Music
21:15 Technical Challenge in Reducing Latency
23:35 Video Content
26:4 How Do You Grow a User Base
26:12 How Do You Grow User Base
27:55 The Access Model versus the Ownership Model
34:15 Can Playlist Be Used as Data
36:21 Collaborative Filtering
46:23 Anchor
64:6 Discover Weekly
72:11 Deep Embedding
79:28 Smart Speakers
94:42 Spotify Model
95:10 Business Model
103:35 VR

Whisper Transcript

00:00:00.000 | The following is a conversation with Gustav Söderström.
00:00:03.920 | He's the Chief Research and Development Officer at Spotify,
00:00:07.280 | leading their product, design, data, technology, and engineering teams.
00:00:11.200 | As I've said before, in my research and in life in general,
00:00:15.280 | I love music, listening to it and creating it,
00:00:18.720 | and using technology, especially personalization through machine learning,
00:00:23.600 | to enrich the music discovery and listening experience.
00:00:27.920 | That is what Spotify has been doing for years, continually innovating,
00:00:31.920 | defining how we experience music as a society in a digital age.
00:00:36.080 | That's what Gustav and I talk about among many other topics,
00:00:39.280 | including our shared appreciation of the movie "True Romance,"
00:00:43.280 | in my view, one of the great movies of all time.
00:00:46.160 | This is the Artificial Intelligence Podcast.
00:00:49.360 | If you enjoy it, subscribe on YouTube, give it five stars on iTunes, support on
00:00:53.760 | Patreon, or simply connect with me on Twitter at Lex Fridman, spelled F-R-I-D-M-A-N.
00:01:01.280 | And now, here's my conversation with Gustav Söderström.
00:01:06.400 | Spotify has over 50 million songs in its catalog, so
00:01:11.200 | let me ask the all-important question. I feel like you're the right person to ask.
00:01:16.320 | What is the definitive greatest song of all time?
00:01:20.960 | It varies for me, personally. So you can't speak definitively for everyone?
00:01:27.600 | I wouldn't believe very much in machine learning
00:01:30.560 | if I did, right? Because then everyone would have the same taste.
00:01:34.160 | So for you, what is... you have to pick. What is the song?
00:01:38.320 | All right, so it's pretty easy for me. There is this song called
00:01:42.480 | "You're So Cool" by Hans Zimmer, soundtrack to "True Romance."
00:01:47.360 | It was a movie that made a big impression on me, and it's kind of been
00:01:51.200 | following me through my life. Actually, I had it play at my wedding. I sat with the
00:01:56.560 | organist and helped him play it on an organ, which
00:01:59.200 | was a pretty interesting experience. That is probably my,
00:02:03.760 | I would say, top three movie of all time. Yeah, this is an incredible movie.
00:02:08.560 | And it came out during my formative years, and
00:02:12.160 | as I've discovered in music, you shape your music taste
00:02:15.520 | during those years. So it definitely affected me quite a bit.
00:02:18.560 | Did it affect you in any other kind of way? Well, the movie itself affected me
00:02:23.040 | back then. It was a big part of culture. I didn't really adopt any characters
00:02:27.040 | from the movie, but it was a great story of love,
00:02:31.120 | fantastic actors, and really, I didn't even know who Hans Zimmer was at
00:02:36.080 | the time, but fantastic music. And so
00:02:40.560 | that song has followed me, and the movie actually has followed me throughout my life.
00:02:44.480 | That was Quentin Tarantino, actually, I think, who directed or
00:02:48.000 | produced that. So it's not "Stairway to Heaven" or "Bohemian
00:02:51.520 | Rhapsody". Those are great. They're not my personal
00:02:54.400 | favorites, but I've realized that people have different tastes, and
00:02:57.920 | that's a big part of what we do. Well, for me, I would have to
00:03:01.440 | stick with "Stairway to Heaven". So, 35,000 years ago, I looked this up on
00:03:08.320 | Wikipedia. Flute-like instruments started being used in caves
00:03:11.760 | as part of hunting rituals, in primitive cultural gatherings, things like that.
00:03:16.240 | This is the birth of music. Since then, we had a few folks, Beethoven,
00:03:21.120 | Elvis, Beatles, Justin Bieber, of course, Drake.
00:03:26.320 | So, in your view, let's start high-level philosophical. What is the
00:03:30.640 | purpose of music on this planet of ours? I think music has many different
00:03:38.240 | purposes. I think there's certainly a big purpose, which is the same as
00:03:44.160 | much of entertainment, which is escapism, and to be able to live in some sort of
00:03:50.960 | other mental state for a while. But I also think you have the opposite
00:03:54.320 | of escaping, which is to help you focus on something you are actually doing.
00:03:57.760 | So, I think people use music as a tool to
00:04:01.440 | tune the brain to the activities that they are actually doing.
00:04:06.720 | And it's kind of like, in one sense, maybe it's the rawest signal. If you
00:04:12.960 | think about the brain as neural networks, it's maybe the most efficient hack we
00:04:16.400 | can do to actually actively tune it into some state that you want to be. You
00:04:20.640 | can do it in other ways. You can tell stories to put people in a certain mood.
00:04:23.840 | But music is probably very effective to get you to a certain mood very fast.
00:04:28.640 | You know, there's a social component historically to music, where
00:04:32.720 | people listen to music together. I was just thinking about this, that
00:04:36.480 | to me, and you mentioned machine learning, but to me
00:04:40.080 | personally, music is a really private thing.
00:04:45.360 | I'm speaking for myself. I listen to music.
00:04:48.640 | Almost nobody knows the kind of things I have in my library,
00:04:52.800 | except people who are really close to me, and they really only know
00:04:56.000 | a certain percentage. There's some weird stuff that I'm almost probably
00:04:59.360 | embarrassed by. It's called the guilty pleasures, right?
00:05:02.480 | Everyone has that. The guilty pleasures, yeah.
00:05:04.560 | Hopefully they're not too bad. For me, it's personal. Do you think of
00:05:09.280 | music as something that's social or as something that's personal?
00:05:14.960 | Or does it vary? I think it's the same answer, that
00:05:21.360 | you use it for both. We've thought a lot about this
00:05:24.960 | during these 10 years at Spotify, obviously. In one sense, as you said, music
00:05:29.120 | is incredibly social. You go to concerts and so forth.
00:05:33.440 | On the other hand, it is your escape, and everyone
00:05:38.400 | has these things that are very personal to them.
00:05:42.080 | What we've found is that when it comes to...
00:05:48.400 | Most people claim that they have a friend or two that they are heavily
00:05:51.360 | inspired by, and that they listen to. I actually think
00:05:54.640 | music is very social, but in a smaller group setting, it's an
00:05:58.960 | intimate relationship. It's not something that you
00:06:04.480 | necessarily share broadly. Now, at concerts, you can argue you do,
00:06:08.000 | but then you've gathered a lot of people that you have something in common with.
00:06:11.920 | I think this broadcast sharing of music is something we
00:06:16.800 | tried on social networks and so forth, but
00:06:20.960 | it turns out that people aren't super interested in
00:06:25.120 | what their friends listen to. They're interested in
00:06:29.840 | understanding if they have something in common, perhaps, with a friend, but not
00:06:33.360 | just as information. Right, that's really
00:06:36.960 | interesting. I was just thinking of it this morning,
00:06:39.840 | listening to Spotify. I really have a pretty intimate
00:06:43.920 | relationship with Spotify, with my playlists. I've had them for
00:06:50.160 | many years now, and they've grown with me together.
00:06:53.840 | There's an intimate relationship you have with a library of music
00:06:58.640 | that you've developed, and we'll talk about different ways we can play with that.
00:07:02.560 | Can you do the impossible task and try to
00:07:05.920 | give a history of music listening from your perspective, from before the
00:07:12.400 | internet and after the internet, and just kind of everything leading up
00:07:16.480 | to streaming with Spotify and so on? I'll try. It could be a 100-year podcast.
00:07:21.920 | I'll try to do a brief version. There are some things that
00:07:25.840 | I think are very interesting during the history of music, which is that
00:07:30.000 | before recorded music, to be able to enjoy music, you actually had to be
00:07:34.240 | where the music was produced, because you couldn't record it
00:07:37.280 | and time shift it. Creation and consumption had to happen at the same
00:07:40.320 | time, basically concerts. So you either had to get to the
00:07:44.720 | nearest village to listen to music, and while that was
00:07:48.400 | cumbersome and it severely limited the distribution of music,
00:07:52.400 | it also had some different qualities, which was that
00:07:55.280 | the creator could always interact with the audience. It was always live.
00:07:59.280 | And also there was no time cap on the music. So I think it's not a coincidence
00:08:03.120 | that these early classical works, they're much longer than
00:08:06.800 | the three minutes. The three minutes came in as a
00:08:10.400 | restriction of the first wax disc that could only contain
00:08:13.520 | a three-minute song on one side, right? So
00:08:17.200 | actually the recorded music severely limited the
00:08:21.440 | or put constraints, I won't say limit, I mean constraints are often good, but it
00:08:24.400 | put very hard constraints on the music
00:08:26.160 | format. So you kind of said like instead of doing these opus
00:08:29.760 | on like many, you know, tens of minutes or something,
00:08:33.120 | now you get three and a half minutes because then you're out of wax on this
00:08:36.000 | disc. But in return, you get an amazing
00:08:39.440 | distribution. Your reach will widen, right?
00:08:41.920 | Just on that point real quick, without the mass
00:08:46.320 | scale distribution, there's a scarcity component
00:08:50.160 | where you kind of look forward to it. We had that, it's like the Netflix
00:08:56.720 | versus HBO Game of Thrones, you like wait for the event because you
00:09:00.960 | can't really listen to it. So you like look forward to it and then
00:09:04.400 | it's, you derive perhaps more pleasure because
00:09:07.680 | it's more rare for you to listen to a particular piece.
00:09:10.560 | You think there's value to that scarcity? Yeah,
00:09:13.600 | I think that that is definitely a thing and there's always this
00:09:16.880 | component of if you have something in infinite amounts, will you value it
00:09:21.120 | as much? Probably not. Humanity is always seeking some,
00:09:26.080 | is relative, so you're always seeking something you didn't have and when you
00:09:28.880 | have it, you don't appreciate it as much. So
00:09:30.720 | I think that's probably true, but I think that's why concerts exist, so you can
00:09:34.640 | actually have both. But I think net, if you couldn't listen
00:09:38.640 | to music in your car driving, that'd be worse,
00:09:42.960 | that cost would be bigger than the benefit of the anticipation I think
00:09:46.560 | that you would have. So yeah, it started with live concerts,
00:09:51.760 | then it's being able to, you know, the phonograph
00:09:56.720 | invented, right? You start to be able to record music.
00:10:00.480 | Exactly, so then you got this massive distribution that made it possible
00:10:04.240 | to create two things, I think. First of all, cultural
00:10:06.880 | phenomenons, they probably need distribution
00:10:09.600 | to be able to happen, but it also opened
00:10:12.800 | access to, you know, for a new kind of artist. So you started to have these
00:10:17.600 | phenomenons like Beatles and Elvis and so forth, that were really
00:10:21.120 | a function of distribution, I think, obviously of talent and innovation, but
00:10:24.800 | there was also a technical component. And of course the next big innovation to
00:10:28.400 | come along was radio, broadcast radio. And I think radio is interesting
00:10:34.480 | because it started not as a music medium, it
00:10:37.200 | started as an information medium for news and
00:10:41.440 | then radio needed to find something to fill the time with so that they could
00:10:45.360 | honestly play more ads and make more money, and music was
00:10:48.800 | free. So then you had this massive distribution where you could program to
00:10:52.720 | people. I think those things, that ecosystem,
00:10:56.000 | is what created the ability for hits. But it was also a very broadcast
00:11:01.760 | medium, so you would tend to get these massive,
00:11:04.400 | massive hits, but maybe not such a long tail.
00:11:08.400 | In terms of choice, of everybody listening to the same stuff.
00:11:11.520 | Yeah, and as you said, I think there are some social benefits to that.
00:11:15.760 | I think, for example, there's a high statistical chance that if I talk
00:11:19.680 | about the latest episode of Game of Thrones, we have something to talk about
00:11:22.800 | just statistically. In the age of individual choice, maybe some of that
00:11:26.320 | goes away. So I do see the value of
00:11:32.320 | shared cultural components, but I also obviously love personalization.
00:11:37.440 | So let's catch this up to the internet. So
00:11:40.800 | maybe Napster, well first of all, there's like mp3s,
00:11:44.000 | there's like tape, CDs. There was a digitalization of music with a CD, really. It was
00:11:48.480 | physical distribution, but the music became
00:11:50.720 | digital. And so they were files, but basically boxed software,
00:11:55.440 | to use a software analogy. And then you could start downloading these files.
00:12:00.880 | And I think there are two interesting things that happened. Back to
00:12:04.240 | music used to be longer before it was constrained by the distribution medium.
00:12:08.960 | I don't think that was a coincidence. And then really the only music genre to have
00:12:13.200 | developed mostly after music was a file again on the
00:12:16.720 | internet is EDM. And EDM is often much longer than the
00:12:20.080 | traditional music. I think it's interesting to think
00:12:23.600 | about the fact that music is no longer constrained in
00:12:26.960 | minutes per song or something. It's a legacy of an old distribution
00:12:31.360 | technology. And you see some of this new music that
00:12:33.840 | breaks the format. Not so much as I would have expected actually by now,
00:12:37.680 | but it still happens. So first of all, I don't really know what EDM is.
00:12:42.080 | Electronic dance music. You could say Avicii
00:12:45.520 | was one of the biggest in this genre. So the main constraint is time, something
00:12:51.040 | like a three-, four-, five-minute song? So you could have songs that were eight
00:12:54.960 | minutes, ten minutes and so forth. Because it started as a digital
00:12:59.840 | product that you downloaded. So you didn't have
00:13:02.560 | this constraint anymore. So I think it's something really
00:13:06.400 | interesting that I don't think has fully happened yet.
00:13:09.280 | We're kind of jumping ahead a little bit to where we are. But I think there's
00:13:12.960 | tons of formal innovation in music that should happen now. That couldn't
00:13:18.720 | happen when you needed to really adhere to the distribution constraints.
00:13:22.080 | If you didn't adhere to that, you would get no distribution.
00:13:25.280 | So Björk for example, the Icelandic artist,
00:13:29.280 | she made a full iPad app as an album. That was very expensive.
00:13:33.920 | Even though the app still has great distribution,
00:13:37.120 | she gets nowhere near the distribution versus staying within the three minute
00:13:40.480 | format. So I think now that music is fully
00:13:44.080 | digital inside these streaming services, there is
00:13:46.800 | the opportunity to change the format again and allow creators to be much more
00:13:51.520 | creative without limiting their distribution ability.
00:13:55.120 | That's interesting, and you're right. It's surprising that we don't see that
00:14:00.000 | taken advantage of more often. It's almost like the constraints of the
00:14:04.320 | distribution from the 50s and 60s have molded the culture to where we
00:14:09.520 | want the three-to-five-minute song more than anything else.
00:14:14.240 | So we want the song as consumers and as artists.
00:14:18.720 | Because I write a lot of music and I never even thought about writing
00:14:22.400 | something longer than 10 minutes.
00:14:26.400 | It's really interesting, those constraints. Because all your
00:14:29.600 | training data has been three and a half minute songs.
00:14:31.920 | That's right. So yeah, digitization of data
00:14:36.320 | led to then MP3s. Yeah, so I think you had this file then
00:14:41.440 | that was distributed physically. But then you had the components of digital
00:14:45.840 | distribution. And then the internet happened.
00:14:48.400 | And there was this vacuum where you had a format that could be digitally shipped,
00:14:52.560 | but there was no business model. And then all these pirate networks
00:14:57.760 | happened. Napster and in Sweden, Pirate Bay,
00:15:01.600 | which was one of the biggest. And I think from a consumer
00:15:06.640 | point of view, which kind of leads up to the inception of
00:15:10.160 | Spotify from a consumer point of view, consumers for the first time had this
00:15:15.120 | access model to music where they could, without kind of any
00:15:20.160 | marginal cost, they could try different tracks.
00:15:25.760 | You could use music in new ways. There was no marginal cost.
00:15:28.880 | And that was a fantastic consumer experience to have access to all the
00:15:31.680 | music ever made. I think was fantastic. But it was also horrible for artists
00:15:36.400 | because there was no business model around it. So they didn't make any money.
00:15:39.680 | So the user need almost drove the user interface before there was a
00:15:45.280 | business model. And then there were these download
00:15:47.760 | stores that allowed you to download files, which was a solution, but it didn't
00:15:53.600 | solve the access problem. There was still a marginal cost of 99
00:15:56.880 | cents to try one more track. And I think that that heavily limits how
00:16:00.720 | you listen to music. The example I always give is
00:16:05.040 | in Spotify, a huge amount of people listen to music while they sleep, while
00:16:08.960 | they go to sleep and while they sleep. If that cost you 99 cents per three
00:16:13.200 | minutes, you probably wouldn't do that. And you would be much less
00:16:16.480 | adventurous if there was a real dollar cost to exploring music.
00:16:19.280 | So the access model is interesting in that it changes your music behavior.
00:16:22.880 | You can be, you can take much more risk because there's no marginal cost to it.
00:16:28.240 | Maybe let me linger on piracy for a second because I find,
00:16:31.680 | especially coming from Russia, piracy is something that's very interesting.
00:16:36.720 | To me,
00:16:39.120 | not me, of course, ever, but I have friends who've partook in piracy
00:16:47.040 | of music, software, TV shows, sporting events. And usually to me what
00:16:54.880 | that shows is that they can actually pay the
00:16:58.800 | money, and they're not trying to save money.
00:17:01.920 | They're choosing the best experience. So what to me piracy shows is a business
00:17:08.160 | opportunity in all these domains. And that's where I think you're right.
00:17:12.480 | Spotify stepped in, is basically piracy was an experience. You can
00:17:17.760 | explore, find music you like, and actually the interface of piracy is
00:17:23.920 | horrible because it's, I mean, it's bad metadata.
00:17:28.000 | Yeah, bad metadata, long download times, all kinds of stuff.
00:17:31.040 | And what Spotify does is basically first rewards artists and second
00:17:37.520 | makes the experience of exploring music much better. I mean, the same is true,
00:17:41.920 | I think, for movies and so on. Piracy reveals,
00:17:45.600 | in the software space, for example, I'm a huge user and fan of Adobe products
00:17:50.560 | and there was much more incentive to pirate Adobe products
00:17:55.360 | before they went to a monthly subscription plan.
00:17:58.400 | And now all of the said friends that used to pirate Adobe products that I
00:18:04.400 | know now actually pay gladly for the
00:18:07.280 | monthly subscription. I think you're right. I think it's a sign
00:18:10.720 | of an opportunity for product development
00:18:12.960 | and that sometimes there's a product market fit
00:18:17.760 | before there's a business model fit in product development. I think that's
00:18:22.640 | a sign of it. In Sweden, I think it was a bit of both.
00:18:25.840 | There was a culture where we even had a political party called
00:18:30.880 | the Pirate Party and this was during the time when
00:18:34.240 | people said that information should be free. It was somehow wrong to
00:18:38.000 | charge for ones and zeros. So I think people
00:18:41.600 | felt that artists should probably make money somehow else
00:18:45.280 | on concerts or something. So at least in Sweden, it was partly
00:18:48.640 | really social acceptance, even at the political level.
00:18:51.680 | But that also forced Spotify to compete
00:18:55.040 | with free, which I don't think actually could have happened
00:18:59.360 | anywhere else in the world. The music industry needed to be
00:19:02.480 | doing bad enough to take that risk and Sweden was like the perfect testing
00:19:06.480 | ground. It had government funded high bandwidth, low
00:19:10.000 | latency broadband, which meant that the product would work
00:19:13.440 | and it was also there was no music revenue anyway. So they were kind of
00:19:16.880 | like, I don't think this is going to work but
00:19:19.120 | why not? So this product is one that I don't think could have happened in
00:19:23.200 | America, the world's largest music market, for example.
00:19:25.840 | So how do you compete with free? Because that's an interesting world
00:19:29.520 | of the internet where most people don't like to pay for things.
00:19:34.400 | So Spotify steps in and tries to, yes, compete with free.
00:19:39.040 | How do you do it? So I think two things. One is
00:19:42.320 | people are starting to pay for things on the internet. I think
00:19:45.760 | one way to think about it was that advertising was the first
00:19:49.680 | business model because no one would put a credit card on internet. Transactional
00:19:52.960 | with Amazon was the second and maybe subscription is the third
00:19:56.080 | and if you look offline, subscription is the biggest of those.
00:19:59.600 | So that may still happen. I think people are starting to pay but definitely back
00:20:02.720 | then we needed to compete with free and the first thing you need to do is
00:20:06.400 | obviously to lower the price to free and then you need to be better somehow
00:20:12.240 | and the way that Spotify was better was on the user experience, on the
00:20:16.160 | actual performance, the latency of, you know, even if you had
00:20:22.880 | high bandwidth broadband, it would still take you 30 seconds to a minute to
00:20:29.120 | download one of these tracks. So the Spotify experience of starting
00:20:32.320 | within the perceptual limit of immediacy, about 250 milliseconds,
00:20:36.480 | meant that the whole trick was it felt as if you had downloaded all of Pirate Bay.
00:20:41.680 | It was on your hard drive. It was that fast even though it wasn't
00:20:45.280 | and it was still free but somehow you were actually still
00:20:49.040 | being a legal citizen. That was the trick that Spotify managed to
00:20:53.600 | pull off. So I've actually heard you
00:20:56.880 | say this or write this and I was surprised that I wasn't aware of it
00:21:00.560 | because I just took it for granted. You know, whenever an awesome thing
00:21:03.760 | comes along you're just like, "Oh, of course it has to be this way.
00:21:07.360 | That's exactly right." That it felt like the entire world's libraries at my
00:21:11.120 | fingertips because of that latency being reduced.
00:21:15.440 | What was the technical challenge in reducing the latency?
00:21:18.640 | So there was a group of really, really talented engineers.
00:21:23.360 | One of them called Ludvig Strigeus. He wrote the...
00:21:26.640 | actually from Gothenburg. He wrote the initial...
00:21:30.720 | the uTorrent client, which is kind of an interesting backstory to Spotify.
00:21:34.320 | You know, that we have one of the top developers from
00:21:38.000 | BitTorrent clients as well. So he wrote uTorrent, the world's smallest
00:21:41.600 | BitTorrent client. And then he was acquired very early by
00:21:47.520 | Daniel and Martin, who founded Spotify. And they actually sold the uTorrent
00:21:51.920 | client to BitTorrent but kept Ludvig. So Spotify had a lot of experience
00:21:57.360 | within peer-to-peer networking. So the original
00:22:01.600 | innovation was a distribution innovation, where Spotify built an
00:22:05.440 | end-to-end media distribution system up until only a few years ago. We actually
00:22:08.800 | hosted all the music ourselves. So we had both the server side and the
00:22:12.000 | client and that meant that we could do things such as having
00:22:15.280 | a peer-to-peer solution to use local caching
00:22:18.560 | on the client side, because back then the world was mostly desktop.
00:22:22.000 | But we could also do things like hack the TCP protocols,
00:22:25.600 | things like Nagle's algorithm for kind of exponential back-off
00:22:29.440 | or ramp up and just go full throttle and optimize for latency
00:22:33.520 | at the cost of bandwidth. And all of this end-to-end control meant that we
00:22:38.640 | could do an experience that felt like a step
00:22:41.520 | change. These days we actually are on GCP. We don't host our own
00:22:47.280 | stuff and everyone is really fast these days. So that was the initial
00:22:50.240 | competitive advantage. But then obviously you have to move on over time.
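[Illustrative sketch] The "optimize for latency at the cost of bandwidth" idea described above can be shown with a standard socket option. Nagle's algorithm holds back small writes so they can be coalesced into fewer packets; turning it off sends each small write immediately, which tends to lower latency and use more bandwidth. The Python below is only a minimal sketch of that one knob, not Spotify's actual implementation. The function names `open_low_latency_socket` and `nagle_disabled` are hypothetical and chosen for illustration; `TCP_NODELAY` is the standard socket option for disabling Nagle's algorithm.

```python
import socket


def open_low_latency_socket() -> socket.socket:
    """Create an unconnected TCP socket with Nagle's algorithm disabled.

    Hypothetical helper for illustration only.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY = 1 disables Nagle's algorithm: small writes are sent
    # right away instead of being buffered and merged into larger packets.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


def nagle_disabled(sock: socket.socket) -> bool:
    """Return True if TCP_NODELAY is set on the given socket."""
    # Some platforms report a non-zero value other than 1, so compare to 0.
    return sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0


sock = open_low_latency_socket()
print(nagle_disabled(sock))
sock.close()
```

A socket created without this option leaves Nagle's algorithm enabled, which is the bandwidth-friendly default. The ramp-up behavior mentioned in the conversation is usually associated with TCP slow start, which this sketch does not touch.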
00:22:53.440 | And that was over 10 years ago, right? That was in 2008. The product
00:22:58.160 | was launched in Sweden. It was in a beta, I think, 2007. And it was on the desktop,
00:23:02.400 | right? So it was desktop only. There's no phone.
00:23:05.120 | There was no phone. The iPhone came out in 2008,
00:23:09.440 | but the App Store came out one year later, I think. So the writing was on the
00:23:13.120 | wall, but there was no phone yet. You've mentioned that people would
00:23:18.560 | use Spotify to discover the songs they like and then they would
00:23:21.520 | torrent those songs so they can copy it to their phone.
00:23:26.400 | Just hilarious. Exactly. Not torrent, pirate.
00:23:30.400 | Seriously, piracy does seem to be like a good guide for business models.
00:23:36.480 | Video content. As far as I know, Spotify doesn't have video content.
00:23:40.560 | Well, we do have music videos and we do have videos on the
00:23:44.480 | service, but the way we think about ourselves is that we're an audio
00:23:48.560 | service and we think that if you look at the amount of
00:23:52.480 | time that people spend on audio, it's actually very similar to the amount of
00:23:56.320 | time that people spend on video. So the opportunity should be equally
00:24:01.040 | big, but today it's not at all valued. Video is valued much higher. So we
00:24:05.280 | think it's basically completely undervalued. We think of
00:24:08.480 | ourselves as an audio service, but within that audio service, I think
00:24:12.160 | video can make a lot of sense. I think for
00:24:14.880 | when you're discovering an artist, you probably do want to see them
00:24:18.320 | and understand who they are, to understand their identity.
00:24:20.880 | You won't see that video every time. No, 90% of the time the phone is going to be
00:24:23.920 | in your pocket. For podcasters, you use video. I think
00:24:27.520 | that can make a ton of sense. So we do have video, but we're an audio
00:24:30.240 | service where, think of it as we call it internally
00:24:33.600 | backgroundable video. Video that is helpful, but isn't
00:24:37.360 | the driver of the narrative. I think also if we look at
00:24:42.560 | YouTube, the way people, there's quite a few folks who
00:24:46.160 | listen to music on YouTube. So in some sense, YouTube is a bit of a competitor
00:24:51.360 | to Spotify, which is very strange to me that people use YouTube to listen
00:24:56.800 | to music. They play essentially the music videos,
00:25:00.000 | right, but don't watch the videos and put it in their pocket.
00:25:03.360 | Well, I think it's similar to what, strangely, maybe it's similar to
00:25:10.640 | what we were for the piracy networks, where
00:25:14.320 | YouTube, for historical reasons, have a lot of music videos.
00:25:20.720 | So people use YouTube for a lot of the discovery part of the process, I
00:25:24.480 | think. But then it's not a really good sort of
00:25:27.280 | "MP3 player" because it doesn't even background. Then you have to keep
00:25:30.640 | the app in the foreground. So it's not a good consumption tool,
00:25:34.480 | but it's a decently good discovery tool. I mean, I think YouTube is a fantastic
00:25:37.520 | product and I use it for all kinds of purposes.
00:25:40.320 | That's true. If I were to admit something, I do use YouTube a little bit
00:25:44.160 | for the discovery, to assist in the discovery process of songs.
00:25:47.280 | And then if I like it, I'll add it to Spotify.
00:25:51.040 | But that's OK. That's OK with us. OK, so sorry, we're jumping around a little bit.
00:25:55.600 | So this kind of incredible, you look at
00:25:59.040 | Napster, you look at the early days of Spotify.
00:26:02.480 | How do you, one fascinating point is, how do you grow a user base?
00:26:06.640 | So you're there in Sweden, you have an idea.
00:26:10.400 | I saw the initial sketches that look terrible.
00:26:14.240 | How do you grow a user base from a few folks to
00:26:17.760 | millions? I think there are a bunch of tactical answers.
00:26:22.240 | So first of all, I think you need a great product. I don't think you take a bad
00:26:25.760 | product and market it to be successful.
00:26:30.080 | So you need a great product. But sorry to interrupt, but it's a totally new way to
00:26:33.760 | listen to music, too. So it's not just... Did people realize immediately that
00:26:37.280 | Spotify is a great product? I think they did. So back to the point of
00:26:41.280 | piracy, it was a totally new way to listen to music legally.
00:26:45.760 | But people had been used to the access model in Sweden
00:26:48.960 | and the rest of the world for a long time through piracy. So one way to think
00:26:51.520 | about Spotify, it was just legal and fast piracy.
00:26:54.720 | And so people have been using it for a long time. So they weren't alien to it.
00:26:59.040 | They didn't really understand how it could be legal because it would seem too
00:27:02.240 | fast and too good to be true. Which I think is a great product
00:27:05.040 | proposition if you can be too good to be true.
00:27:08.080 | But what I saw again and again was people showing each other, clicking the
00:27:11.440 | song, showing how fast it started and saying, "Can you believe this?"
00:27:14.080 | So I really think it was about speed. Then we also had an invite
00:27:20.320 | program that was really meant for scaling because we hosted our own
00:27:24.240 | servers. We needed to control scaling. But that built a lot of expectation and
00:27:29.920 | I don't want to say hype because hype implies
00:27:33.280 | that it wasn't true. Excitement around the product. And we've
00:27:38.320 | replicated that when we launched in the US.
00:27:40.880 | We also built up an invite-only program first. So lots of tactics.
00:27:44.960 | But I think you need a great product that solves some problem.
00:27:48.640 | And basically the key innovation, there was technology, but on a meta
00:27:54.400 | level, the innovation was really the access model versus the ownership model.
00:27:58.000 | And that was tricky. A lot of people said that they
00:28:02.400 | wanted to own their music. They would never kind of rent it or
00:28:06.720 | borrow it. But I think the fact that we had a free
00:28:08.880 | tier, which meant that you get to keep this music for life as well,
00:28:13.280 | helped quite a lot. So this is an interesting psychological point
00:28:17.120 | that maybe you can speak to. It was a big shift for me.
00:28:20.880 | It's almost like I had to go to therapy for this.
00:28:25.280 | I think I would describe my early listening experience, and I think a lot
00:28:30.000 | of my friends do, is basically hoarding music. It's you're
00:28:33.680 | like slowly, one song by one song or maybe albums, gathering a collection
00:28:38.720 | of music that you love. And you own it. It's like often,
00:28:42.880 | especially with CDs or tape, you like physically had it.
00:28:46.720 | And what Spotify, what I had to come to grips with, it was kind of
00:28:50.880 | liberating actually, is to throw away all the music.
00:28:55.600 | I've had this therapy session with lots of people.
00:28:59.040 | And I think the mental trick is, so actually we've seen the user data when
00:29:03.120 | Spotify started, a lot of people did the exact same thing. They started hoarding
00:29:07.040 | as if the music would disappear, right? Almost the equivalent of downloading.
00:29:11.440 | And so, you know, we had these playlists that had limits of like
00:29:15.520 | a few hundred thousand tracks, and we figured no one will ever. Well, they do.
00:29:19.120 | Hundreds and hundreds and hundreds of thousands of tracks. And to this day,
00:29:22.800 | you know, some people want to actually save, quote unquote, and play the entire
00:29:26.960 | catalog. But I think that the therapy session goes
00:29:30.080 | something like, instead of throwing away your music,
00:29:35.120 | if you took your files and you stored them in a locker
00:29:38.160 | at Google, it'd be a streaming service. It's just that in that locker, you have
00:29:42.320 | all the world's music now for free. So instead of giving away your music, you
00:29:45.280 | got all the music. It's yours. You could think of it as
00:29:48.480 | having a copy of the world's catalog there forever. So you actually got
00:29:52.080 | more music instead of less. It's just that you just took that hard
00:29:56.800 | disk and you sent it to someone who stored it for you. And once
00:30:00.480 | you go through that mental journey of like, still my files, they're just over
00:30:03.360 | there, and I just have 40 million of them, 50
00:30:05.440 | million of them or something now. Then people are like, okay, that's good.
00:30:09.040 | The problem is, I think, because you paid us a subscription,
00:30:13.280 | if we hadn't had the free tier where you would feel like, even if I don't want to
00:30:16.160 | pay anymore, I still get to keep them. You keep your
00:30:18.960 | playlist forever. They don't disappear even though you stop paying.
00:30:21.600 | I think that was really important. If we would have started as,
00:30:25.520 | you know, you can put in all this time, but if you stop paying, you lose all your
00:30:28.560 | work. I think that would have been a big
00:30:30.560 | challenge and was the big challenge for a lot of our competitors. That's another
00:30:33.920 | reason why I think the free tier is really important. That people need to
00:30:37.280 | feel the security that the work they put in, it will never disappear, even if they
00:30:40.800 | decide not to pay. I like it how you put the work you put in.
00:30:44.560 | I actually stopped even thinking of it that way. I just,
00:30:46.800 | actually Spotify taught me to just enjoy music.
00:30:49.920 | That's great. As opposed to what I was doing before, which is like
00:30:55.280 | in an unhealthy way, hoarding music. Which I found that because I was doing
00:31:00.720 | that, I was listening to a small selection of
00:31:03.760 | songs way too much to where I was getting sick of them.
00:31:07.520 | Whereas Spotify, the more liberating kind of approach is I was just enjoying.
00:31:11.680 | Of course, I listened to "Stairway to Heaven" over and over, but
00:31:14.800 | because of the extra variety, I don't get as sick of them.
00:31:18.960 | There's an interesting statistic I saw. So,
00:31:22.400 | Spotify has, maybe you can correct me, but over 50 million songs,
00:31:26.160 | tracks and over 3 billion playlists. So, 50 million songs and 3 billion
00:31:34.640 | playlists. 60 times more playlists. What do you make of that?
00:31:39.840 | Yeah, so the way I think about it is that
00:31:43.600 | from a statistician or machine learning point of view,
00:31:48.320 | you have all these, if you want to think about reinforcement learning, you
00:31:52.080 | have this state space of all the tracks and you can
00:31:54.560 | take different journeys through this world.
00:31:58.000 | I think of these as like people helping themselves and each other
00:32:05.200 | creating interesting vectors through this space of tracks.
00:32:08.720 | Then it's not so surprising that across many tens of millions of
00:32:12.960 | atomic units, there will be billions of paths
00:32:16.160 | that make sense. We're probably pretty quite far away from
00:32:20.400 | having found all of them. So, kind of our job now
00:32:23.680 | is users, when Spotify started, it was really
00:32:27.280 | a search box that was for the time pretty powerful. Then
00:32:30.960 | I like to refer to this programming language called playlisting,
00:32:34.400 | where if you, as you probably were pretty good at music,
00:32:37.440 | you knew your new releases, you knew your back catalog, you knew your "Stairway
00:32:40.320 | to Heaven", you could create a soundtrack for
00:32:42.320 | yourself using this playlisting tool that's like meta programming language for
00:32:45.280 | music to soundtrack your life. People who were
00:32:48.800 | good at music, it's back to how do you scale the product.
00:32:51.520 | For people who are good at music, that wasn't actually enough. If you had the
00:32:54.880 | catalog and a good search tool, you can create your own sessions, you
00:32:57.840 | could create a really good soundtrack for your entire life.
00:33:01.760 | Probably perfectly personalized because you did it yourself.
00:33:05.280 | But the problem was most people, many people aren't that good at music, they
00:33:08.320 | just can't spend the time. Even if you're very good at music, it's
00:33:10.960 | gonna be hard to keep up. So what we did to try to scale this was to
00:33:16.560 | essentially try to build, you can think of them as agents, that
00:33:20.000 | this friend that some people had that helped them navigate this music
00:33:23.760 | catalog, that's what we're trying to do for you.
00:33:26.160 | But also there is something like 200 million active users on Spotify.
00:33:35.040 | So there, okay, so from the machine learning perspective,
00:33:39.760 | you have these 200 million people plus, they're creating, it's really
00:33:46.400 | interesting to think of playlists as, I mean, I don't know if you meant it
00:33:52.880 | that way, but it's almost like a programming language. It's
00:33:56.320 | or at least a trace of exploration of those individual agents,
00:34:01.760 | the listeners. And you have all these new tracks coming in. So it's a
00:34:07.520 | fascinating space that is ripe for machine learning.
00:34:12.720 | So is there, is it possible, how can playlists be used as data
00:34:19.120 | in terms of machine learning and to help Spotify organize the music?
00:34:25.120 | So we found in our data, not surprising, that people who playlisted a lot,
00:34:31.200 | they retained much better, they had a great experience. And so our first
00:34:34.560 | attempt was to playlist for users. And so we acquired
00:34:38.320 | this company called Tunigo of editors and professional playlisters
00:34:42.880 | and kind of leveraged the maximum of human intelligence
00:34:47.120 | to help build kind of these vectors
00:34:50.560 | through the track space for people. And that broadened the product.
00:34:55.920 | Then the obvious next, and we used statistical means
00:34:59.440 | where they could see when they created a playlist, how did that playlist
00:35:03.120 | perform? They could see skips of the songs, they could see how the
00:35:05.920 | songs perform, and they manually iterated the playlist to maximize
00:35:09.680 | performance for a large group of people. But there
00:35:12.800 | were never enough editors to playlist for you personally. So the promise of
00:35:16.720 | machine learning was to go from kind of group personalization
00:35:19.760 | using editors and tools and statistics to individualization. And then what's so
00:35:25.520 | interesting about the three billion playlists we have is,
00:35:29.360 | we ended, the truth is we lucked out. This was not an a priori strategy, as is
00:35:33.760 | often the case. It looks really smart in hindsight, but
00:35:36.720 | it was dumb luck. We looked at these playlists and
00:35:41.840 | we had some people in the company, a person named Erik Bernhardsson,
00:35:45.520 | who was really good at machine learning already back then, in like 2007,
00:35:49.520 | 2008. Back then it was mostly collaborative
00:35:52.560 | filtering and so forth. But we realized that what this is, is
00:35:58.320 | people are grouping tracks for themselves that have some semantic
00:36:01.440 | meaning to them. And then they actually label it with a
00:36:05.200 | playlist name as well. So in a sense, people were grouping
00:36:08.800 | tracks along semantic dimensions and labeling them.
00:36:12.080 | And so could you use that information to find that
00:36:15.840 | latent embedding? And so we started playing around with
00:36:21.760 | collaborative filtering and we saw tremendous success with it.
00:36:27.200 | Basically trying to extract some of these
00:36:30.240 | dimensions. And if you think about it, it's not surprising at all.
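The idea described here, that tracks grouped together in playlists share some meaning, can be illustrated with a minimal sketch. This is not Spotify's actual method, which the conversation does not detail; it is a simple item-based collaborative filtering variant that scores two tracks as similar when they co-occur in the same playlists. The playlist names, track IDs, and function names are hypothetical. A system looking for a latent embedding would more likely factorize the playlist-track matrix instead of using raw co-occurrence.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt

# Toy data (hypothetical): each playlist is a user-made, user-labeled group of track IDs.
playlists = {
    "road trip rock": ["stairway", "kashmir", "back_in_black"],
    "classic rock": ["stairway", "kashmir", "hotel_california"],
    "late night jazz": ["so_what", "blue_in_green"],
    "study jazz": ["so_what", "blue_in_green", "take_five"],
}


def cooccurrence_similarity(playlists):
    """Cosine similarity between tracks, treating each track as a
    binary vector over the playlists that contain it."""
    track_count = defaultdict(int)  # number of playlists containing each track
    pair_count = defaultdict(int)   # number of playlists containing both tracks
    for tracks in playlists.values():
        unique = sorted(set(tracks))
        for track in unique:
            track_count[track] += 1
        for a, b in combinations(unique, 2):
            pair_count[(a, b)] += 1

    sim = defaultdict(dict)
    for (a, b), n in pair_count.items():
        score = n / sqrt(track_count[a] * track_count[b])
        sim[a][b] = score
        sim[b][a] = score
    return sim


def similar_tracks(sim, track, k=3):
    """Top-k neighbours of a track, highest similarity first (ties by name)."""
    neighbours = sim.get(track, {}).items()
    return sorted(neighbours, key=lambda kv: (-kv[1], kv[0]))[:k]


sim = cooccurrence_similarity(playlists)
print(similar_tracks(sim, "stairway"))
```

In this toy data the rock tracks end up close to each other and share no similarity with the jazz tracks, purely because of how people grouped them, with no knowledge of the audio.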
00:36:33.760 | It'd be quite surprising if playlists were actually random, if they had no
00:36:38.240 | semantic meaning. For most people, they group these
00:36:41.040 | tracks for some reason. So we just happened across this
00:36:44.640 | incredible data set where people are taking
00:36:46.960 | these tens of millions of tracks and grouping them along
00:36:50.240 | different semantic vectors. And the semantics being outside the
00:36:54.640 | individual users, so it's some kind of universal.
00:36:57.360 | There's a universal embedding that holds across
00:37:01.280 | people on this earth. Yes, I do think that
00:37:05.120 | the embeddings you find are going to be reflective of the people who playlisted.
00:37:08.640 | So if you have a lot of indie lovers who playlist,
00:37:12.000 | your embedding is going to perform better there. But what we found was that,
00:37:16.160 | yes, there were these latent similarities.
00:37:20.560 | They were very powerful. And we had, it was interesting because
00:37:25.600 | I think that the people who playlisted the most initially
00:37:28.800 | were the so-called music aficionados who were really into music. And they often
00:37:33.600 | had a certain, their taste was often
00:37:37.520 | geared towards a certain type of music. And so what surprised us, if you look at
00:37:41.760 | the problem from the outside, you might expect that the algorithms
00:37:46.320 | would start performing best with mainstreamers first because
00:37:49.120 | it somehow feels like an easier problem to solve mainstream taste
00:37:52.400 | than really particular taste. It was the complete opposite for us.
00:37:56.240 | The recommendations performed fantastically for people who saw
00:37:58.960 | themselves as having very unique taste. That's probably
00:38:02.720 | because all of them playlisted and they didn't perform so well for
00:38:05.920 | mainstreamers. They actually thought they were a bit too
00:38:08.400 | particular and unorthodox. So we had the complete
00:38:12.000 | opposite of what we expected. Success within the hardest problem first
00:38:15.440 | and then had to try to scale to more mainstream recommendations.
00:38:19.040 | So you've also acquired The Echo Nest, which analyzes song data.
00:38:25.840 | So in your view, maybe you can talk about, so what kind of data is there from a
00:38:31.680 | machine learning perspective? There's a huge amount, we're
00:38:35.920 | talking about playlisting and just user data of what people are
00:38:40.000 | listening to, the playlist they're constructing
00:38:43.280 | and so on. And then there's the actual data within a song.
00:38:48.080 | What makes a song, I don't know, the actual
00:38:51.200 | waveforms. How do you mix the two? How much value is there in each? To me
00:38:57.680 | it seems like it's a romantic notion that the song
00:39:03.760 | itself would contain useful information. But if I were to guess,
00:39:07.680 | user data would be much more powerful. Like playlists would be much more
00:39:11.120 | powerful. Yeah, so we use both. Our biggest success
00:39:16.160 | initially was with playlist data without understanding
00:39:20.480 | anything about the structure of the song. But when we acquired The Echo Nest, they had
00:39:24.320 | the inverse problem. They actually didn't have any
00:39:27.520 | play data. They were just a provider of recommendations, but they
00:39:30.560 | didn't actually have any play data. So they looked at the structure of
00:39:34.560 | songs sonically and they looked at Wikipedia for
00:39:38.880 | cultural references and so forth, right? And did a lot of NLU and so forth. So we
00:39:42.960 | got that skill into the company and combined
00:39:46.320 | kind of our user data with their
00:39:51.200 | content-based. So you can think of it as we were user-based and they were
00:39:54.080 | content-based in their recommendations. And we combined those two. And for some
00:39:57.760 | cases where you have a new song that has no
00:39:59.600 | play data, obviously you have to try to go by
00:40:03.360 | either who the artist is or the sonic information in the song or what
00:40:08.800 | it's similar to. So there's definitely value in both and
00:40:11.920 | we do a lot in both. But I would say yes, the user data captures things that
00:40:17.280 | have to do with culture in the greater society
00:40:19.760 | that you would never see in the content itself.
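The combination of user-based and content-based signals described above, including the fallback for a new song with no play data, can be sketched as follows. This is an illustration under stated assumptions, not the system discussed: the two vector tables, their values, and the `most_similar` helper are all hypothetical. Tracks with listening history are compared in a listening-derived vector space, and tracks without it fall back to audio-derived features.

```python
from math import sqrt

# Hypothetical vectors learned from listening/playlist behaviour (no entry for new tracks).
collab_vectors = {
    "track_a": [0.9, 0.1],
    "track_b": [0.8, 0.2],
    "track_c": [0.1, 0.9],
}

# Hypothetical features extracted from the audio itself (available for every track).
audio_features = {
    "track_a": [0.7, 0.2, 0.1],
    "track_b": [0.6, 0.3, 0.1],
    "track_c": [0.1, 0.2, 0.9],
    "new_track": [0.65, 0.25, 0.1],
}


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = sqrt(sum(x * x for x in u)) * sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def most_similar(track, k=2):
    """Use listening-derived vectors when the track has play data,
    otherwise fall back to content features (the cold-start case)."""
    if track in collab_vectors:
        space, source = collab_vectors, "collaborative"
    else:
        space, source = audio_features, "content"
    query = space[track]
    scored = [(other, cosine(query, vec)) for other, vec in space.items() if other != track]
    scored.sort(key=lambda kv: (-kv[1], kv[0]))
    return source, scored[:k]


print(most_similar("track_a"))    # has play data, so the collaborative space is used
print(most_similar("new_track"))  # no play data, so content features are used
```

A hard switch between the two spaces is the simplest possible design; blending both scores is another option, but nothing in the conversation says which approach is used in practice.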
00:40:23.520 | But that said, we have seen, we have a research lab in
00:40:27.520 | Paris, and we can talk more about that, on
00:40:31.360 | kind of machine learning on the creator side. What it can do for creators, not
00:40:34.080 | just for the consumers. But where we looked at how does the
00:40:37.840 | structure of a song actually affect the listening behavior? And it turns out
00:40:41.600 | that there is a lot of, we can predict things
00:40:44.560 | like skips based on the song itself. We could
00:40:48.800 | say that maybe you should move that chorus a bit
00:40:50.960 | because your skip is going to go up here.
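The skip-prediction model mentioned here is not described in the conversation. A much simpler, purely descriptive first step toward the same feedback is to measure where in a track listeners skip. The sketch below assumes a hypothetical log of skip positions in seconds, with `None` meaning the listener played to the end, and computes, for each 30-second bucket, the share of listeners still playing who skipped in that bucket. The data, names, and bucket size are illustrative only.

```python
from collections import Counter

# Hypothetical log for one track: second at which each listener skipped (None = played to the end).
skip_positions = [12, 14, None, 95, 13, None, 97, None, 15, 96]
TRACK_LENGTH = 180  # seconds
BUCKET = 30         # seconds per bucket


def skip_rate_by_bucket(positions, track_length, bucket):
    """For each time bucket, the fraction of listeners still playing
    at the start of the bucket who skipped during it."""
    n_buckets = -(-track_length // bucket)  # ceiling division
    skips = Counter(p // bucket for p in positions if p is not None)
    still_listening = len(positions)
    rates = []
    for b in range(n_buckets):
        rates.append(skips[b] / still_listening if still_listening else 0.0)
        still_listening -= skips[b]
    return rates


rates = skip_rate_by_bucket(skip_positions, TRACK_LENGTH, BUCKET)
for b, rate in enumerate(rates):
    print(f"{b * BUCKET:>3}-{(b + 1) * BUCKET:<3}s  skip rate {rate:.2f}")
```

In this made-up log the skips cluster in the first 30 seconds and again around the 90-second mark, which is the kind of signal a creator could line up against the song's intro or chorus placement.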
00:40:52.560 | There is a lot of latent structure in the music, which is not surprising
00:40:56.080 | because it is some sort of mind hack. So there should be structure. That's
00:40:59.920 | probably what we respond to. You just blew my mind actually
00:41:03.200 | from the creator perspective. So that's a really interesting topic
00:41:07.520 | that probably most creators aren't taking advantage of.
00:41:11.600 | So I've recently got to interact with a few
00:41:15.600 | folks, YouTubers, who are like obsessed with this idea of
00:41:22.960 | what do I do to make sure people keep watching
00:41:26.880 | the video? And they like look at the analytics of which point do people turn
00:41:31.200 | it off and so on. First of all, I don't think that's
00:41:34.240 | healthy because you can do it a little too much.
00:41:38.320 | But it is a really powerful tool for helping the creative process.
00:41:42.960 | You just made me realize you could do the same thing for
00:41:46.240 | creation of music. So is that something you've looked into?
00:41:50.240 | Can you speak to how much opportunity there is for that?
00:41:55.200 | Yeah, I listened to the podcast with Zoroash
00:41:58.720 | and I thought it was fantastic and I reacted to the same thing where he said
00:42:02.400 | he posted something in the morning,
00:42:04.960 | immediately watched the feedback, where the drop-off was and then responded to
00:42:08.160 | that in the afternoon. Which is quite different from how
00:42:12.000 | people make podcasts for example. I mean the feedback loop is almost
00:42:15.520 | non-existent. So if we back out one level, I think
00:42:20.800 | actually both for music and podcasts, which we also
00:42:24.000 | do at Spotify, I think there's a tremendous opportunity
00:42:27.200 | just for the creation workflow. I think it's really interesting speaking
00:42:31.600 | to you, because you're a musician, a developer
00:42:34.640 | and a podcaster. If you think about those three
00:42:37.120 | different roles, if you make the leap as a musician,
00:42:41.840 | if you think about it as a software tool chain, really,
00:42:45.840 | your DAW with the stems, that's the IDE, right? That's where you work in source
00:42:50.240 | code format with what you're creating.
00:42:54.000 | Then you sit around and you play with that and when you're happy you compile
00:42:56.560 | that thing into some sort of AAC or MP3 or something.
00:43:00.400 | You do that because you get distribution. There are so many run times for that MP3
00:43:03.680 | across the world in car stereos and stuff. So
00:43:05.440 | you kind of compile this executable and you ship it out in kind of an old-fashioned
00:43:09.200 | boxed software analogy. And then you hope for the
00:43:13.040 | best, right? But as a software developer,
00:43:18.160 | you would never do that. First you go on GitHub and you collaborate with other
00:43:21.120 | creators. And then you think it'd be crazy to
00:43:24.400 | just ship one version of your software without doing an A/B test,
00:43:27.680 | without any feedback loop. Issue tracking.
00:43:31.440 | Exactly. And then you would look at the feedback loops and try to optimize
00:43:35.040 | that thing, right? So I think if you think of it as a very
00:43:38.480 | specific software tool chain, it looks quite arcane.
00:43:43.360 | The tools that a music creator has versus what a software developer has.
00:43:47.440 | So that's kind of how we think about it. Why wouldn't a
00:43:52.000 | music creator have something like GitHub where you could collaborate
00:43:55.520 | much more easily? So we bought this company called Soundtrap,
00:43:59.120 | which has a kind of Google Docs for music approach,
00:44:02.960 | where you can collaborate with other people on the kind of source code format
00:44:06.480 | with stems. And I think introducing things like
00:44:09.600 | AI tools there to help you as you're creating music,
00:44:14.000 | both in helping you put accompaniment to your music,
00:44:21.360 | like drums or something, help you master and mix automatically,
00:44:27.200 | help you understand how this track will perform. Exactly what you would expect
00:44:30.880 | as a software developer. I think it makes a lot of sense. And I
00:44:34.000 | think the same goes for a podcaster. I think podcasters will expect to
00:44:37.920 | have the same kind of feedback loop that Zirosh has.
00:44:40.480 | Like, why wouldn't you? Maybe it's not healthy, but...
00:44:44.480 | Sorry, I wanted to criticize the fact that you can overdo it.
00:44:48.080 | Because a lot of the... And we're in a new era
00:44:52.240 | of that, so you can become addicted to it.
00:44:56.640 | And therefore, what people say, you become a slave to the YouTube algorithm.
00:45:02.640 | It's always a danger of a new technology,
00:45:06.880 | as opposed to, say, if you're creating a song,
00:45:10.160 | becoming too obsessed about the intro riff to the song that keeps people
00:45:16.160 | listening, versus actually the entirety of the creation process.
00:45:19.280 | It's a balance. But the fact that there's zero...
00:45:22.240 | I mean, you're blowing my mind right now, because you're
00:45:25.520 | completely right that there's no signal whatsoever,
00:45:28.960 | there's no feedback whatsoever on the creation process in music or podcasting,
00:45:34.240 | almost at all. And are you saying that Spotify is hoping to help create tools
00:45:41.680 | to... Not tools, but... - No, tools, actually. - Actually tools for creators.
00:45:47.200 | - Absolutely. So we have... We've made some acquisitions the last few years
00:45:52.400 | around music creation. This company called Soundtrap, which is a
00:45:55.520 | digital audio workstation, but that is browser-based.
00:45:59.040 | And their focus was really the Google Docs approach, where you can collaborate
00:46:02.000 | with people much more easily than you could in previous tools. So we
00:46:06.400 | have some of these tools that we're working with that we want to make
00:46:08.800 | accessible, and then we can connect it with our
00:46:12.240 | consumption data. We can create this feedback loop where
00:46:15.280 | we could help you understand, we could help you
00:46:18.560 | create and help you understand how you will perform. We also
00:46:22.320 | acquired this other company within podcasting called Anchor, which is one of
00:46:25.600 | the biggest podcasting tools, mobile-focused, so really focused on
00:46:30.000 | simple creation or easy access to creation. But that also
00:46:34.000 | gives us this feedback loop. And even before that, we
00:46:38.240 | invested in something called Spotify for Artists and Spotify for Podcasters,
00:46:43.440 | which is an app that you can download, you can verify that you are that creator.
00:46:47.200 | And then you get things that software developers have had for
00:46:52.720 | years. You can see where, if you look at your podcast, for example,
00:46:56.000 | on Spotify or a song that you release, you can see
00:46:59.200 | how it's performing, which cities it's performing in, who's listening to it,
00:47:02.480 | what's the demographic breakup. So similar in the sense that you can
00:47:07.040 | understand how you're actually doing on the platform.
00:47:10.400 | So we definitely want to build tools. I think you also interviewed the
00:47:15.520 | head of research for Adobe, and I think that's an,
00:47:19.520 | back to Photoshop that you like, I think that's an interesting analogy as
00:47:22.960 | well. Photoshop, I think, has been very
00:47:25.840 | innovative in helping photographers and artists, and I think
00:47:30.960 | there should be the same kind of tools for
00:47:33.200 | music creators, where you could get AI assistance, for example, as you're
00:47:37.040 | creating music, as you can do with Adobe, where you can,
00:47:41.200 | I want a sky over here, and you can get help creating that sky.
00:47:44.160 | The really fascinating thing is what Adobe
00:47:48.000 | doesn't have is a distribution for the content you create.
00:47:52.640 | So you don't have the data of, if I create,
00:47:55.680 | if I, you know, whatever creation I make in Photoshop or Premiere,
00:48:01.440 | I can't get like immediate feedback like I can on YouTube, for example, about
00:48:05.920 | the way people are responding. And if Spotify is creating those tools,
00:48:09.920 | that's a really exciting, actually, world.
00:48:13.680 | But let's talk a little about podcasts. So I have trouble talking to one
00:48:20.400 | person, so it's a bit terrifying and kind of
00:48:24.480 | hard to fathom, but on average, 60 to 100,000
00:48:29.280 | people will listen to this episode. Okay, so it's intimidating.
00:48:34.160 | Yeah, it's intimidating. So I host it on Blubrry.
00:48:38.800 | I don't know if I'm pronouncing that correctly, actually. It looks like most
00:48:42.400 | people listen to it on Apple Podcasts, Castbox, and Pocket Casts, and only about
00:48:47.200 | a thousand listen on Spotify. Just my podcast, right?
00:48:53.840 | So where do you see a time when Spotify will
00:48:59.920 | dominate this? So Spotify is relatively new into this.
00:49:04.480 | In podcasting. Sorry, yeah, in podcasting.
00:49:07.520 | What's the deal with podcasting and Spotify?
00:49:10.800 | How serious is Spotify about podcasting? Do you see a time where everybody would
00:49:15.440 | listen to, you know, probably a huge amount of people,
00:49:18.480 | majority perhaps, listen to music on Spotify? Do you see a
00:49:23.040 | time when the same is true for podcasting? Well, I certainly hope so.
00:49:28.560 | That is our mission. Our mission as a company is actually to
00:49:31.840 | enable a million creators to live off of their art and a billion people be
00:49:35.200 | inspired by it. And what I think is interesting about that mission is
00:49:38.320 | it actually puts the creators first, even though it started as a consumer-focused
00:49:42.240 | company, and it says to be able to live off of
00:49:44.480 | their art, not just make some money off of their art as well.
00:49:47.840 | So it's quite an ambitious project. And
00:49:51.920 | so we think about creators of all kinds and
00:49:55.520 | we kind of expanded our mission from being music to being
00:49:58.880 | audio a while back. And that's not so much because
00:50:05.920 | we think we made that decision. We think that decision was
00:50:10.000 | made for us. We think the world made that decision. Whether we like it or not,
00:50:14.960 | when you put in your headphones, you're going to make a choice between
00:50:18.960 | music and a new episode of your podcast or something else.
00:50:25.440 | We're in that world whether we like it or not. And that's how radio works.
00:50:28.960 | So we decided that we think it's about audio.
00:50:32.320 | You can see the rise of audiobooks and so forth. We think audio is this great
00:50:35.600 | opportunity. So we decided to enter it. And obviously
00:50:40.720 | Apple and Apple Podcasts is absolutely dominating
00:50:44.240 | in podcasting. And we didn't have a single podcast
00:50:47.840 | only like two years ago. What we did though was
00:50:51.440 | we looked at this and said, "Can we bring something to this?"
00:50:56.640 | We want to do this, but back to the original Spotify, we had to do
00:51:00.240 | something that consumers actually value to be able to do this. And the reason
00:51:05.600 | we've gone from not existing at all to being, by
00:51:08.080 | quite a wide margin, the second largest in podcast
00:51:12.320 | consumption, still wide gap to iTunes, but we're growing quite fast.
00:51:17.120 | I think it's because when we looked at the consumer problem,
00:51:21.040 | people said surprisingly that they wanted their podcasts and
00:51:24.560 | music in the same application. So what we did was we took a little
00:51:29.040 | bit of a different approach where we said instead of building a separate
00:51:31.360 | podcast app, we thought, "Is there a consumer problem to solve
00:51:34.960 | here because the others are very successful already?"
00:51:37.280 | And we thought there was in making a more seamless experience
00:51:40.480 | where you can have your podcast and your music in the same application.
00:51:45.120 | Because we think it's audio to you and that has been successful and
00:51:48.640 | that meant that we actually had 200 million people to
00:51:51.600 | offer this to instead of starting from zero.
00:51:54.000 | So I think we have a good chance because we're taking a different approach than
00:51:57.520 | the competition. And back to the other thing I mentioned
00:52:00.320 | about creators, because we're looking at the
00:52:04.000 | end-to-end flow, I think there's a tremendous amount of
00:52:06.960 | innovation to do around podcasts as a format.
00:52:09.920 | When we have creation tools and consumption, I think we could
00:52:13.840 | start improving what podcasting is. I mean podcast is this
00:52:17.440 | opaque, big, like one- or two-hour file that you're streaming, which really
00:52:23.200 | doesn't make that much sense in 2019 that
00:52:25.920 | it's not interactive, there's no feedback loops, nothing like that.
00:52:28.960 | So I think if we're gonna win it's gonna have to be because we build a better
00:52:32.080 | product for creators and for consumers. So we'll
00:52:36.000 | see, but it's certainly our goal. We have a long way to go.
00:52:39.120 | Well the creators part is really exciting. You already got me
00:52:42.240 | hooked there. It's the only stats I have. Blubrry just recently added the stats
00:52:46.960 | of whether it's listened to the end or not.
00:52:51.440 | And that's like a huge improvement, but that's still
00:52:56.080 | nowhere to where you could possibly go in terms of statistics. You just download
00:52:59.440 | the Spotify podcasters app and verify and then
00:53:01.600 | you'll know where people dropped out in this episode. Oh wow, okay.
00:53:05.520 | The moment I started talking, okay. I might be depressed by this.
00:53:10.160 | But okay, so one other question. The original Spotify for music,
00:53:18.320 | and I have a question about podcasting in this line, is
00:53:21.760 | the idea of albums. I have music aficionados, friends who are
00:53:28.560 | really big fans of music, often really enjoy
00:53:32.880 | albums, listening to entire albums of an artist.
00:53:36.480 | Correct me if I'm wrong, but I feel like Spotify has helped
00:53:41.040 | replace the idea of an album with playlists.
00:53:44.320 | So you create your own albums. It's kind of the way, at least I've
00:53:48.720 | experienced music and I really enjoy it that way.
00:53:51.760 | One of the things that was missing in podcasting for me,
00:53:55.600 | I don't know if it's missing. I don't know. It's an open question for me.
00:53:59.200 | But the way I listen to podcasts is the way I would listen to albums.
00:54:02.720 | So I take Joe Rogan Experience, and that's an album.
00:54:06.240 | And I listen, you know, I put that on, and I listen one episode after the next,
00:54:11.600 | then there's a sequence and so on. Is there room for
00:54:17.120 | doing what you did for music, doing what Spotify did for music,
00:54:20.720 | but creating playlists, sort of this kind of playlisting idea of
00:54:26.080 | breaking apart from podcasting, from individual podcasts and creating
00:54:30.480 | kind of this interplay? Or have you thought about
00:54:34.640 | that space? It's a great question. So I think in
00:54:38.480 | music, you're right. Basically, you bought an album. So it was like you bought
00:54:42.400 | a small catalog of like 10 tracks, right? It was, again, it was actually a
00:54:46.000 | lot of consumption. You think it's about what you like,
00:54:49.600 | but it's based on the business model. Right. So you paid for this 10-track
00:54:53.680 | service, and then you listen to that for a while. And then when everything was
00:54:57.120 | flat-priced, you tended to listen differently.
00:55:00.080 | Now, so I think the album is still tremendously important. That's
00:55:03.120 | why we have it. And you can save albums and so forth. And
00:55:05.440 | you have a huge amount of people who really listen according to albums.
00:55:08.480 | And I like that because it is a creator format. You can tell a longer story
00:55:12.320 | over several tracks. And so some people listen to just one track. Some people
00:55:16.240 | actually want to hear that whole story. Now, in podcast, I think
00:55:22.560 | it's different. You can argue that podcasts might be more like shows on
00:55:26.480 | Netflix. You have like a full season of Narcos,
00:55:30.000 | and you're probably not going to do like one episode of Narcos and then one of
00:55:33.040 | House of Cards. There's a narrative there, and you
00:55:38.240 | love the cast and you love these characters. So I think people will
00:55:41.520 | love shows, and I think they will
00:55:46.000 | listen to those shows. I do think you follow a bunch of shows at the same
00:55:48.880 | time. So there's certainly an opportunity to bring you the latest episode of
00:55:52.640 | whatever the five, six, ten things that you're into.
00:55:56.160 | But I think people are going to listen to
00:56:00.400 | specific hosts and love those hosts for a long time because I think there's
00:56:04.640 | something different with podcasts where this
00:56:09.040 | format of the experience of the
00:56:12.880 | audience is actually sitting here right between us.
00:56:15.440 | Whereas if you look at something on TV, the audio actually would come from,
00:56:18.960 | you would sit over there, and the audio would come to you from both of us as if
00:56:22.160 | you were watching, not as you were part of the conversation.
00:56:24.800 | So my experience is having listened to podcasts like yours
00:56:28.000 | and Joe Rogan, I feel like I know all of these people. They have no idea
00:56:32.080 | who I am, but I feel like I've listened to so many hours of them.
00:56:35.040 | It's very different from me watching a TV show or an interview.
00:56:39.440 | So I think you kind of fall in love with people
00:56:43.040 | and experience it in a different way. So I think
00:56:46.560 | shows and hosts are going to be very important. I don't think
00:56:49.760 | that's going to go away into some sort of thing where
00:56:51.920 | you don't even know who you're listening to. I don't think that's going
00:56:54.000 | to happen. What I do think is, I think there's a
00:56:56.400 | tremendous discovery opportunity in podcasts
00:57:00.400 | because the catalog is growing quite quickly.
00:57:04.000 | And I think podcasts is only a few, like five, six hundred thousand shows
00:57:10.400 | right now. If you look back to YouTube, that's
00:57:12.800 | another analogy of creators. No one really knows if you would lift
00:57:16.720 | the lid on YouTube, but it's probably billions
00:57:19.120 | of episodes. And so I think the podcast catalog will probably grow
00:57:23.520 | tremendously because the creation tools are getting easier.
00:57:27.040 | And then you're going to have this discovery opportunity that I think is
00:57:30.800 | really big. So a lot of people tell me that they love their shows,
00:57:34.800 | but discovery in podcasts kind of sucks. It's really hard to get into a new show.
00:57:38.800 | They're usually quite long. It's a big time investment. So I think there's
00:57:41.520 | plenty of opportunity in the discovery part.
00:57:45.600 | Yeah, for sure. A hundred percent. And even the dumbest,
00:57:49.520 | there's so many low-hanging fruit, too. For example,
00:57:54.480 | just knowing what episode to listen to first
00:57:58.480 | to try out a podcast. Exactly. Because most podcasts don't have an order to
00:58:03.200 | them. They can be listened to out of order.
00:58:06.400 | And sorry to say, some episodes are better than others.
00:58:12.640 | So some episodes of Joe Rogan are better than others. And it's
00:58:16.480 | nice to know which you should listen to to try it out.
00:58:20.400 | And there's, as far as I know, almost no information
00:58:24.400 | in terms of like upvotes on how good an episode is.
00:58:28.640 | Exactly. So I think part of the problem is
00:58:32.080 | it's kind of like music. There isn't one answer. People use music for different
00:58:35.600 | things. And there's actually many different types of music. There's workout
00:58:38.080 | music and there's classical piano music and focus music and
00:58:41.200 | and so forth. I think the same with podcasts. Some podcasts are sequential.
00:58:45.360 | They're supposed to be listened to in order. It's actually
00:58:49.760 | telling a narrative. Some podcasts are one topic, kind of like
00:58:54.800 | yours, but different guests. So you could jump in anywhere.
00:58:57.280 | Some podcasts actually have completely different topics. And for those podcasts,
00:59:00.560 | it might be that we should recommend one episode
00:59:04.560 | because it's about AI from someone. But then they talk about
00:59:08.480 | something that you're not interested in the rest of the episodes.
00:59:10.880 | So I think what we're spending a lot of time on now is just first
00:59:14.560 | understanding the domain and creating kind of the knowledge graph
00:59:18.400 | of how do these objects relate and how do people consume. And I think we'll find
00:59:22.960 | that it's going to be different. I'm excited.
00:59:27.440 | Spotify is the first people I'm aware of that are
00:59:32.320 | trying to do this for podcasting. Podcasting has been like a wild west
00:59:36.800 | up until now. It's been a very... We want to be very careful though because it's
00:59:41.360 | been a very good wild west. I think it's this fragile
00:59:44.960 | ecosystem and we want to make sure that you
00:59:49.360 | don't barge in and say like, "Oh, we're gonna
00:59:52.080 | internetize this thing." And you have to think about the
00:59:55.760 | creators. You have to understand how they get distribution today, who
00:59:59.920 | listens, how they make money today, and try to make sure that their
01:00:03.760 | business model works, that they understand.
01:00:06.080 | I think it's back to doing something, improving their products
01:00:09.440 | like feedback loops and distribution. So jumping back into terms of this
01:00:15.760 | fascinating world of recommender systems and listening to music and using
01:00:19.920 | machine learning to analyze things, do you think it's
01:00:23.600 | better to... What currently, correct me if I'm wrong,
01:00:28.240 | but currently Spotify lets people pick what they listen to for the
01:00:32.720 | most part. There's a discovery process but you kind of
01:00:35.520 | organize playlists. Is it better to let people pick what they listen to
01:00:40.800 | or recommend what they should listen to? Something like Stations by Spotify that
01:00:46.320 | I saw that you're playing around with. Maybe you can tell me what's the status
01:00:50.480 | of that. This is a Pandora style app that just kind of...
01:00:54.400 | As opposed to you select the music you listen to, it kind of
01:00:58.800 | feeds you the music you listen to. What's the status of Stations by Spotify?
01:01:04.080 | What's its future? The story of Spotify as we have grown
01:01:07.760 | has been that we made it more accessible
01:01:09.600 | to different audiences. Stations is another one of those where
01:01:15.360 | the question is, some people want to be very specific. They actually want to hear
01:01:19.040 | "Stairway to Heaven" right now. That needs to be very easy to do.
01:01:24.000 | Some people or even the same person at some point might say
01:01:27.840 | "I want to feel upbeat" or "I want to feel happy" or
01:01:31.520 | "I want songs to sing in the car". So they put in
01:01:34.640 | the information at a very different level and then we need to
01:01:37.760 | translate that into what that means musically. So Stations is a test to
01:01:42.800 | create like a consumption input vector that is much simpler where you can just
01:01:45.920 | tune it a little bit and see if that increases the overall
01:01:49.360 | reach. But we're trying to kind of serve the entire gamut of super advanced so-called
01:01:54.560 | music aficionados all the way to people who
01:01:59.520 | they love listening to music but it's not their number one priority in life.
01:02:03.040 | They're not going to sit and follow every new release from every new
01:02:05.600 | artist. They need to be able to influence music
01:02:08.560 | at a different level. So we're trying, you
01:02:12.640 | can think of it as different products and I think when
01:02:14.880 | one of the interesting things to answer your question on
01:02:19.440 | if it's better to let the user choose or to play, I think the answer is
01:02:23.760 | the challenge when machine learning kind of came along
01:02:27.840 | there was a lot of thinking about what does product development mean
01:02:31.520 | in a machine learning context. People like Andrew Ng for example
01:02:36.560 | when he went to Baidu he started doing a lot of practical machine learning, went
01:02:39.600 | from academia and he thought a lot about this and he
01:02:42.640 | had this notion that a product manager, designer, an engineer, they used to
01:02:46.320 | work around this wireframe. Kind of describe what the product should look
01:02:49.280 | like or something to talk about. When you're doing like a chatbot or a
01:02:52.320 | playlist, what are you going to say? Like it should be good.
01:02:55.520 | That's not a good product description. So how do you do that and he came up
01:02:58.880 | with this notion that the test set is the new wireframe. The
01:03:03.520 | job of the product manager is to source a good test set that is
01:03:06.080 | representative of what, like if you say, like, I want a playlist
01:03:09.120 | that is songs to sing in the car. The job of the product manager is to go
01:03:12.880 | and source like a good test set of what that means.
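The "test set is the new wireframe" idea can be made concrete with a small scoring sketch. This is a minimal illustration, not Spotify's or Andrew Ng's actual method: the function name `precision_at_k`, the track ids, and both candidate outputs are hypothetical. It shows how a hand-curated test set gives engineering a number to work toward.

```python
# Hypothetical sketch: score candidate playlist algorithms against a
# hand-curated test set. Track ids and algorithm outputs are invented.

def precision_at_k(recommended, curated, k):
    """Fraction of the top-k recommended tracks that the curator approved."""
    top_k = recommended[:k]
    if not top_k:
        return 0.0
    hits = sum(1 for track in top_k if track in curated)
    return hits / len(top_k)

# Test set sourced by the product manager for one concept,
# e.g. "songs to sing in the car".
curated_test_set = {"track_a", "track_b", "track_c", "track_d"}

# Output of two made-up candidate algorithms for the same concept.
algo_one = ["track_a", "track_x", "track_b", "track_y"]
algo_two = ["track_a", "track_b", "track_c", "track_x"]

score_one = precision_at_k(algo_one, curated_test_set, k=4)
score_two = precision_at_k(algo_two, curated_test_set, k=4)
```

Under this assumed metric, the second algorithm matches the curated set more closely, which is the kind of comparison a vague spec like "it should be good" cannot provide.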
01:03:15.440 | Then you can work with engineering to have algorithms to try to produce that
01:03:18.960 | right. So we try to think a lot about how to
01:03:22.080 | structure product development for a machine learning age and what we
01:03:27.040 | discovered was that a lot of it is actually in the expectation
01:03:30.560 | and you can go two ways.
01:03:35.280 | Let's say that if you set the expectation with the user that this
01:03:39.920 | is a discovery product like Discover Weekly,
01:03:42.640 | you're actually setting the expectation that most of what we show you will not
01:03:45.760 | be relevant. When you're in the discovery process
01:03:48.080 | you're going to accept that actually if you find one gem every
01:03:51.760 | Monday that you totally love, you're probably going to be happy.
01:03:55.200 | Even though statistically, one out of ten is terrible or one out of 20
01:03:59.600 | is terrible, from a user point of view, because the setting was discovery, it's
01:04:02.400 | fine. Can I say to interrupt real quick, I just
01:04:05.840 | actually learned about Discover Weekly which is a Spotify,
01:04:10.560 | I don't know, it's a feature of Spotify that shows you
01:04:13.760 | cool songs to listen to. Maybe I can do issue tracking, I couldn't
01:04:18.480 | find it on my Spotify app. It's in your library. It's in the library,
01:04:22.640 | it's in the list of libraries because I was like whoa this is cool I
01:04:25.120 | didn't know this existed and I tried to find it.
01:04:27.440 | I will show it to you and feedback to our product team.
01:04:31.280 | Yeah there you go but yeah so yeah sorry
01:04:34.480 | just to mention the expectation there is
01:04:38.800 | basically that you're going to discover new songs.
01:04:42.240 | Yeah so then you can be quite adventurous in
01:04:45.440 | the recommendations you do but we have another product called
01:04:50.400 | Daily Mix which kind of implies that these are only going to be your
01:04:53.600 | favorites. So if you have one out of ten that is
01:04:56.240 | good and nine out of ten that doesn't work for you,
01:04:58.320 | you're going to think it's a horrible product. So actually a lot of the product
01:05:00.640 | development we learned over the years is about
01:05:02.800 | setting the right expectations. So for Daily Mix, you know algorithmically
01:05:07.440 | we would pick among things that feel very safe in
01:05:10.240 | your taste space. With Discover Weekly we go kind of wild
01:05:13.360 | because the expectation is most of this is not gonna. So a lot of
01:05:17.200 | that, a lot of to answer your question there
01:05:19.200 | a lot of should you let the user pick or not it depends.
01:05:23.040 | We have some products where the whole point is that the user can click play
01:05:26.320 | put the phone in the pocket and it should be really good music for like
01:05:29.280 | an hour. We have other products where you probably need to say like no
01:05:33.120 | no save no no and it's very interactive.
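The contrast between Daily Mix ("very safe in your taste space") and Discover Weekly ("kind of wild") can be sketched as choosing candidates from different similarity bands. This is only an illustration under stated assumptions: the similarity scores are taken as given, and the band thresholds, names, and the function `pick_in_band` are hypothetical, not anything described in the conversation.

```python
# Hypothetical sketch: the same candidate tracks, filtered by how close
# they sit to a listener's taste. Scores and thresholds are invented.

def pick_in_band(similarities, low, high, count):
    """Return up to `count` track ids with low <= similarity <= high, closest first."""
    in_band = [t for t, s in similarities.items() if low <= s <= high]
    in_band.sort(key=lambda t: similarities[t], reverse=True)
    return in_band[:count]

# Assumed precomputed similarity between each track and one listener's taste.
similarities = {
    "old_favorite": 0.97,
    "same_artist": 0.92,
    "adjacent_genre": 0.6,
    "new_scene": 0.4,
    "unrelated": 0.05,
}

# "Safe" product: only tracks very close to known taste.
daily_mix = pick_in_band(similarities, 0.9, 1.0, 2)

# "Discovery" product: tracks further out, where misses are expected.
discover_weekly = pick_in_band(similarities, 0.3, 0.9, 2)
```

The same ranking signal feeds both products; what changes is how far from the listener's known taste the product is allowed to reach, which is what the stated expectation makes acceptable.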
01:05:37.120 | I see that makes sense and then the radio product the station's product is
01:05:40.480 | one of these like click play put in your pocket for hours.
01:05:43.440 | That's really interesting so you're thinking of different test sets
01:05:47.120 | for different users and trying to create products that sort of optimize
01:05:53.760 | for those test sets that represent a specific set of users.
01:05:58.560 | Yes I think one thing that I think is interesting is
01:06:03.680 | we invested quite heavily in editorial in people creating playlists
01:06:07.920 | using statistical data and that was successful for us and then we also
01:06:11.600 | invested in machine learning and for the longest time you know within
01:06:16.240 | Spotify and within the rest of the industry there was always this
01:06:18.640 | narrative of humans versus the machine. Algo versus editorial and editors
01:06:24.160 | would say like well if I had that data if I could see your
01:06:27.600 | playlisting history and I made a choice for you I would have
01:06:30.320 | made a better choice and they would have because they
01:06:32.960 | understand they're much smarter than these algorithms. The human is
01:06:35.760 | incredibly smart compared to our algorithms. They can take culture
01:06:39.760 | into account and so forth. The problem is that they can't make 200
01:06:43.440 | million decisions you know per hour for every user that
01:06:47.360 | logs in so the algo may be not as sophisticated but much more
01:06:51.120 | efficient. So there was this there was this
01:06:53.440 | contradiction but then a few years ago we started
01:06:57.280 | focusing on this kind of human in the loop thinking around machine learning
01:07:01.280 | and we actually coined an internal term for it called algotorial
01:07:05.200 | the combination of algorithms and editors where
01:07:08.800 | if we take a concrete example you think of the editor
01:07:12.480 | this paid expert that we have that's really good at something like
01:07:17.920 | soul, hip-hop, EDM, something, right, they're true experts, known in the industry
01:07:23.520 | so they have all the cultural knowledge you think of them as the product manager
01:07:27.520 | and you say that let's say that you want to create a
01:07:31.920 | you think that there's a there's a product need in the world for something
01:07:35.040 | like songs to sing in the car or songs to sing in the shower
01:07:37.360 | I'm taking that example because it exists people love to scream
01:07:40.720 | songs in the car when they drive right yeah so you want to create that product
01:07:44.640 | then you have this product manager who's a musical expert
01:07:47.520 | they create they come up with a concept like I think this is a missing thing in
01:07:51.040 | humanity like a playlist called songs in the car
01:07:54.720 | they create the the framing the image the title
01:07:58.480 | and they create a test set of they create a group of songs like a few
01:08:01.920 | thousand songs out of the catalog that they manually
01:08:04.320 | curate that are known songs that are great to sing in the car
01:08:08.080 | and they can take like true romance into account they understand things that our
01:08:11.440 | algorithms do not at all so they have this huge set of tracks
01:08:15.120 | then when we deliver that to you we look at your taste vectors and you
01:08:19.200 | get the 20 tracks that are songs to sing in the car in your taste
01:08:23.200 | so you have you have personalization and
01:08:26.320 | editorial input in the same process if that makes sense yeah it makes
01:08:30.960 | total sense and I have several questions around that this is a this is like
01:08:35.280 | fascinating okay so first it is a little bit surprising to me
01:08:40.640 | that the world expert humans are outperforming machines
01:08:47.120 | at specifying songs to sing in the car so maybe you could talk to that a
01:08:54.160 | little bit I don't know if you can put it into words but
01:08:57.280 | what is it how difficult is this problem
01:09:00.800 | uh of do you really uh I guess what I'm trying to ask is there
01:09:06.160 | how difficult is it to encode the cultural references
01:09:10.000 | uh the the context of the song the artists
01:09:13.840 | all all those things together can machine learning really not do that
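The "algotorial" flow described above, where an editor curates a large pool and each listener gets the slice that fits their taste vector, can be sketched as a ranking step. This is a minimal sketch under assumptions: the three-dimensional vectors, the track ids, and the use of cosine similarity are all hypothetical choices for illustration; the conversation does not say how the matching is actually done.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def personalize(pool, taste_vector, n):
    """Rank an editor-curated pool by similarity to one listener's taste."""
    ranked = sorted(pool, key=lambda tid: cosine(pool[tid], taste_vector),
                    reverse=True)
    return ranked[:n]

# Editor-curated pool for one concept; the vectors are invented stand-ins
# for whatever track representation the real system uses.
editor_pool = {
    "rock_anthem": [0.9, 0.1, 0.0],
    "pop_singalong": [0.2, 0.9, 0.1],
    "power_ballad": [0.6, 0.5, 0.1],
    "rap_classic": [0.0, 0.2, 0.9],
}

rock_fan = [1.0, 0.2, 0.0]
pop_fan = [0.1, 1.0, 0.1]

rock_fan_playlist = personalize(editor_pool, rock_fan, 2)
pop_fan_playlist = personalize(editor_pool, pop_fan, 2)
```

Both listeners get tracks only from the editor's pool, so the cultural judgment stays human, while the ordering and selection differ per listener.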
01:09:17.920 | I mean I think machine learning is great at replicating patterns
01:09:22.640 | if you have the patterns but if you try to write with me a spec of what songs
01:09:26.960 | greatest song to sing in the car definition is is it is it loud does it
01:09:31.200 | have many choruses should it have been in movies it's
01:09:33.920 | it quickly gets incredibly complicated right yeah
01:09:36.960 | and and a lot of it may not be in the structure of the song or the title it
01:09:41.120 | could be cultural references because you know it was a history so so the
01:09:45.920 | definition problems quickly get and I think that was the that
01:09:49.520 | was the insight of Andrew Ng when he said the job of the product
01:09:52.720 | manager is to understand these things that
01:09:54.640 | that algorithms don't and then define what that looks like and then you have
01:09:59.120 | something to train towards right then you have kind of the test set
01:10:02.720 | and then so so today the editors create this pool of tracks and then we
01:10:06.400 | personalize you could easily imagine that once you have this set you could
01:10:09.920 | have some automatic exploration of the rest of the catalog
01:10:12.480 | because then you understand what it is and then the other side of it where
01:10:16.000 | machine learning does help is this taste vector how hard is it to
01:10:21.440 | construct a vector that represents the things an
01:10:26.080 | individual human likes this human preference so you can
01:10:31.600 | you know music isn't like it's not like amazon
01:10:35.520 | like things you usually buy music seems more amorphous like it's this
01:10:41.200 | thing that's hard to specify like what what is well you know if you look at my
01:10:46.320 | playlist what is the music that I love it's harder
01:10:49.360 | it seems to be uh much more difficult to specify concretely
01:10:54.080 | so how hard is it to build a taste vector
01:10:57.200 | it is very hard in the sense that you need a lot of data
01:11:00.720 | and I think what we found was that so it's not
01:11:04.400 | so it's not a stationary problem it changes over time
01:11:07.840 | um and so we've gone through the journey of if if um
01:11:14.240 | you've done a lot of computer vision obviously I've done a bunch of computer
01:11:17.280 | vision in my past and we started kind of with the
01:11:19.840 | handcrafted heuristics for you know this is kind of in the music
01:11:24.880 | this is this and if you consume this you probably like this
01:11:27.520 | so we we have we started there and we have some of that still
01:11:31.280 | then what was interesting about the playlist data was that you could find
01:11:34.240 | these latent things that wouldn't necessarily even make sense to
01:11:37.360 | you that could could even capture maybe
01:11:40.560 | cultural references because they co-occurred
01:11:42.960 | things that that wouldn't have appeared kind of mechanistically either in the
01:11:47.520 | content or so forth so um
01:11:52.400 | I think that um
01:11:56.080 | I think the core assumption is that there are patterns
01:12:01.120 | in in almost everything and if there are patterns
01:12:05.040 | these these embedding techniques are getting better and better now now
01:12:08.400 | as everyone else we're also using kind of deep embeddings where you can
01:12:12.880 | encode binary values and and so forth um and and what I think is
01:12:17.520 | interesting is is this process to try to find things
01:12:20.560 | that um that do not necessarily you wouldn't
01:12:24.480 | actually have have guessed so it is very hard in a in a in an
01:12:28.880 | engineering sense to find the right dimensions it's an
01:12:31.760 | incredible scalability problem to do for hundreds of millions of users and to
01:12:36.160 | update it every day but in but in theory um
01:12:41.680 | in theory embeddings isn't that complicated
01:12:44.880 | the fact that you try to find some principal components or something like
01:12:47.920 | that dimensionality reduction and so forth so
01:12:50.000 | the theory I guess is easy the practice is
01:12:52.000 | is very very hard and it's a it's a huge engineering challenge but fortunately we
01:12:56.800 | have some amazing both research and engineering teams in
01:13:00.320 | in this space yeah I guess the the question is all
01:13:05.280 | I mean it's similar I deal with it with an autonomous vehicle space is the
01:13:08.400 | question is how hard is driving and here is
01:13:14.160 | basically the question is of edge cases uh so embedding probably works
01:13:22.720 | not probably but I would imagine works well in a lot of cases
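The point about playlist data revealing latent relationships "because they co-occurred" can be illustrated with a toy co-occurrence model. This is a sketch under assumptions, not the production approach: the playlists and track ids are invented, the vectors are raw co-occurrence counts, and no dimensionality reduction is applied, whereas a real system would compress such vectors (for example with principal components, as mentioned in the conversation).

```python
from itertools import combinations

def cooccurrence_vectors(playlists):
    """Map each track to its co-occurrence counts with every other track."""
    tracks = sorted({t for p in playlists for t in p})
    index = {t: i for i, t in enumerate(tracks)}
    vectors = {t: [0.0] * len(tracks) for t in tracks}
    for playlist in playlists:
        # Count each unordered pair once per playlist.
        for a, b in combinations(sorted(set(playlist)), 2):
            vectors[a][index[b]] += 1.0
            vectors[b][index[a]] += 1.0
    return tracks, vectors

def taste_vector(history, vectors):
    """Average the vectors of the tracks a listener has played."""
    dims = len(next(iter(vectors.values())))
    if not history:
        return [0.0] * dims
    total = [0.0] * dims
    for track in history:
        for i, value in enumerate(vectors[track]):
            total[i] += value
    return [value / len(history) for value in total]

# Invented playlist data.
playlists = [
    ["song_a", "song_b", "song_c"],
    ["song_a", "song_b"],
    ["song_c", "song_d"],
]

tracks, vectors = cooccurrence_vectors(playlists)
listener_taste = taste_vector(["song_a", "song_b"], vectors)
```

Even in this toy version the taste vector points toward `song_c`, which the listener has not played but which other people placed next to their songs. Doing this for hundreds of millions of users with daily updates is the scalability problem described above.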
01:13:27.760 | so there's a bunch of questions that arise then so do
01:13:31.280 | song preferences does your taste vector depend on
01:13:34.720 | context like mood right so there's different moods and
01:13:41.200 | absolutely so how does that take in it is it is it possible to take that as a
01:13:47.600 | consideration or do you just leave that as a interface
01:13:51.600 | problem that allows the user to just control it
01:13:54.000 | so when I'm looking for a workout music I kind of specify it by
01:13:58.320 | choosing certain playlists doing certain search yeah
01:14:01.520 | so that's a great point it's back to the product development
01:14:04.800 | you could try to spend a few years trying to predict which mood you're in
01:14:08.560 | automatically when you open Spotify or you create a tab which is happy and
01:14:12.240 | sad right and you're going to be right 100% of the time with one click
01:14:15.600 | now it's probably much better to let the user tell you if they're happy or sad
01:14:19.440 | or if they want to work out on the other hand if your user interface becomes 2000
01:14:23.520 | tabs you're introducing so much friction so
01:14:25.760 | no one will use the product so then you have to get better
01:14:28.560 | so it's this thing where I think maybe it was
01:14:32.480 | I don't remember who coined it but it's called fault tolerant uis right you build a ui
01:14:35.760 | that is tolerant to being wrong and then you can be much less right in
01:14:40.400 | your in your in your algorithms so we you know
01:14:44.160 | we've had to learn a lot of that building the right ui that
01:14:46.800 | fits where the where the machine learning is
01:14:50.240 | and and and a great discovery there which is which was by the teams during
01:14:55.120 | uh one of our hack days was this thing of taking discovery packaging it
01:14:59.600 | into a playlist and saying that these are new tracks
01:15:03.840 | that we think you might like based on this and setting the right expectation
01:15:07.280 | made it made it a great product so I think we
01:15:10.080 | have this benefit that for example Tesla doesn't have that we can we can
01:15:15.440 | we can change the expectation we can we can build a fault tolerant
01:15:18.320 | setting it's very hard to be fault tolerant when you're driving at a
01:15:21.200 | you know 100 miles per hour or something and and we we have the luxury of
01:15:26.160 | being able to say that of being wrong if we have the right
01:15:29.680 | ui which gives us different abilities to take more risk so I actually think
01:15:34.720 | the self-driving problem is is much harder oh yeah
01:15:38.400 | for sure it's much less fun because people die exactly
01:15:45.200 | and since Spotify uh it's such a more fun problem because
01:15:51.280 | failure will I mean failure is beautiful in a way it leads to exploration so it's
01:15:56.640 | it's a really fun reinforcement learning problem the worst case scenario is you
01:15:59.760 | get these wtf tweets like how the hell did I get this this song
01:16:03.280 | which is which is a lot better than the self-driving failure
01:16:07.040 | so what's the feedback that a user what's the signal
01:16:12.080 | that a user provides into the system so the the you mentioned skipping
01:16:19.360 | what is like the strongest signal is uh you didn't mention clicking like
01:16:24.800 | so so we have a few signals that are important obviously
01:16:28.240 | playing playing through so so one of the benefits of music actually even compared
01:16:32.880 | to podcast or or movies is the object itself is really
01:16:37.760 | only about three minutes so you get a lot of chances to recommend
01:16:41.360 | and the feedback loop is is every three minutes instead of every
01:16:44.800 | two hours or something so you actually get
01:16:47.520 | kind of noisy but but quite fast feedback
01:16:50.880 | and so you can see if people played through or if the which is you know the
01:16:53.760 | inverse of skip really that's an important signal on the other
01:16:57.040 | hand much of the consumption happens when your phone is in your pocket maybe
01:17:00.480 | you're running or driving or you're playing on a speaker
01:17:03.040 | and so you not skipping doesn't mean that you love that song it might be that
01:17:06.240 | it wasn't bad enough that you would walk up and skip so it's a noisy signal
01:17:10.560 | then then we have the equivalent of the like which is you saved it to your
01:17:13.200 | library that's a pretty strong signal of
01:17:15.440 | affection and then we have the more explicit signal of
01:17:20.640 | playlisting like you took the time to create a playlist you put it in there
01:17:24.000 | there's a very little small chance that if you took
01:17:27.520 | all that trouble this is not a really important track to you
01:17:30.480 | and then we understand also what other tracks it relates to so we have
01:17:34.800 | we have the playlisting we have the like and then we have the listening or skip
01:17:39.120 | and and you have to have very different approaches to all of them because they're at
01:17:42.720 | different levels of noise one is very voluminous but
01:17:46.080 | noisy and the other is rare but you can you can probably trust it yeah
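The three signal types described here (play-through or skip, save, and playlist add) differ in volume and reliability, which suggests weighting them differently. The sketch below is purely illustrative: the weights, event names, and the function `track_scores` are hypothetical, and the conversation gives no actual numbers.

```python
# Hypothetical weights: plays and skips are plentiful but noisy, so each
# counts for little; saves and playlist adds are rare but trustworthy.
SIGNAL_WEIGHTS = {
    "skip": -0.5,
    "play_through": 0.5,
    "save": 3.0,
    "playlist_add": 5.0,
}

def track_scores(events):
    """Sum weighted feedback per track from (track_id, signal) events."""
    scores = {}
    for track, signal in events:
        scores[track] = scores.get(track, 0.0) + SIGNAL_WEIGHTS[signal]
    return scores

# Invented event log for one listener.
events = [
    ("song_a", "play_through"),
    ("song_a", "play_through"),
    ("song_a", "save"),
    ("song_b", "skip"),
    ("song_b", "play_through"),
    ("song_c", "playlist_add"),
]

scores = track_scores(events)
```

With these assumed weights, a single playlist add outweighs several passive plays, reflecting the point that not skipping a song does not necessarily mean the listener loved it.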
01:17:50.720 | it's interesting because uh i i think between those signals captures
01:17:54.960 | all the information you'd want to capture i mean there's a feeling
01:17:58.800 | a shallow feeling for me that there's sometimes i'll hear a song that's like
01:18:02.320 | yes this is you know this is the right song for
01:18:05.040 | the moment but there's really no way to express
01:18:08.160 | that fact except by listening through it all the way
01:18:11.680 | yeah and maybe playing it again at that time or something yeah
01:18:15.280 | there's no need for a button that says this was the best song could have heard
01:18:19.680 | at this moment well we're playing around with that with
01:18:22.480 | kind of the thumbs up concept saying like i really like this
01:18:25.200 | just kind of talking to the algorithm it's unclear if that's
01:18:28.720 | the best way for humans to interact maybe it is maybe they should think of
01:18:32.160 | spotify as a person an agent sitting there trying to serve you and you can
01:18:35.920 | say like bad spotify good spotify right now the
01:18:39.360 | analogy we've had is more you shouldn't think of of us we should
01:18:43.280 | be invisible and the feedback is if you save it
01:18:46.640 | kind of you work for yourself you do a playlist because you think is great and
01:18:49.920 | we can learn from that it's kind of back to back to tesla how
01:18:53.680 | they kind of have this shadow mode they sit and watch you drive
01:18:56.800 | we kind of took the same analogy we sit and watch what you playlist
01:19:00.400 | and then maybe we can we can offer you an autopilot where you can take over for
01:19:03.360 | a while or something like that and then back off if you say like that's
01:19:06.720 | not that's not good enough but but i think it's interesting to figure
01:19:09.840 | out what your mental model is if spotify is an ai that you talk to
01:19:15.200 | which i think might be a bit too abstract for for many
01:19:18.880 | consumers or if you still think of it as it's my music app
01:19:22.560 | but it's just more helpful and depends on the device it's
01:19:26.480 | running on which brings us to smart speakers
01:19:31.040 | so i have a lot of the spotify listening i do is on
01:19:35.360 | things that on devices i can talk to whether it's from amazon google or
01:19:39.600 | apple what's the role of spotify on those
01:19:42.320 | devices how do you think of it differently than
01:19:44.800 | on the phone or on the desktop there are a few things to say about them
01:19:50.960 | first of all it's incredibly exciting they're growing like
01:19:53.360 | crazy especially here in the in the in the u.s
01:19:57.360 | and it's solving a consumer need that i think is
01:20:04.640 | is you can think of it as
01:20:08.400 | just remote interactivity you can control this thing from from from across
01:20:11.840 | the room and it may feel like a small thing but
01:20:14.720 | it turns out that friction matters to consumers being
01:20:17.920 | able to say play pause and so forth from across
01:20:20.960 | the room is is very powerful so basically you made you made the
01:20:24.720 | living room interactive now and
01:20:29.040 | what we see in our data is that the number one use case for these speakers
01:20:33.600 | is music music and podcast so fortunately for us it's been important
01:20:39.200 | to these companies to have those use cases covered so they
01:20:42.720 | want spotify on these we have very good relationships with
01:20:45.520 | with them and we're seeing we're seeing tremendous
01:20:50.000 | success with them what what i think it's interesting about them is
01:20:55.200 | it's already working we we we kind of had this epiphany
01:21:01.360 | many years ago back when we started using sonos if you went through all the
01:21:05.280 | trouble of setting up your sonos system you had this magical experience where
01:21:08.800 | you had all the music ever made in your living room and and we we we
01:21:13.440 | made this assumption that the the home everyone used to have a cd
01:21:16.720 | player at home but they never managed to get their files
01:21:19.440 | working in the home having this network attached storage was too cumbersome for
01:21:22.880 | most consumers so we made the assumption that the home
01:21:25.840 | would skip from the cd all the way to the streaming box
01:21:29.040 | where you would buy the stereo and have all the music built
01:21:32.000 | in that took longer than we thought but with the voice speakers that was the
01:21:35.040 | unlocking that made kind of the connected speaker
01:21:38.480 | happen in the home so it really exploded and
01:21:43.600 | we saw this engagement that we predicted would happen
01:21:47.040 | what i think is interesting though is where it's going from now
01:21:50.320 | right now you think of them as voice speakers but i think if you look at
01:21:54.480 | uh google i/o for example they just added a camera
01:21:58.480 | to it where you know when the alarm goes off instead of saying
01:22:02.640 | hey google stop you can just wave your hand
01:22:06.320 | so i think they're going to think more of it
01:22:09.440 | as an agent or as an assistant truly an assistant and an assistant that
01:22:14.320 | can see you it's going to be much more effective
01:22:16.880 | than a blind assistant so i think these things will morph and we won't
01:22:20.160 | necessarily think of them as quote-unquote voice speakers anymore
01:22:23.920 | just as
01:22:26.320 | interactive access to the internet in the home
01:22:30.080 | but i still think that the biggest use case for those
01:22:34.240 | will be audio so for that reason we're investing heavily in it
01:22:37.600 | and we built our own nlu stack the challenge here is
01:22:43.680 | how do you innovate in that world it lowers friction for consumers
01:22:47.280 | but it's also much more constrained there you have no pixels to play with
01:22:50.560 | in an audio only world it's really the
01:22:53.280 | vocabulary that is the interface so we started
01:22:56.880 | investing and playing around quite a lot with that trying to understand
01:22:59.680 | what the future will be of you speaking and gesturing and
01:23:03.200 | waving at your music and actually uh you're actually nudging
01:23:06.880 | closer to the autonomous vehicle space because from everything i've seen the
01:23:11.520 | level of frustration people experience upon failure
01:23:14.640 | of natural language understanding is much higher
01:23:17.760 | than failure in other contexts people get frustrated really fast
01:23:21.680 | so if you screw that experience up even just a little bit they give up really
01:23:26.240 | quickly yeah and i think you see that in the data
01:23:29.680 | while it's tremendously successful the most common interactions are play
01:23:34.880 | pause and you know next the things where if
01:23:38.320 | you compare it to taking up your phone unlocking it bringing up the app and
01:23:41.200 | clicking skip yeah it was much
01:23:44.400 | lower friction but then uh for longer more
01:23:48.160 | complicated things like can you find me that song
01:23:50.640 | people still bring up their phone and search and then play it on their speaker
01:23:53.360 | so we tried again to build a fault tolerant ui where for the
01:23:57.280 | more complicated things you can still pick up your phone have
01:24:00.400 | powerful full keyboard search and then try to optimize for where there
01:24:04.880 | is actually lower friction and it's kind of like the
01:24:08.160 | tesla autopilot thing you have to be at the level where
01:24:11.440 | you're helpful if you're too smart and just in the way people are going to get
01:24:15.360 | frustrated and first of all i'm not obsessed with
01:24:18.480 | stairway to heaven it's just a good song but let me mention that as a use case
01:24:22.320 | because it's an interesting one i've literally told
01:24:26.000 | one of i don't want to say the name of the speaker because it'll when people
01:24:29.120 | are listening to it it'll make their speaker go off but i talk to the
01:24:32.560 | speaker and i say play stairway to heaven and every time
01:24:37.840 | it like not every time but a large percentage of the time plays the wrong
01:24:41.200 | stairway to heaven it plays like some cover of the and
01:24:47.040 | that part of the experience i actually wonder from a business perspective does
01:24:51.280 | spotify control that entire experience or no
01:24:56.320 | it seems like the nlu the natural language stuff
01:24:59.840 | is controlled by the speaker and then spotify stays at a layer below that
01:25:04.720 | it's a good and complicated question some of which is
01:25:08.800 | dependent on the partner so it's hard to comment on the specifics
01:25:13.920 | but the question is the right one the
01:25:16.640 | challenge is if you can't use any other
01:25:19.680 | personalization i mean we know which stairway to heaven
01:25:22.400 | and the truth is maybe for one person it is exactly the cover that they
01:25:26.480 | want and they would be very frustrated if it
01:25:28.880 | plays i think we default to the right version
01:25:32.880 | but you actually want to be able to do the cover for the person that just played
01:25:35.840 | the cover 50 times or spotify is just going to seem stupid
01:25:39.440 | so you want to be able to leverage the personalization but you have this stack
01:25:43.040 | where you have the asr and this thing called the n-best list of
01:25:47.600 | the n best guesses here and then the personalization comes in at the
01:25:50.960 | end you actually want the personalization to be here when you're
01:25:53.280 | guessing about what they actually meant so we're working with these partners um
01:25:57.840 | and it's a complicated thing where
01:26:02.240 | you want to be able so first of all you want to be very careful with
01:26:05.920 | your users data you don't want to share your users data without their permission
01:26:09.200 | but you want to share some data so that their experience gets better
01:26:12.240 | um so that these partners can understand enough but not too much and so forth
01:26:16.400 | so really the trick is that it's like a business
01:26:20.720 | driven relationship where you're doing product development across companies
01:26:23.760 | together yeah which is really
01:26:25.840 | complicated but this is exactly why we built our own
01:26:29.360 | nlu so that we actually can make personalized guesses because this is the
01:26:34.160 | biggest frustration from a user point of view they don't
01:26:36.640 | understand about asrs and n-best lists
01:26:39.280 | and business deals they're like how hard can it be i've told this thing
01:26:42.720 | 50 times this version and still it plays the wrong thing it can't be
01:26:46.000 | hard so we try to take that user approach if
01:26:48.800 | the user is not going to understand the
01:26:51.280 | complications of business we have to solve it let's talk
01:26:55.280 | about sort of a complicated subject that i myself i'm quite
01:27:01.520 | torn about the idea sort of of um
01:27:06.400 | paying artists right i saw as of august 31st
01:27:11.920 | 2018 over 11 billion dollars were paid to rights holders
01:27:17.200 | so and further distributed to artists from spotify
01:27:21.280 | so a lot of money is being paid to artists first of all
01:27:25.520 | the whole time as a consumer for me when i look at spotify
01:27:29.680 | i'm not sure i'm remembering correctly but i think you said exactly how i feel
01:27:33.760 | which is this is too good to be true like
01:27:38.400 | when i started using spotify i assumed you guys would go bankrupt in like a
01:27:42.160 | month it's like this is too good a lot of
01:27:44.720 | people did
01:27:47.360 | it's like this is amazing uh so one question i have is sort of the
01:27:52.960 | bigger question how do you make money in this complicated world
01:27:56.320 | how do you deal with the relationship with record labels who
01:28:02.400 | are complicated uh these big you essentially have the task
01:28:09.440 | of herding cats but like rich and powerful cats
01:28:16.080 | and also have the task of paying artists enough and paying
01:28:20.080 | those labels enough and still making money in the internet space where people
01:28:24.320 | are not willing to pay hundreds of dollars a month so how do
01:28:29.200 | you navigate the space how do you navigate that's a beautiful
01:28:32.160 | description herding rich cats yeah i've never heard that before
01:28:36.720 | now it is very complicated and i think uh
01:28:39.760 | certainly actually betting against spotify has been statistically a very
01:28:44.080 | smart thing to do just looking at the line of roadkill in music
01:28:48.640 | streaming services um it's kind of i think if i had
01:28:54.160 | understood the complexity when i joined spotify
01:28:57.440 | unfortunately fortunately i didn't know enough about
01:29:00.800 | the music industry to understand the complexities because then i would have
01:29:03.920 | made a more rational guess that it wouldn't work
01:29:06.240 | so you know ignorance is bliss but i think
01:29:11.200 | there have been a few distinct challenges i think as i said one of the
01:29:15.760 | things that made it work at all was that sweden and the nordics
01:29:19.040 | was a lost market so um there was no risk
01:29:23.760 | for labels to try this i don't think it would have worked
01:29:27.680 | if the market was uh healthy so that was the initial condition
01:29:34.480 | then we had this tremendous challenge with the model itself so
01:29:38.400 | now most people were pirating but for the people who bought a download or a cd
01:29:43.600 | the artists would get all the revenue for all the future plays
01:29:47.600 | then right so you got it all up front whereas the streaming model was like
01:29:51.360 | almost nothing day one almost nothing day two
01:29:53.440 | and then at some point this curve of incremental revenue
01:29:57.600 | would intersect with your day one payment and that took a long time to
01:30:01.520 | play out before um the music labels they understood
01:30:06.080 | that but on the artist side it took a lot of time to understand that actually
01:30:09.920 | if i have a big hit that is going to be played for many years this is a
01:30:13.040 | much better model because i get paid based on how much
01:30:16.000 | people use the product not how much they thought they would use
01:30:18.880 | it day one or so forth so it was a complicated model to get
01:30:23.040 | across but time helped with that right and
01:30:25.440 | now the revenues to the music industry
01:30:28.720 | actually are bigger again then you know it's gone through this
01:30:31.760 | incredible dip and now they're back up and so we're
01:30:33.920 | very proud of having been a
01:30:36.960 | part of that um so there have been distinct problems
01:30:40.560 | i think when it comes to the labels
01:30:45.200 | we have taken the painful approach some of our competition at the time they
01:30:49.920 | kind of looked at other
01:30:52.640 | companies and said if we just ignore the rights
01:30:55.840 | we get really big really fast we're going to be too big for the
01:30:59.600 | labels to kind of too big to fail they're not going to kill us we
01:31:03.040 | didn't take that approach we went legal from day one
01:31:06.080 | and we negotiated and negotiated and negotiated it was very slow it was very
01:31:09.680 | frustrating we were angry at seeing other companies
01:31:12.240 | taking shortcuts and seeming to get away with it
01:31:14.720 | it was this game theory thing where over many rounds of playing the
01:31:18.800 | game this would be the right strategy and even
01:31:21.760 | though clearly there's a lot of frustrations
01:31:24.720 | at times during renegotiations there is this weird trust
01:31:29.200 | where we have been honest and fair we've never screwed them they've never
01:31:34.960 | screwed us it's tenuous but there's this trust and like they know
01:31:39.120 | that if music doesn't get really big if lots
01:31:42.800 | of people do not want to listen to music and want to pay for it
01:31:45.360 | spotify has no business model so we actually are incredibly
01:31:48.640 | aligned right other companies not to be tennis but other companies have other
01:31:53.200 | business models where even if they made no money from music
01:31:56.800 | they'd still be profitable companies but spotify won't so and i think the
01:32:00.400 | industry sees that we are actually aligned business-wise
01:32:05.200 | so there is this trust that allows us
01:32:08.800 | to do product development even if it's scary
01:32:11.920 | um you know taking risks the free model itself
01:32:15.840 | was an incredible risk for the music industry to take that they should get
01:32:19.680 | credit for now some of it was that they had nothing to lose in sweden but
01:32:22.400 | frankly a lot of the labels also took risk and so i
01:32:26.160 | think we built up that trust with the i think uh herding
01:32:30.000 | cats sounds a bit what's the word it sounds
01:32:33.680 | like yeah dismissive of the cats dismissive
01:32:35.920 | no every cat mattered they're all beautiful and very important
01:32:39.360 | exactly they've taken a lot of risks and certainly it's been frustrating a lot of
01:32:43.520 | good yeah so it's really like
01:32:46.400 | playing it's game theory
01:32:49.360 | if you play the game many times then you can have the statistical outcome that
01:32:53.760 | you bet on and it feels very painful when you're in the middle of that
01:32:57.040 | thing i mean there's risk there's trust there's relationships
01:33:00.560 | from uh just having read the biography of steve jobs
01:33:05.040 | similar kind of relationships were discussed in itunes
01:33:08.400 | the idea of selling a song for a dollar was very uncomfortable
01:33:12.080 | for labels and exactly and there was no it was the same kind of thing it was
01:33:16.800 | trust it was game theory as a lot of
01:33:20.160 | relationships that had to be built and uh it's really a terrifyingly
01:33:24.560 | difficult process that apple could go through a
01:33:28.720 | little bit because they could afford for that process to fail for
01:33:32.960 | spotify it seems terrifying because uh you can't initially i think a lot of it
01:33:39.600 | comes down to you know honestly daniel and his tenacity
01:33:43.120 | in negotiating which seems like an impossible
01:33:45.920 | it's a fun task because you know he was completely unknown and so forth but
01:33:50.880 | maybe that was also the reason that it worked
01:33:56.240 | but i think uh
01:33:59.280 | yeah i think game theory is probably the best way to think about it you could
01:34:03.360 | go straight for this like nash equilibrium that
01:34:06.320 | someone is going to defect or you play many times you try to actually
01:34:10.400 | go for the top left the cooperation cell is there any magical reason why
01:34:16.880 | spotify seems to have won this so a lot of people have tried to do
01:34:22.080 | what spotify tried to do and spotify has come out well so the
01:34:26.320 | answer is that there's no magical reason because i don't believe in magic
01:34:30.000 | but i think there are reasons um
01:34:33.520 | and i think some of them are that people have
01:34:37.440 | misunderstood a lot of what we actually do
01:34:41.440 | the actual spotify model is very complicated they've looked at the
01:34:45.360 | premium model and said it seems like you can
01:34:48.800 | charge 9.99 for music and people are going to pay but that's
01:34:52.400 | not what happened actually when we launched the original mobile product
01:34:55.520 | everyone said they would never pay what happened was they started on
01:34:59.280 | the free product and then their engagement grew so much
01:35:02.640 | that eventually they said maybe it is worth 9.99 right
01:35:06.720 | it's uh your propensity to pay grows with your engagement
01:35:10.080 | so we have this super complicated business model where you operate two
01:35:13.600 | different business models advertising and premium at the same time
01:35:16.800 | and i think that is hard to replicate i struggle to think of other
01:35:20.240 | companies that run large-scale advertising and
01:35:22.960 | subscription products at the same time so i think the business model is
01:35:26.960 | actually much more complicated than people
01:35:29.040 | think it is and so some people went after just the premium part without the
01:35:33.360 | free part and ran into a wall where no one wanted
01:35:36.400 | to pay some people went after just music
01:35:39.440 | music should be free just ads which doesn't give you enough revenue and
01:35:42.800 | doesn't work for the music industry so i think that combination is um it's
01:35:46.960 | kind of opaque from the outside so maybe i shouldn't say it here and
01:35:49.920 | reveal the secret but that turns out to be harder to
01:35:53.200 | replicate than you would think so there's a lot of
01:35:57.120 | brilliant business strategy here brilliance or luck probably more luck
01:36:02.480 | but it doesn't really matter it looks brilliant in retrospect
01:36:05.520 | let's call it brilliant yeah when the books are written it'll be brilliant
01:36:10.480 | you've uh mentioned that your philosophy is to embrace change
01:36:16.560 | so how will the music streaming and music listening world change over the
01:36:21.920 | next 10 years 20 years you look out into the
01:36:25.600 | far future what do you think i think that music and
01:36:30.480 | for that matter audio podcasts audio books i think it's
01:36:34.960 | one of the few core human needs i think there is no good reason to
01:36:39.440 | me why it shouldn't be at the scale of something like
01:36:42.320 | messaging or social networking i don't think it's a niche thing
01:36:46.000 | to listen to music or news or something so i think scale is obviously one of the
01:36:49.680 | things that i really hope for i think i hope that it's going to be billions of
01:36:54.080 | users i hope eventually everyone in the world gets access to all
01:36:57.200 | the world's music ever made so obviously i think it's going to be a
01:37:00.320 | much bigger business otherwise we wouldn't be betting this big
01:37:04.000 | uh now if you look more at how it is consumed what i'm hoping is back
01:37:11.280 | to this analogy of the software tool chain
01:37:15.920 | where i think i sometimes uh internally i make this analogy
01:37:21.840 | to text messaging text messaging was also based on
01:37:26.800 | standards in the era of mobile carriers you had the sms
01:37:30.640 | the 140 character 120 character sms and it was great because everyone
01:37:36.160 | agreed on the standard so as a consumer you got a lot of distribution and
01:37:39.280 | interoperability but it was a very constrained format
01:37:42.560 | and when the industry wanted to add pictures to that format to do the mms
01:37:46.320 | i looked it up and i think it took from the late 80s to early 2000s this is like
01:37:50.160 | a 15 20 year product cycle to bring pictures
01:37:52.640 | into that now once that entire value chain of
01:37:58.080 | creation and consumption got wrapped in one software stack
01:38:01.200 | within something like snapchat or whatsapp
01:38:04.560 | like the first week they added disappearing messages like then two
01:38:07.680 | weeks later they added stories like the pace of
01:38:10.160 | innovation when you're on one software stack
01:38:12.160 | and you can affect both creation and consumption
01:38:16.000 | i think it's going to be rapid so with these streaming services we now for the
01:38:19.360 | first time in history have enough i hope people on one of these
01:38:24.560 | services actually whether it's spotify or amazon or apple or youtube
01:38:28.000 | and hopefully enough creators that you can actually start working
01:38:31.440 | with the format again and that excites me
01:38:33.760 | i think being able to change these constraints from 100 years
01:38:37.120 | that could really do something interesting i really
01:38:40.640 | hope it's not just going to be the iteration on the same thing for the
01:38:45.360 | next 10 to 20 years as well yeah changing the creation of music the
01:38:49.520 | creation of audio creation of podcasts is a really fascinating possibility i
01:38:54.640 | myself don't understand what it is about podcasts that's so
01:38:58.560 | intimate it just is i listen to a lot of podcasts i think
01:39:02.880 | it touches on a deep human need
01:39:07.120 | for connection that people do feel like they're connected
01:39:11.520 | to when they listen i don't understand what the psychology of that is
01:39:15.840 | but in this world that is becoming more and more disconnected
01:39:20.720 | it feels like this is fulfilling a certain kind of need
01:39:24.800 | and uh empowering the creator as opposed to just the listener
01:39:29.200 | it's really interesting i'm really excited that you're working
01:39:34.000 | on this yeah i think one of the things that is inspiring for our teams to work
01:39:36.960 | on podcasts is exactly that whether you think like
01:39:40.720 | i probably do that it's something biological about
01:39:44.320 | perceiving to be in the middle of the conversation that makes you listen in a
01:39:47.120 | different way it doesn't really matter people seem to
01:39:49.280 | perceive it differently and uh there was this narrative for a long
01:39:52.480 | time that you know if you look at video everything kind of in the foreground it
01:39:56.480 | got shorter and shorter and shorter because of financial pressures and
01:40:00.000 | monetization and so forth and eventually at the end there's always
01:40:03.280 | like a 20 second clip of people just screaming
01:40:06.320 | something and uh i feel really good
01:40:11.120 | about the fact that you could have
01:40:14.080 | interpreted that as people have no attention span anymore
01:40:16.880 | they don't want to listen to things they're not interested in deeper stories
01:40:21.040 | like you know people are getting dumber but then podcasts came
01:40:24.160 | along and it's almost like no no the need still existed
01:40:27.440 | once but maybe it was the fact that you're not prepared to look at your
01:40:31.520 | phone like this for two hours but if you can drive at the same time it
01:40:35.040 | seems like people really want to dig deeper
01:40:37.280 | and they want to hear like the more complicated version so to me that is
01:40:40.720 | very inspiring that podcast is actually long form it
01:40:43.840 | gives me a lot of hope for humanity that people seem really
01:40:47.440 | interested in hearing deeper more complicated conversations
01:40:50.640 | this is uh i don't understand it it's fascinating so the majority
01:40:55.920 | for this podcast listen to the whole thing this whole conversation we've been
01:40:59.920 | talking for an hour and 45 minutes and somebody will i mean
01:41:04.640 | most people will be listening to these words i'm speaking right now you
01:41:07.520 | wouldn't have thought that 10 years ago with where the world seemed
01:41:10.720 | to go that's very positive i think that's really exciting and
01:41:14.160 | empowering the creator in there is really exciting
01:41:18.400 | last question you also have a passion for just
01:41:22.000 | mobile in general how do you see the smartphone world
01:41:27.200 | the digital space of uh smartphones and just everything
01:41:34.320 | that's on the move whether it's uh internet of things and
01:41:37.840 | so on changing over the next 10 years and so
01:41:42.000 | on i think that one way to think about it
01:41:44.800 | is that computing might be moving out of these
01:41:49.840 | multi-purpose devices the computer we had in the phone
01:41:53.440 | into specific you know specific purpose devices and you know it will be ambient
01:41:58.560 | that you know at least in my home you just
01:42:02.160 | shout something at someone and there's always like one of these speakers close
01:42:04.960 | enough and so you start behaving differently it's as
01:42:09.520 | if you have the internet ambiently around you and you can
01:42:12.960 | ask it things
01:42:15.680 | so i think computing will kind of get more integrated and we
01:42:20.480 | won't necessarily think of it as connected to a device in the
01:42:24.880 | same way that we do today i don't know the path to that maybe
01:42:30.240 | we used to have these desktop computers and then we partially replaced that
01:42:34.800 | with the laptops and left you know we
01:42:37.440 | had desktop at home and at work and then we got these phones and we started
01:42:40.640 | leaving the laptop at home for a while and
01:42:44.560 | maybe for stretches of time you're going to start using the watch and you can
01:42:47.280 | leave your phone at home like for a run or
01:42:49.680 | something and you know we're on this progressive path where
01:42:54.720 | you i think what is happening with the voice
01:42:58.000 | is that you have an
01:43:01.760 | interaction paradigm that doesn't require as large physical devices so i
01:43:07.200 | definitely think there's a future where you can have your airpods and
01:43:12.160 | your watch and you can do a lot of computing and
01:43:16.720 | i don't think it's going to be this binary thing i think it's going to be
01:43:20.720 | like many of us still have a laptop we just use it less
01:43:24.000 | and so you shift your consumption over and
01:43:28.400 | i don't know about ar glasses and so forth i'm excited about i spent a lot
01:43:33.200 | of time in that area but i still think it's quite far away
01:43:35.760 | ar vr all yes vr is happening and working i think the recent
01:43:41.360 | oculus quest is quite impressive i think ar is further away at least that type of
01:43:46.240 | ar i think but i do think
01:43:50.800 | your phone or watch or glasses understanding where you are and maybe
01:43:54.560 | what you're looking at and being able to give you audio cues about that or you
01:43:57.200 | can say like what is this and it tells you what it is that i
01:44:01.200 | think might happen you know you use your watch or your glasses as a
01:44:06.320 | mouse pointer on reality i think it might be a while before i
01:44:09.600 | might be wrong i hope i'm wrong but i think it might be a while before we walk
01:44:12.160 | around with these big like lab glasses that
01:44:14.480 | project things i agree with you there's a it's actually really difficult when you
01:44:18.160 | have to understand the physical world enough to
01:44:22.400 | uh project onto it well i lied about the last question uh
01:44:28.240 | because i just thought of audio and my favorite topic which is the
01:44:33.360 | movie her do you think
01:44:37.760 | whether it's part of spotify or not we'll have
01:44:41.760 | i don't know if you've seen the movie her absolutely
01:44:45.680 | and uh there audio is the primary form of interaction
01:44:51.360 | and the connection with another entity that you can actually have a
01:44:55.520 | relationship with actually fall in love with
01:44:57.680 | based on voice alone audio alone how far do you think that's possible
01:45:02.640 | first of all based on audio alone to fall in love with somebody
01:45:05.360 | somebody or well yeah let's go with somebody just
01:45:08.640 | have a relationship based on audio alone and second
01:45:12.480 | question to that can we create an artificial intelligence system
01:45:16.800 | that allows one to fall in love with it
01:45:20.160 | and her him with you so this is my
01:45:24.080 | personal answer uh speaking for me as a person
01:45:28.400 | the answer is quite unequivocally yes on both i think what we just said
01:45:33.840 | about podcasts and the feeling of being in the middle of
01:45:36.480 | a conversation if you could have an assistant where
01:45:41.360 | and we just said that feels like a very personal setting so if you walk around
01:45:44.720 | with these headphones and this thing you're speaking
01:45:47.120 | with this thing all of the time that feels like it's in your brain i
01:45:50.720 | think it's going to be much easier to fall in
01:45:53.520 | love with than something that would be on your screen
01:45:55.440 | i think that's entirely possible and then from the you can probably answer
01:45:59.120 | this better than me but from the concept of if it's going to be
01:46:02.640 | possible to build a machine that can achieve
01:46:06.720 | that i think whether you think of it as a if you can
01:46:10.480 | fake it the philosophical zombie that simulates it enough or it somehow
01:46:14.640 | actually is i think it's only a question if you ask
01:46:19.040 | me about time i'd have a different answer but if you say given
01:46:22.320 | some half infinite time absolutely i think it's just
01:46:26.560 | atoms and arrangement of information well i personally think that love is a
01:46:31.760 | lot simpler than people think so we started with true romance and
01:46:36.560 | ended in love i don't see a better place to end beautiful
01:46:40.320 | gustav thanks so much for talking today thank you so much it was a lot of fun
01:46:43.200 | it was fun