How NotebookLM Was Made
Chapters
0:00 Introductions
1:39 From Project Tailwind to NotebookLM
9:25 Learning from 65,000 Discord members
12:15 How NotebookLM works
18:00 Working with Steven Johnson
23:00 How to prioritize features
25:13 Structuring the data pipelines
29:50 How to eval
34:34 Steering the podcast outputs
37:51 Defining speakers' personalities
39:04 How do you make audio engaging?
45:47 Humor is AGI
51:38 Designing for non-determinism
53:35 API when?
55:05 Multilingual support and dialect considerations
57:50 Managing system prompts and feature requests
1:00:58 Future of NotebookLM
1:04:59 Podcasts for your codebase
1:07:16 Plans for real-time chat
1:08:27 Wrap up
00:00:00.000 |
Hey everyone, we're here today as guests on Latent Space. 00:00:06.360 |
They've had some great guests on this show before. 00:00:10.400 |
the hosts of another podcast, join as guests. 00:00:13.200 |
- I mean, a huge thank you to Swyx and Alessio 00:00:16.500 |
for the invite, thanks for having us on the show. 00:00:18.320 |
- Yeah, really, it seems like they brought us here 00:00:19.880 |
to talk a little bit about our show, our podcast. 00:00:22.600 |
- Yeah, I mean, we've had lots of listeners ourselves, 00:00:26.280 |
- Oh yeah, we've made a ton of audio overviews 00:00:45.880 |
and bringing you even better options in the future. 00:00:52.800 |
- Hey everyone, welcome to the Latent Space Podcast. 00:00:58.600 |
and I'm joined by my co-host Swyx, founder of Smol.ai. 00:01:03.880 |
with our special guests, Raiza Martin and Usama, 00:01:14.480 |
- So AI podcasters meet human podcasters, always fun. 00:01:27.560 |
to the audio overviews that people have been making. 00:01:32.880 |
You know, what is your path into the sort of Google AI org 00:01:41.560 |
I lead the Notebook LM team inside of Google Labs. 00:01:45.240 |
So specifically that's the org that we're in. 00:01:49.960 |
And our whole mandate is really to build AI products. 00:01:56.280 |
Our entire thing is just like try a bunch of things 00:02:05.280 |
and I worked in ads right before and then startups. 00:02:07.800 |
I tell people like at every time that I changed orgs, 00:02:13.800 |
Like specifically like in between ads and payments, 00:02:25.840 |
I was like, oh, these people are really cool. 00:02:27.520 |
I don't know if I'm like a super good fit with this space, 00:02:34.080 |
And then I worked on like zero to one features 00:02:38.480 |
But then the time came again where I was like, 00:02:54.840 |
because especially with the recent success of Notebook LM, 00:03:06.560 |
We do sort of the data center supply chain planning stuff. 00:03:10.200 |
Google has like the largest sort of footprint. 00:03:11.960 |
Obviously there's a lot of management stuff to do there. 00:03:14.240 |
But then there was this thing called Area 120 at Google, 00:03:19.520 |
But I sort of wanted to do like more zero to one building 00:03:23.320 |
and landed a role there where we're trying to build 00:03:25.560 |
like a creator commerce platform called Qaya. 00:03:43.680 |
and do it in the wild and sort of co-create and all of that. 00:03:47.040 |
So yeah, we've just been trying a bunch of different things 00:03:53.920 |
Let's talk about the brief history of NotebookLM. 00:03:57.080 |
You had a tweet, which is very helpful for doing research. 00:04:22.920 |
- I wasn't, that's how you like had the basic prototype 00:04:30.320 |
And I remember, I was like, wow, this is crazy. 00:04:39.240 |
But at the same time, my manager at the time, Josh, 00:04:42.000 |
he was like, "Hey, but I want you to really think about 00:04:55.120 |
that was working on a project called Talk to Small Corpus. 00:05:11.280 |
Like I went to college while I was working a full-time job. 00:05:15.840 |
this would have really helped me with my studying, right? 00:05:24.400 |
We took a lot of like the Talk to Small Corpus prototypes 00:05:27.000 |
and I showed it to a lot of like college students, 00:05:32.960 |
Like I didn't even have to explain it to them. 00:05:35.200 |
And we just continued to iterate the prototype from there 00:05:55.600 |
And it really was just like a way for us to describe 00:05:58.520 |
the amount of data that we thought like could be, 00:06:02.280 |
- Yeah, but even then, you're still like doing RAG stuff 00:06:04.760 |
because, you know, the context lengths back then 00:06:12.360 |
we were building the prototypes and at the same time, 00:06:14.960 |
I think like the rest of the world was, right? 00:06:17.160 |
We were seeing all of these like chat with PDF stuff 00:06:19.600 |
come up and I was like, "Come on, we gotta go." 00:06:21.680 |
Like we have to like push this out into the world. 00:06:30.760 |
- Was the initial product just text-to-speech 00:06:33.800 |
or were you also doing kind of like a synthesizing 00:06:38.120 |
Or were you just helping people read through it? 00:06:59.400 |
So as part of the first thing that we launched, 00:07:05.360 |
So you could chat with the doc just through text 00:07:07.960 |
and it would automatically generate a summary as well. 00:07:12.840 |
It would also generate the key topics in your document. 00:07:15.920 |
And it could support up to like 10 documents. 00:07:23.880 |
- And then what was the discussion from there 00:07:27.320 |
Is there any maybe intermediate step of the product 00:07:30.760 |
that people missed between this was launch or? 00:07:33.600 |
- It was interesting because every step of the way, 00:07:35.760 |
I think we hit like some pretty critical milestones. 00:07:40.240 |
I think there was so much excitement of like, 00:07:41.840 |
"Wow, what is this thing that Google is launching?" 00:07:47.640 |
That's actually when we also launched the Discord server, 00:07:50.400 |
which has been huge for us because for us in particular, 00:07:56.680 |
was to be able to launch features and get feedback ASAP. 00:08:01.480 |
like I want to hear what they think right now. 00:08:04.960 |
And the Discord has just been so great for that. 00:08:07.160 |
But then we basically took the feedback from I/O. 00:08:13.680 |
We added sort of like the ability to save notes, 00:08:15.880 |
write notes, we generate follow-up questions. 00:08:27.720 |
We rolled out to over 200 countries and territories. 00:08:33.280 |
both in the UI and like the actual source stuff. 00:08:38.080 |
there was like an explosion of like users in Japan. 00:08:41.200 |
This was super interesting in terms of just like 00:08:47.600 |
I have to read all of these rules in English, 00:08:54.760 |
Like with LLMs, you kind of get this natural, 00:08:59.680 |
and you can ask in your sort of preferred mode. 00:09:01.960 |
And I think that's not just like a language thing too. 00:09:06.120 |
I do this test with Wealth of Nations all the time, 00:09:07.960 |
'cause it's like a pretty complicated text to read. 00:09:10.400 |
- The Adam Smith classic, it's like 400 pages this thing. 00:09:12.560 |
- Yeah, but I like this test 'cause I'm like, 00:09:25.840 |
- I just checked in on a Notebook LM Discord, 65,000 people. 00:09:29.520 |
- Crazy, just like for one project within Google. 00:09:32.640 |
It's not like, it's not labs, it's just Notebook LM. 00:09:50.840 |
or there's just an influx of people being like, 00:10:08.200 |
I think the second thing is really the use cases. 00:10:11.600 |
I was like, hey, I have a hunch of how people will use it, 00:10:16.800 |
not just the context of like the use of Notebook LM, 00:10:23.640 |
Especially people who actually have trouble using it, 00:10:31.400 |
Like what was your problem that was like so worth solving? 00:10:34.600 |
The third thing is also just hearing sort of like 00:10:37.120 |
when we have wins and when we don't have wins, 00:10:39.480 |
because there's actually a lot of functionality 00:10:45.840 |
As part of having this sort of small project, right, 00:10:50.960 |
So it's not just about just like rolling things out 00:10:57.160 |
Like hopefully we get to a place where it's like, 00:10:58.640 |
there's just a really strong core feature set 00:11:00.560 |
and the things that aren't as great, we can just unlaunch. 00:11:04.440 |
- I'm in the process of unlaunching some stuff. 00:11:10.880 |
that you could highlight the text in your source passage 00:11:17.000 |
And it was like a very complicated piece of our architecture 00:11:24.040 |
So we were like, okay, let's do a 50/50 sunset of this thing 00:11:33.720 |
that lets you feature flag these things easily? 00:11:42.040 |
- Yeah, as a PM, like this is your number one tool, right? 00:11:49.240 |
- Yeah, I mean, we just run Mendel experiments 00:11:54.160 |
but on Twitter, somebody was able to get around our flags 00:11:58.440 |
They were like, "Check out what the Notebook LM team 00:12:03.000 |
And I was at lunch with the rest of the team. 00:12:17.320 |
but I don't think we need to do it on the podcast now. 00:12:21.720 |
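As a rough sketch of the kind of percentage-based flag behind a "50/50 sunset": Google's internal Mendel experiment system is not public, so the flag name, bucketing scheme, and rollout numbers below are invented purely for illustration of the pattern being described.

```python
import hashlib

# Hypothetical flag table; the feature name and 50% rollout are illustrative only.
SUNSET_FLAGS = {
    "inline_smart_summaries": 50,  # percent of users who still see the feature
}

def is_feature_enabled(feature: str, user_id: str) -> bool:
    """Deterministically bucket a user into 0-99 and compare to the rollout percent."""
    rollout_percent = SUNSET_FLAGS.get(feature, 0)
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < rollout_percent

# Roughly half of users keep the feature while the 50/50 sunset experiment runs.
print(is_feature_enabled("inline_smart_summaries", "user_123"))
```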
- Can we just talk about what's behind the magic? 00:12:30.000 |
I know you might not be able to share everything, 00:12:34.440 |
How do you take the data and put it in the model? 00:12:54.800 |
we were building this thing sort of outside Notebook LM 00:12:58.600 |
Like just the idea is like content transformation, right? 00:13:03.200 |
Like everyone knows that everyone's been poking at it, 00:13:08.640 |
And like one of the ways we thought was like, okay, 00:13:12.480 |
people learn better when they're hearing things, 00:13:21.920 |
into the realm of like, maybe we try like, you know, 00:13:24.840 |
two people are having a conversation kind of format. 00:13:35.960 |
tried out like a few different sort of sources. 00:13:38.520 |
The main idea was like, go from some sort of sources 00:13:41.280 |
and transform it into a listenable, engaging audio format. 00:13:46.560 |
we like unlocked a bunch more sort of learnings. 00:13:53.760 |
because like the information density is getting unrolled 00:14:01.720 |
and they're both technically like AI personas, right? 00:14:04.280 |
That have different angles of looking at things 00:14:07.160 |
and like, they'll have a discussion about it. 00:14:23.480 |
like anything that they've written themselves. 00:14:29.480 |
like we work with the DeepMind audio folks pretty closely. 00:14:45.240 |
So we sort of like generally put those things together 00:14:48.680 |
in a way that we could reliably produce the audio. 00:14:52.760 |
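To make that "write a script, then produce the audio" shape concrete, here is a minimal sketch of a two-stage pipeline: a text model drafts a two-host conversation grounded in the sources, and a speech model renders each turn with its own voice. The `call_llm` and `tts` functions are stubbed placeholders, and none of this is the actual NotebookLM implementation, just the general pattern being described.

```python
from dataclasses import dataclass

@dataclass
class Turn:
    speaker: str  # "HOST_A" or "HOST_B"
    text: str

def call_llm(prompt: str) -> str:
    # Placeholder: stand-in for whatever text model writes the script.
    return "HOST_A: Welcome to the deep dive.\nHOST_B: Today we're digging into your sources."

def tts(text: str, voice: str) -> bytes:
    # Placeholder: stand-in for a speech model that returns audio for one turn.
    return f"[{voice}] {text}\n".encode()

def write_script(sources: list[str]) -> list[Turn]:
    """Stage 1: ask the text model for a two-host script grounded in the sources."""
    prompt = (
        "Rewrite the source material below as an engaging conversation between "
        "two hosts, HOST_A and HOST_B, one 'SPEAKER: line' per turn.\n\n"
        + "\n\n".join(sources)
    )
    turns = []
    for line in call_llm(prompt).splitlines():
        if ":" in line:
            speaker, text = line.split(":", 1)
            turns.append(Turn(speaker.strip(), text.strip()))
    return turns

def render_audio(turns: list[Turn]) -> bytes:
    """Stage 2: synthesize each turn with a per-speaker voice and concatenate."""
    voices = {"HOST_A": "voice_a", "HOST_B": "voice_b"}
    return b"".join(tts(t.text, voices.get(t.speaker, "voice_a")) for t in turns)

audio = render_audio(write_script(["My uploaded document text..."]))
```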
- I would add like, there's something really nuanced, 00:14:59.840 |
where if it's just reading an actual text response, 00:15:05.440 |
I do it all the time with like reading my text messages 00:15:18.120 |
And it's really hard to consume content in that way. 00:15:23.800 |
hey, it's actually just like, it's fine for like short stuff 00:15:26.520 |
like texting, but even that, it's like not that great. 00:15:29.440 |
So I think the frontier of experimentation here 00:15:31.960 |
was really thinking about there is a transform 00:15:39.640 |
Or here's like a hundred page slide deck or something. 00:15:47.480 |
And I think this is where like that two-person persona, 00:15:52.640 |
they have takes on the material that you've presented, 00:15:56.440 |
that's where it really sort of like brings the content 00:16:04.240 |
is like, you don't actually know what's going to happen 00:16:06.920 |
when you press generate, you know, for better or for worse, 00:16:09.280 |
like to the extent that like people are like, 00:16:10.880 |
no, I actually want it to be more predictable now. 00:16:22.400 |
And I think I've seen enough of these where I'm like, 00:16:26.000 |
Like you knew I was going to say like something really cool. 00:16:30.320 |
I think we want to try to preserve as much of that wow, 00:16:34.080 |
because I do think like exposing like all the knobs 00:16:40.480 |
It's like, hey, is that like the actual thing? 00:16:45.720 |
- Have you found differences in having one model 00:16:50.120 |
and then using text-to-speech to kind of fake two people? 00:16:52.800 |
Or like, are you actually using two different 00:16:55.600 |
kind of system prompts to like have a conversation 00:17:00.800 |
if persona system prompts make a big difference 00:17:05.760 |
- I guess like generally we use a lot of inference 00:17:08.960 |
as you can tell with like the spinning thing takes a while. 00:17:14.080 |
of different things happening under the hood. 00:17:17.440 |
and they have their sort of drawbacks and benefits. 00:17:23.880 |
like the two different personas, like persist throughout 00:17:27.440 |
It's like, there's a bit of like imperfection in there. 00:17:30.880 |
Like we had to really lean into the fact that like 00:17:39.960 |
Like that was sort of like what we need to diverge from. 00:17:42.840 |
most chatbots will just narrate the same kind of answer, 00:17:46.360 |
like given the same sources for the most part, 00:17:49.640 |
So yeah, there's like experimentation there under the hood, 00:17:52.680 |
like with the model to like make sure that it's spitting 00:17:54.960 |
out like different takes and different personas 00:18:00.760 |
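One minimal way to picture "two personas that persist and take different angles" is to give each speaker its own standing instructions and generate turns in character. The persona wording and the `generate_turn` helper below are hypothetical, not the prompts used in production.

```python
# Hypothetical persona instructions; the real prompts are not public.
PERSONAS = {
    "HOST_A": (
        "You are the enthusiastic guide. Frame the big picture, ask what the "
        "listener should care about, and react with genuine curiosity."
    ),
    "HOST_B": (
        "You are the analytical expert. Ground every claim in the provided "
        "sources, add caveats, and push back when something is oversimplified."
    ),
}

def call_llm(prompt: str) -> str:
    # Placeholder model call; swap in a real text-generation API.
    return "That's a fair point, but the sources actually complicate it a bit."

def generate_turn(speaker: str, conversation: list[str], sources: str) -> str:
    """Ask the model for the next line, in character, without repeating earlier turns."""
    prompt = (
        f"{PERSONAS[speaker]}\n\nSources:\n{sources}\n\n"
        "Conversation so far:\n" + "\n".join(conversation) + "\n\n"
        f"Write {speaker}'s next line. Stay in character and add a new angle."
    )
    return call_llm(prompt)

print(generate_turn("HOST_B", ["HOST_A: So what's the headline here?"], "source text..."))
```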
- Yeah, I think Steven Johnson, I think he's on your team. 00:18:10.600 |
So Steven joined actually in the very early days, 00:18:13.320 |
I think before it was even a fully funded project. 00:18:22.480 |
Steven is a New York Times bestselling author 00:18:30.120 |
just like a true sort of celebrity by himself. 00:18:35.120 |
I want to come here and I want to build the thing 00:18:46.720 |
Like, you seem to be doing great on your own. 00:18:52.600 |
And aside from like providing a lot of inspiration, 00:18:55.520 |
to be honest, like when I watched Steven work, 00:18:58.000 |
I was like, oh, nobody works like this, right? 00:19:02.760 |
Like he is such a dedicated like researcher and journalist 00:19:21.400 |
I was like, oh, I could definitely use like a mini Steven, 00:19:26.800 |
And then I thought very quickly about like the adjacent roles 00:19:29.480 |
that could use sort of this like research and analysis tool. 00:19:33.000 |
And so aside from being, you know, chief dreamer, 00:19:46.480 |
- Did you make him express his thoughts while he worked 00:19:56.760 |
- Yeah, this is a part of the PM toolkit, right? 00:20:07.480 |
And I did the same thing with students all the time. 00:20:12.760 |
I would ask them like, oh, how do you feel now? 00:20:18.360 |
Or why are you upset about like this particular thing? 00:20:20.240 |
Why are you cranky about this particular topic? 00:20:22.760 |
And it was very similar, I think, for Steven, 00:20:36.960 |
he was doing this sort of like self-questioning, right? 00:20:40.080 |
Like now we talk about like chain of, you know, 00:20:50.520 |
And to be able to bring sort of that expertise in a way 00:20:53.720 |
that was like, you know, maybe like costly inference wise, 00:20:56.520 |
but really have like that ability inside of a tool 00:20:58.680 |
that was like, for starters, free inside of Notebook LM, 00:21:05.120 |
- So did he just commit to using Notebook LM for everything? 00:21:12.720 |
Like in the beginning, there was no product for him to use. 00:21:15.040 |
And so he just kept describing the thing that he wanted. 00:21:17.240 |
And then eventually like we started building the thing 00:21:24.240 |
is he uses the product in ways where it kind of does it, 00:21:30.920 |
at like the absolute max limit of this thing. 00:21:34.360 |
But the way that he describes it is so full of promise 00:21:40.040 |
And all I have to do is sort of like meet him there 00:21:42.360 |
and sort of pressure test whether or not, you know, 00:21:44.480 |
everyday people want it and we just have to build it. 00:21:47.000 |
- I would say OpenAI has a pretty similar person, 00:21:51.000 |
It's very similar, like just from the writing world 00:21:53.240 |
and using it as a tool for thought to shape ChatGPT. 00:22:00.440 |
I'm looking at my Notebook LM now, I've got two sources. 00:22:10.480 |
- Yes, and he has like a higher limit than others. 00:22:14.440 |
- Oh yeah, like I don't think Steven even has a limit. 00:22:17.480 |
- And he has Notes, Google Drive stuff, PDFs, MP3, whatever. 00:22:24.400 |
is he has actually PDFs of like handwritten Marie Curie notes. 00:22:28.840 |
- I see, so you're doing image recognition as well. 00:22:39.560 |
And it's like, here's how I'm using it to analyze it. 00:22:41.680 |
And I'm using it for like this thing that I'm writing. 00:22:48.480 |
And I think even like when I listened to Steven's demo, 00:22:55.840 |
And so there's a lot of work still for us to build 00:23:03.280 |
Because I look at all the steps that he had to take 00:23:06.000 |
And I'm like, okay, that's product work for us, right? 00:23:17.000 |
How do you think about adding support for like data sources 00:23:21.520 |
and like supporting more esoteric types of inputs? 00:23:25.440 |
- So I think about the product in three ways, right? 00:23:31.360 |
of like what you could do with those sources. 00:23:34.640 |
which is how do you output it into the world? 00:23:45.080 |
but even basic things like Doc X or like PowerPoint, right? 00:23:51.600 |
"Hey, my professor actually gave me everything in Doc X. 00:24:00.040 |
Like there's just a really long roadmap for sources 00:24:06.440 |
and I think this is like one of the most interesting things 00:24:13.480 |
which is like, "Hey, when did this thing launch? 00:24:20.840 |
is because they're trying to make something new. 00:24:25.320 |
like a lot of the features we're experimenting with 00:24:28.840 |
And so you can imagine that people care a lot 00:24:31.440 |
about the resources that they're putting into Notebook LM 00:24:33.880 |
'cause they're trying to create something new. 00:24:35.920 |
So I think equally as important as the source inputs 00:24:39.320 |
are the outputs that we're helping people to create. 00:24:49.640 |
And that's like one of the most compelling use cases 00:24:55.920 |
and then one-click new documents out of it, right? 00:24:59.040 |
And I think that's something that people think is like, 00:25:05.080 |
Like to do it in your style, in your brand, right? 00:25:14.160 |
Any comments on the engineering side of things? 00:25:17.440 |
I was mostly working on building the text to audio, 00:25:20.600 |
which kind of lives as a separate engineering pipeline 00:25:25.160 |
But I think there's probably tons of Notebook LM 00:25:27.320 |
engineering war stories on dealing with sources. 00:25:30.160 |
And so I don't work too closely with engineers directly, 00:25:34.360 |
to like Gemini's native understanding of images really well, 00:25:39.280 |
- Yeah, I think on the engineering and modeling side, 00:25:41.440 |
I think we are a really good example of a team 00:25:46.960 |
and we're getting a lot of feedback from the users 00:25:48.560 |
and we return the data to the modeling team, right? 00:25:51.760 |
"Hey, actually, you know what people are uploading, 00:25:57.880 |
Especially to the extent that like Notebook LM 00:26:00.000 |
can handle up to 50 sources, 500,000 words each. 00:26:03.720 |
Like you're not going to be able to jam all of that 00:26:07.000 |
So how do we do multimodal embeddings with that? 00:26:09.640 |
There's really like a lot of things that we have to solve 00:26:12.760 |
that are almost there, but not quite there yet. 00:26:18.280 |
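For scale: 50 sources at 500,000 words each is roughly 25 million words, far more than fits in a single prompt, which is why some form of chunking and retrieval over embeddings comes up at all. Here is a toy sketch of that pattern, with a placeholder `embed` function standing in for a real text or multimodal embedding model and a 300-word chunk size chosen arbitrarily.

```python
import numpy as np

def chunk(text: str, size: int = 300) -> list[str]:
    """Split a source into fixed word-count chunks; 300 words is an arbitrary choice."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> np.ndarray:
    # Placeholder embedding: a deterministic random unit vector per text, just so
    # the sketch runs. A real system would call a text or multimodal embedding model.
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    v = rng.normal(size=256)
    return v / np.linalg.norm(v)

def top_k_chunks(query: str, sources: list[str], k: int = 8) -> list[str]:
    """Rank all chunks from all sources by cosine similarity to the query."""
    chunks = [c for s in sources for c in chunk(s)]
    matrix = np.stack([embed(c) for c in chunks])
    scores = matrix @ embed(query)
    return [chunks[i] for i in np.argsort(scores)[::-1][:k]]

relevant = top_k_chunks("What does the author say about tariffs?", ["long source text ..."])
```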
I think one of the best things is it has so many of the human 00:26:28.960 |
The audio model is definitely trying to mimic 00:26:30.760 |
like certain human intonations and like sort of natural, 00:26:42.240 |
on like where those things maybe would make sense. 00:26:49.920 |
like, can you take some of the emotions out of it too? 00:26:58.880 |
or we can give a diarized transcription of it. 00:27:01.200 |
But like the transcription doesn't have some of the, 00:27:05.720 |
Do you reconstruct that when people upload audio 00:27:09.320 |
- So when you upload audio today, we just transcribe it. 00:27:14.920 |
we don't transcribe like the emotion from that as a source. 00:27:24.400 |
I think that there is some ability for it to be reused 00:27:36.160 |
hey, today we only have one format, it's deep dive. 00:27:44.960 |
- Yeah, yeah, even if you had like a sad topic, 00:27:49.400 |
silver lining though, we're having a good chat. 00:27:56.800 |
that deep dive went viral is people saying like, 00:28:02.120 |
Any other like favorite use cases that you saw 00:28:04.680 |
from people discovering things in social media? 00:28:10.880 |
I think because I'm always relieved when I watch them, 00:28:13.200 |
I'm like, that was funny and not scary, it's great. 00:28:18.440 |
which was a startup founder putting their landing page 00:28:21.520 |
and being like, all right, let's test whether or not 00:28:25.120 |
And I was like, wow, that's right, that's smart. 00:28:35.600 |
that I'm not comfortable with, I should remove it, 00:28:38.800 |
- Right, I think that the personal hype machine 00:28:46.480 |
and like some people like keep sort of dream journals 00:28:58.160 |
especially 'cause we launched it internally first. 00:29:06.480 |
So all Googlers have to write notes about like, 00:29:11.600 |
And what Googlers were doing is they would write 00:29:16.360 |
and then they would create an audio overview. 00:29:22.080 |
like, I feel really good like going into a meeting 00:29:29.200 |
- I think another cool one is just like any Wikipedia article 00:29:33.000 |
like you drop it in and it's just like suddenly 00:29:44.720 |
which is basically like he just took like interesting stuff 00:29:47.560 |
from Wikipedia and made audio overviews out of it. 00:29:58.400 |
- Honestly, it's useful even without the audio. 00:30:00.560 |
You know, I feel like the audio does add an element to it, 00:30:03.240 |
but I always want, you know, paired audio and text. 00:30:09.480 |
I feel like it's because you laid the groundwork 00:30:16.080 |
and made it so good, so human, which is weird. 00:30:19.200 |
Like it's this engineering process of humans. 00:30:30.400 |
We were joking with this like a couple of weeks ago. 00:30:36.360 |
and it was literally called "Potatoes for Chefs." 00:30:39.040 |
And I was like, you know, my job is really serious, 00:30:45.000 |
Like the title of the file was like "Potatoes for Chefs." 00:30:52.360 |
for like two different kind of audio transcripts. 00:30:54.920 |
- The question is really like, as you iterate, 00:30:59.160 |
is you establish some kind of tests or a benchmark. 00:31:06.120 |
- What does that look like for making something sound human 00:31:11.040 |
- We have the sort of formal eval process as well, 00:31:13.440 |
but I think like for this particular project, 00:31:15.440 |
we maybe took a slightly different route to begin with. 00:31:23.440 |
- Yeah, like I think the bar that we tried to get to 00:31:41.480 |
So there was a lot of just like critical listening. 00:31:47.000 |
that those improvements actually could go into the model 00:31:49.880 |
and like we're happy with that human element of it. 00:31:53.040 |
And then eventually we had to obviously distill those down 00:31:55.440 |
into an eval set, but like still there's like, 00:31:57.520 |
the team is just like a very, very like avid user 00:32:02.920 |
- I think you just have to be really opinionated. 00:32:12.560 |
because it's like, if you hold that bar high, right? 00:32:14.960 |
Like if you think about like the iterative cycle, 00:32:17.240 |
it's like, hey, we could take like six months 00:32:20.200 |
to ship this thing, to get it to like mid where we were, 00:32:23.640 |
or we could just like listen to this and be like, 00:32:29.240 |
And collectively, like if I have two other people 00:32:35.040 |
just keep improving it to the point where you're like, 00:32:48.640 |
hey, we need to improve the sound model as well? 00:32:54.280 |
and just like generating the transcript as well. 00:33:04.000 |
than some of the other benchmarks that you can make 00:33:06.640 |
for like, you know, SWE-bench or get better at this math. 00:33:17.080 |
and like a bunch of different dimensions there. 00:33:24.240 |
But I think the team stage of that was more critical. 00:33:29.760 |
that like what is making it fun and engaging. 00:33:34.160 |
And while we're making other changes that are necessary, 00:33:38.560 |
or, you know, be insensitive. - Hallucinations. 00:33:53.840 |
we really had to make sure that that central tenet 00:33:59.560 |
and something you actually want to listen to, 00:34:02.040 |
which takes like a lot of just active listening time 00:34:09.400 |
because we're dealing with non-deterministic models, 00:34:12.440 |
sometimes you just got a bad roll of the dice 00:34:17.600 |
Basically, how many, do you like do 10 runs at a time? 00:34:20.840 |
And then how do you get rid of the non-determinism? 00:34:26.480 |
I mean, there still will be like bad audio overviews. 00:34:38.600 |
You actually had a great model, great weights, whatever. 00:34:44.120 |
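One common way to soften a bad roll of the dice is to sample several candidate scripts and keep whichever scores best against a simple rubric. The sketch below is only meant to show that shape: the rubric wording is invented, and `call_llm` is a canned placeholder rather than a real model or the team's actual eval setup.

```python
RUBRIC = (
    "Score the podcast script from 1-10 on each of: engaging, grounded in the "
    "sources with no made-up facts, and safe/appropriate. Reply with three numbers."
)

def call_llm(prompt: str) -> str:
    # Placeholder model call; returns canned text so the sketch runs end to end.
    if RUBRIC in prompt:
        return "7 8 9"
    return "HOST_A: Okay, so today we're digging into your sources."

def judge(script: str, sources: str) -> float:
    """Average rubric score; a real eval might use human raters or an LLM grader."""
    reply = call_llm(f"{RUBRIC}\n\nSources:\n{sources}\n\nScript:\n{script}")
    scores = [float(x) for x in reply.split()[:3]]
    return sum(scores) / len(scores)

def best_of_n(sources: str, n: int = 10) -> str:
    """Sample n candidate scripts and keep the one the rubric likes best."""
    candidates = [
        call_llm(f"Write a two-host deep-dive script grounded in:\n{sources}")
        for _ in range(n)
    ]
    return max(candidates, key=lambda s: judge(s, sources))

best = best_of_n("my source text ...")
```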
- I actually think like the way that these are constructed, 00:34:48.160 |
if you think about like the different types of controls 00:34:51.720 |
Like what can the user do today to affect it? 00:34:56.600 |
- I have tried to prompt engineer by changing the title. 00:35:02.720 |
the title of the notebook, people have found out 00:35:05.560 |
You can get them to think like the show has changed 00:35:07.960 |
sort of fundamentally. - Someone changed the language 00:35:16.160 |
So it did change the way that we sort of think 00:35:20.240 |
So it's like quality is on the dimensions of entertainment, 00:35:30.720 |
And I think when we talk about like non-determinism, 00:35:33.440 |
it's like, well, as long as it follows like the structure 00:35:37.120 |
It sort of inherently meets all those other qualities. 00:35:39.960 |
And so it makes it a little bit easier for us 00:35:47.560 |
Whether or not the person likes it, I don't know. 00:35:49.800 |
But as we expand to new formats, as we open up controls, 00:35:53.640 |
I think that's where it gets really much harder, 00:35:56.840 |
Like people don't know what they're going to get 00:36:06.320 |
Whereas I don't think we really got like very distribution 00:36:12.080 |
And also because of the way that we'd constrain, 00:36:24.280 |
to something I've been thinking about for AI products 00:36:29.320 |
it seems like it's a combination of you and Steven. 00:36:55.240 |
"Hey," I remember like one of the first ones he sent me, 00:37:10.480 |
we all injected like a little bit of just like, 00:37:13.400 |
"Hey, here's like my take on like how a podcast should be." 00:37:19.880 |
there's probably some collective preference there 00:37:23.320 |
that's generic enough that you can standardize 00:37:26.280 |
But yeah, it's the new formats where I think like, 00:37:29.760 |
- Yeah, I've tried to make a clone by the way. 00:37:33.560 |
- Everyone in AI was like, "Oh no, this is so easy. 00:37:36.400 |
Obviously our models are not as good as yours, 00:37:38.520 |
but I tried to inject a consistent character backstory, 00:37:45.120 |
where they went to school, what their hobbies are. 00:37:47.040 |
Then it just, the models try to bring it in too much. 00:37:51.280 |
So then I'm like, "Okay, like how do I define a personality 00:37:54.400 |
"but it doesn't keep coming up every single time?" 00:37:57.840 |
- Yeah, I mean, we have like a really, really good 00:38:05.080 |
- Just to say like we, just like we had to be opinionated 00:38:16.040 |
you should be able to design the people as well. 00:38:24.920 |
and like it's like what race they are, I don't know. 00:38:30.800 |
- I was like, I love that. - People spend hours on that. 00:38:32.680 |
And I was like, maybe there's something to be learned there 00:38:35.920 |
because like people have fallen in love with the deep dive 00:38:44.640 |
Now, when you hear a deep dive and you've heard them, 00:38:50.440 |
when people are trying to find out their names, 00:38:56.520 |
But the next step here is to sort of introduce like, 00:39:20.120 |
If you could break it down for us, that'd be great. 00:39:26.000 |
- So I'll give you some, like variation in tone and speed. 00:39:30.560 |
You know, there's this sort of writing advice where, 00:39:42.000 |
- So there's the basics, like obviously structure 00:39:45.200 |
Like there needs to be sort of an ultimate goal 00:39:48.360 |
that the voices are trying to get to, human or artificial. 00:39:53.760 |
is if there's just too much agreement between people, 00:40:00.600 |
So there needs to be some sort of tension and buildup, 00:40:04.000 |
you know, withholding information, for example. 00:40:09.240 |
like you're gonna learn more and more about it. 00:40:11.240 |
And audio that maybe becomes even more important 00:40:13.880 |
because like you actually don't have the ability 00:40:30.640 |
There's just like a gradual unrolling of information. 00:40:41.760 |
one of the history of mysteries, maybe episodes, 00:40:44.200 |
like the Wikipedia article is gonna state out 00:40:48.560 |
would probably be in the very first paragraph. 00:40:56.320 |
And maybe that would work for like a certain audience. 00:41:17.880 |
like maybe you seize on a topic and go deeper into it 00:41:21.080 |
and then try to bring yourself back out of it 00:41:32.280 |
it's trying to be as close to just human speech as possible, 00:41:37.280 |
I think was what we found success with so far. 00:41:41.920 |
Like, I think like when you listen to two people talk, 00:41:45.960 |
And then there's like a lot of like that questioning, 00:42:06.160 |
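Those craft principles (variation in tone and pace, a destination the conversation drives toward, tension and withheld information, gradual unrolling, natural questioning and disfluencies) could plausibly be written straight into the script-writing instructions. The wording below is illustrative only, not the production prompt.

```python
ENGAGEMENT_GUIDELINES = """\
When writing the two-host script:
- Vary tone, pacing, and sentence length; avoid a flat, uniform read.
- Give the conversation a clear destination and build toward it.
- Create light tension: withhold a detail, then reveal it a little later.
- Unroll information gradually rather than front-loading the conclusion.
- Keep the hosts from simply agreeing; have one question or challenge the other.
- Allow natural speech: brief pauses, "hmm", follow-up questions, reactions.
"""
```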
or comedy writers or whatever, stand up comedy, right? 00:42:10.360 |
But audio as well, like there's professional fields 00:42:12.560 |
of studying where people do this for a living, 00:42:15.600 |
but us as AI engineers are just making this up as we go. 00:42:19.800 |
- I mean, it's a great idea, but you definitely didn't. 00:42:26.480 |
- There's a certain appeal to authority that people have. 00:42:46.680 |
'Cause like this person went to school for linguistics 00:42:51.320 |
according to him, like most of his classmates 00:43:06.800 |
So I think, yeah, a lot of we haven't invested 00:43:15.040 |
because I think there's like a very human question 00:43:19.880 |
And there's like a very deep question of like, 00:43:25.080 |
Like, what is the quality that we are all looking for? 00:43:30.720 |
Does something have to be straight to the point? 00:43:36.440 |
about our experiment, about this particular launch is, 00:43:41.000 |
And so we sort of had to squeeze everything we believed 00:43:44.480 |
about what an interesting thing is into one package. 00:43:52.000 |
is sort of novel at first, but it's not interesting, right? 00:43:59.840 |
It's like, ha, ha, ha, I'm gonna try to trick it. 00:44:01.880 |
It's like, that's interesting, spell strawberry, right? 00:44:04.640 |
This is like the fun that like people have with it. 00:44:06.960 |
But like, that's not the LLM being interesting, that's you, 00:44:11.680 |
But it's like, what does it mean to sort of flip it 00:44:14.160 |
on its head and say, no, you be interesting now, right? 00:44:17.240 |
Like you give the chatbot the opportunity to do it. 00:44:20.240 |
And this is not a chatbot per se, it is like just the audio. 00:44:28.600 |
And it's like the things that we've described here, 00:44:30.440 |
which was like, okay, now I have to like lead you 00:44:42.640 |
I think we'll engage with experts like down the road, 00:44:45.360 |
but I think it will have to be in the context of, 00:44:48.200 |
well, what's the next thing we're building, right? 00:44:52.240 |
What do I fundamentally believe needs to be improved? 00:44:55.200 |
And I think there's still like a lot more studying 00:44:59.040 |
well, what are people actually using this for? 00:45:07.720 |
- I think the other, one other element to that is the, 00:45:10.280 |
like the fact that you're bringing your own sources to it. 00:45:21.280 |
It's like your sources and someone's telling you about it. 00:45:33.760 |
- So it's interesting just from the topic itself, 00:45:43.440 |
like if it was someone who was reading it off, 00:45:44.840 |
like, you know, that's like the absolute worst, but like. 00:45:57.520 |
I think humor is actually one of the hardest things. 00:46:42.600 |
- And it's packed with more chicken than a KFC buffet. 00:46:51.200 |
that's like truly delightful, truly surprising, 00:46:53.600 |
but it's like, we didn't tell it to be funny. 00:46:55.240 |
- Humor's contextual also, like super contextual 00:47:00.200 |
but we're prompting for maybe a lot of other things 00:47:03.920 |
- I think the thing about AI-generated content, 00:47:06.320 |
if we look at YouTube, like we do videos on YouTube 00:47:09.040 |
and it's like, you know, a lot of people are screaming 00:47:12.640 |
There's like everybody, there's kind of like a meta 00:47:23.320 |
So you can actually generate a type of content 00:47:39.560 |
to reach the biggest audience and like the most clicks. 00:47:42.320 |
But what if every video could be kind of like regenerated 00:47:45.280 |
to be closer to your taste, you know, when you watch it? 00:47:53.240 |
which is I think every time I've gotten information 00:48:03.280 |
that is the format in which I'm going to read it. 00:48:12.280 |
but I'll listen to a 16 minute audio overview 00:48:21.560 |
that like maybe we wanted, but didn't expect. 00:48:25.120 |
Where I also think you're listening to a lot of content 00:48:28.480 |
that normally wouldn't have had content made about it. 00:48:32.880 |
where this woman uploaded her diary from 2004. 00:48:37.080 |
Like nobody was going to make a podcast about a diary. 00:48:39.280 |
Like hopefully not, like it seems kind of embarrassing. 00:48:43.520 |
But she was doing this like live listen of like, 00:48:55.760 |
with like her information in a totally different way. 00:49:01.080 |
Where it's like, I'm creating content for myself 00:49:03.760 |
in a way that suits the way that I want to consume it. 00:49:06.520 |
- Or people compare like retirement plan options. 00:49:14.880 |
And like, even when we started out the experiment, 00:49:16.640 |
like a lot of the goal was to go for really obscure content 00:49:46.440 |
And I think that the way that you treat your, 00:49:54.120 |
I wish I had a transcript right in front of me, 00:49:58.160 |
but usually it's kind of doing their bidding. 00:50:17.800 |
- I think that that is as close to accurate as possible. 00:50:21.560 |
I mean, in general, I try to be careful about saying like, 00:50:27.560 |
But I think to your earlier question of like, 00:50:42.200 |
- Yeah, is it interesting to have two retirement plans? 00:50:46.320 |
No, but to listen to these two talk about it, 00:51:00.480 |
- They do do a lot of get this, which is funny. 00:51:18.600 |
when to trust the AI overlord to decide for you? 00:51:22.960 |
In other words, stick it, let's say products as it is today, 00:51:51.320 |
So compound AI people will be like Databricks, 00:51:53.320 |
have lots of little models, chain them together 00:51:56.720 |
It's deterministic, you control every single piece 00:52:07.880 |
is going to be a spectrum in between those two, 00:52:13.560 |
It also depends on, well, it depends on the task, 00:52:16.120 |
but ultimately depends on what is your desired outcome? 00:52:21.600 |
And I think there's like several potential outputs 00:52:32.920 |
Am I trying to implement this as part of the stack 00:52:37.840 |
particularly for like engineers or something? 00:52:40.840 |
so that I deliver like a super high quality thing? 00:52:44.080 |
I think that the question of like, which of those two, 00:52:49.160 |
But I think fundamentally it comes down to like, 00:53:04.080 |
Because I think if you don't have that strong POV, 00:53:06.240 |
like you're going to get lost in sort of the detail 00:53:09.440 |
And capability is sort of the last thing that matters 00:53:12.360 |
because it's like models will catch up, right? 00:53:16.280 |
whatever in the next five years, it's going to be insane. 00:53:18.880 |
So I think this is like a race to like value. 00:53:21.600 |
And it's like really having a strong opinion about like, 00:53:25.720 |
And how far are you going to be able to push it? 00:53:28.080 |
Sorry, I think maybe that was like very like philosophical. 00:53:32.520 |
And I think that hits a lot of the points it's going to make. 00:53:42.120 |
So we got a list of feature requests, mostly. 00:53:45.400 |
It's funny, nobody actually had any like specific questions 00:53:50.000 |
They just want to know when you're releasing some feature. 00:53:52.280 |
So I know you cannot talk about all of these things, 00:53:54.760 |
but I think maybe it will give people an idea 00:54:05.320 |
as still be kind of like a full-fledged product, 00:54:09.920 |
Or do you want it to be a piece of infrastructure 00:54:15.920 |
I think we work at a place where you could have both. 00:54:30.920 |
And so we're going to keep investing in that. 00:54:35.520 |
there are a lot of developers that are interested 00:54:37.840 |
in using the same technology to build their own thing. 00:54:41.720 |
How soon that's going to be ready, I can't really comment, 00:54:44.080 |
but these are the things that like, hey, we heard it. 00:54:56.480 |
And I think every time someone asks me, it's like, 00:55:06.640 |
I know people kind of hack this a little bit together. 00:55:17.440 |
Like if you go to Rome, people don't really speak Italian, 00:55:21.240 |
Do you think there's a path to which these models, 00:55:24.240 |
especially the speech can learn very like niche dialects, 00:55:31.120 |
Like, I'm curious if you see this as a possibility. 00:55:36.800 |
like we're definitely working on adding more languages. 00:55:42.560 |
but like theoretically we should be able to cover 00:55:46.840 |
- What a ridiculous statement by the way, that's crazy. 00:55:54.680 |
like a small team of like, I don't know, 10 people saying 00:55:57.120 |
that we will support the top 100, 200 languages 00:56:03.240 |
- And I think like the speech team, you know, 00:56:07.720 |
but the speech team is another team and the modeling team, 00:56:11.080 |
like these folks are just like absolutely brilliant 00:56:14.760 |
And I think like when we've talked to them and we've said, 00:56:20.640 |
This is something that like they are game to do. 00:56:25.840 |
The speech team supports like a bunch of other efforts 00:56:27.920 |
across Google, like Gemini Live, for example, 00:56:34.160 |
But yeah, the thing about dialects is really interesting. 00:56:36.320 |
'Cause like in some of our sort of earliest testing 00:56:40.800 |
we actually noticed that sometimes it wouldn't stick 00:56:48.440 |
when we presented it to like a native speaker, 00:56:50.280 |
it would sometimes go from like a Canadian person 00:56:52.440 |
speaking French versus like a French person speaking French 00:56:58.560 |
So there's a lot more sort of speech quality work 00:57:01.360 |
that we need to do there to make sure that it works reliably 00:57:04.240 |
and at least sort of like the standard dialect that we want. 00:57:09.360 |
to sort of do the thing that you're talking about 00:57:28.320 |
I'm sure like the Italian is so strong in the model 00:57:31.480 |
that like when you're trying to like pull that away from it, 00:57:39.880 |
- Well, anyway, if you need Italian, he's got you. 00:57:46.200 |
The managing system prompt, people want a lot of that. 00:57:51.200 |
Definitely looking into it for just core notebook LM. 00:58:01.080 |
we are trying to figure out the best way to do it. 00:58:03.760 |
So we'll launch something sooner rather than later. 00:58:08.280 |
And I think like, you know, just to be fully transparent, 00:58:19.720 |
We'll just put a text box or something, yeah. 00:58:21.560 |
- I think a lot of people are like, this is almost perfect, 00:58:35.840 |
that try to ship, they're like, oh, here are all the knobs. 00:58:42.160 |
I'll just put it in the docs and you figure it out, right? 00:58:50.120 |
- As opposed to like 10 you could possibly have done. 00:58:57.760 |
I was like, oh, I saw on Twitter, you know, on X, 00:59:02.600 |
Started mocking it up, making the text boxes, 00:59:06.960 |
And then I looked at it and I was kind of sad. 00:59:08.720 |
I was like, oh, right, it's like, oh, it's like, 00:59:11.040 |
this is not cool, this is not fun, this is not magical. 00:59:14.080 |
It is sort of exactly what you would expect knobs to be. 00:59:32.240 |
And so I was like, how do we bring more of that, right? 00:59:34.920 |
That still gives the user the optionality that they want. 00:59:43.920 |
since I've launched this thing that people really want? 00:59:47.120 |
And I can give it to them while preserving like that, 00:59:54.120 |
Like, I'm not gonna come up with that by myself. 00:59:59.760 |
We're all experimenting with sort of how to get the most 01:00:03.200 |
out of like the insight and also ship it quick. 01:00:15.080 |
like going back to all the sort of like craft 01:00:21.880 |
Like the knobs are not as easy to add as simply like, 01:00:34.200 |
But the prioritization is also different though. 01:00:53.000 |
Like I wanna help you get the best output ever. 01:01:00.000 |
- Two more things we definitely wanna talk about. 01:01:11.800 |
Like, is this, and also like the future of the product 01:01:22.160 |
and like you're still looking to build like a broader 01:01:24.400 |
kind of like a interface with data and documents platform? 01:01:40.080 |
I think I'm getting a lot of sort of like positive feedback 01:01:43.960 |
We have some early signal that says it's a really good hook, 01:01:55.720 |
'cause then I could just like simplify the train, right? 01:01:58.360 |
I don't have to think about all this other functionality. 01:02:00.800 |
But I think the reality is that the framework 01:02:03.960 |
kind of like what we were talking about earlier 01:02:09.240 |
and then there's an output is that really extensible one. 01:02:13.320 |
And I think like, particularly when we think about 01:02:17.960 |
especially when we think about commercialization, 01:02:25.080 |
like the space in which you're able to do these things 01:02:34.480 |
I could see that being like a really big business. 01:02:50.520 |
like you have so many amazing teams and products at Google 01:02:53.280 |
that sometimes like, I'm sure you have to figure that out. 01:03:02.120 |
I was like, oh, there's something, well, you know, 01:03:08.400 |
oh, this is like more disorienting than like artifacts. 01:03:12.440 |
And I didn't spend a lot of time thinking about it, 01:03:18.480 |
I'm working with, you know, an LLM, an agent, 01:03:28.760 |
And the thing that I think I feel angsty about 01:03:31.600 |
is like, we've been talking about this for like a year, 01:03:35.520 |
Like, of course, like, I'm going to say that, 01:03:38.640 |
I've had these like mocks that I was just like, 01:03:40.880 |
I want to push the button, but we prioritize other things. 01:03:43.920 |
We were like, okay, what can we like really win at? 01:03:46.200 |
And like, we prioritize audio, for example, instead of that. 01:03:55.880 |
that we want to try to build into notebook too. 01:03:57.560 |
And I'd made this comment on Twitter as well, 01:03:59.880 |
where I was like, now I don't know, actually, right? 01:04:02.720 |
I don't actually know if that is the right thing. 01:04:05.240 |
Like, are people really getting utility out of this? 01:04:12.960 |
I have to rev on it like one layer more, right? 01:04:15.120 |
I have to deliver like a differentiating value 01:04:24.160 |
So you don't have to innovate every single time. 01:04:30.760 |
And when I say that, I think it's sort of like, 01:04:32.480 |
conceptually, like the value that you deliver to the user. 01:04:36.160 |
There are a lot of corners that I have personally cut, 01:04:38.560 |
where it's like, our UX designer is always like, 01:04:50.160 |
But I mean, kidding aside, I think that's true, 01:04:52.800 |
that it's like, we do want to be able to fast follow, 01:04:59.720 |
- Code, especially on our podcast, has a special place. 01:05:07.280 |
I don't see like a connect my GitHub to this thing. 01:05:15.800 |
especially when we had like a much smaller team, 01:05:18.920 |
let's push like an end-to-end journey together. 01:05:22.720 |
Because then once you lay the groundwork of like, 01:05:30.080 |
And it's like, now it's just a matter of like, 01:05:37.400 |
And now I also feel like I have a much better view 01:05:49.880 |
- For what it's worth, the model still understands code. 01:05:52.280 |
So like, I've seen at least one or two people 01:05:57.080 |
put it in there and get like an audio overview of your code. 01:06:06.280 |
Like, even if you haven't like, optimized for it. 01:06:07.560 |
- I think on sort of like the creepy side of things, 01:06:10.920 |
I did watch a student, like with her permission, of course, 01:06:13.720 |
I watched her do her homework in Notebook LM. 01:06:17.200 |
And I didn't tell her like, what kind of homework to bring, 01:06:20.480 |
but she brought like her computer science homework. 01:06:29.800 |
And Notebook LM was like, okay, I've read it. 01:06:32.760 |
And the student was like, okay, here's my code so far. 01:06:41.440 |
And Notebook LM was like, well, number one is wrong. 01:06:48.120 |
And she was like, okay, don't tell me the answer, 01:06:50.720 |
but like, walk me through like how you'd think about this. 01:06:58.000 |
And I asked her, I was like, oh, why did you do that? 01:06:59.480 |
And she was like, well, I actually want to learn it. 01:07:01.240 |
She was like, 'cause I'm going to have to take a quiz 01:07:03.520 |
And I was like, oh yeah, this is a really good point. 01:07:07.920 |
Notebook LM, while the formatting wasn't perfect, 01:07:09.880 |
like did say like, hey, have you thought about using, 01:07:12.560 |
you know, maybe an integer instead of like this? 01:07:16.960 |
- Are you adding like real-time chat on the output? 01:07:19.880 |
Like, you know, there's kind of like the deep dive show 01:07:22.400 |
and then there's like the listeners call in and say, hey. 01:07:26.400 |
- Yeah, we're actively, that's one of the things 01:07:29.560 |
Actually, one of the interesting things is now we're like, 01:07:33.560 |
Like, what are the actual, like kind of going back 01:07:35.960 |
to sort of having a strong POV about the experience. 01:07:41.040 |
Like, what is fundamentally better about doing that? 01:07:43.040 |
That's not just like being able to Q&A your notebook. 01:07:45.400 |
How is that different from like a conversation? 01:07:47.480 |
Is it just the fact that like there was a show 01:07:55.120 |
that like we can continue to unpack, but yes, 01:07:58.240 |
- It's because I formed a parasocial relationship. 01:08:12.800 |
I would say one of the toughest AI engineering disciplines 01:08:30.320 |
either call to action or laying out one principle 01:08:40.080 |
Of course, I'm going to say go to notebooklm.google.com. 01:08:43.760 |
Try it out, join the Discord and tell us what you think. 01:08:46.720 |
- Yeah, especially like you have a technical audience. 01:08:49.240 |
What do you want from a technical engineering audience? 01:08:54.240 |
because the technical and engineering audience 01:08:55.960 |
typically will just say, "Hey, where's the API?" 01:09:00.080 |
But I think what I would really be interested to discover 01:09:09.000 |
Just the most useful thing for me is if you do stop using it 01:09:14.240 |
Because I think contextualizing it within your life, 01:09:19.440 |
like is what really helps me build really cool things. 01:09:24.640 |
- Okay, if I had to pick one, it's just always be building. 01:09:29.360 |
I think like for PMs, it's like such a critical skill 01:09:32.200 |
and just like take time to like pop your head up 01:09:36.480 |
On the weekends, I try to have a lot of discipline. 01:09:38.680 |
Like I only use ChatGPT and like Claude on the weekend. 01:09:47.680 |
'cause like I don't do that normally like at work. 01:09:56.600 |
Like you can have an idea of like how a product should work 01:10:00.720 |
But it's like, what was your like proof of concept, right? 01:10:03.280 |
Like what gave you conviction that that was the right thing? 01:10:07.000 |
- I feel like consistently like the most magical moments 01:10:13.800 |
when like I'm really, really, really just close 01:10:19.120 |
And sometimes it's like farther than you think it is. 01:10:24.560 |
like there were phases where it was like easy 01:10:29.600 |
what you really need is to like show your thing to someone 01:10:32.120 |
and like they'll come up with creative ways to improve it. 01:10:34.800 |
Like we're all sort of like learning, I think. 01:10:37.400 |
So yeah, like I feel like unless you're hitting 01:10:39.400 |
that bound of like, this is what Gemini 1.5 can do, 01:10:43.720 |
probably like the magic moment is like somewhere there, 01:10:51.880 |
- It's funny because we had Nicholas Carlini 01:10:55.640 |
And he was like, if the model is always successful, 01:11:03.160 |
- My problem is like sometimes I'm not smart enough 01:11:11.160 |
Like people are always like, I don't know how to use it. 01:11:15.080 |
Like I remember the first time I used Google search, 01:11:19.680 |
It's like anything, I got nothing in my brain, dad. 01:11:23.840 |
And I think there's a lot of like for product builders 01:11:31.680 |
- Principle for AI engineers or like just one advice 01:11:36.760 |
- I guess like, in addition to pushing the bounds 01:11:41.400 |
you're not gonna get it right in the first go. 01:11:49.360 |
I guess that's, I'm basically describing an agent, 01:11:55.520 |
And that holds true for probably every single time 01:12:24.680 |
and other people have said it, I was like, is it? 01:12:32.880 |
and Notebook LM, like unlocked the, you know, 01:12:32.880 |
I would go so far as to say Claude Projects never did. 01:12:39.200 |
I think a lot of it is competent PMing and engineering, 01:12:46.240 |
but also just, you know, it's interesting how 01:12:53.480 |
but like, you know, you built products and UI innovation 01:12:56.880 |
on top of also working with research to improve the model. 01:13:01.200 |
That wasn't planned to be this whole big thing. 01:13:13.320 |
where I was like, you know, we had to ask for more TPUs. 01:13:18.600 |
And, you know, it was a little bit of a subtweet of like, 01:13:25.840 |
I just think like when people try to make big launches, 01:13:30.480 |
and they're just trying to build a good thing, 01:13:44.040 |
We just keep trying, keep trying to make it better.