How NotebookLM Was Made

00:00:00.000 | Hey everyone, we're here today as guests on Latent Space.

00:00:03.400 | - It's great to be here.

00:00:04.720 | I'm a long time listener and fan.

00:00:06.360 | They've had some great guests on this show before.

00:00:08.240 | - Yeah, what an honor to have us,

00:00:10.400 | the hosts of another podcast, join as guests.

00:00:13.200 | - I mean, a huge thank you to Swix and Alessio

00:00:16.500 | for the invite, thanks for having us on the show.

00:00:18.320 | - Yeah, really, it seems like they brought us here

00:00:19.880 | to talk a little bit about our show, our podcast.

00:00:22.600 | - Yeah, I mean, we've had lots of listeners ourselves,

00:00:25.240 | listeners at Deep Dive.

00:00:26.280 | - Oh yeah, we've made a ton of audio overviews

00:00:29.120 | since we launched and we're learning a lot.

00:00:31.240 | - There's probably a lot we can share

00:00:32.520 | around what we're building next, huh?

00:00:34.320 | - Yeah, we'll share a little bit at least.

00:00:35.920 | - The short version is we'll keep learning

00:00:38.000 | and getting better for you.

00:00:39.760 | - We're glad you're along for the ride.

00:00:41.080 | - So yeah, keep listening.

00:00:41.920 | - Keep listening and stay curious.

00:00:44.120 | We promise to keep diving deep

00:00:45.880 | and bringing you even better options in the future.

00:00:49.160 | - Stay curious.

00:00:50.280 | (upbeat music)

00:00:52.800 | - Hey everyone, welcome to the Latent Space Podcast.

00:00:55.240 | This is Alessio, partner and CTO

00:00:56.960 | of Residence at Decibel Partners

00:00:58.600 | and I'm joined by Nicole Swicks, founder of Small.ai.

00:01:01.560 | - Hey, and today we're back in the studio

00:01:03.880 | with our special guests, Ryzen Martin and Usama,

00:01:07.200 | I forgot to get your last name, Shavkat?

00:01:10.320 | - Yes.

00:01:11.160 | - Okay, welcome.

00:01:12.520 | - Hello, thank you for having us.

00:01:13.640 | - Thanks for having us.

00:01:14.480 | - So AI podcasters meet human podcasters, always fun.

00:01:17.800 | Congrats on the success of Notebook LM.

00:01:20.880 | I mean, how does it feel?

00:01:22.560 | - It's been a lot of fun.

00:01:23.520 | A lot of it honestly was unexpected,

00:01:25.840 | but my favorite part is really listening

00:01:27.560 | to the audio overviews that people have been making.

00:01:29.800 | - Maybe we should do a little bit of intros

00:01:31.680 | and tell the story.

00:01:32.880 | You know, what is your path into the sort of Google AI org

00:01:36.200 | or maybe I actually don't even know what org

00:01:38.680 | you guys are in.

00:01:39.760 | - I can start.

00:01:40.600 | My name's Ryza.

00:01:41.560 | I lead the Notebook LM team inside of Google Labs.

00:01:45.240 | So specifically that's the org that we're in.

00:01:47.360 | It's called Google Labs.

00:01:48.240 | It's only about two years old.

00:01:49.960 | And our whole mandate is really to build AI products.

00:01:53.840 | That's it.

00:01:54.680 | We work super closely with DeepMind.

00:01:56.280 | Our entire thing is just like try a bunch of things

00:01:59.160 | and see what's landing with users.

00:02:01.200 | And the background that I have is really,

00:02:03.520 | I worked in payments before this

00:02:05.280 | and I worked in ads right before and then startups.

00:02:07.800 | I tell people like at every time that I changed orgs,

00:02:12.040 | I actually almost quit Google.

00:02:13.800 | Like specifically like in between ads and payments,

00:02:15.800 | I was like, all right, I can't do this.

00:02:17.200 | Like, this is like super hard.

00:02:18.680 | I was like, it's not for me.

00:02:19.640 | I'm like a very zero to one person.

00:02:21.680 | But then I was like, okay, I'll try.

00:02:22.920 | I'll interview with other teams.

00:02:24.640 | And when I interviewed in payments,

00:02:25.840 | I was like, oh, these people are really cool.

00:02:27.520 | I don't know if I'm like a super good fit with this space,

00:02:31.160 | but I'll try it 'cause the people are cool.

00:02:33.040 | And then I really enjoyed that.

00:02:34.080 | And then I worked on like zero to one features

00:02:36.160 | inside of payments and I had a lot of fun.

00:02:38.480 | But then the time came again where I was like,

00:02:40.320 | oh, I don't know.

00:02:41.440 | It's like, it's time to leave.

00:02:42.280 | It's time to start my own thing.

00:02:43.640 | But then I interviewed inside of Google Labs

00:02:45.840 | and I was like, oh darn.

00:02:47.200 | Like there's definitely like-

00:02:48.360 | - They got you again.

00:02:49.200 | - They got me again.

00:02:50.280 | (laughing)

00:02:51.120 | And so now I've been here for two years

00:02:52.840 | and I'm happy that I stayed

00:02:54.840 | because especially with the recent success of Notebook LM,

00:02:58.440 | I'm like, dang, we did it.

00:03:00.120 | I actually got to do it.

00:03:01.200 | So that was really cool.

00:03:02.280 | - Kind of similar, honestly.

00:03:03.920 | I was at a big team at Google.

00:03:06.560 | We do sort of the data center supply chain planning stuff.

00:03:10.200 | Google has like the largest sort of footprint.

00:03:11.960 | Obviously there's a lot of management stuff to do there.

00:03:14.240 | But then there was this thing called Area 120 at Google,

00:03:17.440 | which does not exist anymore.

00:03:19.520 | But I sort of wanted to do like more zero to one building

00:03:23.320 | and landed a role there where we're trying to build

00:03:25.560 | like a creator commerce platform called Kaya.

00:03:29.000 | It launched briefly a couple of years ago.

00:03:32.280 | But then Area 120 sort of transitioned

00:03:34.600 | and morphed into Labs.

00:03:36.440 | And like over the last few years,

00:03:38.400 | like the focus just got a lot clearer.

00:03:40.880 | Like we were trying to build new AI products

00:03:43.680 | and do it in the wild and sort of co-create and all of that.

00:03:47.040 | So yeah, we've just been trying a bunch of different things

00:03:49.800 | and this one really landed,

00:03:51.400 | which has felt pretty phenomenal.

00:03:52.760 | - Really, really landed.

00:03:53.920 | Let's talk about the brief history of NotebookLM.

00:03:57.080 | You had a tweet, which is very helpful for doing research.

00:03:59.800 | May, 2023, during Google I/O,

00:04:01.520 | you announced Project Tailwind.

00:04:03.480 | - Yeah.

00:04:04.320 | - So today is October, 2024.

00:04:07.120 | So you joined October, 2022.

00:04:09.160 | - Actually, I used to lead AI Test Kitchen.

00:04:11.720 | And this was actually, I think not I/O 2023,

00:04:16.640 | I/O 2022 is when we launched AI Test Kitchen

00:04:20.680 | or announced it.

00:04:21.520 | And I don't know if you remember it.

00:04:22.920 | - I wasn't, that's how you like had the basic prototype

00:04:25.880 | for Gemini 8.

00:04:26.720 | - Yes, yes, exactly.

00:04:27.560 | - And like gave beta access to people.

00:04:29.320 | - Yeah, yeah, yeah.

00:04:30.320 | And I remember, I was like, wow, this is crazy.

00:04:33.200 | We're going to launch an LLM into the wild.

00:04:36.280 | And that was the first project

00:04:37.400 | that I was working on at Google.

00:04:39.240 | But at the same time, my manager at the time, Josh,

00:04:42.000 | he was like, "Hey, but I want you to really think about

00:04:44.600 | like what real products would we build

00:04:46.880 | that are not just demos of the technology?"

00:04:49.360 | That was in October of 2022.

00:04:53.200 | I was sitting next to an engineer

00:04:55.120 | that was working on a project called Talk to Small Corpus.

00:04:58.120 | His name was Adam.

00:04:59.360 | And the idea of Talk to Small Corpus

00:05:00.800 | is basically using LLM to talk to your data.

00:05:03.240 | And at the time I was like, wait,

00:05:04.880 | there are some like really practical things

00:05:07.200 | that you can build here.

00:05:08.280 | And I, just a little bit of background,

00:05:10.080 | like I was an adult learner.

00:05:11.280 | Like I went to college while I was working a full-time job.

00:05:14.120 | And the first thing I thought was like,

00:05:15.840 | this would have really helped me with my studying, right?

00:05:19.040 | If I could just like talk to a textbook,

00:05:20.760 | especially like when I was tired after work,

00:05:22.960 | that would have been huge.

00:05:24.400 | We took a lot of like the Talk to Small Corpus prototypes

00:05:27.000 | and I showed it to a lot of like college students,

00:05:29.040 | particularly like adult learners.

00:05:31.080 | They were like, yes, like I get it.

00:05:32.960 | Like I didn't even have to explain it to them.

00:05:35.200 | And we just continued to iterate the prototype from there

00:05:38.240 | to the point where we actually got a slot

00:05:40.600 | as part of the I/O demo in '23.

00:05:42.720 | - And Corpus, was it a textbook?

00:05:44.800 | - Oh my gosh, yeah.

00:05:46.200 | It's funny, actually.

00:05:47.360 | When he explained the project to me,

00:05:48.560 | he was like, "Talk to Small Corpus."

00:05:49.960 | I was like, "Talk to a small corpse?"

00:05:51.160 | - Yeah, nobody says corpus.

00:05:52.000 | - It was like a small corpse?

00:05:53.080 | This is not AI.

00:05:53.920 | - It's very academic.

00:05:54.760 | - Yeah, yeah.

00:05:55.600 | And it really was just like a way for us to describe

00:05:58.520 | the amount of data that we thought like could be,

00:06:01.280 | it could be good for.

00:06:02.280 | - Yeah, but even then, you're still like doing rag stuff

00:06:04.760 | because, you know, the context lens back then

00:06:06.760 | was probably like 2K, 4K.

00:06:08.040 | - Yeah, it was basically rag.

00:06:09.480 | That was essentially what it was.

00:06:10.880 | And I remember, I was like,

00:06:12.360 | we were building the prototypes and at the same time,

00:06:14.960 | I think like the rest of the world was, right?

00:06:17.160 | We were seeing all of these like chat with PDF stuff

00:06:19.600 | come up and I was like, "Come on, we gotta go."

00:06:21.680 | Like we have to like push this out into the world.

00:06:24.320 | I think if there was anything,

00:06:25.200 | I wish we would have launched sooner

00:06:26.680 | because I wanted to learn faster.

00:06:28.600 | But I think like we netted out pretty well.

00:06:30.760 | - Was the initial product just text-to-speech

00:06:33.800 | or were you also doing kind of like a synthesizing

00:06:36.680 | of the content, refining it?

00:06:38.120 | Or were you just helping people read through it?

00:06:40.480 | - Before we did the I/O announcement in '23,

00:06:44.120 | we'd already done a lot of studies.

00:06:46.280 | And one of the first things that I realized

00:06:48.680 | was the first thing anybody ever typed

00:06:50.360 | was summarize the thing, right?

00:06:52.920 | Summarize the document.

00:06:54.080 | And it was like half like a test

00:06:55.920 | and half just like, "Oh, I know the content.

00:06:57.840 | I wanna see how well it does this."

00:06:59.400 | So as part of the first thing that we launched,

00:07:01.880 | it was called Project Tailwind back then.

00:07:03.960 | It was just Q&A.

00:07:05.360 | So you could chat with the doc just through text

00:07:07.960 | and it would automatically generate a summary as well.

00:07:10.560 | I'm not sure if we had it back then.

00:07:12.000 | I think we did.

00:07:12.840 | It would also generate the key topics in your document.

00:07:15.920 | And it could support up to like 10 documents.

00:07:18.600 | So it wasn't just like a single doc.

00:07:20.240 | - And then the I/O demo went well, I guess.

00:07:23.040 | - Yeah.

00:07:23.880 | - And then what was the discussion from there

00:07:25.840 | to where we are today?

00:07:27.320 | Is there any maybe intermediate step of the product

00:07:30.760 | that people missed between this was launch or?

00:07:33.600 | - It was interesting because every step of the way,

00:07:35.760 | I think we hit like some pretty critical milestones.

00:07:38.560 | So I think from the initial demo,

00:07:40.240 | I think there was so much excitement of like,

00:07:41.840 | "Wow, what is this thing that Google is launching?"

00:07:44.800 | And so we capitalized on that.

00:07:46.320 | We built the wait list.

00:07:47.640 | That's actually when we also launched the Discord server,

00:07:50.400 | which has been huge for us because for us in particular,

00:07:53.840 | one of the things that I really wanted to do

00:07:56.680 | was to be able to launch features and get feedback ASAP.

00:08:00.080 | Like the moment somebody tries it,

00:08:01.480 | like I want to hear what they think right now.

00:08:03.440 | And I want to ask follow-up questions.

00:08:04.960 | And the Discord has just been so great for that.

00:08:07.160 | But then we basically took the feedback from I/O.

00:08:10.280 | We continued to refine the product.

00:08:12.360 | So we added more features.

00:08:13.680 | We added sort of like the ability to save notes,

00:08:15.880 | write notes, we generate follow-up questions.

00:08:18.800 | So there's a bunch of stuff in the product

00:08:20.240 | that shows like a lot of that research,

00:08:22.280 | but it was really the rolling out of things.

00:08:23.960 | Like we removed the wait list,

00:08:25.760 | so rolled out to all of the United States.

00:08:27.720 | We rolled out to over 200 countries and territories.

00:08:31.240 | We started supporting more languages,

00:08:33.280 | both in the UI and like the actual source stuff.

00:08:36.200 | We experienced, like in terms of milestones,

00:08:38.080 | there was like an explosion of like users in Japan.

00:08:41.200 | This was super interesting in terms of just like

00:08:43.520 | unexpected, like people would write to us

00:08:45.520 | and they would be like, "This is amazing.

00:08:47.600 | I have to read all of these rules in English,

00:08:50.560 | but I can chat in Japanese."

00:08:52.560 | It's like, oh, wow, that's true, right?

00:08:54.760 | Like with LLMs, you kind of get this natural,

00:08:57.080 | it translates the content for you,

00:08:59.680 | and you can ask in your sort of preferred mode.

00:09:01.960 | And I think that's not just like a language thing too.

00:09:04.760 | I think there's like,

00:09:06.120 | I do this test with Wealth of Nations all the time,

00:09:07.960 | 'cause it's like a pretty complicated text to read.

00:09:10.400 | - The Evan Smith classic, it's like 400 pages this thing.

00:09:12.560 | - Yeah, but I like this test 'cause I'm like,

00:09:14.840 | I ask in like normie, you know, plain speak,

00:09:17.600 | and then it summarizes really well for me.

00:09:19.600 | It sort of adapts to my tone.

00:09:21.120 | - Very capitalist.

00:09:23.680 | - Very on brand.

00:09:25.840 | - I just checked in on a Notebook LM Discord, 65,000 people.

00:09:28.680 | - Yeah.

00:09:29.520 | - Crazy, just like for one project within Google.

00:09:32.640 | It's not like, it's not labs, it's just Notebook LM.

00:09:35.480 | - Just Notebook LM.

00:09:36.560 | - What do you learn from the community?

00:09:39.200 | - I think that the Discord is really great

00:09:42.040 | for hearing about a couple of things.

00:09:43.680 | One, when things are going wrong.

00:09:45.400 | I think, honestly, like our fastest way

00:09:48.160 | that we've been able to find out

00:09:49.520 | if like the servers are down,

00:09:50.840 | or there's just an influx of people being like,

00:09:53.080 | it says system unable to answer,

00:09:54.520 | anybody else getting this?

00:09:56.200 | And I'm like, all right, let's go.

00:09:58.080 | And it actually catches it a lot faster

00:09:59.760 | than like our own monitoring does.

00:10:01.600 | It's like, that's been really cool.

00:10:02.480 | So thank you.

00:10:03.320 | - Cats will need a dog.

00:10:04.160 | (all laughing)

00:10:05.760 | - So thank you to everybody.

00:10:06.920 | Please keep reporting it.

00:10:08.200 | I think the second thing is really the use cases.

00:10:10.280 | I think when we put it out there,

00:10:11.600 | I was like, hey, I have a hunch of how people will use it,

00:10:14.640 | but like to actually hear about, you know,

00:10:16.800 | not just the context of like the use of Notebook LM,

00:10:19.400 | but like, what is this person's life like?

00:10:21.880 | Why do they care about using this tool?

00:10:23.640 | Especially people who actually have trouble using it,

00:10:25.760 | but they keep pushing,

00:10:27.280 | like that's just so critical to understand

00:10:29.520 | what was so motivating, right?

00:10:31.400 | Like what was your problem that was like so worth solving?

00:10:33.480 | So that's like a second thing.

00:10:34.600 | The third thing is also just hearing sort of like

00:10:37.120 | when we have wins and when we don't have wins,

00:10:39.480 | because there's actually a lot of functionality

00:10:41.200 | where I'm like, hmm, I don't know

00:10:42.760 | if that landed super well

00:10:43.840 | or if that was actually super critical.

00:10:45.840 | As part of having this sort of small project, right,

00:10:49.240 | I wanna be able to unlaunch things too.

00:10:50.960 | So it's not just about just like rolling things out

00:10:52.760 | and testing it and being like, wow,

00:10:54.720 | now we have like 99 features.

00:10:57.160 | Like hopefully we get to a place where it's like,

00:10:58.640 | there's just a really strong core feature set

00:11:00.560 | and the things that aren't as great, we can just unlaunch.

00:11:02.400 | - What have you unlaunched?

00:11:03.520 | I have to ask.

00:11:04.440 | - I'm in the process of unlaunching some stuff.

00:11:07.480 | But for example, we had this idea

00:11:10.880 | that you could highlight the text in your source passage

00:11:13.800 | and then you could transform it.

00:11:15.360 | And nobody was really using it.

00:11:17.000 | And it was like a very complicated piece of our architecture

00:11:19.960 | and it's very hard to continue supporting it

00:11:22.280 | in the context of new features.

00:11:24.040 | So we were like, okay, let's do a 50/50 sunset of this thing

00:11:27.040 | and see if anybody complains.

00:11:28.360 | And so far, nobody has.

00:11:29.600 | - Is there like a feature flagging paradigm

00:11:31.760 | inside of your architecture

00:11:33.720 | that lets you feature flag these things easily?

00:11:36.400 | - Yes, and actually-

00:11:37.440 | - What is it called?

00:11:38.280 | Like, I love feature flagging.

00:11:39.960 | - You mean like in terms of just like

00:11:41.200 | being able to expose things to users?

00:11:42.040 | - Yeah, as a PM, like this is your number one tool, right?

00:11:44.280 | - Yeah, yeah.

00:11:45.120 | - Let's try this out.

00:11:46.120 | All right, if it works, roll it out.

00:11:47.960 | If it doesn't, roll it back, you know?

00:11:49.240 | - Yeah, I mean, we just run Mendel experiments

00:11:51.480 | for the most part.

00:11:52.320 | And I actually, I don't know if you saw it,

00:11:54.160 | but on Twitter, somebody was able to get around our flags

00:11:56.760 | and they enabled all the experiments.

00:11:58.440 | They were like, "Check out what the Notebook LM team

00:12:01.120 | is cooking."

00:12:01.960 | And I was like, "Oh!"

00:12:03.000 | And I was at lunch with the rest of the team.

00:12:05.840 | And I was like, I was eating, I was like,

00:12:07.520 | "Guys, guys, Magic Draft week!"

00:12:10.760 | They were like, "Oh no!"

00:12:12.640 | I was like, "Okay, just finish eating

00:12:13.840 | and then let's go figure out what to do."

00:12:15.400 | - Yeah.

00:12:16.240 | - I think a post-mortem would be fun,

00:12:17.320 | but I don't think we need to do it on the podcast now.

00:12:20.880 | - Yeah, yeah.

00:12:21.720 | - Can we just talk about what's behind the magic?

00:12:24.560 | So I think everybody has questions,

00:12:27.520 | hypotheses about what models power it.

00:12:30.000 | I know you might not be able to share everything,

00:12:32.080 | but can you just get people very basic?

00:12:34.440 | How do you take the data and put it in the model?

00:12:36.560 | What text model do you use?

00:12:38.160 | What's the text-to-speech kind of like jump

00:12:40.800 | between the two?

00:12:41.680 | - Sure, yeah.

00:12:42.840 | - I was going to say, Susama,

00:12:44.160 | he manually does all the podcasts.

00:12:45.920 | - Oh, thank you.

00:12:46.760 | - Really fast.

00:12:47.600 | - You're very fast, yeah.

00:12:48.440 | - He's both of the voices at once.

00:12:50.320 | - Voice actor.

00:12:52.280 | - Go ahead, go ahead.

00:12:53.120 | - So just for a bit of background,

00:12:54.800 | we were building this thing sort of outside Notebook LM

00:12:57.760 | to begin with.

00:12:58.600 | Like just the idea is like content transformation, right?

00:13:01.480 | Like we can do different modalities.

00:13:03.200 | Like everyone knows that everyone's been poking at it,

00:13:05.600 | but like, how do you make it really useful?

00:13:08.640 | And like one of the ways we thought was like, okay,

00:13:10.400 | like you maybe like, you know,

00:13:12.480 | people learn better when they're hearing things,

00:13:14.600 | but TTS exists and you can like narrate

00:13:17.000 | whatever's on screen,

00:13:18.080 | but you want to absorb it the same way.

00:13:20.000 | So like, that's where we sort of started out

00:13:21.920 | into the realm of like, maybe we try like, you know,

00:13:24.840 | two people are having a conversation kind of format.

00:13:28.360 | We didn't actually start out thinking

00:13:30.160 | this would live in Notebook, right?

00:13:31.800 | Like Notebook was sort of,

00:13:33.200 | we built this demo out independently,

00:13:35.960 | tried out like a few different sort of sources.

00:13:38.520 | The main idea was like, go from some sort of sources

00:13:41.280 | and transform it into a listenable, engaging audio format.

00:13:45.560 | And then through that process,

00:13:46.560 | we like unlocked a bunch more sort of learnings.

00:13:49.400 | Like for example, in a sense,

00:13:51.280 | like you're not prompting the model as much

00:13:53.760 | because like the information density is getting unrolled

00:13:57.520 | by the model prompting itself in a sense,

00:14:00.480 | because there's two speakers

00:14:01.720 | and they're both technically like AI personas, right?

00:14:04.280 | That have different angles of looking at things

00:14:07.160 | and like, they'll have a discussion about it.

00:14:08.960 | And that sort of, we realized that's kind of

00:14:10.800 | what was making it riveting in a sense.

00:14:12.800 | Like you care about what comes next,

00:14:14.760 | even if you've read the material already,

00:14:17.080 | 'cause like people say they get new insights

00:14:20.000 | on their own journals or books or whatever,

00:14:23.480 | like anything that they've written themselves.

00:14:25.480 | So yeah, from a modeling perspective,

00:14:27.080 | like it's, like Raisa said earlier,

00:14:29.480 | like we work with the DeepMind audio folks pretty closely.

00:14:33.480 | So they're always cooking up new techniques

00:14:35.760 | to like get better, more human-like audio.

00:14:38.880 | And then Gemini 1.5 is really, really good

00:14:42.960 | at absorbing long context.

00:14:45.240 | So we sort of like generally put those things together

00:14:48.680 | in a way that we could reliably produce the audio.

00:14:52.760 | - I would add like, there's something really nuanced,

00:14:55.520 | I think about sort of the evolution

00:14:57.120 | of like the utility of text-to-speech,

00:14:59.840 | where if it's just reading an actual text response,

00:15:04.120 | and I've done this several times,

00:15:05.440 | I do it all the time with like reading my text messages

00:15:07.800 | or like sometimes I'm trying to read

00:15:09.400 | like a really dense paper,

00:15:10.680 | but I'm trying to do actual work,

00:15:12.040 | I'll have it like read out the screen.

00:15:14.200 | There is something really robotic about it

00:15:16.480 | that is not engaging.

00:15:18.120 | And it's really hard to consume content in that way.

00:15:20.880 | And it's never been really effective,

00:15:22.280 | like particularly for me where I'm like,

00:15:23.800 | hey, it's actually just like, it's fine for like short stuff

00:15:26.520 | like texting, but even that, it's like not that great.

00:15:29.440 | So I think the frontier of experimentation here

00:15:31.960 | was really thinking about there is a transform

00:15:35.640 | that needs to happen in between whatever,

00:15:38.240 | here's like my resume, right?

00:15:39.640 | Or here's like a hundred page slide deck or something.

00:15:42.920 | There is a transform that needs to happen

00:15:44.840 | that is inherently editorial.

00:15:47.480 | And I think this is where like that two-person persona,

00:15:51.000 | right, dialogue model,

00:15:52.640 | they have takes on the material that you've presented,

00:15:56.440 | that's where it really sort of like brings the content

00:15:59.040 | to life in a way that's like not robotic.

00:16:02.160 | And I think that's like where the magic is,

00:16:04.240 | is like, you don't actually know what's going to happen

00:16:06.920 | when you press generate, you know, for better or for worse,

00:16:09.280 | like to the extent that like people are like,

00:16:10.880 | no, I actually want it to be more predictable now.

00:16:13.120 | Like I want to be able to tell them,

00:16:15.360 | but I think that initial like, wow,

00:16:17.280 | was because you didn't know, right?

00:16:19.160 | When you upload your resume,

00:16:20.880 | what's it about to say about you?

00:16:22.400 | And I think I've seen enough of these where I'm like,

00:16:24.320 | oh, it gave you good vibes, right?

00:16:26.000 | Like you knew I was going to say like something really cool.

00:16:28.440 | As we start to shape this product,

00:16:30.320 | I think we want to try to preserve as much of that wow,

00:16:32.840 | as much as we can,

00:16:34.080 | because I do think like exposing like all the knobs

00:16:37.200 | and like the dials,

00:16:38.560 | like we've been thinking about this a lot.

00:16:40.480 | It's like, hey, is that like the actual thing?

00:16:43.520 | Is that the thing that people really want?

00:16:45.720 | - Have you found differences in having one model

00:16:48.400 | just generate the conversation

00:16:50.120 | and then using text-to-speech to kind of fake two people?

00:16:52.800 | Or like, are you actually using two different

00:16:55.600 | kind of system prompts to like have a conversation

00:16:58.360 | step-by-step?

00:16:59.320 | I'm always curious, like,

00:17:00.800 | if persona system prompts make a big difference

00:17:03.080 | or like you just put in one prompt

00:17:04.440 | and then you just let it run?

00:17:05.760 | - I guess like generally we use a lot of inference

00:17:08.960 | as you can tell with like the spinning thing takes a while.

00:17:12.920 | So yeah, there's definitely like a bunch

00:17:14.080 | of different things happening under the hood.

00:17:16.080 | We've tried both approaches

00:17:17.440 | and they have their sort of drawbacks and benefits.

00:17:21.360 | I think that that idea of like questioning

00:17:23.880 | like the two different personas, like persist throughout

00:17:26.360 | like whatever approach we try.

00:17:27.440 | It's like, there's a bit of like imperfection in there.

00:17:30.880 | Like we had to really lean into the fact that like

00:17:33.600 | to build something that's engaging,

00:17:35.440 | like it needs to be somewhat human

00:17:37.720 | and it needs to be just not a chatbot.

00:17:39.960 | Like that was sort of like what we need to diverge from.

00:17:42.000 | It's like, you know,

00:17:42.840 | most chatbots will just narrate the same kind of answer,

00:17:46.360 | like given the same sources for the most part,

00:17:48.360 | which is ridiculous.

00:17:49.640 | So yeah, there's like experimentation there under the hood,

00:17:52.680 | like with the model to like make sure that it's spitting

00:17:54.960 | out like different takes and different personas

00:17:57.640 | and different sort of prompting each other

00:17:59.280 | is like a good analogy, I guess.

00:18:00.760 | - Yeah, I think Steven Johnson, I think he's on your team.

00:18:03.920 | I don't know what his role is.

00:18:05.160 | He seems like chief dreamer, writer.

00:18:07.920 | - Yeah, I mean, I can comment on Steven.

00:18:10.600 | So Steven joined actually in the very early days,

00:18:13.320 | I think before it was even a fully funded project.

00:18:15.880 | And I remember when he joined, I was like,

00:18:17.960 | Steven Johnson's going to be on my team.

00:18:20.680 | You know, and for folks who don't know him,

00:18:22.480 | Steven is a New York Times bestselling author

00:18:25.160 | of like 14 books.

00:18:26.400 | He has a PBS show.

00:18:28.120 | He's like incredibly smart,

00:18:30.120 | just like a true sort of celebrity by himself.

00:18:33.640 | And then he joined Google and he was like,

00:18:35.120 | I want to come here and I want to build the thing

00:18:38.040 | that I've always dreamed of,

00:18:39.320 | which is a tool to help me think.

00:18:42.160 | I was like, a what?

00:18:43.000 | Like a tool to help you think?

00:18:45.120 | I was like, what do you need help with?

00:18:46.720 | Like, you seem to be doing great on your own.

00:18:48.920 | And, you know, he would describe this to me

00:18:51.040 | and I would watch his flow.

00:18:52.600 | And aside from like providing a lot of inspiration,

00:18:55.520 | to be honest, like when I watched Steven work,

00:18:58.000 | I was like, oh, nobody works like this, right?

00:19:00.760 | Like this is what makes him special.

00:19:02.760 | Like he is such a dedicated like researcher and journalist

00:19:07.000 | and he's so thorough, he's so smart.

00:19:09.160 | And then I had this realization of like,

00:19:10.840 | maybe Steven is the product.

00:19:13.440 | Maybe the work is to take Steven's expertise

00:19:16.880 | and bring it to like everyday people

00:19:19.000 | that could really benefit from this.

00:19:20.200 | Like just watching him work,

00:19:21.400 | I was like, oh, I could definitely use like a mini Steven,

00:19:23.800 | like doing work for me.

00:19:25.080 | Like that would make me a better PM.

00:19:26.800 | And then I thought very quickly about like the adjacent roles

00:19:29.480 | that could use sort of this like research and analysis tool.

00:19:33.000 | And so aside from being, you know, chief dreamer,

00:19:36.840 | Steven also represents like a super workflow

00:19:40.800 | that I think all of us,

00:19:42.520 | like if we had access to a tool like it,

00:19:44.600 | would just inherently like make us better.

00:19:46.480 | - Did you make him express his thoughts while he worked

00:19:49.480 | or you just silently watched him?

00:19:51.440 | Or how does this work?

00:19:52.800 | - Oh no, now you're making me admit it.

00:19:55.400 | But yes, I did just silently watch him.

00:19:56.760 | - Yeah, this is a part of the PM toolkit, right?

00:19:58.480 | Like user interviews and all that.

00:20:00.600 | - Yeah, I mean, I did interview him,

00:20:02.600 | but I noticed like if I interviewed him,

00:20:04.800 | it was different than if I just watched him.

00:20:07.480 | And I did the same thing with students all the time.

00:20:09.760 | Like I followed a lot of students around,

00:20:11.400 | I watched them study.

00:20:12.760 | I would ask them like, oh, how do you feel now?

00:20:14.800 | Right, or why did you do that?

00:20:15.960 | Like what made you do that actually?

00:20:18.360 | Or why are you upset about like this particular thing?

00:20:20.240 | Why are you cranky about this particular topic?

00:20:22.760 | And it was very similar, I think, for Steven,

00:20:25.200 | especially because he was describing,

00:20:27.080 | he was in the middle of writing a book

00:20:28.800 | and he would describe like, oh, you know,

00:20:30.840 | here's how I research things

00:20:32.560 | and here's how I keep my notes.

00:20:34.440 | Oh, and here's how I do it.

00:20:35.800 | And it was really,

00:20:36.960 | he was doing this sort of like self-questioning, right?

00:20:40.080 | Like now we talk about like chain of, you know,

00:20:42.040 | reasoning or thought, reflection.

00:20:43.960 | And I was like, oh, he's the OG.

00:20:46.160 | Like I watched him do it in real time.

00:20:47.680 | I was like, that's like LLM right there.

00:20:50.520 | And to be able to bring sort of that expertise in a way

00:20:53.720 | that was like, you know, maybe like costly inference wise,

00:20:56.520 | but really have like that ability inside of a tool

00:20:58.680 | that was like, for starters, free inside of Notebook LM,

00:21:01.960 | it was good to learn whether or not

00:21:03.360 | people really did find use out of it.

00:21:05.120 | - So did he just commit to using Notebook LM for everything?

00:21:08.520 | Or did you just model his existing workflow?

00:21:11.880 | - Both, right?

00:21:12.720 | Like in the beginning, there was no product for him to use.

00:21:15.040 | And so he just kept describing the thing that he wanted.

00:21:17.240 | And then eventually like we started building the thing

00:21:19.680 | and then I would start watching him use it.

00:21:22.080 | One of the things that I love about Stephen

00:21:24.240 | is he uses the product in ways where it kind of does it,

00:21:28.440 | but doesn't quite, like he's always using it

00:21:30.920 | at like the absolute max limit of this thing.

00:21:34.360 | But the way that he describes it is so full of promise

00:21:37.160 | where he's like, I can see it going here.

00:21:40.040 | And all I have to do is sort of like meet him there

00:21:42.360 | and sort of pressure test whether or not, you know,

00:21:44.480 | everyday people want it and we just have to build it.

00:21:47.000 | - I would say OpenAI has a pretty similar person,

00:21:49.280 | Andrew Mason, I think his name is.

00:21:51.000 | It's very similar, like just from the writing world

00:21:53.240 | and using it as a tool for thought to shape Chachabitty.

00:21:56.440 | I don't think that people who use AR tools

00:21:58.720 | to their limit are common.

00:22:00.440 | I'm looking at my Notebook LM now, I've got two sources.

00:22:03.400 | You have a little like source limit thing

00:22:05.680 | and my bar is over here, you know,

00:22:07.280 | and it stretches across the whole thing.

00:22:08.440 | I'm like, did he fill it up?

00:22:09.640 | Like what, you know?

00:22:10.480 | - Yes, and he has like a higher limit than others.

00:22:12.840 | I think Stephen- - He fills it up.

00:22:14.440 | - Oh yeah, like I don't think Stephen even has a limit.

00:22:17.480 | - And he has Notes, Google Drive stuff, PDFs, MP3, whatever.

00:22:21.960 | - Yes, and one of my favorite demos,

00:22:23.360 | he just did this recently,

00:22:24.400 | is he has actually PDFs of like handwritten Marie Curie notes.

00:22:28.840 | - I see, so you're doing image recognition as well.

00:22:30.800 | - Yeah, so it does support it today.

00:22:32.800 | So if you have a PDF that's purely images,

00:22:34.920 | it will recognize it.

00:22:36.200 | But his demo is just like super powerful.

00:22:37.840 | He's like, okay, here's Marie Curie's notes.

00:22:39.560 | And it's like, here's how I'm using it to analyze it.

00:22:41.680 | And I'm using it for like this thing that I'm writing.

00:22:43.920 | And that's really compelling.

00:22:45.480 | It's like the everyday person

00:22:46.640 | doesn't think of these applications.

00:22:48.480 | And I think even like when I listened to Stephen's demo,

00:22:50.880 | I see the gap.

00:22:52.040 | I see how Stephen got there,

00:22:53.560 | but I don't see how I could without him.

00:22:55.840 | And so there's a lot of work still for us to build

00:22:58.520 | of like, hey, how do I bring that magic down

00:23:01.320 | to like zero work?

00:23:03.280 | Because I look at all the steps that he had to take

00:23:05.160 | in order to do it.

00:23:06.000 | And I'm like, okay, that's product work for us, right?

00:23:07.880 | Like that's just onboarding.

00:23:09.320 | - And so from an engineering perspective,

00:23:10.880 | people come to you and it's like,

00:23:11.800 | okay, I need to use this handwritten notes

00:23:14.160 | from Marie Curie from hundreds of years ago.

00:23:17.000 | How do you think about adding support for like data sources

00:23:19.840 | and then maybe any fun stories

00:23:21.520 | and like supporting more esoteric types of inputs?

00:23:25.440 | - So I think about the product in three ways, right?

00:23:27.680 | So there's the sources, the source input,

00:23:30.120 | there's like the capabilities

00:23:31.360 | of like what you could do with those sources.

00:23:33.360 | And then there's the third space,

00:23:34.640 | which is how do you output it into the world?

00:23:36.480 | Like how do you put it back out there?

00:23:38.640 | There's a lot of really basic sources

00:23:40.640 | that we don't support still, right?

00:23:42.200 | I think there's sort of like

00:23:43.240 | the handwritten notes stuff is one,

00:23:45.080 | but even basic things like Doc X or like PowerPoint, right?

00:23:49.040 | Like these are the things that people,

00:23:50.760 | everyday people are like,

00:23:51.600 | "Hey, my professor actually gave me everything in Doc X.

00:23:54.600 | Can you support that?"

00:23:55.880 | And then just like basic stuff,

00:23:57.160 | like images and PDFs combined with texts.

00:24:00.040 | Like there's just a really long roadmap for sources

00:24:02.840 | that I think we just have to work on.

00:24:04.160 | So that's like a big piece of it.

00:24:05.560 | On the output side,

00:24:06.440 | and I think this is like one of the most interesting things

00:24:08.080 | that we learned really early on is,

00:24:10.680 | sure, there's like the Q&A analysis stuff,

00:24:13.480 | which is like, "Hey, when did this thing launch?

00:24:15.520 | Okay, you found it in the slide deck.

00:24:16.920 | Here's the answer."

00:24:18.240 | But most of the time,

00:24:19.240 | the reason why people ask those questions

00:24:20.840 | is because they're trying to make something new.

00:24:22.560 | And so when actually,

00:24:23.520 | when some of those early features leaked,

00:24:25.320 | like a lot of the features we're experimenting with

00:24:27.280 | are the output types.

00:24:28.840 | And so you can imagine that people care a lot

00:24:31.440 | about the resources that they're putting into Notebook LM

00:24:33.880 | 'cause they're trying to create something new.

00:24:35.920 | So I think equally as important as the source inputs

00:24:39.320 | are the outputs that we're helping people to create.

00:24:42.160 | And really, shortly on the roadmap,

00:24:44.240 | we're thinking about,

00:24:45.640 | how do we help people use Notebook LM

00:24:47.760 | to distribute knowledge?

00:24:49.640 | And that's like one of the most compelling use cases

00:24:51.520 | is like shared notebooks.

00:24:52.680 | It's like a way to share knowledge.

00:24:54.200 | How do we help people take sources

00:24:55.920 | and then one-click new documents out of it, right?

00:24:59.040 | And I think that's something that people think is like,

00:25:00.680 | "Oh yeah, of course," right?

00:25:01.720 | Like one push a document,

00:25:02.880 | but what does it mean to do it right?

00:25:05.080 | Like to do it in your style, in your brand, right?

00:25:08.040 | To follow your guidelines, stuff like that.

00:25:09.560 | So I think there's a lot of work

00:25:11.160 | on both sides of that equation.

00:25:13.320 | - Interesting.

00:25:14.160 | Any comments on the engineering side of things?

00:25:16.200 | - So yeah, like I said,

00:25:17.440 | I was mostly working on building the text to audio,

00:25:20.600 | which kind of lives as a separate engineering pipeline

00:25:23.160 | almost that we then put into Notebook LM.

00:25:25.160 | But I think there's probably tons of Notebook LM

00:25:27.320 | engineering war stories on dealing with sources.

00:25:30.160 | And so I don't work too closely with engineers directly,

00:25:32.960 | but I think a lot of it does come down

00:25:34.360 | to like Gemini's native understanding of images really well,

00:25:38.280 | like the latest generation.

00:25:39.280 | - Yeah, I think on the engineering and modeling side,

00:25:41.440 | I think we are a really good example of a team

00:25:45.120 | that's put a product out there

00:25:46.960 | and we're getting a lot of feedback from the users

00:25:48.560 | and we return the data to the modeling team, right?

00:25:50.920 | To the extent that we say,

00:25:51.760 | "Hey, actually, you know what people are uploading,

00:25:54.560 | but we can't really support super well?

00:25:56.400 | Text plus image," right?

00:25:57.880 | Especially to the extent that like Notebook LM

00:26:00.000 | can handle up to 50 sources, 500,000 words each.

00:26:03.720 | Like you're not going to be able to jam all of that

00:26:05.840 | into like the context window.

00:26:07.000 | So how do we do multimodal embeddings with that?

00:26:09.640 | There's really like a lot of things that we have to solve

00:26:12.760 | that are almost there, but not quite there yet.

00:26:16.240 | - And then turning it into audio.

00:26:18.280 | I think one of the best things is it has so many of the human

00:26:21.480 | does that happen in the text generation

00:26:23.280 | that then becomes audio?

00:26:24.440 | Or is that a part of like the audio model

00:26:26.680 | that transforms the text?

00:26:27.520 | - It's a bit of both, I would say.

00:26:28.960 | The audio model is definitely trying to mimic

00:26:30.760 | like certain human intonations and like sort of natural,

00:26:34.320 | like breathing and pauses and like laughter

00:26:37.000 | and things like that.

00:26:38.200 | But yeah, in generating like the text,

00:26:40.440 | we also have to sort of give signals

00:26:42.240 | on like where those things maybe would make sense.

00:26:45.400 | - Yeah, and on the input side instead,

00:26:47.800 | having a transcript versus having the audio,

00:26:49.920 | like, can you take some of the emotions out of it too?

00:26:52.720 | If I'm giving, like, for example,

00:26:54.480 | when we did the recaps of our podcast,

00:26:56.320 | we can either give audio of the pod

00:26:58.880 | or we can give a diarized transcription of it.

00:27:01.200 | But like the transcription doesn't have some of the,

00:27:03.240 | you know, voice kind of like things.

00:27:05.720 | Do you reconstruct that when people upload audio

00:27:08.280 | or how does that work?

00:27:09.320 | - So when you upload audio today, we just transcribe it.

00:27:12.240 | So it is quite lossy in the sense that like,

00:27:14.920 | we don't transcribe like the emotion from that as a source.

00:27:18.000 | But when you do upload a text file

00:27:21.760 | and it has a lot of like that annotation,

00:27:24.400 | I think that there is some ability for it to be reused

00:27:27.600 | in like the audio output, right?

00:27:29.120 | But I think it will still contextualize it

00:27:31.760 | in the deep dive format.

00:27:33.160 | So I think that's something that's like

00:27:34.880 | particularly important is like,

00:27:36.160 | hey, today we only have one format, it's deep dive.

00:27:38.720 | It's meant to be pretty general overview

00:27:41.000 | and it is pretty peppy.

00:27:42.240 | It's just very upbeat.

00:27:43.640 | - It's very enthusiastic, yeah.

00:27:44.960 | - Yeah, yeah, even if you had like a sad topic,

00:27:47.680 | I think they would find a way to be like,

00:27:49.400 | silver lining though, we're having a good chat.

00:27:53.960 | - Yeah, that's awesome.

00:27:54.800 | One of the ways, many, many, many ways

00:27:56.800 | that deep dive went viral is people saying like,

00:27:59.440 | if you want to feel good about yourself,

00:28:00.680 | just drop in your LinkedIn.

00:28:02.120 | Any other like favorite use cases that you saw

00:28:04.680 | from people discovering things in social media?

00:28:07.880 | - I mean, there's so many funny ones

00:28:09.440 | and I love the funny ones.

00:28:10.880 | I think because I'm always relieved when I watch them,

00:28:13.200 | I'm like, that was funny and not scary, it's great.

00:28:16.960 | There was another one that was interesting,

00:28:18.440 | which was a startup founder putting their landing page

00:28:21.520 | and being like, all right, let's test whether or not

00:28:23.560 | like the value prop is coming through.

00:28:25.120 | And I was like, wow, that's right, that's smart.

00:28:28.240 | And then I saw a couple of other people

00:28:29.880 | following up on that too.

00:28:31.920 | - Yeah, I put my about page in there

00:28:33.720 | and like, yeah, if there are things

00:28:35.600 | that I'm not comfortable with, I should remove it,

00:28:37.440 | so that it can pick it up.

00:28:38.800 | - Right, I think that the personal hype machine

00:28:40.800 | was like pretty viral one.

00:28:44.160 | I think like people uploaded their dreams

00:28:46.480 | and like some people like keep sort of dream journals

00:28:48.680 | and it like would sort of comment on those

00:28:52.040 | and like it was therapeutic.

00:28:53.600 | - I didn't see those, those are good.

00:28:55.920 | I hear from Googlers all the time,

00:28:58.160 | especially 'cause we launched it internally first.

00:29:00.840 | And I think we launched it during the Q3

00:29:04.600 | sort of like check-in cycle.

00:29:06.480 | So all Googlers have to write notes about like,

00:29:08.920 | hey, what'd you do in Q3?

00:29:11.600 | And what Googlers were doing is they would write

00:29:14.280 | whatever they accomplished in Q3

00:29:16.360 | and then they would create an audio overview.

00:29:18.840 | And these people that I didn't know

00:29:20.200 | would just ping me and be like, wow,

00:29:22.080 | like, I feel really good like going into a meeting

00:29:24.320 | with my manager.

00:29:25.160 | And I was like, good, good, good, good.

00:29:27.200 | You really did that, right?

00:29:28.360 | (laughs)

00:29:29.200 | - I think another cool one is just like any Wikipedia article

00:29:33.000 | like you drop it in and it's just like suddenly

00:29:35.400 | like the best sort of summary overview.

00:29:38.320 | - I think that's what Karpathy did, right?

00:29:40.160 | Like he has now a Spotify channel

00:29:42.240 | called "Histories of Mysteries,"

00:29:44.720 | which is basically like he just took like interesting stuff

00:29:47.560 | from Wikipedia and made audio overviews out of it.

00:29:50.560 | - Yeah, he became a podcaster overnight.

00:29:52.360 | - Yeah, I'm here for it.

00:29:54.200 | I fully support him.

00:29:55.840 | I'm racking up the listens for him.

00:29:58.400 | - Honestly, it's useful even without the audio.

00:30:00.560 | You know, I feel like the audio does add an element to it,

00:30:03.240 | but I always want, you know, paired audio and text.

00:30:06.080 | And it's just amazing to see

00:30:07.560 | what people are organically discovering.

00:30:09.480 | I feel like it's because you laid the groundwork

00:30:11.680 | with NotebookLM and then you came in

00:30:13.680 | and added the sort of TTS portion

00:30:16.080 | and made it so good, so human, which is weird.

00:30:19.200 | Like it's this engineering process of humans.

00:30:21.080 | Oh, one thing I wanted to ask.

00:30:22.440 | Do you have evals?

00:30:23.480 | - Yeah. - Yes.

00:30:24.560 | - What?

00:30:25.400 | - Potatoes for chefs.

00:30:26.720 | (laughing)

00:30:27.560 | - What is that?

00:30:28.400 | What do you mean potatoes?

00:30:29.560 | - Oh, sorry, sorry.

00:30:30.400 | We were joking with this like a couple of weeks ago.

00:30:33.440 | We were doing like side-by-sides,

00:30:34.840 | but like Usama sent me the file

00:30:36.360 | and it was literally called "Potatoes for Chefs."

00:30:39.040 | And I was like, you know, my job is really serious,

00:30:42.080 | but like- - It's kind of funny.

00:30:43.400 | - You have to laugh a little bit.

00:30:45.000 | Like the title of the file was like "Potatoes for Chefs."

00:30:47.680 | - Was it like a training document for chefs?

00:30:50.360 | - It was just a side-by-side

00:30:52.360 | for like two different kind of audio transcripts.

00:30:54.920 | - The question is really like, as you iterate,

00:30:57.400 | the typical engineering advice

00:30:59.160 | is you establish some kind of tests or a benchmark.

00:31:02.920 | You're at like 30%.

00:31:03.920 | You want to get it up to 90, right?

00:31:05.280 | - Yeah.

00:31:06.120 | - What does that look like for making something sound human

00:31:08.520 | and interesting and voice?

00:31:11.040 | - We have the sort of formal eval process as well,

00:31:13.440 | but I think like for this particular project,

00:31:15.440 | we maybe took a slightly different route to begin with.

00:31:17.800 | Like there was a lot of just

00:31:19.160 | within the team listening sessions,

00:31:21.720 | a lot of like sort of like- - Dogfooding.

00:31:23.440 | - Yeah, like I think the bar that we tried to get to

00:31:27.640 | before even starting formal evals

00:31:30.200 | with raters and everything was much higher

00:31:32.680 | than I think other projects would.

00:31:34.240 | Like, 'cause that's, as you said,

00:31:35.360 | like the traditional advice, right?

00:31:36.360 | Like get that ASAP.

00:31:37.520 | Like, what are you looking to improve on?

00:31:40.040 | Whatever benchmark it is.

00:31:41.480 | So there was a lot of just like critical listening.

00:31:44.280 | And I think a lot of making sure

00:31:47.000 | that those improvements actually could go into the model

00:31:49.880 | and like we're happy with that human element of it.

00:31:53.040 | And then eventually we had to obviously distill those down

00:31:55.440 | into an eval set, but like still there's like,

00:31:57.520 | the team is just like a very, very like avid user

00:32:00.960 | of the product at all stages.

00:32:02.920 | - I think you just have to be really opinionated.

00:32:05.040 | I think that sometimes if you are,

00:32:07.760 | your intuition is just sharper

00:32:10.080 | and you can move a lot faster on the product

00:32:12.560 | because it's like, if you hold that bar high, right?

00:32:14.960 | Like if you think about like the iterative cycle,

00:32:17.240 | it's like, hey, we could take like six months

00:32:20.200 | to ship this thing, to get it to like mid where we were,

00:32:23.640 | or we could just like listen to this and be like,

00:32:25.040 | yeah, that's not it, right?

00:32:26.280 | And I don't need a rater to tell me that.

00:32:28.080 | That's my preference, right?

00:32:29.240 | And collectively, like if I have two other people

00:32:31.040 | listen to it, they'll probably agree.

00:32:33.320 | And it's just kind of this step of like,

00:32:35.040 | just keep improving it to the point where you're like,

00:32:37.200 | okay, now I think this is really impressive.

00:32:39.840 | And then like do evals, right?

00:32:42.240 | And then validate that.

00:32:43.280 | - Was the sound model done and frozen

00:32:45.560 | before you started doing all this?

00:32:46.720 | Or are you also saying,

00:32:48.640 | hey, we need to improve the sound model as well?

00:32:50.680 | - Both, yeah.

00:32:51.880 | We were making improvements on the audio

00:32:54.280 | and just like generating the transcript as well.

00:32:58.600 | I think another weird thing here was like,

00:33:00.880 | we need it to be entertaining

00:33:02.360 | and that's much harder to quantify

00:33:04.000 | than some of the other benchmarks that you can make

00:33:06.640 | for like, you know, Sweebench or get better at this math.

00:33:10.480 | - Do you just have people rate one to five

00:33:11.840 | or, you know, or just thumbs up and down?

00:33:14.000 | - For the formal rater evals,

00:33:15.440 | we have sort of like a Likert scale

00:33:17.080 | and like a bunch of different dimensions there.

00:33:19.000 | But we had to sort of break down

00:33:20.840 | that what makes it entertaining

00:33:22.640 | into like a bunch of different factors.

00:33:24.240 | But I think the team stage of that was more critical.

00:33:27.760 | It was like, we need to make sure

00:33:29.760 | that like what is making it fun and engaging.

00:33:32.000 | Like we dialed that as far as it goes.

00:33:34.160 | And while we're making other changes that are necessary,

00:33:36.640 | like obviously they shouldn't make stuff up

00:33:38.560 | or, you know, be insensitive. - Hallucinations.

00:33:40.920 | - Hallucinations. - Other safety things.

00:33:43.440 | - Right, like a bunch of safety stuff.

00:33:45.160 | - Yeah, exactly.

00:33:46.000 | So like with all of that,

00:33:47.520 | and like also just, you know,

00:33:48.840 | following sort of a coherent narrative

00:33:51.040 | and structure is really important.

00:33:52.880 | But like with all of this,

00:33:53.840 | we really had to make sure that that central tenet

00:33:57.080 | of being entertaining and engaging

00:33:59.560 | and something you actually want to listen to,

00:34:01.160 | it just doesn't go away,

00:34:02.040 | which takes like a lot of just active listening time

00:34:04.200 | 'cause you're closest to the prompts,

00:34:06.120 | the model and everything.

00:34:07.320 | - I think sometimes the difficulty is

00:34:09.400 | because we're dealing with non-deterministic models,

00:34:12.440 | sometimes you just got a bad roll of the dice

00:34:14.600 | and it's always on the distribution

00:34:15.960 | that you could get something bad.

00:34:17.600 | Basically, how many, do you like do 10 runs at a time?

00:34:20.840 | And then how do you get rid of the non-determinism?

00:34:23.400 | - Right, yeah.

00:34:24.320 | That's-- - Like bad luck.

00:34:25.640 | - Yeah, yeah, yeah.

00:34:26.480 | I mean, there still will be like bad audio overviews.

00:34:29.440 | There's like a bunch of them that happens.

00:34:31.880 | - Do you mean for like the raider emails?

00:34:33.360 | - For raiders, right?

00:34:34.400 | Like what if that one person

00:34:35.800 | just got like a really bad rating?

00:34:37.120 | You actually had a great prompt.

00:34:38.600 | You actually had a great model, great weights, whatever.

00:34:40.800 | And you just, you had a bad output.

00:34:42.520 | Like, and that's okay, right?

00:34:44.120 | - I actually think like the way that these are constructed,

00:34:48.160 | if you think about like the different types of controls

00:34:50.880 | that the user has, right?

00:34:51.720 | Like what can the user do today to affect it?

00:34:54.520 | - We push a button. - Just use your sources.

00:34:55.760 | You just push a button.

00:34:56.600 | - I have tried to prompt engineer by changing the title.

00:34:58.840 | - Yeah, yeah, yeah.

00:34:59.960 | - Changing the title, people have found out,

00:35:02.720 | the title of the notebook, people have found out

00:35:04.160 | you can add show notes, right?

00:35:05.560 | You can get them to think like the show has changed

00:35:07.960 | sort of fundamentally. - Someone changed the language

00:35:08.800 | of the output.

00:35:09.640 | - Changing the language of the output.

00:35:10.960 | Like those are less well-tested

00:35:13.240 | because we focused on like this one aspect.

00:35:16.160 | So it did change the way that we sort of think

00:35:18.720 | about quality as well, right?

00:35:20.240 | So it's like quality is on the dimensions of entertainment,

00:35:24.080 | of course, like consistency, groundedness.

00:35:26.920 | But in general, does it follow the structure

00:35:29.200 | of the deep dive?

00:35:30.720 | And I think when we talk about like non-determinism,

00:35:33.440 | it's like, well, as long as it follows like the structure

00:35:36.040 | of the deep dive, right?

00:35:37.120 | It sort of inherently meets all those other qualities.

00:35:39.960 | And so it makes it a little bit easier for us

00:35:42.520 | to ship something with confidence

00:35:44.440 | to the extent that it's like,

00:35:45.280 | I know it's gonna make a deep dive.

00:35:46.440 | It's gonna make a good deep dive.

00:35:47.560 | Whether or not the person likes it, I don't know.

00:35:49.800 | But as we expand to new formats, as we open up controls,

00:35:53.640 | I think that's where it gets really much harder,

00:35:55.840 | even with the show notes, right?

00:35:56.840 | Like people don't know what they're going to get

00:35:58.280 | when they do that.

00:35:59.320 | And we see that already where it's like,

00:36:00.880 | this is gonna be a lot harder to validate

00:36:03.480 | in terms of quality,

00:36:04.720 | where now we'll get a greater distribution.

00:36:06.320 | Whereas I don't think we really got like very distribution

00:36:09.400 | because of like that pre-process

00:36:10.760 | that Usama was talking about.

00:36:12.080 | And also because of the way that we'd constrain,

00:36:14.120 | like what were we measuring for?

00:36:16.000 | Literally just like, is it a deep dive?

00:36:18.160 | - And you determine what a deep dive is.

00:36:19.920 | - Yeah.

00:36:20.760 | - Everything needs a PM.

00:36:21.600 | I have, this is very similar

00:36:24.280 | to something I've been thinking about for AI products

00:36:25.840 | in general.

00:36:26.680 | There's always like a chief tastemaker.

00:36:28.120 | And for Notebook LM,

00:36:29.320 | it seems like it's a combination of you and Steven.

00:36:31.400 | - Well, okay.

00:36:32.240 | I want to take a step back.

00:36:33.440 | - And Usama.

00:36:34.280 | I mean, presumably for the voice stuff.

00:36:35.640 | - Usama's like the head chef, right?

00:36:38.320 | Of like deep dive, I think.

00:36:39.640 | - Potatoes.

00:36:40.480 | - Of potatoes.

00:36:41.640 | And I say this because I think even though

00:36:44.960 | we are already a very opinionated team

00:36:46.600 | and Steven, for sure, very opinionated,

00:36:48.600 | I think of the audio generations,

00:36:50.840 | like Usama was the most opinionated, right?

00:36:53.400 | And we all, we all like would say like,

00:36:55.240 | "Hey," I remember like one of the first ones he sent me,

00:36:57.280 | I was like, "Oh, I feel like

00:36:58.120 | "they should introduce themselves.

00:36:59.360 | "I feel like they should say a title."

00:37:01.080 | But then like, we would catch things like,

00:37:03.160 | maybe they shouldn't say their names.

00:37:04.640 | - Yeah, they don't say their names.

00:37:05.880 | - That was a Steven catch.

00:37:07.040 | - Yeah, yeah.

00:37:07.880 | - Like not give them names.

00:37:08.720 | - So stuff like that is just like,

00:37:10.480 | we all injected like a little bit of just like,

00:37:13.400 | "Hey, here's like my take on like how a podcast should be."

00:37:16.280 | Right, and I think like if you're a person

00:37:18.160 | who like regularly listens to podcasts,

00:37:19.880 | there's probably some collective preference there

00:37:23.320 | that's generic enough that you can standardize

00:37:24.960 | into like the deep dive format.

00:37:26.280 | But yeah, it's the new formats where I think like,

00:37:28.360 | "Oh, that's the next test."

00:37:29.760 | - Yeah, I've tried to make a clone by the way.

00:37:31.760 | Of course, everyone did.

00:37:32.720 | - Yeah.

00:37:33.560 | - Everyone in AI was like, "Oh no, this is so easy.

00:37:35.120 | "I'll just take a TTS model."

00:37:36.400 | Obviously our models are not as good as yours,

00:37:38.520 | but I tried to inject a consistent character backstory,

00:37:41.960 | like age, identity, where they went to work,

00:37:45.120 | where they went to school, what their hobbies are.

00:37:47.040 | Then it just, the models try to bring it in too much.

00:37:49.800 | I don't know if you tried this.

00:37:51.280 | So then I'm like, "Okay, like how do I define a personality

00:37:54.400 | "but it doesn't keep coming up every single time?"

00:37:57.840 | - Yeah, I mean, we have like a really, really good

00:38:00.520 | like character designer on our team.

00:38:02.320 | - What?

00:38:03.160 | Like a D&D person?

00:38:05.080 | - Just to say like we, just like we had to be opinionated

00:38:07.480 | about the format, we had to be opinionated

00:38:09.560 | about who are those two people talking.

00:38:11.680 | - Okay.

00:38:12.520 | - Right, and then to the extent that like

00:38:14.520 | you can design the format,

00:38:16.040 | you should be able to design the people as well.

00:38:18.560 | - Yeah, I would love like a, you know,

00:38:20.760 | like when you play Baldur's Gate,

00:38:22.040 | like you roll like 17 on charisma

00:38:24.920 | and like it's like what race they are, I don't know.

00:38:27.240 | - I recently, actually, I was just talking

00:38:28.640 | about character select screens.

00:38:29.960 | - Yeah.

00:38:30.800 | - I was like, I love that. - People spend hours on that.

00:38:31.640 | - I love that, right?

00:38:32.680 | And I was like, maybe there's something to be learned there

00:38:35.920 | because like people have fallen in love with the deep dive

00:38:39.600 | as a format, as a technology,

00:38:42.280 | but also as just like those two personas.

00:38:44.640 | Now, when you hear a deep dive and you've heard them,

00:38:46.520 | you're like, "I know those two," right?

00:38:48.800 | And people, it's so funny when I,

00:38:50.440 | when people are trying to find out their names,

00:38:51.960 | like it's a worthy task, it's a worthy goal.

00:38:55.400 | I know what you're doing.

00:38:56.520 | But the next step here is to sort of introduce like,

00:38:59.280 | is this like what people want?

00:39:00.560 | People want to sort of edit their personas

00:39:02.880 | or do they just want more of them?

00:39:04.240 | - I'm sure you're getting a lot of opinions

00:39:06.400 | and they all conflict with each other.

00:39:08.280 | Before we move on, I have to ask,

00:39:09.920 | because we're kind of on this topic,

00:39:11.800 | how do you make audio engaging?

00:39:13.600 | Because it's useful, not just for deep dive,

00:39:15.640 | but also for us as podcasters.

00:39:17.760 | What does engaging mean?

00:39:20.120 | If you could break it down for us, that'd be great.

00:39:22.160 | - I mean, I can try.

00:39:23.520 | Don't claim to be an expert at all.

00:39:26.000 | - So I'll give you some, like variation in tone and speed.

00:39:30.560 | You know, there's this sort of writing advice where,

00:39:32.880 | you know, this sentence is five words,

00:39:34.360 | this sentence is three, that kind of advice

00:39:36.160 | where you vary things, you have excitement,

00:39:38.240 | you have laughter, all that stuff.

00:39:39.920 | But I'd be curious how else you break down.

00:39:42.000 | - So there's the basics, like obviously structure

00:39:43.960 | that can't be meandering, right?

00:39:45.200 | Like there needs to be sort of an ultimate goal

00:39:48.360 | that the voices are trying to get to, human or artificial.

00:39:51.880 | I think one thing we find often

00:39:53.760 | is if there's just too much agreement between people,

00:39:58.000 | like that's not fun to listen to.

00:40:00.600 | So there needs to be some sort of tension and buildup,

00:40:04.000 | you know, withholding information, for example.

00:40:06.200 | Like as you listen to a story unfold,

00:40:09.240 | like you're gonna learn more and more about it.

00:40:11.240 | And audio that maybe becomes even more important

00:40:13.880 | because like you actually don't have the ability

00:40:16.680 | to just like skim to the end of something

00:40:18.480 | when you're driving or something,

00:40:19.600 | like you're gonna be hooked.

00:40:21.160 | 'Cause like there's, and that's how like,

00:40:22.680 | that's how a lot of podcasts work.

00:40:24.760 | Like maybe not interviews necessarily,

00:40:26.280 | but a lot of true crime,

00:40:28.120 | a lot of entertainment in general.

00:40:30.640 | There's just like a gradual unrolling of information.

00:40:33.880 | And that also like sort of goes back

00:40:35.360 | to the content transformation aspect of it.

00:40:37.120 | Like maybe you are going from,

00:40:39.160 | let's say the Wikipedia article of like,

00:40:41.760 | one of the history of mysteries, maybe episodes,

00:40:44.200 | like the Wikipedia article is gonna state out

00:40:46.040 | the information very differently.

00:40:47.440 | It's like, here's what happened,

00:40:48.560 | would probably be in the very first paragraph.

00:40:52.080 | And one approach we could have done is like,

00:40:53.440 | maybe a person's just narrating that thing.

00:40:56.320 | And maybe that would work for like a certain audience.

00:40:59.440 | Or I guess that's how I would picture

00:41:01.160 | like a standard history lesson to unfold.

00:41:04.040 | But like, because we're trying to put it

00:41:05.600 | in this two-person dialogue format,

00:41:08.080 | like we inject like the fact that, you know,

00:41:10.680 | there's, you don't give everything at first.

00:41:13.480 | And then you set up like differing opinions

00:41:16.080 | of the same topic or the same,

00:41:17.880 | like maybe you seize on a topic and go deeper into it

00:41:21.080 | and then try to bring yourself back out of it

00:41:23.280 | and go back to the main narrative.

00:41:25.840 | So that's mostly from like the setting up

00:41:27.960 | the script perspective.

00:41:29.880 | And then the audio, I was saying earlier,

00:41:32.280 | it's trying to be as close to just human speech as possible,

00:41:37.280 | I think was what we found success with so far.

00:41:40.240 | - Yeah.

00:41:41.080 | Like with interjections, right?

00:41:41.920 | Like, I think like when you listen to two people talk,

00:41:43.840 | there's a lot of like, yeah, yeah, right.

00:41:45.960 | And then there's like a lot of like that questioning,

00:41:47.760 | like, oh yeah, really?

00:41:49.720 | What did you think?

00:41:50.560 | - I noticed that, that's great.

00:41:52.840 | - Totally.

00:41:53.680 | - Like, so my question is,

00:41:56.760 | do you pull in speech experts to do this

00:41:59.600 | or did you just come up with it yourselves?

00:42:01.920 | You can be like, okay, talk to a whole bunch

00:42:03.960 | of fiction writers to make things engaging

00:42:06.160 | or comedy writers or whatever, stand up comedy, right?

00:42:08.520 | They have to make audio engaging.

00:42:10.360 | But audio as well, like there's professional fields

00:42:12.560 | of studying where people do this for a living,

00:42:15.600 | but us as AI engineers are just making this up as we go.

00:42:19.800 | - I mean, it's a great idea, but you definitely didn't.

00:42:22.400 | - Yeah.

00:42:23.240 | - No, I'm just like, oh.

00:42:24.360 | - My guess is you didn't.

00:42:25.640 | - Yeah.

00:42:26.480 | - There's a certain appeal to authority that people have.

00:42:28.280 | They're like, oh, like you can't do this

00:42:29.600 | 'cause you don't have any experience

00:42:31.000 | like making engaging audio,

00:42:33.280 | but that's what you literally did.

00:42:35.440 | - Right, I mean, I was literally chatting

00:42:37.160 | with someone at Google earlier today

00:42:39.840 | about how some people think that like,

00:42:41.760 | you need a linguistics person in the room

00:42:43.920 | for like making a good chatbot,

00:42:45.720 | but that's not actually true.

00:42:46.680 | 'Cause like this person went to school for linguistics

00:42:49.400 | and according to him, he's an engineer now,

00:42:51.320 | according to him, like most of his classmates

00:42:53.360 | were not actually good at language.

00:42:55.320 | Like they knew how to analyze language

00:42:58.280 | and like sort of the mathematical patterns

00:43:00.000 | and rhythms and language,

00:43:01.560 | but that doesn't necessarily mean

00:43:02.840 | they were gonna be eloquent at like,

00:43:05.400 | while speaking or writing.

00:43:06.800 | So I think, yeah, a lot of we haven't invested

00:43:09.760 | in specialists in the audio format yet,

00:43:12.720 | but maybe that would.

00:43:13.760 | - I think it's like super interesting

00:43:15.040 | because I think there's like a very human question

00:43:17.800 | of like what makes something interesting.

00:43:19.880 | And there's like a very deep question of like,

00:43:23.680 | what is it, right?

00:43:25.080 | Like, what is the quality that we are all looking for?

00:43:27.760 | Is it, does somebody have to be funny?

00:43:29.400 | Does something have to be entertaining?

00:43:30.720 | Does something have to be straight to the point?

00:43:32.800 | And I think when you try to distill that,

00:43:35.040 | this is the interesting thing I think

00:43:36.440 | about our experiment, about this particular launch is,

00:43:38.840 | first, we only launched one format.

00:43:41.000 | And so we sort of had to squeeze everything we believed

00:43:44.480 | about what an interesting thing is into one package.

00:43:48.080 | And as a result of it, I think we learned,

00:43:49.480 | it's like, hey, interacting with a chatbot

00:43:52.000 | is sort of novel at first, but it's not interesting, right?

00:43:55.520 | It's like humans are what makes interacting

00:43:58.440 | with chatbots interesting.

00:43:59.840 | It's like, ha, ha, ha, I'm gonna try to trick it.

00:44:01.880 | It's like, that's interesting, spell strawberry, right?

00:44:04.640 | This is like the fun that like people have with it.

00:44:06.960 | But like, that's not the LLM being interesting, that's you,

00:44:09.560 | just like kind of giving it your own flavor.

00:44:11.680 | But it's like, what does it mean to sort of flip it

00:44:14.160 | on its head and say, no, you be interesting now, right?

00:44:17.240 | Like you give the chatbot the opportunity to do it.

00:44:20.240 | And this is not a chatbot per se, it is like just the audio.

00:44:24.520 | And it's like the texture, I think,

00:44:26.560 | that really brings it to life.

00:44:28.600 | And it's like the things that we've described here,

00:44:30.440 | which was like, okay, now I have to like lead you

00:44:32.760 | down a path of information about like

00:44:34.880 | this commercialization deck.

00:44:36.560 | It's like, how do you do that?

00:44:38.640 | To be able to successfully do it,

00:44:40.560 | I do think that you need experts.

00:44:42.640 | I think we'll engage with experts like down the road,

00:44:45.360 | but I think it will have to be in the context of,

00:44:48.200 | well, what's the next thing we're building, right?

00:44:50.360 | It's like, what am I trying to change here?

00:44:52.240 | What do I fundamentally believe needs to be improved?

00:44:55.200 | And I think there's still like a lot more studying

00:44:57.640 | that we have to do in terms of like,

00:44:59.040 | well, what are people actually using this for?

00:45:01.320 | And we're just in such early days.

00:45:03.000 | Like it hasn't even been a month.

00:45:04.680 | - Two, three weeks, three weeks, I think.

00:45:06.880 | - Yeah.

00:45:07.720 | - I think the other, one other element to that is the,

00:45:10.280 | like the fact that you're bringing your own sources to it.

00:45:13.200 | Like it's your stuff.

00:45:14.520 | Like, you know this somewhat well,

00:45:16.600 | or you care to know about this.

00:45:18.040 | So like that, I think changed the equation

00:45:20.200 | on its head as well.

00:45:21.280 | It's like your sources and someone's telling you about it.

00:45:24.480 | So like you care about how that dynamic is,

00:45:27.200 | but you just care for it to be good enough

00:45:29.120 | to be entertaining.

00:45:30.200 | 'Cause ultimately they're talking about

00:45:31.480 | your mortgage deed or whatever.

00:45:33.760 | - So it's interesting just from the topic itself,

00:45:36.640 | even taking out all the agreements

00:45:38.560 | and the hiding of the slow reveal.

00:45:40.520 | - I mean, there's a baseline maybe,

00:45:42.000 | like if it was like too drab,

00:45:43.440 | like if it was someone who was reading it off,

00:45:44.840 | like, you know, that's like the absolute worst, but like.

00:45:47.680 | - Do you prompt for humor?

00:45:49.680 | That's a tough one, right?

00:45:51.880 | - I think it's more of a generic way

00:45:55.720 | to bring humor out if possible.

00:45:57.520 | I think humor is actually one of the hardest things.

00:45:59.640 | - Yeah.

00:46:00.480 | - But I don't know if you saw.

00:46:01.320 | - That is AGI, humor is AGI.

00:46:02.160 | - Yeah, but did you see the chicken one?

00:46:03.880 | - No.

00:46:04.720 | - Okay, if you haven't heard it.

00:46:05.920 | - We'll splice it in here.

00:46:06.800 | - Okay, yeah, yeah.

00:46:07.640 | There is a video on threads.

00:46:10.040 | I think it was by Martino Wong.

00:46:12.880 | And it's a PDF.

00:46:16.360 | - Welcome to your deep dive for today.

00:46:18.400 | - Oh yeah, get ready for a fun one.

00:46:20.160 | - Buckle up because we are diving into

00:46:24.120 | chicken, chicken, chicken, chicken, chicken.

00:46:28.640 | - You got that right.

00:46:29.760 | - By Doug Zonker.

00:46:31.080 | - Now.

00:46:31.920 | - And yes, you heard that title correctly.

00:46:33.720 | - Titles.

00:46:34.560 | - Our listener today submitted this paper.

00:46:36.480 | - Yeah, they're gonna need our help.

00:46:37.840 | - And I can totally see why.

00:46:39.320 | - Absolutely.

00:46:40.160 | - It's dense, it's baffling.

00:46:41.760 | - It's a lot.

00:46:42.600 | - And it's packed with more chicken than a KFC buffet.

00:46:46.560 | - Wait, that's hilarious, that's so funny.

00:46:49.880 | So it's like stuff like that,

00:46:51.200 | that's like truly delightful, truly surprising,

00:46:53.600 | but it's like, we didn't tell it to be funny.

00:46:55.240 | - Humor's contextual also, like super contextual

00:46:57.880 | what we're realizing.

00:46:58.720 | So we're not prompting for humor,

00:47:00.200 | but we're prompting for maybe a lot of other things

00:47:02.440 | that are bringing out that humor.

00:47:03.920 | - I think the thing about ad generated content,

00:47:06.320 | if we look at YouTube, like we do videos on YouTube

00:47:09.040 | and it's like, you know, a lot of people are screaming

00:47:11.200 | in the thumbnails to get clicks.

00:47:12.640 | There's like everybody, there's kind of like a meta

00:47:15.720 | of like what you need to do to get clicks.

00:47:18.120 | But I think in your product,

00:47:19.640 | there's no actual creator on the other side

00:47:22.480 | investing the time.

00:47:23.320 | So you can actually generate a type of content

00:47:25.400 | that is maybe not universally appealing,

00:47:28.280 | you know, at a much. - It's personal.

00:47:29.800 | - Yeah, exactly.

00:47:30.800 | I think that's the most interesting thing.

00:47:32.200 | It's like, well, is there a way for like,

00:47:35.280 | take Mr. Beast, right?

00:47:36.800 | It's like Mr. Beast optimizes videos

00:47:39.560 | to reach the biggest audience and like the most clicks.

00:47:42.320 | But what if every video could be kind of like regenerated

00:47:45.280 | to be closer to your taste, you know, when you watch it?

00:47:48.640 | - I think that's kind of the promise of AI

00:47:50.840 | that I think we are just like touching on,

00:47:53.240 | which is I think every time I've gotten information

00:47:56.280 | from somebody, they have delivered it to me

00:47:57.840 | in their preferred method, right?

00:47:59.440 | Like if somebody gives me a PDF, it's a PDF.

00:48:01.600 | Somebody gives me a hundred slide deck,

00:48:03.280 | that is the format in which I'm going to read it.

00:48:05.280 | But I think we are now living in the era

00:48:07.280 | where transformations are really possible,

00:48:09.280 | which is look, like I don't want to read

00:48:11.400 | your hundred slide deck,

00:48:12.280 | but I'll listen to a 16 minute audio overview

00:48:14.320 | on the drive home.

00:48:15.400 | - Yeah.

00:48:16.240 | - And that I think is really novel.

00:48:18.920 | And that is paving the way in a way

00:48:21.560 | that like maybe we wanted, but didn't expect.

00:48:25.120 | Where I also think you're listening to a lot of content

00:48:28.480 | that normally wouldn't have had content made about it.

00:48:31.360 | Like I watched this TikTok

00:48:32.880 | where this woman uploaded her diary from 2004.

00:48:36.240 | For sure, right?

00:48:37.080 | Like nobody was going to make a podcast about a diary.

00:48:39.280 | Like hopefully not, like it seems kind of embarrassing.

00:48:41.560 | - It's kind of creepy.

00:48:42.400 | - Yeah, it's kind of creepy.

00:48:43.520 | But she was doing this like live listen of like,

00:48:45.600 | "Oh, like here's a podcast about my diary."

00:48:48.120 | And it's like, it's entertaining right now

00:48:50.520 | to sort of all listen to it together.

00:48:52.520 | But like the connection is personal.

00:48:54.040 | It was like, it was her interacting

00:48:55.760 | with like her information in a totally different way.

00:48:58.120 | And I think that's where like,

00:48:59.440 | oh, that's a super interesting space, right?

00:49:01.080 | Where it's like, I'm creating content for myself

00:49:03.760 | in a way that suits the way that I want to consume it.

00:49:06.520 | - Or people compare like retirement plan options.

00:49:09.360 | Like no one's going to give you that content

00:49:11.520 | like for your personal financial situation.

00:49:14.880 | And like, even when we started out the experiment,

00:49:16.640 | like a lot of the goal was to go for really obscure content

00:49:21.640 | and see how well we could transform that.

00:49:23.720 | So like, if you look at the Mountain View,

00:49:25.520 | like city council meeting notes,

00:49:27.800 | like you're never going to read it.

00:49:29.200 | But like, if it was a three minute summary,

00:49:31.160 | like that would be interesting.

00:49:32.720 | - I see.

00:49:33.560 | You have one system, one prompt

00:49:35.360 | that just covers everything you threw at it.

00:49:37.720 | - Maybe.

00:49:38.800 | - No, I'm just kidding.

00:49:41.000 | It's really interesting.

00:49:41.880 | You know, I'm trying to figure out

00:49:43.840 | what you nailed compared to others.

00:49:46.440 | And I think that the way that you treat your,

00:49:48.760 | the AI is like a little bit different

00:49:50.400 | than a lot of the builders I talked to.

00:49:52.200 | So I don't know what it is you said.

00:49:54.120 | I wish I had a transcript right in front of me,

00:49:55.640 | but it's something like,

00:49:56.600 | people treat AI as like a tool for thought,

00:49:58.160 | but usually it's kind of doing their bidding.

00:50:00.800 | And you know, what you're really doing

00:50:02.600 | is loading up these like two virtual agents.

00:50:06.080 | I don't, you've never said the word agents,

00:50:07.880 | I put that in your mouth,

00:50:08.800 | but two virtual humans or AIs

00:50:11.000 | and letting them form their own opinion

00:50:13.200 | and letting them kind of just live

00:50:15.000 | and embody it a little bit.

00:50:16.560 | Is that accurate?

00:50:17.800 | - I think that that is as close to accurate as possible.

00:50:21.560 | I mean, in general, I try to be careful about saying like,

00:50:23.800 | oh, you know, letting, you know, yeah,

00:50:26.040 | like these personas live.

00:50:27.560 | But I think to your earlier question of like,

00:50:29.920 | what makes it interesting?

00:50:30.920 | That's what it takes to make it interesting.

00:50:32.360 | - Yeah.

00:50:33.200 | - Right, and I think to do it well

00:50:34.360 | is like a worthy challenge.

00:50:35.560 | I also think that it's interesting

00:50:37.840 | because they're interested, right?

00:50:39.320 | Like, is it interesting to compare-

00:50:41.200 | - The O'Carnegie thing.

00:50:42.200 | - Yeah, is it interesting to have two retirement plans?

00:50:46.320 | No, but to listen to these two talk about it,

00:50:50.120 | oh my gosh, you'd think it was like

00:50:51.320 | the best thing ever invented, right?

00:50:52.800 | It's like, get this, deep dive into 401k

00:50:57.800 | through Chase versus, you know, whatever.

00:51:00.480 | - They do do a lot of get this, which is funny.

00:51:02.920 | - I know, I know, I dream about it.

00:51:04.760 | I'm sorry.

00:51:06.920 | - There's a, I have a few more questions

00:51:10.520 | on just like the engineering around this.

00:51:13.600 | And obviously some of this is just me

00:51:15.520 | creatively asking how this works.

00:51:17.240 | How do you make decisions between

00:51:18.600 | when to trust the AI overlord to decide for you?

00:51:22.960 | In other words, stick it, let's say products as it is today,

00:51:26.640 | you want to improve it in some way.

00:51:28.920 | Do you engineer it into the system?

00:51:30.960 | Like write code to make sure it happens

00:51:34.080 | or you just stick it in a prompt

00:51:35.160 | and hope that the LM does it for you?

00:51:38.160 | Do you know what I mean?

00:51:39.000 | - Do you mean specifically about audio

00:51:40.400 | or sort of in general?

00:51:41.440 | - In general, like designing AI products,

00:51:44.120 | I think this is like the one thing

00:51:45.440 | that people are struggling with.

00:51:48.000 | And there's compound AI people

00:51:50.040 | and then there's big AI people.

00:51:51.320 | So compound AI people will be like Databricks,

00:51:53.320 | have lots of little models, chain them together

00:51:55.720 | to make an output.

00:51:56.720 | It's deterministic, you control every single piece

00:51:59.000 | and you produce what you produce.

00:52:01.160 | The open AI people, totally the opposite,

00:52:03.200 | like write one giant prompts

00:52:04.480 | and let the model figure it out.

00:52:06.040 | And obviously the answer for most people

00:52:07.880 | is going to be a spectrum in between those two,

00:52:09.680 | like big model, small model.

00:52:10.840 | When do you decide that?

00:52:11.840 | - I think it depends on the task.

00:52:13.560 | It also depends on, well, it depends on the task,

00:52:16.120 | but ultimately depends on what is your desired outcome?

00:52:19.200 | Like what am I engineering for here?

00:52:21.600 | And I think there's like several potential outputs

00:52:23.640 | and there's sort of like general categories.

00:52:24.880 | Am I trying to delight somebody?

00:52:26.080 | Am I trying to just like meet

00:52:27.200 | whatever the person is trying to do?

00:52:29.000 | Am I trying to sort of simplify a workflow?

00:52:31.120 | At what layer am I implementing this?

00:52:32.920 | Am I trying to implement this as part of the stack

00:52:35.680 | to reduce like friction,

00:52:37.840 | particularly for like engineers or something?

00:52:39.760 | Or am I trying to engineer it

00:52:40.840 | so that I deliver like a super high quality thing?

00:52:44.080 | I think that the question of like, which of those two,

00:52:47.120 | I think you're right, it is a spectrum.

00:52:49.160 | But I think fundamentally it comes down to like,

00:52:52.320 | it's a craft, like it's still a craft

00:52:54.680 | as much as it is a science.

00:52:56.360 | And I think the reality is like,

00:52:58.480 | you have to have a really strong POV

00:53:00.200 | about like what you want to get out of it

00:53:02.240 | and to be able to make that decision.

00:53:04.080 | Because I think if you don't have that strong POV,

00:53:06.240 | like you're going to get lost in sort of the detail

00:53:07.960 | of like capability.

00:53:09.440 | And capability is sort of the last thing that matters

00:53:12.360 | because it's like models will catch up, right?

00:53:14.360 | Like models will be able to do, you know,

00:53:16.280 | whatever in the next five years, it's going to be insane.

00:53:18.880 | So I think this is like a race to like value.

00:53:21.600 | And it's like really having a strong opinion about like,

00:53:24.480 | what does that look like today?

00:53:25.720 | And how far are you going to be able to push it?

00:53:28.080 | Sorry, I think maybe that was like very like philosophical.

00:53:31.120 | - It's fine, we get there.

00:53:32.520 | And I think that hits a lot of the points it's going to make.

00:53:35.320 | I tweeted today, or I ex-posted, whatever,

00:53:38.840 | that we're going to interview you

00:53:41.040 | on what we should ask you.

00:53:42.120 | So we got a list of feature requests, mostly.

00:53:45.400 | It's funny, nobody actually had any like specific questions

00:53:48.840 | about how the product was built.

00:53:50.000 | They just want to know when you're releasing some feature.

00:53:52.280 | So I know you cannot talk about all of these things,

00:53:54.760 | but I think maybe it will give people an idea

00:53:56.720 | of like where the product is going.

00:53:58.120 | So I think the most common question,

00:53:59.960 | I think five people asked is like,

00:54:01.560 | are you going to build an API?

00:54:03.280 | And, you know, do you see this product

00:54:05.320 | as still be kind of like a full head product,

00:54:07.840 | where I can log in and do everything there?

00:54:09.920 | Or do you want it to be a piece of infrastructure

00:54:12.000 | that people build on?

00:54:13.120 | - I mean, I think, why not both?

00:54:15.920 | I think we work at a place where you could have both.

00:54:18.840 | I think that end user products,

00:54:21.440 | like products that touch the hands of users,

00:54:23.320 | have a lot of value.

00:54:24.640 | For me personally, like we learn a lot

00:54:26.200 | about what people are trying to do

00:54:27.400 | and what's like actually useful

00:54:29.040 | and what people are ready for.

00:54:30.920 | And so we're going to keep investing in that.

00:54:33.640 | I think at the same time, right,

00:54:35.520 | there are a lot of developers that are interested

00:54:37.840 | in using the same technology to build their own thing.

00:54:40.120 | We're going to look into that.

00:54:41.720 | How soon that's going to be ready, I can't really comment,

00:54:44.080 | but these are the things that like, hey, we heard it.

00:54:47.120 | We're trying to figure it out.

00:54:48.440 | And I think there's room for both.

00:54:50.360 | - Is there a world in which this becomes

00:54:52.240 | a default Gemini interface

00:54:53.520 | because it's technically different org?

00:54:55.280 | - It's such a good question.

00:54:56.480 | And I think every time someone asks me, it's like,

00:54:58.600 | hey, I just leaned over Gilliam.

00:55:00.640 | (laughing)

00:55:02.200 | We'll ask the Gemini folks what they think.

00:55:05.200 | - Multilingual support.

00:55:06.640 | I know people kind of hack this a little bit together.

00:55:09.040 | Any ideas for full support,

00:55:10.840 | but also I'm mostly interested in dialects.

00:55:13.440 | In Italy, we have Italian obviously,

00:55:15.720 | but we have a lot of local dialects.

00:55:17.440 | Like if you go to Rome, people don't really speak Italian,

00:55:19.600 | they speak local dialect.

00:55:21.240 | Do you think there's a path to which these models,

00:55:24.240 | especially the speech can learn very like niche dialects,

00:55:28.440 | like how much data do you need?

00:55:29.960 | Can people contribute?

00:55:31.120 | Like, I'm curious if you see this as a possibility.

00:55:35.560 | - So I guess high level,

00:55:36.800 | like we're definitely working on adding more languages.

00:55:39.640 | That's like top priority.

00:55:41.240 | We're going to start small,

00:55:42.560 | but like theoretically we should be able to cover

00:55:44.600 | like most languages pretty soon.

00:55:46.840 | - What a ridiculous statement by the way, that's crazy.

00:55:49.360 | - Unlike the soon or the pretty soon part.

00:55:52.480 | - No, but like, you know, a few years ago,

00:55:54.680 | like a small team of like, I don't know, 10 people saying

00:55:57.120 | that we will support the top 100, 200 languages

00:55:59.640 | is like absurd, but you can do it.

00:56:02.400 | You can do it.

00:56:03.240 | - And I think like the speech team, you know,

00:56:06.320 | we are a small team,

00:56:07.720 | but the speech team is another team and the modeling team,

00:56:11.080 | like these folks are just like absolutely brilliant

00:56:13.920 | at what they do.

00:56:14.760 | And I think like when we've talked to them and we've said,

00:56:16.760 | hey, you know, how about more languages?

00:56:18.600 | How about more voices?

00:56:19.560 | How about dialects, right?

00:56:20.640 | This is something that like they are game to do.

00:56:23.600 | And like, that's the roadmap for them.

00:56:25.840 | The speech team supports like a bunch of other efforts

00:56:27.920 | across Google, like Gemini Live, for example,

00:56:30.120 | is also the models built by the same,

00:56:32.040 | like sort of deep mind speech team.

00:56:34.160 | But yeah, the thing about dialects is really interesting.

00:56:36.320 | 'Cause like in some of our sort of earliest testing

00:56:39.560 | with trying out other languages,

00:56:40.800 | we actually noticed that sometimes it wouldn't stick

00:56:44.200 | to a certain dialect, especially for like,

00:56:46.720 | I think for French, we noticed that like

00:56:48.440 | when we presented it to like a native speaker,

00:56:50.280 | it would sometimes go from like a Canadian person

00:56:52.440 | speaking French versus like a French person speaking French

00:56:54.880 | or an American person speaking French,

00:56:56.280 | which is not what we wanted.

00:56:58.560 | So there's a lot more sort of speech quality work

00:57:01.360 | that we need to do there to make sure that it works reliably

00:57:04.240 | and at least sort of like the standard dialect that we want.

00:57:07.680 | But that does show that there's potential

00:57:09.360 | to sort of do the thing that you're talking about

00:57:11.240 | of like fixing a dialect that you want,

00:57:13.720 | maybe contribute your own voice

00:57:15.680 | or like you pick from one of the options.

00:57:17.920 | There's a lot more headroom there.

00:57:19.760 | - Yeah, because we have movies.

00:57:21.280 | Like we have old Roman movies

00:57:23.120 | that are like different languages,

00:57:25.040 | but there's not that many, you know?

00:57:26.920 | So I'm always like, well,

00:57:28.320 | I'm sure like the Italian is so strong in the model

00:57:31.480 | that like when you're trying to like pull that away from it,

00:57:33.720 | like you kind of need a lot, but-

00:57:35.440 | - Right, that's all sort of like

00:57:36.960 | wonderful deep mind speech team.

00:57:38.640 | - Yeah. - Yeah, yeah, yeah.

00:57:39.880 | - Well, anyway, if you need Italian, he's got you.

00:57:41.520 | - Yeah, yeah, yeah.

00:57:42.360 | - I got him on, I got him on.

00:57:44.200 | - Specifically, it's English, I got you.

00:57:46.200 | The managing system prompt, people want a lot of that.

00:57:49.400 | I assume yes-ish.

00:57:51.200 | Definitely looking into it for just core notebook LM.

00:57:55.600 | Like everybody's wanted that forever.

00:57:57.360 | So we're working on that.

00:57:58.560 | I think for the audio itself,

00:58:01.080 | we are trying to figure out the best way to do it.

00:58:03.760 | So we'll launch something sooner rather than later.

00:58:06.840 | So we'll probably stage it.

00:58:08.280 | And I think like, you know, just to be fully transparent,

00:58:10.840 | we'll probably launch something

00:58:12.320 | that's more of a fast follow

00:58:13.480 | than like a fully baked feature first.

00:58:15.120 | Just because like I see so many people

00:58:16.720 | put in like the fake show notes,

00:58:18.480 | it's like, hey, I'll help you out.

00:58:19.720 | We'll just put a text fax or something, yeah.

00:58:21.560 | - I think a lot of people are like, this is almost perfect,

00:58:23.800 | but like, I just need that extra 10, 20%.

00:58:25.960 | - Yeah.

00:58:26.800 | - I noticed that you say no a lot, I think,

00:58:29.160 | or you try to ship one thing.

00:58:30.840 | - Yeah.

00:58:31.680 | - And that is different about you

00:58:33.200 | than maybe other PMs or other eng teams

00:58:35.840 | that try to ship, they're like, oh, here are all the knobs.

00:58:38.080 | I'm just, take all my knobs.

00:58:39.800 | - Yeah, yeah.

00:58:40.640 | - Top P, top K, it doesn't matter.

00:58:42.160 | I'll just put it in the docs and you figure it out, right?

00:58:44.120 | - That's right, that's right.

00:58:45.200 | - Whereas for you, it's you actually just,

00:58:47.600 | you make one product.

00:58:49.280 | - Yeah.

00:58:50.120 | - As opposed to like 10 you could possibly have done.

00:58:51.640 | - Yeah, yeah.

00:58:52.480 | - I don't know, it's interesting.

00:58:53.320 | - I think about this a lot.

00:58:54.160 | I think it requires a lot of discipline

00:58:55.440 | because I thought about the knobs.

00:58:57.760 | I was like, oh, I saw on Twitter, you know, on X,

00:59:01.000 | people want the knobs, like, great.

00:59:02.600 | Started mocking it up, making the text boxes,

00:59:05.040 | designing like the little fiddles, right?

00:59:06.960 | And then I looked at it and I was kind of sad.

00:59:08.720 | I was like, oh, right, it's like, oh, it's like,

00:59:11.040 | this is not cool, this is not fun, this is not magical.

00:59:14.080 | It is sort of exactly what you would expect knobs to be.

00:59:19.080 | But then, you know, it's like, oh, I mean,

00:59:21.520 | how much can you, you know, design a knob?

00:59:24.520 | I thought about it, I was like,

00:59:26.120 | but the thing that people really liked

00:59:27.960 | was that there wasn't any.

00:59:29.600 | They just pushed a button.

00:59:30.440 | - One button.

00:59:31.280 | - And it was cool.

00:59:32.240 | And so I was like, how do we bring more of that, right?

00:59:34.920 | That still gives the user the optionality that they want.

00:59:37.760 | And so this is where, like,

00:59:38.680 | you have to have a strong POV, I think.

00:59:40.520 | You have to like really boil down,

00:59:42.040 | what did I learn in like the month

00:59:43.920 | since I've launched this thing that people really want?

00:59:47.120 | And I can give it to them while preserving like that,

00:59:49.800 | that delightful sort of fun experience.

00:59:52.400 | And I think that's actually really hard.

00:59:54.120 | Like, I'm not gonna come up with that by myself.

00:59:55.800 | I'm like, that's something

00:59:56.640 | that like our team thinks about every day.

00:59:58.200 | We all have different ideas.

00:59:59.760 | We're all experimenting with sort of how to get the most

01:00:03.200 | out of like the insight and also ship it quick.

01:00:05.720 | So we'll see, we'll find out soon

01:00:07.760 | if people like it or not.

01:00:08.600 | - I think the other interesting thing

01:00:09.800 | about like AI development now

01:00:12.000 | is that the knobs are not necessarily,

01:00:15.080 | like going back to all the sort of like craft

01:00:18.240 | and like human taste and all of that

01:00:20.560 | that went into building it.

01:00:21.880 | Like the knobs are not as easy to add as simply like,

01:00:26.400 | I'm gonna add a parameter to this

01:00:28.560 | and it's gonna make it happen.

01:00:29.880 | It's like, you kind of have to redo

01:00:31.880 | the quality process for everything.

01:00:34.200 | But the prioritization is also different though.

01:00:36.800 | - It goes back to sort of like,

01:00:38.040 | it's a lot easier to do an eval

01:00:39.480 | for like the deep dive format than if like,

01:00:41.720 | okay, now I'm gonna let you inject

01:00:43.600 | like these random things, right?

01:00:45.160 | Okay, how am I gonna measure quality?

01:00:46.600 | Either I say, well, I don't care

01:00:48.520 | because like you just input whatever.

01:00:50.880 | Or I say, actually wait, right?

01:00:53.000 | Like I wanna help you get the best output ever.

01:00:54.920 | What's it going to take?

01:00:55.960 | - The knob actually needs to work reliably.

01:00:58.160 | - Yeah. - Yeah.

01:00:59.000 | Very important point.

01:01:00.000 | - Two more things we definitely wanna talk about.

01:01:02.160 | I guess now people equivalent notebook LM

01:01:05.040 | to like a podcast generator,

01:01:06.520 | but I guess, you know,

01:01:08.040 | there's a whole product suite there.

01:01:10.320 | How should people think about that?

01:01:11.800 | Like, is this, and also like the future of the product

01:01:14.720 | as far as monetization too, you know?

01:01:16.840 | Like, is it gonna be,

01:01:18.560 | the voice thing gonna be a core to it?

01:01:20.160 | Is it just gonna be one output modality

01:01:22.160 | and like you're still looking to build like a broader

01:01:24.400 | kind of like a interface with data and documents platform?

01:01:27.840 | - I mean, that's such a good question

01:01:29.960 | that I think the answer it's,

01:01:32.560 | I'm waiting to get more data.

01:01:34.640 | I think because we are still in the period

01:01:36.600 | where everyone's really excited about it.

01:01:38.600 | Everyone's trying it.

01:01:40.080 | I think I'm getting a lot of sort of like positive feedback

01:01:42.760 | on the audio.

01:01:43.960 | We have some early signal that says it's a really good hook,

01:01:47.240 | but people stay for the other features.

01:01:49.040 | So that's really good too.

01:01:50.360 | I was making a joke yesterday.

01:01:51.400 | I was like, it'd be really nice, you know,

01:01:53.560 | if it was just the audio,

01:01:55.720 | 'cause then I could just like simplify the train, right?

01:01:58.360 | I don't have to think about all this other functionality.

01:02:00.800 | But I think the reality is that the framework

01:02:03.960 | kind of like what we were talking about earlier

01:02:05.440 | that we had laid out,

01:02:06.280 | which is like, you bring your own sources,

01:02:07.880 | there's something you do in the middle,

01:02:09.240 | and then there's an output is that really extensible one.

01:02:12.120 | And it's a really interesting one.

01:02:13.320 | And I think like, particularly when we think about

01:02:16.080 | what a big business looks like,

01:02:17.960 | especially when we think about commercialization,

01:02:20.280 | audio is just one such modality.

01:02:23.520 | But the editor itself,

01:02:25.080 | like the space in which you're able to do these things

01:02:27.800 | is like, that's the business, right?

01:02:29.480 | Like maybe the audio by itself, not so much,

01:02:32.160 | but like in this big package,

01:02:33.640 | like, oh, I could see that.

01:02:34.480 | I could see that being like a really big business.

01:02:37.240 | - Yep.

01:02:38.080 | Any thoughts on some of the alternative

01:02:40.160 | interact with data and documents thing,

01:02:42.080 | like cloud artifacts, like a JGBD canvas,

01:02:45.640 | you know, kind of how do you see,

01:02:47.280 | maybe where notebook LM stars,

01:02:48.760 | but like Gemini starts,

01:02:50.520 | like you have so many amazing teams and products at Google

01:02:53.280 | that sometimes like, I'm sure you have to figure that out.

01:02:56.200 | - Yeah, well, I love artifacts.

01:02:59.480 | I played a little bit with canvas.

01:03:00.720 | I got a little dizzy using it.

01:03:02.120 | I was like, oh, there's something, well, you know,

01:03:04.600 | I like the idea of it fundamentally,

01:03:06.960 | but something about the UX was like,

01:03:08.400 | oh, this is like more disorienting than like artifacts.

01:03:11.000 | And I couldn't figure out what it was.

01:03:12.440 | And I didn't spend a lot of time thinking about it,

01:03:14.600 | but I love that, right?

01:03:16.640 | Like the thing where you are like,

01:03:18.480 | I'm working with, you know, an LLM, an agent,

01:03:21.560 | a chap or whatever to create something new.

01:03:24.280 | And there's like the chat space.

01:03:26.200 | There's like the output space.

01:03:27.880 | I love that.

01:03:28.760 | And the thing that I think I feel angsty about

01:03:31.600 | is like, we've been talking about this for like a year,

01:03:34.680 | right?

01:03:35.520 | Like, of course, like, I'm going to say that,

01:03:36.720 | but it's like, but like for a year now,

01:03:38.640 | I've had these like mocks that I was just like,

01:03:40.880 | I want to push the button, but we prioritize other things.

01:03:43.920 | We were like, okay, what can we like really win at?

01:03:46.200 | And like, we prioritize audio, for example, instead of that.

01:03:49.440 | But just like when people were like,

01:03:51.160 | oh, what is this magic draft thing?

01:03:52.560 | Oh, it's like a hundred percent, right?

01:03:54.120 | It's like stuff like that,

01:03:55.880 | that we want to try to build into notebook too.

01:03:57.560 | And I'd made this comment on Twitter as well,

01:03:59.880 | where I was like, now I don't know, actually, right?

01:04:02.720 | I don't actually know if that is the right thing.

01:04:05.240 | Like, are people really getting utility out of this?

01:04:07.680 | I mean, from the launches,

01:04:09.080 | it seems like people are really getting it.

01:04:11.000 | But I think now if we were to ship it,

01:04:12.960 | I have to rev on it like one layer more, right?

01:04:15.120 | I have to deliver like a differentiating value

01:04:17.800 | compared to like artifacts, which is hard.

01:04:20.200 | - Which is, because you've,

01:04:21.880 | you demonstrated the ability to fast follow.

01:04:24.160 | So you don't have to innovate every single time.

01:04:26.680 | - I know, I know.

01:04:27.520 | I think for me, it's just like,

01:04:28.960 | the bar is high to ship.

01:04:30.760 | And when I say that, I think it's sort of like,

01:04:32.480 | conceptually, like the value that you deliver to the user.

01:04:34.640 | I mean, you'll see a notebook alarm.

01:04:36.160 | There are a lot of corners that I have personally cut,

01:04:38.560 | where it's like, our UX designer is always like,

01:04:40.920 | I can't believe you let us ship

01:04:42.880 | with like these ugly scroll bars.

01:04:44.440 | And I'm like, no one notices, I promise.

01:04:47.120 | He's like, no, everyone.

01:04:48.880 | It's a screenshot, this thing.

01:04:50.160 | But I mean, kidding aside, I think that's true,

01:04:52.800 | that it's like, we do want to be able to fast follow,

01:04:54.960 | but I think we want to make sure

01:04:56.280 | that things also land really well.

01:04:58.120 | So the utility has to be there.

01:04:59.720 | - Code, especially on our podcast, has a special place.

01:05:03.160 | Is code notebook LLM interesting to you?

01:05:06.160 | I haven't, I've never,

01:05:07.280 | I don't see like a connect my GitHub to this thing.

01:05:09.880 | - Yeah, yeah.

01:05:10.720 | I think code is a big one.

01:05:12.560 | Code is a big one.

01:05:13.600 | I think we have been really focused,

01:05:15.800 | especially when we had like a much smaller team,

01:05:17.960 | we were really focused on like,

01:05:18.920 | let's push like an end-to-end journey together.

01:05:21.200 | Let's prove that we can do that.

01:05:22.720 | Because then once you lay the groundwork of like,

01:05:24.880 | sources, do something in the chat,

01:05:27.200 | output, once you have that,

01:05:28.360 | you just scale it up from there, right?

01:05:30.080 | And it's like, now it's just a matter of like,

01:05:31.840 | scaling the inputs, scaling the outputs,

01:05:33.880 | scaling the capabilities of the chat.

01:05:35.800 | So I think we're going to get there.

01:05:37.400 | And now I also feel like I have a much better view

01:05:40.760 | of like where the investment is required.

01:05:43.160 | Whereas previously I was like,

01:05:44.320 | hey, like, let's flesh out the story first

01:05:46.160 | before we put more engineers on this thing,

01:05:47.840 | because that's just going to slow us down.

01:05:49.880 | - For what it's worth, the model still understands code.

01:05:52.280 | So like, I've seen at least one or two people

01:05:55.000 | just like, download their GitHub repo,

01:05:57.080 | put it in there and get like an audio overview of your code.

01:06:00.320 | - Yeah, yeah.

01:06:01.160 | - I've never tried that.

01:06:02.000 | - This is like, these are all,

01:06:03.360 | all the files are connected together.

01:06:04.880 | 'Cause the model still understands code.

01:06:06.280 | Like, even if you haven't like, optimized for it.

01:06:07.560 | - I think on sort of like the creepy side of things,

01:06:10.920 | I did watch a student, like with her permission, of course,

01:06:13.720 | I watched her do her homework in Notebook LM.

01:06:17.200 | And I didn't tell her like, what kind of homework to bring,

01:06:20.480 | but she brought like her computer science homework.

01:06:23.520 | And I was like, oh.

01:06:24.440 | And she uploaded it and she said,

01:06:26.760 | here's my homework, read it.

01:06:28.600 | And it was just the instructions.

01:06:29.800 | And Notebook LM was like, okay, I've read it.

01:06:32.760 | And the student was like, okay, here's my code so far.

01:06:37.080 | And she copy pasted it from the editor.

01:06:39.120 | And she was like, check my homework.

01:06:41.440 | And Notebook LM was like, well, number one is wrong.

01:06:44.040 | And I thought that was really interesting,

01:06:45.480 | 'cause it didn't tell her what was wrong.

01:06:46.840 | It just said it's wrong.

01:06:48.120 | And she was like, okay, don't tell me the answer,

01:06:50.720 | but like, walk me through like how you'd think about this.

01:06:53.360 | And it was, what was interesting for me

01:06:55.800 | was that she didn't ask for the answer.

01:06:58.000 | And I asked her, I was like, oh, why did you do that?

01:06:59.480 | And she was like, well, I actually want to learn it.

01:07:01.240 | She was like, 'cause I'm going to have to take a quiz

01:07:02.520 | on this at some point.

01:07:03.520 | And I was like, oh yeah, this is a really good point.

01:07:05.920 | And it was interesting because, you know,

01:07:07.920 | Notebook LM, while the formatting wasn't perfect,

01:07:09.880 | like did say like, hey, have you thought about using,

01:07:12.560 | you know, maybe an integer instead of like this?

01:07:14.760 | And so that was really interesting.

01:07:16.960 | - Are you adding like real-time chat on the output?

01:07:19.880 | Like, you know, there's kind of like the deep dive show

01:07:22.400 | and then there's like the listeners call in and say, hey.

01:07:26.400 | - Yeah, we're actively, that's one of the things

01:07:28.040 | we're actively prioritizing.

01:07:29.560 | Actually, one of the interesting things is now we're like,

01:07:31.880 | why would anyone want to do that?

01:07:33.560 | Like, what are the actual, like kind of going back

01:07:35.960 | to sort of having a strong POV about the experience.

01:07:38.760 | It's like, what is better?

01:07:41.040 | Like, what is fundamentally better about doing that?

01:07:43.040 | That's not just like being able to Q&A your notebook.

01:07:45.400 | How is that different from like a conversation?

01:07:47.480 | Is it just the fact that like there was a show

01:07:50.120 | and you want to tweak the show?

01:07:51.880 | Is it because you want to participate?

01:07:53.760 | So I think there's a lot there

01:07:55.120 | that like we can continue to unpack, but yes,

01:07:57.400 | that's coming.

01:07:58.240 | - It's because I formed a parasocial relationship.

01:08:00.640 | - Yeah, I just want to be part of your life.

01:08:03.720 | - Get this.

01:08:04.560 | - Totally.

01:08:07.240 | - Yeah, but it is obviously because OpenAI

01:08:09.720 | has just launched a real-time chat.

01:08:11.080 | It's a very hot topic.

01:08:12.800 | I would say one of the toughest AI engineering disciplines

01:08:16.320 | out there because even their API

01:08:19.320 | doesn't do interruptions that well.

01:08:21.840 | To be honest and you know, yeah.

01:08:23.640 | So real-time chat is tough.

01:08:25.280 | - I love that thing.

01:08:26.120 | I love it, yeah.

01:08:27.200 | - Okay, so we have a couple of ways to end,

01:08:30.320 | either call to action or laying out one principle

01:08:33.040 | of AI PMing or engineering that you really

01:08:36.280 | think about a lot.

01:08:37.240 | Is there anything that comes to mind?

01:08:39.240 | - I feel like that's a test.

01:08:40.080 | Of course, I'm going to say go to notebooklm.google.com.

01:08:43.760 | Try it out, join the Discord and tell us what you think.

01:08:46.720 | - Yeah, especially like you have a technical audience.

01:08:49.240 | What do you want from a technical engineering audience?

01:08:52.360 | - I mean, I think it's interesting

01:08:54.240 | because the technical and engineering audience

01:08:55.960 | typically will just say, "Hey, where's the API?"

01:08:58.440 | But you know, and I think we addressed it.

01:09:00.080 | But I think what I would really be interested to discover

01:09:03.240 | is, is this useful to you?

01:09:05.160 | Why is it useful?

01:09:06.000 | What did you do?

01:09:06.960 | Right, is it useful tomorrow?

01:09:08.160 | How about next week?

01:09:09.000 | Just the most useful thing for me is if you do stop using it

01:09:12.240 | or if you do keep using it, tell me why.

01:09:14.240 | Because I think contextualizing it within your life,

01:09:16.880 | right, your background, your motivations,

01:09:19.440 | like is what really helps me build really cool things.

01:09:22.280 | - And then one piece of advice for AI PMs.

01:09:24.640 | - Okay, if I had to pick one, it's just always be building.

01:09:28.400 | Like build things yourself.

01:09:29.360 | I think like for PMs, it's like such a critical skill

01:09:32.200 | and just like take time to like pop your head up

01:09:34.800 | and see what else is new out there.

01:09:36.480 | On the weekends, I try to have a lot of discipline.

01:09:38.680 | Like I only use ChatGPT and like Cloud on the weekend.

01:09:41.880 | I try to like use like the APIs.

01:09:44.080 | Occasionally I'll try to build something

01:09:46.000 | on like GCP over the weekend,

01:09:47.680 | 'cause like I don't do that normally like at work.

01:09:50.480 | But it's just like the rigor of just trying

01:09:53.160 | to be like a builder yourself.

01:09:55.400 | And even just like testing, right?

01:09:56.600 | Like you can have an idea of like how a product should work

01:09:59.040 | and maybe your engineers are building it.

01:10:00.720 | But it's like, what was your like proof of concept, right?

01:10:03.280 | Like what gave you conviction that that was the right thing?

01:10:06.080 | - Call to action.

01:10:07.000 | - I feel like consistently like the most magical moments

01:10:10.840 | out of like AI building come about for me

01:10:13.800 | when like I'm really, really, really just close

01:10:16.440 | to the edge of the model capability.

01:10:19.120 | And sometimes it's like farther than you think it is.

01:10:21.360 | Like I think while building this product,

01:10:23.560 | some of the other experiments,

01:10:24.560 | like there were phases where it was like easy

01:10:26.160 | to think that you've like approached it.

01:10:28.240 | But like sometimes at that point,

01:10:29.600 | what you really need is to like show your thing to someone

01:10:32.120 | and like they'll come up with creative ways to improve it.

01:10:34.800 | Like we're all sort of like learning, I think.

01:10:37.400 | So yeah, like I feel like unless you're hitting

01:10:39.400 | that bound of like, this is what Gemini 1.5 can do,

01:10:43.720 | probably like the magic moment is like somewhere there,

01:10:46.120 | like in that sort of limit.

01:10:48.520 | - So push the edge of the capability.

01:10:50.560 | - Yeah, totally.

01:10:51.880 | - It's funny because we had a Nicola Scarlini

01:10:54.120 | from DeepMind on the pod.

01:10:55.640 | And he was like, if the model is always successful,

01:10:57.880 | you're probably not trying hard enough

01:10:59.560 | to like give it hard.

01:11:00.400 | - Right.

01:11:01.240 | - So yeah.

01:11:03.160 | - My problem is like sometimes I'm not smart enough

01:11:05.200 | to judge.

01:11:06.040 | - Yeah, right.

01:11:06.880 | (laughing)

01:11:08.880 | - I think like that's, I hear that a lot.

01:11:11.160 | Like people are always like, I don't know how to use it.

01:11:13.600 | Yeah, and it's hard.

01:11:15.080 | Like I remember the first time I used Google search,

01:11:16.800 | I was like, what do we type?

01:11:18.080 | My dad was like, anything.

01:11:19.680 | It's like anything, I got nothing in my brain, dad.

01:11:21.720 | (laughing)

01:11:23.000 | What do you mean?

01:11:23.840 | And I think there's a lot of like for product builders

01:11:26.280 | is like have a strong opinion about like,

01:11:28.320 | what is the user supposed to do?

01:11:30.000 | - Yeah.

01:11:30.840 | - Help them do it.

01:11:31.680 | - Principle for AI engineers or like just one advice

01:11:35.520 | that you have others?

01:11:36.760 | - I guess like, in addition to pushing the bounds

01:11:39.440 | and to do that, that often means like,

01:11:41.400 | you're not gonna get it right in the first go.

01:11:43.880 | So like, don't be afraid to just like,

01:11:46.320 | batch multiple models together.

01:11:49.360 | I guess that's, I'm basically describing an agent,

01:11:51.600 | but more thinking time equals

01:11:53.720 | just better results consistently.

01:11:55.520 | And that holds true for probably every single time

01:11:59.560 | that I've tried to build something.

01:12:01.240 | - Well, at some point we will talk about

01:12:02.840 | the sort of longer inference paradigm.

01:12:04.720 | It seems like DeepMind is rumored

01:12:06.560 | to be coming out with something.

01:12:07.760 | You can't comment, of course.

01:12:09.080 | Yeah, well, thank you so much.

01:12:10.440 | You know, you've created, I actually said,

01:12:12.760 | I think you saw this.

01:12:13.880 | I think that Notebook LLM was kind of like

01:12:15.800 | the ChatGPC moment for Google.

01:12:17.920 | - Yeah, that was so crazy when I saw that.

01:12:19.720 | I was like, what?

01:12:20.560 | Like ChatGPC was huge for me.

01:12:22.400 | And I think, you know, when you said it

01:12:24.680 | and other people have said it, I was like, is it?

01:12:27.600 | - Yeah.

01:12:28.440 | - That's crazy, that's so cool.

01:12:29.280 | - People weren't like really cognizant

01:12:30.600 | of Notebook LLM before and audio overviews

01:12:32.880 | and Notebook LLM, like unlocked the, you know,

01:12:36.960 | a use case for people in a way that

01:12:39.200 | I would go so far as to say cloud projects never did.

01:12:41.760 | And I don't know, you know,

01:12:43.360 | I think a lot of it is competent PMing and engineering,

01:12:46.240 | but also just, you know, it's interesting how

01:12:48.680 | a lot of these projects are always

01:12:50.080 | like low key research previews.

01:12:51.960 | For you, it's like, you're a separate org,

01:12:53.480 | but like, you know, you built products and UI innovation

01:12:56.880 | on top of also working with research to improve the model.

01:12:59.920 | That was a success.

01:13:01.200 | That wasn't planned to be this whole big thing.

01:13:04.080 | You know, your TPUs were on fire, right?

01:13:06.280 | - Oh my gosh, that was so funny.

01:13:08.320 | I didn't know people would like really

01:13:09.840 | catch on to the Elmo fire,

01:13:11.640 | but it was just like one of those things

01:13:13.320 | where I was like, you know, we had to ask for more TPUs.

01:13:16.720 | Yeah, many times.

01:13:18.600 | And, you know, it was a little bit of a subtweet of like,

01:13:21.480 | hey, reminder, give us more TPUs down here.

01:13:25.000 | - It's weird.

01:13:25.840 | I just think like when people try to make big launches,

01:13:28.360 | then they flop.

01:13:29.200 | And then like when they're not trying

01:13:30.480 | and they're just trying to build a good thing,

01:13:32.680 | then they succeed.

01:13:33.640 | It's this fundamentally really weird magic

01:13:36.520 | that I haven't really encapsulated yet,

01:13:38.800 | but you've done it.

01:13:40.040 | - Thank you.

01:13:40.880 | And you know, I think we'll just keep going

01:13:43.200 | in like the same way.

01:13:44.040 | We just keep trying, keep trying to make it better.

01:13:45.760 | - Yeah, I hope so.

01:13:46.880 | All right, cool.

01:13:47.720 | Thank you.

01:13:48.560 | - Thank you. Thanks for having us.

01:13:49.400 | - Thanks.

01:13:50.240 | (upbeat music)

01:13:52.820 | (upbeat music)

01:13:55.400 | (upbeat music)

How NotebookLM Was Made

Chapters