
How NotebookLM Was Made


Chapters

0:00 Introductions
1:39 From Project Tailwind to NotebookLM
9:25 Learning from 65,000 Discord members
12:15 How NotebookLM works
18:00 Working with Steven Johnson
23:00 How to prioritize features
25:13 Structuring the data pipelines
29:50 How to eval
34:34 Steering the podcast outputs
37:51 Defining speakers personalities
39:04 How do you make audio engaging?
45:47 Humor is AGI
51:38 Designing for non-determinism
53:35 API when?
55:05 Multilingual support and dialect considerations
57:50 Managing system prompts and feature requests
60:58 Future of NotebookLM
64:59 Podcasts for your codebase
67:16 Plans for real-time chat
68:27 Wrap up


00:00:00.000 | Hey everyone, we're here today as guests on Latent Space.
00:00:03.400 | - It's great to be here.
00:00:04.720 | I'm a long time listener and fan.
00:00:06.360 | They've had some great guests on this show before.
00:00:08.240 | - Yeah, what an honor to have us,
00:00:10.400 | the hosts of another podcast, join as guests.
00:00:13.200 | - I mean, a huge thank you to Swyx and Alessio
00:00:16.500 | for the invite, thanks for having us on the show.
00:00:18.320 | - Yeah, really, it seems like they brought us here
00:00:19.880 | to talk a little bit about our show, our podcast.
00:00:22.600 | - Yeah, I mean, we've had lots of listeners ourselves,
00:00:25.240 | listeners at Deep Dive.
00:00:26.280 | - Oh yeah, we've made a ton of audio overviews
00:00:29.120 | since we launched and we're learning a lot.
00:00:31.240 | - There's probably a lot we can share
00:00:32.520 | around what we're building next, huh?
00:00:34.320 | - Yeah, we'll share a little bit at least.
00:00:35.920 | - The short version is we'll keep learning
00:00:38.000 | and getting better for you.
00:00:39.760 | - We're glad you're along for the ride.
00:00:41.080 | - So yeah, keep listening.
00:00:41.920 | - Keep listening and stay curious.
00:00:44.120 | We promise to keep diving deep
00:00:45.880 | and bringing you even better options in the future.
00:00:49.160 | - Stay curious.
00:00:50.280 | (upbeat music)
00:00:52.800 | - Hey everyone, welcome to the Latent Space Podcast.
00:00:55.240 | This is Alessio, partner and CTO
00:00:56.960 | in Residence at Decibel Partners,
00:00:58.600 | and I'm joined by Swyx, founder of Smol.ai.
00:01:01.560 | - Hey, and today we're back in the studio
00:01:03.880 | with our special guests, Raiza Martin and Usama,
00:01:07.200 | I forgot to get your last name, Shafqat?
00:01:10.320 | - Yes.
00:01:11.160 | - Okay, welcome.
00:01:12.520 | - Hello, thank you for having us.
00:01:13.640 | - Thanks for having us.
00:01:14.480 | - So AI podcasters meet human podcasters, always fun.
00:01:17.800 | Congrats on the success of Notebook LM.
00:01:20.880 | I mean, how does it feel?
00:01:22.560 | - It's been a lot of fun.
00:01:23.520 | A lot of it honestly was unexpected,
00:01:25.840 | but my favorite part is really listening
00:01:27.560 | to the audio overviews that people have been making.
00:01:29.800 | - Maybe we should do a little bit of intros
00:01:31.680 | and tell the story.
00:01:32.880 | You know, what is your path into the sort of Google AI org
00:01:36.200 | or maybe I actually don't even know what org
00:01:38.680 | you guys are in.
00:01:39.760 | - I can start.
00:01:40.600 | My name's Ryza.
00:01:41.560 | I lead the Notebook LM team inside of Google Labs.
00:01:45.240 | So specifically that's the org that we're in.
00:01:47.360 | It's called Google Labs.
00:01:48.240 | It's only about two years old.
00:01:49.960 | And our whole mandate is really to build AI products.
00:01:53.840 | That's it.
00:01:54.680 | We work super closely with DeepMind.
00:01:56.280 | Our entire thing is just like try a bunch of things
00:01:59.160 | and see what's landing with users.
00:02:01.200 | And the background that I have is really,
00:02:03.520 | I worked in payments before this
00:02:05.280 | and I worked in ads right before and then startups.
00:02:07.800 | I tell people like at every time that I changed orgs,
00:02:12.040 | I actually almost quit Google.
00:02:13.800 | Like specifically like in between ads and payments,
00:02:15.800 | I was like, all right, I can't do this.
00:02:17.200 | Like, this is like super hard.
00:02:18.680 | I was like, it's not for me.
00:02:19.640 | I'm like a very zero to one person.
00:02:21.680 | But then I was like, okay, I'll try.
00:02:22.920 | I'll interview with other teams.
00:02:24.640 | And when I interviewed in payments,
00:02:25.840 | I was like, oh, these people are really cool.
00:02:27.520 | I don't know if I'm like a super good fit with this space,
00:02:31.160 | but I'll try it 'cause the people are cool.
00:02:33.040 | And then I really enjoyed that.
00:02:34.080 | And then I worked on like zero to one features
00:02:36.160 | inside of payments and I had a lot of fun.
00:02:38.480 | But then the time came again where I was like,
00:02:40.320 | oh, I don't know.
00:02:41.440 | It's like, it's time to leave.
00:02:42.280 | It's time to start my own thing.
00:02:43.640 | But then I interviewed inside of Google Labs
00:02:45.840 | and I was like, oh darn.
00:02:47.200 | Like there's definitely like-
00:02:48.360 | - They got you again.
00:02:49.200 | - They got me again.
00:02:50.280 | (laughing)
00:02:51.120 | And so now I've been here for two years
00:02:52.840 | and I'm happy that I stayed
00:02:54.840 | because especially with the recent success of Notebook LM,
00:02:58.440 | I'm like, dang, we did it.
00:03:00.120 | I actually got to do it.
00:03:01.200 | So that was really cool.
00:03:02.280 | - Kind of similar, honestly.
00:03:03.920 | I was at a big team at Google.
00:03:06.560 | We do sort of the data center supply chain planning stuff.
00:03:10.200 | Google has like the largest sort of footprint.
00:03:11.960 | Obviously there's a lot of management stuff to do there.
00:03:14.240 | But then there was this thing called Area 120 at Google,
00:03:17.440 | which does not exist anymore.
00:03:19.520 | But I sort of wanted to do like more zero to one building
00:03:23.320 | and landed a role there where we're trying to build
00:03:25.560 | like a creator commerce platform called Qaya.
00:03:29.000 | It launched briefly a couple of years ago.
00:03:32.280 | But then Area 120 sort of transitioned
00:03:34.600 | and morphed into Labs.
00:03:36.440 | And like over the last few years,
00:03:38.400 | like the focus just got a lot clearer.
00:03:40.880 | Like we were trying to build new AI products
00:03:43.680 | and do it in the wild and sort of co-create and all of that.
00:03:47.040 | So yeah, we've just been trying a bunch of different things
00:03:49.800 | and this one really landed,
00:03:51.400 | which has felt pretty phenomenal.
00:03:52.760 | - Really, really landed.
00:03:53.920 | Let's talk about the brief history of NotebookLM.
00:03:57.080 | You had a tweet, which is very helpful for doing research.
00:03:59.800 | May, 2023, during Google I/O,
00:04:01.520 | you announced Project Tailwind.
00:04:03.480 | - Yeah.
00:04:04.320 | - So today is October, 2024.
00:04:07.120 | So you joined October, 2022.
00:04:09.160 | - Actually, I used to lead AI Test Kitchen.
00:04:11.720 | And this was actually, I think not I/O 2023,
00:04:16.640 | I/O 2022 is when we launched AI Test Kitchen
00:04:20.680 | or announced it.
00:04:21.520 | And I don't know if you remember it.
00:04:22.920 | - I wasn't, that's how you like had the basic prototype
00:04:25.880 | for Gemini.
00:04:26.720 | - Yes, yes, exactly.
00:04:27.560 | - And like gave beta access to people.
00:04:29.320 | - Yeah, yeah, yeah.
00:04:30.320 | And I remember, I was like, wow, this is crazy.
00:04:33.200 | We're going to launch an LLM into the wild.
00:04:36.280 | And that was the first project
00:04:37.400 | that I was working on at Google.
00:04:39.240 | But at the same time, my manager at the time, Josh,
00:04:42.000 | he was like, "Hey, but I want you to really think about
00:04:44.600 | like what real products would we build
00:04:46.880 | that are not just demos of the technology?"
00:04:49.360 | That was in October of 2022.
00:04:53.200 | I was sitting next to an engineer
00:04:55.120 | that was working on a project called Talk to Small Corpus.
00:04:58.120 | His name was Adam.
00:04:59.360 | And the idea of Talk to Small Corpus
00:05:00.800 | is basically using LLM to talk to your data.
00:05:03.240 | And at the time I was like, wait,
00:05:04.880 | there are some like really practical things
00:05:07.200 | that you can build here.
00:05:08.280 | And I, just a little bit of background,
00:05:10.080 | like I was an adult learner.
00:05:11.280 | Like I went to college while I was working a full-time job.
00:05:14.120 | And the first thing I thought was like,
00:05:15.840 | this would have really helped me with my studying, right?
00:05:19.040 | If I could just like talk to a textbook,
00:05:20.760 | especially like when I was tired after work,
00:05:22.960 | that would have been huge.
00:05:24.400 | We took a lot of like the Talk to Small Corpus prototypes
00:05:27.000 | and I showed it to a lot of like college students,
00:05:29.040 | particularly like adult learners.
00:05:31.080 | They were like, yes, like I get it.
00:05:32.960 | Like I didn't even have to explain it to them.
00:05:35.200 | And we just continued to iterate the prototype from there
00:05:38.240 | to the point where we actually got a slot
00:05:40.600 | as part of the I/O demo in '23.
00:05:42.720 | - And Corpus, was it a textbook?
00:05:44.800 | - Oh my gosh, yeah.
00:05:46.200 | It's funny, actually.
00:05:47.360 | When he explained the project to me,
00:05:48.560 | he was like, "Talk to Small Corpus."
00:05:49.960 | I was like, "Talk to a small corpse?"
00:05:51.160 | - Yeah, nobody says corpus.
00:05:52.000 | - It was like a small corpse?
00:05:53.080 | This is not AI.
00:05:53.920 | - It's very academic.
00:05:54.760 | - Yeah, yeah.
00:05:55.600 | And it really was just like a way for us to describe
00:05:58.520 | the amount of data that we thought like could be,
00:06:01.280 | it could be good for.
00:06:02.280 | - Yeah, but even then, you're still like doing rag stuff
00:06:04.760 | because, you know, the context length back then
00:06:06.760 | was probably like 2K, 4K.
00:06:08.040 | - Yeah, it was basically rag.
00:06:09.480 | That was essentially what it was.
00:06:10.880 | And I remember, I was like,
00:06:12.360 | we were building the prototypes and at the same time,
00:06:14.960 | I think like the rest of the world was, right?
00:06:17.160 | We were seeing all of these like chat with PDF stuff
00:06:19.600 | come up and I was like, "Come on, we gotta go."
00:06:21.680 | Like we have to like push this out into the world.
00:06:24.320 | I think if there was anything,
00:06:25.200 | I wish we would have launched sooner
00:06:26.680 | because I wanted to learn faster.
00:06:28.600 | But I think like we netted out pretty well.
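For readers who want the mechanics behind "it was basically RAG": with a 2K-4K token window you retrieve only the most relevant chunks of the corpus and feed those to the model. Here is a minimal sketch, assuming a toy lexical scorer in place of real embeddings; none of these names come from Tailwind or NotebookLM.

```python
# Minimal retrieval-augmented generation sketch: with a small context window,
# retrieve only the most relevant chunks and stuff those into the prompt.
# All names here are illustrative, not from the actual product code.

from typing import Callable, List

def chunk(text: str, size: int = 800) -> List[str]:
    """Split a document into fixed-size character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def score(query: str, passage: str) -> float:
    """Toy lexical-overlap scorer standing in for embedding similarity."""
    q, p = set(query.lower().split()), set(passage.lower().split())
    return len(q & p) / (len(q) or 1)

def answer(query: str, documents: List[str], llm: Callable[[str], str],
           top_k: int = 4) -> str:
    passages = [c for doc in documents for c in chunk(doc)]
    best = sorted(passages, key=lambda p: score(query, p), reverse=True)[:top_k]
    prompt = (
        "Answer the question using only the sources below.\n\n"
        + "\n---\n".join(best)
        + f"\n\nQuestion: {query}\nAnswer:"
    )
    return llm(prompt)
```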
00:06:30.760 | - Was the initial product just text-to-speech
00:06:33.800 | or were you also doing kind of like a synthesizing
00:06:36.680 | of the content, refining it?
00:06:38.120 | Or were you just helping people read through it?
00:06:40.480 | - Before we did the I/O announcement in '23,
00:06:44.120 | we'd already done a lot of studies.
00:06:46.280 | And one of the first things that I realized
00:06:48.680 | was the first thing anybody ever typed
00:06:50.360 | was summarize the thing, right?
00:06:52.920 | Summarize the document.
00:06:54.080 | And it was like half like a test
00:06:55.920 | and half just like, "Oh, I know the content.
00:06:57.840 | I wanna see how well it does this."
00:06:59.400 | So as part of the first thing that we launched,
00:07:01.880 | it was called Project Tailwind back then.
00:07:03.960 | It was just Q&A.
00:07:05.360 | So you could chat with the doc just through text
00:07:07.960 | and it would automatically generate a summary as well.
00:07:10.560 | I'm not sure if we had it back then.
00:07:12.000 | I think we did.
00:07:12.840 | It would also generate the key topics in your document.
00:07:15.920 | And it could support up to like 10 documents.
00:07:18.600 | So it wasn't just like a single doc.
00:07:20.240 | - And then the I/O demo went well, I guess.
00:07:23.040 | - Yeah.
00:07:23.880 | - And then what was the discussion from there
00:07:25.840 | to where we are today?
00:07:27.320 | Is there any maybe intermediate step of the product
00:07:30.760 | that people missed between this was launch or?
00:07:33.600 | - It was interesting because every step of the way,
00:07:35.760 | I think we hit like some pretty critical milestones.
00:07:38.560 | So I think from the initial demo,
00:07:40.240 | I think there was so much excitement of like,
00:07:41.840 | "Wow, what is this thing that Google is launching?"
00:07:44.800 | And so we capitalized on that.
00:07:46.320 | We built the wait list.
00:07:47.640 | That's actually when we also launched the Discord server,
00:07:50.400 | which has been huge for us because for us in particular,
00:07:53.840 | one of the things that I really wanted to do
00:07:56.680 | was to be able to launch features and get feedback ASAP.
00:08:00.080 | Like the moment somebody tries it,
00:08:01.480 | like I want to hear what they think right now.
00:08:03.440 | And I want to ask follow-up questions.
00:08:04.960 | And the Discord has just been so great for that.
00:08:07.160 | But then we basically took the feedback from I/O.
00:08:10.280 | We continued to refine the product.
00:08:12.360 | So we added more features.
00:08:13.680 | We added sort of like the ability to save notes,
00:08:15.880 | write notes, we generate follow-up questions.
00:08:18.800 | So there's a bunch of stuff in the product
00:08:20.240 | that shows like a lot of that research,
00:08:22.280 | but it was really the rolling out of things.
00:08:23.960 | Like we removed the wait list,
00:08:25.760 | so rolled out to all of the United States.
00:08:27.720 | We rolled out to over 200 countries and territories.
00:08:31.240 | We started supporting more languages,
00:08:33.280 | both in the UI and like the actual source stuff.
00:08:36.200 | We experienced, like in terms of milestones,
00:08:38.080 | there was like an explosion of like users in Japan.
00:08:41.200 | This was super interesting in terms of just like
00:08:43.520 | unexpected, like people would write to us
00:08:45.520 | and they would be like, "This is amazing.
00:08:47.600 | I have to read all of these rules in English,
00:08:50.560 | but I can chat in Japanese."
00:08:52.560 | It's like, oh, wow, that's true, right?
00:08:54.760 | Like with LLMs, you kind of get this natural,
00:08:57.080 | it translates the content for you,
00:08:59.680 | and you can ask in your sort of preferred mode.
00:09:01.960 | And I think that's not just like a language thing too.
00:09:04.760 | I think there's like,
00:09:06.120 | I do this test with Wealth of Nations all the time,
00:09:07.960 | 'cause it's like a pretty complicated text to read.
00:09:10.400 | - The Adam Smith classic, it's like 400 pages this thing.
00:09:12.560 | - Yeah, but I like this test 'cause I'm like,
00:09:14.840 | I ask in like normie, you know, plain speak,
00:09:17.600 | and then it summarizes really well for me.
00:09:19.600 | It sort of adapts to my tone.
00:09:21.120 | - Very capitalist.
00:09:23.680 | - Very on brand.
00:09:25.840 | - I just checked in on a Notebook LM Discord, 65,000 people.
00:09:28.680 | - Yeah.
00:09:29.520 | - Crazy, just like for one project within Google.
00:09:32.640 | It's not like, it's not labs, it's just Notebook LM.
00:09:35.480 | - Just Notebook LM.
00:09:36.560 | - What do you learn from the community?
00:09:39.200 | - I think that the Discord is really great
00:09:42.040 | for hearing about a couple of things.
00:09:43.680 | One, when things are going wrong.
00:09:45.400 | I think, honestly, like our fastest way
00:09:48.160 | that we've been able to find out
00:09:49.520 | if like the servers are down,
00:09:50.840 | or there's just an influx of people being like,
00:09:53.080 | it says system unable to answer,
00:09:54.520 | anybody else getting this?
00:09:56.200 | And I'm like, all right, let's go.
00:09:58.080 | And it actually catches it a lot faster
00:09:59.760 | than like our own monitoring does.
00:10:01.600 | It's like, that's been really cool.
00:10:02.480 | So thank you.
00:10:03.320 | - Cats will need a dog.
00:10:04.160 | (all laughing)
00:10:05.760 | - So thank you to everybody.
00:10:06.920 | Please keep reporting it.
00:10:08.200 | I think the second thing is really the use cases.
00:10:10.280 | I think when we put it out there,
00:10:11.600 | I was like, hey, I have a hunch of how people will use it,
00:10:14.640 | but like to actually hear about, you know,
00:10:16.800 | not just the context of like the use of Notebook LM,
00:10:19.400 | but like, what is this person's life like?
00:10:21.880 | Why do they care about using this tool?
00:10:23.640 | Especially people who actually have trouble using it,
00:10:25.760 | but they keep pushing,
00:10:27.280 | like that's just so critical to understand
00:10:29.520 | what was so motivating, right?
00:10:31.400 | Like what was your problem that was like so worth solving?
00:10:33.480 | So that's like a second thing.
00:10:34.600 | The third thing is also just hearing sort of like
00:10:37.120 | when we have wins and when we don't have wins,
00:10:39.480 | because there's actually a lot of functionality
00:10:41.200 | where I'm like, hmm, I don't know
00:10:42.760 | if that landed super well
00:10:43.840 | or if that was actually super critical.
00:10:45.840 | As part of having this sort of small project, right,
00:10:49.240 | I wanna be able to unlaunch things too.
00:10:50.960 | So it's not just about just like rolling things out
00:10:52.760 | and testing it and being like, wow,
00:10:54.720 | now we have like 99 features.
00:10:57.160 | Like hopefully we get to a place where it's like,
00:10:58.640 | there's just a really strong core feature set
00:11:00.560 | and the things that aren't as great, we can just unlaunch.
00:11:02.400 | - What have you unlaunched?
00:11:03.520 | I have to ask.
00:11:04.440 | - I'm in the process of unlaunching some stuff.
00:11:07.480 | But for example, we had this idea
00:11:10.880 | that you could highlight the text in your source passage
00:11:13.800 | and then you could transform it.
00:11:15.360 | And nobody was really using it.
00:11:17.000 | And it was like a very complicated piece of our architecture
00:11:19.960 | and it's very hard to continue supporting it
00:11:22.280 | in the context of new features.
00:11:24.040 | So we were like, okay, let's do a 50/50 sunset of this thing
00:11:27.040 | and see if anybody complains.
00:11:28.360 | And so far, nobody has.
00:11:29.600 | - Is there like a feature flagging paradigm
00:11:31.760 | inside of your architecture
00:11:33.720 | that lets you feature flag these things easily?
00:11:36.400 | - Yes, and actually-
00:11:37.440 | - What is it called?
00:11:38.280 | Like, I love feature flagging.
00:11:39.960 | - You mean like in terms of just like
00:11:41.200 | being able to expose things to users?
00:11:42.040 | - Yeah, as a PM, like this is your number one tool, right?
00:11:44.280 | - Yeah, yeah.
00:11:45.120 | - Let's try this out.
00:11:46.120 | All right, if it works, roll it out.
00:11:47.960 | If it doesn't, roll it back, you know?
00:11:49.240 | - Yeah, I mean, we just run Mendel experiments
00:11:51.480 | for the most part.
00:11:52.320 | And I actually, I don't know if you saw it,
00:11:54.160 | but on Twitter, somebody was able to get around our flags
00:11:56.760 | and they enabled all the experiments.
00:11:58.440 | They were like, "Check out what the Notebook LM team
00:12:01.120 | is cooking."
00:12:01.960 | And I was like, "Oh!"
00:12:03.000 | And I was at lunch with the rest of the team.
00:12:05.840 | And I was like, I was eating, I was like,
00:12:07.520 | "Guys, guys, Magic Drafts leaked!"
00:12:10.760 | They were like, "Oh no!"
00:12:12.640 | I was like, "Okay, just finish eating
00:12:13.840 | and then let's go figure out what to do."
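Mendel is Google-internal, but the roll-out/roll-back pattern described above maps to a generic percentage-based flag: hash the user into a stable bucket and compare it against the current rollout (or 50/50 sunset) percentage. A hedged, purely illustrative sketch:

```python
# Generic percentage rollout flag, not Mendel: hash the user into one of 100
# stable buckets and enable the feature for the first `percent` of them.

import hashlib

def in_rollout(user_id: str, feature: str, percent: int) -> bool:
    """Deterministically place a user in the first `percent` buckets."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent

# Roll a feature out to 10% of users, or run a 50/50 sunset and watch feedback.
print(in_rollout("user-123", "highlight_transform", percent=50))
```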
00:12:15.400 | - Yeah.
00:12:16.240 | - I think a post-mortem would be fun,
00:12:17.320 | but I don't think we need to do it on the podcast now.
00:12:20.880 | - Yeah, yeah.
00:12:21.720 | - Can we just talk about what's behind the magic?
00:12:24.560 | So I think everybody has questions,
00:12:27.520 | hypotheses about what models power it.
00:12:30.000 | I know you might not be able to share everything,
00:12:32.080 | but can you just get people very basic?
00:12:34.440 | How do you take the data and put it in the model?
00:12:36.560 | What text model do you use?
00:12:38.160 | What's the text-to-speech kind of like jump
00:12:40.800 | between the two?
00:12:41.680 | - Sure, yeah.
00:12:42.840 | - I was going to say, Usama,
00:12:44.160 | he manually does all the podcasts.
00:12:45.920 | - Oh, thank you.
00:12:46.760 | - Really fast.
00:12:47.600 | - You're very fast, yeah.
00:12:48.440 | - He's both of the voices at once.
00:12:50.320 | - Voice actor.
00:12:52.280 | - Go ahead, go ahead.
00:12:53.120 | - So just for a bit of background,
00:12:54.800 | we were building this thing sort of outside Notebook LM
00:12:57.760 | to begin with.
00:12:58.600 | Like just the idea is like content transformation, right?
00:13:01.480 | Like we can do different modalities.
00:13:03.200 | Like everyone knows that everyone's been poking at it,
00:13:05.600 | but like, how do you make it really useful?
00:13:08.640 | And like one of the ways we thought was like, okay,
00:13:10.400 | like you maybe like, you know,
00:13:12.480 | people learn better when they're hearing things,
00:13:14.600 | but TTS exists and you can like narrate
00:13:17.000 | whatever's on screen,
00:13:18.080 | but you want to absorb it the same way.
00:13:20.000 | So like, that's where we sort of started out
00:13:21.920 | into the realm of like, maybe we try like, you know,
00:13:24.840 | two people are having a conversation kind of format.
00:13:28.360 | We didn't actually start out thinking
00:13:30.160 | this would live in Notebook, right?
00:13:31.800 | Like Notebook was sort of,
00:13:33.200 | we built this demo out independently,
00:13:35.960 | tried out like a few different sort of sources.
00:13:38.520 | The main idea was like, go from some sort of sources
00:13:41.280 | and transform it into a listenable, engaging audio format.
00:13:45.560 | And then through that process,
00:13:46.560 | we like unlocked a bunch more sort of learnings.
00:13:49.400 | Like for example, in a sense,
00:13:51.280 | like you're not prompting the model as much
00:13:53.760 | because like the information density is getting unrolled
00:13:57.520 | by the model prompting itself in a sense,
00:14:00.480 | because there's two speakers
00:14:01.720 | and they're both technically like AI personas, right?
00:14:04.280 | That have different angles of looking at things
00:14:07.160 | and like, they'll have a discussion about it.
00:14:08.960 | And that sort of, we realized that's kind of
00:14:10.800 | what was making it riveting in a sense.
00:14:12.800 | Like you care about what comes next,
00:14:14.760 | even if you've read the material already,
00:14:17.080 | 'cause like people say they get new insights
00:14:20.000 | on their own journals or books or whatever,
00:14:23.480 | like anything that they've written themselves.
00:14:25.480 | So yeah, from a modeling perspective,
00:14:27.080 | like it's, like Raiza said earlier,
00:14:29.480 | like we work with the DeepMind audio folks pretty closely.
00:14:33.480 | So they're always cooking up new techniques
00:14:35.760 | to like get better, more human-like audio.
00:14:38.880 | And then Gemini 1.5 is really, really good
00:14:42.960 | at absorbing long context.
00:14:45.240 | So we sort of like generally put those things together
00:14:48.680 | in a way that we could reliably produce the audio.
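A rough sketch of the shape of that pipeline as described, not the actual implementation: one long-context pass writes a two-host script from the sources, then a TTS pass renders each line in the right voice. The model and TTS calls below are placeholders, not Google APIs.

```python
# "Sources in, listenable audio out" as a two-stage sketch: script generation
# with a long-context model, then per-line speech synthesis. Illustrative only.

from dataclasses import dataclass
from typing import Callable, List

@dataclass
class ScriptLine:
    speaker: str   # "HOST_A" or "HOST_B"
    text: str

def write_script(sources: List[str], llm: Callable[[str], str]) -> List[ScriptLine]:
    prompt = (
        "You are producing an engaging two-host audio overview of the sources "
        "below. Host A frames topics; Host B reacts, questions, and adds takes. "
        "Write the dialogue as lines of the form 'HOST_A: ...' / 'HOST_B: ...'.\n\n"
        + "\n\n".join(sources)
    )
    lines = []
    for raw in llm(prompt).splitlines():
        if ":" in raw:
            speaker, text = raw.split(":", 1)
            lines.append(ScriptLine(speaker.strip(), text.strip()))
    return lines

def render_audio(script: List[ScriptLine],
                 tts: Callable[[str, str], bytes]) -> List[bytes]:
    # One TTS call per line, keyed by speaker so each host keeps its own voice.
    return [tts(line.speaker, line.text) for line in script]
```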
00:14:52.760 | - I would add like, there's something really nuanced,
00:14:55.520 | I think about sort of the evolution
00:14:57.120 | of like the utility of text-to-speech,
00:14:59.840 | where if it's just reading an actual text response,
00:15:04.120 | and I've done this several times,
00:15:05.440 | I do it all the time with like reading my text messages
00:15:07.800 | or like sometimes I'm trying to read
00:15:09.400 | like a really dense paper,
00:15:10.680 | but I'm trying to do actual work,
00:15:12.040 | I'll have it like read out the screen.
00:15:14.200 | There is something really robotic about it
00:15:16.480 | that is not engaging.
00:15:18.120 | And it's really hard to consume content in that way.
00:15:20.880 | And it's never been really effective,
00:15:22.280 | like particularly for me where I'm like,
00:15:23.800 | hey, it's actually just like, it's fine for like short stuff
00:15:26.520 | like texting, but even that, it's like not that great.
00:15:29.440 | So I think the frontier of experimentation here
00:15:31.960 | was really thinking about there is a transform
00:15:35.640 | that needs to happen in between whatever,
00:15:38.240 | here's like my resume, right?
00:15:39.640 | Or here's like a hundred page slide deck or something.
00:15:42.920 | There is a transform that needs to happen
00:15:44.840 | that is inherently editorial.
00:15:47.480 | And I think this is where like that two-person persona,
00:15:51.000 | right, dialogue model,
00:15:52.640 | they have takes on the material that you've presented,
00:15:56.440 | that's where it really sort of like brings the content
00:15:59.040 | to life in a way that's like not robotic.
00:16:02.160 | And I think that's like where the magic is,
00:16:04.240 | is like, you don't actually know what's going to happen
00:16:06.920 | when you press generate, you know, for better or for worse,
00:16:09.280 | like to the extent that like people are like,
00:16:10.880 | no, I actually want it to be more predictable now.
00:16:13.120 | Like I want to be able to tell them,
00:16:15.360 | but I think that initial like, wow,
00:16:17.280 | was because you didn't know, right?
00:16:19.160 | When you upload your resume,
00:16:20.880 | what's it about to say about you?
00:16:22.400 | And I think I've seen enough of these where I'm like,
00:16:24.320 | oh, it gave you good vibes, right?
00:16:26.000 | Like you knew I was going to say like something really cool.
00:16:28.440 | As we start to shape this product,
00:16:30.320 | I think we want to try to preserve as much of that wow,
00:16:32.840 | as much as we can,
00:16:34.080 | because I do think like exposing like all the knobs
00:16:37.200 | and like the dials,
00:16:38.560 | like we've been thinking about this a lot.
00:16:40.480 | It's like, hey, is that like the actual thing?
00:16:43.520 | Is that the thing that people really want?
00:16:45.720 | - Have you found differences in having one model
00:16:48.400 | just generate the conversation
00:16:50.120 | and then using text-to-speech to kind of fake two people?
00:16:52.800 | Or like, are you actually using two different
00:16:55.600 | kind of system prompts to like have a conversation
00:16:58.360 | step-by-step?
00:16:59.320 | I'm always curious, like,
00:17:00.800 | if persona system prompts make a big difference
00:17:03.080 | or like you just put in one prompt
00:17:04.440 | and then you just let it run?
00:17:05.760 | - I guess like generally we use a lot of inference
00:17:08.960 | as you can tell with like the spinning thing takes a while.
00:17:12.920 | So yeah, there's definitely like a bunch
00:17:14.080 | of different things happening under the hood.
00:17:16.080 | We've tried both approaches
00:17:17.440 | and they have their sort of drawbacks and benefits.
00:17:21.360 | I think that that idea of like questioning
00:17:23.880 | like the two different personas, like persist throughout
00:17:26.360 | like whatever approach we try.
00:17:27.440 | It's like, there's a bit of like imperfection in there.
00:17:30.880 | Like we had to really lean into the fact that like
00:17:33.600 | to build something that's engaging,
00:17:35.440 | like it needs to be somewhat human
00:17:37.720 | and it needs to be just not a chatbot.
00:17:39.960 | Like that was sort of like what we need to diverge from.
00:17:42.000 | It's like, you know,
00:17:42.840 | most chatbots will just narrate the same kind of answer,
00:17:46.360 | like given the same sources for the most part,
00:17:48.360 | which is ridiculous.
00:17:49.640 | So yeah, there's like experimentation there under the hood,
00:17:52.680 | like with the model to like make sure that it's spitting
00:17:54.960 | out like different takes and different personas
00:17:57.640 | and different sort of prompting each other
00:17:59.280 | is like a good analogy, I guess.
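The other approach mentioned, two separately prompted personas taking turns and effectively prompting each other, might look something like the sketch below. The persona prompts are invented for illustration; the real ones are not public.

```python
# Alternating two-persona dialogue: each speaker has its own system prompt and
# sees the sources plus the running transcript before writing its next turn.

from typing import Callable, List, Tuple

PERSONAS = {
    "HOST_A": "You drive the conversation: introduce ideas from the sources "
              "and hand off with a question.",
    "HOST_B": "You react: push back, ask for examples, connect ideas across "
              "the sources, keep the energy up.",
}

def dialogue(sources: str, llm: Callable[[str, str], str],
             turns: int = 8) -> List[Tuple[str, str]]:
    transcript: List[Tuple[str, str]] = []
    order = ["HOST_A", "HOST_B"]
    for i in range(turns):
        speaker = order[i % 2]
        history = "\n".join(f"{s}: {t}" for s, t in transcript)
        user_prompt = (f"Sources:\n{sources}\n\nConversation so far:\n{history}"
                       f"\n\nWrite {speaker}'s next turn only.")
        transcript.append((speaker, llm(PERSONAS[speaker], user_prompt)))
    return transcript
```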
00:18:00.760 | - Yeah, I think Steven Johnson, I think he's on your team.
00:18:03.920 | I don't know what his role is.
00:18:05.160 | He seems like chief dreamer, writer.
00:18:07.920 | - Yeah, I mean, I can comment on Steven.
00:18:10.600 | So Steven joined actually in the very early days,
00:18:13.320 | I think before it was even a fully funded project.
00:18:15.880 | And I remember when he joined, I was like,
00:18:17.960 | Steven Johnson's going to be on my team.
00:18:20.680 | You know, and for folks who don't know him,
00:18:22.480 | Steven is a New York Times bestselling author
00:18:25.160 | of like 14 books.
00:18:26.400 | He has a PBS show.
00:18:28.120 | He's like incredibly smart,
00:18:30.120 | just like a true sort of celebrity by himself.
00:18:33.640 | And then he joined Google and he was like,
00:18:35.120 | I want to come here and I want to build the thing
00:18:38.040 | that I've always dreamed of,
00:18:39.320 | which is a tool to help me think.
00:18:42.160 | I was like, a what?
00:18:43.000 | Like a tool to help you think?
00:18:45.120 | I was like, what do you need help with?
00:18:46.720 | Like, you seem to be doing great on your own.
00:18:48.920 | And, you know, he would describe this to me
00:18:51.040 | and I would watch his flow.
00:18:52.600 | And aside from like providing a lot of inspiration,
00:18:55.520 | to be honest, like when I watched Steven work,
00:18:58.000 | I was like, oh, nobody works like this, right?
00:19:00.760 | Like this is what makes him special.
00:19:02.760 | Like he is such a dedicated like researcher and journalist
00:19:07.000 | and he's so thorough, he's so smart.
00:19:09.160 | And then I had this realization of like,
00:19:10.840 | maybe Steven is the product.
00:19:13.440 | Maybe the work is to take Steven's expertise
00:19:16.880 | and bring it to like everyday people
00:19:19.000 | that could really benefit from this.
00:19:20.200 | Like just watching him work,
00:19:21.400 | I was like, oh, I could definitely use like a mini Steven,
00:19:23.800 | like doing work for me.
00:19:25.080 | Like that would make me a better PM.
00:19:26.800 | And then I thought very quickly about like the adjacent roles
00:19:29.480 | that could use sort of this like research and analysis tool.
00:19:33.000 | And so aside from being, you know, chief dreamer,
00:19:36.840 | Steven also represents like a super workflow
00:19:40.800 | that I think all of us,
00:19:42.520 | like if we had access to a tool like it,
00:19:44.600 | would just inherently like make us better.
00:19:46.480 | - Did you make him express his thoughts while he worked
00:19:49.480 | or you just silently watched him?
00:19:51.440 | Or how does this work?
00:19:52.800 | - Oh no, now you're making me admit it.
00:19:55.400 | But yes, I did just silently watch him.
00:19:56.760 | - Yeah, this is a part of the PM toolkit, right?
00:19:58.480 | Like user interviews and all that.
00:20:00.600 | - Yeah, I mean, I did interview him,
00:20:02.600 | but I noticed like if I interviewed him,
00:20:04.800 | it was different than if I just watched him.
00:20:07.480 | And I did the same thing with students all the time.
00:20:09.760 | Like I followed a lot of students around,
00:20:11.400 | I watched them study.
00:20:12.760 | I would ask them like, oh, how do you feel now?
00:20:14.800 | Right, or why did you do that?
00:20:15.960 | Like what made you do that actually?
00:20:18.360 | Or why are you upset about like this particular thing?
00:20:20.240 | Why are you cranky about this particular topic?
00:20:22.760 | And it was very similar, I think, for Steven,
00:20:25.200 | especially because he was describing,
00:20:27.080 | he was in the middle of writing a book
00:20:28.800 | and he would describe like, oh, you know,
00:20:30.840 | here's how I research things
00:20:32.560 | and here's how I keep my notes.
00:20:34.440 | Oh, and here's how I do it.
00:20:35.800 | And it was really,
00:20:36.960 | he was doing this sort of like self-questioning, right?
00:20:40.080 | Like now we talk about like chain of, you know,
00:20:42.040 | reasoning or thought, reflection.
00:20:43.960 | And I was like, oh, he's the OG.
00:20:46.160 | Like I watched him do it in real time.
00:20:47.680 | I was like, that's like LLM right there.
00:20:50.520 | And to be able to bring sort of that expertise in a way
00:20:53.720 | that was like, you know, maybe like costly inference wise,
00:20:56.520 | but really have like that ability inside of a tool
00:20:58.680 | that was like, for starters, free inside of Notebook LM,
00:21:01.960 | it was good to learn whether or not
00:21:03.360 | people really did find use out of it.
00:21:05.120 | - So did he just commit to using Notebook LM for everything?
00:21:08.520 | Or did you just model his existing workflow?
00:21:11.880 | - Both, right?
00:21:12.720 | Like in the beginning, there was no product for him to use.
00:21:15.040 | And so he just kept describing the thing that he wanted.
00:21:17.240 | And then eventually like we started building the thing
00:21:19.680 | and then I would start watching him use it.
00:21:22.080 | One of the things that I love about Steven
00:21:24.240 | is he uses the product in ways where it kind of does it,
00:21:28.440 | but doesn't quite, like he's always using it
00:21:30.920 | at like the absolute max limit of this thing.
00:21:34.360 | But the way that he describes it is so full of promise
00:21:37.160 | where he's like, I can see it going here.
00:21:40.040 | And all I have to do is sort of like meet him there
00:21:42.360 | and sort of pressure test whether or not, you know,
00:21:44.480 | everyday people want it and we just have to build it.
00:21:47.000 | - I would say OpenAI has a pretty similar person,
00:21:49.280 | Andrew Mayne, I think his name is.
00:21:51.000 | It's very similar, like just from the writing world
00:21:53.240 | and using it as a tool for thought to shape ChatGPT.
00:21:56.440 | I don't think that people who use AI tools
00:21:58.720 | to their limit are common.
00:22:00.440 | I'm looking at my Notebook LM now, I've got two sources.
00:22:03.400 | You have a little like source limit thing
00:22:05.680 | and my bar is over here, you know,
00:22:07.280 | and it stretches across the whole thing.
00:22:08.440 | I'm like, did he fill it up?
00:22:09.640 | Like what, you know?
00:22:10.480 | - Yes, and he has like a higher limit than others.
00:22:12.840 | I think Steven- - He fills it up.
00:22:14.440 | - Oh yeah, like I don't think Steven even has a limit.
00:22:17.480 | - And he has Notes, Google Drive stuff, PDFs, MP3, whatever.
00:22:21.960 | - Yes, and one of my favorite demos,
00:22:23.360 | he just did this recently,
00:22:24.400 | is he has actually PDFs of like handwritten Marie Curie notes.
00:22:28.840 | - I see, so you're doing image recognition as well.
00:22:30.800 | - Yeah, so it does support it today.
00:22:32.800 | So if you have a PDF that's purely images,
00:22:34.920 | it will recognize it.
00:22:36.200 | But his demo is just like super powerful.
00:22:37.840 | He's like, okay, here's Marie Curie's notes.
00:22:39.560 | And it's like, here's how I'm using it to analyze it.
00:22:41.680 | And I'm using it for like this thing that I'm writing.
00:22:43.920 | And that's really compelling.
00:22:45.480 | It's like the everyday person
00:22:46.640 | doesn't think of these applications.
00:22:48.480 | And I think even like when I listened to Steven's demo,
00:22:50.880 | I see the gap.
00:22:52.040 | I see how Steven got there,
00:22:53.560 | but I don't see how I could without him.
00:22:55.840 | And so there's a lot of work still for us to build
00:22:58.520 | of like, hey, how do I bring that magic down
00:23:01.320 | to like zero work?
00:23:03.280 | Because I look at all the steps that he had to take
00:23:05.160 | in order to do it.
00:23:06.000 | And I'm like, okay, that's product work for us, right?
00:23:07.880 | Like that's just onboarding.
00:23:09.320 | - And so from an engineering perspective,
00:23:10.880 | people come to you and it's like,
00:23:11.800 | okay, I need to use this handwritten notes
00:23:14.160 | from Marie Curie from hundreds of years ago.
00:23:17.000 | How do you think about adding support for like data sources
00:23:19.840 | and then maybe any fun stories
00:23:21.520 | and like supporting more esoteric types of inputs?
00:23:25.440 | - So I think about the product in three ways, right?
00:23:27.680 | So there's the sources, the source input,
00:23:30.120 | there's like the capabilities
00:23:31.360 | of like what you could do with those sources.
00:23:33.360 | And then there's the third space,
00:23:34.640 | which is how do you output it into the world?
00:23:36.480 | Like how do you put it back out there?
00:23:38.640 | There's a lot of really basic sources
00:23:40.640 | that we don't support still, right?
00:23:42.200 | I think there's sort of like
00:23:43.240 | the handwritten notes stuff is one,
00:23:45.080 | but even basic things like Doc X or like PowerPoint, right?
00:23:49.040 | Like these are the things that people,
00:23:50.760 | everyday people are like,
00:23:51.600 | "Hey, my professor actually gave me everything in Doc X.
00:23:54.600 | Can you support that?"
00:23:55.880 | And then just like basic stuff,
00:23:57.160 | like images and PDFs combined with texts.
00:24:00.040 | Like there's just a really long roadmap for sources
00:24:02.840 | that I think we just have to work on.
00:24:04.160 | So that's like a big piece of it.
00:24:05.560 | On the output side,
00:24:06.440 | and I think this is like one of the most interesting things
00:24:08.080 | that we learned really early on is,
00:24:10.680 | sure, there's like the Q&A analysis stuff,
00:24:13.480 | which is like, "Hey, when did this thing launch?
00:24:15.520 | Okay, you found it in the slide deck.
00:24:16.920 | Here's the answer."
00:24:18.240 | But most of the time,
00:24:19.240 | the reason why people ask those questions
00:24:20.840 | is because they're trying to make something new.
00:24:22.560 | And so when actually,
00:24:23.520 | when some of those early features leaked,
00:24:25.320 | like a lot of the features we're experimenting with
00:24:27.280 | are the output types.
00:24:28.840 | And so you can imagine that people care a lot
00:24:31.440 | about the resources that they're putting into Notebook LM
00:24:33.880 | 'cause they're trying to create something new.
00:24:35.920 | So I think equally as important as the source inputs
00:24:39.320 | are the outputs that we're helping people to create.
00:24:42.160 | And really, shortly on the roadmap,
00:24:44.240 | we're thinking about,
00:24:45.640 | how do we help people use Notebook LM
00:24:47.760 | to distribute knowledge?
00:24:49.640 | And that's like one of the most compelling use cases
00:24:51.520 | is like shared notebooks.
00:24:52.680 | It's like a way to share knowledge.
00:24:54.200 | How do we help people take sources
00:24:55.920 | and then one-click new documents out of it, right?
00:24:59.040 | And I think that's something that people think is like,
00:25:00.680 | "Oh yeah, of course," right?
00:25:01.720 | Like one push a document,
00:25:02.880 | but what does it mean to do it right?
00:25:05.080 | Like to do it in your style, in your brand, right?
00:25:08.040 | To follow your guidelines, stuff like that.
00:25:09.560 | So I think there's a lot of work
00:25:11.160 | on both sides of that equation.
00:25:13.320 | - Interesting.
00:25:14.160 | Any comments on the engineering side of things?
00:25:16.200 | - So yeah, like I said,
00:25:17.440 | I was mostly working on building the text to audio,
00:25:20.600 | which kind of lives as a separate engineering pipeline
00:25:23.160 | almost that we then put into Notebook LM.
00:25:25.160 | But I think there's probably tons of Notebook LM
00:25:27.320 | engineering war stories on dealing with sources.
00:25:30.160 | And so I don't work too closely with engineers directly,
00:25:32.960 | but I think a lot of it does come down
00:25:34.360 | to like Gemini's native understanding of images really well,
00:25:38.280 | like the latest generation.
00:25:39.280 | - Yeah, I think on the engineering and modeling side,
00:25:41.440 | I think we are a really good example of a team
00:25:45.120 | that's put a product out there
00:25:46.960 | and we're getting a lot of feedback from the users
00:25:48.560 | and we return the data to the modeling team, right?
00:25:50.920 | To the extent that we say,
00:25:51.760 | "Hey, actually, you know what people are uploading,
00:25:54.560 | but we can't really support super well?
00:25:56.400 | Text plus image," right?
00:25:57.880 | Especially to the extent that like Notebook LM
00:26:00.000 | can handle up to 50 sources, 500,000 words each.
00:26:03.720 | Like you're not going to be able to jam all of that
00:26:05.840 | into like the context window.
00:26:07.000 | So how do we do multimodal embeddings with that?
00:26:09.640 | There's really like a lot of things that we have to solve
00:26:12.760 | that are almost there, but not quite there yet.
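To make the constraint concrete: with up to 50 sources at 500,000 words each, everything cannot go into one context window, so some selection step has to decide what the model sees. A hedged sketch of budgeted selection over per-chunk embeddings; the embed() call is a stand-in, not a specific Gemini API.

```python
# Budgeted chunk selection: rank chunks by similarity to the query and keep
# adding them until a rough token budget is exhausted. Illustrative only.

from typing import Callable, List, Sequence

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(y * y for y in b) ** 0.5
    return dot / (na * nb or 1.0)

def select_context(query: str,
                   chunks: List[str],
                   embed: Callable[[str], List[float]],
                   token_budget: int = 8000) -> List[str]:
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    picked, used = [], 0
    for c in ranked:
        cost = len(c.split())          # crude token estimate
        if used + cost > token_budget:
            break
        picked.append(c)
        used += cost
    return picked
```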
00:26:16.240 | - And then turning it into audio.
00:26:18.280 | I think one of the best things is it has so many of the human
00:26:21.480 | does that happen in the text generation
00:26:23.280 | that then becomes audio?
00:26:24.440 | Or is that a part of like the audio model
00:26:26.680 | that transforms the text?
00:26:27.520 | - It's a bit of both, I would say.
00:26:28.960 | The audio model is definitely trying to mimic
00:26:30.760 | like certain human intonations and like sort of natural,
00:26:34.320 | like breathing and pauses and like laughter
00:26:37.000 | and things like that.
00:26:38.200 | But yeah, in generating like the text,
00:26:40.440 | we also have to sort of give signals
00:26:42.240 | on like where those things maybe would make sense.
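One plausible way the text stage can "give signals" to the audio stage is inline cue markup in the generated script that the audio model then realizes as pauses, laughter, and so on. The tag format below is a guess for illustration, not the format used internally.

```python
# Parse inline non-verbal cues like [laughs] or [pause] out of a script line,
# returning the speaker, the clean text, and the cue list for the TTS stage.

import re
from typing import List, Tuple

LINE = "HOST_B: Right, [laughs] and that is the part [pause] nobody expects."

def parse_cues(line: str) -> Tuple[str, str, List[str]]:
    """Split a script line into (speaker, clean_text, cues)."""
    speaker, text = line.split(":", 1)
    cues = re.findall(r"\[(\w+)\]", text)
    clean = re.sub(r"\s*\[\w+\]\s*", " ", text).strip()
    return speaker.strip(), clean, cues

print(parse_cues(LINE))
# ('HOST_B', 'Right, and that is the part nobody expects.', ['laughs', 'pause'])
```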
00:26:45.400 | - Yeah, and on the input side instead,
00:26:47.800 | having a transcript versus having the audio,
00:26:49.920 | like, can you take some of the emotions out of it too?
00:26:52.720 | If I'm giving, like, for example,
00:26:54.480 | when we did the recaps of our podcast,
00:26:56.320 | we can either give audio of the pod
00:26:58.880 | or we can give a diarized transcription of it.
00:27:01.200 | But like the transcription doesn't have some of the,
00:27:03.240 | you know, voice kind of like things.
00:27:05.720 | Do you reconstruct that when people upload audio
00:27:08.280 | or how does that work?
00:27:09.320 | - So when you upload audio today, we just transcribe it.
00:27:12.240 | So it is quite lossy in the sense that like,
00:27:14.920 | we don't transcribe like the emotion from that as a source.
00:27:18.000 | But when you do upload a text file
00:27:21.760 | and it has a lot of like that annotation,
00:27:24.400 | I think that there is some ability for it to be reused
00:27:27.600 | in like the audio output, right?
00:27:29.120 | But I think it will still contextualize it
00:27:31.760 | in the deep dive format.
00:27:33.160 | So I think that's something that's like
00:27:34.880 | particularly important is like,
00:27:36.160 | hey, today we only have one format, it's deep dive.
00:27:38.720 | It's meant to be pretty general overview
00:27:41.000 | and it is pretty peppy.
00:27:42.240 | It's just very upbeat.
00:27:43.640 | - It's very enthusiastic, yeah.
00:27:44.960 | - Yeah, yeah, even if you had like a sad topic,
00:27:47.680 | I think they would find a way to be like,
00:27:49.400 | silver lining though, we're having a good chat.
00:27:53.960 | - Yeah, that's awesome.
00:27:54.800 | One of the ways, many, many, many ways
00:27:56.800 | that deep dive went viral is people saying like,
00:27:59.440 | if you want to feel good about yourself,
00:28:00.680 | just drop in your LinkedIn.
00:28:02.120 | Any other like favorite use cases that you saw
00:28:04.680 | from people discovering things in social media?
00:28:07.880 | - I mean, there's so many funny ones
00:28:09.440 | and I love the funny ones.
00:28:10.880 | I think because I'm always relieved when I watch them,
00:28:13.200 | I'm like, that was funny and not scary, it's great.
00:28:16.960 | There was another one that was interesting,
00:28:18.440 | which was a startup founder putting their landing page
00:28:21.520 | and being like, all right, let's test whether or not
00:28:23.560 | like the value prop is coming through.
00:28:25.120 | And I was like, wow, that's right, that's smart.
00:28:28.240 | And then I saw a couple of other people
00:28:29.880 | following up on that too.
00:28:31.920 | - Yeah, I put my about page in there
00:28:33.720 | and like, yeah, if there are things
00:28:35.600 | that I'm not comfortable with, I should remove it,
00:28:37.440 | so that it can pick it up.
00:28:38.800 | - Right, I think that the personal hype machine
00:28:40.800 | was like pretty viral one.
00:28:44.160 | I think like people uploaded their dreams
00:28:46.480 | and like some people like keep sort of dream journals
00:28:48.680 | and it like would sort of comment on those
00:28:52.040 | and like it was therapeutic.
00:28:53.600 | - I didn't see those, those are good.
00:28:55.920 | I hear from Googlers all the time,
00:28:58.160 | especially 'cause we launched it internally first.
00:29:00.840 | And I think we launched it during the Q3
00:29:04.600 | sort of like check-in cycle.
00:29:06.480 | So all Googlers have to write notes about like,
00:29:08.920 | hey, what'd you do in Q3?
00:29:11.600 | And what Googlers were doing is they would write
00:29:14.280 | whatever they accomplished in Q3
00:29:16.360 | and then they would create an audio overview.
00:29:18.840 | And these people that I didn't know
00:29:20.200 | would just ping me and be like, wow,
00:29:22.080 | like, I feel really good like going into a meeting
00:29:24.320 | with my manager.
00:29:25.160 | And I was like, good, good, good, good.
00:29:27.200 | You really did that, right?
00:29:28.360 | (laughs)
00:29:29.200 | - I think another cool one is just like any Wikipedia article
00:29:33.000 | like you drop it in and it's just like suddenly
00:29:35.400 | like the best sort of summary overview.
00:29:38.320 | - I think that's what Karpathy did, right?
00:29:40.160 | Like he has now a Spotify channel
00:29:42.240 | called "Histories of Mysteries,"
00:29:44.720 | which is basically like he just took like interesting stuff
00:29:47.560 | from Wikipedia and made audio overviews out of it.
00:29:50.560 | - Yeah, he became a podcaster overnight.
00:29:52.360 | - Yeah, I'm here for it.
00:29:54.200 | I fully support him.
00:29:55.840 | I'm racking up the listens for him.
00:29:58.400 | - Honestly, it's useful even without the audio.
00:30:00.560 | You know, I feel like the audio does add an element to it,
00:30:03.240 | but I always want, you know, paired audio and text.
00:30:06.080 | And it's just amazing to see
00:30:07.560 | what people are organically discovering.
00:30:09.480 | I feel like it's because you laid the groundwork
00:30:11.680 | with NotebookLM and then you came in
00:30:13.680 | and added the sort of TTS portion
00:30:16.080 | and made it so good, so human, which is weird.
00:30:19.200 | Like it's this engineering process of humans.
00:30:21.080 | Oh, one thing I wanted to ask.
00:30:22.440 | Do you have evals?
00:30:23.480 | - Yeah. - Yes.
00:30:24.560 | - What?
00:30:25.400 | - Potatoes for chefs.
00:30:26.720 | (laughing)
00:30:27.560 | - What is that?
00:30:28.400 | What do you mean potatoes?
00:30:29.560 | - Oh, sorry, sorry.
00:30:30.400 | We were joking with this like a couple of weeks ago.
00:30:33.440 | We were doing like side-by-sides,
00:30:34.840 | but like Usama sent me the file
00:30:36.360 | and it was literally called "Potatoes for Chefs."
00:30:39.040 | And I was like, you know, my job is really serious,
00:30:42.080 | but like- - It's kind of funny.
00:30:43.400 | - You have to laugh a little bit.
00:30:45.000 | Like the title of the file was like "Potatoes for Chefs."
00:30:47.680 | - Was it like a training document for chefs?
00:30:50.360 | - It was just a side-by-side
00:30:52.360 | for like two different kind of audio transcripts.
00:30:54.920 | - The question is really like, as you iterate,
00:30:57.400 | the typical engineering advice
00:30:59.160 | is you establish some kind of tests or a benchmark.
00:31:02.920 | You're at like 30%.
00:31:03.920 | You want to get it up to 90, right?
00:31:05.280 | - Yeah.
00:31:06.120 | - What does that look like for making something sound human
00:31:08.520 | and interesting and voice?
00:31:11.040 | - We have the sort of formal eval process as well,
00:31:13.440 | but I think like for this particular project,
00:31:15.440 | we maybe took a slightly different route to begin with.
00:31:17.800 | Like there was a lot of just
00:31:19.160 | within the team listening sessions,
00:31:21.720 | a lot of like sort of like- - Dogfooding.
00:31:23.440 | - Yeah, like I think the bar that we tried to get to
00:31:27.640 | before even starting formal evals
00:31:30.200 | with raters and everything was much higher
00:31:32.680 | than I think other projects would.
00:31:34.240 | Like, 'cause that's, as you said,
00:31:35.360 | like the traditional advice, right?
00:31:36.360 | Like get that ASAP.
00:31:37.520 | Like, what are you looking to improve on?
00:31:40.040 | Whatever benchmark it is.
00:31:41.480 | So there was a lot of just like critical listening.
00:31:44.280 | And I think a lot of making sure
00:31:47.000 | that those improvements actually could go into the model
00:31:49.880 | and like we're happy with that human element of it.
00:31:53.040 | And then eventually we had to obviously distill those down
00:31:55.440 | into an eval set, but like still there's like,
00:31:57.520 | the team is just like a very, very like avid user
00:32:00.960 | of the product at all stages.
00:32:02.920 | - I think you just have to be really opinionated.
00:32:05.040 | I think that sometimes if you are,
00:32:07.760 | your intuition is just sharper
00:32:10.080 | and you can move a lot faster on the product
00:32:12.560 | because it's like, if you hold that bar high, right?
00:32:14.960 | Like if you think about like the iterative cycle,
00:32:17.240 | it's like, hey, we could take like six months
00:32:20.200 | to ship this thing, to get it to like mid where we were,
00:32:23.640 | or we could just like listen to this and be like,
00:32:25.040 | yeah, that's not it, right?
00:32:26.280 | And I don't need a rater to tell me that.
00:32:28.080 | That's my preference, right?
00:32:29.240 | And collectively, like if I have two other people
00:32:31.040 | listen to it, they'll probably agree.
00:32:33.320 | And it's just kind of this step of like,
00:32:35.040 | just keep improving it to the point where you're like,
00:32:37.200 | okay, now I think this is really impressive.
00:32:39.840 | And then like do evals, right?
00:32:42.240 | And then validate that.
00:32:43.280 | - Was the sound model done and frozen
00:32:45.560 | before you started doing all this?
00:32:46.720 | Or are you also saying,
00:32:48.640 | hey, we need to improve the sound model as well?
00:32:50.680 | - Both, yeah.
00:32:51.880 | We were making improvements on the audio
00:32:54.280 | and just like generating the transcript as well.
00:32:58.600 | I think another weird thing here was like,
00:33:00.880 | we need it to be entertaining
00:33:02.360 | and that's much harder to quantify
00:33:04.000 | than some of the other benchmarks that you can make
00:33:06.640 | for like, you know, SWE-bench or get better at this math.
00:33:10.480 | - Do you just have people rate one to five
00:33:11.840 | or, you know, or just thumbs up and down?
00:33:14.000 | - For the formal rater evals,
00:33:15.440 | we have sort of like a Likert scale
00:33:17.080 | and like a bunch of different dimensions there.
00:33:19.000 | But we had to sort of break down
00:33:20.840 | that what makes it entertaining
00:33:22.640 | into like a bunch of different factors.
00:33:24.240 | But I think the team stage of that was more critical.
00:33:27.760 | It was like, we need to make sure
00:33:29.760 | that like what is making it fun and engaging.
00:33:32.000 | Like we dialed that as far as it goes.
00:33:34.160 | And while we're making other changes that are necessary,
00:33:36.640 | like obviously they shouldn't make stuff up
00:33:38.560 | or, you know, be insensitive. - Hallucinations.
00:33:40.920 | - Hallucinations. - Other safety things.
00:33:43.440 | - Right, like a bunch of safety stuff.
00:33:45.160 | - Yeah, exactly.
00:33:46.000 | So like with all of that,
00:33:47.520 | and like also just, you know,
00:33:48.840 | following sort of a coherent narrative
00:33:51.040 | and structure is really important.
00:33:52.880 | But like with all of this,
00:33:53.840 | we really had to make sure that that central tenet
00:33:57.080 | of being entertaining and engaging
00:33:59.560 | and something you actually want to listen to,
00:34:01.160 | it just doesn't go away,
00:34:02.040 | which takes like a lot of just active listening time
00:34:04.200 | 'cause you're closest to the prompts,
00:34:06.120 | the model and everything.
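The Likert-scale rater setup described above, with "entertaining" broken down into separate dimensions plus hard safety checks, could be represented roughly like this. The dimension names are guesses based on what is mentioned in the conversation, not the actual rubric.

```python
# Sketch of a rater eval record: a 1-5 Likert rating per dimension, plus hard
# safety flags (hallucinations, insensitivity) that veto the output outright.

from dataclasses import dataclass, field
from statistics import mean
from typing import Dict

DIMENSIONS = ["entertaining", "coherent_narrative", "grounded_in_sources",
              "natural_delivery", "follows_deep_dive_structure"]

@dataclass
class RaterScore:
    ratings: Dict[str, int] = field(default_factory=dict)  # 1-5 Likert
    safety_flags: int = 0            # hallucinations, insensitivity, etc.

    def passes(self, floor: float = 4.0) -> bool:
        if self.safety_flags:
            return False
        return mean(self.ratings.get(d, 0) for d in DIMENSIONS) >= floor

example = RaterScore({"entertaining": 5, "coherent_narrative": 4,
                      "grounded_in_sources": 4, "natural_delivery": 5,
                      "follows_deep_dive_structure": 4})
print(example.passes())  # True
```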
00:34:07.320 | - I think sometimes the difficulty is
00:34:09.400 | because we're dealing with non-deterministic models,
00:34:12.440 | sometimes you just got a bad roll of the dice
00:34:14.600 | and it's always on the distribution
00:34:15.960 | that you could get something bad.
00:34:17.600 | Basically, how many, do you like do 10 runs at a time?
00:34:20.840 | And then how do you get rid of the non-determinism?
00:34:23.400 | - Right, yeah.
00:34:24.320 | That's-- - Like bad luck.
00:34:25.640 | - Yeah, yeah, yeah.
00:34:26.480 | I mean, there still will be like bad audio overviews.
00:34:29.440 | There's like a bunch of them that happens.
00:34:31.880 | - Do you mean for like the rater evals?
00:34:33.360 | - For raters, right?
00:34:34.400 | Like what if that one person
00:34:35.800 | just got like a really bad rating?
00:34:37.120 | You actually had a great prompt.
00:34:38.600 | You actually had a great model, great weights, whatever.
00:34:40.800 | And you just, you had a bad output.
00:34:42.520 | Like, and that's okay, right?
00:34:44.120 | - I actually think like the way that these are constructed,
00:34:48.160 | if you think about like the different types of controls
00:34:50.880 | that the user has, right?
00:34:51.720 | Like what can the user do today to affect it?
00:34:54.520 | - We push a button. - Just use your sources.
00:34:55.760 | You just push a button.
00:34:56.600 | - I have tried to prompt engineer by changing the title.
00:34:58.840 | - Yeah, yeah, yeah.
00:34:59.960 | - Changing the title of the notebook,
00:35:02.720 | people have found out
00:35:04.160 | you can add show notes, right?
00:35:05.560 | You can get them to think like the show has changed
00:35:07.960 | sort of fundamentally. - Someone changed the language
00:35:08.800 | of the output.
00:35:09.640 | - Changing the language of the output.
00:35:10.960 | Like those are less well-tested
00:35:13.240 | because we focused on like this one aspect.
00:35:16.160 | So it did change the way that we sort of think
00:35:18.720 | about quality as well, right?
00:35:20.240 | So it's like quality is on the dimensions of entertainment,
00:35:24.080 | of course, like consistency, groundedness.
00:35:26.920 | But in general, does it follow the structure
00:35:29.200 | of the deep dive?
00:35:30.720 | And I think when we talk about like non-determinism,
00:35:33.440 | it's like, well, as long as it follows like the structure
00:35:36.040 | of the deep dive, right?
00:35:37.120 | It sort of inherently meets all those other qualities.
00:35:39.960 | And so it makes it a little bit easier for us
00:35:42.520 | to ship something with confidence
00:35:44.440 | to the extent that it's like,
00:35:45.280 | I know it's gonna make a deep dive.
00:35:46.440 | It's gonna make a good deep dive.
00:35:47.560 | Whether or not the person likes it, I don't know.
00:35:49.800 | But as we expand to new formats, as we open up controls,
00:35:53.640 | I think that's where it gets really much harder,
00:35:55.840 | even with the show notes, right?
00:35:56.840 | Like people don't know what they're going to get
00:35:58.280 | when they do that.
00:35:59.320 | And we see that already where it's like,
00:36:00.880 | this is gonna be a lot harder to validate
00:36:03.480 | in terms of quality,
00:36:04.720 | where now we'll get a greater distribution.
00:36:06.320 | Whereas I don't think we really got a very wide distribution
00:36:09.400 | because of like that pre-process
00:36:10.760 | that Usama was talking about.
00:36:12.080 | And also because of the way that we'd constrain,
00:36:14.120 | like what were we measuring for?
00:36:16.000 | Literally just like, is it a deep dive?
00:36:18.160 | - And you determine what a deep dive is.
00:36:19.920 | - Yeah.
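As a rough illustration of an eval that only asks "is it a deep dive?", here is a sketch of a structural rubric; the individual checks (an opening hook, alternating hosts, no self-naming, a sign-off) are assumptions made for the example, not the team's actual criteria:

```python
# Illustrative structural checks; the real rubric is not public.
CHECKS = {
    "has_opening_hook": lambda t: t.lower().startswith("welcome to your deep dive"),
    "hosts_alternate": lambda t: t.count("HOST_A:") >= 3 and t.count("HOST_B:") >= 3,
    "no_self_naming": lambda t: "my name is" not in t.lower(),
    "has_signoff": lambda t: "stay curious" in t.lower(),
}

def structure_report(transcript: str) -> dict[str, bool]:
    """Run every check and report pass/fail per dimension."""
    return {name: bool(check(transcript)) for name, check in CHECKS.items()}

def structure_score(transcript: str) -> float:
    """Fraction of structural checks passed; 1.0 means 'it is a deep dive'."""
    report = structure_report(transcript)
    return sum(report.values()) / len(report)
```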
00:36:20.760 | - Everything needs a PM.
00:36:21.600 | I have, this is very similar
00:36:24.280 | to something I've been thinking about for AI products
00:36:25.840 | in general.
00:36:26.680 | There's always like a chief tastemaker.
00:36:28.120 | And for Notebook LM,
00:36:29.320 | it seems like it's a combination of you and Steven.
00:36:31.400 | - Well, okay.
00:36:32.240 | I want to take a step back.
00:36:33.440 | - And Usama.
00:36:34.280 | I mean, presumably for the voice stuff.
00:36:35.640 | - Usama's like the head chef, right?
00:36:38.320 | Of like deep dive, I think.
00:36:39.640 | - Potatoes.
00:36:40.480 | - Of potatoes.
00:36:41.640 | And I say this because I think even though
00:36:44.960 | we are already a very opinionated team
00:36:46.600 | and Steven, for sure, very opinionated,
00:36:48.600 | I think of the audio generations,
00:36:50.840 | like Usama was the most opinionated, right?
00:36:53.400 | And we all, we all like would say like,
00:36:55.240 | "Hey," I remember like one of the first ones he sent me,
00:36:57.280 | I was like, "Oh, I feel like
00:36:58.120 | "they should introduce themselves.
00:36:59.360 | "I feel like they should say a title."
00:37:01.080 | But then like, we would catch things like,
00:37:03.160 | maybe they shouldn't say their names.
00:37:04.640 | - Yeah, they don't say their names.
00:37:05.880 | - That was a Steven catch.
00:37:07.040 | - Yeah, yeah.
00:37:07.880 | - Like not give them names.
00:37:08.720 | - So stuff like that is just like,
00:37:10.480 | we all injected like a little bit of just like,
00:37:13.400 | "Hey, here's like my take on like how a podcast should be."
00:37:16.280 | Right, and I think like if you're a person
00:37:18.160 | who like regularly listens to podcasts,
00:37:19.880 | there's probably some collective preference there
00:37:23.320 | that's generic enough that you can standardize
00:37:24.960 | into like the deep dive format.
00:37:26.280 | But yeah, it's the new formats where I think like,
00:37:28.360 | "Oh, that's the next test."
00:37:29.760 | - Yeah, I've tried to make a clone by the way.
00:37:31.760 | Of course, everyone did.
00:37:32.720 | - Yeah.
00:37:33.560 | - Everyone in AI was like, "Oh no, this is so easy.
00:37:35.120 | "I'll just take a TTS model."
00:37:36.400 | Obviously our models are not as good as yours,
00:37:38.520 | but I tried to inject a consistent character backstory,
00:37:41.960 | like age, identity, where they went to work,
00:37:45.120 | where they went to school, what their hobbies are.
00:37:47.040 | Then it just, the models try to bring it in too much.
00:37:49.800 | I don't know if you tried this.
00:37:51.280 | So then I'm like, "Okay, like how do I define a personality
00:37:54.400 | "but it doesn't keep coming up every single time?"
00:37:57.840 | - Yeah, I mean, we have like a really, really good
00:38:00.520 | like character designer on our team.
00:38:02.320 | - What?
00:38:03.160 | Like a D&D person?
00:38:05.080 | - Just to say like we, just like we had to be opinionated
00:38:07.480 | about the format, we had to be opinionated
00:38:09.560 | about who are those two people talking.
00:38:11.680 | - Okay.
00:38:12.520 | - Right, and then to the extent that like
00:38:14.520 | you can design the format,
00:38:16.040 | you should be able to design the people as well.
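For a concrete picture of what "designing the people" could look like, here is a hedged sketch that treats each speaker as a structured persona injected into a system prompt, with the backstory explicitly held back so it flavors the speech without being recited; the fields and wording are illustrative assumptions, not the team's actual character designs:

```python
from dataclasses import dataclass

@dataclass
class Persona:
    role: str            # e.g. "curious generalist host"
    speaking_style: str  # pacing, interjections, attitude
    backstory: str       # context the model may draw on but must not recite

HOSTS = [
    Persona("curious generalist host",
            "warm, fast, asks follow-up questions, reacts audibly",
            "former science teacher who reaches for analogies"),
    Persona("grounded explainer",
            "measured, summarizes, adds caveats",
            "ex-journalist who fact-checks everything"),
]

def persona_prompt(hosts: list[Persona]) -> str:
    lines = ["Generate a two-host audio conversation."]
    for i, h in enumerate(hosts, 1):
        lines.append(f"Speaker {i}: {h.role}. Style: {h.speaking_style}.")
        # The backstory shapes word choice but is never stated outright.
        lines.append(f"  Background (never mention explicitly): {h.backstory}")
    lines.append("The speakers have no names and never introduce themselves.")
    return "\n".join(lines)

print(persona_prompt(HOSTS))
```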
00:38:18.560 | - Yeah, I would love like a, you know,
00:38:20.760 | like when you play Baldur's Gate,
00:38:22.040 | like you roll like 17 on charisma
00:38:24.920 | and like it's like what race they are, I don't know.
00:38:27.240 | - I recently, actually, I was just talking
00:38:28.640 | about character select screens.
00:38:29.960 | - Yeah.
00:38:30.800 | - I was like, I love that. - People spend hours on that.
00:38:31.640 | - I love that, right?
00:38:32.680 | And I was like, maybe there's something to be learned there
00:38:35.920 | because like people have fallen in love with the deep dive
00:38:39.600 | as a format, as a technology,
00:38:42.280 | but also as just like those two personas.
00:38:44.640 | Now, when you hear a deep dive and you've heard them,
00:38:46.520 | you're like, "I know those two," right?
00:38:48.800 | And people, it's so funny when I,
00:38:50.440 | when people are trying to find out their names,
00:38:51.960 | like it's a worthy task, it's a worthy goal.
00:38:55.400 | I know what you're doing.
00:38:56.520 | But the next step here is to sort of introduce like,
00:38:59.280 | is this like what people want?
00:39:00.560 | People want to sort of edit their personas
00:39:02.880 | or do they just want more of them?
00:39:04.240 | - I'm sure you're getting a lot of opinions
00:39:06.400 | and they all conflict with each other.
00:39:08.280 | Before we move on, I have to ask,
00:39:09.920 | because we're kind of on this topic,
00:39:11.800 | how do you make audio engaging?
00:39:13.600 | Because it's useful, not just for deep dive,
00:39:15.640 | but also for us as podcasters.
00:39:17.760 | What does engaging mean?
00:39:20.120 | If you could break it down for us, that'd be great.
00:39:22.160 | - I mean, I can try.
00:39:23.520 | Don't claim to be an expert at all.
00:39:26.000 | - So I'll give you some, like variation in tone and speed.
00:39:30.560 | You know, there's this sort of writing advice where,
00:39:32.880 | you know, this sentence is five words,
00:39:34.360 | this sentence is three, that kind of advice
00:39:36.160 | where you vary things, you have excitement,
00:39:38.240 | you have laughter, all that stuff.
00:39:39.920 | But I'd be curious how else you break down.
00:39:42.000 | - So there's the basics, like obviously structure
00:39:43.960 | that can't be meandering, right?
00:39:45.200 | Like there needs to be sort of an ultimate goal
00:39:48.360 | that the voices are trying to get to, human or artificial.
00:39:51.880 | I think one thing we find often
00:39:53.760 | is if there's just too much agreement between people,
00:39:58.000 | like that's not fun to listen to.
00:40:00.600 | So there needs to be some sort of tension and buildup,
00:40:04.000 | you know, withholding information, for example.
00:40:06.200 | Like as you listen to a story unfold,
00:40:09.240 | like you're gonna learn more and more about it.
00:40:11.240 | And in audio, that maybe becomes even more important
00:40:13.880 | because like you actually don't have the ability
00:40:16.680 | to just like skim to the end of something
00:40:18.480 | when you're driving or something,
00:40:19.600 | like you're gonna be hooked.
00:40:21.160 | 'Cause like there's, and that's how like,
00:40:22.680 | that's how a lot of podcasts work.
00:40:24.760 | Like maybe not interviews necessarily,
00:40:26.280 | but a lot of true crime,
00:40:28.120 | a lot of entertainment in general.
00:40:30.640 | There's just like a gradual unrolling of information.
00:40:33.880 | And that also like sort of goes back
00:40:35.360 | to the content transformation aspect of it.
00:40:37.120 | Like maybe you are going from,
00:40:39.160 | let's say the Wikipedia article of like,
00:40:41.760 | one of the history of mysteries, maybe episodes,
00:40:44.200 | like the Wikipedia article is gonna state out
00:40:46.040 | the information very differently.
00:40:47.440 | It's like, here's what happened,
00:40:48.560 | would probably be in the very first paragraph.
00:40:52.080 | And one approach we could have done is like,
00:40:53.440 | maybe a person's just narrating that thing.
00:40:56.320 | And maybe that would work for like a certain audience.
00:40:59.440 | Or I guess that's how I would picture
00:41:01.160 | like a standard history lesson to unfold.
00:41:04.040 | But like, because we're trying to put it
00:41:05.600 | in this two-person dialogue format,
00:41:08.080 | like we inject like the fact that, you know,
00:41:10.680 | there's, you don't give everything at first.
00:41:13.480 | And then you set up like differing opinions
00:41:16.080 | of the same topic or the same,
00:41:17.880 | like maybe you seize on a topic and go deeper into it
00:41:21.080 | and then try to bring yourself back out of it
00:41:23.280 | and go back to the main narrative.
00:41:25.840 | So that's mostly from like the setting up
00:41:27.960 | the script perspective.
00:41:29.880 | And then the audio, I was saying earlier,
00:41:32.280 | it's trying to be as close to just human speech as possible,
00:41:37.280 | I think was what we found success with so far.
00:41:40.240 | - Yeah.
00:41:41.080 | Like with interjections, right?
00:41:41.920 | Like, I think like when you listen to two people talk,
00:41:43.840 | there's a lot of like, yeah, yeah, right.
00:41:45.960 | And then there's like a lot of like that questioning,
00:41:47.760 | like, oh yeah, really?
00:41:49.720 | What did you think?
00:41:50.560 | - I noticed that, that's great.
00:41:52.840 | - Totally.
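The qualities described above (a clear destination, a gradual reveal, mild tension, varied pacing, interjections) could plausibly be encoded as script-writing guidelines fed to a model; this is a sketch under those assumptions, not the actual prompt used for Audio Overviews:

```python
# Guidelines distilled from the conversation above; purely illustrative.
ENGAGEMENT_GUIDELINES = [
    "Open with a hook and hint at where the conversation is ultimately headed.",
    "Reveal information gradually; do not front-load the conclusion.",
    "Keep mild tension: the hosts should question and push back, not just agree.",
    "Vary sentence length and pacing; mix short reactions with longer explanations.",
    "Use natural interjections ('right', 'oh really?', 'wait, what?').",
    "Resolve the tension at the end and tie back to the listener's sources.",
]

def build_script_prompt(source_summary: str) -> str:
    """Assemble a dialogue-writing prompt from the guidelines and the sources."""
    rules = "\n".join(f"- {g}" for g in ENGAGEMENT_GUIDELINES)
    return (
        "Write a two-host dialogue about the material below.\n"
        f"{rules}\n\nMaterial:\n{source_summary}"
    )
```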
00:41:53.680 | - Like, so my question is,
00:41:56.760 | do you pull in speech experts to do this
00:41:59.600 | or did you just come up with it yourselves?
00:42:01.920 | You can be like, okay, talk to a whole bunch
00:42:03.960 | of fiction writers to make things engaging
00:42:06.160 | or comedy writers or whatever, stand up comedy, right?
00:42:08.520 | They have to make audio engaging.
00:42:10.360 | But audio as well, like there's professional fields
00:42:12.560 | of studying where people do this for a living,
00:42:15.600 | but us as AI engineers are just making this up as we go.
00:42:19.800 | - I mean, it's a great idea, but you definitely didn't.
00:42:22.400 | - Yeah.
00:42:23.240 | - No, I'm just like, oh.
00:42:24.360 | - My guess is you didn't.
00:42:25.640 | - Yeah.
00:42:26.480 | - There's a certain appeal to authority that people have.
00:42:28.280 | They're like, oh, like you can't do this
00:42:29.600 | 'cause you don't have any experience
00:42:31.000 | like making engaging audio,
00:42:33.280 | but that's what you literally did.
00:42:35.440 | - Right, I mean, I was literally chatting
00:42:37.160 | with someone at Google earlier today
00:42:39.840 | about how some people think that like,
00:42:41.760 | you need a linguistics person in the room
00:42:43.920 | for like making a good chatbot,
00:42:45.720 | but that's not actually true.
00:42:46.680 | 'Cause like this person went to school for linguistics
00:42:49.400 | and he's an engineer now,
00:42:51.320 | but according to him, like most of his classmates
00:42:53.360 | were not actually good at language.
00:42:55.320 | Like they knew how to analyze language
00:42:58.280 | and like sort of the mathematical patterns
00:43:00.000 | and rhythms and language,
00:43:01.560 | but that doesn't necessarily mean
00:43:02.840 | they were gonna be eloquent at like,
00:43:05.400 | while speaking or writing.
00:43:06.800 | So I think, yeah, we haven't invested a lot
00:43:09.760 | in specialists in the audio format yet,
00:43:12.720 | but maybe that would help.
00:43:13.760 | - I think it's like super interesting
00:43:15.040 | because I think there's like a very human question
00:43:17.800 | of like what makes something interesting.
00:43:19.880 | And there's like a very deep question of like,
00:43:23.680 | what is it, right?
00:43:25.080 | Like, what is the quality that we are all looking for?
00:43:27.760 | Is it, does somebody have to be funny?
00:43:29.400 | Does something have to be entertaining?
00:43:30.720 | Does something have to be straight to the point?
00:43:32.800 | And I think when you try to distill that,
00:43:35.040 | this is the interesting thing I think
00:43:36.440 | about our experiment, about this particular launch is,
00:43:38.840 | first, we only launched one format.
00:43:41.000 | And so we sort of had to squeeze everything we believed
00:43:44.480 | about what an interesting thing is into one package.
00:43:48.080 | And as a result of it, I think we learned,
00:43:49.480 | it's like, hey, interacting with a chatbot
00:43:52.000 | is sort of novel at first, but it's not interesting, right?
00:43:55.520 | It's like humans are what makes interacting
00:43:58.440 | with chatbots interesting.
00:43:59.840 | It's like, ha, ha, ha, I'm gonna try to trick it.
00:44:01.880 | It's like, that's interesting, spell strawberry, right?
00:44:04.640 | This is like the fun that like people have with it.
00:44:06.960 | But like, that's not the LLM being interesting, that's you,
00:44:09.560 | just like kind of giving it your own flavor.
00:44:11.680 | But it's like, what does it mean to sort of flip it
00:44:14.160 | on its head and say, no, you be interesting now, right?
00:44:17.240 | Like you give the chatbot the opportunity to do it.
00:44:20.240 | And this is not a chatbot per se, it is like just the audio.
00:44:24.520 | And it's like the texture, I think,
00:44:26.560 | that really brings it to life.
00:44:28.600 | And it's like the things that we've described here,
00:44:30.440 | which was like, okay, now I have to like lead you
00:44:32.760 | down a path of information about like
00:44:34.880 | this commercialization deck.
00:44:36.560 | It's like, how do you do that?
00:44:38.640 | To be able to successfully do it,
00:44:40.560 | I do think that you need experts.
00:44:42.640 | I think we'll engage with experts like down the road,
00:44:45.360 | but I think it will have to be in the context of,
00:44:48.200 | well, what's the next thing we're building, right?
00:44:50.360 | It's like, what am I trying to change here?
00:44:52.240 | What do I fundamentally believe needs to be improved?
00:44:55.200 | And I think there's still like a lot more studying
00:44:57.640 | that we have to do in terms of like,
00:44:59.040 | well, what are people actually using this for?
00:45:01.320 | And we're just in such early days.
00:45:03.000 | Like it hasn't even been a month.
00:45:04.680 | - Two, three weeks, three weeks, I think.
00:45:06.880 | - Yeah.
00:45:07.720 | - I think the other, one other element to that is the,
00:45:10.280 | like the fact that you're bringing your own sources to it.
00:45:13.200 | Like it's your stuff.
00:45:14.520 | Like, you know this somewhat well,
00:45:16.600 | or you care to know about this.
00:45:18.040 | So like that, I think changed the equation
00:45:20.200 | on its head as well.
00:45:21.280 | It's like your sources and someone's telling you about it.
00:45:24.480 | So like you care about how that dynamic is,
00:45:27.200 | but you just care for it to be good enough
00:45:29.120 | to be entertaining.
00:45:30.200 | 'Cause ultimately they're talking about
00:45:31.480 | your mortgage deed or whatever.
00:45:33.760 | - So it's interesting just from the topic itself,
00:45:36.640 | even taking out all the agreements
00:45:38.560 | and the hiding of the slow reveal.
00:45:40.520 | - I mean, there's a baseline maybe,
00:45:42.000 | like if it was like too drab,
00:45:43.440 | like if it was someone who was reading it off,
00:45:44.840 | like, you know, that's like the absolute worst, but like.
00:45:47.680 | - Do you prompt for humor?
00:45:49.680 | That's a tough one, right?
00:45:51.880 | - I think it's more of a generic way
00:45:55.720 | to bring humor out if possible.
00:45:57.520 | I think humor is actually one of the hardest things.
00:45:59.640 | - Yeah.
00:46:00.480 | - But I don't know if you saw.
00:46:01.320 | - That is AGI, humor is AGI.
00:46:02.160 | - Yeah, but did you see the chicken one?
00:46:03.880 | - No.
00:46:04.720 | - Okay, if you haven't heard it.
00:46:05.920 | - We'll splice it in here.
00:46:06.800 | - Okay, yeah, yeah.
00:46:07.640 | There is a video on threads.
00:46:10.040 | I think it was by Martino Wong.
00:46:12.880 | And it's a PDF.
00:46:16.360 | - Welcome to your deep dive for today.
00:46:18.400 | - Oh yeah, get ready for a fun one.
00:46:20.160 | - Buckle up because we are diving into
00:46:24.120 | chicken, chicken, chicken, chicken, chicken.
00:46:28.640 | - You got that right.
00:46:29.760 | - By Doug Zonker.
00:46:31.080 | - Now.
00:46:31.920 | - And yes, you heard that title correctly.
00:46:33.720 | - Titles.
00:46:34.560 | - Our listener today submitted this paper.
00:46:36.480 | - Yeah, they're gonna need our help.
00:46:37.840 | - And I can totally see why.
00:46:39.320 | - Absolutely.
00:46:40.160 | - It's dense, it's baffling.
00:46:41.760 | - It's a lot.
00:46:42.600 | - And it's packed with more chicken than a KFC buffet.
00:46:46.560 | - Wait, that's hilarious, that's so funny.
00:46:49.880 | So it's like stuff like that,
00:46:51.200 | that's like truly delightful, truly surprising,
00:46:53.600 | but it's like, we didn't tell it to be funny.
00:46:55.240 | - Humor's contextual also, like super contextual
00:46:57.880 | what we're realizing.
00:46:58.720 | So we're not prompting for humor,
00:47:00.200 | but we're prompting for maybe a lot of other things
00:47:02.440 | that are bringing out that humor.
00:47:03.920 | - I think the thing about AI-generated content,
00:47:06.320 | if we look at YouTube, like we do videos on YouTube
00:47:09.040 | and it's like, you know, a lot of people are screaming
00:47:11.200 | in the thumbnails to get clicks.
00:47:12.640 | There's like everybody, there's kind of like a meta
00:47:15.720 | of like what you need to do to get clicks.
00:47:18.120 | But I think in your product,
00:47:19.640 | there's no actual creator on the other side
00:47:22.480 | investing the time.
00:47:23.320 | So you can actually generate a type of content
00:47:25.400 | that is maybe not universally appealing,
00:47:28.280 | you know, at a much. - It's personal.
00:47:29.800 | - Yeah, exactly.
00:47:30.800 | I think that's the most interesting thing.
00:47:32.200 | It's like, well, is there a way for like,
00:47:35.280 | take Mr. Beast, right?
00:47:36.800 | It's like Mr. Beast optimizes videos
00:47:39.560 | to reach the biggest audience and like the most clicks.
00:47:42.320 | But what if every video could be kind of like regenerated
00:47:45.280 | to be closer to your taste, you know, when you watch it?
00:47:48.640 | - I think that's kind of the promise of AI
00:47:50.840 | that I think we are just like touching on,
00:47:53.240 | which is I think every time I've gotten information
00:47:56.280 | from somebody, they have delivered it to me
00:47:57.840 | in their preferred method, right?
00:47:59.440 | Like if somebody gives me a PDF, it's a PDF.
00:48:01.600 | Somebody gives me a hundred slide deck,
00:48:03.280 | that is the format in which I'm going to read it.
00:48:05.280 | But I think we are now living in the era
00:48:07.280 | where transformations are really possible,
00:48:09.280 | which is look, like I don't want to read
00:48:11.400 | your hundred slide deck,
00:48:12.280 | but I'll listen to a 16 minute audio overview
00:48:14.320 | on the drive home.
00:48:15.400 | - Yeah.
00:48:16.240 | - And that I think is really novel.
00:48:18.920 | And that is paving the way in a way
00:48:21.560 | that like maybe we wanted, but didn't expect.
00:48:25.120 | Where I also think you're listening to a lot of content
00:48:28.480 | that normally wouldn't have had content made about it.
00:48:31.360 | Like I watched this TikTok
00:48:32.880 | where this woman uploaded her diary from 2004.
00:48:36.240 | For sure, right?
00:48:37.080 | Like nobody was going to make a podcast about a diary.
00:48:39.280 | Like hopefully not, like it seems kind of embarrassing.
00:48:41.560 | - It's kind of creepy.
00:48:42.400 | - Yeah, it's kind of creepy.
00:48:43.520 | But she was doing this like live listen of like,
00:48:45.600 | "Oh, like here's a podcast about my diary."
00:48:48.120 | And it's like, it's entertaining right now
00:48:50.520 | to sort of all listen to it together.
00:48:52.520 | But like the connection is personal.
00:48:54.040 | It was like, it was her interacting
00:48:55.760 | with like her information in a totally different way.
00:48:58.120 | And I think that's where like,
00:48:59.440 | oh, that's a super interesting space, right?
00:49:01.080 | Where it's like, I'm creating content for myself
00:49:03.760 | in a way that suits the way that I want to consume it.
00:49:06.520 | - Or people compare like retirement plan options.
00:49:09.360 | Like no one's going to give you that content
00:49:11.520 | like for your personal financial situation.
00:49:14.880 | And like, even when we started out the experiment,
00:49:16.640 | like a lot of the goal was to go for really obscure content
00:49:21.640 | and see how well we could transform that.
00:49:23.720 | So like, if you look at the Mountain View,
00:49:25.520 | like city council meeting notes,
00:49:27.800 | like you're never going to read it.
00:49:29.200 | But like, if it was a three minute summary,
00:49:31.160 | like that would be interesting.
00:49:32.720 | - I see.
00:49:33.560 | You have one system, one prompt
00:49:35.360 | that just covers everything you threw at it.
00:49:37.720 | - Maybe.
00:49:38.800 | - No, I'm just kidding.
00:49:41.000 | It's really interesting.
00:49:41.880 | You know, I'm trying to figure out
00:49:43.840 | what you nailed compared to others.
00:49:46.440 | And I think that the way that you treat your,
00:49:48.760 | the AI is like a little bit different
00:49:50.400 | than a lot of the builders I talked to.
00:49:52.200 | So I don't know what it is you said.
00:49:54.120 | I wish I had a transcript right in front of me,
00:49:55.640 | but it's something like,
00:49:56.600 | people treat AI as like a tool for thought,
00:49:58.160 | but usually it's kind of doing their bidding.
00:50:00.800 | And you know, what you're really doing
00:50:02.600 | is loading up these like two virtual agents.
00:50:06.080 | I don't, you've never said the word agents,
00:50:07.880 | I put that in your mouth,
00:50:08.800 | but two virtual humans or AIs
00:50:11.000 | and letting them form their own opinion
00:50:13.200 | and letting them kind of just live
00:50:15.000 | and embody it a little bit.
00:50:16.560 | Is that accurate?
00:50:17.800 | - I think that that is as close to accurate as possible.
00:50:21.560 | I mean, in general, I try to be careful about saying like,
00:50:23.800 | oh, you know, letting, you know, yeah,
00:50:26.040 | like these personas live.
00:50:27.560 | But I think to your earlier question of like,
00:50:29.920 | what makes it interesting?
00:50:30.920 | That's what it takes to make it interesting.
00:50:32.360 | - Yeah.
00:50:33.200 | - Right, and I think to do it well
00:50:34.360 | is like a worthy challenge.
00:50:35.560 | I also think that it's interesting
00:50:37.840 | because they're interested, right?
00:50:39.320 | Like, is it interesting to compare-
00:50:41.200 | - The O'Carnegie thing.
00:50:42.200 | - Yeah, is it interesting to have two retirement plans?
00:50:46.320 | No, but to listen to these two talk about it,
00:50:50.120 | oh my gosh, you'd think it was like
00:50:51.320 | the best thing ever invented, right?
00:50:52.800 | It's like, get this, deep dive into 401k
00:50:57.800 | through Chase versus, you know, whatever.
00:51:00.480 | - They do do a lot of get this, which is funny.
00:51:02.920 | - I know, I know, I dream about it.
00:51:04.760 | I'm sorry.
00:51:06.920 | - There's a, I have a few more questions
00:51:10.520 | on just like the engineering around this.
00:51:13.600 | And obviously some of this is just me
00:51:15.520 | creatively asking how this works.
00:51:17.240 | How do you make decisions between
00:51:18.600 | when to trust the AI overlord to decide for you?
00:51:22.960 | In other words, stick it, let's say products as it is today,
00:51:26.640 | you want to improve it in some way.
00:51:28.920 | Do you engineer it into the system?
00:51:30.960 | Like write code to make sure it happens
00:51:34.080 | or you just stick it in a prompt
00:51:35.160 | and hope that the LM does it for you?
00:51:38.160 | Do you know what I mean?
00:51:39.000 | - Do you mean specifically about audio
00:51:40.400 | or sort of in general?
00:51:41.440 | - In general, like designing AI products,
00:51:44.120 | I think this is like the one thing
00:51:45.440 | that people are struggling with.
00:51:48.000 | And there's compound AI people
00:51:50.040 | and then there's big AI people.
00:51:51.320 | So compound AI people will be like Databricks,
00:51:53.320 | have lots of little models, chain them together
00:51:55.720 | to make an output.
00:51:56.720 | It's deterministic, you control every single piece
00:51:59.000 | and you produce what you produce.
00:52:01.160 | The OpenAI people, totally the opposite,
00:52:03.200 | like write one giant prompt
00:52:04.480 | and let the model figure it out.
00:52:06.040 | And obviously the answer for most people
00:52:07.880 | is going to be a spectrum in between those two,
00:52:09.680 | like big model, small model.
00:52:10.840 | When do you decide that?
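A minimal sketch of the two ends of that spectrum, using a hypothetical call_model stub in place of any real LLM API; the step names are arbitrary and only meant to illustrate the structural difference:

```python
def call_model(prompt: str) -> str:
    """Stand-in for any LLM call; provider and parameters don't matter here."""
    return f"<model output for: {prompt[:40]}...>"

# Compound style: several small, controllable steps chained together.
def compound_pipeline(sources: str) -> str:
    outline = call_model(f"Outline the key points in:\n{sources}")
    script = call_model(f"Expand this outline into a two-host dialogue:\n{outline}")
    return call_model(f"Polish the dialogue for pacing and interjections:\n{script}")

# Big-model style: one giant prompt, let the model figure it out.
def monolithic_prompt(sources: str) -> str:
    return call_model(
        "Read the sources below and produce a polished, engaging "
        f"two-host dialogue in one shot:\n{sources}"
    )
```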
00:52:11.840 | - I think it depends on the task.
00:52:13.560 | It also depends on, well, it depends on the task,
00:52:16.120 | but ultimately depends on what is your desired outcome?
00:52:19.200 | Like what am I engineering for here?
00:52:21.600 | And I think there's like several potential outputs
00:52:23.640 | and there's sort of like general categories.
00:52:24.880 | Am I trying to delight somebody?
00:52:26.080 | Am I trying to just like meet
00:52:27.200 | whatever the person is trying to do?
00:52:29.000 | Am I trying to sort of simplify a workflow?
00:52:31.120 | At what layer am I implementing this?
00:52:32.920 | Am I trying to implement this as part of the stack
00:52:35.680 | to reduce like friction,
00:52:37.840 | particularly for like engineers or something?
00:52:39.760 | Or am I trying to engineer it
00:52:40.840 | so that I deliver like a super high quality thing?
00:52:44.080 | I think that the question of like, which of those two,
00:52:47.120 | I think you're right, it is a spectrum.
00:52:49.160 | But I think fundamentally it comes down to like,
00:52:52.320 | it's a craft, like it's still a craft
00:52:54.680 | as much as it is a science.
00:52:56.360 | And I think the reality is like,
00:52:58.480 | you have to have a really strong POV
00:53:00.200 | about like what you want to get out of it
00:53:02.240 | and to be able to make that decision.
00:53:04.080 | Because I think if you don't have that strong POV,
00:53:06.240 | like you're going to get lost in sort of the detail
00:53:07.960 | of like capability.
00:53:09.440 | And capability is sort of the last thing that matters
00:53:12.360 | because it's like models will catch up, right?
00:53:14.360 | Like models will be able to do, you know,
00:53:16.280 | whatever in the next five years, it's going to be insane.
00:53:18.880 | So I think this is like a race to like value.
00:53:21.600 | And it's like really having a strong opinion about like,
00:53:24.480 | what does that look like today?
00:53:25.720 | And how far are you going to be able to push it?
00:53:28.080 | Sorry, I think maybe that was like very like philosophical.
00:53:31.120 | - It's fine, we get there.
00:53:32.520 | And I think that hits a lot of the points it's going to make.
00:53:35.320 | I tweeted today, or I ex-posted, whatever,
00:53:38.840 | that we're going to interview you
00:53:41.040 | on what we should ask you.
00:53:42.120 | So we got a list of feature requests, mostly.
00:53:45.400 | It's funny, nobody actually had any like specific questions
00:53:48.840 | about how the product was built.
00:53:50.000 | They just want to know when you're releasing some feature.
00:53:52.280 | So I know you cannot talk about all of these things,
00:53:54.760 | but I think maybe it will give people an idea
00:53:56.720 | of like where the product is going.
00:53:58.120 | So I think the most common question,
00:53:59.960 | I think five people asked is like,
00:54:01.560 | are you going to build an API?
00:54:03.280 | And, you know, do you see this product
00:54:05.320 | as still being kind of like a full-fledged product,
00:54:07.840 | where I can log in and do everything there?
00:54:09.920 | Or do you want it to be a piece of infrastructure
00:54:12.000 | that people build on?
00:54:13.120 | - I mean, I think, why not both?
00:54:15.920 | I think we work at a place where you could have both.
00:54:18.840 | I think that end user products,
00:54:21.440 | like products that touch the hands of users,
00:54:23.320 | have a lot of value.
00:54:24.640 | For me personally, like we learn a lot
00:54:26.200 | about what people are trying to do
00:54:27.400 | and what's like actually useful
00:54:29.040 | and what people are ready for.
00:54:30.920 | And so we're going to keep investing in that.
00:54:33.640 | I think at the same time, right,
00:54:35.520 | there are a lot of developers that are interested
00:54:37.840 | in using the same technology to build their own thing.
00:54:40.120 | We're going to look into that.
00:54:41.720 | How soon that's going to be ready, I can't really comment,
00:54:44.080 | but these are the things that like, hey, we heard it.
00:54:47.120 | We're trying to figure it out.
00:54:48.440 | And I think there's room for both.
00:54:50.360 | - Is there a world in which this becomes
00:54:52.240 | a default Gemini interface
00:54:53.520 | because it's technically a different org?
00:54:55.280 | - It's such a good question.
00:54:56.480 | And I think every time someone asks me, it's like,
00:54:58.600 | hey, I just lean over to the Gemini team.
00:55:00.640 | (laughing)
00:55:02.200 | We'll ask the Gemini folks what they think.
00:55:05.200 | - Multilingual support.
00:55:06.640 | I know people kind of hack this a little bit together.
00:55:09.040 | Any ideas for full support,
00:55:10.840 | but also I'm mostly interested in dialects.
00:55:13.440 | In Italy, we have Italian obviously,
00:55:15.720 | but we have a lot of local dialects.
00:55:17.440 | Like if you go to Rome, people don't really speak Italian,
00:55:19.600 | they speak local dialect.
00:55:21.240 | Do you think there's a path to which these models,
00:55:24.240 | especially the speech can learn very like niche dialects,
00:55:28.440 | like how much data do you need?
00:55:29.960 | Can people contribute?
00:55:31.120 | Like, I'm curious if you see this as a possibility.
00:55:35.560 | - So I guess high level,
00:55:36.800 | like we're definitely working on adding more languages.
00:55:39.640 | That's like top priority.
00:55:41.240 | We're going to start small,
00:55:42.560 | but like theoretically we should be able to cover
00:55:44.600 | like most languages pretty soon.
00:55:46.840 | - What a ridiculous statement by the way, that's crazy.
00:55:49.360 | - Like, the soon or the pretty soon part?
00:55:52.480 | - No, but like, you know, a few years ago,
00:55:54.680 | like a small team of like, I don't know, 10 people saying
00:55:57.120 | that we will support the top 100, 200 languages
00:55:59.640 | is like absurd, but you can do it.
00:56:02.400 | You can do it.
00:56:03.240 | - And I think like the speech team, you know,
00:56:06.320 | we are a small team,
00:56:07.720 | but the speech team is another team and the modeling team,
00:56:11.080 | like these folks are just like absolutely brilliant
00:56:13.920 | at what they do.
00:56:14.760 | And I think like when we've talked to them and we've said,
00:56:16.760 | hey, you know, how about more languages?
00:56:18.600 | How about more voices?
00:56:19.560 | How about dialects, right?
00:56:20.640 | This is something that like they are game to do.
00:56:23.600 | And like, that's the roadmap for them.
00:56:25.840 | The speech team supports like a bunch of other efforts
00:56:27.920 | across Google, like Gemini Live, for example,
00:56:30.120 | is also the models built by the same,
00:56:32.040 | like sort of deep mind speech team.
00:56:34.160 | But yeah, the thing about dialects is really interesting.
00:56:36.320 | 'Cause like in some of our sort of earliest testing
00:56:39.560 | with trying out other languages,
00:56:40.800 | we actually noticed that sometimes it wouldn't stick
00:56:44.200 | to a certain dialect, especially for like,
00:56:46.720 | I think for French, we noticed that like
00:56:48.440 | when we presented it to like a native speaker,
00:56:50.280 | it would sometimes go from like a Canadian person
00:56:52.440 | speaking French versus like a French person speaking French
00:56:54.880 | or an American person speaking French,
00:56:56.280 | which is not what we wanted.
00:56:58.560 | So there's a lot more sort of speech quality work
00:57:01.360 | that we need to do there to make sure that it works reliably
00:57:04.240 | and at least sort of like the standard dialect that we want.
00:57:07.680 | But that does show that there's potential
00:57:09.360 | to sort of do the thing that you're talking about
00:57:11.240 | of like fixing a dialect that you want,
00:57:13.720 | maybe contribute your own voice
00:57:15.680 | or like you pick from one of the options.
00:57:17.920 | There's a lot more headroom there.
00:57:19.760 | - Yeah, because we have movies.
00:57:21.280 | Like we have old Roman movies
00:57:23.120 | that are like different languages,
00:57:25.040 | but there's not that many, you know?
00:57:26.920 | So I'm always like, well,
00:57:28.320 | I'm sure like the Italian is so strong in the model
00:57:31.480 | that like when you're trying to like pull that away from it,
00:57:33.720 | like you kind of need a lot, but-
00:57:35.440 | - Right, that's all sort of like
00:57:36.960 | wonderful deep mind speech team.
00:57:38.640 | - Yeah. - Yeah, yeah, yeah.
00:57:39.880 | - Well, anyway, if you need Italian, he's got you.
00:57:41.520 | - Yeah, yeah, yeah.
00:57:42.360 | - I got him on, I got him on.
00:57:44.200 | - Specifically, it's English, I got you.
00:57:46.200 | Managing the system prompt, people want a lot of that.
00:57:49.400 | I assume yes-ish.
00:57:51.200 | Definitely looking into it for just core notebook LM.
00:57:55.600 | Like everybody's wanted that forever.
00:57:57.360 | So we're working on that.
00:57:58.560 | I think for the audio itself,
00:58:01.080 | we are trying to figure out the best way to do it.
00:58:03.760 | So we'll launch something sooner rather than later.
00:58:06.840 | So we'll probably stage it.
00:58:08.280 | And I think like, you know, just to be fully transparent,
00:58:10.840 | we'll probably launch something
00:58:12.320 | that's more of a fast follow
00:58:13.480 | than like a fully baked feature first.
00:58:15.120 | Just because like I see so many people
00:58:16.720 | put in like the fake show notes,
00:58:18.480 | it's like, hey, I'll help you out.
00:58:19.720 | We'll just put a text box or something, yeah.
00:58:21.560 | - I think a lot of people are like, this is almost perfect,
00:58:23.800 | but like, I just need that extra 10, 20%.
00:58:25.960 | - Yeah.
00:58:26.800 | - I noticed that you say no a lot, I think,
00:58:29.160 | or you try to ship one thing.
00:58:30.840 | - Yeah.
00:58:31.680 | - And that is different about you
00:58:33.200 | than maybe other PMs or other eng teams
00:58:35.840 | that try to ship, they're like, oh, here are all the knobs.
00:58:38.080 | I'm just, take all my knobs.
00:58:39.800 | - Yeah, yeah.
00:58:40.640 | - Top P, top K, it doesn't matter.
00:58:42.160 | I'll just put it in the docs and you figure it out, right?
00:58:44.120 | - That's right, that's right.
00:58:45.200 | - Whereas for you, it's you actually just,
00:58:47.600 | you make one product.
00:58:49.280 | - Yeah.
00:58:50.120 | - As opposed to like 10 you could possibly have done.
00:58:51.640 | - Yeah, yeah.
00:58:52.480 | - I don't know, it's interesting.
00:58:53.320 | - I think about this a lot.
00:58:54.160 | I think it requires a lot of discipline
00:58:55.440 | because I thought about the knobs.
00:58:57.760 | I was like, oh, I saw on Twitter, you know, on X,
00:59:01.000 | people want the knobs, like, great.
00:59:02.600 | Started mocking it up, making the text boxes,
00:59:05.040 | designing like the little fiddles, right?
00:59:06.960 | And then I looked at it and I was kind of sad.
00:59:08.720 | I was like, oh, right, it's like, oh, it's like,
00:59:11.040 | this is not cool, this is not fun, this is not magical.
00:59:14.080 | It is sort of exactly what you would expect knobs to be.
00:59:19.080 | But then, you know, it's like, oh, I mean,
00:59:21.520 | how much can you, you know, design a knob?
00:59:24.520 | I thought about it, I was like,
00:59:26.120 | but the thing that people really liked
00:59:27.960 | was that there wasn't any.
00:59:29.600 | They just pushed a button.
00:59:30.440 | - One button.
00:59:31.280 | - And it was cool.
00:59:32.240 | And so I was like, how do we bring more of that, right?
00:59:34.920 | That still gives the user the optionality that they want.
00:59:37.760 | And so this is where, like,
00:59:38.680 | you have to have a strong POV, I think.
00:59:40.520 | You have to like really boil down,
00:59:42.040 | what did I learn in like the month
00:59:43.920 | since I've launched this thing that people really want?
00:59:47.120 | And I can give it to them while preserving like that,
00:59:49.800 | that delightful sort of fun experience.
00:59:52.400 | And I think that's actually really hard.
00:59:54.120 | Like, I'm not gonna come up with that by myself.
00:59:55.800 | I'm like, that's something
00:59:56.640 | that like our team thinks about every day.
00:59:58.200 | We all have different ideas.
00:59:59.760 | We're all experimenting with sort of how to get the most
01:00:03.200 | out of like the insight and also ship it quick.
01:00:05.720 | So we'll see, we'll find out soon
01:00:07.760 | if people like it or not.
01:00:08.600 | - I think the other interesting thing
01:00:09.800 | about like AI development now
01:00:12.000 | is that the knobs are not necessarily,
01:00:15.080 | like going back to all the sort of like craft
01:00:18.240 | and like human taste and all of that
01:00:20.560 | that went into building it.
01:00:21.880 | Like the knobs are not as easy to add as simply like,
01:00:26.400 | I'm gonna add a parameter to this
01:00:28.560 | and it's gonna make it happen.
01:00:29.880 | It's like, you kind of have to redo
01:00:31.880 | the quality process for everything.
01:00:34.200 | But the prioritization is also different though.
01:00:36.800 | - It goes back to sort of like,
01:00:38.040 | it's a lot easier to do an eval
01:00:39.480 | for like the deep dive format than if like,
01:00:41.720 | okay, now I'm gonna let you inject
01:00:43.600 | like these random things, right?
01:00:45.160 | Okay, how am I gonna measure quality?
01:00:46.600 | Either I say, well, I don't care
01:00:48.520 | because like you just input whatever.
01:00:50.880 | Or I say, actually wait, right?
01:00:53.000 | Like I wanna help you get the best output ever.
01:00:54.920 | What's it going to take?
01:00:55.960 | - The knob actually needs to work reliably.
01:00:58.160 | - Yeah. - Yeah.
01:00:59.000 | Very important point.
01:01:00.000 | - Two more things we definitely wanna talk about.
01:01:02.160 | I guess now people equivalent notebook LM
01:01:05.040 | to like a podcast generator,
01:01:06.520 | but I guess, you know,
01:01:08.040 | there's a whole product suite there.
01:01:10.320 | How should people think about that?
01:01:11.800 | Like, is this, and also like the future of the product
01:01:14.720 | as far as monetization too, you know?
01:01:16.840 | Like, is it gonna be,
01:01:18.560 | the voice thing gonna be a core to it?
01:01:20.160 | Is it just gonna be one output modality
01:01:22.160 | and like you're still looking to build like a broader
01:01:24.400 | kind of like a interface with data and documents platform?
01:01:27.840 | - I mean, that's such a good question
01:01:29.960 | that I think the answer it's,
01:01:32.560 | I'm waiting to get more data.
01:01:34.640 | I think because we are still in the period
01:01:36.600 | where everyone's really excited about it.
01:01:38.600 | Everyone's trying it.
01:01:40.080 | I think I'm getting a lot of sort of like positive feedback
01:01:42.760 | on the audio.
01:01:43.960 | We have some early signal that says it's a really good hook,
01:01:47.240 | but people stay for the other features.
01:01:49.040 | So that's really good too.
01:01:50.360 | I was making a joke yesterday.
01:01:51.400 | I was like, it'd be really nice, you know,
01:01:53.560 | if it was just the audio,
01:01:55.720 | 'cause then I could just like simplify the train, right?
01:01:58.360 | I don't have to think about all this other functionality.
01:02:00.800 | But I think the reality is that the framework
01:02:03.960 | kind of like what we were talking about earlier
01:02:05.440 | that we had laid out,
01:02:06.280 | which is like, you bring your own sources,
01:02:07.880 | there's something you do in the middle,
01:02:09.240 | and then there's an output, is a really extensible one.
01:02:12.120 | And it's a really interesting one.
01:02:13.320 | And I think like, particularly when we think about
01:02:16.080 | what a big business looks like,
01:02:17.960 | especially when we think about commercialization,
01:02:20.280 | audio is just one such modality.
01:02:23.520 | But the editor itself,
01:02:25.080 | like the space in which you're able to do these things
01:02:27.800 | is like, that's the business, right?
01:02:29.480 | Like maybe the audio by itself, not so much,
01:02:32.160 | but like in this big package,
01:02:33.640 | like, oh, I could see that.
01:02:34.480 | I could see that being like a really big business.
01:02:37.240 | - Yep.
01:02:38.080 | Any thoughts on some of the alternative
01:02:40.160 | interact-with-data-and-documents things,
01:02:42.080 | like Claude Artifacts, like ChatGPT Canvas,
01:02:45.640 | you know, kind of how do you see,
01:02:47.280 | maybe where Notebook LM ends,
01:02:48.760 | but like Gemini starts,
01:02:50.520 | like you have so many amazing teams and products at Google
01:02:53.280 | that sometimes like, I'm sure you have to figure that out.
01:02:56.200 | - Yeah, well, I love artifacts.
01:02:59.480 | I played a little bit with canvas.
01:03:00.720 | I got a little dizzy using it.
01:03:02.120 | I was like, oh, there's something, well, you know,
01:03:04.600 | I like the idea of it fundamentally,
01:03:06.960 | but something about the UX was like,
01:03:08.400 | oh, this is like more disorienting than like artifacts.
01:03:11.000 | And I couldn't figure out what it was.
01:03:12.440 | And I didn't spend a lot of time thinking about it,
01:03:14.600 | but I love that, right?
01:03:16.640 | Like the thing where you are like,
01:03:18.480 | I'm working with, you know, an LLM, an agent,
01:03:21.560 | a chatbot or whatever to create something new.
01:03:24.280 | And there's like the chat space.
01:03:26.200 | There's like the output space.
01:03:27.880 | I love that.
01:03:28.760 | And the thing that I think I feel angsty about
01:03:31.600 | is like, we've been talking about this for like a year,
01:03:34.680 | right?
01:03:35.520 | Like, of course, like, I'm going to say that,
01:03:36.720 | but it's like, but like for a year now,
01:03:38.640 | I've had these like mocks that I was just like,
01:03:40.880 | I want to push the button, but we prioritize other things.
01:03:43.920 | We were like, okay, what can we like really win at?
01:03:46.200 | And like, we prioritize audio, for example, instead of that.
01:03:49.440 | But just like when people were like,
01:03:51.160 | oh, what is this magic draft thing?
01:03:52.560 | Oh, it's like a hundred percent, right?
01:03:54.120 | It's like stuff like that,
01:03:55.880 | that we want to try to build into notebook too.
01:03:57.560 | And I'd made this comment on Twitter as well,
01:03:59.880 | where I was like, now I don't know, actually, right?
01:04:02.720 | I don't actually know if that is the right thing.
01:04:05.240 | Like, are people really getting utility out of this?
01:04:07.680 | I mean, from the launches,
01:04:09.080 | it seems like people are really getting it.
01:04:11.000 | But I think now if we were to ship it,
01:04:12.960 | I have to rev on it like one layer more, right?
01:04:15.120 | I have to deliver like a differentiating value
01:04:17.800 | compared to like artifacts, which is hard.
01:04:20.200 | - Which is, because you've,
01:04:21.880 | you demonstrated the ability to fast follow.
01:04:24.160 | So you don't have to innovate every single time.
01:04:26.680 | - I know, I know.
01:04:27.520 | I think for me, it's just like,
01:04:28.960 | the bar is high to ship.
01:04:30.760 | And when I say that, I think it's sort of like,
01:04:32.480 | conceptually, like the value that you deliver to the user.
01:04:34.640 | I mean, you'll see in Notebook LM,
01:04:36.160 | There are a lot of corners that I have personally cut,
01:04:38.560 | where it's like, our UX designer is always like,
01:04:40.920 | I can't believe you let us ship
01:04:42.880 | with like these ugly scroll bars.
01:04:44.440 | And I'm like, no one notices, I promise.
01:04:47.120 | He's like, no, everyone
01:04:48.880 | screenshots this thing.
01:04:50.160 | But I mean, kidding aside, I think that's true,
01:04:52.800 | that it's like, we do want to be able to fast follow,
01:04:54.960 | but I think we want to make sure
01:04:56.280 | that things also land really well.
01:04:58.120 | So the utility has to be there.
01:04:59.720 | - Code, especially on our podcast, has a special place.
01:05:03.160 | Is a code Notebook LM interesting to you?
01:05:06.160 | I haven't, I've never,
01:05:07.280 | I don't see like a connect my GitHub to this thing.
01:05:09.880 | - Yeah, yeah.
01:05:10.720 | I think code is a big one.
01:05:12.560 | Code is a big one.
01:05:13.600 | I think we have been really focused,
01:05:15.800 | especially when we had like a much smaller team,
01:05:17.960 | we were really focused on like,
01:05:18.920 | let's push like an end-to-end journey together.
01:05:21.200 | Let's prove that we can do that.
01:05:22.720 | Because then once you lay the groundwork of like,
01:05:24.880 | sources, do something in the chat,
01:05:27.200 | output, once you have that,
01:05:28.360 | you just scale it up from there, right?
01:05:30.080 | And it's like, now it's just a matter of like,
01:05:31.840 | scaling the inputs, scaling the outputs,
01:05:33.880 | scaling the capabilities of the chat.
01:05:35.800 | So I think we're going to get there.
01:05:37.400 | And now I also feel like I have a much better view
01:05:40.760 | of like where the investment is required.
01:05:43.160 | Whereas previously I was like,
01:05:44.320 | hey, like, let's flesh out the story first
01:05:46.160 | before we put more engineers on this thing,
01:05:47.840 | because that's just going to slow us down.
01:05:49.880 | - For what it's worth, the model still understands code.
01:05:52.280 | So like, I've seen at least one or two people
01:05:55.000 | just like, download their GitHub repo,
01:05:57.080 | put it in there and get like an audio overview of your code.
01:06:00.320 | - Yeah, yeah.
01:06:01.160 | - I've never tried that.
01:06:02.000 | - This is like, these are all,
01:06:03.360 | all the files are connected together.
01:06:04.880 | 'Cause the model still understands code.
01:06:06.280 | Like, even if you haven't like, optimized for it.
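Since the model understands code even without being optimized for it, the "drop your repo in" trick people have tried amounts to flattening a repository into a single text source; a rough sketch of that, where the file filter and layout are arbitrary choices and the upload step itself stays manual:

```python
from pathlib import Path

TEXT_EXTENSIONS = {".py", ".ts", ".go", ".rs", ".md", ".txt"}

def flatten_repo(repo_root: str, out_file: str = "repo_as_source.txt") -> None:
    """Concatenate a repo's text files into one document to upload as a source."""
    root = Path(repo_root)
    with open(out_file, "w", encoding="utf-8") as out:
        for path in sorted(root.rglob("*")):
            if path.is_file() and path.suffix in TEXT_EXTENSIONS:
                out.write(f"\n===== {path.relative_to(root)} =====\n")
                out.write(path.read_text(encoding="utf-8", errors="ignore"))

# flatten_repo("./my_project")  # then upload repo_as_source.txt as a source
```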
01:06:07.560 | - I think on sort of like the creepy side of things,
01:06:10.920 | I did watch a student, like with her permission, of course,
01:06:13.720 | I watched her do her homework in Notebook LM.
01:06:17.200 | And I didn't tell her like, what kind of homework to bring,
01:06:20.480 | but she brought like her computer science homework.
01:06:23.520 | And I was like, oh.
01:06:24.440 | And she uploaded it and she said,
01:06:26.760 | here's my homework, read it.
01:06:28.600 | And it was just the instructions.
01:06:29.800 | And Notebook LM was like, okay, I've read it.
01:06:32.760 | And the student was like, okay, here's my code so far.
01:06:37.080 | And she copy pasted it from the editor.
01:06:39.120 | And she was like, check my homework.
01:06:41.440 | And Notebook LM was like, well, number one is wrong.
01:06:44.040 | And I thought that was really interesting,
01:06:45.480 | 'cause it didn't tell her what was wrong.
01:06:46.840 | It just said it's wrong.
01:06:48.120 | And she was like, okay, don't tell me the answer,
01:06:50.720 | but like, walk me through like how you'd think about this.
01:06:53.360 | And it was, what was interesting for me
01:06:55.800 | was that she didn't ask for the answer.
01:06:58.000 | And I asked her, I was like, oh, why did you do that?
01:06:59.480 | And she was like, well, I actually want to learn it.
01:07:01.240 | She was like, 'cause I'm going to have to take a quiz
01:07:02.520 | on this at some point.
01:07:03.520 | And I was like, oh yeah, this is a really good point.
01:07:05.920 | And it was interesting because, you know,
01:07:07.920 | Notebook LM, while the formatting wasn't perfect,
01:07:09.880 | like did say like, hey, have you thought about using,
01:07:12.560 | you know, maybe an integer instead of like this?
01:07:14.760 | And so that was really interesting.
01:07:16.960 | - Are you adding like real-time chat on the output?
01:07:19.880 | Like, you know, there's kind of like the deep dive show
01:07:22.400 | and then there's like the listeners call in and say, hey.
01:07:26.400 | - Yeah, we're actively, that's one of the things
01:07:28.040 | we're actively prioritizing.
01:07:29.560 | Actually, one of the interesting things is now we're like,
01:07:31.880 | why would anyone want to do that?
01:07:33.560 | Like, what are the actual, like kind of going back
01:07:35.960 | to sort of having a strong POV about the experience.
01:07:38.760 | It's like, what is better?
01:07:41.040 | Like, what is fundamentally better about doing that?
01:07:43.040 | That's not just like being able to Q&A your notebook.
01:07:45.400 | How is that different from like a conversation?
01:07:47.480 | Is it just the fact that like there was a show
01:07:50.120 | and you want to tweak the show?
01:07:51.880 | Is it because you want to participate?
01:07:53.760 | So I think there's a lot there
01:07:55.120 | that like we can continue to unpack, but yes,
01:07:57.400 | that's coming.
01:07:58.240 | - It's because I formed a parasocial relationship.
01:08:00.640 | - Yeah, I just want to be part of your life.
01:08:03.720 | - Get this.
01:08:04.560 | - Totally.
01:08:07.240 | - Yeah, but it is obviously because OpenAI
01:08:09.720 | has just launched a real-time chat.
01:08:11.080 | It's a very hot topic.
01:08:12.800 | I would say one of the toughest AI engineering disciplines
01:08:16.320 | out there because even their API
01:08:19.320 | doesn't do interruptions that well,
01:08:21.840 | to be honest, you know. Yeah.
01:08:23.640 | So real-time chat is tough.
01:08:25.280 | - I love that thing.
01:08:26.120 | I love it, yeah.
01:08:27.200 | - Okay, so we have a couple of ways to end,
01:08:30.320 | either call to action or laying out one principle
01:08:33.040 | of AI PMing or engineering that you really
01:08:36.280 | think about a lot.
01:08:37.240 | Is there anything that comes to mind?
01:08:39.240 | - I feel like that's a test.
01:08:40.080 | Of course, I'm going to say go to notebooklm.google.com.
01:08:43.760 | Try it out, join the Discord and tell us what you think.
01:08:46.720 | - Yeah, especially like you have a technical audience.
01:08:49.240 | What do you want from a technical engineering audience?
01:08:52.360 | - I mean, I think it's interesting
01:08:54.240 | because the technical and engineering audience
01:08:55.960 | typically will just say, "Hey, where's the API?"
01:08:58.440 | But you know, and I think we addressed it.
01:09:00.080 | But I think what I would really be interested to discover
01:09:03.240 | is, is this useful to you?
01:09:05.160 | Why is it useful?
01:09:06.000 | What did you do?
01:09:06.960 | Right, is it useful tomorrow?
01:09:08.160 | How about next week?
01:09:09.000 | Just the most useful thing for me is if you do stop using it
01:09:12.240 | or if you do keep using it, tell me why.
01:09:14.240 | Because I think contextualizing it within your life,
01:09:16.880 | right, your background, your motivations,
01:09:19.440 | like is what really helps me build really cool things.
01:09:22.280 | - And then one piece of advice for AI PMs.
01:09:24.640 | - Okay, if I had to pick one, it's just always be building.
01:09:28.400 | Like build things yourself.
01:09:29.360 | I think like for PMs, it's like such a critical skill
01:09:32.200 | and just like take time to like pop your head up
01:09:34.800 | and see what else is new out there.
01:09:36.480 | On the weekends, I try to have a lot of discipline.
01:09:38.680 | Like I only use ChatGPT and like Claude on the weekend.
01:09:41.880 | I try to like use like the APIs.
01:09:44.080 | Occasionally I'll try to build something
01:09:46.000 | on like GCP over the weekend,
01:09:47.680 | 'cause like I don't do that normally like at work.
01:09:50.480 | But it's just like the rigor of just trying
01:09:53.160 | to be like a builder yourself.
01:09:55.400 | And even just like testing, right?
01:09:56.600 | Like you can have an idea of like how a product should work
01:09:59.040 | and maybe your engineers are building it.
01:10:00.720 | But it's like, what was your like proof of concept, right?
01:10:03.280 | Like what gave you conviction that that was the right thing?
01:10:06.080 | - Call to action.
01:10:07.000 | - I feel like consistently like the most magical moments
01:10:10.840 | out of like AI building come about for me
01:10:13.800 | when like I'm really, really, really just close
01:10:16.440 | to the edge of the model capability.
01:10:19.120 | And sometimes it's like farther than you think it is.
01:10:21.360 | Like I think while building this product,
01:10:23.560 | some of the other experiments,
01:10:24.560 | like there were phases where it was like easy
01:10:26.160 | to think that you've like approached it.
01:10:28.240 | But like sometimes at that point,
01:10:29.600 | what you really need is to like show your thing to someone
01:10:32.120 | and like they'll come up with creative ways to improve it.
01:10:34.800 | Like we're all sort of like learning, I think.
01:10:37.400 | So yeah, like I feel like unless you're hitting
01:10:39.400 | that bound of like, this is what Gemini 1.5 can do,
01:10:43.720 | probably like the magic moment is like somewhere there,
01:10:46.120 | like in that sort of limit.
01:10:48.520 | - So push the edge of the capability.
01:10:50.560 | - Yeah, totally.
01:10:51.880 | - It's funny because we had Nicholas Carlini
01:10:54.120 | from DeepMind on the pod.
01:10:55.640 | And he was like, if the model is always successful,
01:10:57.880 | you're probably not trying hard enough
01:10:59.560 | to like give it hard.
01:11:00.400 | - Right.
01:11:01.240 | - So yeah.
01:11:03.160 | - My problem is like sometimes I'm not smart enough
01:11:05.200 | to judge.
01:11:06.040 | - Yeah, right.
01:11:06.880 | (laughing)
01:11:08.880 | - I think like that's, I hear that a lot.
01:11:11.160 | Like people are always like, I don't know how to use it.
01:11:13.600 | Yeah, and it's hard.
01:11:15.080 | Like I remember the first time I used Google search,
01:11:16.800 | I was like, what do we type?
01:11:18.080 | My dad was like, anything.
01:11:19.680 | It's like anything, I got nothing in my brain, dad.
01:11:21.720 | (laughing)
01:11:23.000 | What do you mean?
01:11:23.840 | And I think a lot of it for product builders
01:11:26.280 | is like, have a strong opinion about
01:11:28.320 | what the user is supposed to do.
01:11:30.000 | - Yeah.
01:11:30.840 | - Help them do it.
01:11:31.680 | - A principle for AI engineers, or just one piece
01:11:35.520 | of advice that you have for others?
01:11:36.760 | - I guess like, in addition to pushing the bounds
01:11:39.440 | and to do that, that often means like,
01:11:41.400 | you're not gonna get it right in the first go.
01:11:43.880 | So like, don't be afraid to just like,
01:11:46.320 | batch multiple models together.
01:11:49.360 | I guess that's, I'm basically describing an agent,
01:11:51.600 | but more thinking time equals
01:11:53.720 | just better results consistently.
01:11:55.520 | And that holds true for probably every single time
01:11:59.560 | that I've tried to build something.
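Chaining several model calls so the system spends more time thinking before it answers can be sketched very simply. The example below is only an illustration of that pattern, not the Notebook LM pipeline: it reuses the `google-generativeai` SDK from the earlier sketch, and the choice of models, prompts, and a single draft/critique/revise pass are all assumptions.

```python
# Illustrative draft -> critique -> revise chain: more inference-time work per
# request, traded for (usually) a better final answer.
import os
import google.generativeai as genai

genai.configure(api_key=os.environ["GOOGLE_API_KEY"])  # assumes the key is set
drafter = genai.GenerativeModel("gemini-1.5-flash")  # fast first pass
reviser = genai.GenerativeModel("gemini-1.5-pro")    # stronger second pass

def answer(question: str) -> str:
    draft = drafter.generate_content(question).text
    critique = reviser.generate_content(
        f"Question: {question}\nDraft answer: {draft}\n"
        "Briefly list the draft's factual or logical problems."
    ).text
    final = reviser.generate_content(
        f"Question: {question}\nDraft: {draft}\nCritique: {critique}\n"
        "Write an improved final answer."
    ).text
    return final

print(answer("Why does spending more compute at inference time often help?"))
```

Each extra pass costs latency and tokens, which is the trade-off behind "more thinking time equals better results."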
01:12:01.240 | - Well, at some point we will talk about
01:12:02.840 | the sort of longer inference paradigm.
01:12:04.720 | It seems like DeepMind is rumored
01:12:06.560 | to be coming out with something.
01:12:07.760 | You can't comment, of course.
01:12:09.080 | Yeah, well, thank you so much.
01:12:10.440 | You know, you've created, I actually said,
01:12:12.760 | I think you saw this.
01:12:13.880 | I think that Notebook LM was kind of like
01:12:15.800 | the ChatGPT moment for Google.
01:12:17.920 | - Yeah, that was so crazy when I saw that.
01:12:19.720 | I was like, what?
01:12:20.560 | Like ChatGPT was huge for me.
01:12:22.400 | And I think, you know, when you said it
01:12:24.680 | and other people have said it, I was like, is it?
01:12:27.600 | - Yeah.
01:12:28.440 | - That's crazy, that's so cool.
01:12:29.280 | - People weren't like really cognizant
01:12:30.600 | of Notebook LM before, and audio overviews
01:12:32.880 | and Notebook LM, like, unlocked, you know,
01:12:36.960 | a use case for people in a way that
01:12:39.200 | I would go so far as to say Claude Projects never did.
01:12:41.760 | And I don't know, you know,
01:12:43.360 | I think a lot of it is competent PMing and engineering,
01:12:46.240 | but also just, you know, it's interesting how
01:12:48.680 | a lot of these projects are always
01:12:50.080 | like low key research previews.
01:12:51.960 | For you, it's like, you're a separate org,
01:12:53.480 | but like, you know, you built products and UI innovation
01:12:56.880 | on top of also working with research to improve the model.
01:12:59.920 | That was a success.
01:13:01.200 | That wasn't planned to be this whole big thing.
01:13:04.080 | You know, your TPUs were on fire, right?
01:13:06.280 | - Oh my gosh, that was so funny.
01:13:08.320 | I didn't know people would like really
01:13:09.840 | catch on to the Elmo fire,
01:13:11.640 | but it was just like one of those things
01:13:13.320 | where I was like, you know, we had to ask for more TPUs.
01:13:16.720 | Yeah, many times.
01:13:18.600 | And, you know, it was a little bit of a subtweet of like,
01:13:21.480 | hey, reminder, give us more TPUs down here.
01:13:25.000 | - It's weird.
01:13:25.840 | I just think like when people try to make big launches,
01:13:28.360 | then they flop.
01:13:29.200 | And then like when they're not trying
01:13:30.480 | and they're just trying to build a good thing,
01:13:32.680 | then they succeed.
01:13:33.640 | It's this fundamentally really weird magic
01:13:36.520 | that I haven't really encapsulated yet,
01:13:38.800 | but you've done it.
01:13:40.040 | - Thank you.
01:13:40.880 | And you know, I think we'll just keep going
01:13:43.200 | in like the same way.
01:13:44.040 | We just keep trying, keep trying to make it better.
01:13:45.760 | - Yeah, I hope so.
01:13:46.880 | All right, cool.
01:13:47.720 | Thank you.
01:13:48.560 | - Thank you. Thanks for having us.
01:13:49.400 | - Thanks.
01:13:50.240 | (upbeat music)