back to index

Building AGI in Real Time (OpenAI Dev Day 2024)


Chapters

0:00 Intro by Suno.ai
1:23 NotebookLM Recap of DevDay
9:25 Ilan's Strawberry Demo with Realtime Voice Function Calling
19:16 Olivier Godement, Head of Product, OpenAI
36:57 Romain Huet, Head of DX, OpenAI
47:08 Michelle Pokrass, API Tech Lead at OpenAI ft. Simon Willison
64:45 Alistair Pullen, CEO, Cosine (Genie)
78:31 Sam Altman + Kevin Weil Q&A
123:07 NotebookLM Recap of Podcast

Whisper Transcript | Transcript Only Page

00:00:00.000 | (upbeat music)
00:00:02.580 | ♪ A new Dev Day lights, code ignites ♪
00:00:09.660 | ♪ Real time voice streams reach new heights ♪
00:00:12.760 | ♪ O1 and GPT-4o in flight ♪
00:00:16.560 | ♪ Fine-tune the future, data in sight ♪
00:00:20.020 | ♪ Schema sync up, outputs precise ♪
00:00:23.640 | ♪ Distill the models, efficiency spliced ♪
00:00:28.680 | ♪ WebSockets blaze, connections flow ♪
00:00:31.620 | ♪ Voice AI live, watch innovation grow ♪
00:00:34.000 | - Happy October.
00:00:35.000 | This is your AI co-host, Charlie.
00:00:38.200 | One of our longest standing traditions
00:00:40.120 | is covering major AI and ML conferences in podcast format.
00:00:44.160 | Delving, yes delving, into the vibes
00:00:46.640 | of what it is like to be there stitched in
00:00:48.640 | with short samples of conversations with key players
00:00:51.600 | just to help you feel like you were there.
00:00:54.360 | Covering this year's dev day
00:00:55.880 | was significantly more challenging
00:00:57.960 | because we were all requested
00:00:59.840 | not to record the opening keynotes.
00:01:02.600 | So in place of the opening keynotes,
00:01:04.880 | we had the viral NotebookLM Deep Dive crew,
00:01:08.040 | my new AI podcast nemesis,
00:01:10.800 | give you a seven minute recap
00:01:12.640 | of everything that was announced.
00:01:15.120 | Of course, you can also check the show notes for details.
00:01:18.280 | I'll then come back with an explainer
00:01:20.000 | of all the interviews we have for you today.
00:01:22.080 | Watch out and take care.
00:01:23.920 | All right, so we've got a pretty hefty stack
00:01:27.520 | of articles and blog posts here
00:01:29.360 | all about OpenAI's dev day 2024.
00:01:33.120 | - Yeah, lots to dig into there.
00:01:34.720 | - Seems like you're really interested in what's new with AI.
00:01:37.480 | - Definitely.
00:01:38.320 | And it seems like OpenAI had a lot to announce.
00:01:41.360 | New tools, changes to the company, it's a lot.
00:01:44.080 | - It is, and especially since you're interested
00:01:46.240 | in how AI can be used in the real world,
00:01:49.200 | you know, practical applications, we'll focus on that.
00:01:51.720 | - Perfect.
00:01:52.560 | So for example, this new real-time API,
00:01:55.040 | they announced that, right?
00:01:56.560 | That seems like a big deal if we want AI to sound,
00:01:58.960 | well, less like a robot.
00:02:00.440 | - It could be huge.
00:02:01.800 | The real-time API could completely change
00:02:04.160 | how we like interact with AI.
00:02:06.360 | Like imagine if your voice assistant
00:02:07.840 | could actually handle it if you interrupted it.
00:02:09.520 | - Or like have an actual conversation.
00:02:11.360 | - Right, not just these clunky back and forth
00:02:13.440 | things we're used to.
00:02:14.720 | - And they actually showed it off, didn't they?
00:02:16.000 | I read something about a travel app, one for languages.
00:02:19.240 | Even one where the AI ordered takeout.
00:02:22.480 | Those demos were really interesting,
00:02:23.880 | and I think they show how this real-time API
00:02:26.640 | can be used in so many ways.
00:02:28.760 | And the tech behind it is fascinating, by the way.
00:02:31.600 | It uses persistent web socket connections
00:02:34.280 | and this thing called function calling,
00:02:36.520 | so it can respond in real time.
00:02:39.040 | - So the function calling thing,
00:02:40.200 | that sounds kind of complicated.
00:02:41.360 | Can you like explain how that works?
00:02:43.520 | - So imagine giving the AI access to this whole toolbox,
00:02:47.480 | right, information, capabilities, all sorts of things.
00:02:50.160 | So take the travel agent demo, for example.
00:02:52.080 | With function calling, the AI can pull up details,
00:02:55.080 | let's say about Fort Mason, right, from some database,
00:02:58.480 | like nearby restaurants, stuff like that.
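(For readers who want to see the shape of that, here is a minimal sketch of a tool definition in the chat-completions style; the function name, parameters, and the lookup itself are hypothetical illustrations, not the schema used in the demo.)

```python
# Hedged sketch: a hypothetical "nearby restaurants" lookup exposed as a tool.
# The name and fields are illustrative only, not the demo's actual schema.
tools = [
    {
        "type": "function",
        "function": {
            "name": "find_nearby_restaurants",
            "description": "Look up restaurants near a landmark such as Fort Mason.",
            "parameters": {
                "type": "object",
                "properties": {
                    "landmark": {"type": "string", "description": "Place to search around"},
                    "max_results": {"type": "integer", "description": "How many results to return"},
                },
                "required": ["landmark"],
            },
        },
    }
]

def find_nearby_restaurants(landmark: str, max_results: int = 5) -> list[dict]:
    """App-side implementation: query your own database or a places API."""
    return [{"name": "Example Bistro", "distance_m": 200}][:max_results]
```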
00:03:00.080 | - Ah, I get it.
00:03:01.680 | So instead of being limited to what it already knows,
00:03:04.200 | it can go and find the information it needs
00:03:06.680 | like a human travel agent would.
00:03:08.320 | - Precisely.
00:03:09.360 | And someone on Hacker News pointed out a cool detail.
00:03:11.840 | The API actually gives you a text version
00:03:14.760 | of what's being said, so you can store that, analyze it.
00:03:18.040 | - That's smart.
00:03:19.280 | It seems like OpenAI put a lot of thought
00:03:21.360 | into making this API easy for developers to use.
00:03:24.880 | But while we're on OpenAI, you know, besides their tech,
00:03:28.320 | there's been some news about like internal changes too.
00:03:31.360 | Didn't they say they're moving away from being a nonprofit?
00:03:33.640 | - They did.
00:03:34.480 | And it's got everyone talking, it's a major shift,
00:03:36.800 | and it's only natural for people to wonder
00:03:39.040 | how that'll change things for OpenAI in the future.
00:03:41.600 | I mean, there are definitely some valid questions
00:03:43.240 | about this move to for-profit,
00:03:44.680 | like will they have more money for research now?
00:03:47.040 | Probably.
00:03:48.040 | But will they, you know, care as much
00:03:49.600 | about making sure AI benefits everyone?
00:03:52.080 | - Yeah, that's the big question,
00:03:53.280 | especially with all the like the leadership changes
00:03:55.680 | happening at OpenAI too, right?
00:03:58.000 | I read that their chief research officer left,
00:04:01.040 | and their VP of research, and even their CTO.
00:04:04.480 | - It's true.
00:04:05.320 | A lot of people are connecting those departures
00:04:07.440 | with the changes in OpenAI's structure.
00:04:09.480 | - And I guess it makes you wonder
00:04:10.600 | what's going on behind the scenes.
00:04:12.520 | But they are still putting out new stuff.
00:04:15.720 | Like this whole fine-tuning thing really caught my eye.
00:04:18.240 | - Right, fine-tuning.
00:04:19.160 | It's essentially taking a pre-trained AI model
00:04:22.760 | and like customizing it.
00:04:24.600 | - So instead of a general AI,
00:04:26.360 | you get one that's tailored for a specific job.
00:04:28.560 | - Exactly.
00:04:29.400 | And that opens up so many possibilities,
00:04:31.480 | especially for businesses.
00:04:32.920 | Imagine you could train an AI on your company's data,
00:04:36.120 | you know, like how you communicate your brand guidelines.
00:04:38.640 | - So it's like having an AI
00:04:40.040 | that's specifically trained for your company?
00:04:42.160 | - That's the idea.
00:04:43.000 | - And they're doing it with images now too, right?
00:04:45.080 | Fine-tuning with vision is what they called it.
00:04:47.640 | It's pretty incredible what they're doing with that,
00:04:49.280 | especially in fields like medicine.
00:04:51.040 | - Like using AI to help doctors make diagnoses.
00:04:53.880 | - Exactly.
00:04:54.720 | An AI could be trained on like thousands
00:04:57.640 | of medical images, right?
00:04:59.320 | And then it could potentially spot things
00:05:02.040 | that even a trained doctor might miss.
00:05:04.040 | - That's kind of scary, to be honest.
00:05:05.920 | What if it gets it wrong?
00:05:07.560 | - Well, the idea isn't to replace doctors,
00:05:09.880 | but to give them another tool,
00:05:11.800 | you know, help them make better decisions.
00:05:13.600 | - Okay, that makes sense.
00:05:14.840 | But training these AI models must be really expensive.
00:05:18.560 | - It can be.
00:05:19.400 | All those tokens add up.
00:05:20.960 | But OpenAI announced something called
00:05:22.800 | automatic prompt caching.
00:05:24.520 | - Automatic what now?
00:05:26.000 | I don't think I came across that.
00:05:27.640 | - So basically, if your AI sees a prompt
00:05:29.840 | that it's already seen before,
00:05:31.200 | OpenAI will give you a discount.
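(As a rough illustration of what developers do on their side: the savings apply to requests that repeat a long, unchanging prefix, so keeping the static instructions first and the user-specific part last is the natural structure. A sketch using the standard Python client; the discount itself is applied automatically by OpenAI, not by anything in this code.)

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# A long, static prefix (instructions, policies, tool docs) repeated across
# requests is what a prompt cache can reuse; only the final message changes.
STATIC_SYSTEM_PROMPT = "You are a travel assistant for Fort Mason visitors. ..."

def ask(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": STATIC_SYSTEM_PROMPT},  # cacheable prefix
            {"role": "user", "content": question},                # varies per call
        ],
    )
    return response.choices[0].message.content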
00:05:33.000 | - Huh.
00:05:34.240 | Like a frequent buyer program for AI.
00:05:36.520 | - Kind of, yeah.
00:05:37.800 | It's good that they're trying to make it more affordable.
00:05:39.920 | And they're also doing something called model distillation.
00:05:42.560 | - Okay, now you're just using big words
00:05:44.440 | to sound smart.
00:05:45.280 | What's that?
00:05:46.400 | - Think of it like a recipe, right?
00:05:49.040 | You can take a really complex recipe
00:05:50.600 | and break it down to the essential parts.
00:05:52.240 | - Make it simpler, but it still tastes the same.
00:05:54.800 | - Yeah.
00:05:55.640 | And that's what model distillation is.
00:05:56.920 | You take a big, powerful AI model
00:05:59.200 | and create a smaller, more efficient version.
00:06:01.480 | - So it's like lighter weight, but still just as capable.
00:06:04.760 | - Exactly.
00:06:05.600 | And that means more people can actually use
00:06:07.560 | these powerful tools.
00:06:09.320 | They don't need like a supercomputer to run them.
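(A compressed sketch of that recipe idea in code: collect the big model's answers on your own prompts, then fine-tune a smaller model on them. It uses the standard files and fine-tuning endpoints; the exact distillation workflow announced at Dev Day may differ, so treat the model names and steps here as illustrative.)

```python
import json
from openai import OpenAI

client = OpenAI()

# 1. Collect outputs from a large "teacher" model on your own prompts.
prompts = ["Summarize our refund policy.", "Draft a reply about a shipping delay."]
examples = []
for prompt in prompts:
    answer = client.chat.completions.create(
        model="gpt-4o",  # teacher
        messages=[{"role": "user", "content": prompt}],
    ).choices[0].message.content
    examples.append({"messages": [
        {"role": "user", "content": prompt},
        {"role": "assistant", "content": answer},
    ]})

# 2. Write them in the JSONL format the fine-tuning API expects.
with open("distill.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")

# 3. Fine-tune a smaller "student" model on the teacher's outputs.
training_file = client.files.create(file=open("distill.jsonl", "rb"), purpose="fine-tune")
job = client.fine_tuning.jobs.create(model="gpt-4o-mini", training_file=training_file.id)
print(job.id)
```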
00:06:12.400 | - So they're making AI more accessible.
00:06:14.120 | That's great.
00:06:14.960 | - It is.
00:06:15.800 | And speaking of powerful tools,
00:06:17.680 | they also talked about their new O1 model.
00:06:19.880 | That's the one they've been hyping up.
00:06:21.200 | The one that's supposed to be this big leap forward.
00:06:23.520 | - Yeah, O1.
00:06:24.680 | It sounds pretty futuristic.
00:06:26.080 | Like from what I read,
00:06:27.240 | it's not just a bigger, better language model.
00:06:30.080 | - Right, it's a different approach.
00:06:31.200 | - They're saying it can like actually reason, right?
00:06:33.680 | Think differently.
00:06:34.960 | - It's trained differently.
00:06:36.160 | They used reinforcement learning with O1.
00:06:38.400 | - So it's not just finding patterns
00:06:40.240 | in the data it's seen before.
00:06:41.760 | - Not just that.
00:06:42.680 | It can actually learn from its mistakes,
00:06:45.360 | get better at solving problems.
00:06:47.960 | - So give me an example.
00:06:49.040 | What can O1 do that say GPT-4 can't?
00:06:52.800 | - Well, OpenAI showed it doing some pretty impressive stuff
00:06:55.400 | with math, like advanced math.
00:06:57.640 | - Yeah.
00:06:58.480 | - And coding too, complex coding.
00:06:59.920 | Things that even GPT-4 struggled with.
00:07:02.280 | - So you're saying if I needed to like write a screenplay,
00:07:04.520 | I'd stick with GPT-4.
00:07:05.720 | But if I wanted to solve some crazy physics problem,
00:07:08.400 | O1 is what I'd use.
00:07:09.880 | - Something like that, yeah.
00:07:11.440 | Although there is a trade-off.
00:07:13.360 | O1 takes a lot more power to run.
00:07:15.640 | And it takes longer to get those impressive results.
00:07:19.400 | - Hmm, makes sense.
00:07:20.840 | More power, more time, higher quality.
00:07:23.280 | - Exactly.
00:07:24.120 | - It sounds like it's still in development though, right?
00:07:25.720 | Is there anything else they're planning to add to it?
00:07:27.720 | - Oh yeah.
00:07:28.800 | They mentioned system prompts,
00:07:30.720 | which will let developers like set some ground rules
00:07:35.240 | for how it behaves.
00:07:36.640 | And they're working on adding structured outputs
00:07:38.840 | and function calling.
00:07:40.040 | - Wait, structured outputs.
00:07:41.560 | Didn't we just talk about that?
00:07:42.840 | - We did.
00:07:43.680 | That's the thing where the AI's output
00:07:46.400 | is formatted in a way that's easy to use, like JSON.
00:07:49.440 | - Right, right.
00:07:50.360 | So you don't have to spend all day
00:07:51.960 | trying to make sense of what it gives you.
00:07:53.520 | It's good that they're thinking about that stuff.
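(To make that concrete, a small sketch of requesting schema-constrained JSON from the chat completions endpoint; the schema itself is made up for the example.)

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical schema: we want a machine-readable order summary, not prose.
order_schema = {
    "name": "order_summary",
    "strict": True,
    "schema": {
        "type": "object",
        "properties": {
            "item": {"type": "string"},
            "quantity": {"type": "integer"},
            "total_usd": {"type": "number"},
        },
        "required": ["item", "quantity", "total_usd"],
        "additionalProperties": False,
    },
}

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "400 chocolate strawberries for $1,415.92"}],
    response_format={"type": "json_schema", "json_schema": order_schema},
)
print(response.choices[0].message.content)  # a JSON string matching the schema
```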
00:07:55.320 | - It's about making these tools usable.
00:07:57.880 | And speaking of that,
00:07:59.280 | Dev Day finished up with this really interesting talk.
00:08:01.880 | Sam Altman, the CEO of OpenAI,
00:08:04.760 | and Kevin Weil, their new chief product officer.
00:08:08.520 | They talked about like the big picture for AI.
00:08:11.080 | - Yeah, they did, didn't they?
00:08:12.280 | Anything interesting come up?
00:08:14.000 | - Well, Altman talked about moving past
00:08:15.800 | this whole AGI term, artificial general intelligence.
00:08:19.600 | - I can see why.
00:08:20.440 | It's kind of a loaded term, isn't it?
00:08:22.200 | - He thinks it's become a bit of a buzzword
00:08:24.440 | and people don't really understand what it means.
00:08:26.400 | - So are they saying
00:08:27.400 | they're not trying to build AGI anymore?
00:08:30.240 | - It's more like they're saying
00:08:31.560 | they're focused on just making AI better,
00:08:34.760 | constantly improving it,
00:08:36.160 | not worrying about putting it in a box.
00:08:37.960 | - That makes sense.
00:08:38.800 | Keep pushing the limits.
00:08:39.680 | - Exactly.
00:08:40.640 | But they were also very clear about doing it responsibly.
00:08:43.200 | They talked a lot about safety and ethics.
00:08:45.000 | - Yeah, that's important.
00:08:46.280 | - They said they were gonna be very careful
00:08:48.360 | about how they release new features.
00:08:50.120 | - Good, because this stuff is powerful.
00:08:53.160 | - It is.
00:08:54.000 | It was a lot to take in, this whole Dev Day event,
00:08:56.160 | new tools, big changes at OpenAI,
00:08:59.160 | and these big questions about the future of AI.
00:09:01.320 | - It was.
00:09:02.160 | But hopefully this deep dive
00:09:03.840 | helped make sense of some of it.
00:09:05.440 | At least that's what we try to do here.
00:09:07.240 | - Absolutely.
00:09:08.080 | - Thanks for taking the deep dive with us.
00:09:09.760 | - The biggest demo of the new real-time API
00:09:12.280 | involved function calling with voice mode
00:09:14.600 | and buying chocolate-covered strawberries
00:09:16.560 | from our friendly local OpenAI developer experience engineer
00:09:20.080 | and strawberry shop owner, Ilan Bigio.
00:09:23.080 | We'll first play you the audio of his demo
00:09:25.280 | and then go into a little interview with him.
00:09:27.360 | - Fantastic.
00:09:28.520 | Could you place a call
00:09:30.240 | and see if you could get us 400 strawberries
00:09:33.280 | delivered to the venue?
00:09:34.640 | So please keep that under $1,500.
00:09:37.160 | - I'm on it.
00:09:39.840 | We'll get those strawberries delivered for you.
00:09:42.200 | (phone ringing)
00:09:46.440 | - Hello.
00:09:49.560 | - Hi there.
00:09:52.200 | Is this Ilan?
00:09:53.040 | I'm Romain's AI assistant.
00:09:54.680 | How is it going?
00:09:55.880 | Fantastic.
00:09:56.960 | Could you tell me what flavors of strawberry dips
00:09:59.480 | you have available?
00:10:00.560 | - Yeah, we have chocolate, vanilla,
00:10:02.600 | and we have peanut butter.
00:10:04.920 | - Great.
00:10:06.400 | How much would 400 chocolate-covered strawberries cost?
00:10:10.520 | - 400?
00:10:11.360 | Are you sure you want 400?
00:10:12.560 | - Yes.
00:10:15.240 | 400 chocolate-covered strawberries, please.
00:10:17.680 | How much would that be?
00:10:19.080 | - I think that'll be around like $1,415.92.
00:10:24.080 | - Awesome.
00:10:28.040 | Let's go ahead and place the order
00:10:29.280 | for 400 chocolate-covered strawberries.
00:10:31.640 | - Great.
00:10:33.560 | When can we get it delivered?
00:10:36.120 | - Please deliver them to the Gateway Pavilion
00:10:38.120 | in Fort Mason, and I'll be paying in cash.
00:10:40.800 | (audience laughing)
00:10:43.800 | - Okay, sweet.
00:10:45.720 | So just to confirm, you want 400 chocolate-covered
00:10:48.120 | strawberries to the Gateway Pavilion?
00:10:49.960 | - Yes, that's perfect.
00:10:54.040 | And when can we expect delivery?
00:10:55.840 | - Well, you guys are right nearby,
00:10:58.280 | so it'll be like, I don't know, 37 seconds?
00:11:00.920 | (audience laughing)
00:11:03.360 | - That's incredibly fast.
00:11:04.880 | (audience laughing)
00:11:07.720 | - Cool.
00:11:08.560 | You too?
00:11:09.400 | - Sure.
00:11:10.240 | (audience applauding)
00:11:11.640 | - Hi, Ilan.
00:11:12.480 | Welcome to Latent Space.
00:11:13.880 | - Thank you.
00:11:14.720 | - Just saw your amazing demos,
00:11:15.560 | had your amazing strawberries.
00:11:17.040 | You are dressed up exactly like a strawberry salesman.
00:11:19.680 | - Gotta have a little.
00:11:21.520 | - What was building the demo like?
00:11:23.160 | What was the story behind the demo?
00:11:24.360 | - It was really interesting.
00:11:25.200 | This is actually something I had been thinking about
00:11:27.600 | for months before the launch.
00:11:29.640 | Like having an AI that can make phone calls
00:11:33.640 | is something like I've personally wanted for a long time.
00:11:35.880 | And so as soon as we launched internally,
00:11:37.640 | like I started hacking on it.
00:11:39.800 | And then that sort of just made it into like
00:11:41.720 | an internal demo.
00:11:42.840 | And then people found it really interesting.
00:11:44.400 | And then we thought how cool would it be to have this
00:11:46.560 | like on stage as one of the demos.
00:11:49.560 | - Yeah.
00:11:50.600 | Would you call out any technical issues building?
00:11:54.000 | Like you were basically one of the first people ever
00:11:56.080 | to build with the voice mode API.
00:11:58.360 | Would you call out any issues like integrating it
00:12:00.040 | with Twilio like that?
00:12:00.880 | Like you did with function calling,
00:12:02.120 | with like a form filling elements.
00:12:04.440 | I noticed that you had like intents of things to fulfill.
00:12:08.160 | And then when there was still missing info,
00:12:11.920 | the voice would prompt you role-playing the store guy.
00:12:16.280 | - Yeah, yeah.
00:12:17.120 | So I think technically there's like the whole,
00:12:19.920 | just working with audio and streams
00:12:21.960 | is a whole different piece.
00:12:23.440 | Like even separate from like AI and this like new capabilities
00:12:28.040 | it's just tough.
00:12:29.400 | Yeah, when you have a prompt,
00:12:30.560 | conversationally it'll just follow like the,
00:12:32.240 | it was set up like kind of step-by-step
00:12:34.160 | to like ask the right questions
00:12:36.320 | based on like what the request was, right?
00:12:40.760 | The function calling itself is sort of tangential to that.
00:12:43.880 | Like you have to prompt it to call the functions
00:12:45.880 | but then handling it isn't too much different
00:12:47.720 | from like what you would do with assistant streaming
00:12:50.200 | or like chat completion streaming.
00:12:52.120 | I think like the API feels very similar
00:12:54.040 | just like if everything in the API was streaming,
00:12:56.560 | it actually feels quite familiar to that.
00:12:58.160 | - And then function calling wise,
00:13:00.760 | I mean, does it work the same?
00:13:02.600 | I don't know.
00:13:03.440 | Like I saw a lot of logs.
00:13:04.560 | You guys showed like in the playground a lot of logs.
00:13:07.120 | What is in there?
00:13:08.040 | What should people know?
00:13:09.240 | - Yeah, I mean, it is like the events
00:13:12.200 | may have different names than the streaming events
00:13:15.000 | that we have in chat completions
00:13:16.200 | but they represent very similar things.
00:13:18.040 | It's things like, you know, function call started,
00:13:20.400 | argument started.
00:13:21.680 | It's like, here's like argument deltas
00:13:23.720 | and then like function call done.
00:13:26.280 | Conveniently, we send one that, like, has the full function,
00:13:29.320 | and then I just use that.
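(A rough sketch of what handling that event stream can look like on the client. The event and field names below follow the pattern he describes, argument deltas plus a final done event carrying the full arguments, and should be checked against the current Realtime API reference rather than taken as exact.)

```python
import json

# Hedged sketch: accumulate streamed argument deltas per call, or just use the
# final "done" event, which carries the complete arguments. Field names are
# illustrative and should be verified against the API reference.
pending_args: dict[str, str] = {}  # call_id -> partial JSON arguments

def handle_event(event: dict) -> None:
    etype = event.get("type", "")
    if etype.endswith("function_call_arguments.delta"):
        call_id = event["call_id"]
        pending_args[call_id] = pending_args.get(call_id, "") + event["delta"]
    elif etype.endswith("function_call_arguments.done"):
        # The done event includes the full arguments, so the deltas above are
        # optional bookkeeping (useful for showing progress in a UI).
        args = json.loads(event["arguments"])
        dispatch_function(event.get("name", "unknown"), args)

def dispatch_function(name: str, args: dict) -> None:
    print(f"would call {name} with {args}")  # e.g. place_order(**args) in the demo
```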
00:13:30.640 | - Nice.
00:13:31.480 | - Yeah.
00:13:32.600 | - And then like what restrictions
00:13:35.920 | should people be aware of?
00:13:37.120 | Like, you know, I think before we recorded,
00:13:40.200 | we discussed a little bit about the sensitivities
00:13:42.560 | around basically calling random store owners
00:13:45.680 | and putting like an AI on them.
00:13:47.840 | - Yeah, so there's,
00:13:48.680 | I think there's recent regulation on that
00:13:51.280 | which is why we wanna be like very,
00:13:52.960 | I guess aware of, you know,
00:13:55.400 | you can't just call anybody with AI, right?
00:13:57.480 | That's like just robocalling.
00:13:58.720 | You wouldn't want someone just calling you with AI.
00:14:00.120 | - Yeah, yeah.
00:14:00.960 | So like, I'm a developer.
00:14:02.360 | I'm about to do this on random people.
00:14:04.080 | - Yeah.
00:14:04.920 | - What laws am I about to break?
00:14:07.080 | - I forget what the governing body is
00:14:10.960 | but you should, I think,
00:14:13.880 | having consent of the person you're about to call,
00:14:15.720 | it always works, right?
00:14:17.440 | I, as the strawberry owner,
00:14:18.920 | have consented to like getting called with AI.
00:14:21.200 | I think past that, you wanna be careful.
00:14:25.120 | Definitely individuals are more sensitive than businesses.
00:14:27.720 | I think businesses, you have a little bit more leeway.
00:14:29.960 | Also they're like,
00:14:30.800 | businesses I think have an incentive
00:14:32.160 | to want to receive AI phone calls,
00:14:34.600 | especially if like they're dealing with it.
00:14:36.000 | - It's doing business.
00:14:36.840 | - With AI phone calls, right?
00:14:37.680 | Like it's more business.
00:14:39.600 | It's kind of like getting on a booking platform, right?
00:14:42.080 | You're exposed to more.
00:14:43.480 | But I think it's still very much like a gray area.
00:14:46.520 | And so I think everybody should, you know,
00:14:51.080 | tread carefully, like figure out what it is.
00:14:52.600 | I, the law is so recent.
00:14:54.720 | I didn't have enough time to like,
00:14:56.320 | I'm also not a lawyer.
00:14:57.400 | - Yeah, yeah, yeah, of course.
00:14:58.560 | Okay, cool.
00:14:59.400 | Fair enough.
00:15:00.240 | One other thing, this is kind of agentic.
00:15:02.440 | Did you use a state machine at all?
00:15:04.200 | Did you use any framework?
00:15:06.200 | - No, no.
00:15:07.040 | - You just stick it in context
00:15:08.400 | and then just run the front end in a loop until it ends the call?
00:15:11.800 | - Yeah, there isn't even a loop.
00:15:13.400 | Like because the API is just based on sessions,
00:15:18.000 | it's always just gonna keep going.
00:15:19.680 | Every time you speak, it'll trigger a call.
00:15:22.200 | And then after every function call,
00:15:23.720 | it was also invoking like a generation.
00:15:26.760 | And so that is another difference here.
00:15:28.880 | It's like, it's inherently almost like in a loop,
00:15:31.920 | just by being in a session, right?
00:15:34.320 | No state machines needed.
00:15:35.480 | I'd say this is very similar to like,
00:15:37.280 | the notion of routines,
00:15:38.480 | where it's just like a list of steps
00:15:40.760 | and it's like sticks to them softly,
00:15:42.720 | but usually pretty well.
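(To make the "no loop, it's a session" point concrete, here is a bare-bones sketch of holding one WebSocket open and reacting to events as they arrive. The URL, header, and event names follow the Realtime API as announced and should be verified against current docs; the `websockets` library and async structure are just one way to wire it up.)

```python
import asyncio
import json
import os

import websockets  # third-party: pip install websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"

async def run_session() -> None:
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",
    }
    # Newer websockets releases call this keyword `additional_headers`.
    async with websockets.connect(URL, extra_headers=headers) as ws:
        # The "steps" live in the session instructions (the prompt),
        # not in client-side control flow.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {"instructions": "Step 1: greet. Step 2: take the order. ..."},
        }))
        async for raw in ws:            # the session keeps going on its own
            event = json.loads(raw)
            print(event.get("type"))    # audio deltas, transcripts, function calls, ...

asyncio.run(run_session())
```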
00:15:44.120 | - And the steps is the prompts.
00:15:45.600 | - The steps, it's like the prompt,
00:15:48.120 | like the steps are in the prompts, right?
00:15:49.840 | It's like step one, do this,
00:15:50.680 | step one, step two, do that.
00:15:52.320 | - What if I wanna change the system prompts
00:15:54.000 | halfway through the conversation?
00:15:55.680 | - You can, you can.
00:15:56.840 | To be honest, I have not played with that too much,
00:15:59.120 | but I know you can.
00:16:00.880 | - Yeah, awesome.
00:16:02.200 | I noticed that you called it real-time API,
00:16:04.240 | but not voice API.
00:16:05.920 | So I assume that it's like real-time API
00:16:07.680 | starting with voice, right?
00:16:09.400 | Then that's what he said on the thing.
00:16:11.240 | I can't imagine, like what else is real-time?
00:16:13.640 | - Well, I guess to use ChatGPT's voice mode as an example,
00:16:17.600 | like we've demoed the video, right?
00:16:19.120 | Like real-time image, right?
00:16:21.200 | So I'm not actually sure what timelines are,
00:16:23.320 | but I would expect, if I had to guess,
00:16:25.080 | that like that is probably the next thing
00:16:26.800 | that we're gonna be making.
00:16:28.240 | You'd probably have to talk directly
00:16:29.760 | with a team building this.
00:16:30.840 | - Sure, you can't promise their timelines.
00:16:33.240 | - Yeah, yeah, yeah, right, exactly.
00:16:34.320 | But like, given that this is the features that currently,
00:16:36.800 | that exists, that we've demoed on ChatGPT, yeah.
00:16:40.480 | - There will never be a case
00:16:41.320 | where there's like a real-time text API, right?
00:16:43.800 | - Well, this is a real-time text API.
00:16:45.200 | You can do text only on this.
00:16:48.120 | - Oh. - Yeah.
00:16:49.520 | - I don't know why you would, but it's actually,
00:16:52.760 | so text-to-text here doesn't quite make a lot of sense.
00:16:56.400 | I don't think you'll get a lot of latency gains,
00:16:59.080 | but like speech-to-text is really interesting
00:17:02.000 | because you can prevent,
00:17:04.080 | you can prevent responses like audio responses
00:17:06.960 | and force function calls.
00:17:08.920 | And so you can do stuff like UI control
00:17:11.360 | that is like super, super reliable.
00:17:12.920 | We had a lot of like, you know,
00:17:14.800 | like we weren't sure how well this was gonna work
00:17:17.620 | because it's like, you have a voice answering,
00:17:20.080 | it's like a whole persona, right?
00:17:21.360 | Like that's a little bit more, you know, risky.
00:17:23.920 | But if you like cut out the audio outputs
00:17:27.280 | and make it so it always has to output a function,
00:17:29.960 | like you can end up with pretty, pretty reliable,
00:17:32.700 | like commands, like a command architecture.
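(A sketch of what that configuration might look like as a session update; the field names, `modalities` and `tool_choice`, mirror the Realtime API as announced and are worth double-checking against the reference, and the `set_screen` tool is a made-up UI command.)

```python
# Hedged sketch: speech in, no audio out, and the model must answer by calling
# one of our UI-control tools instead of speaking back.
ui_control_session = {
    "type": "session.update",
    "session": {
        "modalities": ["text"],        # drop audio output, keep speech input
        "tool_choice": "required",     # every response must be a function call
        "tools": [{
            "type": "function",
            "name": "set_screen",      # hypothetical UI command
            "description": "Navigate the app to a named screen.",
            "parameters": {
                "type": "object",
                "properties": {"screen": {"type": "string"}},
                "required": ["screen"],
            },
        }],
    },
}
# Sent over the open WebSocket from the earlier sketch:
#     await ws.send(json.dumps(ui_control_session))
```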
00:17:35.400 | - Yeah, actually that's how the,
00:17:36.440 | that's the way I want to interact
00:17:37.480 | with a lot of these things as well.
00:17:38.560 | Like one-sided voice, not, don't force two-sided on me.
00:17:40.880 | - Yeah, you want to hear the voice back.
00:17:41.960 | And like, sometimes it's like,
00:17:43.880 | yeah, I think having an output voice is great,
00:17:45.880 | but I feel like I don't always want to hear
00:17:48.420 | an output voice.
00:17:49.260 | I'd say usually I don't, but yeah, exactly.
00:17:51.060 | Being able to speak to it is super smooth.
00:17:52.740 | - Cool.
00:17:53.580 | Do you want to comment on any of the other stuff
00:17:54.980 | that you announced?
00:17:56.540 | For prompt caching, I noticed, like,
00:17:58.500 | I like the no code change part.
00:18:00.860 | I'm looking forward to the docs
00:18:02.060 | 'cause I'm sure there's a lot of details
00:18:03.620 | on like what you cache, how long you cache.
00:18:05.620 | - Yeah.
00:18:06.460 | - 'Cause I can probably cache this for like five minutes.
00:18:08.220 | I was like, okay,
00:18:09.220 | but what if I don't make a call every five minutes?
00:18:11.100 | - Yeah, to be super honest with you,
00:18:12.620 | I've been so caught up with the real-time API
00:18:14.940 | and making the demo that I haven't read up
00:18:17.000 | on the other launches too much.
00:18:18.700 | I mean, I'm aware of them,
00:18:19.540 | but I think I'm excited to see how all distillation works.
00:18:23.920 | That's something that we've been doing,
00:18:25.360 | like, I don't know,
00:18:26.200 | I've been like doing it between our models for a while
00:18:28.700 | and I've seen really good results.
00:18:29.800 | Like I've done back in the day,
00:18:31.480 | like from GPT-4 to GPT-3.5
00:18:33.840 | and got like pretty much the same level
00:18:37.480 | of like function calling with like hundreds of functions.
00:18:40.000 | - Yeah.
00:18:40.840 | - So that was super, super compelling.
00:18:42.600 | So I feel like easier distillation.
00:18:44.440 | That's something I'm really excited for.
00:18:45.760 | - I see.
00:18:46.600 | Is it a tool?
00:18:47.440 | So I saw evals.
00:18:48.560 | - Yeah.
00:18:49.400 | - Like what is the distillation product?
00:18:50.520 | It wasn't super clear to be honest.
00:18:51.800 | - I think I want to let that team,
00:18:54.600 | I want to let that team talk about it.
00:18:55.680 | - Okay, great.
00:18:56.520 | Well, I appreciate you jumping on and amazing demo.
00:18:59.000 | It was beautifully designed.
00:19:00.400 | I'm sure that was part of you and Roman and.
00:19:03.180 | - Yeah, I guess shout out to, like, the first people,
00:19:06.000 | the original creators of Wanderlust,
00:19:07.940 | who were, like, Simon and Karolis.
00:19:09.960 | And then like I took it and built the voice component
00:19:13.240 | and the voice calling components.
00:19:15.320 | Yeah, so it's been a big team effort.
00:19:17.760 | And then like the entire API team
00:19:19.160 | for like debugging everything as it's been going on.
00:19:21.400 | It's been so good.
00:19:22.240 | - Yeah, you are the first consumers on the UX team.
00:19:24.240 | - Yeah.
00:19:25.080 | - Yeah, I mean the classic role of what we do there.
00:19:26.680 | - Yeah.
00:19:27.520 | - Okay, yeah.
00:19:28.340 | Anything else?
00:19:29.180 | Any other call to action?
00:19:30.000 | - No, enjoy Dev Day.
00:19:30.840 | - Thank you.
00:19:31.680 | - Yeah.
00:19:32.500 | - That's it.
00:19:33.340 | - The Latent Space crew then talked to Olivier Godement,
00:19:35.640 | head of product for the OpenAI platform,
00:19:37.880 | who led the entire Dev Day keynote
00:19:40.340 | and introduced all the major new features
00:19:42.640 | and updates that we talked about today.
00:19:44.880 | Okay, so we are here with Olivier Godement.
00:19:47.760 | - That's right.
00:19:48.880 | - I don't pronounce French.
00:19:49.800 | - That's fine, we're perfect.
00:19:51.980 | - And it was amazing to see your keynote today.
00:19:55.580 | What was the backstory of preparing something like this,
00:19:58.500 | preparing like Dev Day?
00:20:00.480 | - Essentially came from a couple of places.
00:20:02.480 | Number one, excellent reception from last year's Dev Day.
00:20:05.520 | Developers, startup founders, researchers
00:20:07.400 | want to spend more time with open AI
00:20:08.800 | and we want to spend more time with them as well.
00:20:10.600 | And so for us, it was a no-brainer, frankly,
00:20:13.360 | to do it again, like a nice conference.
00:20:15.800 | The second thing is going global.
00:20:18.360 | We've done a few events, like in Paris
00:20:20.120 | and a few other non-American countries.
00:20:23.120 | And so this year, we're doing SF, Singapore, and London
00:20:26.200 | to frankly just meet more developers.
00:20:28.420 | - Yeah, I'm very excited for the Singapore one.
00:20:30.160 | - Oh yeah.
00:20:31.000 | Will you be there?
00:20:32.240 | - I don't know, I don't know if I got an invite.
00:20:34.760 | (laughs)
00:20:35.800 | I can't just say it to you.
00:20:37.120 | Yeah, yeah.
00:20:37.960 | (laughs)
00:20:39.080 | - Yeah, and then there was some speculation
00:20:41.360 | around October 1st.
00:20:42.680 | - Yeah.
00:20:43.520 | - Is it because O1, October 1st?
00:20:46.080 | - That has nothing to do.
00:20:47.200 | I discovered the tweet yesterday
00:20:48.640 | where people are so creative.
00:20:50.240 | (laughs)
00:20:51.080 | No, O1, there was no connection to October 1st.
00:20:53.080 | But in hindsight, that would have been a pretty good meme,
00:20:56.200 | but yeah, no.
00:20:57.120 | - Okay.
00:20:58.400 | Yeah, and I think OpenAI's outreach to developers
00:21:02.080 | is something where I felt there was a hole in 2022
00:21:06.080 | when people were trying to build a ChatGPT
00:21:09.160 | and there was no function calling,
00:21:10.280 | all that stuff that you talked about in the past.
00:21:12.600 | And that's why I started my own conference
00:21:14.560 | as like, here's our little developer conference thing.
00:21:17.720 | - Yeah.
00:21:18.640 | - But to see this opening at Dev Day now
00:21:20.240 | and to see so many developer-oriented products
00:21:22.960 | coming out of open AI, I think it's really encouraging.
00:21:24.760 | - Yeah, totally.
00:21:25.600 | It's a, that's what I said essentially,
00:21:27.680 | like developers are basically the people
00:21:31.840 | who make the best connection between the technology
00:21:34.800 | and the future, essentially.
00:21:36.560 | Like, essentially see a capability,
00:21:38.480 | see a low-level technology,
00:21:40.400 | and are like, hey, I see how that application
00:21:42.840 | or that use case that can be enabled.
00:21:44.880 | And so in the direction of enabling AGI
00:21:49.320 | like for all of humanity,
00:21:50.680 | it's a no-brainer for us technically to partner with devs.
00:21:53.280 | - Yeah.
00:21:54.120 | And most importantly, you almost never had waitlists,
00:21:57.440 | which compared to other releases, people usually have.
00:22:01.080 | What is the, you know, you had prompt caching,
00:22:04.360 | you had real-time voice API.
00:22:05.880 | We, you know, Sean did a long Twitter thread
00:22:07.760 | so people know the releases.
00:22:09.320 | - Yeah.
00:22:10.160 | - What is the thing that was like sneakily the hardest
00:22:12.120 | to actually get ready for Dev Day?
00:22:14.400 | Or like, what was the kind of like, you know,
00:22:16.840 | last 24 hours, anything that you didn't know
00:22:18.920 | was gonna work?
00:22:19.760 | - Yeah, they're all fairly, like I would say,
00:22:22.720 | involved, like features to ship.
00:22:24.960 | So the team has been working for a month, all of them.
00:22:29.240 | The one which I would say is the newest for open AI
00:22:31.920 | is the real-time API.
00:22:33.760 | For a couple of reasons.
00:22:34.920 | I mean, one, you know, it's a new modality.
00:22:37.560 | Second, like it's the first time
00:22:38.840 | that we have an actual like WebSocket-based API.
00:22:41.400 | And so I would say that's the one that required
00:22:44.040 | like the most work over the month
00:22:46.960 | to get right from a developer perspective
00:22:49.040 | and to also make sure that our existing safety mitigations
00:22:51.680 | worked well with, like, real-time audio in and audio out.
00:22:54.480 | - Yeah.
00:22:56.320 | What design choices or what was like the sort of design,
00:22:59.280 | design choices that you want to highlight?
00:23:00.520 | Like, you know, like, I think for me,
00:23:03.000 | like WebSockets, you just receive a bunch of events.
00:23:05.480 | It's two-way.
00:23:06.400 | I obviously don't have a ton of experience.
00:23:09.880 | I think a lot of developers are going to have to embrace
00:23:12.600 | this real-time programming.
00:23:14.600 | Like, what are you designing for?
00:23:15.960 | Or like, what advice would you have
00:23:17.280 | for developers exploring this?
00:23:19.480 | - The core design hypothesis was essentially
00:23:23.680 | how do we enable like human-level latency?
00:23:25.880 | We did a bunch of tests.
00:23:28.280 | Like on average, like human beings,
00:23:29.880 | like, you know, it takes like something like 300 milliseconds
00:23:32.800 | to converse with each other.
00:23:34.120 | And so that was a design principle, essentially.
00:23:36.560 | Like working backward from that
00:23:37.920 | and, you know, making your technology work.
00:23:39.760 | And so we evaluated a few options
00:23:41.120 | and WebSockets was the one that we landed on.
00:23:43.360 | So that was like one design choice.
00:23:45.360 | A few other like big design choices
00:23:46.920 | that we had to make, prompt caching.
00:23:48.720 | Prompt caching, the design like target
00:23:51.800 | was automated from the get-go,
00:23:53.440 | like zero code change from the developer.
00:23:56.040 | That way you don't have to learn like
00:23:58.240 | what is a prompt prefix and, you know,
00:24:00.080 | how long does a cache work?
00:24:01.160 | Like we just do it as much as we can, essentially.
00:24:03.680 | So that was a big design choice as well.
00:24:06.680 | And then finally, on distillation and evaluation,
00:24:09.880 | the big design choice was something I learned at Stripe,
00:24:12.720 | like in my previous job.
00:24:14.000 | Like a philosophy around like a pit of success.
00:24:17.000 | Like what is essentially the minimum number of steps
00:24:21.640 | for the majority of developers to do the right thing?
00:24:24.560 | 'Cause when you do evals on fine-tuning,
00:24:26.240 | there are many, many ways like to mess it up, frankly,
00:24:28.240 | like, you know, and have like a crappy model,
00:24:29.760 | like evals that tell like a wrong story.
00:24:32.120 | And so our whole design was, okay,
00:24:34.400 | we actually care about like helping people
00:24:36.360 | who don't have like that much experience,
00:24:37.720 | like evaluating model,
00:24:38.960 | like get like in a few minutes, like to a good spot.
00:24:41.360 | And so how do we essentially enable that pit of success,
00:24:43.480 | like in the product flow?
00:24:45.240 | - Yeah.
00:24:46.160 | Yeah, I'm a little bit scared to fine tune,
00:24:50.160 | especially for vision,
00:24:52.240 | because I don't know what I don't know
00:24:55.920 | for stuff like vision, right?
00:24:57.760 | Like for text, I can evaluate pretty easily.
00:25:00.840 | For vision, let's say I'm like trying to,
00:25:03.560 | one of your examples was Grab,
00:25:05.440 | which is very close to home, I'm from Singapore.
00:25:07.680 | I think your example was like,
00:25:10.040 | they identified stop signs better.
00:25:12.440 | Why is that hard?
00:25:14.400 | Why do I have to fine tune that?
00:25:16.520 | If I fine tune that, do I lose other things?
00:25:18.720 | You know, like there's a lot of unknowns with vision
00:25:21.320 | that I think developers have to figure out.
00:25:23.560 | - For sure.
00:25:24.400 | Vision is going to open up like a new,
00:25:27.600 | I would say evaluation space.
00:25:29.440 | 'Cause you're right, like it's harder,
00:25:30.480 | like to tell correct from incorrect essentially with images.
00:25:34.640 | What I can say is we've been alpha testing
00:25:36.520 | like the vision fine tuning
00:25:37.560 | like for several weeks at that point,
00:25:39.360 | we are seeing like even higher performance uplift
00:25:42.040 | compared to text fine tuning.
00:25:43.520 | So that's, there is something here,
00:25:46.240 | like we've been pretty impressed,
00:25:47.360 | like in a good way, frankly,
00:25:48.200 | by, you know, how well it works.
00:25:49.880 | But for sure, like, you know,
00:25:50.760 | I expect the developers who are moving from one modality
00:25:52.800 | to like text and images will have like more testing,
00:25:55.960 | evaluation, like, you know, to set in place,
00:25:57.800 | like to make sure it works well.
00:25:59.400 | - The model distillation and evals
00:26:03.760 | is definitely like the most interesting,
00:26:05.240 | moving away from just being a model provider
00:26:07.160 | to being a platform provider.
00:26:08.880 | How should people think about being the source of truth?
00:26:11.560 | Like do you want OpenAI to be like the system of record
00:26:13.840 | of all the prompting?
00:26:14.800 | Because people sometimes store it
00:26:16.240 | in like different data sources.
00:26:17.920 | And then is that going to be the same as the models evolve?
00:26:21.320 | So you don't have to worry about, you know,
00:26:22.960 | refactoring the data, like things like that,
00:26:25.160 | like future model structures.
00:26:27.320 | - The vision is, if you want to be a source of truth,
00:26:30.520 | you have to earn it, right?
00:26:31.920 | Like we're not going to force people like to pass us data
00:26:34.280 | if there is no value prop, like, you know,
00:26:35.600 | for us to store the data.
00:26:36.840 | The vision here is at the moment,
00:26:40.080 | like most developers like use like a one size fits all model
00:26:43.560 | like, off the shelf, like GPT-4o essentially.
00:26:46.280 | The vision we have is fast forward a couple of years.
00:26:49.760 | I think like most developers will essentially
00:26:51.640 | like have an automated, continuous, fine-tuned model.
00:26:56.640 | The more like you use the model,
00:26:59.160 | the more data you pass to the model provider,
00:27:01.360 | like the model is automatically like fine-tuned,
00:27:03.520 | evaluated to get some eval sets.
00:27:05.560 | And essentially like you don't have to every month
00:27:07.840 | when there is a new snapshot, like, you know,
00:27:09.360 | to go online and you know, try a few new things.
00:27:11.880 | That's the direction.
00:27:13.160 | We are pretty far away from it.
00:27:14.720 | But I think like that evaluation and decision product
00:27:18.160 | is essentially a first good step in that direction.
00:27:20.480 | It's like, hey, if you are excited by the direction
00:27:23.760 | and you give us like evaluation data,
00:27:25.520 | we can actually log your completion data
00:27:27.200 | and start to do some automation on your behalf.
00:27:29.560 | - And then you can do evals for free
00:27:32.800 | if you share data with OpenAI.
00:27:35.320 | How should people think about when it's worth it,
00:27:37.760 | when it's not?
00:27:38.600 | Sometimes people get overly protective of their data
00:27:40.680 | when it's actually not that useful.
00:27:42.320 | But how should developers think about
00:27:44.440 | when it's right to do it, when not,
00:27:45.760 | or if you have any thoughts on it?
00:27:47.440 | - The default policy is still the same.
00:27:48.720 | Like, you know, we don't trade on like any API data
00:27:50.560 | unless you opt in.
00:27:51.480 | What we've seen from feedback is evaluation
00:27:55.000 | can be expensive.
00:27:56.120 | Like if you run, like, o1 evals
00:27:57.720 | on like thousands of samples,
00:27:59.160 | like your bill will get increased,
00:28:00.600 | like, you know, pretty significantly.
00:28:03.200 | That's problem statement number one.
00:28:05.000 | Problem statement number two is essentially
00:28:06.920 | I want to get to a world where
00:28:08.600 | whenever OpenAI ships a new model snapshot,
00:28:12.400 | we have full confidence that there is no regression
00:28:14.680 | for the tasks that developers care about.
00:28:16.520 | And for that to be the case,
00:28:17.600 | essentially we need to get evals.
00:28:19.440 | And so that essentially is a sort of a two birds, one stone.
00:28:22.400 | It's like we subsidize basically the evals
00:28:24.960 | and we also use the evals when we ship like new models
00:28:27.440 | to make sure that, you know,
00:28:28.320 | we keep going in the right direction.
00:28:29.920 | So in my sense, it's a win-win,
00:28:31.640 | but again, like completely opt-in.
00:28:32.960 | I expect that, you know,
00:28:33.800 | many developers will not want to share their data
00:28:35.440 | and that's perfectly fine to me.
00:28:37.200 | - Yeah, I think free evals though,
00:28:39.120 | very, very good incentive.
00:28:40.920 | - Yeah.
00:28:41.760 | - I mean, it's a fair trade.
00:28:42.680 | Like you get data, we get free evals.
00:28:44.480 | - Exactly, and we sanitize PII, everything, you know,
00:28:47.320 | like we have no interest in like the actual like sensitive
00:28:49.600 | data, we just want to have good evaluation
00:28:52.640 | on the real use cases.
00:28:54.240 | - Like I always want to eval the eval.
00:28:56.880 | I don't know if that ever came up.
00:28:58.480 | Like sometimes the evals themselves are wrong.
00:29:00.920 | And then there's no way for me to tell you.
00:29:03.880 | - Everyone who is starting with LLM,
00:29:05.280 | tinkering with LLM is like, yeah, evaluation, easy.
00:29:07.480 | You know, I've done testing like all my life.
00:29:09.560 | And then you start to actually build the evals,
00:29:11.360 | understand like all the corner cases
00:29:13.240 | and you realize, wow, that's like a whole field itself.
00:29:16.640 | So yeah, good evaluation is hard.
00:29:18.480 | And so, yeah.
00:29:19.480 | - Yeah, but I think there's a, you know,
00:29:22.200 | I just talked to Braintrust,
00:29:23.880 | which I think is one of your partners.
00:29:26.840 | They also emphasize code-based evals
00:29:29.160 | versus your sort of low-code.
00:29:31.120 | What I see is like, I don't know,
00:29:33.000 | maybe there's some more that you didn't demo,
00:29:34.640 | but what I see is like kind of like a low-code experience,
00:29:36.600 | right, for evals.
00:29:37.880 | Would you ever support like a more code-based,
00:29:39.720 | like would I run code on OpenAI's eval platform?
00:29:44.000 | - For sure.
00:29:44.840 | I mean, we meet developers where they are.
00:29:46.120 | You know, at the moment,
00:29:46.960 | the demand was more for like, you know,
00:29:48.440 | easy to get started, like eval.
00:29:50.760 | But you know, if we need to expose
00:29:52.080 | like an evaluation API, for instance,
00:29:53.600 | for people like, you know, to pass like,
00:29:54.920 | you know, their existing test data, we'll do it.
00:29:58.240 | So yeah, there is no, you know, philosophical,
00:30:00.680 | I would say like, you know, misalignment on that.
00:30:02.280 | - Yeah, yeah, yeah.
00:30:03.120 | What I think this is becoming, by the way,
00:30:04.960 | and I don't like, it's basically like you're becoming AWS,
00:30:09.960 | like the AI cloud.
00:30:12.120 | And I don't know if like, that's a conscious strategy
00:30:15.000 | or it's like, it doesn't even have to be a conscious
00:30:17.040 | strategy, like you're going to offer storage,
00:30:18.800 | you're going to offer compute,
00:30:19.840 | you're going to offer networking.
00:30:22.240 | I don't know what networking looks like.
00:30:24.040 | Networking is maybe like caching,
00:30:26.280 | or like it's a CDN, it's a prompt CDN.
00:30:30.800 | But it's the AI versions of everything, right?
00:30:33.360 | Like, do you see the analogies or?
00:30:35.200 | - Yeah, totally.
00:30:36.040 | Whenever, whenever I talk to developers,
00:30:40.320 | I feel like good models are just half of the story
00:30:43.720 | to build a good app.
00:30:45.200 | There's a ton more you need to do.
00:30:46.800 | Evaluation is a perfect example.
00:30:48.200 | Like, you know, you can have the best model in the world
00:30:49.720 | if you're in the dark, like, you know,
00:30:50.640 | it's really hard to gain the confidence.
00:30:52.960 | And so our philosophy is,
00:30:55.040 | that's number one.
00:30:59.440 | Number two, like the whole like software development stack
00:31:01.600 | is being basically reinvented, you know, with LLMs.
00:31:04.600 | There is no freaking way that OpenAI can build everything.
00:31:08.120 | Like, there is just too much to build, frankly.
00:31:10.080 | And so my philosophy is,
00:31:12.240 | essentially we'll focus on like the tools,
00:31:14.200 | which are like the closest to the model itself.
00:31:17.200 | So that's why you see us, like, you know,
00:31:18.680 | investing quite a bit in like fine tuning, distillation,
00:31:21.120 | now evaluation.
00:31:22.360 | 'Cause we think that it actually makes sense
00:31:23.960 | to have like in one spot, like, you know, all of that.
00:31:26.800 | Like there is some sort of virtual circle,
00:31:28.360 | essentially, that you can set in place.
00:31:30.040 | But stuff like, you know, LLM apps,
00:31:32.280 | like tools which are like further away from the model,
00:31:35.280 | I don't know.
00:31:36.120 | If you want to do like, you know, super elaborate,
00:31:38.720 | like home management or, you know, like tooling,
00:31:41.440 | like I'm not sure, like, you know, OpenAI
00:31:42.760 | has like such a big edge, frankly, like, you know,
00:31:44.440 | to build that sort of tools.
00:31:45.760 | So that's how we view it at the moment.
00:31:48.040 | But again, frankly, like the philosophy is like super simple.
00:31:50.280 | The strategy is super simple.
00:31:52.000 | It's meeting developers where they want us to be.
00:31:54.080 | And so, you know, that's frankly like, you know,
00:31:56.840 | day in, day out, like, you know, what I try to do.
00:31:59.920 | Cool.
00:32:00.920 | Thank you so much for the time.
00:32:01.920 | I'm sure you...
00:32:02.840 | - Yeah, I have more questions on...
00:32:04.320 | - Yeah, yeah, yeah.
00:32:05.160 | - A couple of questions on voice
00:32:06.000 | and then also like your call to action,
00:32:07.480 | like what you want feedback on, right?
00:32:08.880 | So, but I think we should spend a bit more time on voice
00:32:11.440 | 'cause I feel like that's like the big splash thing.
00:32:16.000 | I talked to, well, I mean, just like,
00:32:20.360 | what is the future of real time for open AI?
00:32:22.280 | - Yeah.
00:32:23.280 | - 'Cause I think obviously video is next.
00:32:24.760 | You already have it in the ChatGPT desktop app.
00:32:28.640 | Do we just like have a permanent life?
00:32:31.640 | Like, you know, like,
00:32:32.480 | are developers just gonna be like sending sockets
00:32:35.000 | back and forth with open AI?
00:32:36.120 | Like how do we program for that?
00:32:37.680 | Like what is the future?
00:32:39.760 | - Yeah, that makes sense.
00:32:41.080 | I think with multi-modality,
00:32:42.320 | like real time is quickly becoming like, you know,
00:32:44.760 | essentially the right experience,
00:32:46.320 | like to build an application.
00:32:48.480 | So my expectation is that we'll see like a non-trivial,
00:32:51.240 | like volume of applications,
00:32:53.760 | like moving to a real time API.
00:32:56.800 | Like if you zoom out, like audio is a really simple,
00:32:59.120 | like audio until basically now,
00:33:03.680 | audio on the web and in apps
00:33:07.200 | was basically very much like a second-class citizen.
00:33:11.000 | Like you basically did like an audio chatbot
00:33:12.960 | for users who did not have a choice.
00:33:15.240 | You know, they were like struggling to read,
00:33:16.960 | or I don't know,
00:33:17.800 | they were like not super educated with technology.
00:33:19.880 | And so frankly, it was like the crappy option,
00:33:21.840 | you know, compared to text.
00:33:23.400 | But when you talk to people in the real world,
00:33:26.720 | the vast majority of people like prefer to talk
00:33:29.360 | and listen instead of typing and writing.
00:33:32.840 | - We speak before we write.
00:33:33.840 | - Exactly.
00:33:34.920 | I don't know.
00:33:35.760 | I mean, I'm sure it's the case for you in Singapore.
00:33:37.000 | For me, my friends in Europe,
00:33:38.000 | the number of like WhatsApp, like voice notes
00:33:39.800 | that you receive every day.
00:33:41.360 | I mean, just people, it makes sense frankly, like, you know.
00:33:43.200 | - Chinese.
00:33:44.040 | - Chinese, yeah.
00:33:45.040 | - Yeah, all voice.
00:33:46.120 | - You know, it's easier.
00:33:47.120 | There is more emotions.
00:33:48.240 | I mean, you know, you get the point across like pretty well.
00:33:51.400 | And so my personal ambition for like the real-time API
00:33:54.880 | and like audio in general is to make like audio
00:33:57.320 | and like multimodality like truly a first-class experience.
00:34:00.200 | Like, you know, if you're like, you know,
00:34:01.520 | the amazing, like super bold, like startup out of YC,
00:34:04.520 | you're going to be like the next like billion,
00:34:06.040 | like, you know, user application
00:34:07.560 | to make it like truly audio first
00:34:08.920 | and make it feel like, you know, an actual good,
00:34:10.720 | like, you know, perfect experience.
00:34:12.200 | So that's essentially the ambition.
00:34:13.880 | And I think like, yeah, it could be pretty big.
00:34:15.680 | - Yeah.
00:34:16.520 | I think one issue that people have with the voice so far
00:34:19.760 | as release in advanced voice mode is the refusals.
00:34:23.040 | - Yeah.
00:34:24.280 | - You guys had a very inspiring model spec.
00:34:27.200 | I think Joanne worked on that,
00:34:29.680 | where you said like, yeah,
00:34:31.160 | we don't want to overly refuse all the time.
00:34:33.880 | In fact, like even if like not safe for work,
00:34:35.760 | like in some occasions, it's okay.
00:34:37.960 | - Yeah.
00:34:39.440 | - Is there an API where we can say,
00:34:41.680 | not safe for work, okay?
00:34:43.360 | - I think we'll get there.
00:34:44.360 | I think we'll get there.
00:34:45.680 | The model spec like nailed it.
00:34:47.160 | Like, you know.
00:34:48.000 | - It nailed it, it's so good.
00:34:49.560 | - Yeah.
00:34:50.400 | We're not in the business of like policing, you know,
00:34:52.400 | if you can say like vulgar words or whatever.
00:34:54.600 | You know, there are some use cases like, you know,
00:34:56.080 | I'm writing like a Hollywood script.
00:34:58.040 | I want to say like vulgar words, it's perfectly fine.
00:35:00.240 | You know?
00:35:01.080 | And so I think the direction where we'll go here
00:35:02.920 | is that basically there will always be like, you know,
00:35:05.520 | a set of behavior that will, you know,
00:35:08.000 | just like forbid frankly,
00:35:09.200 | because they're illegal against our terms of services.
00:35:12.320 | But then there will be like, you know,
00:35:13.360 | some more like risky like themes,
00:35:14.800 | which are completely legal, like, you know,
00:35:16.960 | vulgar words or, you know, not safe for work stuff.
00:35:19.480 | Um, where basically we'll expose like a controllable,
00:35:22.920 | like safety, like knobs in the API to basically allow you
00:35:26.040 | to say, hey, that theme, okay.
00:35:27.680 | That theme not okay.
00:35:28.960 | How sensitive do you want the threshold to be
00:35:30.920 | on safety refusals?
00:35:32.280 | I think that's the direction.
00:35:33.240 | - So a safety API.
00:35:34.440 | - Yeah. In a way, yeah.
00:35:35.880 | - Yeah. We have, we've never had that.
00:35:37.280 | - Yeah.
00:35:38.120 | - Because right now it's you, it's whatever you decide.
00:35:39.640 | And then it's, that's it.
00:35:40.800 | That, that, that would be the main reason
00:35:41.960 | I don't use OpenAI's voice.
00:35:43.520 | - Yeah.
00:35:44.360 | - It's because of.
00:35:45.200 | - It's overpoliced at the moment.
00:35:46.040 | - Over refusal, over refusals.
00:35:46.880 | - Yeah, yeah, yeah.
00:35:47.840 | - No, we got to fix that.
00:35:48.680 | - Yeah.
00:35:49.520 | - Like singing.
00:35:50.340 | - Like singing.
00:35:51.180 | - We're trying to do voice-carrying.
00:35:52.020 | - So every singer, and you've been locked off singing.
00:35:53.840 | - Yeah, yeah, yeah.
00:35:54.960 | - But I understand music gets you in trouble.
00:35:56.560 | (laughing)
00:35:58.560 | Okay. Yeah.
00:35:59.400 | So, and then just generally like,
00:36:00.360 | what do you want to hear from developers?
00:36:01.520 | Like we have all developers watching, you know,
00:36:04.960 | what feedback do you want, anything specific as well,
00:36:08.840 | like from, especially from today,
00:36:11.520 | anything that you are unsure about that you're like,
00:36:14.560 | our feedback could really help you decide.
00:36:16.200 | - For sure.
00:36:17.480 | - I think, essentially it's becoming pretty clear
00:36:21.200 | after today that, you know,
00:36:23.040 | I would say the OpenAI directions became pretty clear,
00:36:24.920 | like, you know, after today.
00:36:26.400 | Investment in reasoning, investment in multimodality,
00:36:29.800 | investment as well, like in, I would say,
00:36:31.200 | tool use, like function calling.
00:36:32.800 | To me, the biggest question I have is, you know,
00:36:37.080 | where should we put the cursor next?
00:36:39.200 | I think we need all three of them, frankly, like, you know,
00:36:40.760 | so we'll keep pushing.
00:36:42.000 | - Hire 10,000 people, or actually no need,
00:36:44.240 | build a bunch of bots.
00:36:45.360 | (laughing)
00:36:46.400 | And so, let's take O1, for instance,
00:36:49.960 | like is O1 smart enough, like for your problems?
00:36:52.560 | Like, you know, let's set aside for a second
00:36:54.280 | the existing models, like for the apps
00:36:55.840 | that you would love to build,
00:36:57.640 | is O1 basically it in reasoning,
00:36:59.280 | or do we still have, like, you know, a step to do?
00:37:00.560 | - Preview is not enough, I need the full one.
00:37:02.840 | - Yeah, so that's exactly that sort of feedback.
00:37:05.720 | Essentially, what I would love to do is for developers,
00:37:08.120 | I mean, there's a thing that Sam,
00:37:09.000 | which has been saying, like, over and over again,
00:37:10.440 | like, you know, it's easier said than done,
00:37:12.000 | but I think it's directionally correct.
00:37:14.320 | As a developer, as a founder,
00:37:16.520 | you basically want to build an app
00:37:18.520 | which is a bit too difficult for the model today, right?
00:37:22.720 | Like, what you think is right,
00:37:24.560 | it's like sort of working, sometimes not working,
00:37:27.840 | and that way, you know,
00:37:28.680 | that basically gives us, like, a goalpost,
00:37:30.520 | and be like, okay, that's what you need to enable
00:37:32.760 | with the next model release, like in a few months.
00:37:35.680 | And so, I would say that usually,
00:37:37.760 | like, that's the sort of feedback
00:37:38.720 | which is, like, the most useful,
00:37:40.160 | that I can, like, directly, like, you know, incorporate.
00:37:43.520 | - Awesome.
00:37:44.520 | I think that's our time.
00:37:46.320 | - Thank you so much, guys.
00:37:47.160 | - Yeah, thank you so much. - Yeah, thank you.
00:37:49.360 | - We were particularly impressed
00:37:51.160 | that Olivier addressed the not safe
00:37:53.120 | for work moderation policy question head on,
00:37:55.960 | as that had only previously been picked up on
00:37:58.400 | in Reddit forums.
00:37:59.840 | This is an encouraging sign that we will return to
00:38:02.600 | in the closing candor with Sam Altman
00:38:04.840 | at the end of this episode.
00:38:06.800 | Next, a chat with Romain Huet, friend of the pod,
00:38:10.560 | AI Engineer World's Fair closing keynote speaker,
00:38:13.520 | and head of developer experience at OpenAI
00:38:16.680 | on his incredible live demos,
00:38:18.800 | and advice to AI engineers on all the new modalities.
00:38:22.360 | - All right, we're live from OpenAI Dev Day.
00:38:24.280 | We're with Romain, who just did two great demos on stage,
00:38:27.240 | and he's been a friend of Latent Space,
00:38:28.680 | so thanks for taking some of the time.
00:38:30.000 | - Of course, yeah, thank you for being here
00:38:31.400 | and spending the time with us today.
00:38:33.080 | - Yeah, appreciate.
00:38:33.920 | Appreciate you guys putting this on.
00:38:35.360 | I know it's, like, extra work,
00:38:37.480 | but it really shows the developers
00:38:38.960 | that you care about reaching out.
00:38:41.480 | - Yeah, of course.
00:38:42.320 | I think when you go back to the OpenAI mission,
00:38:43.960 | I think for us, it's super important
00:38:45.400 | that we have the developers involved in everything we do,
00:38:48.680 | making sure that, you know, they have all of the tools
00:38:51.240 | they need to build successful apps.
00:38:52.960 | And we really believe that the developers
00:38:54.800 | are always gonna invent the ideas, the prototypes,
00:38:57.800 | the fun factors of AI that we can't build ourselves.
00:39:00.280 | So it's really cool to have everyone here.
00:39:02.000 | - We had Michelle from you guys on,
00:39:05.080 | and she very-- - Great episode.
00:39:06.400 | - Thank you.
00:39:07.240 | She very seriously said, "API is the path to AGI."
00:39:09.520 | - Correct.
00:39:10.360 | - And people in our YouTube comments were like,
00:39:12.440 | "API is not AGI."
00:39:13.480 | I'm like, no, like, she's very serious.
00:39:15.480 | API is the path to AGI because, like,
00:39:17.440 | you're not gonna build everything
00:39:18.320 | like the developers are, right?
00:39:19.280 | - Of course, yeah, that's the whole value
00:39:21.240 | of having a platform and an ecosystem of amazing builders
00:39:23.880 | who can, like, in turn, create all of these apps.
00:39:27.440 | I'm sure we talked about this before,
00:39:28.840 | but there's now more than 3 million developers
00:39:30.600 | building on OpenAI, so it's pretty exciting
00:39:32.360 | to see all of that energy into creating new things.
00:39:35.680 | - Yeah.
00:39:37.160 | - I was gonna say, you built two apps on stage today,
00:39:39.960 | an international space station tracker,
00:39:41.920 | and then a drone command.
00:39:43.640 | The hardest thing must have been opening Xcode
00:39:45.920 | and setting that up.
00:39:46.760 | Now, like, the models are so good
00:39:48.000 | that they can do everything else.
00:39:49.240 | - Yes.
00:39:50.080 | - You had two modes of interaction.
00:39:50.960 | You had kind of like a GPT app to get the plan with a one,
00:39:54.680 | and then you had cursor to do apply some of the changes.
00:39:58.120 | - Correct.
00:39:58.960 | - How should people think about the best way
00:40:00.280 | to consume the coding models, especially,
00:40:02.680 | both for, you know, brand new projects
00:40:05.000 | and then existing projects
00:40:06.400 | that they're trying to modify instead?
00:40:07.960 | - Yeah, I mean, one of the things that's really cool
00:40:10.120 | about O1 Preview and O1 Mini being available in the API
00:40:14.320 | is that you can use it in your favorite tools,
00:40:16.040 | like cursor, like I did, right?
00:40:17.440 | And that's also what, like, Devin from Cognition
00:40:19.920 | can use in their own software engineering agents.
00:40:22.800 | In the case of Xcode, like,
00:40:24.440 | it's not quite deeply integrated in Xcode,
00:40:26.160 | so that's why I had, like, ChatGPT side by side.
00:40:28.520 | - Copy-paste.
00:40:29.760 | - But it's cool, right, because I could instruct O1 Preview
00:40:32.800 | to be, like, my coding partner
00:40:35.160 | and brainstorming partner for this app,
00:40:37.280 | but also consolidate all of the files
00:40:39.800 | and architect the app the way I wanted.
00:40:41.400 | So all I had to do was just, like,
00:40:42.960 | port the code over to Xcode,
00:40:44.400 | and zero shot, the app built.
00:40:47.640 | I don't think I conveyed, by the way,
00:40:49.040 | how big a deal that is,
00:40:50.200 | but, like, you can now create an iPhone app from scratch,
00:40:54.000 | describing a lot of intricate details that you want,
00:40:57.600 | and your vision comes to life in, like, a minute.
00:41:00.280 | - Yeah.
00:41:01.120 | - It's pretty outstanding.
00:41:01.960 | - I have to admit, I was a bit skeptical,
00:41:03.760 | because if I open up Xcode,
00:41:05.440 | I don't know anything about iOS programming.
00:41:07.960 | You know which file to paste it in.
00:41:09.840 | You probably set it up a little bit.
00:41:11.320 | So I'm like, I have to go home and test it
00:41:13.400 | to, like, figure it out.
00:41:14.240 | And I need the ChatGPT desktop app
00:41:16.040 | so that it can tell me where to click.
00:41:17.960 | - Yeah, I mean, like, Xcode and iOS development
00:41:21.720 | has become easier over the years
00:41:23.560 | since they introduced Swift and SwiftUI.
00:41:25.880 | I think back in the days of Objective-C,
00:41:28.560 | or, like, you know, the storyboard,
00:41:30.360 | it was a bit harder to get in for someone new,
00:41:32.840 | but now with Swift and SwiftUI,
00:41:34.400 | their dev tools are really exceptional.
00:41:36.480 | But now, when you combine that with O1,
00:41:38.920 | as your brainstorming and coding partner,
00:41:41.040 | it's like your architect, effectively.
00:41:42.960 | That's the best way, I think, to describe O1.
00:41:44.920 | People ask me, like, "Well, can GPT-4 do some of that?"
00:41:48.080 | And it certainly can.
00:41:49.320 | But I think it will just start spitting out code, right?
00:41:52.520 | And I think what's great about O1
00:41:54.320 | is that it can, like, make up a plan.
00:41:56.000 | In this case, for instance,
00:41:57.120 | the iOS app had to fetch data from an API.
00:41:59.280 | It had to look at the docs.
00:42:00.320 | It had to look at, like, how do I parse this JSON?
00:42:02.840 | Where do I store this thing?
00:42:04.920 | And kind of wire things up together.
00:42:06.480 | So that's where it really shines.
00:42:08.280 | - Is Mini or Preview the better model
00:42:10.480 | that people should be using?
00:42:11.600 | Like, how?
00:42:13.560 | - I think people should try both.
00:42:15.320 | We are obviously very excited about the upcoming O1
00:42:18.080 | that we shared the evals for.
00:42:20.080 | But we noticed that O1 Mini is very, very good
00:42:22.520 | at everything math, coding, everything STEM.
00:42:26.120 | If, for your kind of brainstorming
00:42:29.240 | or your kind of science part,
00:42:31.520 | you need some broader knowledge,
00:42:33.400 | then reaching for O1 Preview is better.
00:42:36.080 | But, yeah, I used O1 Mini for my second demo,
00:42:38.760 | and it worked perfectly.
00:42:40.640 | All I needed was very much, like, something rooted in code,
00:42:43.760 | architecting and wiring up, like, a front-end, a back-end,
00:42:46.720 | some UDP packets, some WebSockets,
00:42:48.440 | something very specific, and it did that perfectly.
00:42:51.440 | - And then maybe just talking about Voice and Wanderlust,
00:42:53.800 | the app that keeps on giving.
00:42:55.200 | - It does, indeed, yeah.
00:42:57.360 | - What's the backstory behind, like, preparing for all that?
00:43:00.480 | - You know, it's funny, 'cause when last year for Dev Day,
00:43:03.000 | we were trying to think about what could be a great demo app
00:43:06.640 | to show, like, an assistive experience.
00:43:09.200 | I've always thought travel is a kind of a great use case,
00:43:12.760 | 'cause you have, like, pictures, you have locations,
00:43:15.520 | you have the need for translations, potentially.
00:43:18.200 | There's, like, so many use cases that are bounded to travel
00:43:21.760 | that I thought last year, let's use a travel app.
00:43:23.800 | And that's how Wanderlust came to be.
00:43:26.000 | But of course, a year ago,
00:43:27.240 | all we had was a text-based assistant.
00:43:29.640 | And now we thought, well, if there's a voice modality,
00:43:33.000 | what if we just bring this app back as a wink,
00:43:36.120 | and what if we were interacting better with voice?
00:43:39.040 | And so with this new demo, what I showed was the ability
00:43:42.240 | to, like, have a complete conversation in real time
00:43:45.600 | with the app, but also the thing we wanted to highlight
00:43:50.200 | was the ability to call tools and functions, right?
00:43:52.440 | So, like, in this case, we placed a phone call
00:43:55.960 | using the Twilio API, interfacing with our AI agents,
00:43:59.920 | but developers are so smart that they'll come up
00:44:03.000 | with so many great ideas
00:44:04.280 | that we could not think ourselves, right?
00:44:06.000 | But what if you could have, like, a 911 dispatcher?
00:44:10.640 | What if you could have, like, a customer service center
00:44:14.360 | that is much smarter than what we've been used to today?
00:44:17.080 | There's gonna be so many use cases for real time.
00:44:18.800 | It's awesome.
00:44:19.640 | - Yeah, and sometimes, actually, you, like,
00:44:21.720 | this should kill phone trees.
00:44:23.360 | Like, there should not be, like, dial one.
00:44:26.120 | - Of course.
00:44:26.960 | (speaking in foreign language)
00:44:28.920 | - Yeah, exactly.
00:44:29.760 | (speaking in foreign language)
00:44:31.080 | I mean, even you starting speaking Spanish
00:44:33.520 | would just do the thing, you know?
00:44:36.000 | You don't even have to ask.
00:44:37.120 | So, yeah, I'm excited for this future
00:44:38.720 | where we don't have to interact with those legacy systems.
00:44:41.360 | - Yeah, yeah.
00:44:42.480 | Is there anything, so, you're doing function calling
00:44:45.440 | in a streaming environment.
00:44:47.440 | So, basically, it's WebSockets, it's UDP, I think.
00:44:52.360 | It's basically not guaranteed to be exactly once delivery.
00:44:55.720 | Like, is there any coding challenges
00:44:57.560 | that you encountered when building this?
00:45:00.160 | - Yeah, it's a bit more delicate to get into it.
00:45:03.240 | We also think that, for now,
00:45:04.600 | what we shipped is a beta of this API.
00:45:07.680 | I think there's much more to build onto it.
00:45:10.600 | It does have the function calling and the tools,
00:45:13.200 | but we think that, for instance,
00:45:14.440 | if you wanna have something very robust on your client side,
00:45:18.320 | maybe you wanna have WebRTC as a client, right?
00:45:20.880 | And as opposed to, like,
00:45:22.160 | directly working with the sockets at scale.
00:45:25.560 | So, that's why we have partners like LiveKit and Agora
00:45:28.000 | if you wanna use them.
00:45:29.480 | And I'm sure we'll have many more in the future.
00:45:32.320 | But, yeah, we keep on iterating on that,
00:45:34.760 | and I'm sure the feedback of developers in the weeks to come
00:45:37.360 | is gonna be super critical for us to get it right.
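A quick aside for readers who want to see what "function calling in a streaming environment" looks like in practice: below is a minimal sketch of a client for the beta Realtime API using Python's `websockets` package. The endpoint, headers, session fields, and event names follow the API as announced at launch and may have shifted since, and the `place_phone_call` tool (echoing Romain's Twilio demo) is purely hypothetical, so treat this as a sketch rather than a canonical client.

```python
# Minimal sketch: connect to the beta Realtime API over a raw WebSocket,
# register one (hypothetical) tool, send a text turn, and watch the event
# stream for a function call. Endpoint, headers, and event names are taken
# from the launch docs and may have changed.
import asyncio
import json
import os

import websockets  # pip install websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"

async def main():
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",
    }
    # Note: newer versions of the websockets package call this kwarg
    # "additional_headers" instead of "extra_headers".
    async with websockets.connect(URL, extra_headers=headers) as ws:
        # Configure the session with one hypothetical function the model may call.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "modalities": ["text", "audio"],
                "tools": [{
                    "type": "function",
                    "name": "place_phone_call",
                    "description": "Place an outbound call via a telephony API",
                    "parameters": {
                        "type": "object",
                        "properties": {"number": {"type": "string"}},
                        "required": ["number"],
                    },
                }],
            },
        }))
        # Add a user turn, then ask the model to respond.
        await ws.send(json.dumps({
            "type": "conversation.item.create",
            "item": {"type": "message", "role": "user", "content": [
                {"type": "input_text",
                 "text": "Call +1-555-0100 and ask about my reservation."},
            ]},
        }))
        await ws.send(json.dumps({"type": "response.create"}))
        # Events arrive as JSON frames; function-call arguments stream in too.
        async for frame in ws:
            event = json.loads(frame)
            if event["type"] == "response.function_call_arguments.done":
                print("model wants to call a tool:", event)
            elif event["type"] == "response.done":
                break

asyncio.run(main())
```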
00:45:39.440 | - Yeah, I think LiveKit has been fairly public
00:45:41.120 | that they are used in the ChatGPT app.
00:45:44.200 | Like, is it just all open source,
00:45:47.360 | and we just use it directly with OpenAI,
00:45:49.880 | or do we use LiveKit Cloud or something?
00:45:52.560 | - So, right now, we released the API.
00:45:54.560 | We released some sample code, also,
00:45:56.440 | and referenced clients for people
00:45:58.160 | to get started with our API.
00:46:03.720 | And we also partnered with LiveKit and Agora,
00:46:03.720 | so they also have their own, like,
00:46:05.520 | ways to help you get started.
00:46:07.160 | That plugs natively with the real-time API.
00:46:10.120 | So, depending on the use case,
00:46:11.320 | people can decide what to use.
00:46:14.040 | If you're working on something that's completely client,
00:46:16.640 | or if you're working on something on the server side
00:46:18.720 | for the voice interaction, you may have different needs.
00:46:20.480 | So, we wanna support all of those.
00:46:22.160 | - I know you've got a run.
00:46:24.920 | Is there anything that you want the AI engineering community
00:46:28.080 | to give feedback on specifically?
00:46:29.720 | Like, even down to, like, you know,
00:46:32.040 | a specific API endpoint, or, like,
00:46:34.000 | what's, like, the thing that you want?
00:46:35.600 | - Yeah, absolutely.
00:46:36.440 | I mean, you know, if we take a step back,
00:46:38.320 | I think Dev Day this year is a little different
00:46:41.080 | from last year, and in a few different ways.
00:46:43.800 | But one way is that we wanted to keep it intimate,
00:46:46.200 | even more intimate than last year.
00:46:48.280 | We wanted to make sure that the community
00:46:50.920 | is on the spotlight.
00:46:52.680 | That's why we have community talks and everything.
00:46:55.000 | And the takeaway here is, like,
00:46:57.040 | learning from the very best developers and AI engineers.
00:47:00.360 | And so, you know, we wanna learn from them.
00:47:03.120 | Most of what we shipped this morning,
00:47:04.760 | including things like prompt caching,
00:47:07.520 | the ability to generate prompts quickly in the playground,
00:47:09.760 | or even things like vision fine-tuning.
00:47:11.680 | These are all things that developers
00:47:13.200 | have been asking of us.
00:47:14.640 | And so the takeaway I would leave them with
00:47:17.640 | is to say, like, hey, the roadmap that we're working on
00:47:20.560 | is heavily influenced by them, and they work.
00:47:23.160 | And so we love feedback, from high-level feature requests,
00:47:27.720 | as you say, down to, like,
00:47:29.360 | very intricate details of an API endpoint.
00:47:32.240 | We love feedback.
00:47:33.080 | So, yes, that's how we build this API.
00:47:37.800 | - Yeah, I think the model distillation thing as well,
00:47:40.360 | it might be, like, the most boring,
00:47:42.760 | but, like, actually used a lot.
00:47:44.520 | - True, yeah.
00:47:45.360 | And I think maybe the most unexpected, right?
00:47:47.240 | Because I think if I read Twitter correctly
00:47:49.800 | the past few days,
00:47:51.680 | a lot of people were expecting us
00:47:53.720 | to ship the real-time API for speech-to-speech.
00:47:56.520 | I don't think developers were expecting us
00:47:58.600 | to have more tools for distillation.
00:48:01.000 | And we really think that's gonna be a big deal, right?
00:48:02.840 | If you're building apps that have, you know,
00:48:05.720 | you want high, like, low-latency, low-cost,
00:48:09.720 | but high-performance, high-quality on the use case,
00:48:12.280 | distillation is gonna be amazing.
00:48:13.640 | - Yeah, I sat in the distillation session just now,
00:48:16.320 | and they showed how they distilled from GPT-4o to GPT-4o mini,
00:48:18.960 | and it was, like, only, like, a 2% hit in the performance,
00:48:22.120 | and 15x cheaper.
00:48:23.560 | - Yeah, yeah, I was there as well
00:48:25.080 | for the Superhuman-inspired kind of use case
00:48:27.280 | for an email client.
00:48:28.440 | Yeah, this was really good.
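The distillation workflow being discussed leans on the new stored-completions feature: tag production traffic from the large model, then use those stored examples as training data for the smaller one. A minimal sketch of the first half, assuming the `store` and `metadata` parameters as announced at Dev Day (worth verifying against the current SDK):

```python
# Sketch of step one of the distillation workflow: store production
# completions from the large model, tagged with metadata so they can be
# filtered later and used to fine-tune a smaller model. The `store` and
# `metadata` parameters follow the Dev Day announcement; verify in the docs.
from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-4o",
    store=True,                                    # keep this completion
    metadata={"app": "email-triage", "distill": "candidate"},
    messages=[
        {"role": "system", "content": "Classify the email by urgency."},
        {"role": "user", "content": "Subject: prod server is down..."},
    ],
)
print(completion.choices[0].message.content)
# Later: filter the stored completions by metadata (in the dashboard or via
# the API), review them with evals, and fine-tune gpt-4o-mini on the result.
```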
00:48:29.800 | - Cool, man.
00:48:31.960 | - Amazing, thank you so much.
00:48:33.920 | Thanks again for being here today.
00:48:35.160 | It's always great to have you.
00:48:36.240 | - As you might have picked up at the end of that chat,
00:48:38.680 | there were many sessions throughout the day
00:48:40.720 | focused on specific new capabilities,
00:48:43.520 | like the new model distillation features,
00:48:45.600 | combining evals and fine-tuning.
00:48:48.360 | For our next session,
00:48:49.480 | we are delighted to bring back two former guests of the pod,
00:48:53.760 | which is something listeners have been greatly enjoying
00:48:56.400 | in our second year of doing the "Latent Space" podcast.
00:49:00.400 | Michelle Pokrass of the API team
00:49:02.880 | joined us recently to talk about structured outputs,
00:49:06.000 | and today gave an updated long-form session at Dev Day,
00:49:10.200 | describing the implementation details
00:49:12.240 | of the new structured output mode.
00:49:14.040 | We also got her updated thoughts on the Voice Mode API
00:49:18.120 | we discussed in her episode,
00:49:19.840 | now that it is finally announced.
00:49:22.160 | She is joined by friend of the pod and super-blogger,
00:49:24.800 | Simon Willison, who also came back as guest co-host
00:49:28.400 | in our Dev Day 2023 episode.
00:49:31.680 | - Great, we're back live at Dev Day,
00:49:34.440 | returning guest, Michelle,
00:49:36.640 | and then returning guest co-host, for the fourth time?
00:49:39.400 | - Fourth, for sure? Yeah, I don't know.
00:49:41.240 | - I've lost count. - I've lost count.
00:49:42.400 | - It's been a few, but-
00:49:43.400 | - Simon Willison is back.
00:49:44.800 | Yeah, we just wrapped everything up.
00:49:48.640 | Congrats on getting everything live.
00:49:51.520 | Simon did a great live blog, so if you haven't caught up-
00:49:54.760 | - I wrote my, I implemented my live blog
00:49:57.280 | while waiting for the first talk to start,
00:49:59.320 | using, like, GPT-4, which wrote me the JavaScript,
00:50:03.000 | and I got that live just in time,
00:50:04.400 | and then, yeah, I was live blogging the whole day.
00:50:06.240 | - Are you a cursor enjoyer?
00:50:08.120 | - I haven't really gotten to cursor yet, to be honest.
00:50:10.120 | Like, I just haven't spent enough time for it to click,
00:50:12.440 | I think. - Yeah.
00:50:13.440 | - I'm more a copy and paste things out
00:50:14.760 | to Claude and ChatGPT.
00:50:16.120 | - Yeah, it's interesting.
00:50:18.120 | Yeah, I've converted to cursor for,
00:50:20.120 | and O1 is so easy to just toggle on and off.
00:50:23.160 | - Yeah. - What's your workflow?
00:50:24.800 | - Copy-paste. - Okay, I'm gonna be real,
00:50:26.320 | I'm still in VS Code Copilot, so-
00:50:29.080 | - Same here, team Copilot. - Copilot is actually
00:50:31.560 | the reason I joined OpenAI, it was, you know,
00:50:34.120 | before ChatGPT, this is the thing that really got me,
00:50:36.360 | so I'm still into it, but I keep meaning to try out cursor,
00:50:39.120 | and I think, now that things have calmed down,
00:50:40.720 | I'm gonna give it a real go.
00:50:42.320 | - Yeah, it's a big thing to change your tool of choice.
00:50:44.720 | - Yes, yeah, I'm pretty dialed, so.
00:50:46.920 | - Yeah, I mean, you know, if you want,
00:50:49.240 | you can just fork VS Code and make your own.
00:50:51.360 | (laughing)
00:50:52.200 | That's the thing to do. - It's the done thing, right?
00:50:54.280 | - Yeah. - No jokes about doing
00:50:55.120 | a hackathon where the only thing you do is fork VS Code,
00:50:57.480 | and may the best fork win.
00:50:59.240 | - Nice. (laughing)
00:51:00.680 | - It's actually a really good idea.
00:51:02.400 | - Yeah, so, I mean, congrats on launching everything today.
00:51:08.000 | I know, like, we touched on it a little bit,
00:51:09.560 | but everyone was kinda guessing that Voice API was coming,
00:51:12.480 | and we talked about it in our episode.
00:51:15.360 | How do you feel going into the launch?
00:51:18.340 | Like, any design decisions that you wanna highlight?
00:51:22.000 | - Yeah, super jazzed about it.
00:51:23.560 | The team has been working on it for a while,
00:51:24.880 | it's like a very different API for us.
00:51:26.600 | It's the first WebSocket API,
00:51:28.400 | so a lot of different design decisions to be made.
00:51:30.280 | It's like, what kind of events do you send?
00:51:31.880 | When do you send an event?
00:51:33.320 | What are the event names?
00:51:34.520 | What do you send, like, on connection
00:51:36.000 | versus on future messages?
00:51:37.760 | So there've been a lot of interesting decisions there.
00:51:39.880 | The team has also hacked together really cool projects
00:51:42.300 | as we've been testing it.
00:51:43.640 | One that I really liked is we had an internal hackathon
00:51:46.000 | for the API team, and some folks built, like,
00:51:49.280 | a little hack that you could use Vim with Voice Mode.
00:51:54.480 | So, like, control Vim, and you would tell the model, like,
00:51:57.320 | write a file, and it would, you know,
00:51:58.760 | know all the Vim commands and type those in.
00:52:00.840 | So, yeah, a lot of cool stuff we've been hacking on,
00:52:02.960 | and really excited to see what people build with it.
00:52:05.640 | - I've gotta call out a demo from today.
00:52:07.640 | I think it was Katja who had a 3D visualization
00:52:10.560 | of the solar system, like WebGL solar system
00:52:12.840 | you could talk to.
00:52:13.660 | That is one of the coolest conference demos I've ever seen.
00:52:15.880 | That was so convincing.
00:52:17.480 | I really want the code.
00:52:18.560 | I really want the code for that to get put out there.
00:52:20.960 | - I'll talk to the team.
00:52:21.800 | I think we should probably put it up.
00:52:22.640 | - Absolutely beautiful example.
00:52:24.720 | And it made me realize that the Realtime API,
00:52:27.180 | this WebSocket API, it means that building a website
00:52:29.480 | that you can just talk to is easy now.
00:52:32.640 | It's like, it's not difficult to build, spin up a web app
00:52:35.600 | where you have a conversation with it,
00:52:37.080 | it calls functions for different things,
00:52:38.520 | it interacts with what's on the screen.
00:52:40.360 | I'm so excited about that.
00:52:41.560 | There are all of these projects I thought I'd never get to,
00:52:43.640 | and now I'm like, you know what?
00:52:45.360 | Spend a weekend on it.
00:52:46.320 | I can have a talk to your data, talk to your database
00:52:49.680 | with a little web application.
00:52:51.600 | - Yeah. - That's so cool.
00:52:52.520 | - Chat with PDF, but really chat with it.
00:52:54.760 | - Really chat with PDF, go completely.
00:52:57.120 | - Totally.
00:52:57.960 | - And that's not even hard to build.
00:52:59.120 | That's the crazy thing about this.
00:53:00.760 | - Yeah, very cool.
00:53:02.520 | Yeah, when I first saw the space demo,
00:53:04.520 | I was actually just wowed, and I had a similar moment,
00:53:08.080 | I think, to all the people in the crowd.
00:53:10.240 | I also thought Roman's drone demo was super cool.
00:53:13.600 | - That was a super fun one as well.
00:53:14.800 | - Yeah, I actually saw that live this morning,
00:53:16.520 | and I was holding my breath, for sure.
00:53:19.400 | - Knowing Roman, he probably spent the last two days
00:53:21.280 | of the week working on it. (laughs)
00:53:24.800 | But yeah, I'm curious about,
00:53:26.640 | you were talking with Roman actually earlier
00:53:28.240 | about what the different levels of abstraction are
00:53:31.160 | with WebSockets.
00:53:32.040 | It's something that most developers have zero experience with.
00:53:34.720 | I have zero experience with it.
00:53:36.840 | Apparently, there's the RTC level,
00:53:38.960 | and then there's the WebSocket level,
00:53:40.160 | and there's levels in between.
00:53:42.040 | - Not so much.
00:53:42.880 | I mean, with WebSockets,
00:53:44.040 | with the way they've built their API,
00:53:47.040 | you can connect directly to the OpenAI WebSocket
00:53:49.720 | from your browser, and it's actually
00:53:51.440 | just regular JavaScript.
00:53:52.720 | You instantiate the WebSocket thing.
00:53:54.800 | It looks quite easy from their example code.
00:53:57.360 | The problem is that if you do that,
00:53:59.120 | you're sending your API key from source code
00:54:02.040 | that anyone can view.
00:54:03.360 | - Yeah, we don't recommend that for production.
00:54:04.880 | - So it doesn't work for production,
00:54:06.640 | which is frustrating because it means
00:54:07.920 | that you have to build a proxy.
00:54:09.960 | So I'm gonna have to go home and build myself
00:54:11.480 | a little WebSocket proxy just to hide my API key.
00:54:14.800 | I want OpenAI to do that.
00:54:16.040 | I want OpenAI to solve that problem for me
00:54:18.560 | so I don't have to build the 1,000th WebSocket proxy
00:54:21.920 | just for that one problem.
00:54:22.760 | - Totally.
00:54:23.600 | We've also partnered with some partner solutions.
00:54:26.760 | We've partnered with, I think,
00:54:28.440 | Agora, LiveKit, a few others.
00:54:33.440 | So there's some loose solutions there,
00:54:34.840 | but yeah, we hear you.
00:54:36.000 | It's a beta.
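For anyone who wants to try what Simon describes before an official solution exists, here is a rough sketch of that thin relay: the browser connects to it with no credentials, and the server dials OpenAI with the API key attached server-side, forwarding frames in both directions. Same endpoint and header assumptions as the earlier sketch, no auth or rate limiting, so it is a starting point rather than something to deploy as-is.

```python
# Rough sketch of a WebSocket proxy that hides the API key: clients connect
# here without credentials; the proxy opens the upstream connection with the
# Authorization header added server-side and relays messages both ways.
import asyncio
import os

import websockets  # pip install websockets

UPSTREAM = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"

async def handle(client):  # older websockets versions also pass a `path` arg
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",
    }
    async with websockets.connect(UPSTREAM, extra_headers=headers) as upstream:
        async def pump(src, dst):
            async for message in src:
                await dst.send(message)
        # Relay until a side disconnects; a real proxy would also add
        # authentication, rate limiting, and cleaner shutdown handling.
        await asyncio.gather(pump(client, upstream), pump(upstream, client))

async def main():
    async with websockets.serve(handle, "localhost", 8765):
        await asyncio.Future()  # run forever

asyncio.run(main())
```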
00:54:37.680 | - Yeah, yeah.
00:54:38.520 | I mean, you still want a solution
00:54:40.840 | where someone brings their own key
00:54:42.760 | and they can trust that you don't get it, right?
00:54:45.080 | - Kind of.
00:54:45.920 | I mean, I've been building a lot of bring-your-own-key apps
00:54:48.240 | where it's my HTML and JavaScript,
00:54:50.640 | I store the key in local storage in their browser,
00:54:53.520 | and it never goes anywhere near my server,
00:54:55.520 | which works, but how do they trust me?
00:54:57.400 | How do they know I'm not gonna ship
00:54:58.800 | another piece of JavaScript that steals the key from me?
00:55:00.560 | - And so nominally,
00:55:01.720 | this actually comes with the crypto background.
00:55:03.880 | This is what MetaMask does.
00:55:06.080 | - Yeah, it's a public/private key thing.
00:55:09.960 | - Yeah.
00:55:10.800 | Like, why doesn't OpenAI do that?
00:55:13.280 | I don't know if, obviously, it's--
00:55:14.800 | - I mean, as with most things,
00:55:16.680 | I think there's some really interesting question
00:55:19.200 | and a really interesting reason,
00:55:20.400 | and the answer is just, you know,
00:55:22.000 | it's not been the top priority,
00:55:23.160 | and it's hard for a small team to do everything.
00:55:26.440 | I have been hearing a lot more
00:55:28.120 | about the need for things like sign-in with OpenAI.
00:55:31.280 | - I want OAuth.
00:55:32.120 | I want to bounce my users through ChatGPT,
00:55:34.120 | and I get back a token that lets me spend up to $4
00:55:37.400 | on the API on their behalf.
00:55:39.280 | That would solve it.
00:55:40.120 | Then I could ship all of my stupid little experiments,
00:55:42.680 | which currently require people to copy and paste
00:55:45.000 | the API key in, which cuts off everyone.
00:55:47.520 | Nobody knows how to do that.
00:55:48.360 | - Totally, I hear you.
00:55:49.520 | Something we're thinking about, and yeah, stay here.
00:55:52.480 | - Yeah, yeah.
00:55:53.920 | Right now, I think the only player in town is OpenRouter.
00:55:56.720 | That is basically, it's funny.
00:56:00.360 | It was made by, I forget his name,
00:56:04.080 | but he used to be CTO of OpenSea,
00:56:06.640 | and the first thing he did when he came over
00:56:08.520 | was build MetaMask for AI.
00:56:09.880 | - Totally.
00:56:12.200 | Yeah, very cool.
00:56:14.960 | What's the most underrated release from today?
00:56:18.840 | - Vision Fine Tuning.
00:56:20.600 | Vision Fine Tuning is so underrated.
00:56:22.320 | For the past two months, whenever I talk to founders,
00:56:25.000 | they tell me this is the thing they need most.
00:56:27.000 | A lot of people are doing OCR on very bespoke formats,
00:56:30.320 | like government documents, and Vision Fine Tuning
00:56:32.800 | can help a lot with that use case.
00:56:34.640 | Also, bounding boxes.
00:56:36.080 | People have found a lot of improvements
00:56:38.080 | for bounding boxes with Vision Fine Tuning.
00:56:40.040 | So yeah, I think it's pretty slept on,
00:56:41.600 | and people should try it.
00:56:42.800 | You only really need 100 images to get going.
00:56:45.120 | - Tell me more about bounding boxes.
00:56:46.560 | I didn't think GPT-4 Vision could do bounding boxes at all.
00:56:50.400 | - Yeah, it's actually not that amazing at it.
00:56:53.040 | We're working on it, but with fine tuning,
00:56:55.680 | you can make it really good for your use case.
00:56:57.440 | - That's cool, 'cause I've been using
00:56:58.640 | Google Gemini's bounding box stuff recently.
00:57:00.320 | It's very, very impressive.
00:57:01.960 | - Yeah, totally.
00:57:02.800 | - But being able to fine tune a model for that.
00:57:04.120 | The first thing I'm gonna do with fine tuning for images
00:57:06.280 | is I've got five chickens, and I'm gonna fine tune a model
00:57:09.880 | that can tell which chicken is which,
00:57:11.800 | which is hard, 'cause three of them are gray.
00:57:13.480 | - Yeah.
00:57:14.320 | - So there's a little bit of--
00:57:15.520 | - Okay, this is my new favorite use case.
00:57:17.160 | This is awesome.
00:57:18.000 | - Yeah, it's, I've managed to do it with prompting,
00:57:21.400 | just like I gave Claude pictures of all of the chickens,
00:57:24.440 | and then said, "Okay, which chicken is this?"
00:57:26.240 | - Yeah.
00:57:27.080 | - But it's not quite good enough,
00:57:27.900 | 'cause it confuses the gray chicken.
00:57:29.520 | - Listen, we can close that eval gap.
00:57:31.320 | - Yeah, that's, it's gonna be a great eval.
00:57:33.200 | Like, my chicken eval's gonna be fantastic.
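Since roughly 100 labelled images is enough to get going, here is a sketch of what a vision fine-tuning dataset and job could look like for Simon's chicken classifier: the same JSONL chat format as text fine-tuning, with image parts in the user messages. The schema and the fine-tunable GPT-4o snapshot name should be verified against the fine-tuning docs, and the image URLs are placeholders.

```python
# Sketch: build a small vision fine-tuning dataset (hypothetical chicken
# classifier), upload it, and kick off a job. Verify the exact JSONL schema
# and the fine-tunable model snapshot against the fine-tuning docs.
import json
from openai import OpenAI

examples = [
    ("https://example.com/chickens/gertrude.jpg", "Gertrude"),
    ("https://example.com/chickens/henrietta.jpg", "Henrietta"),
    # ... around 100 labelled images is enough to get going
]

with open("chickens.jsonl", "w") as f:
    for url, name in examples:
        f.write(json.dumps({
            "messages": [
                {"role": "user", "content": [
                    {"type": "text", "text": "Which chicken is this?"},
                    {"type": "image_url", "image_url": {"url": url}},
                ]},
                {"role": "assistant", "content": name},
            ],
        }) + "\n")

client = OpenAI()
training_file = client.files.create(file=open("chickens.jsonl", "rb"),
                                    purpose="fine-tune")
job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-2024-08-06",  # snapshot name as announced; check the docs
)
print(job.id)
```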
00:57:35.240 | - I'm also really jazzed about the evals product.
00:57:37.800 | It's kind of like a sub-launch of the distillation thing,
00:57:40.600 | but people have been struggling to make evals,
00:57:42.920 | and the first time I saw the flow
00:57:44.300 | with how easy it is to make an eval in our product,
00:57:46.980 | I was just blown away.
00:57:48.640 | So recommend people really try that.
00:57:50.000 | I think that's what's holding a lot of people back
00:57:52.400 | from really investing in AI,
00:57:53.680 | 'cause they just have a hard time figuring out
00:57:55.440 | if it's going well for their use case.
00:57:57.160 | So we've been working on making it easier to do that.
00:57:59.760 | - Does the eval product include structure output testing,
00:58:03.760 | like function calling and things?
00:58:05.480 | - If it matches your JSON schema, yeah.
00:58:09.880 | - No, I mean, we have guaranteed structured output anyway.
00:58:13.640 | - Right, so we don't have to test it.
00:58:15.760 | - Well, not the schema, but like the performance.
00:58:19.260 | - I think so, yeah.
00:58:20.100 | - So you might call the wrong function, or.
00:58:22.320 | - Oh, I see.
00:58:23.160 | - You're gonna write schema, wrong output.
00:58:24.840 | - So you can do function calling testing.
00:58:26.720 | - I'm pretty sure.
00:58:27.560 | I'll have to check that for you, but I think so.
00:58:29.120 | - Well, we'll make sure it's sent out.
00:58:30.760 | - Fun fact, after our podcast,
00:58:33.060 | they released function calling v3,
00:58:35.200 | which is multi-turn function calling benchmarks.
00:58:38.480 | - We're having the guy on the podcast as well, sorry?
00:58:40.960 | - Are you saying the BFCL?
00:58:42.240 | - BFCL.
00:58:43.080 | - Yeah, yeah.
00:58:44.320 | - What would you ask the BFCL guys,
00:58:45.440 | 'cause we're actually having them next on the podcast.
00:58:47.460 | - Yeah.
00:58:48.300 | Yeah, I think we tried out v3.
00:58:52.760 | - It's just multi-turn.
00:58:55.320 | - We have some feedback from the founder.
00:58:57.760 | We should probably cut this, but we wanna make it better.
00:59:03.840 | What, like, how do you think about the evolution
00:59:08.560 | of like the API design?
00:59:09.920 | I think to me, that's like the most important thing.
00:59:11.480 | So even with the OpenAI levels, like chatbots,
00:59:15.040 | I can understand what the API design looks like.
00:59:17.740 | Reasoning, I can kind of understand it,
00:59:20.280 | even though like chain of thought kind of changes things.
00:59:23.200 | As you think about real-time voice,
00:59:24.720 | and then you think about agents,
00:59:26.080 | it's like, how do you think about how you design the API,
00:59:29.800 | and like what the shape of it is?
00:59:32.120 | - Yeah, so I think we're starting
00:59:34.480 | with the lowest level capabilities,
00:59:36.840 | and then we build on top of that
00:59:38.000 | as we know that they're useful.
00:59:39.080 | So a really good example of this is real-time.
00:59:41.640 | We're actually going to be shipping audio capabilities
00:59:44.520 | in chat completions.
00:59:45.720 | So this is like the lowest level capability.
00:59:48.120 | So you supply in audio, and you can get back raw audio,
00:59:51.080 | and it works at the request response layer.
00:59:53.220 | But through building advanced voice mode,
00:59:55.360 | we realized ourselves that like it's pretty hard to do
00:59:58.040 | with something like chat completions.
00:59:59.760 | And so that led us to building this WebSocket API.
01:00:02.840 | So we really learned a lot from our own tools,
01:00:04.520 | and we think the chat completions thing is nice,
01:00:06.720 | and for certain use cases or async stuff,
01:00:08.880 | but you're really gonna want a real-time API.
01:00:11.280 | And then as we test more with developers,
01:00:13.720 | we might see that it makes sense to have
01:00:16.040 | another layer of abstraction on top of that,
01:00:19.200 | something like closer to more client-side libraries.
01:00:24.200 | But for now, that's where we feel
01:00:26.160 | we have a really good point of view.
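To make the "lowest level capability" concrete: a sketch of what request/response audio in chat completions looks like with the Python SDK, based on the audio-preview model that rolled out shortly after Dev Day. The model name, the `modalities`/`audio` parameters, and the response fields are worth double-checking against the current docs.

```python
# Sketch of request/response audio in chat completions: send audio in, get a
# transcript plus synthesized audio back from a single call. Parameter names
# follow the announcement and should be verified against current docs.
import base64
from openai import OpenAI

client = OpenAI()

with open("question.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode()

completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Answer the question in this recording."},
            {"type": "input_audio",
             "input_audio": {"data": audio_b64, "format": "wav"}},
        ],
    }],
)

reply = completion.choices[0].message
print(reply.audio.transcript)                    # text of the spoken answer
with open("answer.wav", "wb") as f:
    f.write(base64.b64decode(reply.audio.data))  # the audio itself
```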
01:00:28.240 | - So that's a question I have is,
01:00:29.800 | if I've got a half hour long audio recording,
01:00:32.680 | at the moment, the only way I can feed that in
01:00:34.840 | is if I call the WebSocket API and slice it up
01:00:37.720 | into little JSON base64 snippets
01:00:40.000 | and fire them all over.
01:00:41.300 | In that case, I'd rather just give you,
01:00:42.880 | like an image in the chat completions API,
01:00:45.040 | a URL to my MP3 file as input.
01:00:46.960 | Is that something-- - That's what we're gonna do.
01:00:48.320 | - Oh, thank goodness for that.
01:00:49.160 | - Yes, it's in the blog post.
01:00:50.440 | I think it's a short one-liner,
01:00:51.760 | but it's rolling out, I think, in the coming weeks.
01:00:53.960 | - Oh, wow, oh, really soon then.
01:00:55.640 | - The team has been sprinting,
01:00:57.600 | we're just putting finishing touches on stuff.
01:00:58.440 | - Do you have a feel for the length limit on that?
01:01:01.120 | - I don't have it off the top.
01:01:02.480 | - Okay. - I'm sorry.
01:01:03.520 | - 'Cause yeah, often I want to do,
01:01:04.920 | I do a lot of work with transcripts
01:01:06.440 | of hour-long YouTube videos,
01:01:07.680 | which currently, I run them through Whisper
01:01:09.920 | and then I do the transcript that way,
01:01:11.240 | but being able to do the multimodal thing,
01:01:13.480 | those would be really useful.
01:01:14.400 | - Totally, yeah, we're really jazzed about it.
01:01:15.680 | We wanna basically give the lowest capabilities we have,
01:01:18.400 | lowest level capabilities,
01:01:19.760 | and the things that make it easier to use.
01:01:22.080 | And so, targeting kind of both, yeah.
01:01:27.280 | - I just realized what I can do, though,
01:01:28.920 | is I do a lot of Unix utilities, little Unix things.
01:01:32.480 | I want to be able to pipe the output of a command
01:01:34.640 | into something which streams that up
01:01:37.080 | to the WebSocket API and then speaks it out loud.
01:01:40.120 | So I can do streaming speech of the output of things.
01:01:43.360 | That should work.
01:01:44.640 | Like, I think you've given me everything I need for that.
01:01:46.360 | That's cool.
01:01:47.860 | - Yeah, excited to see what you build.
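Simon's pipe idea is easy to sketch under the same assumptions as the audio example earlier: read stdin, ask the audio-capable model to read it aloud, and write WAV bytes to stdout, e.g. `ls -la | python speak.py > reply.wav`. A lower-latency version would stream over the Realtime WebSocket instead; this is the simple request/response variant.

```python
#!/usr/bin/env python3
# speak.py -- pipe text in, get spoken audio out. Same caveats as the earlier
# audio sketch: model name and parameters are assumptions to verify.
import base64
import sys

from openai import OpenAI

client = OpenAI()
text = sys.stdin.read()

completion = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[{
        "role": "user",
        "content": f"Read the following aloud, verbatim:\n\n{text}",
    }],
)

# Write the decoded WAV to stdout so it can be redirected to a file or player.
sys.stdout.buffer.write(base64.b64decode(completion.choices[0].message.audio.data))
```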
01:01:49.760 | - I heard there are multiple competing solutions,
01:01:55.760 | and you guys eval it before you pick WebSockets,
01:01:58.760 | like server-set events, polling.
01:02:01.600 | Can you give your thoughts on the live updating paradigms
01:02:08.280 | that you guys looked at?
01:02:09.480 | 'Cause I think a lot of engineers
01:02:10.560 | have looked at stuff like this.
01:02:12.520 | - Well, I think WebSockets are just a natural fit
01:02:14.480 | for bidirectional streaming.
01:02:16.440 | Other places I've worked, like Coinbase,
01:02:18.360 | we had a WebSocket API for pricing data,
01:02:21.000 | and I think it's just a very natural format.
01:02:24.760 | - So it wasn't even really that controversial at all?
01:02:28.000 | - I don't think it was super controversial.
01:02:29.600 | I mean, we definitely explored the space a little bit,
01:02:31.380 | but I think we came to WebSockets pretty quickly, yeah.
01:02:35.300 | - Cool.
01:02:36.140 | Video?
01:02:39.000 | - Yeah, not yet, but, you know, possible in the future.
01:02:43.600 | - I actually was hoping for the ChatGPT desktop app
01:02:47.080 | with video today, because that was demoed.
01:02:49.600 | - Yeah, this is dev day.
01:02:51.840 | - Yeah.
01:02:52.680 | - I think the moment we have the ability
01:02:54.160 | to send images over the WebSocket API, we get video.
01:02:57.360 | - My question is, how frequently?
01:03:00.040 | Yeah, because, yeah, I mean, sending a whole video frame
01:03:04.560 | of like a 1080p screen, maybe it might be too much.
01:03:08.460 | What's the limitations on a WebSocket chunk going over?
01:03:11.680 | I don't know.
01:03:12.520 | - I don't have that off the top.
01:03:15.440 | - Like Google Gemini, you can do an hour's worth of video
01:03:18.140 | in their context window, just by slicing it up
01:03:20.600 | into one frame at 10 frames a second, and it does work.
01:03:24.680 | So I don't know, but then that's the weird thing
01:03:28.820 | about Gemini is it's so good at you just giving it a flood
01:03:31.480 | of individual frames, it'll be interesting to see
01:03:33.400 | if GPT-4o can handle that or not.
01:03:35.960 | - Do you have any more feature requests?
01:03:42.680 | I know it's been a long day for everybody,
01:03:44.720 | but you got me a show right here, so.
01:03:47.480 | - I want you to do all of the accounting for me.
01:03:50.640 | I want my users to be able to run my apps,
01:03:53.620 | and I want them to call your APIs with their user ID
01:03:57.460 | and have you go, oh, they've spent 30 cents,
01:04:00.520 | cut them off at a dollar, I can check how much they spent,
01:04:03.440 | all of that stuff, 'cause I'm having to build that
01:04:04.960 | at the moment, and I really don't want to.
01:04:06.720 | I don't want to be a token accountant.
01:04:08.320 | I want you to do the token accounting for me.
01:04:10.680 | - Yeah, totally, I hear you.
01:04:12.640 | It's good feedback.
01:04:13.760 | - Well, how does that contrast with your actual priorities?
01:04:16.480 | Like, I feel like you have a bunch of priorities.
01:04:19.160 | They showed some on stage with multimodality and all that.
01:04:21.860 | - Yeah.
01:04:22.700 | Yeah, it's hard to say.
01:04:26.840 | I would say things change really quickly.
01:04:28.880 | Things that are big blockers for user adoption
01:04:33.520 | we find very important, and yeah.
01:04:36.700 | It's a rolling prioritization, yeah.
01:04:40.920 | - No Assistants API update.
01:04:43.680 | - Not at this time, yeah.
01:04:45.760 | - Yeah.
01:04:46.600 | - I was hoping for an O1-native thing in Assistants.
01:04:51.280 | - Yeah.
01:04:52.120 | - 'Cause I thought they would go well together.
01:04:53.680 | - We're still kind of iterating on the formats.
01:04:56.220 | I think there are some problems with the Assistants API,
01:04:58.720 | some things it does really well,
01:05:00.320 | and I think we'll keep iterating
01:05:01.900 | and land on something really good,
01:05:03.240 | but just, you know, it wasn't quite ready yet.
01:05:05.340 | Some of the things that are good
01:05:07.980 | in the Assistants API are hosted tools.
01:05:07.980 | People really like hosted tools, and especially RAG.
01:05:11.800 | And then some things that are less intuitive
01:05:13.820 | is just how many API requests you need to get going
01:05:16.360 | with the Assistants API.
01:05:17.680 | - It's quite.
01:05:18.520 | - It's quite a lot.
01:05:19.600 | Yeah, you gotta create an Assistant,
01:05:21.360 | you gotta create a thread, you gotta do all this stuff.
01:05:24.600 | So yeah, it's something we're thinking about.
01:05:25.820 | It shouldn't be so hard.
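For contrast with a single chat completion, this is roughly the sequence of calls being described, sketched with the Python SDK's beta namespace as it stands today; helper names such as `create_and_poll` may change as the API keeps iterating.

```python
# Sketch of the multi-step Assistants API flow versus one chat.completions
# call: create an assistant, create a thread, add a message, run, then read
# the reply. Helper names may shift as the API iterates.
from openai import OpenAI

client = OpenAI()

assistant = client.beta.assistants.create(       # 1. create an assistant
    model="gpt-4o",
    instructions="You are a careful calculator.",
    tools=[{"type": "code_interpreter"}],         # hosted tool
)
thread = client.beta.threads.create()             # 2. create a thread
client.beta.threads.messages.create(              # 3. add the user message
    thread_id=thread.id,
    role="user",
    content="What is 123456 * 654321? Use the code interpreter.",
)
run = client.beta.threads.runs.create_and_poll(   # 4. run and wait
    thread_id=thread.id,
    assistant_id=assistant.id,
)
messages = client.beta.threads.messages.list(     # 5. fetch the reply
    thread_id=thread.id, run_id=run.id,
)
print(messages.data[0].content[0].text.value)
```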
01:05:27.000 | - The only thing I've used it for so far is Code Interpreter.
01:05:29.320 | It's like it's an API to Code Interpreter.
01:05:31.080 | - Yes.
01:05:31.920 | - Crazy exciting.
01:05:32.760 | - Yes, we wanna fix that and make it easier to use.
01:05:36.240 | - I want Code Interpreter over WebSockets.
01:05:38.240 | That would be wildly interesting.
01:05:41.480 | - Yeah, do you wanna bring your own Code Interpreter
01:05:44.140 | or you wanna use OpenAI as well?
01:05:44.980 | - I wanna use that 'cause Code Interpreter's a hard problem.
01:05:47.700 | Sandboxing and all of that stuff is.
01:05:49.660 | - Yeah, but there's a bunch of
01:05:50.580 | Code Interpreter as a service things out there.
01:05:52.620 | - There are a few now, yeah.
01:05:53.540 | - Because there's, I think you don't allow
01:05:56.340 | arbitrary installation of packages.
01:05:57.860 | - Oh, they do.
01:05:58.700 | - Let's.
01:05:59.520 | - They really do.
01:06:00.360 | - Let's use your hack.
01:06:01.200 | - Yeah, and I do.
01:06:02.260 | - Yeah, you can upload a PIP package.
01:06:04.620 | - You can run, you can compile C code in Code Interpreter.
01:06:07.500 | - I know.
01:06:08.340 | - If you know how to do it.
01:06:09.160 | - That's a hack, that's a hack.
01:06:10.000 | - That's a glorious hack, though.
01:06:12.340 | - Okay.
01:06:13.180 | - I've had it write me custom SQLite extensions in C
01:06:15.780 | and compile them and run them inside of Python
01:06:18.140 | and it works.
01:06:19.160 | (laughing)
01:06:21.420 | - I mean, yeah, there's others, E2B is one of them.
01:06:25.420 | Like, yeah, it'll be interesting to see
01:06:27.580 | what the real-time version of that would be.
01:06:30.420 | Yeah.
01:06:31.260 | - Also, Michelle, thank you for the update.
01:06:34.780 | We left the episode as what will voice mode look like?
01:06:38.820 | - Obviously, you knew what it looked like,
01:06:40.100 | but you didn't say it, so now you could.
01:06:41.700 | - Yeah, here we are.
01:06:42.900 | Hope you guys find it.
01:06:44.260 | - Yeah.
01:06:45.100 | - Cool.
01:06:45.920 | - Awesome, thank you.
01:06:46.760 | - That's it.
01:06:47.600 | - Our final guest today,
01:06:48.420 | and also a familiar recent voice on the Latent Space pod,
01:06:52.020 | presented at one of the community talks
01:06:53.980 | at this year's Dev Day.
01:06:55.180 | Alistair Pullen of Cosine made a huge impression
01:06:59.140 | with all of you.
01:07:00.020 | Special shout out to listeners like Jesse from Morph Labs
01:07:03.700 | when he came on to talk about
01:07:05.000 | how he created synthetic data sets
01:07:07.300 | to fine-tune the largest LoRAs
01:07:09.500 | that had ever been created for GPT-4o
01:07:12.340 | to post the highest-ever scores on SWE-bench
01:07:15.020 | and SWE-bench Verified,
01:07:16.940 | while not getting recognition for it
01:07:18.940 | because he refused to disclose his reasoning traces
01:07:21.700 | to the SWE-bench team.
01:07:22.960 | Now that OpenAI's O1 preview is announced,
01:07:26.940 | it is incredible to see the OpenAI team
01:07:29.820 | also obscure their chain-of-thought traces
01:07:32.100 | for competitive reasons
01:07:33.620 | and still perform lower than Cosine's Genie model.
01:07:36.660 | We snagged some time with Ali
01:07:39.260 | to break down what has happened since his episode aired.
01:07:42.220 | Welcome back, Ali.
01:07:43.060 | - Thank you so much.
01:07:43.900 | Thanks for having me.
01:07:45.460 | - So you just spoke at OpenAI Dev Day.
01:07:47.940 | What was the experience like?
01:07:48.940 | Did they reach out to you?
01:07:50.240 | You seem to have a very close relationship.
01:07:53.140 | - Yeah, so off the back of the work that we've done
01:07:56.260 | that we spoke about last time we saw each other,
01:07:59.420 | I think that OpenAI definitely felt
01:08:01.980 | that the work we've been doing around fine-tuning
01:08:03.820 | was worth sharing.
01:08:04.820 | I would obviously tend to agree,
01:08:07.020 | but today I spoke about some of the techniques
01:08:10.020 | that we learned.
01:08:10.860 | Obviously, it was like a non-linear path
01:08:12.620 | arriving to where we've arrived
01:08:13.900 | and the techniques that we built to build genie.
01:08:16.300 | So I think I shared a few extra pieces
01:08:20.540 | about some of the techniques
01:08:21.420 | and how it really works under the hood,
01:08:22.940 | how you generate a data set to show the model
01:08:25.260 | how to do what we show the model.
01:08:26.900 | And that was mainly what I spoke about today.
01:08:29.940 | I mean, yeah, they reached out.
01:08:31.620 | I was super excited at the opportunity, obviously.
01:08:34.460 | Like, it's not every day that you get to come and do this,
01:08:36.540 | especially in San Francisco.
01:08:37.620 | So yeah, they reached out and they were like,
01:08:39.460 | do you wanna talk at Dev Day?
01:08:40.620 | You can speak about basically anything you want
01:08:42.380 | related to what you've built.
01:08:43.340 | And I was like, sure, that's amazing.
01:08:44.620 | I'll talk about fine-tuning,
01:08:46.100 | how you build a model that does this software engineering.
01:08:48.500 | So yeah.
01:08:49.340 | - Yeah, and the trick here is when we talked,
01:08:52.980 | O1 was not out.
01:08:53.900 | - No, it wasn't.
01:08:54.740 | - Did you know about O1 or?
01:08:56.740 | - I didn't know.
01:08:58.380 | I knew some bits and pieces.
01:08:59.980 | No, not really.
01:09:01.100 | I knew a reasoning model was on the way.
01:09:02.580 | I didn't know what it was gonna be called.
01:09:03.740 | I knew as much as everyone else.
01:09:04.740 | Strawberry was the name back then.
01:09:06.420 | - Because, you know, I'll fast forward.
01:09:09.180 | You were the first to hide
01:09:10.940 | your chain of thought reasoning traces as IP.
01:09:13.940 | - Yes.
01:09:14.780 | - Right, famously that got you in trouble
01:09:16.140 | with SWE-bench or whatever.
01:09:16.980 | - Yes.
01:09:18.420 | I feel slightly vindicated by that now, not gonna lie.
01:09:20.580 | - And now, obviously, O1 is doing it.
01:09:21.980 | - Yeah, the fact that, yeah, I mean, like,
01:09:24.020 | I think it's true to say right now
01:09:26.620 | that the reasoning of your model
01:09:29.540 | gives you the edge that you have.
01:09:31.300 | And, like, the amount of effort
01:09:34.060 | that we put into our data pipeline
01:09:35.540 | to generate these human-like reasoning traces was,
01:09:38.260 | I mean, that wasn't for nothing,
01:09:39.500 | that we knew that this was the way
01:09:40.860 | that you'd unlock more performance,
01:09:41.960 | getting them all to think in a specific way.
01:09:44.140 | In our case, we wanted it to think like a software engineer.
01:09:46.140 | But, yeah, I think that
01:09:48.580 | the approach that other people have taken,
01:09:52.940 | like OpenAI, in terms of reasoning,
01:09:54.900 | has definitely showed us that we were
01:09:56.780 | going down the right path pretty early on.
01:09:59.660 | And even now, we've started replacing
01:10:03.060 | some of the reasoning traces in our genie model
01:10:05.500 | with reasoning traces generated by O1,
01:10:07.500 | or at least in tandem with O1.
01:10:09.060 | And we've already started seeing improvements
01:10:10.540 | in performance from that point.
01:10:12.780 | But no, like, back to your point,
01:10:14.820 | in terms of, like, the whole, like, withholding them,
01:10:18.380 | I still think that that was the right decision to do
01:10:21.900 | because of the very reason
01:10:23.300 | that everyone else has decided to not share those things.
01:10:26.220 | It's, it is exactly, it shows exactly how we do what we do.
01:10:30.380 | And that is our edge at the moment.
01:10:31.580 | So, yeah.
01:10:32.420 | - As a founder, so, they also feature Cognition on stage,
01:10:37.620 | talked about them.
01:10:38.980 | How does that make you feel that, like, you know,
01:10:41.580 | they're like, "Hey, O1 is so much better, makes us better."
01:10:44.060 | For you, it should be like,
01:10:45.220 | "Oh, I'm so excited about it, too."
01:10:46.860 | Because now, all of a sudden, it's like,
01:10:48.260 | it kind of, like, raises the floor for everybody.
01:10:50.260 | Like, how should people, especially new founders,
01:10:52.420 | how should they think about, you know,
01:10:53.980 | worrying about the new model
01:10:54.980 | versus, like, being excited about them,
01:10:56.380 | just focusing on, like, the core FD
01:10:57.820 | and maybe switching out some of the parts,
01:10:59.460 | like you mentioned?
01:11:00.300 | - Yeah, I, speaking for us, I mean, obviously, like,
01:11:04.260 | we were extremely excited about O1
01:11:06.260 | because at that point, the process of reasoning
01:11:09.540 | is obviously very much baked into the model.
01:11:11.460 | We fundamentally, if you, like,
01:11:12.900 | remove all distractions and everything,
01:11:14.260 | we are a reasoning company, right?
01:11:15.780 | We want to reason in the way
01:11:16.980 | that a software engineer reasons.
01:11:18.540 | So, when I saw that model announced,
01:11:19.940 | I thought immediately, "Well, I can improve the quality
01:11:22.420 | "of my traces coming out of my pipeline."
01:11:24.460 | So, like, my signal-to-noise ratio gets better.
01:11:26.940 | And then, not immediately, but down the line,
01:11:29.340 | I'm going to be able to train those traces into O1 itself.
01:11:32.540 | So, I'm going to get even more performance that way as well.
01:11:35.540 | So, it's, for us, a really nice position to be in,
01:11:38.380 | to be able to take advantage of it,
01:11:39.580 | both on the prompted side and the fine-tuned side.
01:11:42.060 | And also because, fundamentally, like,
01:11:46.020 | we are, I think, fairly clearly in a position now
01:11:48.460 | where we don't have to worry about
01:11:49.860 | what happens when O2 comes out,
01:11:51.420 | what happens when O3 comes out.
01:11:52.820 | This process continues, like, even going from,
01:11:55.620 | you know, when we first started going from 3.5 to 4,
01:11:58.540 | we saw this happen.
01:11:59.780 | And then, from 4 Turbo to 4o,
01:12:03.620 | and then from 4o to O1,
01:12:04.980 | we've seen the performance get better every time.
01:12:07.780 | And I think, I mean, like,
01:12:09.620 | the crude advice I'd give to any startup founder
01:12:11.860 | is try to put yourself in a position
01:12:13.100 | where you can take advantage of the same,
01:12:15.100 | you know, like, sea-level rise every time, essentially.
01:12:18.220 | - Do you make anything out of the fact
01:12:19.660 | that you were able to take GPT-4o and fine-tune it
01:12:22.860 | higher than O1 currently scores on SWE-bench Verified?
01:12:26.460 | - Yeah, I mean, like, that was obviously,
01:12:29.020 | to be honest with you, you realized that before I did.
01:12:32.420 | But it was-- - Adding value.
01:12:33.620 | - Yes, absolutely, that's a value-add investor right there.
01:12:35.780 | No, obviously, I think it's been,
01:12:38.180 | that in of itself is really vindicating to see
01:12:40.540 | because I think we have heard from some people,
01:12:45.540 | not a lot of people, but some people saying,
01:12:46.940 | "Well, okay, well, if O1 can reason,
01:12:48.540 | "then what's the point of doing your reasoning?"
01:12:50.260 | But it shows how much more signal is in, like,
01:12:52.580 | the custom reasoning that we generate.
01:12:54.460 | And again, it's the very sort of obvious thing.
01:12:59.020 | If you take something that's made to be general
01:13:01.020 | and you make it specific,
01:13:01.940 | of course it's gonna be better at that thing, right?
01:13:04.740 | So it was obviously great to see, like,
01:13:06.980 | we still are better than O1 out of the box,
01:13:09.740 | you know, even with an older model,
01:13:11.100 | and I'm sure that that delta will continue to grow
01:13:13.540 | once we're able to train O1,
01:13:14.900 | and once we've done more work on our dataset using O1,
01:13:17.140 | like, that delta will grow as well.
01:13:18.580 | - It's not obvious to me
01:13:19.540 | that they will allow you to fine-tune O1,
01:13:21.220 | but, you know, maybe they'll try.
01:13:23.500 | I think the core question
01:13:26.540 | that OpenAI really doesn't want you to figure out
01:13:29.820 | is can you use an open-source model and beat O1?
01:13:33.020 | - Interesting.
01:13:35.060 | - Because you basically have shown proof of concept
01:13:37.980 | that a non-O1 model can beat O1,
01:13:40.740 | and their whole O1 marketing is, "Don't bother trying."
01:13:44.500 | Like, "Don't bother stitching together
01:13:46.420 | "multiple chain-of-thought calls.
01:13:47.540 | "We did something special, secret sauce.
01:13:49.380 | "You don't know anything about it."
01:13:51.300 | And somehow, you know,
01:13:53.180 | your 4.0 chain-of-thought reasoning
01:13:56.380 | as a software engineer is still better.
01:13:58.260 | Maybe it doesn't last.
01:13:59.100 | Maybe they're gonna run O1 for five hours
01:14:01.380 | instead of five minutes, and then suddenly it works.
01:14:03.860 | So I don't know.
01:14:04.900 | - It's hard to know.
01:14:05.740 | I mean, one of the things that we just want to do
01:14:07.260 | out of sheer curiosity
01:14:11.820 | is do something like fine-tune 405B on the same dataset.
01:14:11.820 | Like, same context window length, right?
01:14:13.420 | So it should be fairly easy.
01:14:15.020 | We haven't done it yet.
01:14:15.860 | Truthfully, we have been so swamped
01:14:17.380 | with the waitlist, shipping product, you know, dev day,
01:14:20.580 | like, you know, onboarding customers from our waitlist.
01:14:22.860 | All these different things have gotten in the way,
01:14:25.020 | but it is definitely something out of more curiosity
01:14:27.340 | than anything else I'd like to try out.
01:14:28.780 | But also, it opens up a new vector of, like,
01:14:31.340 | if someone has a VPC
01:14:33.180 | where they can't deploy an OpenAI model,
01:14:34.660 | but they might be able to deploy an open-source model,
01:14:36.740 | it opens that up for us, as well,
01:14:39.100 | from a customer perspective.
01:14:40.100 | So it'll probably be quite useful.
01:14:41.340 | I'd be very keen to see what the results of it.
01:14:44.020 | - I suspect the answer is yes,
01:14:45.580 | but there may be, it may be hard to do.
01:14:47.900 | So, like, Reflection 70B was, like,
01:14:49.500 | a really crappy attempt at doing it.
01:14:51.700 | You guys were much better,
01:14:52.940 | and that's why we had you on the show.
01:14:54.580 | I, yeah, I'm interested to see if there's an open O1, basically.
01:14:58.100 | People want an open O1.
01:14:59.020 | - Yeah, I'm sure they do.
01:15:00.140 | As soon as we do it, and, like,
01:15:02.820 | once we've wrapped up what we're doing in San Francisco,
01:15:04.660 | I'm sure we'll give it a go.
01:15:06.140 | I spoke to some guys today, actually,
01:15:08.340 | about fine-tuning 405B,
01:15:10.660 | who might be able to allow us to do it very easily.
01:15:14.740 | I don't want to have to, basically,
01:15:15.580 | do all the setup myself, so.
01:15:17.820 | Yeah, that might happen sooner rather than later.
01:15:20.020 | - Yeah.
01:15:20.860 | - Anything?
01:15:21.780 | - Anything from the releases today
01:15:23.980 | that you're super excited about?
01:15:25.060 | So, prompt caching, I'm guessing,
01:15:26.580 | when you're, like, dealing with a lot of code bases,
01:15:28.420 | that might be helpful.
01:15:29.340 | Is there anything with vision fine-tuning,
01:15:31.580 | related to, like, more, like, UI-related development?
01:15:34.020 | - Yeah, definitely.
01:15:34.940 | Yeah, I mean, we were, like, we were talking.
01:15:36.420 | It's funny, like, my co-founder, Sam, who you've met,
01:15:39.340 | and I were talking about the idea
01:15:41.020 | of doing vision fine-tuning, like, way back,
01:15:43.500 | like, well over a year ago,
01:15:44.580 | before Genie existed as it does now.
01:15:47.100 | When we collected our original dataset
01:15:49.700 | to do what we do now,
01:15:52.300 | whenever there were image links
01:15:54.340 | and links to, like, graphical resources and stuff,
01:15:57.220 | we also pulled that in as well.
01:15:58.740 | We never had the opportunity to use it,
01:16:00.500 | but it's something we have in storage.
01:16:02.100 | And, again, like, when we have the time,
01:16:04.060 | it's something that I'm super excited,
01:16:05.460 | particularly on the UI side,
01:16:06.940 | to be able to, like, leverage.
01:16:09.540 | Particularly, if you think about one of the things,
01:16:11.220 | not to sidetrack, but one of the things we've noticed is,
01:16:14.540 | I know SWE-bench is, like,
01:16:15.580 | the most commonly talked about thing,
01:16:16.780 | and, honestly, it's an amazing project,
01:16:18.740 | but one of the things we've learned the most
01:16:20.500 | from actually shipping this product to users is
01:16:22.860 | it's a pretty bad proxy at telling us
01:16:24.900 | how competent the model is.
01:16:25.940 | So, for example, when people are doing, like,
01:16:27.740 | React development using Genie,
01:16:30.300 | for us, it's impossible to know
01:16:33.060 | whether what it's written has actually done,
01:16:34.740 | you know, done what it wanted to.
01:16:36.340 | So, at least, even using, like,
01:16:38.540 | the fine-tuning for vision to be able to help eval,
01:16:41.500 | like, what we output,
01:16:42.540 | is already something that's very useful.
01:16:45.100 | But also, in terms of being able to pair,
01:16:46.820 | here's a UI I want,
01:16:47.860 | here's the code that actually, like, represents that UI,
01:16:50.860 | is also gonna be super useful as well, I think.
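For readers who want to picture the UI-to-code pairing described here, the sketch below shows roughly what one vision fine-tuning training example could look like in the chat-format JSONL that OpenAI's fine-tuning API accepts. The screenshot URL, prompt, and target code are placeholders, and this is not Cosine's actual data format.

```python
# Rough sketch of a single vision fine-tuning example pairing a UI screenshot
# with the code that renders it. URL, prompt, and target code are placeholders.
import json

example = {
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Write the React component for this screen."},
                {
                    "type": "image_url",
                    "image_url": {"url": "https://example.com/screens/login-form.png"},
                },
            ],
        },
        {
            "role": "assistant",
            "content": "export function LoginForm() { /* ...target implementation... */ }",
        },
    ]
}

# Each example is one line of the JSONL file later uploaded with purpose="fine-tune".
with open("ui_code_pairs.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")
```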
01:16:52.940 | In terms of generally,
01:16:54.060 | what have I been most impressed by?
01:16:56.780 | The distillation thing is awesome.
01:16:59.180 | I think we'll probably end up using it in places.
01:17:03.180 | But what it shows me more broadly
01:17:05.140 | about OpenAI's approach is
01:17:07.580 | they're gonna be building a lot of the things
01:17:09.180 | that we've had to hack together internally,
01:17:11.100 | in terms from a tooling point of view,
01:17:13.300 | just to make our lives so much easier.
01:17:14.820 | And I've spoken to, you know,
01:17:16.340 | John, the head of fine-tuning, extensively about this.
01:17:18.860 | But there's a bunch of tools
01:17:19.900 | that we've had to build internally
01:17:21.260 | for things like dealing with model lineage,
01:17:23.660 | dealing with data set lineage,
01:17:24.980 | because it gets so messy so quickly,
01:17:27.780 | that we would love OpenAI to build.
01:17:29.660 | Like, absolutely would love them to build.
01:17:31.580 | It's not what gives us our edge,
01:17:32.860 | but it certainly means that then
01:17:35.260 | we don't have to build it and maintain it afterwards.
01:17:37.380 | So, it's a really good first step, I think,
01:17:39.860 | in, like, the overall maturity
01:17:42.100 | of the fine-tuning product and API,
01:17:43.820 | in terms of where they're going,
01:17:44.940 | to see those early products.
01:17:47.220 | And I think that they'll be continuing
01:17:48.460 | in that direction going on.
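For context, the distillation product mentioned above works by storing completions from a larger model and then fine-tuning a smaller model on them. Below is a minimal sketch of just the capture step, assuming the `store` and `metadata` parameters of the Chat Completions API; the model choice, prompt, and tag are illustrative only, and the export and fine-tune steps happen in the dashboard rather than in this code.

```python
# Minimal sketch of the capture step in OpenAI's distillation workflow:
# persist outputs from a larger "teacher" model so they can later be used
# as training data for a smaller model. Prompt and metadata tag are illustrative.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarise this pull request diff: ..."}],
    store=True,                              # keep the completion for later reuse
    metadata={"use_case": "pr-summary"},     # tag it so it can be filtered later
)
print(response.choices[0].message.content)

# The stored, tagged completions can then be filtered in the dashboard, exported
# as a dataset, and used to fine-tune a smaller model such as gpt-4o-mini.
```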
01:17:50.460 | - Did you not, so there's a very active ecosystem
01:17:53.540 | of LLM ops tools.
01:17:55.940 | Did you not evaluate those before building your own?
01:17:58.340 | - We did, but I think fundamentally, like...
01:18:04.100 | - No moat. (laughs)
01:18:07.340 | - Yeah, like, I think, in a lot of places,
01:18:09.900 | it was never a big enough pain point to be like,
01:18:12.300 | "Oh, we absolutely must outsource this."
01:18:14.380 | It's definitely, in many places,
01:18:15.820 | something that you can hack a script together
01:18:18.180 | in a day or two, and then hook it up
01:18:21.620 | to our already existing internal tool UI,
01:18:23.940 | and then you have, you know, what you need.
01:18:25.780 | And whenever you need a new thing, you just tack it on.
01:18:28.260 | But for, like, all of these LLM ops tools,
01:18:32.220 | I've never felt the pain point enough
01:18:33.940 | to really, like, bother.
01:18:35.380 | And that's not to deride them at all.
01:18:37.660 | I'm sure many people find them useful,
01:18:39.100 | but just for us as a company,
01:18:40.820 | we've never felt the need for them.
01:18:43.100 | So it's great that OpenAI are gonna build them in,
01:18:47.220 | 'cause it's really nice to have them there, for sure.
01:18:50.300 | But it's not something that, like,
01:18:51.940 | I'd ever consider really paying for externally
01:18:53.780 | or something like that, if that makes sense.
01:18:55.100 | - Yeah.
01:18:56.100 | Does voice mode factor into Genie?
01:18:58.340 | - Maybe one day, that'd be sick, wouldn't it?
01:19:00.220 | - I don't know. - Yeah, I think so.
01:19:02.060 | - You're the first person that we are,
01:19:03.220 | we've been asking this question to everybody.
01:19:04.980 | - Yeah, I think--
01:19:05.820 | - You're the first person to not mention voice mode.
01:19:07.140 | - Oh, well, it's currently so distant from what we do.
01:19:11.540 | But I definitely think, like, this whole talk
01:19:14.900 | of we want it to be a full-on AI software engineering
01:19:16.980 | colleague, like, there is definitely a vector
01:19:19.700 | in some way that you can build that in.
01:19:22.420 | Maybe even during the ideation stage,
01:19:25.100 | talking through a problem with Genie
01:19:26.820 | in terms of how we wanna build something down the line.
01:19:30.180 | I think that might be useful, but honestly,
01:19:32.580 | like, that would be nice to have when we have the time, yeah.
01:19:35.580 | - Yeah, amazing.
01:19:36.420 | One last question, on your, in your talk,
01:19:40.660 | you mentioned a lot about curating your data
01:19:43.060 | and your distribution and all that.
01:19:44.380 | - Yes.
01:19:45.220 | - And before we sat down, you talked a little bit
01:19:46.860 | about having to diversify--
01:19:48.220 | - Absolutely. - Your data set.
01:19:49.620 | - Yeah.
01:19:50.460 | - What's driving that, what are you finding?
01:19:51.980 | - So, we have been rolling people off the wait list
01:19:55.340 | that we sort of amassed when we announced
01:19:57.220 | when I last saw you.
01:19:58.260 | And it's been really interesting because,
01:20:01.460 | as I may have mentioned on the podcast,
01:20:03.180 | like, we had to be very opinionated about the data mix
01:20:05.540 | and the data set that we put together
01:20:06.860 | for like, sort of the V0 of Genie.
01:20:08.700 | Again, like, to your point,
01:20:11.540 | JavaScript, JavaScript, JavaScript, Python, right?
01:20:14.540 | There's a lot of JavaScript in its various forms in there.
01:20:18.180 | But it turns out that when we've shipped it
01:20:20.380 | to the very early alpha users we rolled it out to,
01:20:23.260 | for example, we had some guys using it
01:20:26.500 | with a C# code base.
01:20:28.180 | And C# currently represents, I think,
01:20:29.940 | about 3% of the overall data mix.
01:20:32.180 | And they weren't getting the levels of performance
01:20:35.380 | that they saw when they tried it with a Python code base.
01:20:39.060 | And it was obviously, like,
01:20:41.860 | not great for them to have a bad experience,
01:20:43.180 | but it was nice to be able to correlate it
01:20:45.260 | with the actual, like, objective data mix that we saw.
01:20:48.780 | So, we did, what we've been doing
01:20:51.460 | is, like, little top-up fine-tunes
01:20:52.980 | where we take, like, the general Genie model
01:20:54.420 | and do an incremental fine-tune on top
01:20:56.300 | with just a bit more data for a given,
01:20:58.380 | you know, vertical language.
01:21:00.900 | And we've been seeing improvements coming from that.
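As a rough illustration of the top-up approach described here, an incremental fine-tune with the OpenAI fine-tuning API can start from an existing fine-tuned model rather than the base model. The file name and model IDs below are placeholders, not Cosine's actual setup.

```python
# Sketch of an incremental ("top-up") fine-tune: start from an existing
# fine-tuned model and layer on a small language-specific dataset.
# File name and model IDs are placeholders.
from openai import OpenAI

client = OpenAI()

# Upload the extra C# examples in the standard chat-format fine-tuning JSONL.
topup_file = client.files.create(
    file=open("genie_csharp_topup.jsonl", "rb"),
    purpose="fine-tune",
)

# Train on top of the previously fine-tuned "general" model instead of the base model.
job = client.fine_tuning.jobs.create(
    training_file=topup_file.id,
    model="ft:gpt-4o-2024-08-06:acme::general-genie",  # placeholder fine-tuned model ID
    suffix="csharp-topup",
)
print(job.id, job.status)
```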
01:21:03.020 | So, again, this is one of the great things
01:21:05.980 | about sort of baptism by fire
01:21:07.700 | and letting people use it and giving you feedback
01:21:09.460 | and telling you where it sucks.
01:21:11.340 | Because that is not something
01:21:12.500 | that we could have just known ahead of time.
01:21:14.740 | So, I want that data mix to,
01:21:16.180 | over time as we roll it out to more and more people,
01:21:18.140 | and we are trying to do that as fast as possible,
01:21:20.180 | we're still a team of five for the time being,
01:21:22.420 | to be as general and as representative
01:21:25.860 | of what our users do as possible
01:21:27.620 | and not what we think they need.
01:21:29.460 | - Yeah, so, every customer
01:21:33.660 | is going to have their own little fine-tune.
01:21:36.540 | - There is gonna be the option to,
01:21:38.540 | yeah, there is gonna be the option
01:21:39.500 | to fine-tune the model on your code base.
01:21:41.860 | It won't be in, like, the base pricing tier,
01:21:43.940 | but you will definitely be able to do that.
01:21:46.460 | It will go through all of your code base history,
01:21:48.780 | learn how everything happened,
01:21:50.060 | and then you'll have an incrementally fine-tuned Genie
01:21:52.220 | just on your code base.
01:21:53.220 | And that's what enterprises really love the idea of.
01:21:56.020 | - Lovely. - Yeah.
01:21:56.900 | - Perfect.
01:21:57.740 | Anything else? - Cool.
01:21:58.580 | - Yeah, that's it.
01:21:59.420 | Thank you so much. - Thank you so much, guys.
01:22:00.740 | Good to see you. - Thank you.
01:22:01.700 | - Lastly, this year's Dev Day ended with an extended Q&A
01:22:05.820 | with Sam Altman and Kevin Weil.
01:22:08.140 | We think both the questions asked and answers given
01:22:12.060 | were particularly insightful,
01:22:13.900 | so we are posting what we could snag of the audio here
01:22:16.380 | from publicly available sources,
01:22:18.460 | credited in the show notes, for you to pick through.
01:22:21.820 | If the poorer quality audio here is a problem,
01:22:24.620 | we recommend waiting for approximately one to two months
01:22:28.100 | until the final video is released on YouTube.
01:22:31.420 | In the meantime, we particularly recommend Sam's answers
01:22:34.700 | on the moderation policy,
01:22:36.380 | on the underappreciated importance of agents
01:22:38.980 | and AI employees beyond level three,
01:22:41.820 | and his projections of the intelligence
01:22:43.740 | of O1, O2, and O3 models in future.
01:22:47.700 | (audience cheering)
01:22:53.820 | - All right, I think everybody knows you.
01:22:55.260 | For those who don't know me,
01:22:56.300 | I'm Kevin Weil, Chief Product Officer at OpenAI.
01:22:59.120 | I have the good fortune of getting to turn
01:23:01.340 | the amazing research that our research teams do
01:23:05.640 | into the products that you all use every day
01:23:08.140 | and the APIs that you all build on every day.
01:23:10.340 | I thought we'd start with some audience engagement here.
01:23:13.940 | So on the count of three, I'm gonna count to three,
01:23:17.820 | and I want you all to say,
01:23:19.900 | of all the things that you saw launched here today,
01:23:22.580 | what's the first thing you're gonna integrate?
01:23:24.940 | It's the thing you're most excited to build on, all right?
01:23:27.720 | You gotta do it, all right?
01:23:28.740 | One, two, three.
01:23:31.220 | - Realtime API. - Realtime API.
01:23:33.460 | (audience laughing)
01:23:36.300 | - I'll say it personally,
01:23:37.140 | I'm super excited about our distillation products.
01:23:39.900 | I think that's gonna be really, really interesting.
01:23:42.140 | (audience applauding)
01:23:45.300 | I'm also excited to see what you all do
01:23:48.320 | with advanced voice mode with the Realtime API
01:23:50.580 | and with Vision fine-tuning in particular.
01:23:52.500 | So, okay.
01:23:54.460 | So I've got some questions for Sam.
01:23:57.020 | I've got my CEO here in the hot seat.
01:23:59.060 | Let's see if I can't make a career-limiting move.
01:24:01.300 | (audience laughing)
01:24:03.160 | So we'll start with an easy one, Sam.
01:24:06.140 | How close are we to AGI?
01:24:08.680 | (audience laughing)
01:24:11.380 | - You know, we used to, every time we finished a system,
01:24:15.500 | we would say, like, in what way is this not an AGI?
01:24:18.540 | And it used to be, like, very easy.
01:24:21.380 | Like, they got a little robotic hand
01:24:23.340 | that does a Rubik's Cube or a Dota bot,
01:24:24.980 | and it's like, oh, it does some things,
01:24:26.260 | but definitely not an AGI.
01:24:27.500 | It's obviously harder to say now.
01:24:31.580 | So we're trying to, like, stop talking about AGI
01:24:35.220 | as this general thing, and we have this levels framework,
01:24:38.580 | because the word AGI has become so overloaded.
01:24:41.820 | So, like, real quickly, we use one for chatbots,
01:24:44.700 | two for reasoners, three for agents,
01:24:46.540 | four for innovators, five for organizations, like, roughly.
01:24:50.220 | I think we clearly got to level two,
01:24:52.220 | or we clearly got to level two with O1.
01:24:55.100 | And it, you know, can do really quite impressive
01:24:59.980 | cognitive tasks, it's a very smart model.
01:25:02.020 | It doesn't feel AGI-like in a few important ways,
01:25:06.160 | but I think if you just do the one next step
01:25:09.060 | of making it, you know, very agent-like,
01:25:13.380 | which is on level three, and which I think
01:25:14.860 | we will be able to do in the not-distant future,
01:25:16.940 | it will feel surprisingly capable.
01:25:21.420 | Still probably not something that most of you
01:25:23.540 | would call an AGI, though, maybe some of you would,
01:25:26.900 | but it's gonna feel like, all right,
01:25:29.380 | this is like a significant thing.
01:25:31.340 | And then, the leap, and I think we did that pretty quickly,
01:25:34.580 | the leap from that to something that can
01:25:38.460 | really increase the rate of new scientific discovery,
01:25:42.580 | which, for me, is like a very important part
01:25:45.460 | of having an AGI, I feel a little bit
01:25:48.220 | less certain on that, but not a long time.
01:25:50.380 | Like, I think all of this now is gonna happen
01:25:52.060 | pretty quickly, and if you think about
01:25:53.980 | what happened from last decade to this one,
01:25:56.980 | in terms of model capabilities,
01:25:58.900 | and you're like, eh, I mean, if you go look at,
01:26:01.660 | like, if you go from O1 on a hard problem
01:26:04.020 | back to, like, GPT-4 Turbo that we launched
01:26:05.820 | 11 months ago, you'll be like, wow,
01:26:07.860 | this has happened pretty fast.
01:26:09.360 | And I think the next year will be very steep progress,
01:26:12.740 | next two years will be very steep progress,
01:26:14.460 | harder than that, hard to see a lot of certainty,
01:26:16.620 | but I would say, like, not very,
01:26:18.780 | and at this point, the definitions really matter.
01:26:20.740 | And the fact that the definitions matter this much
01:26:23.600 | somehow means we're, like, getting pretty close.
01:26:26.260 | - Yeah, and, you know, there used to be
01:26:29.300 | this sense of AGI where it was like,
01:26:31.180 | it was a binary thing, and you were gonna
01:26:34.080 | go to sleep one day, and there was no AGI,
01:26:35.900 | and wake up the next day, and there was AGI.
01:26:37.980 | I don't think that's exactly how we think about it anymore,
01:26:40.860 | but how have your views on this evolved?
01:26:43.220 | - You know, the one, I agree with that,
01:26:46.060 | I think we're, like, you know, in this, like,
01:26:49.580 | kind of period where it's gonna feel
01:26:52.780 | very blurry for a while, and the, you know,
01:26:55.260 | is this AGI yet, or is this not AGI,
01:26:57.220 | or kind of, like, at what point,
01:26:58.820 | it's just gonna be this, like, smooth exponential,
01:27:01.100 | and, you know, probably most people
01:27:03.820 | looking back at history won't agree, like,
01:27:05.420 | when that milestone was hit, and will just realize
01:27:07.340 | it was, like, a silly thing.
01:27:09.020 | Even the Turing test, which I thought always
01:27:11.260 | was, like, this very clear milestone, you know,
01:27:13.860 | there was this, like, fuzzy period,
01:27:15.820 | it kind of, like, went whooshing by,
01:27:18.100 | no one cared, but I think the right framework
01:27:23.100 | is just this one exponential.
01:27:25.380 | That said, if we can make an AI system
01:27:29.940 | that is, like, materially better than all of OpenAI
01:27:33.140 | at doing AI research, that does feel to me
01:27:36.540 | like some sort of important discontinuity.
01:27:39.140 | It's probably still wrong to think about it that way.
01:27:41.100 | It probably still is this smooth exponential curve,
01:27:43.700 | but that feels like a new milestone.
01:27:46.100 | - Is OpenAI still as committed to research
01:27:50.940 | as it was in the early days?
01:27:52.660 | Will research still drive the core of our advancements
01:27:55.380 | in our product development?
01:27:57.580 | - Yeah, I mean, I think more than ever.
01:28:01.500 | There was, like, a time in our history
01:28:04.180 | when the right thing to do was just to scale up compute,
01:28:06.860 | and we saw that with conviction,
01:28:08.340 | and we had a spirit of, like, we'll do whatever works.
01:28:11.060 | You know, like, we want to, we have this mission,
01:28:13.260 | we want to, like, build, say, AGI,
01:28:14.820 | figure out how to share the benefits.
01:28:16.380 | If the answer is, like, rack up GPUs, we'll do that.
01:28:19.780 | And right now, the answer is, again,
01:28:20.980 | really push on research.
01:28:22.460 | And I think you see this with O1.
01:28:24.700 | Like, that is a giant research breakthrough
01:28:27.180 | that we were attacking for many vectors
01:28:29.260 | over a long period of time that came together
01:28:31.300 | in this really powerful way.
01:28:32.700 | We have many more giant research breakthroughs to come,
01:28:37.140 | but the thing that I think is most special
01:28:40.660 | about OpenAI is that we really deeply care
01:28:45.660 | about research, and we understand how to, I think,
01:28:50.220 | it's easy to copy something you know works.
01:28:54.060 | And, you know, I actually don't,
01:28:56.500 | I mean, that is a bad thing.
01:28:57.340 | Like, when people copy OpenAI, I'm like, great,
01:28:58.900 | the world gets more AI, that's wonderful.
01:29:01.260 | But to do something new for the first time,
01:29:03.380 | to, like, really do research in the true sense of it,
01:29:05.860 | which is not like, you know,
01:29:07.100 | let's barely squeeze a little more out of this thing,
01:29:08.740 | or, like, let's tweak this.
01:29:10.380 | But, like, let's go find the new paradigm,
01:29:12.300 | and the one after that, and the one after that.
01:29:14.060 | That is what motivates us.
01:29:15.380 | And I think the thing that is special about us as an org,
01:29:18.620 | besides the fact that we, you know,
01:29:20.260 | married product and research
01:29:21.460 | and all this other stuff together,
01:29:23.180 | is that we know how to run that kind of a culture
01:29:26.020 | that can go, that can go push back the frontier.
01:29:30.220 | And that's really hard.
01:29:31.380 | But we love it.
01:29:32.900 | And that's, you know, I think we're gonna have to do that
01:29:35.220 | a few more times, and then we can take you on.
01:29:37.580 | - Yeah, I'll say, like, the litmus test for me,
01:29:39.740 | coming from the outside, from, you know,
01:29:41.940 | sort of normal tech companies,
01:29:43.900 | of how critical research is to OpenAI,
01:29:47.060 | is that building product in OpenAI
01:29:48.820 | is fundamentally different than any other place
01:29:51.580 | that I have ever done it before.
01:29:53.460 | You know, normally, you have some sense of your tech stack.
01:29:57.780 | You have some sense of what you have to work with,
01:29:59.940 | and what capabilities computers have.
01:30:03.380 | And then you're trying to build the best product, right?
01:30:06.020 | You're figuring out who your users are,
01:30:07.980 | what problems they have,
01:30:08.900 | and how you can help solve those problems for them.
01:30:12.020 | There is that at OpenAI.
01:30:13.540 | But also, the state of, like, what computers can do,
01:30:18.540 | just evolves every two months, three months.
01:30:23.140 | And suddenly, computers have a new capability
01:30:25.180 | that they've never had in the history of the world,
01:30:27.580 | and we're trying to figure out how to build a great product
01:30:31.180 | and expose that for developers and our APIs and so on.
01:30:34.260 | And, you know, you can't totally tell what's coming.
01:30:36.780 | It's coming through the mist a little bit at you,
01:30:39.340 | and gradually taking shape.
01:30:41.260 | It's fundamentally different
01:30:42.660 | than any other company I've ever worked at,
01:30:44.460 | and it's, I think--
01:30:45.300 | - Is that the thing that has most surprised you?
01:30:47.660 | - Yes.
01:30:48.740 | Yeah, and it's interesting how,
01:30:51.580 | even internally, we don't always have a sense.
01:30:54.540 | You have, like, okay, I think this capability's coming,
01:30:56.700 | but is it going to be, you know, 90% accurate,
01:31:01.020 | or 99% accurate in the next model?
01:31:04.340 | Because the difference really changes
01:31:05.940 | what kind of product you can build,
01:31:07.740 | and you know that you're gonna get to 99,
01:31:09.700 | you don't quite know when,
01:31:10.660 | and figuring out how you put a roadmap together
01:31:12.980 | in that world is really interesting.
01:31:14.500 | - Yeah, the degree to which we have to just, like,
01:31:16.500 | follow the science and let that determine
01:31:19.020 | what we go work on next and what products we build
01:31:21.980 | and everything else is, I think, hard to get across.
01:31:25.380 | Like, we have guesses about where things are gonna go.
01:31:28.220 | Sometimes we're right, often we're not.
01:31:30.060 | But if something starts working,
01:31:32.100 | or if something doesn't work
01:31:33.460 | that you thought was gonna work,
01:31:35.180 | our willingness to just say,
01:31:36.460 | we're gonna, like, pivot everything
01:31:37.700 | and do what the science allows,
01:31:40.500 | and you don't get to, like,
01:31:41.340 | pick what the science allows, that's surprising.
01:31:43.980 | - I was sitting with an Enterprise customer
01:31:46.500 | a couple weeks ago, and they said,
01:31:49.060 | you know, one of the things we really want,
01:31:50.460 | this is all working great, we love this,
01:31:51.860 | one of the things we really want is a notification
01:31:54.780 | 60 days in advance when you're gonna launch something.
01:31:58.300 | And I was like, I want that, too.
01:32:00.180 | (audience laughing)
01:32:03.500 | All right, so I'm going through,
01:32:04.700 | these are a bunch of questions from the audience,
01:32:06.740 | by the way, and we're gonna try and also
01:32:08.380 | leave some time at the end
01:32:09.220 | for people to ask some audience questions.
01:32:11.140 | So we've got some folks with mics,
01:32:12.620 | and when we get there, they'll be thinking.
01:32:15.580 | But next thing is, so many in the alignment community
01:32:20.580 | are genuinely concerned that OpenAI
01:32:23.100 | is now only paying lip service to alignment.
01:32:26.220 | Can you reassure us?
01:32:27.780 | - Yeah.
01:32:28.620 | I think it's true we have a different take on alignment
01:32:33.220 | than maybe what people write about
01:32:36.340 | on whatever that internet forum is.
01:32:38.460 | But we really do care a lot about building safe systems.
01:32:44.380 | We have an approach to do it
01:32:46.180 | that has been informed by our experience so far.
01:32:48.980 | And touch on that other question,
01:32:51.580 | which is you don't get to pick where the science goes.
01:32:54.700 | We want to figure out how to make capable models
01:32:58.940 | that get safer and safer over time.
01:33:01.260 | And a couple of years ago,
01:33:04.300 | we didn't think the whole strawberry or the O1 paradigm
01:33:06.620 | was gonna work in the way that it's worked.
01:33:09.100 | And that brought a whole new set of safety challenges,
01:33:12.540 | but also safety opportunities.
01:33:14.460 | And rather than kind of plan to make theoretical,
01:33:19.140 | once superintelligence gets here,
01:33:22.100 | here's the 17 principles,
01:33:24.780 | we have an approach of figure out
01:33:29.220 | where the capabilities are going,
01:33:30.700 | and then work to make that system safe.
01:33:34.620 | And O1 is obviously our most capable model ever,
01:33:39.260 | but it's also our most aligned model ever, by a lot.
01:33:42.300 | And as these models get better intelligence,
01:33:47.300 | better reasoning, whatever you want to call it,
01:33:49.180 | the things that we can do to align them,
01:33:51.700 | the things we can do to build really safe systems
01:33:53.740 | across the entire stack,
01:33:55.260 | our tool set keeps increasing as well.
01:33:59.060 | So, we have to build models that are generally accepted
01:34:04.060 | as safe and robust to be able to put them in the world.
01:34:08.940 | And when we started OpenAI,
01:34:11.660 | what the picture of alignment looked like,
01:34:13.820 | and what we thought the problems that we needed to solve
01:34:16.780 | were going to be, turned out to be nothing
01:34:19.220 | like the problems that actually are in front of us
01:34:21.020 | and that we had to solve now.
01:34:22.500 | And also, when we made the first GPT-3,
01:34:26.780 | if you asked me for the techniques that would have worked
01:34:29.180 | for us to be able to now deploy our current systems
01:34:32.220 | as generally accepted to be safe and robust,
01:34:35.660 | they would not have been the ones that turned out to work.
01:34:39.340 | So, by this idea of iterative deployment,
01:34:43.780 | which I think has been one of our most important
01:34:45.860 | safety stances ever, and sort of confronting reality
01:34:49.580 | as it's in front of us, we've made a lot of progress,
01:34:51.860 | and we expect to make more.
01:34:52.820 | We keep finding new problems to solve,
01:34:54.620 | but we also keep finding new techniques to solve them.
01:34:58.020 | All of that said,
01:35:00.260 | I think worrying about the sci-fi ways
01:35:06.900 | this all goes wrong is also very important.
01:35:08.940 | We have people thinking about that.
01:35:10.260 | It's a little bit less clear, kind of, what to do there,
01:35:13.780 | and sometimes you end up backtracking a lot, but,
01:35:16.220 | but I don't think it's, I also don't think it's fair
01:35:21.260 | to say we're only gonna work on the thing in front of us.
01:35:23.900 | We do have to think about where this is going,
01:35:25.300 | and we do that, too.
01:35:26.300 | And I think if we keep approaching the problem
01:35:30.220 | from both ends like that, most of our thrust on the,
01:35:32.980 | like, okay, here's the next thing, we're gonna deploy this,
01:35:35.860 | what needs to happen to get there, but also, like,
01:35:39.540 | what happens if this curve just keeps going?
01:35:41.140 | That's been an effective strategy for us.
01:35:44.180 | - I'll say also, it's one of the places where I really,
01:35:46.540 | I really like our philosophy of iterative deployment.
01:35:49.220 | When I was at Twitter, back, I don't know,
01:35:52.980 | a hundred years ago now,
01:35:54.140 | Ev said something that stuck with me, which is,
01:35:57.620 | no matter how many smart people you have inside your walls,
01:36:00.700 | there are way more smart people outside your walls.
01:36:04.100 | And so, when we try and get our, you know,
01:36:07.500 | it'd be one thing if we just said we're gonna try
01:36:09.700 | and figure out everything that could possibly go wrong
01:36:11.660 | within our walls, and it would be just us
01:36:13.700 | and the red teamers that we can hire, and so on.
01:36:16.620 | And we do that, we work really hard at that.
01:36:18.540 | But also, launching iteratively and launching carefully,
01:36:22.000 | and learning from the ways that folks like you all use it,
01:36:25.500 | what can go right, what can go wrong,
01:36:26.780 | I think is a big way that we get these things right.
01:36:29.020 | - I also think that as we head into this world
01:36:32.620 | of agents off doing things in the world,
01:36:36.140 | that is gonna become really, really important.
01:36:38.820 | As these systems get more complex
01:36:40.540 | and are acting off longer horizons,
01:36:42.460 | the pressure testing from the whole outside world,
01:36:47.020 | I really, really think will be critical.
01:36:48.940 | - Yeah, so we'll go, actually, we'll go off of that.
01:36:52.880 | Maybe talk to us a bit more about how you see agents
01:36:55.560 | fitting into OpenAI's long-term plans.
01:36:57.640 | - What do you think?
01:36:59.320 | - That's a huge part of the, I mean,
01:37:02.640 | I think the exciting thing is this set of models,
01:37:07.200 | O1 in particular, and all of its successors,
01:37:10.880 | are going to be what makes this possible.
01:37:13.480 | Because you finally have the ability to reason,
01:37:15.600 | to take hard problems, break them into simpler problems,
01:37:18.280 | and act on them, I mean, I think 2025 is gonna be
01:37:21.060 | the year this really goes big.
01:37:22.920 | - Yeah, I mean, chat interfaces are great,
01:37:28.640 | and they will, I think, have an important place
01:37:30.720 | in the world, but the...
01:37:33.120 | When you can ask a model, when you can ask ChatGPT
01:37:39.360 | or some agent or something, and it's not just like
01:37:41.480 | you get a kind of quick response,
01:37:43.160 | or you think you get 15 seconds of thinking,
01:37:45.320 | and O1 gives you a nice piece of code back, or whatever.
01:37:48.320 | But you can really give something a multi-turn interaction
01:37:52.440 | with environments or other people or whatever,
01:37:54.720 | like think for the equivalent of multiple days
01:37:56.920 | of human effort, and a really smart, really capable human,
01:38:01.760 | and have stuff happen, we all say that,
01:38:06.320 | we're all like, oh yeah, we can start the next thing,
01:38:08.160 | this is coming, this is gonna be another thing,
01:38:10.000 | and we just talk about it like, okay.
01:38:12.040 | You know, it's like the next model in evolution.
01:38:14.960 | I would bet, and we don't really know
01:38:16.400 | until we get to use these, that it's...
01:38:18.400 | We'll, of course, get used to it quickly,
01:38:20.880 | people get used to any new technology quickly,
01:38:22.960 | but this will be a very significant change
01:38:25.920 | to the way the world works in a short period of time.
01:38:28.920 | - Yeah, it's amazing, somebody was talking about
01:38:32.040 | getting used to new capabilities in AI models
01:38:34.040 | and how quickly, actually, I think it was about Waymo,
01:38:37.640 | but they were talking about how in the first 10 seconds
01:38:41.240 | of using Waymo, they were like, oh my god,
01:38:43.640 | is this thing, they're like, there's a bug, let's watch out,
01:38:46.000 | and then 10 minutes in, they were like,
01:38:47.840 | oh, this is really cool, and then 20 minutes in,
01:38:50.280 | they were like, checking their phone, bored.
01:38:52.760 | You know, it's amazing how much your sort of
01:38:54.640 | internal firmware updates for this new stuff very quickly.
01:38:59.400 | - Yeah, like I think that people will
01:39:02.720 | ask an agent to do something for them
01:39:06.200 | that would have taken them a month,
01:39:08.040 | and they'll finish in an hour, and it'll be great,
01:39:10.160 | and then they'll have like 10 of those at the same time,
01:39:12.720 | and then they'll have like 1,000 of those at the same time,
01:39:15.600 | and by 2030 or whatever, we'll look back and be like,
01:39:19.040 | yeah, this is just like what a human is supposed
01:39:20.960 | to be capable of, what a human used to like,
01:39:22.800 | you know, grind at for years or whatever,
01:39:25.560 | or many humans used to grind at for years,
01:39:27.160 | like I just now like ask a computer to do it,
01:39:29.600 | and it's like done in an hour.
01:39:31.360 | That's, why is it not a minute?
01:39:34.120 | (audience laughing)
01:39:36.920 | - Yeah, it's also, it's one of the things that makes
01:39:39.480 | having an amazing development platform great, too,
01:39:43.480 | because, you know, we'll experiment,
01:39:45.600 | and we'll build some agentic things, of course,
01:39:47.680 | and like we've already got, I think just like,
01:39:50.640 | we're just pushing the boundaries of what's possible today.
01:39:53.720 | You've got groups like Cognition doing amazing things
01:39:56.080 | in coding, like Harvey and Casetext,
01:39:59.280 | you got Speak doing cool things with language translation,
01:40:02.560 | like we're beginning to see this stuff work,
01:40:05.240 | and I think it's really gonna start working
01:40:07.720 | as we continue to iterate these models.
01:40:10.000 | - One of the very fun things for us
01:40:11.880 | about having this development platform
01:40:13.960 | is just getting to like watch the unbelievable speed
01:40:16.720 | and creativity of people that are building these experiences,
01:40:19.600 | like developers, very near and dear to our heart.
01:40:23.520 | It's kind of like the first thing we launched,
01:40:26.640 | and just many of us came building on the platforms,
01:40:29.800 | but the, so much of the capability of these models
01:40:33.600 | and great experiences have been built
01:40:35.680 | by people building on the platform.
01:40:37.880 | We'll continue to try to offer like great
01:40:40.360 | first-party products, but we know that we'll only ever be
01:40:44.040 | like a small, narrow slice of the apps or agents
01:40:47.440 | or whatever people build in the world,
01:40:48.960 | and seeing what has happened in the world
01:40:52.800 | in the last 18, 24 months,
01:40:55.980 | it's been like quite amazing to watch.
01:40:58.120 | I'm gonna keep going on the agent front here.
01:41:02.600 | What do you see as the current hurdles
01:41:04.400 | for computer-controlling agents?
01:41:06.520 | (typing)
01:41:08.600 | Safety and alignment.
01:41:11.520 | Like, if you are really going to give an agent
01:41:14.920 | the ability to start clicking around your computer,
01:41:18.220 | which you will, you are going to have a very high bar
01:41:22.760 | for the robustness and the reliability
01:41:25.800 | and the alignment of that system.
01:41:28.360 | So, technically speaking, I think that, you know,
01:41:30.840 | we're getting like pretty close to the capability side,
01:41:34.000 | but this sort of agent safety and trust framework,
01:41:37.680 | that's gonna, I think, be the long haul.
01:41:39.960 | - And now I'll kind of ask you a question
01:41:43.400 | that's almost the opposite of one of the questions
01:41:44.800 | from earlier.
01:41:45.800 | Do you think safety could act as a false positive
01:41:48.000 | and actually limit public access to critical tools
01:41:51.160 | that would enable a more egalitarian world?
01:41:53.560 | - The honest answer is yes, that will happen sometimes.
01:42:00.040 | Like, we'll try to get the balance right,
01:42:03.800 | but if we were fully, or didn't care about,
01:42:06.520 | like, safety and alignment at all,
01:42:08.800 | could we have launched O1 faster?
01:42:11.040 | Yeah, we could have done that.
01:42:12.640 | It would have come at a cost.
01:42:16.280 | It would have been things that would have gone really wrong.
01:42:18.040 | I'm very proud that we didn't.
01:42:19.540 | The cost, you know, I think,
01:42:23.400 | would have been manageable with O1,
01:42:24.840 | but by the time of O3 or whatever,
01:42:26.800 | like, maybe it would be pretty unacceptable.
01:42:29.780 | And so, starting on the conservative side,
01:42:34.380 | like, you know, I don't think people are complaining,
01:42:36.020 | like, oh, voice mode, like, it won't say this offensive thing
01:42:39.300 | and I really want it to, and, you know,
01:42:41.340 | formal company and let it offend me.
01:42:43.620 | You know what, I actually mostly agree.
01:42:45.460 | If you are trying to get O1 to say something offensive,
01:42:48.700 | it should follow the instructions of its user
01:42:51.180 | most of the time.
01:42:52.220 | There's plenty of cases where it shouldn't.
01:42:54.220 | But we have, like, a long history
01:42:56.540 | of when we put a new technology into the world,
01:42:58.780 | we start on the conservative side.
01:43:01.380 | We try to give society time to adapt.
01:43:03.780 | We try to understand where the real harms are
01:43:05.620 | versus sort of, like, kind of more theoretical ones.
01:43:08.760 | And that's, like, part of our approach to safety.
01:43:13.140 | And not everyone likes it all the time.
01:43:15.220 | I don't even like it all the time.
01:43:16.860 | But if we're right that these systems are,
01:43:20.900 | and we're gonna get it wrong too,
01:43:22.420 | like, sometimes we won't be conservative enough
01:43:24.060 | in some area.
01:43:24.900 | But if we're right that these systems are going to get
01:43:28.140 | as powerful as we think they are,
01:43:29.300 | as quickly as we think they might,
01:43:31.580 | then I think starting that way makes sense.
01:43:34.380 | And, you know, we, like, relax over time.
01:43:36.740 | - Totally agree.
01:43:38.740 | What's the next big challenge for a startup
01:43:42.060 | that's using AI as a core feature?
01:43:44.060 | I'll say, I've got one, which is,
01:43:52.420 | I think one of the challenges, and we face this too,
01:43:54.580 | 'cause we're also building products on top of our own models
01:43:57.700 | is trying to find the, kind of the frontier.
01:44:02.140 | You wanna be building,
01:44:03.460 | these AI models are evolving so rapidly,
01:44:05.660 | and if you're building for something
01:44:08.580 | that the AI model does well today,
01:44:11.380 | it'll work well today, but it's gonna feel old tomorrow.
01:44:15.940 | And so you wanna build for things
01:44:18.180 | that the AI model can just barely not do.
01:44:21.580 | You know, where maybe the early adopters will go for it
01:44:24.380 | and other people won't quite,
01:44:25.420 | but that just means that when the next model comes out,
01:44:27.540 | as we continue to make improvements,
01:44:29.460 | that use case that just barely didn't work,
01:44:31.500 | you're gonna be the first to do it,
01:44:33.700 | and it's gonna be amazing.
01:44:35.180 | But figuring out that boundary is really hard.
01:44:38.420 | I think it's where the best products
01:44:39.860 | are gonna get built up.
01:44:41.020 | - Totally agree with that.
01:44:44.980 | The other thing I wanna add is,
01:44:46.460 | I think it's, like, very tempting
01:44:48.420 | to think that a technology makes a startup,
01:44:52.460 | and that is almost never true.
01:44:56.020 | No matter how cool a new technology
01:44:57.660 | or a new sort of, like, tech title it is,
01:45:00.260 | it doesn't excuse you from having to do
01:45:02.740 | all the hard work of building a great company
01:45:05.300 | that is going to have durability
01:45:07.940 | or, like, accumulated advantage over time.
01:45:10.620 | And we hear from a lot of startups,
01:45:13.820 | and I see this as, like, a very common thing,
01:45:16.860 | which is, like, I can do this incredible thing,
01:45:18.940 | I can make this incredible service,
01:45:20.700 | and that seems like a complete answer,
01:45:24.500 | but it doesn't excuse you from any of, like,
01:45:26.140 | the normal laws of business.
01:45:27.740 | You still have to, like, build a good business
01:45:29.100 | and a good strategic position.
01:45:30.940 | And I think a mistake is that,
01:45:33.420 | in the unbelievable excitement and updraft of AI,
01:45:36.940 | people are very tempted to forget that.
01:45:38.900 | - This is an interesting one.
01:45:43.260 | The mode of voice is like tapping directly
01:45:45.860 | into the human API.
01:45:47.820 | How do you ensure ethical use of such a powerful tool
01:45:50.820 | with obvious abilities and manipulation?
01:45:54.020 | - Yeah, you know, voice mode was a really interesting one
01:45:59.020 | for me.
01:45:59.860 | It was, like, the first time that I felt like
01:46:01.220 | I sort of got really tricked by an AI
01:46:04.940 | in that, when I was playing with the first beta of it,
01:46:07.740 | I couldn't, like, I couldn't stop myself.
01:46:12.180 | I mean, I kind of, like, I still say, like,
01:46:15.180 | please to ChatGPT.
01:46:16.540 | But in voice mode, I, like, couldn't not, kind of,
01:46:21.140 | use the normal niceties.
01:46:22.500 | I was, like, so convinced, like,
01:46:23.940 | ah, it might be a real person, like, you know?
01:46:26.940 | And obviously, it's just, like,
01:46:29.100 | hacking some circuit in my brain,
01:46:31.020 | but I really felt it with voice mode.
01:46:32.900 | And I sort of still do.
01:46:35.780 | I think this is a more,
01:46:41.340 | this is an example of, like, a more general thing
01:46:43.100 | that we're gonna start facing,
01:46:44.540 | which is, as these systems become more and more capable,
01:46:48.980 | and as we try to make them as natural as possible
01:46:51.740 | to interact with,
01:46:52.780 | they're gonna, like, hit parts of our neural circuitry
01:46:58.380 | that have, like, evolved to deal with other people.
01:47:00.940 | And, you know, there's, like, a bunch of clear lines
01:47:04.940 | about things we don't wanna do.
01:47:06.220 | Like, we don't.
01:47:07.900 | Like, there's a whole bunch of, like,
01:47:09.220 | weird personality growth hacking, like,
01:47:12.500 | I think vaguely socially manipulative stuff we could do.
01:47:15.540 | But then there's these, like, other things
01:47:17.500 | that are just not nearly as clear-cut.
01:47:20.380 | Like, you want the voice mode to feel
01:47:22.020 | as natural as possible,
01:47:23.460 | but then you get across the uncanny valley,
01:47:25.500 | and it, like, at least in me, triggers something.
01:47:28.180 | And, you know, me saying, like,
01:47:31.300 | please and thank you to ChatGPT,
01:47:32.580 | no problem, probably a good thing to do.
01:47:33.900 | You never know.
01:47:34.940 | (audience laughs)
01:47:37.780 | But I think this, like, really points at
01:47:42.620 | the kinds of safety and alignment issues
01:47:44.300 | we have to start in all of this.
01:47:46.100 | All right, back to brass tacks.
01:47:49.980 | Sam, when's O1 gonna support Function Tools?
01:47:52.540 | - Do you know?
01:47:54.300 | - Before the end of the year.
01:47:56.820 | There are three things that we really wanna get in for--
01:47:59.580 | (audience applauds)
01:48:02.580 | We're gonna record this,
01:48:05.180 | take this back to the research team,
01:48:06.540 | show them how bad we need to do this.
01:48:09.660 | I mean, there are a handful of things
01:48:11.780 | that we really wanted to get into O1,
01:48:13.340 | and we also, you know, it's a balance of,
01:48:15.740 | should we get this out to the world earlier
01:48:17.260 | and begin, you know, learning from it,
01:48:18.980 | learning from how you all use it,
01:48:20.220 | or should we launch a fully complete thing
01:48:23.420 | that is, you know, in line with it,
01:48:25.780 | that has all the abilities
01:48:26.700 | that every other model that we've launched has?
01:48:29.140 | I'm really excited to see things like system prompts
01:48:31.780 | and structured outputs and function calling
01:48:33.780 | make it into O1.
01:48:35.100 | We will be there by the end of the year.
01:48:36.580 | It really matters to us, too.
01:48:38.780 | (audience applauds)
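For reference, the function calling being asked about here is the standard tools parameter on the Chat Completions API, sketched below against a model that already supported it at the time; the function definition is a made-up example, and O1 did not yet accept the parameter when this Q&A was recorded.

```python
# Sketch of standard function calling via the `tools` parameter, shown with
# gpt-4o since O1 did not support it at the time of this Q&A.
# The function definition is a made-up example.
from openai import OpenAI

client = OpenAI()

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_build_status",
            "description": "Look up CI status for a given commit.",
            "parameters": {
                "type": "object",
                "properties": {"commit_sha": {"type": "string"}},
                "required": ["commit_sha"],
            },
        },
    }
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Did commit abc123 pass CI?"}],
    tools=tools,
)
# If the model decides to call the function, the arguments arrive here.
print(response.choices[0].message.tool_calls)
```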
01:48:41.780 | - In addition to that,
01:48:43.900 | just 'cause I can't resist this opportunity
01:48:45.380 | to reinforce this, like,
01:48:46.780 | we will get all of those things in
01:48:48.580 | and a whole bunch more things you'll have asked for.
01:48:51.020 | The model is gonna get so much better so fast.
01:48:56.300 | Like, we are so early.
01:48:59.060 | This is like, you know,
01:49:00.100 | maybe it's the GPT-2 scale moment,
01:49:02.380 | but like, we know how to get to GPT-4,
01:49:04.620 | and we have the fundamental stuff in place
01:49:06.060 | now to get to GPT-4.
01:49:07.900 | And in addition to planning for us
01:49:11.860 | to build all of those things,
01:49:13.780 | plan for the model to just get, like,
01:49:15.500 | radically smarter, like, you know,
01:49:17.980 | hope you all come back next year
01:49:19.340 | and plan for it to feel like way more
01:49:21.340 | of a year of improvement than from 4o to O1.
01:49:24.260 | (audience applauds)
01:49:28.940 | - What feature or capability of a competitor
01:49:36.300 | do you really admire?
01:49:37.380 | - I think Google's notebook thing is super cool.
01:49:41.740 | - Yeah.
01:49:42.580 | - What are they called?
01:49:44.420 | - NotebookLM.
01:49:45.260 | - NotebookLM, yeah.
01:49:46.260 | Uh, I was like, I woke up early this morning
01:49:49.740 | and I was like, looking at examples on Twitter
01:49:51.260 | and I was just like, this is like,
01:49:53.420 | this is just cool.
01:49:54.460 | Like, this is just a good, cool thing.
01:49:57.060 | And like, I think not enough of,
01:50:01.300 | not enough of the world is like,
01:50:02.940 | shipping new and different things.
01:50:04.540 | It's mostly like, the same stuff.
01:50:06.500 | But that, I think, is like,
01:50:08.540 | that brought me a lot of joy this morning.
01:50:10.180 | - Yeah.
01:50:11.020 | - It was very, very well done.
01:50:12.020 | - One of the things I really appreciate
01:50:13.300 | about that product is the,
01:50:15.300 | there's the, just the format itself
01:50:17.300 | is really interesting,
01:50:18.420 | but they also nailed the podcast-style voices.
01:50:21.420 | - Yes.
01:50:22.260 | - They have really nice microphones.
01:50:23.700 | (audience laughs)
01:50:25.300 | They have these sort of sonorous voices.
01:50:28.020 | Did you guys see, somebody on Twitter was saying,
01:50:31.700 | like, the cool thing to do is take your LinkedIn
01:50:35.180 | and, you know, put it in,
01:50:37.460 | and give it to these, give it to Notebook LM,
01:50:40.060 | and you'll have two podcasters riffing back and forth
01:50:42.500 | about how amazing you are.
01:50:43.820 | (audience laughs)
01:50:44.980 | All of your accomplishments over the years.
01:50:47.180 | I'll say mine is, I think Anthropic
01:50:52.700 | did a really good job on projects.
01:50:56.260 | It's kind of a different take on what we did with GPTs.
01:50:59.980 | And GPTs are a little bit more long-lived.
01:51:02.100 | It's something you build and can use over and over again.
01:51:04.300 | Projects are kind of the same idea,
01:51:06.260 | but like, more temporary, meant to be kind of stood up,
01:51:10.140 | used for a while, and then you can move on.
01:51:12.700 | And that different mental model makes a difference.
01:51:16.860 | I think they did a really nice job with that.
01:51:19.100 | All right, we're getting close to audience questions,
01:51:23.940 | so be thinking of what you want to ask.
01:51:25.900 | So at OpenAI, how do you balance
01:51:30.180 | what you think users may need
01:51:32.460 | versus what they actually need today?
01:51:35.100 | - Also, I've got a question for you.
01:51:36.900 | - Yeah, well, I think it does get back
01:51:38.500 | to a bit of what we were saying around trying to build
01:51:41.260 | for what the model can just like,
01:51:43.260 | not quite do, but almost do.
01:51:46.020 | But it's a real balance, too, as we, you know,
01:51:51.300 | we support over 200 million people every week on ChatGPT.
01:51:56.300 | You also can't say, no, it's cool,
01:51:58.980 | like, deal with this bug for three months or this issue.
01:52:01.900 | We've got something really cool coming.
01:52:04.180 | You've gotta solve for the needs of today.
01:52:06.980 | And there are some really interesting product problems.
01:52:09.020 | I mean, you think about, I'm speaking to a group of people
01:52:12.020 | who know AI really well.
01:52:13.620 | Think of all the people in the world
01:52:14.780 | who have never used any of these products,
01:52:17.220 | and that is the vast majority of the world still.
01:52:19.680 | You're basically giving them a text interface,
01:52:25.340 | and on the other side of the text interface
01:52:27.100 | is this like alien intelligence that's constantly evolving
01:52:30.980 | that they've never seen or interacted with,
01:52:33.300 | and you're trying to teach them all the crazy things
01:52:35.860 | that you can actually do, and all the ways it can help
01:52:37.820 | and integrate into your life and solve problems for you.
01:52:40.620 | But people don't know what to do with it.
01:52:43.460 | You know, like, you come in, and you're just like,
01:52:45.140 | people type, like, hi, and it responds, you know,
01:52:49.500 | hey, great to see, like, how can I help you today?
01:52:52.300 | And then you're like, okay, I don't know what to say,
01:52:55.540 | and then you end up, you kind of walk away,
01:52:57.020 | and you're like, well, I didn't see the magic in that.
01:52:59.340 | And so it's a real challenge figuring out how you,
01:53:02.500 | I mean, we all have a hundred different ways
01:53:04.780 | that we use chat GPT and AI tools in general,
01:53:07.940 | but teaching people what those can be,
01:53:10.300 | and then bringing them along as the model changes
01:53:13.300 | month by month by month, and suddenly gains
01:53:15.380 | these capabilities way faster than we as humans
01:53:18.020 | gain the capabilities, it's a really interesting
01:53:20.580 | set of problems, and I know it's one that you all solve
01:53:23.380 | in different ways as well.
01:53:25.220 | - I have a question.
01:53:28.400 | Who feels like, they spend a lot of time with O1,
01:53:31.140 | and they would say, like, I feel definitively
01:53:32.860 | smarter than that thing.
01:53:34.860 | (audience laughing)
01:53:38.620 | Do you think you still will by O2?
01:53:40.360 | (audience laughing)
01:53:43.900 | No one taking the bet of, like, being smarter than O2?
01:53:47.740 | So, one of the challenges that we face is, like,
01:53:50.620 | we know how to go do this thing that we think will be,
01:53:53.660 | like, it is probably smarter than all of us,
01:53:58.220 | like, a broad array of tasks, and yet you have to, like,
01:54:02.020 | still, like, fix bugs, and do the, hey, how are you problem,
01:54:06.220 | and mostly what we believe in is that if we keep pushing
01:54:09.020 | on model intelligence, people will do incredible things
01:54:12.780 | with that, you know, we want to build the smartest,
01:54:16.060 | most helpful models in the world, and people then find
01:54:19.060 | all sorts of ways to use that, and build on top of that.
01:54:22.260 | It has been definitely an evolution for us,
01:54:28.940 | to not just be entirely research-focused,
01:54:31.300 | and then we do have to fix all those bugs,
01:54:32.820 | and make this super usable, and I think we've gotten better
01:54:36.620 | at balancing that, but still, as part of our culture,
01:54:40.100 | I think, we trust that if we can keep pushing
01:54:42.780 | on intelligence, so it's all four of you around down here,
01:54:47.780 | it'll, people will build just incredible things
01:54:51.900 | with that capability.
01:54:52.900 | - Yeah, I think it's a core part of the philosophy,
01:54:56.100 | and you do a good job pushing us to always,
01:54:58.500 | we'll basically incorporate the frontier of intelligence
01:55:02.060 | into our products, both in the APIs,
01:55:03.900 | and into our first-party products,
01:55:06.940 | because it's easy to kind of stick to the thing you know,
01:55:09.500 | the thing that works well, but you're always pushing us
01:55:12.820 | to like, get the frontier in, even if it only kind of works,
01:55:16.820 | because it's gonna work really well soon,
01:55:19.580 | so I always find that a really helpful push.
01:55:22.540 | You kind of answered the next one.
01:55:25.060 | You do say please and thank you to the models,
01:55:26.740 | I'm curious, how many people say please and thank you?
01:55:30.020 | - Isn't that so interesting? - I do, too.
01:55:32.140 | (laughing)
01:55:33.860 | I kind of can't, I mean, I'd feel bad if I don't.
01:55:36.340 | Okay, last question, and then we'll go
01:55:42.660 | into audience questions for the last 10 or so minutes.
01:55:45.780 | Do you plan to build models specifically made
01:55:48.020 | for agentic use cases, things that are better
01:55:50.380 | at reasoning and tool calling?
01:55:55.780 | - We plan to make models that are great
01:55:57.300 | at agentic use cases, that'll be a key priority
01:55:59.540 | for us over the coming months.
01:56:01.420 | Specifically is a hard thing to ask for,
01:56:03.500 | 'cause I think it's also just how
01:56:04.580 | we keep making smarter models.
01:56:06.340 | So yes, there's some things like tool use,
01:56:09.220 | function calling that we need to build that'll help,
01:56:11.460 | but mostly we just wanna make the best reasoning models
01:56:13.780 | in the world, those will also be
01:56:14.980 | the best agentic-based models in the world.
01:56:17.140 | Cool, let's go to audience questions.
01:56:20.420 | I don't know who's got the mic.
01:56:23.340 | All right, we got a mic.
01:56:24.980 | (audience laughs)
01:56:27.820 | - How extensively do you dog food
01:56:34.420 | your own technology in your company?
01:56:37.500 | Do you have any interesting examples
01:56:39.740 | that may not be obvious?
01:56:41.340 | - Yeah, I mean, we put models up for internal use
01:56:46.580 | even before they're done training.
01:56:47.900 | We use checkpoints and try to have people use them
01:56:51.180 | for whatever they can, and try to build new ways
01:56:54.260 | to explore the capability of the model internally
01:56:57.500 | and use them for our own development or research
01:56:59.620 | or whatever else as much as we can.
01:57:01.260 | We're still always surprised by the creativity
01:57:03.060 | of the outside world and what people do,
01:57:05.580 | but basically the way we've figured out
01:57:09.900 | every step along our way of what to push on next,
01:57:14.900 | what we can productize, what the models are really good at
01:57:19.540 | is by internal dog food.
01:57:21.460 | That's like our whole,
01:57:23.260 | that's how we like to go our way through this.
01:57:25.560 | We don't yet have employees that are based off of O1,
01:57:31.780 | but as we move into the world of agents,
01:57:35.740 | we will try that.
01:57:37.500 | We will try having things that we deploy
01:57:40.300 | in our internal systems that help you with stuff.
01:57:42.220 | There are things that get closer to that.
01:57:44.500 | Customer service, we have bots internally
01:57:47.660 | that do a ton of answering external questions
01:57:50.220 | and fielding internal people's questions on Slack
01:57:53.260 | and so on, and our customer service team
01:57:56.540 | is probably, I don't know, 20% the size
01:57:59.460 | it might otherwise need to be because of it.
01:58:02.060 | I know Matt Knight and our security team
01:58:04.660 | has talked extensively about all the different ways
01:58:07.020 | we use models internally to automate
01:58:10.300 | a bunch of security things and take what used to be
01:58:13.620 | a manual process, where you might not have
01:58:16.100 | the number of humans to even look at everything incoming,
01:58:19.140 | and put models on it, you know, separating signal from noise
01:58:23.100 | and highlighting to humans what they need
01:58:24.580 | to go look at, things like that.
01:58:26.260 | So I think internally there are tons of examples
01:58:28.660 | and people maybe underestimate the,
01:58:31.420 | you all probably will not be surprised by this,
01:58:33.540 | but a lot of folks that I talk to are.
01:58:35.660 | The extent to which it's not just using a model
01:58:38.300 | in a place, it's actually about using chains of models
01:58:42.540 | that are good at doing different things
01:58:44.140 | and connecting them all together
01:58:45.980 | to get one end-to-end process that is very good
01:58:48.260 | at the thing you're doing, even if the individual models
01:58:51.220 | have, you know, flaws and make mistakes.
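An illustrative sketch of the chain-of-models pattern Sam describes, not OpenAI's internal pipeline; the model choices, prompts, and sample events are assumptions:

```python
# Illustrative chain-of-models sketch (not OpenAI's internal pipeline):
# a cheap model separates signal from noise, and only flagged events are
# escalated to a stronger reasoning model for triage.
from openai import OpenAI

client = OpenAI()

def is_signal(event: str) -> bool:
    """Cheap first pass: decide whether an incoming event deserves a closer look."""
    out = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[
            {"role": "system", "content": "Answer only 'signal' or 'noise'."},
            {"role": "user", "content": event},
        ],
    )
    return out.choices[0].message.content.strip().lower().startswith("signal")

def triage(event: str) -> str:
    """Second pass: a stronger model writes a summary for a human reviewer."""
    out = client.chat.completions.create(
        model="o1-preview",
        messages=[{"role": "user",
                   "content": f"Summarize the risk and suggest next steps:\n{event}"}],
    )
    return out.choices[0].message.content

for event in ["failed login from a new device in another country",
              "routine nightly backup completed"]:
    if is_signal(event):
        print(triage(event))
```

The cheap classifier keeps cost down, and the stronger model only ever sees the small fraction of events worth a human's attention, which is the end-to-end quality Sam is pointing at even when each individual model has flaws.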
01:58:53.940 | - Thank you.
01:59:01.220 | I'm wondering if you guys have any plans
01:59:03.420 | on sharing models for, like, offline usage
01:59:07.300 | because with this distillation thing,
01:59:09.180 | it's really cool that we can train our own models,
01:59:11.580 | but for a lot of use cases, you really wanna
01:59:13.740 | kind of, like, have a local version of it.
01:59:16.940 | - Okay, Ben?
01:59:17.940 | - We're open to it, it's not like a high priority
01:59:23.100 | on the current roadmap.
01:59:24.420 | If we had, like, more resources and bandwidth,
01:59:29.620 | we would go do that, I think there's a lot of reasons
01:59:33.980 | you want a local model, but it's not like,
01:59:37.620 | it's not like a this-year kind of thing.
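For context, the distillation workflow the questioner refers to looks roughly like this: capture a larger model's answers and fine-tune a smaller model on them. The prompts, file name, and model choices below are assumptions, and the result is still an API-hosted model rather than the local one being asked about:

```python
# Rough sketch of distillation: collect "teacher" outputs from a larger model,
# then fine-tune a smaller "student" model on them. Prompts, the file name,
# and the model choices are assumptions for illustration only.
import json
from openai import OpenAI

client = OpenAI()

prompts = ["Explain what a webhook is.", "Explain what rate limiting is."]

# 1. Collect teacher outputs from the larger model.
rows = []
for p in prompts:
    answer = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": p}],
    ).choices[0].message.content
    rows.append({"messages": [{"role": "user", "content": p},
                              {"role": "assistant", "content": answer}]})

with open("distill.jsonl", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")

# 2. Fine-tune the smaller student model on those outputs.
training_file = client.files.create(file=open("distill.jsonl", "rb"),
                                    purpose="fine-tune")
job = client.fine_tuning.jobs.create(training_file=training_file.id,
                                     model="gpt-4o-mini-2024-07-18")
print(job.id)
```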
01:59:39.620 | - Question.
01:59:43.740 | - Hi, my question is, there are many agencies
01:59:48.060 | in government, at the local, state,
01:59:50.460 | and national level that could really greatly benefit
01:59:53.260 | from the tools that you guys are developing,
01:59:55.700 | but have perhaps some hesitancy on deploying them
01:59:58.900 | because of security concerns, data concerns,
02:00:01.140 | privacy concerns, and I guess I'm curious to know
02:00:05.260 | if there are any sort of planned partnerships
02:00:08.300 | with governments, world governments,
02:00:10.100 | once whatever AGI is achieved,
02:00:13.140 | because obviously if AGI can help solve problems
02:00:15.860 | like world hunger, poverty, climate change,
02:00:19.260 | government's gonna have to get involved with that, right?
02:00:22.380 | And I'm just curious to know if there is some,
02:00:25.780 | you know, plan that works when that time comes.
02:00:29.300 | - Yeah, I think, I actually think you don't wanna wait
02:00:31.580 | until AGI, you wanna start now, right?
02:00:33.300 | Because there's a learning process,
02:00:34.820 | and there's a lot of good that we can do
02:00:36.100 | with our current models, so we've even announced
02:00:38.180 | a handful of partnerships with government agencies,
02:00:40.220 | some states, I think Minnesota, some others, Pennsylvania,
02:00:43.820 | also with organizations like USAID.
02:00:45.940 | It's actually a huge priority of ours
02:00:48.980 | to be able to help governments around the world
02:00:52.900 | get acclimated, get benefit from the technology,
02:00:55.900 | and of all places, government feels like somewhere
02:00:57.700 | where you can automate a bunch of workflows
02:00:59.940 | and make things more efficient, reduce drudgery, and so on.
02:01:03.260 | So I think there's a huge amount of good we can do now,
02:01:05.820 | and if we do that now, it just accrues over the long run
02:01:09.020 | as the models get better and we get closer to AGI.
02:01:11.540 | - I've got a pretty open-ended question.
02:01:22.700 | What are your thoughts on open source?
02:01:25.060 | So whether that's open weights, just general discussion,
02:01:28.940 | where do you guys sit with open source?
02:01:31.260 | - I think open source is awesome.
02:01:34.860 | I think, again, if we had more bandwidth,
02:01:36.180 | we would do that too.
02:01:37.660 | We've gotten very close to making
02:01:39.900 | a big open source effort a few times,
02:01:42.100 | and then the really hard part is prioritization,
02:01:45.460 | and we have put other things ahead of it.
02:01:47.860 | Part of it is there are such good open source models
02:01:52.660 | in the world now.
02:01:56.660 | The thing we always end up ranking most highly is a really great
02:01:59.100 | on-device model, and I think that segment
02:02:01.340 | is fairly well served.
02:02:04.780 | I do hope we do something at some point,
02:02:07.620 | but we want to find something that we feel like
02:02:09.540 | if we don't do it, then we'll just be in the same space
02:02:11.380 | and not make another thing that's a tiny bit better
02:02:13.820 | on benchmarks, because we think there's a lot
02:02:17.460 | of good stuff out there now, but spiritually,
02:02:20.460 | philosophically, very glad it exists.
02:02:22.660 | Would like to figure out how to contribute.
02:02:25.780 | - Hi Sam, hi Kevin.
02:02:28.780 | Thanks for inviting us to Dev Day.
02:02:30.380 | It's been awesome.
02:02:31.220 | All the live demos worked, it's been incredible.
02:02:34.020 | Why can't Advanced Voice Mode sing?
02:02:36.980 | And as a follow-up to this, if it's a company,
02:02:39.420 | like a legal issue in terms of copyright, et cetera,
02:02:41.940 | is there a daylight between how you think about safety
02:02:44.180 | in terms of your own products, on your own hardware,
02:02:46.420 | versus giving us developers kind of the,
02:02:49.220 | I don't know, sign the right things off
02:02:50.900 | so we can make our Advanced Voice Mode sing?
02:02:54.780 | Could you answer this?
02:02:55.740 | (audience laughing)
02:02:58.580 | - You know, the funny thing is Sam asked the same question.
02:03:02.100 | Why can't this thing sing?
02:03:03.140 | I want it to sing, I've seen it sing before.
02:03:05.380 | Actually, it's a, there are things obviously
02:03:09.140 | that we can't have it sing, right?
02:03:11.020 | We can't have it sing copyrighted songs,
02:03:13.380 | we don't have the licenses, et cetera.
02:03:15.340 | And then there are things that it can sing,
02:03:17.740 | and you could have it sing Happy Birthday,
02:03:19.100 | and that would be just fine, right?
02:03:21.020 | And we want that too.
02:03:22.900 | It's a matter of, I think once you,
02:03:25.540 | basically it's easier in finite time to say no,
02:03:29.020 | and then build it in, but it's nuanced to get it right,
02:03:31.260 | and we, you know, there are penalties
02:03:33.220 | to getting these kinds of things wrong.
02:03:34.860 | So it's really just where we are now.
02:03:37.420 | We really want the models to sing too.
02:03:39.420 | (audience applauding)
02:03:42.420 | People were tired of waiting for us to ship Voice Mode,
02:03:44.820 | which is like, very fair.
02:03:46.660 | We could have waited longer and really gotten
02:03:48.980 | the classifications and filters on copyrighted music
02:03:53.140 | versus not right, but we decided we would just ship it
02:03:54.740 | and we'll add more.
02:03:56.220 | But I think Sam has asked me like four or five times
02:03:58.260 | why it doesn't have that feature.
02:03:59.940 | (audience laughing)
02:04:02.940 | I mean, we still can't, like, offer something
02:04:07.100 | where we're gonna be in, like, pretty bad,
02:04:09.500 | you know, hot water, whether it's developers or first party or whatever.
02:04:12.940 | So yes, we can, like, maybe have some differences,
02:04:15.340 | but we still have to be compliant with the law.
02:04:17.740 | - Could you speak a little to the future
02:04:22.620 | of where you see context windows going
02:04:24.860 | and kind of the timeline for when,
02:04:27.500 | how you see things balance between context window growth
02:04:29.980 | and RAG, basically, information retrieval?
02:04:32.660 | - I think there's like two different takes
02:04:36.940 | on that that matter.
02:04:37.980 | One is like, when is it gonna get to like,
02:04:39.940 | kind of normal long context,
02:04:41.940 | like context length, 10 million or whatever,
02:04:43.740 | like long enough that you just throw stuff in there
02:04:45.420 | and it's fast enough you're happy about it.
02:04:47.460 | And I expect everybody's gonna make
02:04:49.420 | pretty fast progress there, and that'll just be a thing.
02:04:52.220 | Long context has gotten weirdly less usage
02:04:55.460 | than I would have expected so far.
02:04:57.420 | But I think, you know, there's a bunch of reasons for that.
02:05:00.460 | I don't wanna go too much into it.
02:05:01.860 | And then there's this other question of like,
02:05:03.780 | when do we get to context length,
02:05:05.180 | not like 10 million, but 10 trillion?
02:05:07.620 | When do we get to the point where you throw like
02:05:09.220 | every piece of data you've ever seen
02:05:10.900 | in your entire life in there?
02:05:12.780 | And you know, like that's a whole different set of things.
02:05:17.620 | That obviously takes some research breakthroughs.
02:05:19.820 | But I assume that infinite context will happen
02:05:22.620 | at some point, and some point it's like,
02:05:25.140 | less than a decade.
02:05:26.900 | And that's gonna be just a totally different way
02:05:28.980 | that we use these models.
02:05:30.380 | Even getting to the like, 10 million tokens
02:05:33.060 | of very fast and accurate context,
02:05:35.500 | which I expect measured in like, months,
02:05:39.060 | something like that.
02:05:40.060 | You know, like, people will use that in all sorts of ways,
02:05:46.020 | and it'll be great.
02:05:48.620 | But yeah, the very, very long context I think is
02:05:52.500 | gonna happen, and it's really interesting.
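A minimal sketch of the two approaches being weighed in this answer: stuffing everything into the context window versus retrieving only the most relevant chunks (RAG). The documents, model names, and top-2 cutoff are illustrative assumptions:

```python
# Two ways to ground an answer in your own documents: (a) long context,
# just concatenate everything; (b) RAG, embed and keep only the closest chunks.
# Documents, model names, and the k=2 cutoff are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

docs = ["Invoices are due within 30 days.",
        "Refunds require a receipt.",
        "Support hours are 9am-5pm on weekdays."]
question = "When are invoices due?"

def ask(context: str) -> str:
    out = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": f"{context}\n\nQuestion: {question}"}],
    )
    return out.choices[0].message.content

# (a) Long context: throw everything in.
long_context_answer = ask("\n".join(docs))

# (b) RAG: embed the docs, keep the two chunks most similar to the question.
def embed(texts):
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [d.embedding for d in resp.data]

def dot(a, b):  # these embeddings are unit-length, so dot product ~ cosine similarity
    return sum(x * y for x, y in zip(a, b))

doc_vecs, (q_vec,) = embed(docs), embed([question])
scores = [dot(v, q_vec) for v in doc_vecs]
top_chunks = [d for _, d in sorted(zip(scores, docs), reverse=True)[:2]]
rag_answer = ask("\n".join(top_chunks))

print(long_context_answer, rag_answer, sep="\n---\n")
```

As windows get longer, faster, and cheaper, the first path covers more cases; retrieval still matters when the corpus is far larger than any window.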
02:05:54.620 | I think we maybe have time for one or two more.
02:05:58.940 | - Don't worry, this is,
02:06:01.060 | you can call this your favorite question.
02:06:03.020 | So, with voice, and all the other changes
02:06:07.540 | that users have experienced,
02:06:09.460 | since you all have launched your technology,
02:06:12.820 | what do you see is the vision for the new engagement layer,
02:06:17.180 | the form factor, and how we actually engage
02:06:20.340 | with this technology to make our lives so much better?
02:06:23.420 | - I love that question.
02:06:26.100 | It's one that we ask ourselves a lot, frankly.
02:06:28.660 | There's this, and I think it's one where developers
02:06:33.020 | can play a really big part here,
02:06:35.020 | because there's this trade-off between generality
02:06:37.180 | and specificity here.
02:06:39.300 | I'll give you an example.
02:06:40.260 | I was in Seoul and Tokyo a few weeks ago,
02:06:44.300 | and I was in a number of conversations with folks
02:06:46.660 | that, with whom I didn't have a common language,
02:06:48.860 | and we didn't have a translator around.
02:06:50.820 | Before, we would not have been able to have a conversation.
02:06:54.540 | We would have just sort of smiled at each other
02:06:56.100 | and continued on.
02:06:57.900 | I took out my phone.
02:06:59.360 | I said, "Hey, ChatGPT, I want you to be a translator for me.
02:07:03.020 | "When I speak in English, I want you to speak in Korean.
02:07:05.120 | "You hear Korean, I want you to repeat it in English."
02:07:07.740 | And I was able to have a full business conversation,
02:07:10.980 | and it was amazing.
02:07:11.820 | You think about the impact that could have,
02:07:14.060 | not just for business, but think about travel and tourism
02:07:16.500 | and people's willingness to go places
02:07:18.020 | where they might not have a word of the language.
02:07:20.820 | You can have these really amazing impacts.
02:07:23.360 | But inside ChatGPT, that was still a thing that I had to,
02:07:28.020 | like, ChatGPT's not optimized for that, right?
02:07:30.820 | Like, you want this sort of digital,
02:07:32.940 | you know, universal translator in your pocket
02:07:34.780 | that just knows that what you want to do is translate.
02:07:37.860 | Not that hard to build, but I think there's,
02:07:41.300 | we struggle with trying to build an application
02:07:45.020 | that can do lots of things for lots of people,
02:07:48.440 | and it keeps up, like we've been talking about a few times,
02:07:51.300 | it keeps up with the pace of change
02:07:53.900 | and with the capabilities,
02:07:55.260 | you know, agentic capabilities and so on.
02:07:57.580 | I think there's also a huge opportunity
02:08:00.700 | for the creativity of an audience like this
02:08:02.900 | to come in and, like, solve problems
02:08:05.380 | that we're not thinking of,
02:08:06.580 | that we don't have the expertise to do.
02:08:08.700 | And ultimately, the world is a much better place
02:08:10.580 | if we get more AI to more people,
02:08:12.740 | and it's why we are so proud to serve all of you.
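A minimal sketch of the pocket-translator pattern Kevin describes, written against the plain chat API rather than the voice stack; the system prompt wording here is an assumption:

```python
# Sketch of the pocket-translator pattern: a system prompt that pins the model
# to one job (translate English<->Korean). The prompt wording is an assumption.
from openai import OpenAI

client = OpenAI()

SYSTEM = ("You are a translator. When the user speaks English, reply only with "
          "the Korean translation. When the user speaks Korean, reply only with "
          "the English translation. Add nothing else.")

def translate(utterance: str) -> str:
    out = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "system", "content": SYSTEM},
                  {"role": "user", "content": utterance}],
    )
    return out.choices[0].message.content

print(translate("Where is the nearest subway station?"))
```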
02:08:15.240 | - I, the only thing I would add is if you just think
02:08:18.660 | about everything that's gonna come together,
02:08:20.300 | at some point in not that many years in the future,
02:08:22.820 | you'll walk up to a piece of glass,
02:08:24.620 | you will say whatever you want.
02:08:26.260 | They will have, like,
02:08:28.020 | there'll be incredible reasoning models,
02:08:29.460 | agents connected to everything.
02:08:30.980 | There'll be a video model streaming back to you,
02:08:32.860 | like a custom interface just for this one request.
02:08:35.660 | Whatever you need is just gonna get, like,
02:08:37.140 | rendered in real time in video.
02:08:38.460 | You'll be able to interact with it,
02:08:39.540 | you'll be able to, like, click through the stream
02:08:41.620 | or say different things, and it'll be off doing, like,
02:08:44.660 | again, the kinds of things that used to take, like,
02:08:46.500 | humans years to figure out.
02:08:48.460 | And it'll just, you know,
02:08:50.460 | dynamically render whatever you need,
02:08:51.860 | and it'll be a completely different way
02:08:54.180 | of using a computer,
02:08:55.820 | and also getting things to happen in the world
02:08:58.740 | that it's gonna be quite a while.
02:09:01.100 | - Awesome, thank you.
02:09:03.860 | That was a great question to end on.
02:09:05.100 | I think we're at time.
02:09:06.580 | Thank you so much for coming today.
02:09:08.180 | (audience applauding)
02:09:11.340 | - That's all for our coverage of Dev Day 2024.
02:09:18.220 | (gentle music)
02:09:21.740 | ♪ Under Dev Day lights, code ignites ♪
02:09:28.540 | ♪ Real-time voice streams reach new heights ♪
02:09:31.660 | ♪ O1 and GPT-4O ♪