Building AGI in Real Time (OpenAI Dev Day 2024)
Chapters
0:00 Intro by Suno.ai
1:23 NotebookLM Recap of DevDay
9:25 Ilan's Strawberry Demo with Realtime Voice Function Calling
19:16 Olivier Godement, Head of Product, OpenAI
36:57 Romain Huet, Head of DX, OpenAI
47:08 Michelle Pokrass, API Tech Lead at OpenAI ft. Simon Willison
64:45 Alistair Pullen, CEO, Cosine (Genie)
78:31 Sam Altman + Kevin Weil Q&A
123:07 NotebookLM Recap of Podcast
♪ Real time voice streams reach new heights ♪ 00:00:40.120 |
is covering major AI and ML conferences in podcast format. 00:00:48.640 |
with short samples of conversations with key players 00:01:15.120 |
Of course, you can also check the show notes for details. 00:01:34.720 |
- Seems like you're really interested in what's new with AI. 00:01:38.320 |
And it seems like OpenAI had a lot to announce. 00:01:41.360 |
New tools, changes to the company, it's a lot. 00:01:44.080 |
- It is, and especially since you're interested 00:01:49.200 |
you know, practical applications, we'll focus on that. 00:01:56.560 |
That seems like a big deal if we want AI to sound, 00:02:07.840 |
could actually handle it if you interrupted it. 00:02:11.360 |
- Right, not just these clunky back and forth 00:02:14.720 |
- And they actually showed it off, didn't they? 00:02:16.000 |
I read something about a travel app, one for languages. 00:02:28.760 |
And the tech behind it is fascinating, by the way. 00:02:43.520 |
- So imagine giving the AI access to this whole toolbox, 00:02:47.480 |
right, information, capabilities, all sorts of things. 00:02:52.080 |
With function calling, the AI can pull up details, 00:02:55.080 |
let's say about Fort Mason, right, from some database, 00:03:01.680 |
So instead of being limited to what it already knows, 00:03:09.360 |
And someone on Hacker News pointed out a cool detail. 00:03:14.760 |
of what's being said, so you can store that, analyze it. 00:03:21.360 |
into making this API easy for developers to use. 00:03:24.880 |
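For readers who want to ground the recap in code, here is a minimal sketch of that function-calling flow against the Chat Completions API; the `get_venue_info` tool and its venue database are invented for illustration, not taken from the demo:

```python
# Minimal function-calling sketch. The get_venue_info tool is hypothetical;
# the tools/chat.completions shapes follow OpenAI's documented API.
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_venue_info",
        "description": "Look up details about a venue, e.g. Fort Mason.",
        "parameters": {
            "type": "object",
            "properties": {"venue": {"type": "string"}},
            "required": ["venue"],
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Tell me about Fort Mason."}],
    tools=tools,
)

# If the model chose to call the tool, run it and send the result back in a
# follow-up message so the model can answer with fresh data.
for call in response.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```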
But while we're on OpenAI, you know, besides their tech, 00:03:28.320 |
there's been some news about like internal changes too. 00:03:31.360 |
Didn't they say they're moving away from being a nonprofit? 00:03:34.480 |
And it's got everyone talking, it's a major shift, 00:03:39.040 |
how that'll change things for OpenAI in the future. 00:03:41.600 |
I mean, there are definitely some valid questions 00:03:44.680 |
like will they have more money for research now? 00:03:53.280 |
especially with all the like the leadership changes 00:03:58.000 |
I read that their chief research officer left, 00:04:01.040 |
and their VP of research, and even their CTO. 00:04:05.320 |
A lot of people are connecting those departures 00:04:15.720 |
Like this whole fine-tuning thing really caught my eye. 00:04:19.160 |
It's essentially taking a pre-trained AI model 00:04:26.360 |
you get one that's tailored for a specific job. 00:04:32.920 |
Imagine you could train an AI on your company's data, 00:04:36.120 |
you know, like how you communicate your brand guidelines. 00:04:40.040 |
that's specifically trained for your company? 00:04:43.000 |
- And they're doing it with images now too, right? 00:04:45.080 |
Fine-tuning with vision is what they called it. 00:04:47.640 |
It's pretty incredible what they're doing with that, 00:04:51.040 |
- Like using AI to help doctors make diagnoses. 00:05:14.840 |
But training these AI models must be really expensive. 00:05:37.800 |
It's good that they're trying to make it more affordable. 00:05:39.920 |
And they're also doing something called model distillation. 00:05:52.240 |
- Make it simpler, but it still tastes the same. 00:05:59.200 |
and create a smaller, more efficient version. 00:06:01.480 |
- So it's like lighter weight, but still just as capable. 00:06:09.320 |
They don't need like a supercomputer to run them. 00:06:21.200 |
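As a rough sketch of how that distillation pipeline starts in the API (the `store` flag and `metadata` fields follow OpenAI's documented stored-completions flow; the ticket-triage task is invented):

```python
# Capture step for model distillation: persist the large "teacher" model's
# completions so they can later train a smaller "student" model.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",  # teacher model
    messages=[{"role": "user", "content": "Classify this support ticket: ..."}],
    store=True,  # keep the completion server-side for later fine-tuning
    metadata={"purpose": "distillation", "task": "ticket-triage"},
)
# The stored completions can then be turned into a fine-tuning dataset for a
# smaller model such as gpt-4o-mini.
```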
The one that's supposed to be this big leap forward. 00:06:27.240 |
it's not just a bigger, better language model. 00:06:31.200 |
- They're saying it can like actually reason, right? 00:06:52.800 |
- Well, OpenAI showed it doing some pretty impressive stuff 00:07:02.280 |
- So you're saying if I needed to like write a screenplay, 00:07:05.720 |
But if I wanted to solve some crazy physics problem, 00:07:15.640 |
And it takes longer to get those impressive results. 00:07:24.120 |
- It sounds like it's still in development though, right? 00:07:25.720 |
Is there anything else they're planning to add to it? 00:07:30.720 |
which will let developers like set some ground rules 00:07:36.640 |
And they're working on adding structured outputs 00:07:46.400 |
is formatted in a way that's easy to use, like JSON. 00:07:53.520 |
It's good that they're thinking about that stuff. 00:07:59.280 |
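A hedged sketch of what structured outputs look like in the API, using the documented JSON-schema response format; the event-extraction schema is invented:

```python
# Structured outputs: the model's reply is constrained to this JSON schema.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Extract city and date: 'Dev Day, SF, Oct 1.'"}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "event",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {"city": {"type": "string"}, "date": {"type": "string"}},
                "required": ["city", "date"],
                "additionalProperties": False,
            },
        },
    },
)
print(response.choices[0].message.content)  # valid JSON matching the schema
```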
Dev Day finished up with this really interesting talk. 00:08:04.760 |
and Kevin Weil, their new chief product officer. 00:08:08.520 |
They talked about like the big picture for AI. 00:08:15.800 |
this whole AGI term, artificial general intelligence. 00:08:24.440 |
and people don't really understand what it means. 00:08:40.640 |
But they were also very clear about doing it responsibly. 00:08:54.000 |
It was a lot to take in, this whole Dev Day event, 00:08:59.160 |
and these big questions about the future of AI. 00:09:16.560 |
from our friendly local OpenAI developer experience engineer 00:09:25.280 |
and then go into a little interview with him. 00:09:39.840 |
We'll get those strawberries delivered for you. 00:09:56.960 |
Could you tell me what flavors of strawberry dips 00:10:06.400 |
How much would 400 chocolate-covered strawberries cost? 00:10:19.080 |
- I think that'll be around like $1,415 with 92 cents. 00:10:36.120 |
- Please deliver them to the Gateway Pavilion 00:10:45.720 |
So just to confirm, you want 400 chocolate-covered 00:11:17.040 |
You are dressed up like exactly like a strawberry salesman. 00:11:25.200 |
This is actually something I had been thinking about 00:11:33.640 |
is something like I've personally wanted for a long time. 00:11:44.400 |
And then we thought how cool would it be to have this 00:11:50.600 |
Would you call out any technical issues building? 00:11:54.000 |
Like you were basically one of the first people ever 00:11:58.360 |
Would you call any issues like integrating it 00:12:04.440 |
I noticed that you had like intents of things to fulfill. 00:12:11.920 |
the voice would prompt you role-playing the store guy. 00:12:17.120 |
So I think technically there's like the whole, 00:12:23.440 |
Like even separate from like AI and this like new capabilities 00:12:40.760 |
The function calling itself is sort of tangential to that. 00:12:43.880 |
Like you have to prompt it to call the functions 00:12:45.880 |
but then handling it isn't too much different 00:12:47.720 |
from like what you would do with assistant streaming 00:12:54.040 |
just like if everything in the API was streaming, 00:13:04.560 |
You guys showed like in the playground a lot of logs. 00:13:12.200 |
may have different names than the streaming events 00:13:18.040 |
It's things like, you know, function call started, 00:13:26.280 |
Conveniently, we send one that, like, has the full function arguments. 00:13:40.200 |
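For reference, a sketch of consuming those events over the Realtime WebSocket, assuming the documented event names; `order_strawberries` stands in for whatever handler the demo actually used:

```python
# Watch the Realtime event stream for completed function-call arguments,
# execute the function, and hand the result back to the session.
import json
import websockets  # pip install websockets

async def handle_events(ws: websockets.WebSocketClientProtocol):
    async for raw in ws:
        event = json.loads(raw)
        if event["type"] == "response.function_call_arguments.done":
            # This event carries the full, final arguments in one payload.
            args = json.loads(event["arguments"])
            result = order_strawberries(**args)  # hypothetical app function
            await ws.send(json.dumps({
                "type": "conversation.item.create",
                "item": {
                    "type": "function_call_output",
                    "call_id": event["call_id"],
                    "output": json.dumps(result),
                },
            }))
            # Ask the model to continue now that it has the function result.
            await ws.send(json.dumps({"type": "response.create"}))
```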
we discussed a little bit about the sensitivities 00:13:58.720 |
You wouldn't want someone just calling you with AI. 00:14:13.880 |
having consent of the person you're about to call, 00:14:18.920 |
have consented to like getting called with AI. 00:14:25.120 |
Definitely individuals are more sensitive than businesses. 00:14:27.720 |
I think businesses, you have a little bit more leeway. 00:14:39.600 |
It's kind of like getting on a booking platform, right? 00:14:43.480 |
But I think it's still very much like a gray area. 00:15:08.400 |
and then just front end in a loop until it ends call? 00:15:13.400 |
Like because the API is just based on sessions, 00:15:28.880 |
It's like, it's inherently almost like in a loop, 00:15:56.840 |
To be honest, I have not played with it too much, 00:16:11.240 |
I can't imagine, like what else is real-time? 00:16:13.640 |
- Well, I guess to use ChatGPT's voice mode as an example, 00:16:34.320 |
But like, given that this is the features that currently, 00:16:36.800 |
that exists, that we've demoed on ChatGPT, yeah. 00:16:41.320 |
where there's like a real-time text API, right? 00:16:49.520 |
- I don't know why you would, but it's actually, 00:16:52.760 |
so text-to-text here doesn't quite make a lot of sense. 00:16:56.400 |
I don't think you'll get a lot of latency game, 00:16:59.080 |
but like speech-to-text is really interesting 00:17:04.080 |
you can prevent responses like audio responses 00:17:14.800 |
like, we weren't sure how well this was gonna work 00:17:17.620 |
because it's like, you have a voice answering, 00:17:21.360 |
Like that's a little bit more, you know, risky. 00:17:27.280 |
and make it so it always has to output a function, 00:17:29.960 |
like you can end up with pretty, pretty reliable, 00:17:38.560 |
Like one-sided voice, not, don't force two-sided on me. 00:17:43.880 |
yeah, I think having an output voice is great, 00:17:53.580 |
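A sketch of that "speech in, functions out" setup, assuming the documented `session.update` fields; the `set_timer` tool is invented:

```python
# Configure the session for one-sided voice: audio comes in, but the model
# can only respond with text/function calls, and must always call a tool.
import json

session_update = {
    "type": "session.update",
    "session": {
        "modalities": ["text"],  # suppress audio responses
        "input_audio_transcription": {"model": "whisper-1"},
        "tool_choice": "required",  # always output a function call
        "tools": [{
            "type": "function",
            "name": "set_timer",  # hypothetical example tool
            "description": "Set a timer from a spoken command.",
            "parameters": {
                "type": "object",
                "properties": {"seconds": {"type": "integer"}},
                "required": ["seconds"],
            },
        }],
    },
}
# await ws.send(json.dumps(session_update))
```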
Do you want to comment on any of the other stuff 00:18:06.460 |
- 'Cause I can probably cache this for like five minutes. 00:18:09.220 |
but what if I don't make a call every five minutes? 00:18:12.620 |
I've been so caught up with the real-time API 00:18:19.540 |
but I think I'm excited to see how all distillation works. 00:18:26.200 |
I've been like doing it between our models for a while 00:18:37.480 |
of like function calling with like hundreds of functions. 00:18:56.520 |
Well, I appreciate you jumping on and amazing demo. 00:19:03.180 |
- Yeah, I guess shout out to like the first people 00:19:09.960 |
And then like I took it and built the voice component 00:19:19.160 |
for like debugging everything as it's been going on. 00:19:22.240 |
- Yeah, you are the first consumers on the UX team. 00:19:25.080 |
- Yeah, I mean the classic role of what we do there. 00:19:33.340 |
- The Latent Space crew then talked to Olivier Godement, 00:19:33.340 |
- And it was amazing to see your keynote today. 00:19:55.580 |
What was the backstory of preparing something like this, 00:20:02.480 |
Number one, excellent reception from last year's Dev Day. 00:20:08.800 |
and we want to spend more time with them as well. 00:20:23.120 |
And so this year, we're doing SF, Singapore, and London 00:20:28.420 |
- Yeah, I'm very excited for the Singapore one. 00:20:32.240 |
- I don't know, I don't know if I got an invite. 00:20:51.080 |
No, O1, there was no connection to October 1st. 00:20:53.080 |
But in hindsight, that would have been a pretty good meme, 00:20:58.400 |
Yeah, and I think OpenAI's outreach to developers 00:21:10.280 |
all that stuff that you talked about in the past. 00:21:14.560 |
as like, here's our little developer conference thing. 00:21:20.240 |
and to see so many developer-oriented products 00:21:22.960 |
coming out of OpenAI, I think it's really encouraging. 00:21:31.840 |
who make the best connection between the technology 00:21:40.400 |
and are like, hey, I see how that application 00:21:50.680 |
it's a no-brainer for us technically to partner with devs. 00:21:54.120 |
And most importantly, you almost never had waitlists, 00:21:57.440 |
which compared to other releases, people usually have. 00:22:01.080 |
What is the, you know, you had prompt caching, 00:22:10.160 |
- What is the thing that was like sneakily the hardest 00:22:14.400 |
Or like, what was the kind of like, you know, 00:22:19.760 |
- Yeah, they're all fairly, like I would say, 00:22:24.960 |
So the team has been working for a month, all of them. 00:22:29.240 |
The one which I would say is the newest for open AI 00:22:38.840 |
that we have an actual like WebSocket-based API. 00:22:41.400 |
And so I would say that's the one that required 00:22:49.040 |
and to also make sure that our existing safety mitigations 00:22:51.680 |
worked well with, like, real-time audio in and audio out. 00:22:56.320 |
What design choices or what was like the sort of design, 00:23:03.000 |
like WebSockets, you just receive a bunch of events. 00:23:09.880 |
I think a lot of developers are going to have to embrace 00:23:29.880 |
like, you know, it takes like something like 300 milliseconds 00:23:34.120 |
And so that was a design principle, essentially. 00:23:41.120 |
and WebSockets was the one that we landed on. 00:24:01.160 |
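Concretely, opening a session is one authenticated WebSocket connection; a minimal sketch per the public docs (the header parameter name varies between client-library versions):

```python
# Open a Realtime API session and wait for the session.created event.
import asyncio, json, os
import websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"

async def main():
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",
    }
    # Note: newer websockets releases call this parameter additional_headers.
    async with websockets.connect(URL, extra_headers=headers) as ws:
        event = json.loads(await ws.recv())
        print(event["type"])  # expect "session.created"

asyncio.run(main())
```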
Like we just do it as much as we can, essentially. 00:24:06.680 |
And then finally on distillation, like an evaluation, 00:24:09.880 |
the big design choice was something I'd learned before: 00:24:14.000 |
like a philosophy around, like, a "pit of success." 00:24:17.000 |
Like what is essentially the minimum number of steps 00:24:21.640 |
for the majority of developers to do the right thing? 00:24:26.240 |
there are many, many ways like to mess it up, frankly, 00:24:28.240 |
like, you know, and have like a crappy model, 00:24:38.960 |
like get like in a few minutes, like to a good spot. 00:24:41.360 |
And so how do we essentially enable that pit of success, 00:25:05.440 |
which very close to home, I'm from Singapore. 00:25:18.720 |
You know, like there's a lot of unknowns with vision 00:25:30.480 |
like to tell correct from incorrect essentially with images. 00:25:39.360 |
we are seeing like even higher performance uplift 00:25:50.760 |
I expect the developers who are moving from one modality 00:25:52.800 |
to like text and images will have like more testing, 00:26:08.880 |
How should people think about being the source of truth? 00:26:11.560 |
Like do you want OpenAI to be like the system of record 00:26:17.920 |
And then is that going to be the same as the models evolve? 00:26:27.320 |
- The vision is, if you want to be a source of truth, 00:26:31.920 |
Like we're not going to force people like to pass us data 00:26:40.080 |
like most developers like use like a one size fits all model 00:26:43.560 |
like, off the shelf, like GPT-4o essentially. 00:26:46.280 |
The vision we have is fast forward a couple of years. 00:26:49.760 |
I think like most developers will essentially 00:26:51.640 |
like have an automated, continuous, fine-tuned model. 00:26:59.160 |
the more data you pass to the model provider, 00:27:01.360 |
like the model is automatically like fine-tuned, 00:27:05.560 |
And essentially like you don't have to every month 00:27:07.840 |
when there is a new snapshot, like, you know, 00:27:09.360 |
to go online and you know, try a few new things. 00:27:14.720 |
But I think like that evaluation and decision product 00:27:18.160 |
is essentially a first good step in that direction. 00:27:20.480 |
It's like, hey, if you are excited by the direction 00:27:35.320 |
How should people think about when it's worth it, 00:27:38.600 |
Sometimes people get overly protective of their data 00:27:48.720 |
Like, you know, we don't train on, like, any API data 00:28:12.400 |
we have full confidence that there is no regression 00:28:19.440 |
And so that essentially is a sort of a two birds, one stone. 00:28:24.960 |
and we also use the evals when we ship like new models 00:28:33.800 |
many developers will not want to share their data 00:28:44.480 |
- Exactly, and we sanitize PII, everything, you know, 00:28:47.320 |
like we have no interest in like the actual like sensitive 00:28:58.480 |
Like sometimes the evals themselves are wrong. 00:29:05.280 |
tinkering with LLMs is like, yeah, evaluation, easy. 00:29:07.480 |
You know, I've done testing like all my life. 00:29:09.560 |
And then you start to actually build the evals, 00:29:13.240 |
and you realize, wow, that's like a whole field itself. 00:29:33.000 |
maybe there's some more that you didn't demo, 00:29:34.640 |
but what I see is like kind of like a low-code experience, 00:29:37.880 |
Would you ever support like a more code-based, 00:29:39.720 |
like would I run code on OpenAI's eval platform? 00:29:54.920 |
you know, their existing test data, we'll do it. 00:29:58.240 |
So yeah, there is no, you know, philosophical, 00:30:00.680 |
I would say like, you know, misalignment on that. 00:30:04.960 |
and I don't like, it's basically like you're becoming AWS, 00:30:12.120 |
And I don't know if like, that's a conscious strategy 00:30:15.000 |
or it's like, it doesn't even have to be a conscious 00:30:17.040 |
strategy, like you're going to offer storage, 00:30:30.800 |
But it's the AI versions of everything, right? 00:30:40.320 |
I feel like good models are just half of the story 00:30:48.200 |
Like, you know, you can have the best model in the world 00:30:59.440 |
Number two, like the whole like software development stack 00:31:01.600 |
is being basically reinvented, you know, with LLMs. 00:31:04.600 |
There is no freaking way that OpenAI can build everything. 00:31:08.120 |
Like, there is just too much to build, frankly. 00:31:14.200 |
which are like the closest to the model itself. 00:31:18.680 |
investing quite a bit in like fine tuning, distillation, 00:31:23.960 |
to have like in one spot, like, you know, all of that. 00:31:32.280 |
like tools which are like further away from the model, 00:31:36.120 |
If you want to do like, you know, super elaborate, 00:31:38.720 |
like home management or, you know, like tooling, 00:31:42.760 |
has like such a big edge, frankly, like, you know, 00:31:48.040 |
But again, frankly, like the philosophy is like super simple. 00:31:52.000 |
It's meeting developers where they want us to be. 00:31:54.080 |
And so, you know, that's frankly like, you know, 00:31:56.840 |
day in, day out, like, you know, what I try to do. 00:32:08.880 |
So, but I think we should spend a bit more time on voice 00:32:11.440 |
'cause I feel like that's like the big splash thing. 00:32:24.760 |
You already have it in the ChatGPT desktop app. 00:32:32.480 |
are developers just gonna be like sending sockets 00:32:42.320 |
like real time is quickly becoming like, you know, 00:32:48.480 |
So my expectation is that we'll see like a non-trivial, 00:32:56.800 |
Like if you zoom out, like audio is a really simple, 00:33:07.200 |
was basically very much like a second class citizen. 00:33:17.800 |
they were like not super educated with technology. 00:33:19.880 |
And so frankly, it was like the crappy option, 00:33:23.400 |
But when you talk to people in the real world, 00:33:26.720 |
the vast majority of people like prefer to talk 00:33:35.760 |
I mean, I'm sure it's the case for you in Singapore. 00:33:38.000 |
the number of like WhatsApp, like voice notes 00:33:41.360 |
I mean, just people, it makes sense frankly, like, you know. 00:33:48.240 |
I mean, you know, you get the point across like pretty well. 00:33:51.400 |
And so my personal ambition for like the real-time API 00:33:54.880 |
and like audio in general is to make like audio 00:33:57.320 |
and like multimodality like truly a first-class experience. 00:34:01.520 |
the amazing, like super bold, like startup out of YC, 00:34:04.520 |
you're going to be like the next like billion, 00:34:08.920 |
and make it feel like, you know, an actual good, 00:34:13.880 |
And I think like, yeah, it could be pretty big. 00:34:16.520 |
I think one issue that people have with the voice so far 00:34:19.760 |
as released in advanced voice mode is the refusals. 00:34:33.880 |
In fact, like even if like not safe for work, 00:34:50.400 |
We're not in the business of like policing, you know, 00:34:52.400 |
if you can say like vulgar words or whatever. 00:34:54.600 |
You know, there are some use cases like, you know, 00:34:58.040 |
I want to say like vulgar words, it's perfectly fine. 00:35:01.080 |
And so I think the direction where we'll go here 00:35:02.920 |
is that basically there will always be like, you know, 00:35:09.200 |
because they're illegal against our terms of services. 00:35:16.960 |
vulgar words or, you know, not safe for work stuff. 00:35:19.480 |
Um, where basically we'll expose like a controllable, 00:35:22.920 |
like safety, like knobs in the API to basically allow you 00:35:28.960 |
How sensitive do you want the threshold to be 00:35:38.120 |
- Because right now it's you, it's whatever you decide. 00:35:52.020 |
- So if you're a singer, you've been locked off singing. 00:35:54.960 |
- But I understand music gets you in trouble. 00:36:01.520 |
Like we have all developers watching, you know, 00:36:04.960 |
what feedback do you want, anything specific as well, 00:36:11.520 |
anything that you are unsure about that you're like, 00:36:17.480 |
- I think, essentially it's becoming pretty clear 00:36:23.040 |
I would say the open-ended actions become pretty clear, 00:36:26.400 |
Investment in reasoning, investment in multimodality, 00:36:32.800 |
To me, the biggest question I have is, you know, 00:36:39.200 |
I think we need all three of them, frankly, like, you know, 00:36:49.960 |
like is O1 smart enough, like for your problems? 00:36:59.280 |
or do we still have, like, you know, a step to do? 00:37:00.560 |
- Preview is not enough, I need the full one. 00:37:02.840 |
- Yeah, so that's exactly that sort of feedback. 00:37:05.720 |
Essentially, what I would love to do is for developers, 00:37:09.000 |
which has been saying, like, over and over again, 00:37:18.520 |
which is a bit too difficult for the model today, right? 00:37:24.560 |
it's like sort of working, sometimes not working, 00:37:30.520 |
and be like, okay, that's what you need to enable 00:37:32.760 |
with the next model release, like in a few months. 00:37:40.160 |
that I can, like, directly, like, you know, incorporate. 00:37:47.160 |
- Yeah, thank you so much. - Yeah, thank you. 00:37:55.960 |
as that had only previously been picked up on 00:37:59.840 |
This is an encouraging sign that we will return to 00:38:06.800 |
Next, a chat with Romain Huet, friend of the pod, 00:38:10.560 |
AI Engineer World's fair-closing keynote speaker, 00:38:18.800 |
and advice to AI engineers on all the new modalities. 00:38:24.280 |
We're with Romain, who just did two great demos on stage, 00:38:42.320 |
I think when you go back to the OpenAI mission, 00:38:45.400 |
that we have the developers involved in everything we do, 00:38:48.680 |
making sure that, you know, they have all of the tools 00:38:54.800 |
are always gonna invent the ideas, the prototypes, 00:38:57.800 |
the fun factors of AI that we can't build ourselves. 00:39:07.240 |
She very seriously said, "API is the path to AGI." 00:39:10.360 |
- And people in our YouTube comments were like, 00:39:21.240 |
of having a platform and an ecosystem of amazing builders 00:39:23.880 |
who can, like, in turn, create all of these apps. 00:39:28.840 |
but there's now more than 3 million developers 00:39:32.360 |
to see all of that energy into creating new things. 00:39:37.160 |
- I was gonna say, you built two apps on stage today, 00:39:43.640 |
The hardest thing must have been opening Xcode 00:39:50.960 |
You had, kind of, like, a ChatGPT app to get the plan with O1, 00:39:54.680 |
and then you had Cursor to apply some of the changes. 00:40:07.960 |
- Yeah, I mean, one of the things that's really cool 00:40:10.120 |
about O1 Preview and O1 Mini being available in the API 00:40:14.320 |
is that you can use it in your favorite tools, 00:40:17.440 |
And that's also what, like, Devin from Cognition 00:40:19.920 |
can use in their own software engineering agents. 00:40:26.160 |
so that's why I had, like, ChatGPT side-by-side. 00:40:29.760 |
- But it's cool, right, because I could instruct O1 Preview 00:40:50.200 |
but, like, you can now create an iPhone app from scratch, 00:40:54.000 |
describing a lot of intricate details that you want, 00:40:57.600 |
and your vision comes to life in, like, a minute. 00:41:17.960 |
- Yeah, I mean, like, Xcode and iOS development 00:41:30.360 |
it was a bit harder to get in for someone new, 00:41:42.960 |
That's the best way, I think, to describe O1. 00:41:44.920 |
People ask me, like, "Well, can GPT-4 do some of that?" 00:41:49.320 |
But I think it will just start spitting out code, right? 00:42:00.320 |
It had to look at, like, how do I parse this JSON? 00:42:15.320 |
We are obviously very excited about the upcoming O1 00:42:20.080 |
But we noticed that O1 Mini is very, very good 00:42:36.080 |
But, yeah, I used O1 Mini for my second demo, 00:42:40.640 |
All I needed was very much, like, something rooted in code, 00:42:43.760 |
architecting and wiring up, like, a front-end, a back-end, 00:42:48.440 |
something very specific, and it did that perfectly. 00:42:51.440 |
- And then maybe just talking about Voice and Wanderlust, 00:42:57.360 |
- What's the backstory behind, like, preparing for all that? 00:43:00.480 |
- You know, it's funny, 'cause when last year for Dev Day, 00:43:03.000 |
we were trying to think about what could be a great demo app 00:43:09.200 |
I've always thought travel is a kind of a great use case, 00:43:12.760 |
'cause you have, like, pictures, you have locations, 00:43:15.520 |
you have the need for translations, potentially. 00:43:18.200 |
There's, like, so many use cases that are bounded to travel 00:43:21.760 |
that I thought last year, let's use a travel app. 00:43:29.640 |
And now we thought, well, if there's a voice modality, 00:43:33.000 |
what if we just bring this app back as a Wink, 00:43:36.120 |
and what if we were interacting better with voice? 00:43:39.040 |
And so with this new demo, what I showed was the ability 00:43:42.240 |
to, like, have a complete conversation in real time 00:43:45.600 |
with the app, but also the thing we wanted to highlight 00:43:50.200 |
was the ability to call tools and functions, right? 00:43:52.440 |
So, like, in this case, we placed a phone call 00:43:55.960 |
using the Twilio API, interfacing with our AI agents, 00:43:59.920 |
but developers are so smart that they'll come up 00:44:06.000 |
But what if you could have, like, a 911 dispatcher? 00:44:10.640 |
What if you could have, like, a customer service center 00:44:14.360 |
that is much smarter than what we've been used to today? 00:44:17.080 |
There's gonna be so many use cases for real time. 00:44:38.720 |
where we don't have to interact with those legacy systems. 00:44:42.480 |
Is there anything, so, you're doing function calling 00:44:47.440 |
So, basically, it's WebSockets, it's UDP, I think. 00:44:52.360 |
It's basically not guaranteed to be exactly once delivery. 00:45:00.160 |
- Yeah, it's a bit more delicate to get into it. 00:45:10.600 |
It does have the function calling and the tools, 00:45:14.440 |
if you wanna have something very robust on your client side, 00:45:18.320 |
maybe you wanna have WebRTC as a client, right? 00:45:25.560 |
So, that's why we have partners like LiveKit and Agora 00:45:29.480 |
And I'm sure we'll have many more in the future. 00:45:34.760 |
and I'm sure the feedback of developers in the weeks to come 00:45:37.360 |
is gonna be super critical for us to get it right. 00:45:39.440 |
- Yeah, I think LiveKit has been fairly public 00:45:59.960 |
And we also partnered with LiveKit and Agora, 00:46:14.040 |
If you're working on something that's completely client, 00:46:16.640 |
or if you're working on something on the server side 00:46:18.720 |
for the voice interaction, you may have different needs. 00:46:24.920 |
Is there anything that you want the AI engineering community 00:46:38.320 |
I think Dev Day this year is a little different 00:46:43.800 |
But one way is that we wanted to keep it intimate, 00:46:52.680 |
That's why we have community talks and everything. 00:46:57.040 |
learning from the very best developers and AI engineers. 00:47:07.520 |
the ability to generate prompts quickly in the playground, 00:47:17.640 |
is to say, like, hey, the roadmap that we're working on 00:47:20.560 |
is heavily influenced by them, and they work. 00:47:23.160 |
And so we love feedback from high feature requests, 00:47:37.800 |
- Yeah, I think the model distillation thing as well, 00:47:45.360 |
And I think maybe the most unexpected, right? 00:47:53.720 |
to ship the real-time API for speech-to-speech. 00:48:01.000 |
And we really think that's gonna be a big deal, right? 00:48:09.720 |
but high-performance, high-quality on the use case, 00:48:13.640 |
- Yeah, I sat in the distillation session just now, 00:48:16.320 |
and they showed how they distilled from 4o to 4o mini, 00:48:18.960 |
and it was, like, only, like, a 2% hit in the performance, 00:48:36.240 |
- As you might have picked up at the end of that chat, 00:48:49.480 |
we are delighted to bring back two former guests of the pod, 00:48:53.760 |
which is something listeners have been greatly enjoying 00:48:56.400 |
in our second year of doing the "Latent Space" podcast. 00:49:02.880 |
joined us recently to talk about structured outputs, 00:49:06.000 |
and today gave an updated long-form session at Dev Day, 00:49:14.040 |
We also got her updated thoughts on the Voice Mode API 00:49:22.160 |
She is joined by friend of the pod and super-blogger, 00:49:24.800 |
Simon Willison, who also came back as guest co-host 00:49:51.520 |
Simon did a great live blog, so if you haven't caught up- 00:49:59.320 |
using, like, GPT-4; it wrote me the JavaScript, 00:50:04.400 |
and then, yeah, I was live blogging the whole day. 00:50:08.120 |
- I haven't really gotten to cursor yet, to be honest. 00:50:10.120 |
Like, I just haven't spent enough time for it to click, 00:50:29.080 |
- Same here, team co-pilot. - Co-pilot is actually 00:50:31.560 |
the reason I joined OpenAI, it was, you know, 00:50:34.120 |
before ChatGPT, this is the thing that really got me, 00:50:36.360 |
so I'm still into it, but I keep meaning to try out cursor, 00:50:39.120 |
and I think, now that things have calmed down, 00:50:42.320 |
- Yeah, it's a big thing to change your tool of choice. 00:50:52.200 |
That's the thing to do. - It's the done thing, right? 00:50:55.120 |
a hackathon where the only thing you do is fork VS Code, 00:51:02.400 |
- Yeah, so, I mean, congrats on launching everything today. 00:51:09.560 |
but everyone was kinda guessing that Voice API was coming, 00:51:18.340 |
Like, any design decisions that you wanna highlight? 00:51:28.400 |
so a lot of different design decisions to be made. 00:51:37.760 |
So there've been a lot of interesting decisions there. 00:51:39.880 |
The team has also hacked together really cool projects 00:51:43.640 |
One that I really liked is we had an internal hackathon 00:51:46.000 |
for the API team, and some folks built, like, 00:51:49.280 |
a little hack that you could use Vim with Voice Mode. 00:51:54.480 |
So, like, control Vim, and you would tell the model, like, 00:52:00.840 |
So, yeah, a lot of cool stuff we've been hacking on, 00:52:02.960 |
and really excited to see what people build with it. 00:52:13.660 |
That is one of the coolest conference demos I've ever seen. 00:52:18.560 |
I really want the code for that to get put out there. 00:52:24.720 |
And it made me realize that the Realtime API, 00:52:27.180 |
this WebSocket API, it means that building a website 00:52:32.640 |
It's like, it's not difficult to build, spin up a web app 00:52:41.560 |
There are all of these projects I thought I'd never get to, 00:52:46.320 |
I can have a talk to your data, talk to your database 00:53:04.520 |
I was actually just wowed, and I had a similar moment, 00:53:10.240 |
I also thought Romain's drone demo was super cool. 00:53:14.800 |
- Yeah, I actually saw that live this morning, 00:53:19.400 |
- Knowing Romain, he probably spent the last two days 00:53:28.240 |
about what the different levels of extraction are 00:53:32.040 |
It's something that most developers have zero experience with. 00:53:47.040 |
you can connect directly to the OpenAI WebSocket 00:54:03.360 |
- Yeah, we don't recommend that for production. 00:54:09.960 |
So I'm gonna have to go home and build myself 00:54:11.480 |
a little WebSocket proxy just to hide my API key. 00:54:18.560 |
so I don't have to build the 1,000th WebSocket proxy 00:54:23.600 |
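For the curious, that proxy really is small; a hedged sketch of a relay that keeps the API key server-side (endpoint and headers per the public docs):

```python
# Tiny WebSocket relay: browsers connect here without credentials, and we
# forward frames to OpenAI with the API key attached server-side.
import asyncio, os
import websockets

UPSTREAM = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"

async def relay(client_ws):
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",
    }
    async with websockets.connect(UPSTREAM, extra_headers=headers) as upstream:
        async def pump(src, dst):
            async for message in src:
                await dst.send(message)
        # Forward both directions until either side disconnects.
        done, pending = await asyncio.wait(
            [asyncio.create_task(pump(client_ws, upstream)),
             asyncio.create_task(pump(upstream, client_ws))],
            return_when=asyncio.FIRST_COMPLETED,
        )
        for task in pending:
            task.cancel()

async def main():
    async with websockets.serve(relay, "localhost", 8765):
        await asyncio.Future()  # serve forever

asyncio.run(main())
```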
We've also partnered with some partner solutions. 00:54:42.760 |
and they can trust that you don't get it, right? 00:54:45.920 |
I mean, I've been building a lot of bring-your-own-key apps 00:54:50.640 |
I store the key in local storage in their browser, 00:54:58.800 |
another piece of JavaScript that steals the key from me? 00:55:01.720 |
this actually comes with the crypto background. 00:55:16.680 |
I think there's some really interesting question 00:55:23.160 |
and it's hard for a small team to do everything. 00:55:28.120 |
about the need for things like sign-in with OpenAI. 00:55:34.120 |
and I get back a token that lets me spend up to $4 00:55:40.120 |
Then I could ship all of my stupid little experiments, 00:55:42.680 |
which currently require people to copy and paste 00:55:49.520 |
Something we're thinking about, and yeah, stay here. 00:55:53.920 |
Right now, I think the only player in town is OpenRouter. 00:56:14.960 |
What's the most underrated release from today? 00:56:22.320 |
For the past two months, whenever I talk to founders, 00:56:25.000 |
they tell me this is the thing they need most. 00:56:27.000 |
A lot of people are doing OCR on very bespoke formats, 00:56:30.320 |
like government documents, and Vision Fine Tuning 00:56:42.800 |
You only really need 100 images to get going. 00:56:46.560 |
I didn't think GPT-4 Vision could do bounding boxes at all. 00:56:50.400 |
- Yeah, it's actually not that amazing at it. 00:56:55.680 |
you can make it really good for your use case. 00:57:02.800 |
- But being able to fine tune a model for that. 00:57:04.120 |
The first thing I'm gonna do with fine tuning for images 00:57:06.280 |
is I've got five chickens, and I'm gonna fine tune a model 00:57:11.800 |
which is hard, 'cause three of them are gray. 00:57:18.000 |
- Yeah, it's, I've managed to do it with prompting, 00:57:21.400 |
just like I gave Claude pictures of all of the chickens, 00:57:24.440 |
and then said, "Okay, which chicken is this?" 00:57:35.240 |
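A sketch of what one training example for vision fine-tuning looks like, following the documented JSONL chat format; the chicken names are, of course, made up:

```python
# One vision fine-tuning example: an image plus the label you want the model
# to learn. Roughly 100 such lines is enough to get going.
import json

example = {
    "messages": [
        {"role": "user", "content": [
            {"type": "text", "text": "Which chicken is this?"},
            {"type": "image_url",
             "image_url": {"url": "https://example.com/chicken-01.jpg"}},
        ]},
        {"role": "assistant", "content": "That's Gertrude, one of the gray ones."},
    ]
}

with open("chickens.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")
```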
- I'm also really jazzed about the evals product. 00:57:37.800 |
It's kind of like a sub-launch of the distillation thing, 00:57:40.600 |
but people have been struggling to make evals, 00:57:44.300 |
with how easy it is to make an eval in our product, 00:57:50.000 |
I think that's what's holding a lot of people back 00:57:53.680 |
'cause they just have a hard time figuring out 00:57:57.160 |
So we've been working on making it easier to do that. 00:57:59.760 |
- Does the eval product include structured output testing, 00:58:09.880 |
- No, I mean, we have guaranteed structured output anyway. 00:58:15.760 |
- Well, not the schema, but like the performance. 00:58:27.560 |
I'll have to check that for you, but I think so. 00:58:35.200 |
which is multi-turn function calling benchmarks. 00:58:38.480 |
- We're having the guy on the podcast as well, sorry? 00:58:45.440 |
'cause we're actually having them next on the podcast. 00:58:57.760 |
We should probably cut this, but we wanna make it better. 00:59:03.840 |
What, like, how do you think about the evolution 00:59:09.920 |
I think to me, that's like the most important thing. 00:59:11.480 |
So even with the OpenAI levels, like chatbots, 00:59:15.040 |
I can understand what the API design looks like. 00:59:20.280 |
even though like chain of thought kind of changes things. 00:59:26.080 |
it's like, how do you think about how you design the API, 00:59:39.080 |
So a really good example of this is real-time. 00:59:41.640 |
We're actually going to be shipping audio capabilities 00:59:48.120 |
So you supply in audio, and you can get back raw audio, 00:59:55.360 |
we realized ourselves that like it's pretty hard to do 00:59:59.760 |
And so that led us to building this WebSocket API. 01:00:02.840 |
So we really learned a lot from our own tools, 01:00:04.520 |
and we think the chat completions thing is nice, 01:00:08.880 |
but you're really gonna want a real-time API. 01:00:19.200 |
something like closer to more client-side libraries. 01:00:29.800 |
if I've got a half hour long audio recording, 01:00:32.680 |
at the moment, the only way I can feed that in 01:00:34.840 |
is if I call the WebSocket API and slice it up 01:00:46.960 |
Is that something-- - That's what we're gonna do. 01:00:51.760 |
but it's rolling out, I think, in the coming weeks. 01:00:57.600 |
we're just putting finishing touches on stuff. 01:00:58.440 |
- Do you have a feel for the length limit on that? 01:01:14.400 |
- Totally, yeah, we're really jazzed about it. 01:01:15.680 |
We wanna basically give the lowest-level capabilities we have, 01:01:28.920 |
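Once that rollout lands, the flow Michelle describes should look roughly like this (the model name and field shapes follow what OpenAI has since documented, so treat it as a sketch):

```python
# Audio in, audio out, through plain chat completions rather than a WebSocket.
import base64
from openai import OpenAI

client = OpenAI()

with open("recording.wav", "rb") as f:
    audio_b64 = base64.b64encode(f.read()).decode()

response = client.chat.completions.create(
    model="gpt-4o-audio-preview",
    modalities=["text", "audio"],
    audio={"voice": "alloy", "format": "wav"},
    messages=[{"role": "user", "content": [
        {"type": "text", "text": "Summarize this recording."},
        {"type": "input_audio",
         "input_audio": {"data": audio_b64, "format": "wav"}},
    ]}],
)
print(response.choices[0].message.audio.transcript)
```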
is I do a lot of Unix utilities, little Unix things. 01:01:32.480 |
I want to be able to pipe the output of a command 01:01:37.080 |
to the WebSocket API and then speaks it out loud. 01:01:40.120 |
So I can do streaming speech of the output of things. 01:01:44.640 |
Like, I think you've given me everything I need for that. 01:01:49.760 |
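Simon's pipe-to-speech utility is very buildable with the pieces above; a hedged sketch (event names per the Realtime docs, audio saved as raw PCM for a player like ffplay):

```python
# Read text from stdin and have the Realtime API speak it, writing raw
# 24 kHz 16-bit PCM to out.pcm (play with: ffplay -f s16le -ar 24000 out.pcm).
import asyncio, base64, json, os, sys
import websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"

async def speak(text: str):
    headers = {
        "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
        "OpenAI-Beta": "realtime=v1",
    }
    async with websockets.connect(URL, extra_headers=headers) as ws:
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {
                "modalities": ["audio", "text"],
                "instructions": f"Read this aloud verbatim: {text}",
            },
        }))
        with open("out.pcm", "wb") as out:
            async for raw in ws:
                event = json.loads(raw)
                if event["type"] == "response.audio.delta":
                    out.write(base64.b64decode(event["delta"]))
                elif event["type"] == "response.done":
                    break

asyncio.run(speak(sys.stdin.read()))
```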
- I heard there are multiple competing solutions, 01:01:55.760 |
and you guys eval it before you pick WebSockets, 01:02:01.600 |
Can you give your thoughts on the live updating paradigms 01:02:12.520 |
- Well, I think WebSockets are just a natural fit 01:02:24.760 |
- So it wasn't even really that controversial at all? 01:02:29.600 |
I mean, we definitely explored the space a little bit, 01:02:31.380 |
but I think we came to WebSockets pretty quickly, yeah. 01:02:39.000 |
- Yeah, not yet, but, you know, possible in the future. 01:02:43.600 |
- I actually was hoping for the ChatGPT desktop app 01:02:54.160 |
to send images over the WebSocket API, we get video. 01:03:00.040 |
Yeah, because, yeah, I mean, sending a whole video frame 01:03:04.560 |
of like a 1080p screen, maybe it might be too much. 01:03:08.460 |
What's the limitations on a WebSocket chunk going over? 01:03:15.440 |
- Like Google Gemini, you can do an hour's worth of video 01:03:18.140 |
in their context window, just by slicing it up 01:03:20.600 |
into one frame at 10 frames a second, and it does work. 01:03:24.680 |
So I don't know, but then that's the weird thing 01:03:28.820 |
about Gemini is it's so good at you just giving it a flood 01:03:31.480 |
of individual frames, it'll be interesting to see 01:03:47.480 |
- I want you to do all of the accounting for me. 01:03:53.620 |
and I want them to call your APIs with their user ID 01:04:00.520 |
cut them off at a dollar, I can check how much they spent, 01:04:03.440 |
all of that stuff, 'cause I'm having to build that 01:04:08.320 |
I want you to do the token accounting for me. 01:04:13.760 |
- Well, how does that contrast with your actual priorities? 01:04:16.480 |
Like, I feel like you have a bunch of priorities. 01:04:19.160 |
They showed some on stage with multimodality and all that. 01:04:28.880 |
Things that are big blockers for user adoption 01:04:46.600 |
- I was hoping for an O1-native thing in Assistants. 01:04:52.120 |
- 'Cause I thought they would go well together. 01:04:53.680 |
- We're still kind of iterating on the formats. 01:04:56.220 |
I think there are some problems with the Assistants API, 01:05:03.240 |
but just, you know, it wasn't quite ready yet. 01:05:07.980 |
People really like hosted tools, and especially RAG. 01:05:13.820 |
is just how many API requests you need to get going 01:05:21.360 |
you gotta create a thread, you gotta do all this stuff. 01:05:24.600 |
So yeah, it's something we're thinking about. 01:05:27.000 |
- The only thing I've used it for so far is Code Interpreter. 01:05:32.760 |
- Yes, we wanna fix that and make it easier to use. 01:05:41.480 |
- Yeah, do you wanna bring your own Code Interpreter 01:05:44.980 |
- I wanna use that 'cause Code Interpreter's a hard problem. 01:05:50.580 |
Code Interpreter as a service things out there. 01:06:04.620 |
- You can run, you can compile C code in Code Interpreter. 01:06:13.180 |
- I've had it write me custom SQLite extensions in C 01:06:15.780 |
and compile them and run them inside of Python 01:06:21.420 |
- I mean, yeah, there's others, E2B is one of them. 01:06:34.780 |
We left the episode as what will voice mode look like? 01:06:48.420 |
and also a familiar recent voice on the Latent Space pod, 01:06:55.180 |
Alistair Pullen of Cosine made a huge impression 01:07:00.020 |
Special shout out to listeners like Jesse from Morph Labs 01:07:18.940 |
because he refused to disclose his reasoning traces 01:07:33.620 |
and still perform lower than Cosine's Genie model. 01:07:39.260 |
to break down what has happened since his episode aired. 01:07:53.140 |
- Yeah, so off the back of the work that we've done 01:07:56.260 |
that we spoke about last time we saw each other, 01:08:01.980 |
that the work we've been doing around fine-tuning 01:08:07.020 |
but today I spoke about some of the techniques 01:08:13.900 |
and the techniques that we built to build Genie. 01:08:22.940 |
how you generate a data set to show the model 01:08:26.900 |
And that was mainly what I spoke about today. 01:08:31.620 |
I was super excited at the opportunity, obviously. 01:08:34.460 |
Like, it's not every day that you get to come and do this, 01:08:37.620 |
So yeah, they reached out and they were like, 01:08:40.620 |
You can speak about basically anything you want 01:08:46.100 |
how you build a model that does this software engineering. 01:08:49.340 |
- Yeah, and the trick here is when we talked, 01:09:10.940 |
your chain of thought reasoning traces as IP. 01:09:18.420 |
I feel slightly vindicated by that now, not gonna lie. 01:09:35.540 |
to generate these human-like reasoning traces was, 01:09:44.140 |
In our case, we wanted it to think like a software engineer. 01:10:03.060 |
some of the reasoning traces in our Genie model 01:10:09.060 |
And we've already started seeing improvements 01:10:14.820 |
in terms of, like, the whole, like, withholding them, 01:10:18.380 |
I still think that that was the right decision to do 01:10:23.300 |
that everyone else has decided to not share those things. 01:10:26.220 |
It's, it is exactly, it shows exactly how we do what we do. 01:10:32.420 |
- As a founder, so, they also feature Cognition on stage, 01:10:38.980 |
How does that make you feel that, like, you know, 01:10:41.580 |
they're like, "Hey, O1 is so much better, makes us better." 01:10:48.260 |
it kind of, like, raises the floor for everybody. 01:10:50.260 |
Like, how should people, especially new founders, 01:11:00.300 |
- Yeah, I, speaking for us, I mean, obviously, like, 01:11:06.260 |
because at that point, the process of reasoning 01:11:19.940 |
I thought immediately, "Well, I can improve the quality 01:11:24.460 |
So, like, my signal-to-noise ratio gets better. 01:11:26.940 |
And then, not immediately, but down the line, 01:11:29.340 |
I'm going to be able to train those traces into O1 itself. 01:11:32.540 |
So, I'm going to get even more performance that way as well. 01:11:35.540 |
So, it's, for us, a really nice position to be in, 01:11:39.580 |
both on the prompted side and the fine-tuned side. 01:11:46.020 |
we are, I think, fairly clearly in a position now 01:11:52.820 |
This process continues, like, even going from, 01:11:55.620 |
you know, when we first started going from 3.5 to 4, 01:12:04.980 |
we've seen the performance get better every time. 01:12:09.620 |
the crude advice I'd give to any startup founder 01:12:15.100 |
you know, like, sea-level rise every time, essentially. 01:12:19.660 |
that you were able to take 4o and fine-tune it 01:12:22.860 |
higher than O1 currently scores on SWE-Bench Verified? 01:12:29.020 |
to be honest with you, you realized that before I did. 01:12:33.620 |
- Yes, absolutely, that's a value-add investor right there. 01:12:38.180 |
that in of itself is really vindicating to see 01:12:40.540 |
because I think we have heard from some people, 01:12:48.540 |
"then what's the point of doing your reasoning?" 01:12:50.260 |
But it shows how much more signal is in, like, 01:12:54.460 |
And again, it's the very sort of obvious thing. 01:12:59.020 |
If you take something that's made to be general 01:13:01.940 |
of course it's gonna be better at that thing, right? 01:13:11.100 |
and I'm sure that that delta will continue to grow 01:13:14.900 |
and once we've done more work on our dataset using O1, 01:13:26.540 |
that OpenAI really doesn't want you to figure out 01:13:29.820 |
is can you use an open-source model and beat O1? 01:13:35.060 |
- Because you basically have shown proof of concept 01:13:40.740 |
and their whole O1 marketing is, "Don't bother trying." 01:14:01.380 |
instead of five minutes, and then suddenly it works. 01:14:05.740 |
I mean, one of the things that we just want to do 01:14:08.420 |
is do something like fine-tune 405B on the same dataset. 01:14:17.380 |
with the waitlist, shipping product, you know, dev day, 01:14:20.580 |
like, you know, onboarding customers from our waitlist. 01:14:22.860 |
All these different things have gotten in the way, 01:14:25.020 |
but it is definitely something out of more curiosity 01:14:34.660 |
but they might be able to deploy an open-source model, 01:14:41.340 |
I'd be very keen to see what the results of it. 01:14:54.580 |
I, yeah, I'm interested to see if there's an open-source O1, basically. 01:15:02.820 |
once we've wrapped up what we're doing in San Francisco, 01:15:10.660 |
who might be able to allow us to do it very easily. 01:15:17.820 |
Yeah, that might happen sooner rather than later. 01:15:26.580 |
when you're, like, dealing with a lot of code bases, 01:15:31.580 |
related to, like, more, like, UI-related development? 01:15:34.940 |
Yeah, I mean, we were, like, we were talking. 01:15:36.420 |
It's funny, like, my co-founder, Sam, who you've met, 01:15:54.340 |
and links to, like, graphical resources and stuff, 01:16:09.540 |
Particularly, if you think about one of the things, 01:16:11.220 |
not to sidetrack, but one of the things we've noticed is, 01:16:20.500 |
from actually shipping this product to users is 01:16:25.940 |
So, for example, when people are doing, like, 01:16:38.540 |
the fine-tuning for vision to be able to help eval, 01:16:47.860 |
here's the code that actually, like, represents that UI, 01:16:50.860 |
is also gonna be super useful as well, I think. 01:16:59.180 |
I think we'll probably end up using it in places. 01:17:07.580 |
they're gonna be building a lot of the things 01:17:16.340 |
John, the head of fine-tuning, extensively about this. 01:17:35.260 |
we don't have to build it and maintain it afterwards. 01:17:50.460 |
- Did you not, so there's a very active ecosystem 01:17:55.940 |
Did you not evaluate those before building your own? 01:18:09.900 |
it was never a big enough pain point to be like, 01:18:15.820 |
something that you can hack a script together 01:18:25.780 |
And whenever you need a new thing, you just tack it on. 01:18:43.100 |
So it's great that OpenAI are gonna build them in, 01:18:47.220 |
'cause it's really nice to have them there, for sure. 01:18:51.940 |
I'd ever consider really paying for externally 01:18:58.340 |
- Maybe one day, that'd be sick, wouldn't it? 01:19:03.220 |
we've been asking this question to everybody. 01:19:05.820 |
- You're the first person to not mention voice mode. 01:19:07.140 |
- Oh, well, it's currently so distant from what we do. 01:19:11.540 |
But I definitely think, like, this whole talk 01:19:14.900 |
of we want it to be a full-on AI software engineering 01:19:16.980 |
colleague, like, there is definitely a vector 01:19:26.820 |
in terms of how we wanna build something down the line. 01:19:32.580 |
like, that would be nice to have when we have the time, yeah. 01:19:45.220 |
- And before we sat down, you talked a little bit 01:19:51.980 |
- So, we have been rolling people off the wait list 01:20:03.180 |
like, we had to be very opinionated about the data mix 01:20:11.540 |
JavaScript, JavaScript, JavaScript, Python, right? 01:20:14.540 |
There's a lot of JavaScript in its various forms in there. 01:20:20.380 |
to the very early alpha users we rolled it out to, 01:20:32.180 |
And they weren't getting the levels of performance 01:20:35.380 |
that they saw when they tried it with a Python code base. 01:20:45.260 |
with the actual, like, objective data mix that we saw. 01:21:00.900 |
And we've been seeing improvements coming from that. 01:21:07.700 |
and letting people use it and giving you feedback 01:21:16.180 |
over time as we roll it out to more and more people, 01:21:18.140 |
and we are trying to do that as fast as possible, 01:21:20.180 |
we're still a team of five for the time being, 01:21:46.460 |
It will go through all of your code base history, 01:21:50.060 |
and then you'll have an incrementally fine-tuned Genie 01:21:53.220 |
And that's what enterprises really love the idea of. 01:21:59.420 |
Thank you so much. - Thank you so much, guys. 01:22:01.700 |
- Lastly, this year's Dev Day ended with an extended Q&A 01:22:08.140 |
We think both the questions asked and answers given 01:22:13.900 |
so we are posting what we could snag of the audio here 01:22:18.460 |
credited in the show notes, for you to pick through. 01:22:21.820 |
If the poorer quality audio here is a problem, 01:22:24.620 |
we recommend waiting for approximately one to two months 01:22:28.100 |
until the final video is released on YouTube. 01:22:31.420 |
In the meantime, we particularly recommend Sam's answers 01:22:56.300 |
I'm Kevin Weil, Chief Product Officer at OpenAI. 01:23:01.340 |
the amazing research that our research teams do 01:23:08.140 |
and the APIs that you all build on every day. 01:23:10.340 |
I thought we'd start with some audience engagement here. 01:23:13.940 |
So on the count of three, I'm gonna count to three, 01:23:19.900 |
of all the things that you saw launched here today, 01:23:22.580 |
what's the first thing you're gonna integrate? 01:23:24.940 |
It's the thing you're most excited to build on, all right? 01:23:37.140 |
I'm super excited about our distillation products. 01:23:39.900 |
I think that's gonna be really, really interesting. 01:23:48.320 |
with advanced voice mode with the Realtime API 01:23:59.060 |
Let's see if I can't make a career-limiting move. 01:24:11.380 |
- You know, we used to, every time we finished a system, 01:24:15.500 |
we would say, like, in what way is this not an AGI? 01:24:31.580 |
So we're trying to, like, stop talking about AGI 01:24:35.220 |
as this general thing, and we have this levels framework, 01:24:38.580 |
because the word AGI has become so overloaded. 01:24:41.820 |
So, like, real quickly, we use one for chatbots, 01:24:46.540 |
two for reasoners, three for agents, four for innovators, five for organizations, like, roughly. 01:24:55.100 |
And it, you know, can do really quite impressive 01:25:02.020 |
It doesn't feel AGI-like in a few important ways, 01:25:14.860 |
we will be able to do in the not-distant future, 01:25:21.420 |
Still probably not something that most of you 01:25:23.540 |
would call an AGI, though, maybe some of you would, 01:25:31.340 |
And then, the leap, and I think we did that pretty quickly, 01:25:38.460 |
really increase the rate of new scientific discovery, 01:25:50.380 |
Like, I think all of this now is gonna happen 01:25:58.900 |
and you're like, eh, I mean, if you go look at, 01:26:01.660 |
like, if you go from 4o to O1 on a hard problem 01:26:01.660 |
And I think the next year will be very steep progress, 01:26:14.460 |
harder than that, hard to see a lot of certainty, 01:26:18.780 |
and at this point, the definitions really matter. 01:26:20.740 |
And the fact that the definitions matter this much 01:26:23.600 |
somehow means we're, like, getting pretty close. 01:26:37.980 |
I don't think that's exactly how we think about it anymore, 01:26:46.060 |
I think we're, like, you know, in this, like, 01:26:58.820 |
it's just gonna be this, like, smooth exponential, 01:27:05.420 |
when that milestone was hit, and will just realize 01:27:11.260 |
was, like, this very clear milestone, you know, 01:27:18.100 |
no one cared, but I think the right framework 01:27:29.940 |
that is, like, materially better at all of open AI 01:27:39.140 |
It's probably still wrong to think about it that way. 01:27:41.100 |
It probably still is this smooth exponential curve, 01:27:52.660 |
Will research still drive the core of our advancements 01:28:04.180 |
when the right thing to do was just to scale up compute, 01:28:08.340 |
and we had a spirit of, like, we'll do whatever works. 01:28:11.060 |
You know, like, we want to, we have this mission, 01:28:16.380 |
If the answer is, like, rack up GPUs, we'll do that. 01:28:29.260 |
over a long period of time that came together 01:28:32.700 |
We have many more giant research breakthroughs to come, 01:28:45.660 |
about research, and we understand how to, I think, 01:28:57.340 |
Like, when people copy OpenAI, I'm like, great, 01:29:03.380 |
to, like, really do research in the true sense of it, 01:29:12.300 |
and the one after that, and the one after that. 01:29:15.380 |
And I think the thing that is special about us as an org, 01:29:23.180 |
is that we know how to run that kind of a culture 01:29:26.020 |
that can go, that can go push back the frontier. 01:29:32.900 |
And that's, you know, I think we're gonna have to do that 01:29:35.220 |
a few more times, and then we can take you on. 01:29:37.580 |
- Yeah, I'll say, like, the litmus test for me, 01:29:48.820 |
is fundamentally different than any other place 01:29:53.460 |
You know, normally, you have some sense of your tech stack. 01:29:57.780 |
You have some sense of what you have to work with, 01:30:03.380 |
And then you're trying to build the best product, right? 01:30:08.900 |
and how you can help solve those problems for them. 01:30:13.540 |
But also, the state of, like, what computers can do, 01:30:23.140 |
And suddenly, computers have a new capability 01:30:25.180 |
that they've never had in the history of the world, 01:30:27.580 |
and we're trying to figure out how to build a great product 01:30:31.180 |
and expose that for developers and our APIs and so on. 01:30:34.260 |
And, you know, you can't totally tell what's coming. 01:30:36.780 |
It's coming through the mist a little bit at you, 01:30:45.300 |
- Is that the thing that has most surprised you? 01:30:51.580 |
even internally, we don't always have a sense. 01:30:54.540 |
You have, like, okay, I think this capability's coming, 01:30:56.700 |
but is it going to be, you know, 90% accurate, 01:31:10.660 |
and figuring out how you put a roadmap together 01:31:14.500 |
- Yeah, the degree to which we have to just, like, 01:31:19.020 |
what we go work on next and what products we build 01:31:21.980 |
and everything else is, I think, hard to get across. 01:31:25.380 |
Like, we have guesses about where things are gonna go. 01:31:41.340 |
pick what the science allows, that's surprising. 01:31:51.860 |
one of the things we really want is a notification 01:31:54.780 |
60 days in advance when you're gonna launch something. 01:32:04.700 |
these are a bunch of questions from the audience, 01:32:15.580 |
But next thing is, so many in the alignment community 01:32:28.620 |
I think it's true we have a different take on alignment 01:32:38.460 |
But we really do care a lot about building safe systems. 01:32:46.180 |
that has been informed by our experience so far. 01:32:51.580 |
which is you don't get to pick where the science goes. 01:32:54.700 |
We want to figure out how to make capable models 01:33:04.300 |
we didn't think the whole strawberry or the O1 paradigm 01:33:09.100 |
And that brought a whole new set of safety challenges, 01:33:14.460 |
And rather than kind of plan to make theoretical, 01:33:34.620 |
And O1 is obviously our most capable model ever, 01:33:39.260 |
but it's also our most aligned model ever, by a lot. 01:33:47.300 |
better reasoning, whatever you want to call it, 01:33:51.700 |
the things we can do to build really safe systems 01:33:59.060 |
So, we have to build models that are generally accepted 01:34:04.060 |
as safe and robust to be able to put them in the world. 01:34:13.820 |
and what we thought the problems that we needed to solve 01:34:19.220 |
like the problems that actually are in front of us 01:34:26.780 |
if you asked me for the techniques that would have worked 01:34:29.180 |
for us to be able to now deploy our current systems 01:34:35.660 |
they would not have been the ones that turned out to work. 01:34:43.780 |
which I think has been one of our most important 01:34:45.860 |
safety stances ever, and sort of confronting reality 01:34:49.580 |
as it's in front of us, we've made a lot of progress, 01:34:54.620 |
but we also keep finding new techniques to solve them. 01:35:10.260 |
It's a little bit less clear, kind of, what to do there, 01:35:13.780 |
and sometimes you end up backtracking a lot. 01:35:16.220 |
But I also don't think it's fair 01:35:21.260 |
to say we're only gonna work on the thing in front of us. 01:35:23.900 |
We do have to think about where this is going, 01:35:26.300 |
And I think if we keep approaching the problem 01:35:30.220 |
from both ends like that, with most of our thrust on the, 01:35:32.980 |
like, okay, here's the next thing, we're gonna deploy this, 01:35:35.860 |
what needs to happen to get there, but also some real thinking about where it's all headed. 01:35:44.180 |
- I'll say also, it's one of the places where I really, 01:35:46.540 |
I really like our philosophy of iterative deployment. 01:35:54.140 |
Ev said something that stuck with me, which is, 01:35:57.620 |
no matter how many smart people you have inside your walls, 01:36:00.700 |
there are way more smart people outside your walls. 01:36:07.500 |
it'd be one thing if we just said we're gonna try 01:36:09.700 |
and figure out everything that could possibly go wrong ourselves, 01:36:13.700 |
with all the testing we can do and the red teamers that we can hire, and so on. 01:36:18.540 |
But also, launching iteratively and launching carefully, 01:36:22.000 |
and learning from the ways that folks like you all use it, 01:36:26.780 |
I think is a big way that we get these things right. 01:36:29.020 |
- I also think that as we head into this world of agents, 01:36:36.140 |
that is gonna become really, really important. 01:36:42.460 |
You need the pressure testing from the whole outside world. 01:36:48.940 |
- Yeah, so we'll go, actually, we'll go off of that. 01:36:52.880 |
Maybe talk to us a bit more about how you see agents 01:36:55.560 |
fitting into OpenAI's long-term plans. 01:37:02.640 |
- I think the exciting thing is this set of models finally makes agents possible. 01:37:13.480 |
Because you finally have the ability to reason, 01:37:15.600 |
to take hard problems, break them into simpler problems, 01:37:18.280 |
and act on them. I mean, I think 2025 is gonna be the year 01:37:28.640 |
agentic systems really arrive, and they will, I think, have an important place in how we work. 01:37:33.120 |
When you can ask a model, when you can ask ChatGPT 01:37:39.360 |
or some agent or something, and it's not just, like, 01:37:45.320 |
an instant answer, or O1 gives you a nice piece of code back, or whatever. 01:37:48.320 |
But you can really give something a multi-turn interaction 01:37:52.440 |
with environments or other people or whatever, 01:37:54.720 |
like think for the equivalent of multiple days 01:37:56.920 |
of effort from a really smart, really capable human, 01:38:06.320 |
we're all like, oh yeah, we can start the next thing, 01:38:08.160 |
this is coming, this is gonna be another thing, 01:38:12.040 |
You know, it's like the next model in evolution. 01:38:20.880 |
people get used to any new technology quickly, 01:38:20.880 |
and it just gets absorbed into the way the world works in a short period of time. 01:38:28.920 |
- Yeah, it's amazing, somebody was talking about 01:38:32.040 |
getting used to new capabilities in AI models 01:38:34.040 |
and how quickly, actually, I think it was about Waymo, 01:38:37.640 |
but they were talking about how in the first 10 seconds of the ride, 01:38:37.640 |
they're like, is this thing safe, there's a bug, let's watch out, 01:38:43.640 |
then a few minutes in, oh, this is really cool, and then 20 minutes in, 01:38:47.840 |
you're completely used to it. Our internal firmware updates for this new stuff very quickly. 01:38:54.640 |
People will ask an agent to do something that would have taken them a month, and it'll finish in an hour, and it'll be great, 01:39:10.160 |
and then they'll have like 10 of those at the same time, 01:39:12.720 |
and then they'll have like 1,000 of those at the same time, 01:39:15.600 |
and by 2030 or whatever, we'll look back and be like, 01:39:19.040 |
yeah, this is just, like, what a human is supposed to be able to do now, 01:39:19.040 |
like, I just ask a computer to do it, 01:39:27.160 |
- Yeah, it's also, it's one of the things that makes 01:39:39.480 |
having an amazing development platform great, too, 01:39:45.600 |
because we'll build some agentic things ourselves, of course, 01:39:47.680 |
but, like, we've already got developers 01:39:50.640 |
just pushing the boundaries of what's possible today. 01:39:53.720 |
You've got groups like Cognition doing amazing things 01:39:59.280 |
you've got Speak doing cool things with language translation, 01:40:13.960 |
and part of the fun is just getting to, like, watch the unbelievable speed 01:40:16.720 |
and creativity of people that are building these experiences, 01:40:19.600 |
like developers, who are very near and dear to our heart. 01:40:23.520 |
The developer platform is kind of like the first thing we launched, 01:40:26.640 |
and many of us came up building on platforms ourselves. 01:40:29.800 |
So much of the capability of these models 01:40:40.360 |
will show up in our first-party products, but we know that we'll only ever be 01:40:44.040 |
like a small, narrow slice of the apps or agents that people will build. 01:40:58.120 |
I'm gonna keep going on the agent front here. 01:41:11.520 |
Like, if you are really going to give an agent 01:41:14.920 |
the ability to start clicking around your computer, 01:41:18.220 |
which you will, you are going to have a very high bar for how much you trust it. 01:41:28.360 |
So, technically speaking, I think that, you know, 01:41:30.840 |
we're getting like pretty close to the capability side, 01:41:34.000 |
but this sort of agent safety and trust framework is gonna take the longest. 01:41:43.400 |
that's almost the opposite of one of the questions from the audience, which is: 01:41:45.800 |
do you think safety could act as a false positive 01:41:48.000 |
and actually limit public access to critical tools? 01:41:53.560 |
- The honest answer is yes, that will happen sometimes. 01:42:16.280 |
And on the other side, without that caution, there would have been things that would have gone really wrong. 01:42:34.380 |
like, you know, I don't think people are complaining, 01:42:36.020 |
like, oh, voice mode, like, it won't say this offensive thing 01:42:45.460 |
If you are trying to get O1 to say something offensive, 01:42:48.700 |
mostly it should follow the instructions of its user. 01:42:56.540 |
We've developed a sense of where to be careful when we put a new technology into the world. 01:43:03.780 |
We try to understand where the real harms are 01:43:05.620 |
versus sort of, like, kind of more theoretical ones. 01:43:08.760 |
And that's, like, part of our approach to safety. 01:43:22.420 |
Sometimes we'll be too conservative, and, like, sometimes we won't be conservative enough. 01:43:24.900 |
But if we're right that these systems are going to get as powerful as we think, erring on the side of care seems right. 01:43:52.420 |
I think one of the challenges, and we face this too, 01:43:54.580 |
'cause we're also building products on top of our own models, 01:44:11.380 |
is that if you build for today's model, it'll work well today, but it's gonna feel old tomorrow. 01:44:21.580 |
You know, where maybe the early adopters will go for it 01:44:25.420 |
but that just means that when the next model comes out, that work gets swept away. 01:44:35.180 |
But figuring out that boundary is really hard. 01:45:02.740 |
You still have to do all the hard work of building a great company, 01:45:13.820 |
and I see this as, like, a very common thing, 01:45:16.860 |
which is, like, I can do this incredible thing with the model, 01:45:27.740 |
but you still have to, like, build a good business around it, 01:45:33.420 |
even in the unbelievable excitement and updraft of AI. 01:45:47.820 |
How do you ensure ethical use of such a powerful tool? 01:45:54.020 |
- Yeah, you know, voice mode was a really interesting one 01:45:59.860 |
It was, like, the first time that I felt like I was talking to a person, 01:46:04.940 |
in that, when I was playing with the first beta of it, 01:46:16.540 |
I, like, couldn't not, kind of, treat it like a person. 01:46:41.340 |
this is an example of, like, a more general thing 01:46:44.540 |
which is, as these systems become more and more capable, 01:46:48.980 |
and as we try to make them as natural as possible 01:46:52.780 |
they're gonna, like, hit parts of our neural circuitry 01:46:58.380 |
that have, like, evolved to deal with other people. 01:47:00.940 |
And, you know, there's, like, a bunch of clear lines we won't cross, and then 01:47:12.500 |
I think vaguely socially manipulative stuff we could do but shouldn't. 01:47:25.500 |
The voice is so natural, and it, like, at least in me, triggers something. 01:47:56.820 |
There are three things that we really wanna get in for-- 01:48:26.700 |
- When is O1 going to get the features that every other model that we've launched has? 01:48:29.140 |
I'm really excited to see things like system prompts 01:48:48.580 |
and a whole bunch more of the things you've asked for. 01:48:51.020 |
The model is gonna get so much better so fast. 01:49:21.340 |
Expect more of a year of improvement than from GPT-4o to O1. 01:49:37.380 |
- I think Google's notebook thing is super cool. 01:49:49.740 |
and I was like, looking at examples on Twitter 01:50:18.420 |
but they also nailed the podcast-style voices. 01:50:28.020 |
Did you guys see, somebody on Twitter was saying, 01:50:31.700 |
like, the cool thing to do is take your LinkedIn 01:50:37.460 |
and give it to Notebook LM, 01:50:40.060 |
and you'll have two podcasters riffing back and forth 01:50:56.260 |
It's kind of a different take on what we did with GPTs. 01:51:02.100 |
A GPT is something you build and can use over and over again. 01:51:06.260 |
A notebook is, like, more temporary, meant to be kind of stood up, used, and set aside. 01:51:12.700 |
And that different mental model makes a difference. 01:51:16.860 |
I think they did a really nice job with that. 01:51:19.100 |
All right, we're getting close to audience questions, 01:51:38.500 |
This goes back to a bit of what we were saying around trying to build at the frontier. 01:51:46.020 |
But it's a real balance, too, as we, you know, 01:51:51.300 |
we support over 200 million people every week on ChatGPT. 01:51:58.980 |
And you can't just tell them, like, deal with this bug for three months or this issue. 01:52:06.980 |
And there are some really interesting product problems. 01:52:09.020 |
I mean, you think about, I'm speaking to a group of people 01:52:17.220 |
who live and breathe this stuff, but most people have never touched these tools, and that is the vast majority of the world still. 01:52:19.680 |
You're basically giving them a text interface, 01:52:27.100 |
is this like alien intelligence that's constantly evolving 01:52:33.300 |
and you're trying to teach them all the crazy things 01:52:35.860 |
that you can actually do, and all the ways it can help 01:52:37.820 |
and integrate into your life and solve problems for you. 01:52:43.460 |
You know, like, you come in, and you're just like, 01:52:45.140 |
people type, like, hi, and it responds, you know, 01:52:49.500 |
hey, great to see you, how can I help you today? 01:52:52.300 |
And then you're like, okay, I don't know what to say, 01:52:57.020 |
and you're like, well, I didn't see the magic in that. 01:52:59.340 |
And so it's a real challenge figuring out how you 01:53:04.780 |
show people the ways that we use ChatGPT and AI tools in general, 01:53:10.300 |
and then bringing them along as the models change and gain 01:53:15.380 |
these capabilities way faster than we as humans 01:53:18.020 |
gain the capabilities, it's a really interesting 01:53:20.580 |
set of problems, and I know it's one that you all wrestle with in your own products, too. 01:53:28.400 |
Who here feels like they've spent a lot of time with O1, 01:53:31.140 |
and they would say, like, I feel definitively smarter than O1? 01:53:43.900 |
No one taking the bet of, like, being smarter than O2? 01:53:47.740 |
So, one of the challenges that we face is, like, 01:53:50.620 |
we know how to go do this thing that we think will be 01:53:58.220 |
incredibly capable across, like, a broad array of tasks, and yet we have to, like, 01:54:02.020 |
still fix bugs and solve the, hey, how are you, problem, 01:54:06.220 |
and mostly what we believe in is that if we keep pushing 01:54:09.020 |
on model intelligence, people will do incredible things 01:54:12.780 |
with that, you know, we want to build the smartest, 01:54:16.060 |
most helpful models in the world, and people then find 01:54:19.060 |
all sorts of ways to use that, and build on top of that. 01:54:32.820 |
Of course, we also have to, like, make this super usable, and I think we've gotten better 01:54:36.620 |
at balancing that, but still, as part of our culture, 01:54:40.100 |
I think, we trust that if we can keep pushing 01:54:42.780 |
on intelligence, 01:54:47.780 |
people will build just incredible things with it. 01:54:52.900 |
- Yeah, I think it's a core part of the philosophy, 01:54:58.500 |
we'll basically incorporate the frontier of intelligence into our products, 01:55:06.940 |
because it's easy to kind of stick to the thing you know, 01:55:09.500 |
the thing that works well, but you're always pushing us 01:55:12.820 |
to like, get the frontier in, even if it only kind of works, 01:55:25.060 |
You do say please and thank you to the models, 01:55:26.740 |
I'm curious, how many people say please and thank you? 01:55:33.860 |
I kind of can't not, I mean, I'd feel bad if I don't. 01:55:42.660 |
into audience questions for the last 10 or so minutes. 01:55:45.780 |
Do you plan to build models specifically made 01:55:48.020 |
for agentic use cases, things that are better-- 01:55:57.300 |
- Yeah, making models better at agentic use cases, that'll be a key priority. 01:56:09.220 |
There's some tool stuff, like 01:56:11.460 |
function calling, that we need to build that'll help, but mostly we just wanna make the best reasoning models in the world. 01:56:41.340 |
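For readers following along, here is a minimal sketch of the function-calling pattern referenced above, using the OpenAI Python SDK (v1.x). The get_weather tool, its schema, and the gpt-4o model name are illustrative assumptions, not details from the talk:

import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Illustrative tool schema: get_weather is a made-up example, not a real OpenAI tool.
tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in San Francisco?"}]
response = client.chat.completions.create(
    model="gpt-4o",  # any tool-capable model; the name here is an assumption
    messages=messages,
    tools=tools,
)

# If the model chose to call the tool, parse its JSON arguments, run your real
# function with them, then send the result back in a follow-up message.
tool_call = response.choices[0].message.tool_calls[0]
args = json.loads(tool_call.function.arguments)
print(tool_call.function.name, args)  # e.g. get_weather {'city': 'San Francisco'}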
- Yeah, I mean, we put models up for internal use as early as we can. 01:56:47.900 |
We use checkpoints and try to have people use them 01:56:51.180 |
for whatever they can, and try to build new ways 01:56:54.260 |
to explore the capability of the model internally 01:56:57.500 |
and use them for our own development or research 01:57:01.260 |
We're still always surprised by the creativity 01:57:09.900 |
every step along our way of what to push on next, 01:57:14.900 |
what we can productize, what the models are really good at 01:57:23.260 |
and that's how we like to feel our way through this. 01:57:25.560 |
We don't yet have employees that are based off of O1, 01:57:40.300 |
but we do have, like, helpers in our internal systems 01:57:47.660 |
that do a ton of work answering external questions 01:57:50.220 |
and fielding internal people's questions on Slack 01:58:04.660 |
Our security team has talked extensively about all the different ways 01:58:10.300 |
they use models to do a bunch of security things, and take what used to be 01:58:16.100 |
intractable, given the number of humans needed to even look at everything incoming, 01:58:19.140 |
and lean on models for, you know, separating signal from noise. 01:58:26.260 |
So I think internally there are tons of examples 01:58:31.420 |
you all probably will not be surprised by this, 01:58:35.660 |
The extent to which it's not just using a model 01:58:38.300 |
in a place, it's actually about using chains of models 01:58:45.980 |
to get one end-to-end process that is very good 01:58:48.260 |
at the thing you're doing, even if the individual models have weaknesses. 01:59:09.180 |
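A minimal sketch of the chains-of-models idea described here: a cheap model filters out noise, and a stronger reasoning model handles only what survives. The model names, prompts, and sample events are illustrative assumptions, not OpenAI's actual internal pipeline:

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def is_signal(event: str) -> bool:
    # Stage 1: a small, cheap model filters out obvious noise.
    r = client.chat.completions.create(
        model="gpt-4o-mini",  # model choice is an assumption
        messages=[
            {"role": "system",
             "content": "Answer only YES or NO: does this security event need human review?"},
            {"role": "user", "content": event},
        ],
    )
    return r.choices[0].message.content.strip().upper().startswith("YES")

def analyze(event: str) -> str:
    # Stage 2: a stronger reasoning model analyzes only what survives stage 1.
    r = client.chat.completions.create(
        model="o1-preview",  # o1-style models take a plain user message
        messages=[{"role": "user",
                   "content": f"Analyze this security event and suggest a response:\n{event}"}],
    )
    return r.choices[0].message.content

# Sample events, for illustration only.
events = ["failed login from known office IP", "outbound traffic spike to unknown host"]
for event in events:
    if is_signal(event):
        print(analyze(event))

The design choice is the point Sam makes: the end-to-end chain can be very good at its one job even when no single model in it is.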
it's really cool that we can share our own models, 01:59:17.940 |
- We're open to it, it's not like a high priority 01:59:24.420 |
If we had, like, more resources and bandwidth, 01:59:29.620 |
we would go do that; I think there's a lot of reasons it'd be valuable. 01:59:43.740 |
- Hi, my question is, there are many agencies at the state 01:59:50.460 |
and national level that could really benefit 01:59:55.700 |
from these tools, but have perhaps some hesitancy about deploying them due to 02:00:01.140 |
privacy concerns, and I guess I'm curious to know 02:00:05.260 |
if there are any sort of planned partnerships with government, 02:00:13.140 |
because obviously if AGI can help solve problems at a societal scale, 02:00:19.260 |
government's gonna have to get involved with that, right? 02:00:22.380 |
And I'm just curious to know if there is some, 02:00:25.780 |
you know, plan that works when that time comes. 02:00:29.300 |
- Yeah, I think, I actually think you don't wanna wait 02:00:36.100 |
with our current models, so we've even announced 02:00:38.180 |
a handful of partnerships with government agencies, 02:00:40.220 |
some states, I think Minnesota, some others, Pennsylvania, 02:00:48.980 |
to be able to help governments around the world 02:00:52.900 |
get acclimated, get benefit from the technology, 02:00:55.900 |
and of all places, government feels like somewhere 02:00:59.940 |
we can really improve services, make things more efficient, reduce drudgery, and so on. 02:01:03.260 |
So I think there's a huge amount of good we can do now, 02:01:05.820 |
and if we do that now, it just accrues over the long run 02:01:09.020 |
as the models get better and we get closer to AGI. 02:01:25.060 |
What's your thinking on open source, whether that's open weights or just general discussion? 02:01:42.100 |
- For us, the really hard part is prioritization, 02:01:47.860 |
Part of it is there are such good open source models 02:01:56.660 |
in the world now, and the thing we always land on as most valuable is a really great-- 02:02:07.620 |
but we want to find something that we feel like 02:02:09.540 |
if we don't do it, it won't otherwise exist, 02:02:11.380 |
rather than just making another thing that's a tiny bit better 02:02:13.820 |
on benchmarks, because we think there's a lot 02:02:17.460 |
of good stuff out there now, but spiritually-- 02:02:31.220 |
- All the live demos worked, it's been incredible. 02:02:36.980 |
And as a follow-up to this, if there's, like, 02:02:39.420 |
a legal issue in terms of copyright, et cetera, 02:02:41.940 |
is there daylight between how you think about safety 02:02:44.180 |
for your own first-party products versus for what developers build on the API? 02:02:58.580 |
- You know, the funny thing is Sam asked the same question. 02:03:15.340 |
He wanted it to sing, and there are things that it can't sing, 02:03:15.340 |
like copyrighted songs; basically, it's easier in finite time to say no 02:03:25.540 |
and then build it in later, but it's nuanced to get it right, 02:03:29.020 |
People were tired of waiting for us to ship Voice Mode, 02:03:46.660 |
We could have waited longer and kind of really got 02:03:48.980 |
the classifications and filters on copyrighted music 02:03:53.140 |
versus not, but we decided we would just ship it 02:03:56.220 |
But I think Sam has asked me, like, four or five times why it can't sing, 02:04:09.500 |
you know, whether that's for developers or first party or whatever. 02:04:12.940 |
So yes, we can like maybe have some differences, 02:04:15.340 |
but we still have to be compliant with the law. 02:04:27.500 |
how you see things balance between context window growth and-- 02:04:43.740 |
- I think we'll get context windows, like, long enough that you just throw stuff in there, 02:04:49.420 |
pretty fast progress there, and that'll just be a thing. 02:04:57.420 |
But I think, you know, there's a bunch of reasons for that. 02:05:01.860 |
And then there's this other question of like, 02:05:07.620 |
When do we get to the point where you throw, like, everything in there? 02:05:12.780 |
And you know, like that's a whole different set of things. 02:05:17.620 |
That obviously takes some research breakthroughs. 02:05:19.820 |
But I assume that infinite context will happen 02:05:26.900 |
And that's gonna be just a totally different way that we use these models. 02:05:40.060 |
You know, like, people will use that in all sorts of ways, 02:05:48.620 |
But yeah, the very, very long context I think is 02:05:54.620 |
I think we maybe have time for one or two more. 02:06:12.820 |
what do you see as the vision for the new engagement layer, 02:06:12.820 |
and how we'll interact with this technology to make our lives so much better? 02:06:20.340 |
It's one that we ask ourselves a lot, frankly. 02:06:28.660 |
There's this, and I think it's one where developers can really help, 02:06:35.020 |
because there's this trade-off between generality and specificity. 02:06:44.300 |
I was recently in a number of conversations with folks 02:06:46.660 |
with whom I didn't have a common language, 02:06:50.820 |
Before, we would not have been able to have a conversation. 02:06:54.540 |
We would have just sort of smiled at each other 02:06:59.360 |
I said, "Junji P.T., I want you to be a translator for me. 02:07:03.020 |
"When I speak in English, I want you to speak in Korean. 02:07:05.120 |
"You hear Korean, I want you to repeat it in English." 02:07:07.740 |
And I was able to have a full business conversation, 02:07:14.060 |
not just for business, but think about travel and tourism 02:07:18.020 |
where people might not speak a word of the language. 02:07:23.360 |
But inside ChatGPT, that was still a thing that I had to, 02:07:28.020 |
like, set up myself; ChatGPT's not optimized for that, right? 02:07:32.940 |
Versus, you know, a universal translator in your pocket 02:07:34.780 |
that just knows that what you want to do is translate. 02:07:41.300 |
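A minimal sketch of the translator setup described here, using the OpenAI Python SDK. The exact prompt wording and the gpt-4o model name are illustrative assumptions; on stage this ran over voice, not text:

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Prompt wording mirrors the description above, but the exact text is illustrative.
SYSTEM_PROMPT = (
    "You are a translator. When you receive English, repeat it in Korean. "
    "When you receive Korean, repeat it in English. Say nothing else."
)

def translate(utterance: str) -> str:
    # Round-trip one utterance through the translator persona.
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": utterance},
        ],
    )
    return response.choices[0].message.content

print(translate("It's great to meet you."))  # prints the Korean rendering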
we struggle with trying to build an application 02:07:45.020 |
that can do lots of things for lots of people, 02:07:48.440 |
and that keeps up, like we've been talking about a few times, with how fast the models change. 02:08:08.700 |
And ultimately, the world is a much better place 02:08:12.740 |
and it's why we are so proud to serve all of you. 02:08:15.240 |
- The only thing I would add is, if you just think 02:08:20.300 |
about what you'll be able to ask for at some point in not that many years in the future, 02:08:30.980 |
There'll be a video model streaming back to you, 02:08:32.860 |
like a custom interface just for this one request. 02:08:39.540 |
you'll be able to, like, click through the stream 02:08:41.620 |
or say different things, and it'll be off doing, like, 02:08:44.660 |
again, the kinds of things that used to take, like, days of human effort, 02:08:55.820 |
and also getting things to happen in the world on your behalf. 02:09:11.340 |
- That's all for our coverage of Dev Day 2024. 02:09:28.540 |
♪ Real-time voice streams reach new heights ♪