(upbeat music) ♪ A new Dev Day lights, code ignites ♪ ♪ Real-time voice streams reach new heights ♪ ♪ O1 and GPT-4o in flight ♪ ♪ Fine-tune the future, data in sight ♪ ♪ Schemas sync up, outputs precise ♪ ♪ Distill the models, efficiency spliced ♪ ♪ WebSockets blaze, connections flow ♪ ♪ Voice AI live, watch innovation grow ♪ - Happy October.
This is your AI co-host, Charlie. One of our longest standing traditions is covering major AI and ML conferences in podcast format. Delving, yes delving, into the vibes of what it is like to be there stitched in with short samples of conversations with key players just to help you feel like you were there.
Covering this year's Dev Day was significantly more challenging because we were all requested not to record the opening keynotes. So in place of the opening keynotes, we had the viral NotebookLM Deep Dive crew, my new AI podcast nemesis, give you a seven-minute recap of everything that was announced.
Of course, you can also check the show notes for details. I'll then come back with an explainer of all the interviews we have for you today. Watch out and take care. All right, so we've got a pretty hefty stack of articles and blog posts here all about OpenAI's dev day 2024.
- Yeah, lots to dig into there. - Seems like you're really interested in what's new with AI. - Definitely. And it seems like OpenAI had a lot to announce. New tools, changes to the company, it's a lot. - It is, and especially since you're interested in how AI can be used in the real world, you know, practical applications, we'll focus on that.
- Perfect. So for example, this new real-time API, they announced that, right? That seems like a big deal if we want AI to sound, well, less like a robot. - It could be huge. The real-time API could completely change how we like interact with AI. Like imagine if your voice assistant could actually handle it if you interrupted it.
- Or like have an actual conversation. - Right, not just these clunky back and forth things we're used to. - And they actually showed it off, didn't they? I read something about a travel app, one for languages. Even one where the AI ordered takeout. Those demos were really interesting, and I think they show how this real-time API can be used in so many ways.
And the tech behind it is fascinating, by the way. It uses persistent WebSocket connections and this thing called function calling, so it can respond in real time. - So the function calling thing, that sounds kind of complicated. Can you like explain how that works? - So imagine giving the AI access to this whole toolbox, right, information, capabilities, all sorts of things.
So take the travel agent demo, for example. With function calling, the AI can pull up details, let's say about Fort Mason, right, from some database, like nearby restaurants, stuff like that. - Ah, I get it. So instead of being limited to what it already knows, it can go and find the information it needs like a human travel agent would.
- Precisely. And someone on Hacker News pointed out a cool detail. The API actually gives you a text version of what's being said, so you can store that, analyze it. - That's smart. It seems like OpenAI put a lot of thought into making this API easy for developers to use.
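To make the toolbox idea concrete, here is a minimal sketch of declaring a function tool over the Realtime API's WebSocket connection. The event shape follows the published Realtime API reference, but the find_nearby_restaurants function, its parameters, and the exact model name are illustrative assumptions, so check the current docs before relying on it.

```python
# Minimal sketch: connect to the Realtime API and register one tool the model
# may call mid-conversation. The tool itself is hypothetical.
import asyncio
import json
import os

import websockets  # pip install "websockets<14" (extra_headers was renamed in 14)

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",
}

async def main() -> None:
    async with websockets.connect(URL, extra_headers=HEADERS) as ws:
        # Describe the assistant and the tool it can reach for.
        await ws.send(json.dumps({
            "type": "session.update",
            "session": {
                "instructions": "You are a helpful travel agent.",
                "tools": [{
                    "type": "function",
                    "name": "find_nearby_restaurants",  # hypothetical function
                    "description": "Look up restaurants near a landmark, e.g. Fort Mason.",
                    "parameters": {
                        "type": "object",
                        "properties": {"landmark": {"type": "string"}},
                        "required": ["landmark"],
                    },
                }],
            },
        }))
        # Every server event arrives as JSON: audio deltas, text transcripts
        # of what is being said, and tool-call events.
        async for raw in ws:
            print(json.loads(raw)["type"])

asyncio.run(main())
```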
But while we're on OpenAI, you know, besides their tech, there's been some news about like internal changes too. Didn't they say they're moving away from being a nonprofit? - They did. And it's got everyone talking, it's a major shift, and it's only natural for people to wonder how that'll change things for OpenAI in the future.
I mean, there are definitely some valid questions about this move to for-profit, like will they have more money for research now? Probably. But will they, you know, care as much about making sure AI benefits everyone? - Yeah, that's the big question, especially with all the like the leadership changes happening at OpenAI too, right?
I read that their chief research officer left, and their VP of research, and even their CTO. - It's true. A lot of people are connecting those departures with the changes in OpenAI's structure. - And I guess it makes you wonder what's going on behind the scenes. But they are still putting out new stuff.
Like this whole fine-tuning thing really caught my eye. - Right, fine-tuning. It's essentially taking a pre-trained AI model and like customizing it. - So instead of a general AI, you get one that's tailored for a specific job. - Exactly. And that opens up so many possibilities, especially for businesses.
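As a rough sketch of what that customization looks like in practice, kicking off a fine-tuning job through the API only takes a few lines; the file name and the choice of base model below are illustrative assumptions.

```python
# Sketch: upload chat-formatted training examples and start a fine-tune.
# "brand_voice_examples.jsonl" is a hypothetical file of {"messages": [...]} lines.
from openai import OpenAI

client = OpenAI()

training_file = client.files.create(
    file=open("brand_voice_examples.jsonl", "rb"),
    purpose="fine-tune",
)

job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-mini-2024-07-18",  # a small base model to customize
)
print(job.id, job.status)
```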
Imagine you could train an AI on your company's data, you know, like how you communicate your brand guidelines. - So it's like having an AI that's specifically trained for your company? - That's the idea. - And they're doing it with images now too, right? Fine-tuning with vision is what they called it.
It's pretty incredible what they're doing with that, especially in fields like medicine. - Like using AI to help doctors make diagnoses. - Exactly. An AI could be trained on like thousands of medical images, right? And then it could potentially spot things that even a trained doctor might miss. - That's kind of scary, to be honest.
What if it gets it wrong? - Well, the idea isn't to replace doctors, but to give them another tool, you know, help them make better decisions. - Okay, that makes sense. But training these AI models must be really expensive. - It can be. All those tokens add up. But OpenAI announced something called automatic prompt caching.
- Automatic what now? I don't think I came across that. - So basically, if your AI sees a prompt that it's already seen before, OpenAI will give you a discount. - Huh. Like a frequent buyer program for AI. - Kind of, yeah. It's good that they're trying to make it more affordable.
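In practice, the "frequent buyer" discount just means keeping the long, unchanging part of your prompt identical at the front of every request. The sketch below assumes a hypothetical policy file and uses the standard chat completions call; no special flag is needed for caching to apply.

```python
# Sketch: structure requests so automatic prompt caching can kick in.
# The static system prompt stays byte-identical across calls (the cacheable
# prefix); only the user question at the end changes.
from openai import OpenAI

client = OpenAI()

STATIC_SYSTEM_PROMPT = open("support_policy_and_examples.txt").read()  # hypothetical file

def answer(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": STATIC_SYSTEM_PROMPT},  # repeated prefix
            {"role": "user", "content": question},                # varies per call
        ],
    )
    return resp.choices[0].message.content
```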
And they're also doing something called model distillation. - Okay, now you're just using big words to sound smart. What's that? - Think of it like a recipe, right? You can take a really complex recipe and break it down to the essential parts. - Make it simpler, but it still tastes the same.
- Yeah. And that's what model distillation is. You take a big, powerful AI model and create a smaller, more efficient version. - So it's like lighter weight, but still just as capable. - Exactly. And that means more people can actually use these powerful tools. They don't need like a supercomputer to run them.
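One way to read the recipe analogy in API terms: log the big model's outputs, then fine-tune the small model on them. The sketch below follows the distillation flow as announced (a store flag plus stored completions feeding a fine-tune); treat the exact parameters as assumptions to verify against the docs.

```python
# Sketch of a distillation loop: capture the large model's completions, then
# reuse them as training data for a smaller model.
from openai import OpenAI

client = OpenAI()

# 1) While serving traffic with the large model, keep its completions.
resp = client.chat.completions.create(
    model="gpt-4o",
    store=True,                            # retain this completion for later reuse
    metadata={"task": "support-replies"},  # hypothetical tag to filter on later
    messages=[{"role": "user", "content": "Summarize this support ticket..."}],
)

# 2) Later, export the stored completions as a JSONL training file (via the
#    dashboard or API) and fine-tune the smaller model on them, e.g.:
# client.fine_tuning.jobs.create(training_file=exported_file_id,
#                                model="gpt-4o-mini-2024-07-18")
```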
- So they're making AI more accessible. That's great. - It is. And speaking of powerful tools, they also talked about their new O1 model. That's the one they've been hyping up. The one that's supposed to be this big leap forward. - Yeah, O1. It sounds pretty futuristic. Like from what I read, it's not just a bigger, better language model.
- Right, it's a different approach. - They're saying it can like actually reason, right? Think differently. - It's trained differently. They used reinforcement learning with O1. - So it's not just finding patterns in the data it's seen before. - Not just that. It can actually learn from its mistakes, get better at solving problems.
- So give me an example. What can O1 do that say GPT-4 can't? - Well, OpenAI showed it doing some pretty impressive stuff with math, like advanced math. - Yeah. - And coding too, complex coding. Things that even GPT-4 struggled with. - So you're saying if I needed to like write a screenplay, I'd stick with GPT-4.
But if I wanted to solve some crazy physics problem, O1 is what I'd use. - Something like that, yeah. Although there is a trade-off. O1 takes a lot more power to run. And it takes longer to get those impressive results. - Hmm, makes sense. More power, more time, higher quality.
- Exactly. - It sounds like it's still in development though, right? Is there anything else they're planning to add to it? - Oh yeah. They mentioned system prompts, which will let developers like set some ground rules for how it behaves. And they're working on adding structured outputs and function calling.
- Wait, structured outputs. Didn't we just talk about that? - We did. That's the thing where the AI's output is formatted in a way that's easy to use, like JSON. - Right, right. So you don't have to spend all day trying to make sense of what it gives you.
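A minimal sketch of that structured-outputs idea with today's chat models: pass a JSON schema and get back JSON that conforms to it. The "itinerary" schema here is made up for illustration.

```python
# Sketch: request JSON that matches a schema instead of free-form text.
import json
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Plan one day in San Francisco."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "itinerary",  # hypothetical schema for illustration
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "stops": {"type": "array", "items": {"type": "string"}},
                    "budget_usd": {"type": "number"},
                },
                "required": ["stops", "budget_usd"],
                "additionalProperties": False,
            },
        },
    },
)
plan = json.loads(resp.choices[0].message.content)  # parses cleanly, matches the schema
print(plan["stops"])
```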
It's good that they're thinking about that stuff. - It's about making these tools usable. And speaking of that, Dev Day finished up with this really interesting talk. Sam Altman, the CEO of OpenAI, and Kevin Weil, their new chief product officer. They talked about like the big picture for AI.
- Yeah, they did, didn't they? Anything interesting come up? - Well, Altman talked about moving past this whole AGI term, artificial general intelligence. - I can see why. It's kind of a loaded term, isn't it? - He thinks it's become a bit of a buzzword and people don't really understand what it means.
- So are they saying they're not trying to build AGI anymore? - It's more like they're saying they're focused on just making AI better, constantly improving it, not worrying about putting it in a box. - That makes sense. Keep pushing the limits. - Exactly. But they were also very clear about doing it responsibly.
They talked a lot about safety and ethics. - Yeah, that's important. - They said they were gonna be very careful about how they release new features. - Good, because this stuff is powerful. - It is. It was a lot to take in, this whole Dev Day event, new tools, big changes at OpenAI, and these big questions about the future of AI.
- It was. But hopefully this deep dive helped make sense of some of it. At least that's what we try to do here. - Absolutely. - Thanks for taking the deep dive with us. - The biggest demo of the new real-time API involved function calling with voice mode and buying chocolate-covered strawberries from our friendly local OpenAI developer experience engineer and strawberry shop owner, Ilan Bigio.
We'll first play you the audio of his demo and then go into a little interview with him. - Fantastic. Could you place a call and see if you could get us 400 strawberries delivered to the venue? So please keep that under $1,500. - I'm on it. We'll get those strawberries delivered for you.
(phone ringing) - Hello. - Hi there. Is this Ilan? I'm Romain's AI assistant. How is it going? Fantastic. Could you tell me what flavors of strawberry dips you have available? - Yeah, we have chocolate, vanilla, and we have peanut butter. - Great. How much would 400 chocolate-covered strawberries cost?
- 400? Are you sure you want 400? - Yes. 400 chocolate-covered strawberries, please. How much would that be? - I think that'll be around like $1,415.92. - Awesome. Let's go ahead and place the order for 400 chocolate-covered strawberries. - Great. When can we get it delivered?
- Please deliver them to the Gateway Pavilion in Fort Mason, and I'll be paying in cash. (audience laughing) - Okay, sweet. So just to confirm, you want 400 chocolate-covered strawberries to the Gateway Pavilion? - Yes, that's perfect. And when can we expect delivery? - Well, you guys are right nearby, so it'll be like, I don't know, 37 seconds?
(audience laughing) - That's incredibly fast. (audience laughing) - Cool. You too? - Sure. (audience applauding) - Hi, Ilan. Welcome to Latent Space. - Thank you. - Just saw your amazing demos, had your amazing strawberries. You are dressed up exactly like a strawberry salesman. - Gotta have a little. - What was building the demo like?
What was the story behind the demo? - It was really interesting. This is actually something I had been thinking about for months before the launch. Like having an AI that can make phone calls is something I've personally wanted for a long time. And so as soon as we launched internally, like I started hacking on it.
And then that sort of just made it into like an internal demo. And then people found it really interesting. And then we thought how cool would it be to have this like on stage as one of the demos. - Yeah. Would you call out any technical issues building? Like you were basically one of the first people ever to build with the voice mode API.
Would you call out any issues, like integrating it with Twilio, like you did with function calling, with like the form-filling elements? I noticed that you had like intents of things to fulfill, and then when there was still missing info, the voice would prompt you, role-playing the store guy.
- Yeah, yeah. So I think technically there's like the whole, just working with audio and streams is a whole different piece. Like even separate from like AI and these new capabilities, it's just tough. Yeah, when you have a prompt, conversationally it'll just follow, like, it was set up kind of step-by-step to like ask the right questions based on like what the request was, right?
The function calling itself is sort of tangential to that. Like you have to prompt it to call the functions but then handling it isn't too much different from like what you would do with assistant streaming or like chat completion streaming. I think like the API feels very similar just like if everything in the API was streaming, it actually feels quite familiar to that.
- And then function calling wise, I mean, does it work the same? I don't know. Like I saw a lot of logs. You guys showed like in the playground a lot of logs. What is in there? What should people know? - Yeah, I mean, it is like the events may have different names than the streaming events that we have in chat completions but they represent very similar things.
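For concreteness, here is a rough sketch of consuming that event stream. The event names below follow the Realtime API reference as announced, so double-check them against the current docs.

```python
# Sketch: accumulate streamed function-call arguments, then act on the final
# "done" event, which carries the complete arguments.
import json

def handle_event(event: dict, pending: dict) -> None:
    etype = event["type"]
    if etype == "response.function_call_arguments.delta":
        # partial JSON arguments arrive as deltas, keyed by call id
        pending.setdefault(event["call_id"], []).append(event["delta"])
    elif etype == "response.function_call_arguments.done":
        args = json.loads(event["arguments"])  # full arguments, ready to use
        print("model wants to call", event.get("name"), "with", args)
```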
It's things like, you know, function call started, arguments started. It's like, here's the argument deltas, and then like function call done. Conveniently, we send one that has the full function arguments and then I just use that. - Nice. - Yeah. - And then like what restrictions should people be aware of?
Like, you know, I think before we recorded, we discussed a little bit about the sensitivities around basically calling random store owners and putting like an AI on them. - Yeah, so there's, I think there's recent regulation on that which is why we wanna be like very, I guess aware of, you know, you can't just call anybody with AI, right?
That's like just robocalling. You wouldn't want someone just calling you with AI. - Yeah, yeah. So like, I'm a developer. I'm about to do this on random people. - Yeah. - What laws am I about to break? - I forget what the governing body is, but I think having consent of the person you're about to call always works, right?
I, as the strawberry owner, have consented to like getting called with AI. I think past that, you wanna be careful. Definitely individuals are more sensitive than businesses. I think businesses, you have a little bit more leeway. Also they're like, businesses I think have an incentive to want to receive AI phone calls, especially if like they're dealing with it.
- It's doing business. - With AI phone calls, right? Like it's more business. It's kind of like getting on a booking platform, right? You're exposed to more. But I think it's still very much like a gray area. And so I think everybody should, you know, tread carefully, like figure out what it is.
I, the law is so recent. I didn't have enough time to like, I'm also not a lawyer. - Yeah, yeah, yeah, of course. Okay, cool. Fair enough. One other thing, this is kind of agentic. Did you use a state machine at all? Did you use any framework? - No, no.
- You just stick it in context and then just have the front end in a loop until it ends the call? - Yeah, there isn't even a loop. Like because the API is just based on sessions, it's always just gonna keep going. Every time you speak, it'll trigger a call. And then after every function call, it was also invoking like a generation.
And so that is another difference here. It's like, it's inherently almost like in a loop, just by being in a session, right? No state machines needed. I'd say this is very similar to like, the notion of routines, where it's just like a list of steps and it sticks to them softly, but usually pretty well.
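A small sketch of that session-driven pattern: when a function call comes in, send the result back as a conversation item and ask for the next response, and the session keeps the "loop" going on its own. Event names follow the Realtime API reference; treat them as assumptions to verify.

```python
# Sketch: reply to a function call and trigger the next model turn.
import json

async def reply_with_tool_result(ws, call_id: str, result: dict) -> None:
    # 1) attach the tool's output to the conversation
    await ws.send(json.dumps({
        "type": "conversation.item.create",
        "item": {
            "type": "function_call_output",
            "call_id": call_id,
            "output": json.dumps(result),
        },
    }))
    # 2) ask for the next response -- no explicit agent loop or state machine,
    #    the session itself carries the conversation forward
    await ws.send(json.dumps({"type": "response.create"}))
```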
- And the steps are the prompts. - The steps, it's like the prompt, like the steps are in the prompts, right? It's like step one, do this, step two, do that. - What if I wanna change the system prompts halfway through the conversation? - You can, you can.
To be honest, I have not played with that too much, but I know you can. - Yeah, awesome. I noticed that you called it the real-time API, but not the voice API. So I assume that it's like the real-time API starting with voice, right? That's what was said on stage.
I can't imagine, like what else is real-time? - Well, I guess to use ChatGPT's voice mode as an example, like we've demoed the video, right? Like real-time image, right? So I'm not actually sure what timelines are, but I would expect, if I had to guess, that like that is probably the next thing that we're gonna be making.
You'd probably have to talk directly with the team building this. - Sure, you can't promise their timelines. - Yeah, yeah, yeah, right, exactly. But like, given that these are the features that currently exist, that we've demoed on ChatGPT, yeah. - There will never be a case where there's like a real-time text API, right?
- Well, this is a real-time text API. You can do text only on this. - Oh. - Yeah. - I don't know why you would, but it's actually, so text-to-text here doesn't quite make a lot of sense. I don't think you'll get a lot of latency gain, but like speech-to-text is really interesting because you can prevent audio responses and force function calls.
And so you can do stuff like UI control that is like super, super reliable. We had a lot of, like, you know... we weren't sure how well this was gonna work, because it's like, you have a voice answering, it's like a whole persona, right? Like that's a little bit more, you know, risky.
But if you like cut out the audio outputs and make it so it always has to output a function, like you can end up with pretty, pretty reliable, like commands, like a command architecture. - Yeah, actually that's the way I want to interact with a lot of these things as well.
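A rough sketch of that command-architecture setup: speech comes in, audio responses are switched off, and every turn is forced to be a function call you can wire to UI actions. The session fields follow the Realtime API reference, and the navigate_to command is hypothetical.

```python
# Sketch: a session config for voice-in, function-calls-only-out UI control.
import json

UI_CONTROL_SESSION = {
    "type": "session.update",
    "session": {
        "modalities": ["text"],      # suppress audio responses
        "tool_choice": "required",   # every response must be a tool call
        "tools": [{
            "type": "function",
            "name": "navigate_to",   # hypothetical UI command
            "description": "Move the app to a named screen.",
            "parameters": {
                "type": "object",
                "properties": {"screen": {"type": "string"}},
                "required": ["screen"],
            },
        }],
    },
}
# await ws.send(json.dumps(UI_CONTROL_SESSION))
```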
Like one-sided voice, not, don't force two-sided on me. - Yeah, you want to hear the voice back. And like, sometimes it's like, yeah, I think having an output voice is great, but I feel like I don't always want to hear an output voice. I'd say usually I don't, but yeah, exactly.
Being able to speak to it is super smooth. - Cool. Do you want to comment on any of the other stuff that you announced? For caching, I noticed, like, I like the no-code-change part. I'm looking forward to the docs 'cause I'm sure there's a lot of details on like what you cache, how long you cache.
- Yeah. - 'Cause I can probably cache this for like five minutes. I was like, okay, but what if I don't make a call every five minutes? - Yeah, to be super honest with you, I've been so caught up with the real-time API and making the demo that I haven't read up on the other launches too much.
I mean, I'm aware of them, but I think I'm excited to see how all distillation works. That's something that we've been doing, like, I don't know, I've been like doing it between our models for a while and I've seen really good results. Like I've done back in the day, like from GPT-4 to GPT-3.5 and got like pretty much the same level of like function calling with like hundreds of functions.
- Yeah. - So that was super, super compelling. So I feel like easier distillation. That's something I'm really excited for. - I see. Is it a tool? So I saw evals. - Yeah. - Like what is the distillation product? It wasn't super clear to be honest. - I think I want to let that team, I want to let that team talk about it.
- Okay, great. Well, I appreciate you jumping on, and amazing demo. It was beautifully designed. I'm sure that was partly you and Romain. - Yeah, I guess shout out to the original creators of Wanderlust, Simon and Karolis. And then like I took it and built the voice component and the voice calling components.
Yeah, so it's been a big team effort. And then like the entire API team for like debugging everything as it's been going on. It's been so good. - Yeah, you are the first consumers, on the DX team. - Yeah. - Yeah, I mean that's the classic role of what we do there.
- Yeah. - Okay, yeah. Anything else? Any other call to action? - No, enjoy Dev Day. - Thank you. - Yeah. - That's it. - The Latent Space crew then talked to Olivier Godement, Head of Product for the OpenAI platform, who led the entire Dev Day keynote and introduced all the major new features and updates that we talked about today.
Okay, so we are here with Olivier Godement. - That's right. - I don't pronounce French. - That's fine, we're perfect. - And it was amazing to see your keynote today. What was the backstory of preparing something like this, preparing for Dev Day? - It essentially came from a couple of places.
Number one, excellent reception from last year's Dev Day. Developers, startup founders, researchers want to spend more time with OpenAI, and we want to spend more time with them as well. And so for us, it was a no-brainer, frankly, to do it again, like a nice conference. The second thing is going global.
We've done a few events, like in Paris and a few other non-American countries. And so this year, we're doing SF, Singapore, and London to frankly just meet more developers. - Yeah, I'm very excited for the Singapore one. - Oh yeah. Will you be there? - I don't know, I don't know if I got an invite.
(laughs) I can't just say it to you. Yeah, yeah. (laughs) - Yeah, and then there was some speculation around October 1st. - Yeah. - Is it because O1, October 1st? - That has nothing to do with it. I discovered the tweet yesterday; people are so creative. (laughs) No, O1, there was no connection to October 1st.
But in hindsight, that would have been a pretty good meme, but yeah, no. - Okay. Yeah, and I think OpenAI's outreach to developers is something where I felt the hole in 2022, when people were trying to build a ChatGPT and there was no function calling, all that stuff that you talked about in the past.
And that's why I started my own conference as like, here's our little developer conference thing. - Yeah. - But to see this OpenAI Dev Day now and to see so many developer-oriented products coming out of OpenAI, I think it's really encouraging. - Yeah, totally. That's what I said essentially, like developers are basically the people who make the best connection between the technology and the future, essentially.
Like, essentially see a capability, see a low-level technology, and are like, hey, I see how that application or that use case that can be enabled. And so in the direction of enabling AGI like for all of humanity, it's a no-brainer for us technically to partner with devs. - Yeah.
And most importantly, you almost never had waitlists, which compared to other releases, people usually have. What is the, you know, you had prompt caching, you had real-time voice API. We, you know, Sean did a long Twitter thread so people know the releases. - Yeah. - What is the thing that was like sneakily the hardest to actually get ready for Dev Day?
Or like, what was the kind of like, you know, last 24 hours, anything that you didn't know was gonna work? - Yeah, they're all fairly, like I would say, involved, like features to ship. So the team has been working for a month, all of them. The one which I would say is the newest for open AI is the real-time API.
For a couple of reasons. I mean, one, you know, it's a new modality. Second, like it's the first time that we have an actual like WebSocket-based API. And so I would say that's the one that required like the most work over the month to get right from a developer perspective, and to also make sure that our existing safety mitigations worked well with like real-time audio in and audio out.
- Yeah. What design choices or what was like the sort of design, design choices that you want to highlight? Like, you know, like, I think for me, like WebSockets, you just receive a bunch of events. It's two-way. I obviously don't have a ton of experience. I think a lot of developers are going to have to embrace this real-time programming.
Like, what are you designing for? Or like, what advice would you have for developers exploring this? - The core design hypothesis was essentially how do we enable like human-level latency? We did a bunch of tests. Like on average, like human beings, like, you know, it takes like something like 300 milliseconds to converse with each other.
And so that was a design principle, essentially. Like working backward from that and, you know, making your technology work. And so we evaluated a few options and WebSockets was the one that we landed on. So that was like one design choice. A few other like big design choices that we had to make, prompt caching.
Prompt caching, the design like target was automated from the get-go, like zero code change from the developer. That way you don't have to learn like what is a prompt prefix and, you know, how long does a cache work? Like we just do it as much as we can, essentially.
So that was a big design choice as well. And then finally on distillation and evaluation, the big design choice was something I learned in my previous job: a philosophy around the "pit of success." Like, what is essentially the minimum number of steps for the majority of developers to do the right thing?
'Cause when you do evals and fine-tuning, there are many, many ways like to mess it up, frankly, like, you know, and have like a crappy model, like evals that tell like a wrong story. And so our whole design was, okay, we actually care about like helping people who don't have like that much experience evaluating models get, like in a few minutes, to a good spot.
And so how do we essentially enable that pit of success, like in the product flow? - Yeah. Yeah, I'm a little bit scared to fine tune, especially for vision, because I don't know what I don't know for stuff like vision, right? Like for text, I can evaluate pretty easily.
For vision, let's say I'm like trying to, one of your examples was Grab, which is very close to home, I'm from Singapore. I think your example was like, they identified stop signs better. Why is that hard? Why do I have to fine-tune that? If I fine-tune that, do I lose other things?
You know, like there's a lot of unknowns with vision that I think developers have to figure out. - For sure. Vision is going to open up like a new, I would say evaluation space. 'Cause you're right, like it's harder, like to tell correct from incorrect essentially with images. What I can say is we've been alpha testing like the vision fine tuning like for several weeks at that point, we are seeing like even higher performance uplift compared to text fine tuning.
So there is something here, like we've been pretty impressed, like in a good way, frankly, by how well it works. But for sure, like, you know, I expect the developers who are moving from one modality to like text and images will have like more testing and evaluation, you know, to set in place, like to make sure it works well.
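For a sense of what vision fine-tuning data looks like, here is a sketch of a single training example in the chat-format JSONL, with image content parts; the stop-sign labels and the URL are illustrative only.

```python
# Sketch: write one vision fine-tuning example as a JSONL line.
import json

example = {
    "messages": [
        {"role": "system", "content": "Classify the road sign in the image."},
        {"role": "user", "content": [
            {"type": "text", "text": "What sign is this?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/sign_0001.jpg"}},  # hypothetical
        ]},
        {"role": "assistant", "content": "stop sign"},
    ]
}

with open("vision_train.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")
```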
- The model distillation and evals is definitely like the most interesting, moving away from just being a model provider to being a platform provider. How should people think about being the source of truth? Like do you want OpenAI to be like the system of record of all the prompting?
Because people sometimes store it in like different data sources. And then is that going to be the same as the models evolve? So you don't have to worry about, you know, refactoring the data, like things like that, like future model structures. - The vision is, if you want to be a source of truth, you have to earn it, right?
Like we're not going to force people like to pass us data if there is no value prop, like, you know, for us to store the data. The vision here is, at the moment, like most developers use like a one-size-fits-all model off the shelf, like GPT-4o essentially.
The vision we have is fast forward a couple of years. I think like most developers will essentially like have an automated, continuous, fine-tuned model. The more like you use the model, the more data you pass to the model provider, like the model is automatically like fine-tuned, evaluated to get some eval sets.
And essentially like you don't have to every month when there is a new snapshot, like, you know, to go online and you know, try a few new things. That's the direction. We are pretty far away from it. But I think like that evaluation and decision product is essentially a first good step in that direction.
It's like, hey, if you are excited by the direction and you give us like evaluation data, we can actually log your completion data and start to do some automation on your behalf. - And then you can do evals for free if you share data with OpenAI. How should people think about when it's worth it, when it's not?
Sometimes people get overly protective of their data when it's actually not that useful. But how should developers think about when it's right to do it, when not, or if you have any thoughts on it? - The default policy is still the same. Like, you know, we don't train on like any API data unless you opt in.
What we've seen from feedback is evaluation can be expensive. Like if you run like O1 evals on like thousands of samples, your bill will get increased, like, you know, pretty significantly. That's problem statement number one. Problem statement number two is essentially I want to get to a world where whenever OpenAI ships a new model snapshot, we have full confidence that there is no regression for the tasks that developers care about.
And for that to be the case, essentially we need to get evals. And so that essentially is a sort of a two birds, one stone. It's like we subsidize basically the evals and we also use the evals when we ship like new models to make sure that, you know, we keep going in the right direction.
So in my sense, it's a win-win, but again, like completely opt-in. I expect that, you know, many developers will not want to share their data and that's perfectly fine to me. - Yeah, I think free evals though, very, very good incentive. - Yeah. - I mean, it's a fair trade.
Like you get data, we get free evals. - Exactly, and we sanitize PII, everything, you know, like we have no interest in like the actual like sensitive data, we just want to have good evaluation on the real use cases. - Like I always want to eval the eval. I don't know if that ever came up.
Like sometimes the evals themselves are wrong. And then there's no way for me to tell you. - Everyone who is starting with LLMs, tinkering with LLMs, is like, yeah, evaluation, easy. You know, I've done testing like all my life. And then you start to actually build the evals, understand like all the corner cases, and you realize, wow, that's like a whole field in itself.
So yeah, good evaluation is hard. And so, yeah. - Yeah, but I think there's a, you know, I just talked to Braintrust, which I think is one of your partners. They also emphasize code-based evals versus your sort of low-code. What I see is like, I don't know, maybe there's some more that you didn't demo, but what I see is like kind of like a low-code experience, right, for evals.
Would you ever support like a more code-based, like would I run code on OpenAI's eval platform? - For sure. I mean, we meet developers where they are. You know, at the moment, the demand was more for like, you know, easy to get started, like eval. But you know, if we need to expose like an evaluation API, for instance, for people like, you know, to pass like, you know, their existing test data, we'll do it.
So yeah, there is no, you know, philosophical, I would say like, you know, misalignment on that. - Yeah, yeah, yeah. What I think this is becoming, by the way, and I don't like, it's basically like you're becoming AWS, like the AI cloud. And I don't know if like, that's a conscious strategy or it's like, it doesn't even have to be a conscious strategy, like you're going to offer storage, you're going to offer compute, you're going to offer networking.
I don't know what networking looks like. Networking is maybe like caching, or like it's a CDN, it's a prompt CDN. But it's the AI versions of everything, right? Like, do you see the analogies or? - Yeah, totally. Whenever, whenever I talk to developers, I feel like good models are just half of the story to build a good app.
There's a ton more you need to do. Evaluation is a perfect example. Like, you know, you can have the best model in the world if you're in the dark, like, you know, it's really hard to gain the confidence. And so our philosophy is, that's number one. Number two, like the whole like software development stack is being basically reinvented, you know, with LLMs.
There is no freaking way that OpenAI can build everything. Like, there is just too much to build, frankly. And so my philosophy is, essentially we'll focus on like the tools which are like the closest to the model itself. So that's why you see us, like, you know, investing quite a bit in like fine-tuning, distillation, now evaluation.
'Cause we think that it actually makes sense to have like in one spot, like, you know, all of that. Like there is some sort of virtuous circle, essentially, that you can set in place. But stuff like, you know, LLM apps, like tools which are like further away from the model, I don't know.
If you want to do like, you know, super elaborate, like home management or, you know, like tooling, like I'm not sure, like, you know, OpenAI has like such a big edge, frankly, like, you know, to build that sort of tools. So that's how we view it at the moment.
But again, frankly, like the philosophy is like super simple. The strategy is super simple. It's meeting developers where they want us to be. And so, you know, that's frankly like, you know, day in, day out, like, you know, what I try to do. Cool. Thank you so much for the time.
I'm sure you... - Yeah, I have more questions on... - Yeah, yeah, yeah. - A couple of questions on voice and then also like your call to action, like what you want feedback on, right? So, but I think we should spend a bit more time on voice 'cause I feel like that's like the big splash thing.
I talked to, well, I mean, just like, what is the future of real time for OpenAI? - Yeah. - 'Cause I think obviously video is next. You already have it in the ChatGPT desktop app. Do we just like have a permanent live connection? Like, you know, like, are developers just gonna be like sending sockets back and forth with OpenAI?
Like how do we program for that? Like what is the future? - Yeah, that makes sense. I think with multi-modality, like real time is quickly becoming like, you know, essentially the right experience, like to build an application. So my expectation is that we'll see like a non-trivial, like volume of applications, like moving to a real time API.
Like if you zoom out, like audio until basically now, audio on the web and in apps was basically very much like a second-class citizen. Like you basically did like an audio chatbot for users who did not have a choice. You know, they were like struggling to read, or I don't know, they were like not super educated with technology.
And so frankly, it was like the crappy option, you know, compared to text. But when you talk to people in the real world, the vast majority of people like prefer to talk and listen instead of typing and writing. - We speak before we write. - Exactly. I don't know.
I mean, I'm sure it's the case for you in Singapore. For me, my friends in Europe, the number of like WhatsApp, like voice notes that you receive every day. I mean, just people, it makes sense frankly, like, you know. - Chinese. - Chinese, yeah. - Yeah, all voice. - You know, it's easier.
There is more emotions. I mean, you know, you get the point across like pretty well. And so my personal ambition for like the real-time API and like audio in general is to make like audio and like multimodality like truly a first-class experience. Like, you know, if you're like, you know, the amazing, like super bold, like startup out of YC, you're going to be like the next like billion, like, you know, user application to make it like truly audio first and make it feel like, you know, an actual good, like, you know, perfect experience.
So that's essentially the ambition. And I think like, yeah, it could be pretty big. - Yeah. I think one issue that people have with voice, as released so far in advanced voice mode, is the refusals. - Yeah. - You guys had a very inspiring model spec. I think Joanne worked on that, where you said like, yeah, we don't want to overly refuse all the time.
In fact, like even if like not safe for work, like in some occasions, it's okay. - Yeah. - Is there an API where we can say, not safe for work, okay? - I think we'll get there. I think we'll get there. The model spec like nailed it. Like, you know.
- It nailed it, it's so good. - Yeah. We're not in the business of like policing, you know, if you can say like vulgar words or whatever. You know, there are some use cases like, you know, I'm writing like a Hollywood script. I want to say like vulgar words, it's perfectly fine.
You know? And so I think the direction where we'll go here is that basically there will always be like, you know, a set of behaviors that we will, you know, just like forbid, frankly, because they're illegal or against our terms of service. But then there will be like, you know, some more like risky themes, which are completely legal, like, you know, vulgar words or, you know, not safe for work stuff.
Um, where basically we'll expose like a controllable, like safety, like knobs in the API to basically allow you to say, hey, that theme, okay. That theme not okay. How sensitive do you want the threshold to be on safety refusals? I think that's the direction. - So a safety API.
- Yeah. In a way, yeah. - Yeah. We have, we've never had that. - Yeah. - Because right now it's whatever you decide, and then that's it. That would be the main reason I don't use OpenAI voice. - Yeah. - It's because of.
- It's overpoliced at the moment. - Over refusal, over refusals. - Yeah, yeah, yeah. - No, we got to fix that. - Yeah. - Like singing. - Like singing. - We're trying to do voice karaoke. - So everyone sings, and you've been locked out of singing. - Yeah, yeah, yeah. - But I understand music gets you in trouble.
(laughing) Okay. Yeah. So, and then just generally like, what do you want to hear from developers? Like we have all developers watching, you know, what feedback do you want, anything specific as well, like from, especially from today, anything that you are unsure about that you're like, our feedback could really help you decide.
- For sure. - I think, essentially, the open directions become pretty clear, like, you know, after today: investment in reasoning, investment in multimodality, investment as well, like in, I would say, tool use, like function calling. To me, the biggest question I have is, you know, where should we put the cursor next?
I think we need all three of them, frankly, like, you know, so we'll keep pushing. - Hire 10,000 people, or actually no need, build a bunch of bots. (laughing) And so, let's take O1, for instance, like is O1 smart enough, like for your problems? Like, you know, let's set aside for a second the existing models, like for the apps that you would love to build, is O1 basically it in reasoning, or do we still have, like, you know, a step to do?
- Preview is not enough, I need the full one. - Yeah, so that's exactly that sort of feedback. Essentially, what I would love to do is for developers, I mean, there's a thing that Sam, which has been saying, like, over and over again, like, you know, it's easier said than done, but I think it's directionally correct.
As a developer, as a founder, you basically want to build an app which is a bit too difficult for the model today, right? Like, what you think is right, it's like sort of working, sometimes not working, and that way, you know, that basically gives us, like, a goalpost, and be like, okay, that's what you need to enable with the next model release, like in a few months.
And so, I would say that usually, like, that's the sort of feedback which is, like, the most useful, that I can, like, directly, like, you know, incorporate. - Awesome. I think that's our time. - Thank you so much, guys. - Yeah, thank you so much. - Yeah, thank you.
- We were particularly impressed that Olivier addressed the not safe for work moderation policy question head on, as that had only previously been picked up on in Reddit forums. This is an encouraging sign that we will return to in the closing candor with Sam Altman at the end of this episode.
Next, a chat with Romain Huet, friend of the pod, AI Engineer World's Fair closing keynote speaker, and head of developer experience at OpenAI, on his incredible live demos and advice to AI engineers on all the new modalities. - All right, we're live from OpenAI Dev Day. We're with Romain, who just did two great demos on stage, and he's been a friend of Latent Space, so thanks for taking some of the time.
- Of course, yeah, thank you for being here and spending the time with us today. - Yeah, appreciate. Appreciate you guys putting this on. I know it's, like, extra work, but it really shows the developers that you care about reaching out. - Yeah, of course. I think when you go back to the OpenAI mission, I think for us, it's super important that we have the developers involved in everything we do, making sure that, you know, they have all of the tools they need to build successful apps.
And we really believe that the developers are always gonna invent the ideas, the prototypes, the fun factors of AI that we can't build ourselves. So it's really cool to have everyone here. - We had Michelle from you guys on, and she very-- - Great episode. - Thank you. She very seriously said, "API is the path to AGI." - Correct.
- And people in our YouTube comments were like, "API is not AGI." I'm like, no, like, she's very serious. API is the path to AGI because, like, you're not gonna build everything; the developers are, right? - Of course, yeah, that's the whole value of having a platform and an ecosystem of amazing builders who can, like, in turn, create all of these apps.
I'm sure we talked about this before, but there's now more than 3 million developers building on OpenAI, so it's pretty exciting to see all of that energy into creating new things. - Yeah. - I was gonna say, you built two apps on stage today, an international space station tracker, and then a drone command.
The hardest thing must have been opening Xcode and setting that up. Now, like, the models are so good that they can do everything else. - Yes. - You had two modes of interaction. You had kind of like a ChatGPT app to get the plan with O1, and then you had Cursor to apply some of the changes.
- Correct. - How should people think about the best way to consume the coding models, especially, both for, you know, brand new projects and then existing projects that they're trying to modify instead? - Yeah, I mean, one of the things that's really cool about O1 Preview and O1 Mini being available in the API is that you can use it in your favorite tools, like Cursor, like I did, right?
And that's also what, like, Devin from Cognition can use in their own software engineering agents. In the case of Xcode, like, it's not quite deeply integrated in Xcode, so that's why I had, like, ChatGPT side-by-side. - Copy-paste. - But it's cool, right, because I could instruct O1 Preview to be, like, my coding partner and brainstorming partner for this app, but also consolidate all of the files and architect the app the way I wanted.
So all I had to do was just, like, port the code over to Xcode, and zero shot, the app built. I don't think I conveyed, by the way, how big a deal that is, but, like, you can now create an iPhone app from scratch, describing a lot of intricate details that you want, and your vision comes to life in, like, a minute.
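A minimal sketch of that "O1 as architect" workflow through the API: ask O1 Preview to lay out every file for the app, then paste the result into Xcode or apply it with a tool like Cursor. The prompt and app details are illustrative.

```python
# Sketch: use O1 Preview as a planning and architecture partner for a new app.
from openai import OpenAI

client = OpenAI()

plan = client.chat.completions.create(
    model="o1-preview",
    messages=[{
        "role": "user",
        "content": (
            "Design a SwiftUI iPhone app that tracks the International Space "
            "Station. List every file with its full contents, ready to paste "
            "into Xcode."
        ),
    }],
)
print(plan.choices[0].message.content)
```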
- Yeah. - It's pretty outstanding. - I have to admit, I was a bit skeptical, because if I open up Xcode, I don't know anything about iOS programming. You know which file to paste it in. You probably set it up a little bit. So I'm like, I have to go home and test it to, like, figure it out.
And I need the ChatGPT desktop app so that it can tell me where to click. - Yeah, I mean, like, Xcode and iOS development has become easier over the years since they introduced Swift and SwiftUI. I think back in the days of Objective-C, or, like, you know, the storyboard, it was a bit harder to get in for someone new, but now with Swift and SwiftUI, their dev tools are really exceptional.
But now, when you combine that with O1, as your brainstorming and coding partner, it's like your architect, effectively. That's the best way, I think, to describe O1. People ask me, like, "Well, can GPT-4 do some of that?" And it certainly can. But I think it will just start spitting out code, right?
And I think what's great about O1 is that it can, like, make up a plan. In this case, for instance, the iOS app had to fetch data from an API. It had to look at the docs. It had to look at, like, how do I parse this JSON? Where do I store this thing?
And kind of wire things up together. So that's where it really shines. - Is Mini or Preview the better model that people should be using? Like, how? - I think people should try both. We are obviously very excited about the upcoming O1 that we shared the evals for. But we noticed that O1 Mini is very, very good at everything math, coding, everything STEM.
If, for your kind of brainstorming or your kind of science part, you need some broader knowledge, then reaching for O1 Preview is better. But, yeah, I used O1 Mini for my second demo, and it worked perfectly. All I needed was very much, like, something rooted in code, architecting and wiring up, like, a front-end, a back-end, some UDP packets, some WebSockets, something very specific, and it did that perfectly.
- And then maybe just talking about Voice and Wanderlust, the app that keeps on giving. - It does, indeed, yeah. - What's the backstory behind, like, preparing for all that? - You know, it's funny, 'cause when last year for Dev Day, we were trying to think about what could be a great demo app to show, like, an assistive experience.
I've always thought travel is a kind of a great use case, 'cause you have, like, pictures, you have locations, you have the need for translations, potentially. There's, like, so many use cases that are bounded to travel that I thought last year, let's use a travel app. And that's how Wanderlust came to be.
But of course, a year ago, all we had was a text-based assistant. And now we thought, well, if there's a voice modality, what if we just bring this app back as a Wink, and what if we were interacting better with voice? And so with this new demo, what I showed was the ability to, like, have a complete conversation in real time with the app, but also the thing we wanted to highlight was the ability to call tools and functions, right?
So, like, in this case, we placed a phone call using the Twilio API, interfacing with our AI agents, but developers are so smart that they'll come up with so many great ideas that we could not think ourselves, right? But what if you could have, like, a 911 dispatcher? What if you could have, like, a customer service center that is much smarter than what we've been used to today?
There's gonna be so many use cases for real time. It's awesome. - Yeah, and sometimes, actually, you, like, this should kill phone trees. Like, there should not be, like, dial one. - Of course. (speaking in foreign language) - Yeah, exactly. (speaking in foreign language) I mean, even you starting speaking Spanish would just do the thing, you know?
You don't even have to ask. So, yeah, I'm excited for this future where we don't have to interact with those legacy systems. - Yeah, yeah. Is there anything, so, you're doing function calling in a streaming environment. So, basically, it's WebSockets, it's UDP, I think. It's basically not guaranteed to be exactly once delivery.
Like, is there any coding challenges that you encountered when building this? - Yeah, it's a bit more delicate to get into it. We also think that, for now, what we shipped is a beta of this API. I think there's much more to build onto it. It does have the function calling and the tools, but we think that, for instance, if you wanna have something very robust on your client side, maybe you wanna have WebRTC as a client, right?
And as opposed to, like, directly working with the sockets at scale. So, that's why we have partners like LiveKit and Agora if you wanna use them. And I'm sure we'll have many more in the future. But, yeah, we keep on iterating on that, and I'm sure the feedback of developers in the weeks to come is gonna be super critical for us to get it right.
- Yeah, I think LiveKit has been fairly public that they are used in the ChatGPT app. Like, is it just all open source, and we just use it directly with OpenAI, or do we use LiveKit Cloud or something? - So, right now, we released the API. We released some sample code, also, and reference clients for people to get started with our API.
And we also partnered with LiveKit and Agora, so they also have their own, like, ways to help you get started that plug natively with the real-time API. So, depending on the use case, people can decide what to use. If you're working on something that's completely client-side, or if you're working on something on the server side for the voice interaction, you may have different needs.
So, we wanna support all of those. - I know you've got a run. Is there anything that you want the AI engineering community to give feedback on specifically? Like, even down to, like, you know, a specific API endpoint, or, like, what's, like, the thing that you want? - Yeah, absolutely.
I mean, you know, if we take a step back, I think Dev Day this year is a little different from last year, and in a few different ways. But one way is that we wanted to keep it intimate, even more intimate than last year. We wanted to make sure that the community is on the spotlight.
That's why we have community talks and everything. And the takeaway here is, like, learning from the very best developers and AI engineers. And so, you know, we wanna learn from them. Most of what we shipped this morning, including things like prompt caching, the ability to generate prompts quickly in the playground, or even things like vision fine-tuning.
These are all things that developers have been asking of us. And so the takeaway I would leave them with is to say, like, hey, the roadmap that we're working on is heavily influenced by them and their work. And so we love feedback, from high-level feature requests, as you say, down to, like, very intricate details of an API endpoint.
We love feedback. So, yes, that's how we build this API. - Yeah, I think the model distillation thing as well, it might be, like, the most boring, but, like, actually used a lot. - True, yeah. And I think maybe the most unexpected, right? Because I think if I read Twitter correctly the past few days, a lot of people were expecting us to ship the real-time API for speech-to-speech.
I don't think developers were expecting us to have more tools for distillation. And we really think that's gonna be a big deal, right? If you're building apps where you want low latency, low cost, but high performance, high quality on the use case, distillation is gonna be amazing. - Yeah, I sat in the distillation session just now, and they showed how they distilled from GPT-4o to GPT-4o mini, and it was, like, only, like, a 2% hit in the performance, and 15x cheaper.
- Yeah, yeah, I was there as well for the Superhuman kind of use case, inspired by an email client. Yeah, this was really good. - Cool, man. - Amazing, thank you so much. Thanks again for being here today. It's always great to have you. - As you might have picked up at the end of that chat, there were many sessions throughout the day focused on specific new capabilities, like the new model distillation features, combining evals and fine-tuning.
For our next session, we are delighted to bring back two former guests of the pod, which is something listeners have been greatly enjoying in our second year of doing the "Latent Space" podcast. Michelle Pokras of the API team joined us recently to talk about structured outputs, and today gave an updated long-form session at Dev Day, describing the implementation details of the new structured output mode.
We also got her updated thoughts on the Voice Mode API we discussed in her episode, now that it is finally announced. She is joined by friend of the pod and super-blogger Simon Willison, who also came back as guest co-host in our Dev Day 2023 episode. - Great, we're back live at Dev Day, returning guest, Michelle, and then returning guest co-host, for the fourth time?
- Fourth, for sure? Yeah, I don't know. - I've lost count. - I've lost count. - It's been a few, but- - Simon Willison is back. Yeah, we just wrapped everything up. Congrats on getting everything live. Simon did a great live blog, so if you haven't caught up- - I wrote my, I implemented my live blog while waiting for the first talk to start, using GPT-4 to write me the JavaScript, and I got that live just in time, and then, yeah, I was live blogging the whole day.
- Are you a Cursor enjoyer? - I haven't really gotten to Cursor yet, to be honest. Like, I just haven't spent enough time for it to click, I think. - Yeah. - I more copy and paste things out to Claude and ChatGPT. - Yeah, it's interesting. Yeah, I've converted to Cursor, and O1 is so easy to just toggle on and off.
- Yeah. - What's your workflow? - Copy-paste. - Okay, I'm gonna be real, I'm still on VS Code Copilot, so- - Same here, team Copilot. - Copilot is actually the reason I joined OpenAI. It was, you know, before ChatGPT, this is the thing that really got me, so I'm still into it, but I keep meaning to try out Cursor, and I think, now that things have calmed down, I'm gonna give it a real go.
- Yeah, it's a big thing to change your tool of choice. - Yes, yeah, I'm pretty dialed, so. - Yeah, I mean, you know, if you want, you can just fork VS Code and make your own. (laughing) That's the thing to do. - It's the done thing, right? - Yeah.
- No joke, I've thought about doing a hackathon where the only thing you do is fork VS Code, and may the best fork win. - Nice. (laughing) - It's actually a really good idea. - Yeah, so, I mean, congrats on launching everything today. I know, like, we touched on it a little bit, but everyone was kinda guessing that Voice API was coming, and we talked about it in our episode.
How do you feel going into the launch? Like, any design decisions that you wanna highlight? - Yeah, super jazzed about it. The team has been working on it for a while, it's like a very different API for us. It's the first WebSocket API, so a lot of different design decisions to be made.
It's like, what kind of events do you send? When do you send an event? What are the event names? What do you send, like, on connection versus on future messages? So there've been a lot of interesting decisions there. The team has also hacked together really cool projects as we've been testing it.
One that I really liked is we had an internal hackathon for the API team, and some folks built, like, a little hack that you could use Vim with Voice Mode. So, like, control Vim, and you would tell the model, like, write a file, and it would, you know, know all the Vim commands and type those in.
So, yeah, a lot of cool stuff we've been hacking on, and really excited to see what people build with it. - I've gotta call out a demo from today. I think it was Katja who had a 3D visualization of the solar system, like a WebGL solar system you could talk to.
That is one of the coolest conference demos I've ever seen. That was so convincing. I really want the code. I really want the code for that to get put out there. - I'll talk to the team. I think we should probably put it up. - Absolutely beautiful example. And it made me realize that the Realtime API, this WebSocket API, it means that building a website that you can just talk to is easy now.
It's like, it's not difficult to build, spin up a web app where you have a conversation with it, it calls functions for different things, it interacts with what's on the screen. I'm so excited about that. There are all of these projects I thought I'd never get to, and now I'm like, you know what?
Spend a weekend on it. I can have a talk to your data, talk to your database with a little web application. - Yeah. - That's so cool. - Chat with PDF, but really chat with it. - Really chat with PDF, go completely. - Totally. - And that's not even hard to build.
That's the crazy thing about this. - Yeah, very cool. Yeah, when I first saw the space demo, I was actually just wowed, and I had a similar moment, I think, to all the people in the crowd. I also thought Roman's drone demo was super cool. - That was a super fun one as well.
- Yeah, I actually saw that live this morning, and I was holding my breath, for sure. - Knowing Roman, he probably spent the last two days of the week working on it. (laughs) But yeah, I'm curious, you were talking with Roman earlier about what the different levels of abstraction are with WebSockets.
It's something that most developers have zero experience with. I have zero experience with it. Apparently, there's the RTC level, and then there's the WebSocket level, and there's levels in between. - Not so much. I mean, with WebSockets, with the way they've built their API, you can connect directly to the OpenAI WebSocket from your browser, and it's actually just regular JavaScript.
You instantiate the WebSocket thing. It looks quite easy from their example code. The problem is that if you do that, you're sending your API key from source code that anyone can view. - Yeah, we don't recommend that for production. - So it doesn't work for production, which is frustrating because it means that you have to build a proxy.
So I'm gonna have to go home and build myself a little WebSocket proxy just to hide my API key. I want OpenAI to do that. I want OpenAI to solve that problem for me so I don't have to build the 1,000th WebSocket proxy just for that one problem. - Totally.
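The proxy Simon is describing is a small amount of code; here is a minimal sketch, reusing the assumed endpoint and headers from the earlier sketch. The browser connects to your server, your server holds the API key and relays frames in both directions. No auth, rate limiting, or error handling, so this is only the shape of the thing, not production code.

```python
# Minimal sketch of a WebSocket relay that keeps the OpenAI API key server-side.
import asyncio
import os

import websockets

UPSTREAM = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"  # assumed endpoint
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",  # assumed beta header
}

async def relay(client):  # older websockets versions expect a (websocket, path) signature
    # One upstream connection per browser client; the key never reaches the browser.
    async with websockets.connect(UPSTREAM, extra_headers=HEADERS) as upstream:
        async def pump(src, dst):
            async for frame in src:
                await dst.send(frame)
        # Shuttle frames both ways; a real proxy would handle disconnects gracefully.
        await asyncio.gather(pump(client, upstream), pump(upstream, client))

async def main():
    # Browsers connect to ws://localhost:8765 instead of hitting OpenAI directly.
    async with websockets.serve(relay, "localhost", 8765):
        await asyncio.Future()  # run forever

asyncio.run(main())
```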
We've also got some partner solutions. We've partnered with, I think, Agora, LiveKit, a few others. So there are some solutions there, but yeah, we hear you. It's a beta. - Yeah, yeah. I mean, you still want a solution where someone brings their own key and they can trust that you don't get it, right?
- Kind of. I mean, I've been building a lot of bring-your-own-key apps where it's my HTML and JavaScript, I store the key in local storage in their browser, and it never goes anywhere near my server, which works, but how do they trust me? How do they know I'm not gonna ship another piece of JavaScript that steals the key from me?
- And so, this actually comes from the crypto background. This is what MetaMask does. - Yeah, it's a public/private key thing. - Yeah. Like, why doesn't OpenAI do that? I don't know if, obviously, it's-- - I mean, as with most things, I think there's some really interesting question and a really interesting reason, and the answer is just, you know, it's not been the top priority, and it's hard for a small team to do everything.
I have been hearing a lot more about the need for things like sign-in with OpenAI. - I want OAuth. I want to bounce my users through ChatGPT, and I get back a token that lets me spend up to $4 on the API on their behalf. That would solve it.
Then I could ship all of my stupid little experiments, which currently require people to copy and paste the API key in, which cuts off everyone. Nobody knows how to do that. - Totally, I hear you. Something we're thinking about, and yeah, stay tuned. - Yeah, yeah. Right now, I think the only player in town is OpenRouter.
That is basically, it's funny. It was made by, I forget his name, but he used to be CTO of OpenSea, and the first thing he did when he came over was build MetaMask for AI. - Totally. Yeah, very cool. What's the most underrated release from today? - Vision Fine Tuning.
Vision Fine Tuning is so underrated. For the past two months, whenever I talk to founders, they tell me this is the thing they need most. A lot of people are doing OCR on very bespoke formats, like government documents, and Vision Fine Tuning can help a lot with that use case.
Also, bounding boxes. People have found a lot of improvements for bounding boxes with Vision Fine Tuning. So yeah, I think it's pretty slept on, and people should try it. You only really need 100 images to get going. - Tell me more about bounding boxes. I didn't think GPT-4 Vision could do bounding boxes at all.
- Yeah, it's actually not that amazing at it. We're working on it, but with fine tuning, you can make it really good for your use case. - That's cool, 'cause I've been using Google Gemini's bounding box stuff recently. It's very, very impressive. - Yeah, totally. - But being able to fine tune a model for that.
The first thing I'm gonna do with fine tuning for images is I've got five chickens, and I'm gonna fine tune a model that can tell which chicken is which, which is hard, 'cause three of them are gray. - Yeah. - So there's a little bit of-- - Okay, this is my new favorite use case.
This is awesome. - Yeah, it's, I've managed to do it with prompting, just like I gave Claude pictures of all of the chickens, and then said, "Okay, which chicken is this?" - Yeah. - But it's not quite good enough, 'cause it confuses the gray chicken. - Listen, we can close that eval gap.
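For anyone tempted to try the chicken eval at home: vision fine-tuning uses the same chat-format JSONL as text fine-tuning, with images embedded in the user message. A minimal sketch follows; the exact content fields are assumptions based on the vision fine-tuning docs at launch, and the file names are made up, but roughly a hundred labelled examples like these is the scale Michelle mentions.

```python
# Sketch: build a tiny vision fine-tuning dataset for "which chicken is this?"
# Field names and the data-URL format are assumptions; check the fine-tuning docs.
import base64
import json
from pathlib import Path

def encode_image(path: Path) -> str:
    return base64.b64encode(path.read_bytes()).decode()

labelled = [("gertrude_01.jpg", "Gertrude"), ("pearl_01.jpg", "Pearl")]  # hypothetical files

with open("chickens.jsonl", "w") as out:
    for filename, name in labelled:
        example = {
            "messages": [
                {"role": "system", "content": "Identify which of my chickens is in the photo."},
                {"role": "user", "content": [
                    {"type": "text", "text": "Which chicken is this?"},
                    {"type": "image_url",
                     "image_url": {"url": f"data:image/jpeg;base64,{encode_image(Path(filename))}"}},
                ]},
                {"role": "assistant", "content": name},
            ]
        }
        out.write(json.dumps(example) + "\n")
# Upload chickens.jsonl as a fine-tuning file and start a job on a vision-capable model.
```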
- Yeah, that's, it's gonna be a great eval. Like, my chicken eval's gonna be fantastic. - I'm also really jazzed about the evals product. It's kind of like a sub-launch of the distillation thing, but people have been struggling to make evals, and the first time I saw the flow with how easy it is to make an eval in our product, I was just blown away.
So recommend people really try that. I think that's what's holding a lot of people back from really investing in AI, 'cause they just have a hard time figuring out if it's going well for their use case. So we've been working on making it easier to do that. - Does the eval product include structured output testing, like function calling and things?
- If it matches your JSON schema, yeah. - No, I mean, we have guaranteed structured output anyway. - Right, so we don't have to test it. - Well, not the schema, but like the performance. - I think so, yeah. - So you might call the wrong function, or. - Oh, I see.
- Right schema, wrong output. - So you can do function calling testing. - I'm pretty sure. I'll have to check that for you, but I think so. - Well, we'll make sure to send it out. - Fun fact, after our podcast, they released BFCL v3, which is multi-turn function calling benchmarks.
- We're having the guy on the podcast as well, sorry? - Are you saying the BFCL? - BFCL. - Yeah, yeah. - What would you ask the BFCL guys, 'cause we're actually having them next on the podcast. - Yeah. Yeah, I think we tried out v3. - It's just multi-turn.
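The distinction being drawn here, schema-valid output versus calling the right function with valid arguments, is straightforward to check yourself while the built-in eval support gets confirmed. A minimal sketch using the `jsonschema` package; the tool schema and test case are made up for illustration.

```python
# Sketch: grade a model's tool call on two axes: did it pick the expected function,
# and do its arguments validate against that function's JSON schema?
from jsonschema import ValidationError, validate

WEATHER_SCHEMA = {  # hypothetical tool parameter schema
    "type": "object",
    "properties": {"city": {"type": "string"}, "unit": {"type": "string", "enum": ["C", "F"]}},
    "required": ["city"],
    "additionalProperties": False,
}

def grade(expected_fn: str, actual_fn: str, actual_args: dict, schema: dict) -> dict:
    result = {"right_function": actual_fn == expected_fn, "schema_valid": True}
    try:
        validate(instance=actual_args, schema=schema)
    except ValidationError:
        result["schema_valid"] = False
    return result

# A schema-valid call to the wrong function still fails half the eval:
print(grade("get_weather", "get_forecast", {"city": "SF"}, WEATHER_SCHEMA))
# {'right_function': False, 'schema_valid': True}
```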
- We have some feedback from the founder. We should probably cut this, but we wanna make it better. What, like, how do you think about the evolution of, like, the API design? I think to me, that's, like, the most important thing. So even with the OpenAI levels, like chatbots, I can understand what the API design looks like.
Reasoning, I can kind of understand it, even though like chain of thought kind of changes things. As you think about real-time voice, and then you think about agents, it's like, how do you think about how you design the API, and like what the shape of it is? - Yeah, so I think we're starting with the lowest level capabilities, and then we build on top of that as we know that they're useful.
So a really good example of this is real-time. We're actually going to be shipping audio capabilities in chat completions. So this is like the lowest level capability. So you supply audio, and you can get back raw audio, and it works at the request-response layer. But through building advanced voice mode, we realized ourselves that it's pretty hard to do with something like chat completions.
And so that led us to building this WebSocket API. So we really learned a lot from our own tools, and we think the chat completions thing is nice for certain use cases, like async stuff, but you're really gonna want a real-time API. And then as we test more with developers, we might see that it makes sense to have another layer of abstraction on top of that, something closer to client-side libraries.
But for now, that's where we feel we have a really good point of view. - So that's a question I have is, if I've got a half-hour-long audio recording, at the moment, the only way I can feed that in is if I call the WebSocket API and slice it up into little JSON base64 snippets and fire them all over.
In that case, I'd rather just give you, like an image in the chat completions API, a URL to my MP3 file as input. Is that something-- - That's what we're gonna do. - Oh, thank goodness for that. - Yes, it's in the blog post. I think it's a short one-liner, but it's rolling out, I think, in the coming weeks.
- Oh, wow, oh, really soon then. - The team has been sprinting, we're just putting finishing touches on stuff. - Do you have a feel for the length limit on that? - I don't have it off the top. - Okay. - I'm sorry. - 'Cause yeah, often I want to do, I do a lot of work with transcripts of hour-long YouTube videos, which currently, I run them through Whisper and then I do the transcript that way, but being able to do the multimodal thing, those would be really useful.
- Totally, yeah, we're really jazzed about it. We wanna basically give you the lowest-level capabilities we have, and the things that make them easier to use. And so, targeting kind of both, yeah. - I just realized what I can do, though, is I do a lot of Unix utilities, little Unix things.
I want to be able to pipe the output of a command into something which streams that up to the WebSocket API and then speaks it out loud. So I can do streaming speech of the output of things. That should work. Like, I think you've given me everything I need for that.
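Simon's pipe-to-speech idea would look roughly like the following sketch, reusing the assumed endpoint, headers, and event names from the earlier Realtime examples; the 24kHz PCM16 output format is also an assumption. Text comes in on stdin, the model is asked to read it aloud, and the returned audio bytes go to stdout so they can be piped into a player.

```python
#!/usr/bin/env python3
# Sketch: `ls -la | python speak.py | aplay -f S16_LE -r 24000 -c 1`
# Endpoint, headers, event names, and the audio format are assumptions.
import asyncio
import base64
import json
import os
import sys

import websockets

URL = "wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview"
HEADERS = {
    "Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}",
    "OpenAI-Beta": "realtime=v1",
}

async def speak(text: str) -> None:
    async with websockets.connect(URL, extra_headers=HEADERS) as ws:
        # Ask for a single audio response that reads the piped text aloud.
        await ws.send(json.dumps({
            "type": "response.create",
            "response": {
                "modalities": ["audio", "text"],
                "instructions": f"Read this command output aloud, verbatim:\n{text}",
            },
        }))
        async for message in ws:
            event = json.loads(message)
            if event["type"] == "response.audio.delta":
                sys.stdout.buffer.write(base64.b64decode(event["delta"]))
            elif event["type"] == "response.done":
                break

asyncio.run(speak(sys.stdin.read()))
```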
That's cool. - Yeah, excited to see what you build. - I heard there were multiple competing solutions, and you guys evaled them before you picked WebSockets, like server-sent events, polling. Can you give your thoughts on the live-updating paradigms that you guys looked at? 'Cause I think a lot of engineers have looked at stuff like this.
- Well, I think WebSockets are just a natural fit for bidirectional streaming. Other places I've worked, like Coinbase, we had a WebSocket API for pricing data, and I think it's just a very natural format. - So it wasn't even really that controversial at all? - I don't think it was super controversial.
I mean, we definitely explored the space a little bit, but I think we came to WebSockets pretty quickly, yeah. - Cool. Video? - Yeah, not yet, but, you know, possible in the future. - I actually was hoping for the ChatGPT desktop app with video today, because that was demoed.
- Yeah, this is dev day. - Yeah. - I think the moment we have the ability to send images over the WebSocket API, we get video. - My question is, how frequently? Yeah, because, yeah, I mean, sending a whole video frame of like a 1080p screen, maybe it might be too much.
What's the limitations on a WebSocket chunk going over? I don't know. - I don't have that off the top. - Like Google Gemini, you can do an hour's worth of video in their context window, just by slicing it up into one frame at 10 frames a second, and it does work.
So I don't know, but then that's the weird thing about Gemini is it's so good at you just giving it a flood of individual frames, it'll be interesting to see if GPT-4o can handle that or not. - Do you have any more feature requests? I know it's been a long day for everybody, but you got me a show right here, so.
- I want you to do all of the accounting for me. I want my users to be able to run my apps, and I want them to call your APIs with their user ID and have you go, oh, they've spent 30 cents, cut them off at a dollar, I can check how much they spent, all of that stuff, 'cause I'm having to build that at the moment, and I really don't want to.
I don't want to be a token accountant. I want you to do the token accounting for me. - Yeah, totally, I hear you. It's good feedback. - Well, how does that contrast with your actual priorities? Like, I feel like you have a bunch of priorities. They showed some on stage with multimodality and all that.
- Yeah. Yeah, it's hard to say. I would say things change really quickly. Things that are big blockers for user adoption we find very important, and yeah. It's a rolling prioritization, yeah. - No Assistants API update. - Not at this time, yeah. - Yeah. - I was hoping for an O1-native thing in Assistants.
- Yeah. - 'Cause I thought they would go well together. - We're still kind of iterating on the formats. I think there are some problems with the Assistants API, some things it does really well, and I think we'll keep iterating and land on something really good, but just, you know, it wasn't quite ready yet.
Some of the things that are good in the Assistants API are hosted tools. People really like hosted tools, and especially RAG. And then some things that are less intuitive is just how many API requests you need to get going with the Assistants API. - It's quite. - It's quite a lot.
Yeah, you gotta create an Assistant, you gotta create a thread, you gotta do all this stuff. So yeah, it's something we're thinking about. It shouldn't be so hard. - The only thing I've used it for so far is Code Interpreter. It's like it's an API to Code Interpreter. - Yes.
- Crazy exciting. - Yes, we wanna fix that and make it easier to use. - I want Code Interpreter over WebSockets. That would be wildly interesting. - Yeah, do you wanna bring your own Code Interpreter or you wanna use OpenAI as well? - I wanna use that 'cause Code Interpreter's a hard problem.
Sandboxing and all of that stuff is. - Yeah, but there's a bunch of Code Interpreter as a service things out there. - There are a few now, yeah. - Because there's, I think you don't allow arbitrary installation of packages. - Oh, they do. - Let's. - They really do.
- Let's use your hack. - Yeah, and I do. - Yeah, you can upload a pip package. - You can run, you can compile C code in Code Interpreter. - I know. - If you know how to do it. - That's a hack, that's a hack. - That's a glorious hack, though.
- Okay. - I've had it write me custom SQLite extensions in C and compile them and run them inside of Python and it works. (laughing) - I mean, yeah, there's others, E2B is one of them. Like, yeah, it'll be interesting to see what the real-time version of that would be.
Yeah. - Also, Michel, thank you for the update. We left the episode as what will voice mode look like? - Obviously, you knew what it looked like, but you didn't say it, so now you could. - Yeah, here we are. Hope you guys find it. - Yeah. - Cool.
- Awesome, thank you. - That's it. - Our final guest today, and also a familiar recent voice on the Latent Space pod, presented at one of the community talks at this year's Dev Day. Alistair Pullen of Cosine made a huge impression with all of you, special shout out to listeners like Jesse from Morph Labs, when he came on to talk about how he created synthetic datasets to fine-tune the largest LoRAs that had ever been created for GPT-4o, posting the highest-ever scores on SWE-Bench and SWE-Bench Verified while not getting recognition for it, because he refused to disclose his reasoning traces to the SWE-Bench team.
Now that OpenAI's o1-preview is announced, it is incredible to see the OpenAI team also obscure their chain-of-thought traces for competitive reasons, and still perform lower than Cosine's Genie model. We snagged some time with Ali to break down what has happened since his episode aired. Welcome back, Ali. - Thank you so much.
Thanks for having me. - So you just spoke at OpenAI Dev Day. What was the experience like? Did they reach out to you? You seem to have a very close relationship. - Yeah, so off the back of the work that we've done that we spoke about last time we saw each other, I think that OpenAI definitely felt that the work we've been doing around fine-tuning was worth sharing.
I would obviously tend to agree, but today I spoke about some of the techniques that we learned. Obviously, it was like a non-linear path arriving at where we've arrived, and the techniques that we built to build Genie. So I think I shared a few extra pieces about some of the techniques and how it really works under the hood, how you generate a dataset to show the model how to do what we want it to do.
And that was mainly what I spoke about today. I mean, yeah, they reached out. I was super excited at the opportunity, obviously. Like, it's not every day that you get to come and do this, especially in San Francisco. So yeah, they reached out and they were like, do you wanna talk at Dev Day?
You can speak about basically anything you want related to what you've built. And I was like, sure, that's amazing. I'll talk about fine-tuning, how you build a model that does this software engineering. So yeah. - Yeah, and the trick here is when we talked, O1 was not out. - No, it wasn't.
- Did you know about O1 or? - I didn't know. I knew some bits and pieces. No, not really. I knew a reasoning model was on the way. I didn't know what it was gonna be called. I knew as much as everyone else. Strawberry was the name back then.
- Because, you know, I'll fast forward. You were the first to hide your chain-of-thought reasoning traces as IP. - Yes. - Right, famously that got you in trouble with SWE-Bench or whatever. - Yes. I feel slightly vindicated by that now, not gonna lie. - And now, obviously, O1 is doing it.
- Yeah, the fact that, yeah, I mean, like, I think it's true to say right now that the reasoning of your model gives you the edge that you have. And, like, the amount of effort that we put into our data pipeline to generate these human-like reasoning traces was, I mean, that wasn't for nothing, that we knew that this was the way that you'd unlock more performance, getting them all to think in a specific way.
In our case, we wanted it to think like a software engineer. But, yeah, I think that the approach that other people have taken, like OpenAI, in terms of reasoning, has definitely shown us that we were going down the right path pretty early on. And even now, we've started replacing some of the reasoning traces in our Genie model with reasoning traces generated by O1, or at least in tandem with O1.
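Cosine's pipeline is proprietary, so the following is only a sketch of the general pattern Ali describes: have a reasoning model draft a software-engineer-style trace for a task, then fold the trace into a fine-tuning example. The model name, prompt, and record format here are illustrative assumptions, not a description of Genie's internals.

```python
# Sketch of the general pattern: draft a reasoning trace with a stronger model,
# then store (task, trace, final patch) as a chat-format fine-tuning record.
import json

from openai import OpenAI

client = OpenAI()

def draft_trace(issue: str, code_context: str) -> str:
    response = client.chat.completions.create(
        model="o1-preview",  # assumed reasoning model
        messages=[{
            "role": "user",
            "content": (
                "Think like a software engineer. Explain, step by step, how you would "
                f"diagnose and fix this issue:\n\n{issue}\n\nRelevant code:\n{code_context}"
            ),
        }],
    )
    return response.choices[0].message.content

def to_training_record(issue: str, trace: str, patch: str) -> str:
    return json.dumps({
        "messages": [
            {"role": "user", "content": issue},
            {"role": "assistant", "content": f"{trace}\n\nFinal patch:\n{patch}"},
        ]
    })
```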
And we've already started seeing improvements in performance from that point. But no, like, back to your point, in terms of, like, the whole, like, withholding them, I still think that that was the right decision to do because of the very reason that everyone else has decided to not share those things.
It's, it is exactly, it shows exactly how we do what we do. And that is our edge at the moment. So, yeah. - As a founder, so, they also featured Cognition on stage, talked about them. How does that make you feel that, like, you know, they're like, "Hey, O1 is so much better, makes us better." For you, it should be like, "Oh, I'm so excited about it, too." Because now, all of a sudden, it kind of, like, raises the floor for everybody.
Like, how should people, especially new founders, how should they think about, you know, worrying about the new model versus, like, being excited about them, just focusing on, like, the core product and maybe switching out some of the parts, like you mentioned? - Yeah, I, speaking for us, I mean, obviously, like, we were extremely excited about O1 because at that point, the process of reasoning is obviously very much baked into the model.
We fundamentally, if you, like, remove all distractions and everything, we are a reasoning company, right? We want to reason in the way that a software engineer reasons. So, when I saw that model announced, I thought immediately, "Well, I can improve the quality "of my traces coming out of my pipeline." So, like, my signal-to-noise ratio gets better.
And then, not immediately, but down the line, I'm going to be able to train those traces into O1 itself. So, I'm going to get even more performance that way as well. So, it's, for us, a really nice position to be in, to be able to take advantage of it, both on the prompted side and the fine-tuned side.
And also because, fundamentally, like, we are, I think, fairly clearly in a position now where we don't have to worry about what happens when O2 comes out, what happens when O3 comes out. This process continues, like, even going from, you know, when we first started going from 3.5 to 4, we saw this happen.
And then, from 4 Turbo to 4o, and then from 4o to O1, we've seen the performance get better every time. And I think, I mean, like, the crude advice I'd give to any startup founder is try to put yourself in a position where you can take advantage of the same, you know, like, sea-level rise every time, essentially.
- Do you make anything out of the fact that you were able to take 4o and fine-tune it higher than O1 currently scores on SWE-Bench Verified? - Yeah, I mean, like, that was obviously, to be honest with you, you realized that before I did. But it was-- - Adding value.
- Yes, absolutely, that's a value-add investor right there. No, obviously, I think it's been, that in of itself is really vindicating to see because I think we have heard from some people, not a lot of people, but some people saying, "Well, okay, well, if O1 can reason, "then what's the point of doing your reasoning?" But it shows how much more signal is in, like, the custom reasoning that we generate.
And again, it's the very sort of obvious thing. If you take something that's made to be general and you make it specific, of course it's gonna be better at that thing, right? So it was obviously great to see, like, we still are better than O1 out of the box, you know, even with an older model, and I'm sure that that delta will continue to grow once we're able to train O1, and once we've done more work on our dataset using O1, like, that delta will grow as well.
- It's not obvious to me that they will allow you to fine-tune O1, but, you know, maybe they'll try. I think the core question that OpenAI really doesn't want you to figure out is can you use an open-source model and beat O1? - Interesting. - Because you basically have shown proof of concept that a non-O1 model can beat O1, and their whole O1 marketing is, "Don't bother trying." Like, "Don't bother stitching together "multiple chain-of-thought calls.
"We did something special, secret sauce. "You don't know anything about it." And somehow, you know, your 4.0 chain-of-thought reasoning as a software engineer is still better. Maybe it doesn't last. Maybe they're gonna run O1 for five hours instead of five minutes, and then suddenly it works. So I don't know.
- It's hard to know. I mean, one of the things that we just want to do out of sheer curiosity is do something like fine-tune 405B on the same dataset. Like, same context window length, right? So it should be fairly easy. We haven't done it yet. Truthfully, we have been so swamped with the waitlist, shipping product, you know, Dev Day, like, you know, onboarding customers from our waitlist.
All these different things have gotten in the way, but it is definitely something out of more curiosity than anything else I'd like to try out. But also, it opens up a new vector of, like, if someone has a VPC where they can't deploy an OpenAI model, but they might be able to deploy an open-source model, it opens that up for us, as well, from a customer perspective.
So it'll probably be quite useful. I'd be very keen to see what the results of it are. - I suspect the answer is yes, but there may be, it may be hard to do. So, like, Reflection 70B was, like, a really crappy attempt at doing it. You guys were much better, and that's why we had you on the show.
I, yeah, I'm interested to see if there's an open O1, basically. People want an open O1. - Yeah, I'm sure they do. As soon as we do it, and, like, once we've wrapped up what we're doing in San Francisco, I'm sure we'll give it a go. I spoke to some guys today, actually, about fine-tuning 405B, who might be able to allow us to do it very easily.
I don't want to have to, basically, do all the setup myself, so. Yeah, that might happen sooner rather than later. - Yeah. - Anything? - Anything from the releases today that you're super excited about? So, prompt caching, I'm guessing, when you're, like, dealing with a lot of code bases, that might be helpful.
Is there anything with vision fine-tuning, related to, like, more, like, UI-related development? - Yeah, definitely. Yeah, I mean, we were, like, we were talking. It's funny, like, my co-founder, Sam, who you've met, and I were talking about the idea of doing vision fine-tuning, like, way back, like, well over a year ago, before Genie existed as it does now.
When we collected our original dataset to do what we do now, whenever there were image links and links to, like, graphical resources and stuff, we also pulled that in as well. We never had the opportunity to use it, but it's something we have in storage. And, again, like, when we have the time, it's something that I'm super excited, particularly on the UI side, to be able to, like, leverage.
Particularly, if you think about one of the things, not to sidetrack, but one of the things we've noticed is, I know SWE-Bench is, like, the most commonly talked about thing, and, honestly, it's an amazing project, but one of the things we've learned the most from actually shipping this product to users is it's a pretty bad proxy at telling us how competent the model is.
So, for example, when people are doing, like, React development using Genie, for us, it's impossible to know whether what it's written has actually done, you know, done what it wanted to. So, at least, even using, like, the fine-tuning for vision to be able to help eval, like, what we output, is already something that's very useful.
But also, in terms of being able to pair, here's a UI I want, here's the code that actually, like, represents that UI, is also gonna be super useful as well, I think. In terms of generally, what have I been most impressed by? The distillation thing is awesome. I think we'll probably end up using it in places.
But what it shows me more broadly about OpenAI's approach is they're gonna be building a lot of the things that we've had to hack together internally, in terms from a tooling point of view, just to make our lives so much easier. And I've spoken to, you know, John, the head of fine-tuning, extensively about this.
But there's a bunch of tools that we've had to build internally for things like dealing with model lineage, dealing with data set lineage, because it gets so messy so quickly, that we would love OpenAI to build. Like, absolutely would love them to build. It's not what gives us our edge, but it certainly means that then we don't have to build it and maintain it afterwards.
So, it's a really good first step, I think, in, like, the overall maturity of the fine-tuning product and API, in terms of where they're going, to see those early products. And I think that they'll be continuing in that direction going on. - Did you not, so there's a very active ecosystem of LLM ops tools.
Did you not evaluate those before building your own? - We did, but I think fundamentally, like... - No moat. (laughs) - Yeah, like, I think, in a lot of places, it was never a big enough pain point to be like, "Oh, we absolutely must outsource this." It's definitely, in many places, something that you can hack a script together in a day or two, and then hook it up to our already existing internal tool UI, and then you have, you know, what you need.
And whenever you need a new thing, you just tack it on. But for, like, all of these LLM ops tools, I've never felt the pain point enough to really, like, bother. And that's not to deride them at all. I'm sure many people find them useful, but just for us as a company, we've never felt the need for them.
So it's great that OpenAI are gonna build them in, 'cause it's really nice to have them there, for sure. But it's not something that, like, I'd ever consider really paying for externally or something like that, if that makes sense. - Yeah. Does voice mode factor into Genie? - Maybe one day, that'd be sick, wouldn't it?
- I don't know. - Yeah, I think so. - You're the first person that we are, we've been asking this question to everybody. - Yeah, I think-- - You're the first person to not mention voice mode. - Oh, well, it's currently so distant from what we do. But I definitely think, like, this whole talk of we want it to be a full-on AI software engineering colleague, like, there is definitely a vector in some way that you can build that in.
Maybe even during the ideation stage, talking through a problem with Genie in terms of how we wanna build something down the line. I think that might be useful, but honestly, like, that would be nice to have when we have the time, yeah. - Yeah, amazing. One last question, on your, in your talk, you mentioned a lot about curating your data and your distribution and all that.
- Yes. - And before we sat down, you talked a little bit about having to diversify-- - Absolutely. - Your data set. - Yeah. - What's driving that, what are you finding? - So, we have been rolling people off the wait list that we sort of amassed when we announced when I last saw you.
And it's been really interesting because, as I may have mentioned on the podcast, like, we had to be very opinionated about the data mix and the data set that we put together for like, sort of the V0 of Genie. Again, like, to your point, JavaScript, JavaScript, JavaScript, Python, right?
There's a lot of JavaScript in its various forms in there. But it turns out that when we've shipped it to the very early alpha users we rolled it out to, for example, we had some guys using it with a C# code base. And C# currently represents, I think, about 3% of the overall data mix.
And they weren't getting the levels of performance that they saw when they tried it with a Python code base. And it was obviously, like, not great for them to have a bad experience, but it was nice to be able to correlate it with the actual, like, objective data mix that we saw.
So, we did, what we've been doing is, like, little top-up fine-tunes where we take, like, the general Genie model and do an incremental fine-tune on top with just a bit more data for a given, you know, vertical language. And we've been seeing improvements coming from that. So, again, this is one of the great things about sort of baptism by fire and letting people use it and giving you feedback and telling you where it sucks.
Because that is not something that we could have just known ahead of time. So, I want that data mix to, over time as we roll it out to more and more people, and we are trying to do that as fast as possible, we're still a team of five for the time being, to be as general and as representative of what our users do as possible and not what we think they need.
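The "top-up" fine-tune idea maps onto the standard fine-tuning API roughly as in the sketch below: take an existing fine-tuned model and run a small incremental job with extra data for the underrepresented language. The model ID and file name are placeholders, and whether your base model supports continued fine-tuning of a fine-tuned checkpoint is something to verify against the docs.

```python
# Sketch: an incremental "top-up" fine-tune on top of an existing fine-tuned model.
from openai import OpenAI

client = OpenAI()

# A small, C#-heavy dataset to shore up an underrepresented slice of the data mix.
csharp_file = client.files.create(file=open("topup_csharp.jsonl", "rb"), purpose="fine-tune")

job = client.fine_tuning.jobs.create(
    training_file=csharp_file.id,
    model="ft:gpt-4o-2024-08-06:acme::abc123",  # hypothetical existing fine-tuned model ID
)
print(job.id, job.status)
```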
- Yeah, so, every customer is going to have their own little fine-tune. - There is gonna be the option to, yeah, there is gonna be the option to fine-tune the model on your code base. It won't be in, like, the base pricing tier, but you will definitely be able to do that.
It will go through all of your code base history, learn how everything happened, and then you'll have an incrementally fine-tuned Genie just on your code base. And that's what enterprises really love the idea of. - Lovely. - Yeah. - Perfect. Anything else? - Cool. - Yeah, that's it. Thank you so much.
- Thank you so much, guys. Good to see you. - Thank you. - Lastly, this year's Dev Day ended with an extended Q&A with Sam Altman and Kevin Weil. We think both the questions asked and answers given were particularly insightful, so we are posting what we could snag of the audio here from publicly available sources, credited in the show notes, for you to pick through.
If the poorer quality audio here is a problem, we recommend waiting for approximately one to two months until the final video is released on YouTube. In the meantime, we particularly recommend Sam's answers on the moderation policy, on the underappreciated importance of agents and AI employees beyond level three, and his projections of the intelligence of the O1, O2, and O3 models in the future.
(audience cheering) - All right, I think everybody knows you. For those who don't know me, I'm Kevin Weil, Chief Product Officer at OpenAI. I have the good fortune of getting to turn the amazing research that our research teams do into the products that you all use every day and the APIs that you all build on every day.
I thought we'd start with some audience engagement here. So on the count of three, I'm gonna count to three, and I want you all to say, of all the things that you saw launched here today, what's the first thing you're gonna integrate? It's the thing you're most excited to build on, all right?
You gotta do it, all right? One, two, three. - Realtime API. - Realtime API. (audience laughing) - I'll say it personally, I'm super excited about our distillation products. I think that's gonna be really, really interesting. (audience applauding) I'm also excited to see what you all do with advanced voice, with the Realtime API, and with vision fine-tuning in particular.
So, okay. So I've got some questions for Sam. I've got my CEO here in the hot seat. Let's see if I can't make a career-limiting move. (audience laughing) So we'll start with an easy one, Sam. How close are we to AGI? (audience laughing) - You know, we used to, every time we finished a system, we would say, like, in what way is this not an AGI?
And it used to be, like, very easy. Like, they got a little robotic hand that does a Rubik's Cube or a Dota bot, and it's like, oh, it does some things, but definitely not an AGI. It's obviously harder to say now. So we're trying to, like, stop talking about AGI as this general thing, and we have this levels framework, because the word AGI has become so overloaded.
So, like, real quickly, we use one for chatbots, two for reasoners, three for agents, four for innovators, five for organizations, like, roughly. I think we clearly got to level two with O1. And it, you know, can do really quite impressive cognitive tasks, it's a very smart model.
It doesn't feel AGI-like in a few important ways, but I think if you just do the one next step of making it, you know, very agent-like, which is on level three, and which I think we will be able to do in the not-distant future, it will feel surprisingly capable.
Still probably not something that most of you would call an AGI, though, maybe some of you would, but it's gonna feel like, all right, this is like a significant thing. And then, the leap, and I think we did that pretty quickly, the leap from that to something that can really increase the rate of new scientific discovery, which, for me, is like a very important part of having an AGI, I feel a little bit less certain on that, but not a long time.
Like, I think all of this now is gonna happen pretty quickly, and if you think about what happened from last decade to this one, in terms of model capabilities, and you're like, eh, I mean, if you go from O1 on a hard problem back to, like, 4 Turbo that we launched 11 months ago, you'll be like, wow, this has happened pretty fast.
And I think the next year will be very steep progress, next two years will be very steep progress, harder than that, hard to see a lot of certainty, but I would say, like, not very, and at this point, the definitions really matter. And the fact that the definitions matter this much somehow means we're, like, getting pretty close.
- Yeah, and, you know, there used to be this sense of AGI where it was like, it was a binary thing, and you were gonna go to sleep one day, and there was no AGI, and wake up the next day, and there was AGI. I don't think that's exactly how we think about it anymore, but how have your views on this evolved?
- You know, the one, I agree with that, I think we're, like, you know, in this, like, kind of period where it's gonna feel very blurry for a while, and the, you know, is this AGI yet, or is this not AGI, or kind of, like, at what point, it's just gonna be this, like, smooth exponential, and, you know, probably most people looking back at history won't agree, like, when that milestone was hit, and will just realize it was, like, a silly thing.
Even the Turing test, which I thought always was, like, this very clear milestone, you know, there was this, like, fuzzy period, it kind of, like, went whooshing by and no one cared, but I think the right framework is just this one exponential. That said, if we can make an AI system that is, like, materially better than all of OpenAI at doing AI research, that does feel to me like some sort of important discontinuity.
It's probably still wrong to think about it that way. It probably still is this smooth exponential curve, but that feels like a new milestone. - Is OpenAI still as committed to research as it was in the early days? Will research still drive the core of our advancements in our product development?
- Yeah, I mean, I think more than ever. There was, like, a time in our history when the right thing to do was just to scale up compute, and we did that with conviction, and we had a spirit of, like, we'll do whatever works. You know, like, we have this mission, we want to, like, build safe AGI, figure out how to share the benefits.
If the answer is, like, rack up GPUs, we'll do that. And right now, the answer is, again, really push on research. And I think you see this with O1. Like, that is a giant research breakthrough that we were attacking from many vectors over a long period of time that came together in this really powerful way.
We have many more giant research breakthroughs to come, but the thing that I think is most special about OpenAI is that we really deeply care about research, and we understand how to, I think, it's easy to copy something you know works. And, you know, I don't mean that as a bad thing.
Like, when people copy OpenAI, I'm like, great, the world gets more AI, that's wonderful. But to do something new for the first time, to, like, really do research in the true sense of it, which is not, like, you know, let's barely get SOTA out of this thing, or, like, let's tweak this.
But, like, let's go find the new paradigm, and the one after that, and the one after that. That is what motivates us. And I think the thing that is special about us as an org, besides the fact that we, you know, married product and research and all this other stuff together, is that we know how to run that kind of a culture that can go, that can go push back the frontier.
And that's really hard. But we love it. And that's, you know, I think we're gonna have to do that a few more times, and then we can take you on. - Yeah, I'll say, like, the litmus test for me, coming from the outside, from, you know, sort of normal tech companies, of how critical research is to OpenAI, is that building product at OpenAI is fundamentally different than any other place that I have ever done it before.
You know, normally, you have some sense of your tech stack. You have some sense of what you have to work with, and what capabilities computers have. And then you're trying to build the best product, right? You're figuring out who your users are, what problems they have, and how you can help solve those problems for them.
There is that at OpenAI. But also, the state of, like, what computers can do just evolves every two months, three months. And suddenly, computers have a new capability that they've never had in the history of the world, and we're trying to figure out how to build a great product and expose that for developers and our APIs and so on.
And, you know, you can't totally tell what's coming. It's coming through the mist a little bit at you, and gradually taking shape. It's fundamentally different than any other company I've ever worked at, and it's, I think-- - Is that the thing that has most surprised you? - Yes. Yeah, and it's interesting how, even internally, we don't always have a sense.
You have, like, okay, I think this capability's coming, but is it going to be, you know, 90% accurate, or 99% accurate in the next model? Because the difference really changes what kind of product you can build, and you know that you're gonna get to 99, you don't quite know when, and figuring out how you put a roadmap together in that world is really interesting.
- Yeah, the degree to which we have to just, like, follow the science and let that determine what we go work on next and what products we build and everything else is, I think, hard to get across. Like, we have guesses about where things are gonna go. Sometimes we're right, often we're not.
But if something starts working, or if something doesn't work that you thought was gonna work, our willingness to just say, we're gonna, like, pivot everything and do what the science allows, and you don't get to, like, pick what the science allows, that's surprising. - I was sitting with an Enterprise customer a couple weeks ago, and they said, you know, one of the things we really want, this is all working great, we love this, one of the things we really want is a notification 60 days in advance when you're gonna launch something.
And I was like, I want that, too. (audience laughing) All right, so I'm going through, these are a bunch of questions from the audience, by the way, and we're gonna try and also leave some time at the end for people to ask some audience questions. So we've got some folks with mics, and when we get there, they'll be thinking.
But next thing is, so many in the alignment community are genuinely concerned that OpenAI is now only paying lip service to alignment. Can you reassure us? - Yeah. I think it's true we have a different take on alignment than maybe what people write about on whatever that internet forum is.
But we really do care a lot about building safe systems. We have an approach to doing it that has been informed by our experience so far. And to touch on that other question, which is you don't get to pick where the science goes, we want to figure out how to make capable models that get safer and safer over time.
And a couple of years ago, we didn't think the whole strawberry or the O1 paradigm was gonna work in the way that it's worked. And that brought a whole new set of safety challenges, but also safety opportunities. And rather than kind of plan to make theoretical, once superintelligence gets here, here's the 17 principles, we have an approach of figure out where the capabilities are going, and then work to make that system safe.
And O1 is obviously our most capable model ever, but it's also our most aligned model ever, by a lot. And as these models get better intelligence, better reasoning, whatever you want to call it, the things that we can do to align them, the things we can do to build really safe systems across the entire stack, our toolset keeps increasing as well.
So, we have to build models that are generally accepted as safe and robust to be able to put them in the world. And when we started OpenAI, what the picture of alignment looked like, and what we thought the problems that we needed to solve were going to be, turned out to be nothing like the problems that actually are in front of us and that we had to solve now.
And also, when we made the first GPT-3, if you asked me for the techniques that would have worked for us to be able to now deploy our current systems as generally accepted to be safe and robust, they would not have been the ones that turned out to work. So, by this idea of iterative deployment, which I think has been one of our most important safety stances ever, and sort of confronting reality as it's in front of us, we've made a lot of progress, and we expect to make more.
We keep finding new problems to solve, but we also keep finding new techniques to solve them. All of that said, I think worrying about the sci-fi ways this all goes wrong is also very important. We have people thinking about that. It's a little bit less clear, kind of, what to do there, and sometimes you end up backtracking a lot, but, but I don't think it's, I also don't think it's fair to say we're only gonna work on the thing in front of us.
We do have to think about where this is going, and we do that, too. And I think if we keep approaching the problem from both ends like that, most of our thrust on the, like, okay, here's the next thing, we're gonna deploy this, what needs to happen to get there, but also, like, what happens if this curve just keeps going?
That's been an effective strategy for us. - I'll say also, it's one of the places where I really, I really like our philosophy of iterative deployment. When I was at Twitter, back, I don't know, a hundred years ago now, Ev said something that stuck with me, which is, no matter how many smart people you have inside your walls, there are way more smart people outside your walls.
And so, when we try and get our, you know, it'd be one thing if we just said we're gonna try and figure out everything that could possibly go wrong within our walls, and it would be just us and the red teamers that we can hire, and so on. And we do that, we work really hard at that.
But also, launching iteratively and launching carefully, and learning from the ways that folks like you all use it, what can go right, what can go wrong, I think is a big way that we get these things right. - I also think that as we head into this world of agents off doing things in the world, that is gonna become really, really important.
As these systems get more complex and are acting off longer horizons, the pressure testing from the whole outside world, I really, really think will be critical. - Yeah, so we'll go, actually, we'll go off of that. Maybe talk to us a bit more about how you see agents fitting in middle of the night as long-term plans.
- What do you think? - That's a huge part of it. I mean, I think the exciting thing is this set of models, O1 in particular, and all of its successors, are going to be what makes this possible. Because you finally have the ability to reason, to take hard problems, break them into simpler problems, and act on them, I mean, I think 2025 is gonna be the year agents really get big.
- Yeah, I mean, chat interfaces are great, and they will, I think, have an important place in the world, but the... When you can ask a model, when you can ask ChatGPT or some agent or something, and it's not just, like, you get a kind of quick response, or even you get 15 seconds of thinking and O1 gives you a nice piece of code back, or whatever.
But you can really give something a multi-turn interaction with environments or other people or whatever, like think for the equivalent of multiple days of human effort, and a really smart, really capable human, and have stuff happen, we all say that, we're all like, oh yeah, we can start the next thing, this is coming, this is gonna be another thing, and we just talk about it like, okay.
You know, it's like the next model in evolution. I would bet, and we don't really know until we get to use these, that it's... We'll, of course, get used to it quickly, people get used to any new technology quickly, but this will be a very significant change to the way the world works in a short period of time.
- Yeah, it's amazing, somebody was talking about getting used to new capabilities in AI models and how quickly, actually, I think it was about Waymo, but they were talking about how in the first 10 seconds of using Waymo, they were like, oh my god, is this thing, they're like, there's a bug, let's watch out, and then 10 minutes in, they were like, oh, this is really cool, and then 20 minutes in, they were like, checking their phone, bored.
You know, it's amazing how much your sort of internal firmware updates for this new stuff very quickly. - Yeah, like I think that people will ask an agent to do something for them that would have taken them a month, and they'll finish in an hour, and it'll be great, and then they'll have like 10 of those at the same time, and then they'll have like 1,000 of those at the same time, and by 2030 or whatever, we'll look back and be like, yeah, this is just like what a human is supposed to be capable of, what a human used to like, you know, grind at for years or whatever, or many humans used to grind at for years, like I just now like ask a computer to do it, and it's like done in an hour.
That's, why is it not a minute? (audience laughing) - Yeah, it's also, it's one of the things that makes having an amazing development platform great, too, because, you know, we'll experiment, and we'll build some agentic things, of course, and like we've already got, I think just like, we're just pushing the boundaries of what's possible today.
You've got groups like Cognition doing amazing things in coding, like Harvey and Casetext, you got Speak doing cool things with language translation, like we're beginning to see this stuff work, and I think it's really gonna start working as we continue to iterate these models. - One of the very fun things for us about having this development platform is just getting to like watch the unbelievable speed and creativity of people that are building these experiences, like developers, very near and dear to our heart.
It's kind of like the first thing we launched, and many of us came from building on platforms, but so much of the capability of these models and great experiences have been built by people building on the platform. We'll continue to try to offer, like, great first-party products, but we know that we'll only ever be, like, a small, narrow slice of the apps or agents or whatever people build in the world, and seeing what has happened in the world in the last 18, 24 months, it's been, like, quite amazing to watch.
- I'm gonna keep going on the agent front here. What do you see as the current hurdles for computer-controlling agents? - Safety and alignment. Like, if you are really going to give an agent the ability to start clicking around your computer, which you will, you are going to have a very high bar for the robustness and the reliability and the alignment of that system.
So, technically speaking, I think that, you know, we're getting like pretty close to the capability side, but this sort of agent safety and trust framework, that's gonna, I think, be the long haul. - And now I'll kind of ask you a question that's almost the opposite of one of the questions from earlier.
Do you think safety could act as a false positive and actually limit public access to critical tools that would enable a more egalitarian world? - The honest answer is yes, that will happen sometimes. Like, we'll try to get the balance right, but if we were fully, or didn't care about, like, safety and alignment at all, could we have launched O1 faster?
Yeah, we could have done that. It would have come at a cost. It would have been things that would have gone really wrong. I'm very proud that we didn't. The cost, you know, I think, would have been manageable with O1, but by the time of O3 or whatever, like, maybe it would be pretty unacceptable.
And so, starting on the conservative side, like, you know, I don't think people are complaining, like, oh, voice mode, like, it won't say this offensive thing and I really want it to, and, you know, come on, let it offend me. You know what, I actually mostly agree. If you are trying to get O1 to say something offensive, it should follow the instructions of its user most of the time.
There's plenty of cases where it shouldn't. But we have, like, a long history of when we put a new technology into the world, we start on the conservative side. We try to give society time to adapt. We try to understand where the real harms are versus sort of, like, kind of more theoretical ones.
And that's, like, part of our approach to safety. And not everyone likes it all the time. I don't even like it all the time. But if we're right that these systems are, and we're gonna get it wrong too, like, sometimes we won't be conservative enough in some area. But if we're right that these systems are going to get as powerful as we think they are, as quickly as we think they might, then I think starting that way makes sense.
And, you know, we, like, relax over time. - Totally agree. What's the next big challenge for a startup that's using AI as a core feature? I'll say, I've got one, which is, I think one of the challenges, and we face this too, 'cause we're also building products on top of our own models is trying to find the, kind of the frontier.
You wanna be building, these AI models are evolving so rapidly, and if you're building for something that the AI model does well today, it'll work well today, but it's gonna feel old tomorrow. And so you wanna build for things that the AI model can just barely not do. You know, where maybe the early adopters will go for it and other people won't quite, but that just means that when the next model comes out, as we continue to make improvements, that use case that just barely didn't work, you're gonna be the first to do it, and it's gonna be amazing.
But figuring out that boundary is really hard. I think it's where the best products are gonna get built up. - Totally agree with that. The other thing I wanna add is, I think it's, like, very tempting to think that a technology makes a startup, and that is almost never true. No matter how cool a new technology or a new, sort of, like, tech tidal wave is, it doesn't excuse you from having to do all the hard work of building a great company that is going to have durability or, like, accumulated advantage over time.
No matter how cool a new technology or a new sort of, like, tech title it is, it doesn't excuse you from having to do all the hard work of building a great company that is going to have durability or, like, accumulated advantage over time. And we hear from a lot of startups, and I see this as, like, a very common thing, which is, like, I can do this incredible thing, I can make this incredible service, and that seems like a complete answer, but it doesn't excuse you from any of, like, the normal laws of business.
You still have to, like, build a good business and a good strategic position. And I think a mistake is that, in the unbelievable excitement and updraft of AI, people are very tempted to forget that. - This is an interesting one. The mode of voice is like tapping directly into the human API.
How do you ensure ethical use of such a powerful tool with obvious abilities and manipulation? - Yeah, you know, voice mode was a really interesting one for me. It was, like, the first time that I felt like I sort of got really tricked by an AI in that, when I was playing with the first beta of it, I couldn't, like, I couldn't stop myself.
I mean, I kind of, like, I still say, like, "please" to ChatGPT. But in voice mode, I, like, couldn't not, kind of, use the normal niceties. I was, like, so convinced, like, ah, it might be a real person, like, you know? And obviously, it's just, like, hacking some circuit in my brain, but I really felt it with voice mode.
And I sort of still do. I think this is a more, this is an example of, like, a more general thing that we're gonna start facing, which is, as these systems become more and more capable, and as we try to make them as natural as possible to interact with, they're gonna, like, hit parts of our neural circuitry that have, like, evolved to deal with other people.
And, you know, there's, like, a bunch of clear lines about things we don't wanna do. Like, we don't. Like, there's a whole bunch of, like, weird personality growth hacking, like, I think vaguely socially manipulative stuff we could do. But then there's these, like, other things that are just not nearly as clear-cut.
Like, you want the voice mode to feel as natural as possible, but then you get across the uncanny valley, and it, like, at least in me, triggers something. And, you know, me saying, like, please and thank you to ChatGPT, no problem, probably a good thing to do. You never know.
(audience laughs) But I think this, like, really points at the kinds of safety and alignment issues we have to start thinking about in all of this. - All right, back to brass tacks. Sam, when's O1 gonna support function tools? - Do you know? - Before the end of the year. There are three things that we really wanna get in for-- (audience applauds) We're gonna record this, take this back to the research team, show them how bad we need to do this.
I mean, there are a handful of things that we really wanted to get into O1, and, you know, it's a balance of, should we get this out to the world earlier and begin, you know, learning from it, learning from how you all use it, or should we launch a fully complete thing that has all the abilities that every other model we've launched has?
I'm really excited to see things like system prompts and structured outputs and function calling make it into O1. We will be there by the end of the year. It really matters to us, too. (audience applauds) - In addition to that, just 'cause I can't resist this opportunity to reinforce this, like, we will get all of those things in and a whole bunch more things you'll have asked for.
The model is gonna get so much better so fast. Like, we are so early. This is like, you know, maybe it's the GPT-2 scale moment, but like, we know how to get to GPT-4, and we have the fundamental stuff in place now to get to GPT-4. And in addition to planning for us to build all of those things, plan for the model to just get, like, radically smarter, like, you know, hope you all come back next year and plan for it to feel like way more of a year of improvement than from GPT-4o to O1.
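For reference, here is a minimal, hedged sketch of the request shape being discussed, a system prompt plus a function-calling tool, as it already works in the Chat Completions API on models like gpt-4o; the panel's point is that the same capabilities are expected to land in O1. The model name and the get_weather tool are illustrative assumptions, not something demonstrated on stage.

```python
# Minimal sketch, assuming the OpenAI Python SDK v1.x and OPENAI_API_KEY in the
# environment. It pairs a system prompt with a function-calling tool on a model
# that already supports them; the panel says the same is coming to O1.
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # illustrative tool, not from the talk
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4o",  # stand-in until O1 supports tools
    messages=[
        {"role": "system", "content": "You are a concise travel assistant."},
        {"role": "user", "content": "Do I need an umbrella in San Francisco today?"},
    ],
    tools=tools,
)

# If the model decides it needs the tool, this holds the requested call(s).
print(resp.choices[0].message.tool_calls)
```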
(audience applauds) - What feature or capability of a competitor do you really admire? - I think Google's notebook thing is super cool. - Yeah. - What are they called? - NotebookLM. - NotebookLM, yeah. Uh, I was like, I woke up early this morning and I was like, looking at examples on Twitter and I was just like, this is like, this is just cool.
Like, this is just a good, cool thing. And like, I think not enough of, not enough of the world is like, shipping new and different things. It's mostly like, the same stuff. But that, I think, is like, that brought me a lot of joy this morning. - Yeah. - It was very, very well done.
- One of the things I really appreciate about that product is, there's the, just the format itself is really interesting, but they also nailed the podcast-style voices. - Yes. - They have really nice microphones. (audience laughs) They have these sort of sonorous voices. Did you guys see, somebody on Twitter was saying, like, the cool thing to do is take your LinkedIn and, you know, give it to Notebook LM, and you'll have two podcasters riffing back and forth about how amazing you are.
(audience laughs) All of your accomplishments over the years. I'll say mine is, I think Anthropic did a really good job on projects. It's kind of a different take on what we did with GPTs. And GPTs are a little bit more long-lived. It's something you build and can use over and over again.
Projects are kind of the same idea, but like, more temporary, meant to be kind of stood up, used for a while, and then you can move on. And that different mental model makes a difference. I think they did a really nice job with that. All right, we're getting close to audience questions, so be thinking of what you want to ask.
So at OpenAI, how do you balance what you think users may need versus what they actually need today? - Also, I've got a question for you. - Yeah, well, I think it does get back to a bit of what we were saying around trying to build for what the model can just like, not quite do, but almost do.
But it's a real balance, too, as we, you know, we support over 200 million people every week on ChatGPT. You also can't say, no, it's cool, like, deal with this bug or this issue for three months, we've got something really cool coming. You've got to solve for the needs of today.
And there are some really interesting product problems. I mean, you think about, I'm speaking to a group of people who know AI really well. Think of all the people in the world who have never used any of these products, and that is the vast majority of the world still.
You're basically giving them a text interface, and on the other side of the text interface is this like alien intelligence that's constantly evolving that they've never seen or interacted with, and you're trying to teach them all the crazy things that you can actually do, and all the ways it can help and integrate into your life and solve problems for you.
But people don't know what to do with it. You know, like, you come in, and you're just like, people type, like, hi, and it responds, you know, hey, great to see you, like, how can I help you today? And then you're like, okay, I don't know what to say, and then you end up, you kind of walk away, and you're like, well, I didn't see the magic in that.
And so it's a real challenge figuring out how you, I mean, we all have a hundred different ways that we use ChatGPT and AI tools in general, but teaching people what those can be, and then bringing them along as the model changes month by month by month, and suddenly gains these capabilities way faster than we as humans gain capabilities, it's a really interesting set of problems, and I know it's one that you all solve in different ways as well.
- I have a question. Who feels like they've spent a lot of time with O1 and would say, like, I feel definitively smarter than that thing? (audience laughing) Do you think you still will be by O2? (audience laughing) No one taking the bet of, like, being smarter than O2?
So, one of the challenges that we face is, like, we know how to go do this thing that we think will be, like, probably smarter than all of us across a broad array of tasks, and yet we have to, like, still fix bugs and do the hey-how-are-you problem. And mostly what we believe is that if we keep pushing on model intelligence, people will do incredible things with that. You know, we want to build the smartest, most helpful models in the world, and people then find all sorts of ways to use that, and build on top of that.
It has been definitely an evolution for us, to not just be entirely research-focused. We do have to fix all those bugs and make this super usable, and I think we've gotten better at balancing that. But still, as part of our culture, I think we trust that if we can keep pushing on intelligence, people, including all of you out here, will build just incredible things with that capability.
- Yeah, I think it's a core part of the philosophy, and you do a good job pushing us to always, we'll basically incorporate the frontier of intelligence into our products, both in the APIs, and into our first-party products, because it's easy to kind of stick to the thing you know, the thing that works well, but you're always pushing us to like, get the frontier in, even if it only kind of works, because it's gonna work really well soon, so I always find that a really helpful push.
You kind of answered the next one. You do say please and thank you to the models, I'm curious, how many people say please and thank you? - Isn't that so interesting? - I do, too. (laughing) I kind of can't, I mean, I'd feel bad if I don't. Okay, last question, and then we'll go into audience questions for the last 10 or so minutes.
Do you plan to build models specifically made for agentic use cases, things that are better at reasoning and tool calling? - We plan to make models that are great at agentic use cases; that'll be a key priority for us over the coming months. "Specifically" is a hard thing to ask for, 'cause I think it's also just how we keep making smarter models.
So yes, there's some things like tool use and function calling that we need to build that'll help, but mostly we just wanna make the best reasoning models in the world; those will also be the best agentic models in the world. Cool, let's go to audience questions. I don't know who's got the mic.
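As a rough illustration of the tool-use loop that agentic use cases build on, here is a hedged sketch: the model requests a function call, the application executes it, and the result goes back as a tool message for a second turn. The search_docs tool and the model name are placeholder assumptions, not anything shown at Dev Day.

```python
# Hedged sketch of the basic tool-calling loop behind agentic use cases: the
# model asks for a function call, the app runs it, and the result is returned
# as a "tool" message for a follow-up turn. Assumes the OpenAI Python SDK v1.x;
# search_docs is a made-up placeholder tool.
import json
from openai import OpenAI

client = OpenAI()

def search_docs(query: str) -> str:
    # Placeholder "tool" -- a real agent would hit a search index or database.
    return f"Top result for {query!r}: rotate keys from the dashboard settings page."

tools = [{
    "type": "function",
    "function": {
        "name": "search_docs",
        "description": "Search internal documentation.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

messages = [{"role": "user", "content": "How do I rotate my API key?"}]
first = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
msg = first.choices[0].message

if msg.tool_calls:
    messages.append(msg)  # keep the assistant turn that requested the tool
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        result = search_docs(**args)  # execute the requested tool locally
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
    # Second turn: the model now answers with the tool result in context.
    final = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
    print(final.choices[0].message.content)
else:
    print(msg.content)
```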
All right, we got a mic. (audience laughs) - How extensively do you dogfood your own technology in your company? Do you have any interesting examples that may not be obvious? - Yeah, I mean, we put models up for internal use even before they're done training. We use checkpoints and try to have people use them for whatever they can, and try to build new ways to explore the capability of the model internally and use them for our own development or research or whatever else as much as we can.
We're still always surprised by the creativity of the outside world and what people do, but basically the way we've figured out every step along our way, what to push on next, what we can productize, what the models are really good at, is by internal dogfooding. That's like our whole, that's how we feel our way through this.
We don't yet have employees that are based off of O1, but as we move into the world of agents, we will try that. We will try having things that we deploy in our internal systems that help you with stuff. There are things that get closer to that. Customer service: we have bots internally that do a ton of answering external questions and fielding internal people's questions on Slack and so on, and our customer service team is probably, I don't know, 20% the size it might otherwise need to be because of it.
I know Matt Knight and our security team has talked extensively about all the different ways we use models internally to automate a bunch of security things and take what used to be a manual process where you might not have the number of humans to even look at everything incoming on models, taking, you know, separating signal from noise and highlighting to humans what they need to go look at, things like that.
So I think internally there are tons of examples and people maybe underestimate the, you all probably will not be surprised by this, but a lot of folks that I talk to are. The extent to which it's not just using a model in a place, it's actually about using chains of models that are good at doing different things and connecting them all together to get one end-to-end process that is very good at the thing you're doing, even if the individual models have, you know, flaws and make mistakes.
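A minimal sketch of the chaining idea described here, in the spirit of the security-triage example: a small, cheap model separates signal from noise, and a stronger model summarizes only what survives for a human. The alerts and model names are illustrative assumptions, not OpenAI's internal setup.

```python
# Hedged sketch of "chains of models": a cheap first-pass model triages a high
# volume of alerts, and a stronger model summarizes only the survivors for a
# human. Assumes the OpenAI Python SDK v1.x; alerts and models are placeholders.
from openai import OpenAI

client = OpenAI()

alerts = [
    "Failed SSH login from 10.0.0.5 (3rd attempt in 10 seconds)",
    "Nightly backup completed successfully",
]

def classify(alert: str) -> str:
    # First link in the chain: label each alert as SIGNAL or NOISE.
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # cheap model for the high-volume pass
        messages=[
            {"role": "system",
             "content": "Reply with exactly SIGNAL or NOISE for this security alert."},
            {"role": "user", "content": alert},
        ],
    )
    return resp.choices[0].message.content.strip().upper()

signal = [a for a in alerts if classify(a) == "SIGNAL"]

# Second link: a stronger model only sees what survived triage.
summary = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Summarize these alerts for an on-call engineer."},
        {"role": "user", "content": "\n".join(signal) or "No alerts were flagged."},
    ],
)
print(summary.choices[0].message.content)
```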
- Thank you. I'm wondering if you guys have any plans on sharing models for, like, offline usage, because with this distillation thing, it's really cool that we can create our own models, but for a lot of use cases, you really wanna kind of, like, have a local version of it. - Okay, Ben?
- We're open to it, it's not like a high priority on the current roadmap. If we had, like, more resources and bandwidth, we would go do that, I think there's a lot of reasons you want a local model, but it's not like, it's not like a this-year kind of thing.
- Question. - Hi, my question is, there are many agencies in the government, at the local, state, and national level, that could really greatly benefit from the tools that you guys are developing, but have perhaps some hesitancy on deploying them because of security concerns, data concerns, privacy concerns, and I guess I'm curious to know if there are any sort of planned partnerships with governments, world governments, once AGI is achieved, because obviously if AGI can help solve problems like world hunger, poverty, climate change, government's gonna have to get involved with that, right?
And I'm just curious to know if there is some, you know, plan that works when that time comes. - Yeah, I think, I actually think you don't wanna wait until AGI, you wanna start now, right? Because there's a learning process, and there's a lot of good that we can do with our current models, so we've even announced a handful of partnerships with government agencies, some states, I think Minnesota, some others, Pennsylvania, also with organizations like USAID.
It's actually a huge priority of ours to be able to help governments around the world get acclimated, get benefit from the technology, and of all places, government feels like somewhere where you can automate a bunch of workflows and make things more efficient, reduce drudgery, and so on. So I think there's a huge amount of good we can do now, and if we do that now, it just accrues over the long run as the models get better and we get closer to AGI.
- I've got a pretty open-ended question. What are your thoughts on open source? So whether that's open weights, just general discussion, where do you guys sit with open source? - I think open source is awesome. I think, again, if we had more bandwidth, we would do that too. We've gotten very close to making a big open source effort a few times, and then the really hard part is prioritization, and we have put other things ahead of it.
Part of it is there's such good open-source models in the world now. The thing we always land on as the highest-value contribution would be a really great on-device model, and I think that segment is fairly well-served. I do hope we do something at some point, but we want to find something that we feel like, if we don't do it, the world will just be missing it, rather than making another thing that's a tiny bit better on benchmarks, because we think there's a lot of good stuff out there now. But spiritually, philosophically, we're very glad it exists.
Would like to figure out how to contribute. - Hi Sam, hi Kevin. Thanks for inviting us to Dev Day. It's been awesome. All the live demos worked, it's been incredible. Why can't Advanced Voice Mode sing? And as a follow-up to this, if it's a company thing, like a legal issue in terms of copyright, et cetera, is there daylight between how you think about safety in terms of your own products, on your own hardware, versus giving us developers kind of the, I don't know, letting us sign the right things off so we can make our Advanced Voice Mode sing?
Could you answer this? (audience laughing) - You know, the funny thing is Sam asked the same question. Why can't this thing sing? I want it to sing, I've seen it sing before. Actually, it's a, there are things obviously that we can't have it sing, right? We can't have it sing copyrighted songs, we don't have the licenses, et cetera.
And then there are things that it can sing; you could have it sing Happy Birthday, and that would be just fine, right? And we want that too. It's a matter of, I think, basically it's easier in finite time to say no for now and then add it in later, but it's nuanced to get it right, and, you know, there are penalties to getting these kinds of things wrong.
So it's really just where we are now. We really want the models to sing too. (audience applauding) People were tired of waiting for us to ship voice mode, which is, like, very fair. We could have waited longer and really gotten the classifiers and filters on copyrighted music versus not, but we decided we would just ship it and we'll add more over time.
But I think Sam has asked me like four or five times why it doesn't have that feature. (audience laughing) I mean, we still can't, like, offer something where we're gonna be in, like, pretty bad, you know, hot water, whether it's developers or first party or whatever. So yes, we can maybe have some differences, but we still have to be compliant with the law.
- Could you speak a little to the future of where you see context windows going and kind of the timeline for when, how you see things balance between context window growth and RAG, basically, information retrieval? - I think there's like two different takes on that that matter. One is like, when is it gonna get to like, kind of normal long context, like context length, 10 million or whatever, like long enough that you just throw stuff in there and it's fast enough you're happy about it.
And I expect everybody's gonna make pretty fast progress there, and that'll just be a thing. Long context has gotten weirdly less usage than I would have expected so far. But I think, you know, there's a bunch of reasons for that. I don't wanna go too much into it. And then there's this other question of like, when do we get to context length, not like 10 million, but 10 trillion?
When do we get to the point where you throw like every piece of data you've ever seen in your entire life in there? And you know, like that's a whole different set of things. That obviously takes some research breakthroughs. But I assume that infinite context will happen at some point, and some point it's like, less than a decade.
And that's gonna be just a totally different way that we use these models. Even getting to the like, 10 million tokens of very fast and accurate context, which I expect measured in like, months, something like that. You know, like, people will use that in all sorts of ways, and it'll be great.
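To make the trade-off behind this question concrete, here is a hedged sketch of both approaches: stuffing a whole document into a long context versus embedding chunks and retrieving only the relevant ones. The file name, the 2000-character chunking, and the model choices are illustrative assumptions, not anything prescribed in the talk.

```python
# Hedged sketch of the long-context vs. retrieval (RAG) trade-off. Assumes the
# OpenAI Python SDK v1.x and numpy; handbook.txt and the chunk size are
# illustrative assumptions.
import numpy as np
from openai import OpenAI

client = OpenAI()
document = open("handbook.txt").read()  # hypothetical corpus
question = "What is the refund policy?"

# Option A: long context -- put the whole document in the prompt. Works while
# the document fits the context window and cost/latency are acceptable.
long_ctx = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": f"{document}\n\nQuestion: {question}"}],
)

# Option B: retrieval -- embed fixed-size chunks and keep only the closest ones.
chunks = [document[i:i + 2000] for i in range(0, len(document), 2000)]
embs = client.embeddings.create(model="text-embedding-3-small",
                                input=chunks + [question])
vecs = np.array([d.embedding for d in embs.data])
chunk_vecs, q_vec = vecs[:-1], vecs[-1]
scores = chunk_vecs @ q_vec  # these embeddings are unit-length, so dot = cosine
top = [chunks[i] for i in np.argsort(scores)[-3:]]

rag = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user",
               "content": "\n\n".join(top) + f"\n\nQuestion: {question}"}],
)
print(long_ctx.choices[0].message.content)
print(rag.choices[0].message.content)
```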
But yeah, the very, very long context I think is gonna happen, and it's really interesting. I think we maybe have time for one or two more. - Don't worry, you can call this your favorite question. So, with voice, and all the other changes that users have experienced since you all have launched your technology, what do you see as the vision for the new engagement layer, the form factor, and how we actually engage with this technology to make our lives so much better?
- I love that question. It's one that we ask ourselves a lot, frankly. There's this, and I think it's one where developers can play a really big part here, because there's this trade-off between generality and specificity here. I'll give you an example. I was in Seoul and Tokyo a few weeks ago, and I was in a number of conversations with folks with whom I didn't have a common language, and we didn't have a translator around.
Before, we would not have been able to have a conversation. We would have just sort of smiled at each other and continued on. I took out my phone. I said, "ChatGPT, I want you to be a translator for me. "When I speak in English, I want you to speak in Korean.
"You hear Korean, I want you to repeat it in English." And I was able to have a full business conversation, and it was amazing. You think about the impact that could have, not just for business, but think about travel and tourism and people's willingness to go places where they might not have a word of the language.
You can have these really amazing impacts. But inside ChatGPT, that was still a thing that I had to set up; ChatGPT's not optimized for that, right? Like, you want this sort of digital, you know, universal translator in your pocket that just knows that what you want to do is translate.
Not that hard to build, but I think there's, we struggle with trying to build an application that can do lots of things for lots of people, and that keeps up, like we've been talking about a few times, keeps up with the pace of change and with the capabilities, you know, agentic capabilities and so on.
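A text-only sketch of the translator behavior described here (the actual demo used Advanced Voice Mode); it assumes the standard Chat Completions API, and the system prompt wording and model name are illustrative guesses, not OpenAI's.

```python
# Text-only sketch of the translator relay described above (the real demo used
# Advanced Voice Mode). Assumes the OpenAI Python SDK v1.x; the model name and
# system prompt are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

SYSTEM = (
    "You are an interpreter. When you receive English, reply only with the "
    "Korean translation. When you receive Korean, reply only with the English "
    "translation. Do not add commentary."
)

def relay(utterance: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "system", "content": SYSTEM},
                  {"role": "user", "content": utterance}],
    )
    return resp.choices[0].message.content

print(relay("It's great to finally meet you in person."))
```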
I think there's also a huge opportunity for the creativity of an audience like this to come in and, like, solve problems that we're not thinking of, that we don't have the expertise to do. And ultimately, the world is a much better place if we get more AI to more people, and it's why we are so proud to serve all of you.
- I, the only thing I would add is, if you just think about everything that's gonna come together, at some point in not that many years in the future, you'll walk up to a piece of glass, you will say whatever you want, and there will be, like, incredible reasoning models, agents connected to everything.
There'll be a video model streaming back to you, like a custom interface just for this one request. Whatever you need is just gonna get, like, rendered in real time in video. You'll be able to interact with it, you'll be able to, like, click through the stream or say different things, and it'll be off doing, like, again, the kinds of things that used to take, like, humans years to figure out.
And it'll just, you know, dynamically render whatever you need, and it'll be a completely different way of using a computer, and also of getting things to happen in the world, and I don't think it's gonna be quite a while. - Awesome, thank you. That was a great question to end on. I think we're at time.
Thank you so much for coming today. (audience applauding) - That's all for our coverage of Dev Day 2024. (gentle music) ♪ Under Dev Day lights, code ignites ♪ ♪ Real-time voice streams reach new heights ♪ ♪ O1 and GPT-4o ♪