
We literally had to bring the weights of the model physically into their supercomputer. In San Francisco, you could take a car from one part of SF to the other fully autonomously. As opposed to the digital world, I can't book a ticket online right now. Physical autonomy is ahead of digital autonomy in 2025.
I think AI agents are really in day one here. ChatGPT only came out in 2022. And the slope, I think, is incredibly steep. I actually do think self-driving cars have a good amount of scaffolding in the world. You have roads. Roads exist. They're pretty standardized. You have stoplights. AI agents are just kind of dropped in the middle of nowhere.
We'll start with the long, short game. I'm short on the entire category of tooling, evals products. Healthcare is probably the industry that will benefit the most from AI. I think I'm AGI-pilled. You're definitely AGI-pilled. The first one was the realization in 2023 that I would never need to code manually like ever, ever again.
Hey, folks. I'm Apoorv Agrawal. And today at the OpenAI office, we had a wide-ranging conversation about OpenAI's work in enterprise. I have with me the head of engineering and the head of product of the OpenAI platform, Sherwin Wu and Olivier Godement. I'm AGI-pilled. OpenAI is well known as the creator of ChatGPT, a product that billions across the world have come to love and enjoy.
But today we dive into the other side of the business, which is OpenAI's work in enterprise. We go deep into their work with specific customers and how OpenAI is transforming large and important industries like healthcare, telecommunications, and national security research. We also talk about Sherwin and Olivia's outlook on what's next in AI, what's next in technology, and their picks both on the long and short side.
This was a lot of fun to do. I hope you really enjoy it. Well, two world-class builders, two people who make building look easy. Sherwin, my Palantir 2013 classmate, tennis buddy, with two stops at Quora and Opendoor through the IPO before joining OpenAI, before ChatGPT. You've now been here for three years and lead engineering for all of the OpenAI platform.
Olivier, former entrepreneur, winner of the Golden Llama at Stripe, where you were for just under a decade, and now lead all of product for the OpenAI platform. That's right. Thanks for doing it. Thank you. Thanks for having us. As a shareholder, as a thought partner, kicking ideas back and forth, I always learn a lot from you guys.
And so it's a treat, it's a real treat to do this for everybody. You know, I'll open with this: people know OpenAI as the firm that built ChatGPT, the product they have in their pocket that comes with them every day, to work, to their personal lives. But the focus for today is OpenAI for Enterprise.
You guys lead OpenAI platform. Tell us about it. What's underneath the OpenAI platform for B2B for Enterprise? Yeah, so this is actually a really interesting question too, because like I said, when I joined OpenAI around three years ago to work on the API, it was actually the only product that we had.
So I think a lot of people actually forget this, where the original product from OpenAI actually was not ChatGPT. It was a B2B product. It was the API, and we were catering to developers. And so I've actually seen, you know, the launch of ChatGPT and everything downstream from that.
But at its core, I actually think the reason why we have a platform and why we started with an API kind of comes back to the OpenAI mission. So our mission obviously is to build AGI, which is pretty hard in and of itself, but also to distribute the benefits of it to everyone in the world, to all of humanity.
And, you know, it's pretty clear right now to see ChatGPT doing that because, you know, my mom, you know, maybe even your parents are using ChatGPT. But we actually view our platform and especially our API and how we work with our customers, our enterprise customers as our way of getting the benefits of AGI, of AI to as many people as possible to everyone in every corner of the world.
ChatGPT obviously is really, really, really big now. It's, I think, like the fifth largest website in the world. But we actually, by working through developers using our API, we're actually able to reach even more people in, you know, every corner of the world and every different use case that you might have.
And especially with some of our enterprise customers, we're able to reach even use cases within businesses and reach end users of those businesses as well. And so we actually view the platform as kind of our way of fully expressing our mission of getting the benefits of AGI to everyone.
And so, concretely though, what the platform actually includes today: the biggest product that we have is obviously our developer platform, which is our API. You know, many developers, the majority of the startup ecosystem, build on top of this, as well as a lot of digital natives and Fortune 500 enterprises at this point.
We also have a product that we sell to governments as well in the public sector. So that's all part of this as well. And also an emerging product line for us in the platform is our enterprise products. So what we actually might sell directly to enterprises beyond just a core API offering.
Fascinating. And maybe to double down, like, I think B2B is actually quite core to the OpenAI mission. What we mean by distributing AGI benefits is, you know, I want to live in a world where there are, like, 10x more medicines going out every year.
I want to live in a world where, you know, education, public services, civil services are increasingly optimized for everyone. And there's a large category of use cases that only goes through B2B, frankly; they won't happen unless you enable the enterprises.
And, you know, we talked about Palantir. I think that's probably the same pitch at Palantir. Yeah. It's like, hey, those are the businesses that are actually making stuff happen in the real world. And so if you do enable them, if you do accelerate them, that's essentially how you distribute the benefits of AGI.
Yeah. Well, maybe we can double click into that, Olivier. You know, the reach for chat is obviously wide. Yeah. Billions of users. But for enterprise, it's, maybe tell us about it. Maybe we go deep into a customer example or two. And what is an organization that we have helped transform, maybe?
And at what layers? So, if I were to step back, we started our B2B efforts with the API a few years ago. Initially, the customers were startups, developers, indie hackers, extremely technically sophisticated people who were building cool new stuff, essentially, and taking massive market and technical risk.
So, we still have a bunch of customers in that category, and we love them, and we keep building with them. On top of that, you know, over the past couple of years, we've been working more and more with traditional enterprises and also digital natives. Essentially, I think everyone woke up with ChatGPT, like, those models are working.
There is a ton of value, and they could see essentially many use cases in the enterprise. A couple of examples which I like the most. One which is both very fresh and, you know, quite cool: we've been working a lot with T-Mobile. T-Mobile. So, T-Mobile, a leading U.S.
telco operator. T-Mobile has, like, you know, a massive customer support load. Like, you know, people asking, like, you know, "Hey, I was charged, like, that amount of money. What's going on? Or, you know, my cell phone, like, isn't working anymore." A massive, like, you know, share of that load is, like, you know, voice calls.
Like, people want to talk to someone. And so, for them, to be able to essentially automate more and more and, you know, to help people self-serve in a way, like debug their subscription, was pretty big. And so, we've been working with T-Mobile for pretty much the past year at this point to basically automate not only text support but also voice support.
And so, today, there are features in the T-Mobile app where, if you call, it's actually handled by OpenAI models behind the scenes. And, you know, it does sound super natural, human-sounding, latency- and quality-wise. So, that one was really fun. A second one, which is very...
Just on that, can I ask you a follow-up question? So, we've got text models, we've got voice models, maybe even video models someday that are deployed at T-Mobile. But what above the models or adjacent to the models might we have helped T-Mobile with, for example? Yeah, there is a ton we're doing.
The first one is, you know, you have to put yourself in the shoes of an enterprise buyer. Like, their goal is to automate, reduce, optimize customer support. And, you know, going from a model, tokens in, tokens out, to that use case is hard.
Yeah. And so, you know, first, there's a lot of design, like, you know, system design. We do have, actually, now forward deployed engineers who are helping us quite a bit. Forward deployed engineers. Yeah, that's familiar to the... Borrowed the term from Palantir. Yeah, it's a great term. Were you FDEs at Palantir?
I was not an FDE. I was on, I think they called it the dev side, right? It's like software engineering. I was also only an intern at Palantir. But yeah, it's a great term. I think it accurately describes what we're asking folks to do, which is, like, embed very deeply with customers and honestly, like, build things specific to their systems.
They're deployed onto these customers. But yeah, we are obviously growing and hiring that team quite a bit because they've been very effective, like at T-Mobile. Yeah, four years of my life. Yeah, yeah. Yeah, forward deployed. Yeah. But go ahead. So, forward deployed engineering. What forward deployed engineers and the sort of systems and integrations work are doing is, first, you know, you have to orchestrate those models.
Like, those models, out of the box, know nothing about the CRM or what's going on. And so, you have to plug the model into many, many different tools. Many of those tools in the enterprise do not even have APIs or clean interfaces, right?
It's the first time they're being exposed to a third-party system. And so, there is a lot of standing up API gateways, connecting tools. Then you have to essentially define what good looks like, you know? Again, it's a pretty new exercise for everyone; defining a golden set of evals is harder than it sounds.
Yeah. And so, we've been spending, like, a bunch of time with them. Evals are important. Evals are super important. Especially, like, audio evals. I know audio evals are, like, extra hard to grade and get right. But, like, the bulk of the use case here is actually audio. Right, right.
And then you have, like, I don't know, a five-minute call transcript to parse through to actually know that the right thing happened. It's a pretty tough problem. Yeah, it's pretty tough. And then, you know, actually nailing down the quality of the customer experience until it feels natural.
And here, latency and interruptions play a really important part. We shipped the real-time API in GA. I think it was last week. That's right. A couple of weeks ago. Yeah. Yeah. It was just last week, I think. Which is, like, a beautiful work of engineering.
You know, there was a really cracked team behind the scenes, and it basically allows us to get the most natural-sounding voice experience without having these weird interruptions or lag, where you can feel that essentially the thing is off. So, yeah, cobbling all that together, you know, and you get a really good experience.
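A minimal sketch of the kind of tool orchestration described here, plugging a model into an internal CRM via function calling. It assumes the standard OpenAI Python SDK; the CRM helper, tool name, and model choice are illustrative placeholders, not T-Mobile's actual systems.

```python
import json
from openai import OpenAI

client = OpenAI()

def lookup_subscription(customer_id: str) -> dict:
    """Placeholder for a call into an internal CRM or billing gateway."""
    return {"customer_id": customer_id, "plan": "Unlimited", "last_charge": 83.50}

tools = [{
    "type": "function",
    "function": {
        "name": "lookup_subscription",
        "description": "Fetch a customer's current plan and last charge from the CRM.",
        "parameters": {
            "type": "object",
            "properties": {"customer_id": {"type": "string"}},
            "required": ["customer_id"],
        },
    },
}]

messages = [{"role": "user", "content": "Why was I charged $83.50 this month? My ID is C-1042."}]
response = client.chat.completions.create(model="gpt-4.1", messages=messages, tools=tools)

# If the model decides it needs CRM data, run the tool call and hand the result
# back so it can compose the final answer for the customer.
msg = response.choices[0].message
if msg.tool_calls:
    call = msg.tool_calls[0]
    result = lookup_subscription(**json.loads(call.function.arguments))
    messages += [msg, {"role": "tool", "tool_call_id": call.id, "content": json.dumps(result)}]
    final = client.chat.completions.create(model="gpt-4.1", messages=messages, tools=tools)
    print(final.choices[0].message.content)
```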
Yeah, that's a lot more than just models. Yeah. Yeah, I was going to say, one actually really great thing that I think we've gotten from the T-Mobile experience is actually working with them to improve our models themselves. So, for example, the last real-time, the real-time GA last week, we obviously released a new snapshot, the GA snapshot.
And a lot of the improvements that we actually got into the model came out of, you know, the learnings that we have from T-Mobile. It brings in a lot of other changes from other customers, but because we were so deeply embedded with T-Mobile and were able to understand what good looks like for them, we were able to bring that to some of our models.
That makes sense. So, this is a large customer with tens of millions of users, if not hundreds of millions, and the before and after is on the support side, both tech support internally and then their customer support. Yeah. Makes sense. Yeah. Is there another one that you guys can share?
I like Amgen a lot. Amgen, the healthcare business. Amgen, yeah. So, we're working quite a bit with healthcare companies. Amgen is one of the leading healthcare companies. They specialize in drugs for cancer and, you know, inflammatory diseases. They're based out of LA. And we've been working with Amgen to essentially speed up the drug development and commercialization process.
Wow. So, you know, the north star is pretty bold. Similarly, we embedded pretty deeply with Amgen to understand what their needs are. And it's really interesting: when I look at those healthcare companies, I feel like there are two big buckets of needs.
One is like pure R&D. It's like, you know, you're seeing like a massive amount of data and like you have super smart scientists who are trying to, you know, combine, test out things, you know, so that's one bucket. A second bucket is like, you know, much more like, you know, common across other industries.
It's pure, you know, admin, document authoring, document reviewing work, which is, you know, by the time your R&D team has essentially locked the recipe of a medication, getting that medication to market is a ton of work. Like you have to submit to various regulatory bodies, get a ton of reviews.
And, you know, when we looked at those problems and what we knew models were capable of, we saw a ton of benefits, a ton of opportunities to automate and, you know, augment the work of those teams. And so, yeah, Amgen has been a top customer of GPT-5, for instance.
Wow. I mean, this could be hundreds of millions of lives if a new drug is developed faster. Yeah, exactly. Huge impact. So that's, I think, one good example of the kind of impact for which you need to enable enterprises to do it, you know.
And so I think we're going to do more and more of those. And yeah, frankly, on a personal level, it's a delight. If I can play a tiny role in, essentially, doubling the number of medications that people get in the real world, that feels like a pretty good achievement.
Huge. Huge, huge. I know you had one. Yeah. So one of my favorite deployments that we've done more recently actually is with Los Alamos National Laboratory. So this is the national research lab that the US government runs in Los Alamos, New Mexico. It's also where, you know, the Manhattan Project happened back in the 40s and 50s, back when it was a secret project.
So, you know, after that, they ended up formalizing it as a city and a program and then now it's a pretty sizable national laboratory. This one is very interesting because one, just the depth of impact here is like unimaginable for me. It's like on the scale of Amgen and some of these other larger companies, but, you know, obviously they're doing a lot of actual new research there.
So a lot of new science. They're doing a lot of stuff with our defense departments and defense use cases as well. So very intense, you know, very intense stuff. But the other thing that's actually very interesting about this one was that it's also a story of a very like bespoke and like new type of deployment that we've done.
Yeah. So because they're a government lab, they're so, you know, restrictive and high-security and high-clearance with a lot of their things, we couldn't just do a normal deployment with them. You know, you can't have people doing national security research just hitting our APIs.
And so we actually did a custom on-prem deployment with them onto one of their supercomputers, called Venado. And so this actually involved a bunch of, you know, very bespoke work with some FDEs, also with a lot of our developer team, to actually bring one of our reasoning models, o3, into their laboratory, into the air-gapped, you know, supercomputer Venado, and actually deploy it and get it installed to work on their hardware, on their networking stack, and actually run it in this particular environment.
And so it was actually very interesting because we literally had to bring the weights of the model physically into their supercomputer. In an environment, by the way, that's very locked down for a good reason. You're not allowed to have, like, cell phones or any electronics with you as well.
So I think that was a very, very unique challenge. And then the other interesting thing about this deployment is just how it's being used, right? So the interesting thing is because it's so locked down and on-prem, we actually do not have much visibility into exactly what they're, what they're doing with it.
But we do have, you know, they give us feedback. Yeah, yeah. They actually do have some telemetry, but it's, you know, within their own, their own systems. But we do know that it's, you know, being used for a bunch of different things. It's being used for aiding them in terms of speeding up their experiments.
They have a lot of data analysis use cases, a lot of notebooks that they're running with reams of data that they're trying to process. They're actually using it as a thought partner, which is something that's pretty interesting to me. o3 is pretty smart as a model. And a lot of these people are tackling really tough, you know, novel research problems.
And a lot of times they're kind of using o3 and going back and forth with it on their experiment design, on what they actually should be doing. Which is, you know, something that we couldn't really say about our older models. And so, yeah, it's just being used for a lot of different use cases at the national lab.
And the other cool thing is it's actually being shared between Los Alamos and some of the other labs, Lawrence Livermore, Sandia as well, because it's a supercomputer setup where they can all kind of connect with it remotely. Fascinating. I mean, we've just gone through three pretty large-scale enterprise deployments.
Right. Which might touch tens, if not hundreds of millions of people. But there's this, on the other side of this is the MIT report that came out a couple of weeks ago. 95% of AI deployments don't work. A bunch of, you know, scary headlines that even shook the markets for a couple of days.
Like, you know, put this in perspective, like for every deployment that works, there's presumably a bunch that don't work. So maybe we can, you know, maybe talk about that. Like, what does it take to build a successful enterprise deployment, a successful customer deployment, and the counterfactual based on all your experience serving all these large enterprises?
I think at this point, I may have worked with a couple hundred, I think. A couple hundred. So, okay, I'm going to pattern match on what I've seen being a clear leading indicator of success. Number one is the interesting combination of top-down buy-in and enabling a very clear group, a tiger team essentially.
Like, you know, at the enterprise, which is sometimes a mix of OpenAI and enterprise employees. So, you know, typically, you take T-Mobile: the top leadership was extremely bought in, like, it's a priority. Right, right. But then letting the team organize and be like, okay, if you want to start small, start small.
Right. You know, and then you can scale it up essentially. So, that would be part number one. So, top-down buy-in and, bottoms-up, a tiger team. Tiger team, you know, a mix of people with technical skills and people who just have the organizational knowledge, the institutional knowledge, you know.
It's really funny: in the enterprise, customer support is a good example, what we found is that the vast majority of the knowledge is in people's heads. Right. Right. Which is probably a thing you see with FDEs in general, but, you know, you take customer support, you would think that everything is perfectly documented, etc.
The reality is the standard operating procedures, the SOPs, are largely in people's heads. And so, unless you have that tiger team, that mix of technical folks and subject matter experts, it's really hard to get something off the ground. That would be one. Two would be evals first.
Like, whenever we define good evals, that gives a clear, common goal for people to hit. Whenever the customer fails to come up with good evals, it's a moving target. You never know, essentially, if you've made it or not. And, you know, evals are much harder than they look to get done.
And evals also oftentimes need to come up bottom-up, right? Because all of these things are kind of in people's heads, in the actual operators' heads. It's actually very hard to have a top-down mandate of, like, this is how the evals should look. A lot of it needs the bottom-up adoption.
Right. Yeah, yeah, yeah. And so, we've been building quite a bit of tooling on evals. We have like an evals product and, you know, we're working on more to essentially solve like, you know, that problem or, you know, make it as easy as we can. The last thing is, you know, you want to hill climb, essentially.
You have your evals, the goal is to get to 99%. You start at, like, 46. You know, how do you get there? And here, frankly, I think it's oftentimes a mix of, I would say, almost wisdom from people who've done it before.
Like, you know, a lot of that is art, sometimes more than science. Yeah, yeah, yeah. And, you know, knowing the quirks of the model, the behavior. Sometimes we even need to fine-tune the models ourselves, you know, when there are some clear limitations. And, you know, being patient, working your way up there, and then you ship.
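A toy sketch of the "evals first, then hill climb" loop being described: a golden set, an objective grader, and a pass rate to climb. `run_agent` is a placeholder for whatever is being graded (a prompt, a pipeline, a fine-tuned model); the cases and grader here are illustrative.

```python
# Golden set: a handful of representative inputs with what "good" must include,
# usually supplied bottom-up by the subject matter experts.
golden_set = [
    {"input": "I was charged twice this month.", "must_mention": "refund"},
    {"input": "My phone has had no signal since yesterday.", "must_mention": "network"},
]

def run_agent(user_input: str) -> str:
    # Placeholder: call the real model or agent pipeline here.
    return "I'm sorry about that - let me check whether a refund applies."

def grade(case: dict, answer: str) -> bool:
    # The grader encodes what "good" means for the business; keyword matching is
    # only a stand-in for richer (e.g., model-based or human) grading.
    return case["must_mention"] in answer.lower()

results = [grade(case, run_agent(case["input"])) for case in golden_set]
pass_rate = sum(results) / len(results)
print(f"pass rate: {pass_rate:.0%}")  # hill climb: tweak prompts/tools/models and re-run
```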
Can we go under the hood a little bit? You know, one of the things that we think about a lot is autonomy more broadly, right? What is the makeup of autonomy? On one side, you know, in San Francisco, you could take a car from one part of SF to the other fully autonomously.
No humans involved; you press a button. Yeah, we love the Waymos. Right? They've done billions of miles. I think it was, like, what, three and a half billion miles? That's on the Tesla FSD. I think Waymo's done, like, tens of millions of rides. That's a lot of autonomy.
Yeah. In the physical world, as opposed to the digital world, I can't book a ticket online right now. There's all sorts of problems that happen if I have my operator try to book a ticket. And it's very counterintuitive because the bar for physical safety is so much higher. The bar for physical safety is higher than the human's capability because lives are at stake.
Yeah. The bar for digital safety, not that high, because all you're going to lose is money. Nobody's life is at stake. But yet physical autonomy is ahead of digital autonomy in 2025. That seems counterintuitive. Why is that the case? At a technical level, why is it that what sounds easier is actually a lot harder?
Yeah. So I think there are kind of two things at play here. And I really like the analogy with self-driving cars because they've actually been like one of the best applications of AI, I think that I've used recently. But I think there are two things in play. One of them is honestly just the timelines.
Like we've been working on self-driving cars for so long. That's right. I remember when I, you know, back in like 2014, it was kind of like the advent of this and everyone was like, oh, it's happening in like five years. It turns out it took like, I don't know, 10, 15 years or so for this time.
So there's been a long time for this technology to really mature. And I think there's probably like dark ages, you know, back in like 2015 or 2018 or something where it felt like it wasn't going to happen. The trough of disillusionment. Yes. Yes. Yeah. And then now we're, you know, I'm finally seeing it get deployed, which is really exciting.
But it has been, like, I don't know, 10 years, maybe even 20 years from the very beginning of the research. Whereas, I think AI agents are really in day one here. Like, ChatGPT only came out in 2022, so around three, less than three years ago. But I actually think that what we think about with AI agents and all that really started with the reasoning paradigm, when we released the o1-preview model back late last year, I think.
And so I actually think this whole reasoning paradigm with AI agents and the robustness that those bring has only really unfolded for like a year, less than a year really. And so I know you had a chart in your blog post, which I really like, which, you know, the slope is very meaningfully different now.
Like self-driving started very, very early. The slope seems to be a little bit slower, but now it's reaching the promised land. But man, like we started super recently with AI agents and the slope I think is incredibly steep and we'll probably see a crossover at some point. Yeah. But we really have only had like a year really to explore these things.
Do you think we haven't crossed over already? When you look at the coding work in particular? Yeah, it's a good point. It's like, you know, your chart actually shows AI agents as below self-driving, but, you know, it's like, what is the y-axis? By some measures, I would not be surprised actually if, you know, AI products or AI agent products are making more revenue than Waymo at this point.
Like Waymo is making a lot, but like just look at all the startups coming up. Look at, you know, ChatGPT and how many subscriptions are happening there and all of that. And so maybe we have actually crossed and, you know, a couple years from now, it's going to look very, very different.
Yeah. The y-axis is tangible felt autonomy. Yeah. Yeah. Perfectly objective. How do I feel about that? Exactly. Vibes more than revenue. But revenue is a good one. We should probably redo that with revenue. There's a second thing I wanted to mention on this as well, which is the scaffolding and the environment in which these things operate in.
So I actually remember, in the early days of self-driving, a lot of the researchers around self-driving were saying that the roads themselves would have to change to accommodate self-driving, right? There might be, like, sensors everywhere so that the self-driving cars can interact with them, which I think was, in retrospect, overkill.
Yeah. But I actually do think self-driving cars have a good amount of scaffolding in the world for them to operate in. It's like not completely like unlimited. Yeah. You have roads. Roads exist. They're pretty standardized. You have stoplights. People generally operate in like pretty normal ways. And there are all these traffic laws that you can learn.
Whereas AI agents are just kind of dropped in the middle of nowhere and they kind of have to feel around for themselves. And I actually think, you know, going off of what Olivier just said too, my hunch is some of the enterprise deployments that don't actually work out likely don't have the scaffolding or infrastructure for these agents to interact with as well.
A lot of the like really successful deployments that we've made, a lot of what our FDs end up doing with some of these customers, is to create almost like a platform or some type of scaffolding connectors organizing the data so that the models have something that they can interact with in a more standardized way.
And so my sense is self-driving cars actually have had this to some degree with roads over the course of their deployment. But I actually think it's still very early in the AI agents space. And I would not be surprised if a lot of these enterprises, a lot of companies, just don't really have the scaffolding ready.
So if you drop an AI agent in there, it kind of doesn't really know what to do and its impact will be limited. And so I think once this scaffolding gets built out across some of these companies, I think the deployment will also speed up. But again, to our point earlier, I think there's no slowdown.
There's no, you know, things are still moving very fast. That's great. Well, you know, I've thought about autonomy as a three-part structure. You've got perception, you've got the reasoning, the brain. And then you've got the scaffolding, the last mile of making things work. Maybe we can dive into the second part, which is the reasoning, which is the juice that you guys are building with GPT-5 most recently.
Huge endeavor. Congrats. The first time you guys have launched a full system, not a model or a set of models, but a full system. Talk about that. I mean, the full arc of that development, what was your focus? I mean, honestly, the benchmarks all seem so saturated. Like, clearly, it was more than just benchmarks that you were focused on.
And so what was the North Star? Tell us about GPT-5, soup to nuts. It's been a labor of love for many people for a long time. And to your point, I think GPT-5 is amazingly intelligent. You look at the benchmarks, like SWE-bench and the like, and it scores pretty high.
But I think, to me, equally important and impactful was, I would say, the craft: the style, the tone, the behavior of the model. So, you know, capabilities, intelligence, and behavior of the model. On the behavior of the model, I think it's the first model, the first large model release, for which we have worked so closely with a bunch of customers for months and months, essentially, to better understand what the concrete blockers of the model are.
And often, you know, it's not just about having a model which is way more intelligent; it's about a model which is faster, a model that better follows instructions, a model that's more likely to say no, you know, when it doesn't know about something.
And so that super close customer feedback loop on GPT-5 was pretty impressive to see. And I think, with all the love that GPT-5 has been getting in the past couple of weeks, people, the builders, are starting to feel that, essentially.
And once you see it, it's really hard, essentially, to come back to a model which is extremely intelligent but in an exclusively academic way. Yeah. Are there trade-offs that you made as you were going through it? Like maybe what are the hardest trade-offs you made as you were building GPT-5?
I think a very clear trade-off, which I honestly think we are still iterating on, is the trade-off between the reasoning tokens and how long it thinks versus performance. Yeah. Because, honestly, this is something that I think we've been working on with our customers since the launch of the reasoning models, which is: these models are so, so smart.
Especially if you give it all this thinking time. I think the feedback I've been seeing around GPT-5 Pro has been pretty crazy too. It's just like, you know, these- Andrej had a great tweet last night. Yeah, yeah, I saw that Sam retweeted it. But these unsolved problems that none of the other models could handle, you throw them at GPT-5 Pro and it just one-shots them, is pretty crazy.
But the trade-off here is you're waiting for 10 minutes. It's quite a long time. And so these things just get like so smart with more inference time. But on the like product builder, on the API side for some of these like business use cases, I think it's pretty tough to like, you know, manage that trade-off.
And for us, it's been difficult to figure out where we want to fall on that spectrum. So we've had to make some trade-offs on how much the model should think versus how intelligent it should get. Because as a product builder, there's a real latency trade-off that you have to deal with where, you know, your user might not be happy waiting 10 minutes for the best answer in the world.
They might be more okay with a substandard answer and no wait at all. Yeah. I mean, even between GPT-5 and GPT-5 Thinking, I have to toggle it now because sometimes I'm so impatient, I just want it ASAP. Yeah. I think there's an ability to skip, right? Yeah, that's right.
Where it's like, I'm impatient, I just want the simpler answer. That's right. That's right. Well, four weeks in, GPT-5, how's the feedback? Yeah, I think feedback has been very positive, especially on the platform side, which has been really great to see. I think a lot of the things that Olivier mentioned have, you know, come up in feedback from customers.
The model is extremely good at coding, extremely good at kind of reasoning through different tasks. But especially for coding use cases, especially when it thinks for a while, it'll usually solve problems that no other models can solve. So, I think that's been a big positive point of feedback.
The kind of robustness and the reduction in hallucinations has been really big positive feedback. I think there's an eval that showed that the hallucinations basically went to zero for a lot of this. It's not perfect. There's still a lot of work to be done, but that's a big one.
I think because of the reasoning in there too, it just makes the model more likely to say no, less likely to hallucinate answers. So that's been something that people have really liked as well. Other bit of feedback has been around instruction following. So it's really good at instruction following.
This almost bleeds into the constructive feedback that we're working on, where it's so good at instruction following that people need to tweak their prompts, or it's almost too literal in its understanding. That's one interesting trade-off actually. Because, you know, when you ask people, developers, what do you want?
Like, you want the model to follow instructions, of course, you know. But once you have a model that is extremely literal, that essentially forces you to express extremely clearly what you want. Otherwise the model, you know, may go sideways. And so that's one of the interesting pieces of feedback.
It's almost like the monkey's paw, where developers and platform customers ask for better instruction following. So they're like, yes, we'll give you really good instruction following. But then it follows it almost to a T. And so it's obviously something that the team is actually working through.
I think a good example of this, by the way, is some customers would have these prompts. I remember when we were testing GPT-5, one of the negative pieces of feedback that we got was that the model was too concise. We were like, what's going on? Why is the model so concise? Interesting.
And then we realized it was because they were using their old prompts from other models. And with the other models, they have to like, you have to like, really beg the model to be precise, concise. So they're like, 10 lines of like, be concise, really be concise. Also, keep your answer short.
And it turns out when you give that to GPT-5, it's like, oh my gosh, this person really wants it to be concise. And so the response would be like one sentence, which is too terse. And so just by removing the extra prompts around being concise, the model behaved in a much better way and much closer to what they actually ended up wanting.
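An illustrative (not customer-specific) version of that prompt migration: instructions that were needed to wrangle older models can over-steer GPT-5, because it follows them so literally.

```python
# Legacy prompt, written to beg an older model for brevity; GPT-5 may take it
# so literally that replies collapse into a single terse sentence.
legacy_prompt = """You are a support assistant.
Be concise. Really be concise. Keep your answer short. One short answer only."""

# Updated prompt: say what you actually want once, and let the model comply.
updated_prompt = """You are a support assistant.
Answer in a few short sentences, including the key details the customer needs."""
```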
Yeah. It turns out writing the right prompt is still important. Yes. Yes. Yeah. Prompt engineering is still very, very important. Yeah. Yeah. On constructive feedback for GPT-5, there's actually been a good amount as well, which we're all working through. One of them, which I'm really excited for the next, you know, snapshot to come out and fix, is code quality and the small code paradigms or idioms that it might use.
I think there was feedback around the types of code and the patterns it was using, which I think we're working through as well. And then the other bit of feedback, which I think we've already made good progress on internally, is around the trade-off of the reasoning tokens and thinking and latency versus intelligence.
I think, especially for these simpler problems, you don't usually need a lot of thinking. The thinking should ideally be a little bit more dynamic. And of course, we're always trying to squeeze as much reasoning and performance into as few reasoning tokens as possible. So I'd expect that to keep coming down as well.
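A hedged sketch of the latency-versus-thinking trade-off discussed here, assuming the Responses API's reasoning-effort setting; the exact parameter names and allowed values should be checked against current documentation.

```python
from openai import OpenAI

client = OpenAI()

# Quick, low-latency answer for a simple task: keep reasoning to a minimum.
fast = client.responses.create(
    model="gpt-5",
    reasoning={"effort": "minimal"},
    input="Summarize this ticket in one line: customer reports a duplicate charge.",
)

# Let the model think longer when the problem is genuinely hard.
deep = client.responses.create(
    model="gpt-5",
    reasoning={"effort": "high"},
    input="Given these five billing events, work out which charge is the duplicate and why: ...",
)

print(fast.output_text)
print(deep.output_text)
```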
Yeah. Well, huge congrats. I mean, it's been, I know it's a work in motion for a bunch of our companies. They've had incredible outcomes with GPT-5. One of them is Expo, cyber security business, just like a huge, huge, huge upgrade from whatever they were using prior to that. And I think they're going to need a new eval soon.
That's right. They're going to need a new eval. It's all about evals. On the multimodality side of it, obviously, you guys announced the real-time API last week. I saw T-Mobile was one of the featured customers on there. Talk about that: obviously the text models are leading the pack, but then we've got audio and we've got video.
Yeah. Talk about the progress on the multimodal models. When should we expect to have like the next big unlock and what would that look like? It's a good question. The teams have been making amazing progress on multimodality. On voice, image, video, frankly, the last generation models have been unlocking like quite a few cool use cases.
One piece of feedback that we've received is, you know, because text was so much leading the pack on intelligence, people felt, in particular on voice, that the model was somewhat less intelligent. And, you know, until you actually see it, it does feel weird to get a better answer on text versus voice.
And so that's pretty much a focus that we have at the moment. I think we like filled like part of that gap, but not the full gap for sure. So I think, you know, catching up, I would say, you know, with the text, like, you know, would be one.
Yeah. A second one, you know, which is absolutely fascinating, is the model is excellent at the moment at easy, casual conversation, like talk to your coach, your therapist. And we basically have to teach the model to speak better in actual work settings, economically valuable setups.
To give an example, the model has to be able to understand what an SSN is and, you know, what it means to spell out an SSN. And if one digit is actually fuzzy, it should ask the caller to repeat it versus, you know, guess. You know, there are lots of intuitions like that, that a person would of course have on a voice call, that we are currently teaching the model.
And that's ongoing work actually with our customers. You know, until we actually confront the model with actual customer support calls, actual sales calls, it's really hard to get a feel, you know, for those gaps. So that's a top priority as well.
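An illustrative system-prompt fragment for the kind of voice-agent behavior being described; the wording is hypothetical, not a customer's actual prompt.

```python
VOICE_AGENT_RULES = """
When collecting a Social Security Number over the phone:
- Read the digits back one at a time to confirm them.
- If any digit is unclear or the audio is fuzzy, ask the caller to repeat it; never guess.
- Pause briefly between groups of digits (3-2-4) so the caller can follow along.
"""
```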
So this is completely off script, but an interesting question that comes up with voice models, particularly the real-time API, is: previously, people were taking a speech input, converting that to text, then having some layer of intelligence. Then you would have a text-to-speech model that would sort of play it back.
And this would be, it would be a stitch of these three parts. But with the real-time API, you guys have integrated all of that. And, you know, how does it happen? Because a lot of the logic is written in text. A lot of the Boolean logic, or call it the function calling, is written in text.
How does it work with the real-time API? Is that? That's an excellent question. So the reason why we did the real-time API is what we saw with the stitched model. The stitched model. Yeah. The stitched, like stitched together. Stitched together. Like speech to text, thinking, text to speech.
I see. Yeah. Like, we saw essentially a couple of issues. One, slowness, you know, hops, essentially. Yeah, yeah, yeah. Two, loss of signal across each model. Like, the speech-to-text model is less intelligent. Yeah, you'd lose emotion. You'd lose accent, tone. Exactly. Right, pauses. Yeah.
And, you know, when you are doing, like, actual voice, like, phone, like, calls, essentially, like, those signals are, like, so important, like, kind of, for the system. Yeah. Yeah. One of the challenges that we have is what you mentioned, which is, you know, it means, like, a slightly different architecture, essentially, for text versus voice.
And so that's something that we are actively working on. But I think it was the right call to start, essentially, with, let's make, like, the voice experience, like, natural sounding to a point where, essentially, you're feeling comfortable, like, putting in production, and then working backward, like, to unify, like, the sort of, the orchestration logic, essentially, across modalities.
And then, to be clear, like, a lot of customers still stitch these together. It's, like, kind of what worked in the last generation, but what we're interested in seeing is more and more customers moving towards the real-time approach because of how natural it sounds, how much lower latency it is, especially as we up-level the intelligence of the model.
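A minimal sketch of the "stitched" pipeline being contrasted with the real-time API, assuming the standard OpenAI Python SDK; model names are illustrative. The real-time API replaces all three hops with a single bidirectional speech-to-speech session, which is what removes the extra latency and the loss of tone, accent, and pauses.

```python
from openai import OpenAI

client = OpenAI()

# 1. Speech to text: transcribe the caller's audio.
with open("caller_turn.wav", "rb") as audio:
    transcript = client.audio.transcriptions.create(model="whisper-1", file=audio)

# 2. Thinking: run the transcript through a text model.
reply = client.chat.completions.create(
    model="gpt-4.1-mini",
    messages=[
        {"role": "system", "content": "You are a phone support agent."},
        {"role": "user", "content": transcript.text},
    ],
)

# 3. Text to speech: synthesize the answer back into audio.
speech = client.audio.speech.create(
    model="tts-1", voice="alloy", input=reply.choices[0].message.content
)
speech.stream_to_file("agent_turn.mp3")  # each hop adds latency and drops vocal signal
```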
But also, even taking a step back, I will say it's pretty mind-blowing to me that it works. Like, I think it's mind-blowing that these LLMs work at all, where you just train it on a bunch of text, and it's just, you know, autoregressively coming up with the next token, and it sounds super intelligent.
That's, like, mind-blowing in and of itself. But I think it's actually even more mind-blowing that this speech-to-speech setup actually works correctly, because you're literally taking the audio bits from someone speaking, streaming them into the model, and then it's generating audio bits back. And so, to me, it's actually crazy that this works at all, let alone the fact that it can understand accents and tone and pauses and things like that, and then also be intelligent enough to handle a support call or something like that.
You've gone from text-in, text-out to voice-in, voice-out. That's pretty crazy. We have a bunch of companies in our portfolio that are using these models, you know, Parlo on the customer support side, LiveKit on the infra side, and, you know, there's a bunch of use cases we were starting to see that a speech-to-speech model could address.
Obviously, a lot of the harder ones still running on what you're calling the stitched model. But I hope the day is not far when it's all on real-time API. It's going to happen at some point. Right, right, right, right. And actually, maybe that's a good segue into talking about model customization, because I suspect that you have such a wide variety of enterprise customers.
I think you mentioned, what, hundreds of customers, or maybe more. Each of them has a different use case, a different problem set, a different, call it, envelope of parameters that they're working in, maybe latency, maybe power, maybe others. How do you handle that? Talk about what OpenAI offers enterprises who need a customized version of a great model to make it great for them.
Yeah. So, model customization has actually been something that we've invested very deeply in on the API platform since the very beginning. So, even, you know, in pre-ChatGPT days, we actually had a supervised fine-tuning API available, and people were actually using it to great effect. Model customization obviously resonates quite well with customers, because they want to be able to bring in their own custom data and create their own custom version of, you know, o3 or o4-mini or something, or GPT-5 even, suited to their own needs.
It's very attractive, but the most recent development I think is very exciting has been the introduction of reinforcement fine-tuning. Something we announced late last year, I think in the 12 days of Christmas, we've GA'd it since, and we're continuing to iterate on it. What is it? Break it down for us.
Yeah. So, it's called, it's actually funny, I think we made up the term reinforcement fine-tuning. It's like, not a real thing until we announced it. It's stuck now. I see it on Twitter all the time. I remember we were discussing it, and I was like, I don't know about RFT guys.
You're not kidding. You're not kidding. Yeah. Yeah. So, reinforcement fine-tuning. So, it really, it's introducing reinforcement learning into the fine-tuning process. So, the original fine-tuning API does something called supervised fine-tuning, you can call it SFT. It is not using reinforcement learning. It is, you know, it's using supervised learning.
Supervised learning. Yeah. And so, what that usually means is you need a bunch of data, a bunch of prompt-completion pairs. You need to really supervise and tell the model exactly how it should be acting. And then when you train it on our fine-tuning API, it moves the model closer in that direction.
Reinforcement fine-tuning introduces RL, or reinforcement learning, to this loop. Way more complex, way more finicky, but an order of magnitude more powerful. And so, that's actually what's really resonated with a lot of our customers. With RFT, the discussion is less about creating a custom model that's specific to your own use case.
It is, you can actually use your own data and actually crank the RL, yeah, turn the crank on RL to actually create a like best-in-class model for your own particular use case. And so, that's kind of the main difference here. With RFT, the dataset looks a little bit different.
Instead of, you know, prompt completion pairs, you really need a set of tasks that are very gradable. You need a grader that is very objective that you can use here as well. And so, that's actually been something that we've invested a lot in over the last year. And we've actually seen a couple, a good number of customers get really good results on this.
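A hedged sketch of how the two dataset shapes differ. The SFT example follows the chat fine-tuning JSONL format; the RFT half is conceptual only, since the actual product expects tasks plus a grader configuration, so treat the field names and grader below as illustrative rather than the exact API schema.

```python
# Supervised fine-tuning: prompt/completion pairs that show the model exactly
# what to say (one JSON object like this per line of the training file).
sft_example = {
    "messages": [
        {"role": "user", "content": "Classify this filing: 'Form 10-K for FY2024 ...'"},
        {"role": "assistant", "content": "annual_report"},
    ]
}

# Reinforcement fine-tuning: a gradable task plus an objective grader. The model
# explores answers and the grader's score drives the RL update.
rft_task = {
    "prompt": "What effective tax rate is implied by these figures: ...",
    "reference_answer": "21.3%",
}

def grader(task: dict, model_answer: str) -> float:
    """Return a score in [0, 1]; RFT needs graders roughly this objective."""
    return 1.0 if task["reference_answer"] in model_answer else 0.0
```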
We've talked about a couple of them across different verticals. So, Rogo, which is a startup in the financial services space. They have a very, you know, sophisticated AI team. I think they hired some folks from DeepMind to run their AI program. And they've been using RFT to get best-in-class results on, you know, parsing through financial documents, answering questions around them, and doing tasks around that as well.
There's another startup called Accordance that's doing this in the tax space. I think they've been targeting an eval called TaxBench, which looks at, you know, CPA-style tasks as well. And because they're able to turn it into a very gradable setup, they're actually able to turn the RFT crank and get, I think, SOTA results on TaxBench just using our RFT product as well.
And so, it has kind of shifted the discussion away from just customizing something for your own use case to really leveraging your own data to create a best-in-class, maybe best-in-the-world model for something that you care about for your business. Yeah, I feel like the base models are getting so good at instruction following that for, you know, behavior steering, you don't need to fine-tune at that point; you can describe what you want and the model is pretty good at it.
But for pushing the frontier on actual capabilities, my hunch is that RFT will pretty much become the norm. Like, you know, if you are actually pushing intelligence in your field to a pretty high point, at some point you need to do RL, essentially, with custom environments.
Yeah. Fascinating. And even going back to the point earlier around top-down versus bottoms-up for some of these enterprises, a lot of the data that you end up needing for RFT requires very intricate knowledge about the exact task that you're doing and understanding how to grade it. And so, a lot of that actually comes bottoms-up.
Like, I know a lot of these startups will work with experts in their field to try and get the right tasks and get the right feedback to craft some of these data sets. Without further ado, we're going to jump into my favorite section, which is a rapid-fire question. We had a lot of great friends of ours send in some questions for you guys.
We'll start with Altimeter's favorite game, which is the long-short game. Pick a business, an idea, a startup that you're long, and then the same for a short, something you would bet against because there's more hype than reality. Whoever's ready to go first, long-short? My long is actually not in the AI space, so this is going to be slightly different.
Wow. Here we go. My short is, though, in the AI space. So, I'm actually extremely long esports. And so, what I mean by esports is the entire, like, professional gaming industry that's emerging around video games. Very near and dear to my heart. I play a lot of video games, and so I watch a lot of this.
So, obviously, I'm pretty in the weeds on this. But I actually think there's incredible untapped potential in esports and incredible growth to be had in this area. So, concretely, what I mean are, like, you know, a really big one is League of Legends, all of the games that Riot Games puts out.
They actually have their own professional leagues. They actually have professional tournaments, believe it or not. They rent out stadiums, actually, now. But I just think, if you look at what the youth, what younger kids are looking at and where their time is going, it's predominantly going towards these things.
They spend a lot of time on video games. They watch more esports than, like, soccer, basketball, etc. Yeah, yeah, yeah, yeah. A growing number of these, too. I've actually been to some of these events, and it's very interesting. He's very committed, it's his life. Yeah, yeah, I'm extremely long on this stuff.
And so they're booking out stadiums for people to go watch electronic sports? Yeah, yeah, yeah. I literally went to Oracle Arena, the old warrior stadium, to watch one of these, I think, before COVID. And then, so, it's just... Before COVID? Wow, that's five years ago. It was a while ago.
So, I've been following this for a while, and I actually think it had a really big moment in COVID. Like, everyone was playing video games. And I think it's kind of, like, coming back down. So, I think it's, like, undervalued. You know, it's, like, I think no one's really appreciating it now.
But it has all the elements to, like, really, really take off. And so, the youth are doing it. The other thing I'd say is, it is huge in Asia. Like, absolutely massive in Asia. It is absolutely big in Korea, in China, as well. Like, you know, we rented out Oracle Arena, I think, or, like, the event I went to was in Oracle Arena.
My sense is, in Asia, they rent out, like, entire stadiums. Like, the soccer stadiums. And the players are already, like, celebrities. So, anyways, you know, as Korean culture is really making its way into the U.S. as well, I think that's another tailwind for this whole thing.
But, anyways, esports, I think, is something you should keep an eye on, because there's a lot of room for growth. Very unexpected. Yeah. Good to hear. Short? My short. My short's a little spicy, which is I'm short on the entire category of tooling around AI products. And so, this encapsulates a lot of different things.
Kind of cheating, because some of these, you know, I think are starting to play out already. But I think, like, two years ago, it was maybe, like, evals products or, like, frameworks or vector stores. I'm pretty short those. I think nowadays, there's a lot of additional excitement around other tooling around AI models.
So, RL environments, I think, are really big right now as well. Unfortunately, I'm very short on those. I don't really see a lot of potential there. I see a lot of potential in reinforcement learning and applying it. But I think the startup space around RL environments, I think, is really tough.
Main thing is, one, it's just a very competitive space. There's just a lot of people kind of operating in it. And then, two, if the last two years have shown us anything, the space is evolving so quickly and it's so difficult to try and, like, adapt and make sure and understand what the exact stack is that will really, you know, carry through to the next generation of models.
I think that just makes it very difficult when you're in the tooling space, because, you know, today's really hot framework or really hot tool might just not get used in the next generation of models. So, I've been noticing, like, the same pattern, which is the teams that build, like, breakout startups in AI, are extremely pragmatic.
Pragmatic. Yeah. Like, they're not super intellectual about, like, the perfect world, etc. And it's funny because I feel like, you know, our generation basically started in tech in a very stable moment, where, you know, technology had been building up for years and years with, like, SaaS, cloud, etc.
And so, we were, in a way, like, raised, like, you know, in that very stable moment where it makes sense at that point to, you know, design, like, very, like, you know, good, like, abstractions and toolings because, you know, you have a sense where it's going. But it's so different today.
Like, who knows what's going to happen in the next year or two. So, it's almost impossible to define the perfect tooling platform. Right, right, right. Well, there's a lot of that going around right now. Yes. Spicy, a lot of homework there. Olivier, over to you, sir.
Long short. Long short. I've been thinking a lot about education for the past month in the context of kids. I'm pretty short on any education which basically emphasizes human memorization at this point. And I say that having mostly been through that education myself. You know, I learned so much, like history facts, legal things; some of it does shape your way of thinking.
A lot of it, frankly, is just, like, you know, knowledge tokens, essentially. And those knowledge tokens, turns out, LLMs are pretty good at them. So, I'm quite short on that. That's right. You won't need memory when ChatGPT is bionic. You can just think it straight into your head.
Exactly. Exactly. What am I long on? Frankly, I think healthcare is probably the industry that will benefit the most from AI in the next, like, year or two. Oh, say more. I think, like, all the ingredients are here for a perfect storm. Mm-hmm. A huge amount of, like, structured and unstructured data, you know, it's basically the heart of, you know, like, the pharma companies.
The models are excellent at digesting and processing that kind of data. A very, like, admin-heavy, like, you know, documents-heavy, like, you know, culture. Interesting. But at the same time, like, companies which are very technical, very R&D friendly, like, you know, companies where, like, technology in a way is at the heart of what they do.
And so, yeah, I'm pretty bullish on healthcare. This is like life sciences. So, you mean life sciences, research organizations that are producing drugs? Exactly. Gotcha. Exactly. Yeah. Yeah, it's almost like, you know, over the last 20, 30 years, these, like, pharma or, like, biotech companies have basically, if you look at the work that they're doing, like, only a small amount of it is actual research.
And so much of it ends up being admin and, like, you know, documents and things like that. And that area is just so ripe for, you know, something to happen with AI. And I think that's what we're seeing with Amgen and some of these other customers. Exactly. And it's also, like, not what they want to do.
I think it's good that we have some regulations there, obviously, but, like, just means that they have, like, reams and reams of things to kind of go through. And so, you know, like, when you have a technology that's able to really help, like, bring down the cost of something like that, I think it'll just, you know, tear right through it.
And I think governments and, like, you know, institutions are going to realize that. Like, if you step back, like, it is probably one of the biggest bottlenecks to, like, human progress, right? If you step back over the past decade, like, you know, how many, like, true, like, breakthrough drugs have there been?
Like, you know, not that many. Like, you know, how different life would be, like, if you doubled that rate, essentially. So, once you realize what's at stake, yeah, my hunch is that we're going to see quite a bit of momentum in that space. Wow. All right. Lots of homework there as well.
Yeah. Next one. Favorite underrated AI tool other than ChatGPT. Maybe. I love Granola. Oh, man. You stole my answer. Granola. I do so much Granola. Like… Two votes for Granola. There is something, like, yeah. Hey, what about ChatGPT record? I like ChatGPT record as well. But there are some features of Granola which I think are really done well.
Like, the whole, like, you know, integration with your Google Calendar is excellent. Yeah. And just, you know, the quality of, like, the transcription and, like, the summary is pretty good. Do you just have it on? Because I know your calendar is back-to-back. You just have Granola on. So, the funny thing is that I don't use Granola internally.
I use Granola for my personal life, mostly. I see. Yeah. Yeah. I see. On dates. I'm joking. I was going to say, yeah, Granola is actually going to be mine. So, two votes for Granola. I was going to say, the easy answer for me is Codex as a software engineer.
It's just, like, it's gotten so good recently. Codex CLI especially with GPT-5. Especially for me, I tend to be less time sensitive about, like, you know, the iteration loop with coding. And so, leaning into GPT-5 on Codex, I think, has been really, really… Interesting. What about Codex has changed?
Because, you know, Codex has also been through a journey. Codex has been around for a bit. I remember, like, it launched, like, more than a year ago. It's like, what's changed about Codex? Codex CLI has been around for a bit. I feel like it's been less than a year for Codex.
A few months, I would say. The time dilation is so crazy in this field. It feels like it's been around for a very long time. A year ago with GPT-4o, like, you know, that demo, like, that feels like ages ago. o1 hadn't even come out yet. The 12 days of Christmas hadn't happened yet.
The voice demo. Maybe there's a naming thing, okay. But anyway, yeah. Oh, there was a Codex model. That's what I'm thinking about. There was a Codex model? There was, yeah, there was. You're not to blame for that confusion. Also, I think the GitHub thing was called Codex as well.
That's right. Yes, yes. That's right. But I'm talking about our coding product within ChatGPT, which is the Codex Cloud offering, and then also Codex CLI. So, actually, maybe if I were to narrow my answer a little bit more to Codex CLI, which I've really, really liked. I like the local environment setup.
The thing that's actually made it really useful in the last, I'd say, like, month or so is, one, I think the team has done a really good job of just, like, getting rid of all the paper cuts, like the small product polish and, like, paper cut things. It just, it kind of feels like a joy to use now.
Nice. It feels more reactive. And then the second thing, honestly, is GPT-5. Like, I just think GPT-5 really allows the product to shine. Yeah. It's, you know, at the end of the day, this is kind of a, this is a product that really is dependent on the underlying model.
And when you have to, you know, like, iterate and go back and forth with the model, like, four or five times to get it right, to get it to, like, you know, do the change that you want. Versus having it think a little bit longer and it just, like, one-shots and does exactly what you want it to do.
Yeah. You get this, like, weird, like, bionic feeling where you're, like, I feel so mind-melded with the model right now and it, like, perfectly understands what I'm doing. And so getting, like, that kind of, like, dopamine hit and, like, feedback loop constantly with Codex has made it kind of, like, an indispensable thing that I really, really like.
Nice. And the other thing I'd say Codex is just really good for me is, so I use it in my, for, like, personal projects. I also use it to, like, help me understand code bases, like, as an engineering manager now, I'm not as in the weeds on the actual code.
And so you're actually able to use Codex to really understand what's happening with the code base, ask it questions and have it answer things, and really catch up to speed on things as well. So, like, even the non-coding use cases are really useful with Codex CLI.
Fascinating. Sam had this tweet about Codex usage ripping, I think, like, yesterday. So I wonder what's going on there, but you're not alone. Yeah, I think I'm not alone. Just judging from the Twitter feedback, I think people are really realizing how great of a combination Codex CLI and GPT-5 are.
Yeah, I know that team is undergoing a lot of scaling challenges, but, I mean, the system hasn't gone down for me, so props to them. But we are in a GPU crunch, so we'll see how, you know, how long that goes. Awesome, awesome. All right, the next one. Will there be more software engineers in 10 years or less?
There's about, what, 40, 50 million. Full-time professional software engineers? That's what you mean? Like, full-time, like, actual jobs? Yeah, full-time, yeah. Yeah, because it's a hard one, because, like, I think, without a doubt, there's going to be a lot more software engineering going on. Yes, of course. There's actually a really great post that was shared, I think, in our internal Slack.
It was, like, a Reddit post recently. I actually think that highlights this. It was a really touching story. It was a Reddit post about someone who has a brother who's nonverbal. I actually don't know if you saw this. It was just posted. The person on Reddit posted they have a nonverbal brother who they have to take care of.
The brother, like, they tried all these types of things to help the brother interact with the world, use computers, but, like, vision tracking didn't work because I think his vision wasn't good. All the tools didn't work, and then this brother ended up using ChatGPT. I don't think he used Codex, but he used ChatGPT and basically taught himself how to create a set of tools that were tailor-made to his nonverbal brother.
Basically, a custom software application just for them. And because of that, he now has, like, custom setup that was written by his brother and allows him to, like, browse the Internet. I think the video was, like, him watching The Simpsons or something like that, which is really touching. But I think that's actually what we'll see a lot more of, like, this guy's not a professional software engineer.
His title is not software engineer. But he, like, did a lot of software engineering, probably pretty good, good enough definitely for his brother to use. So the amount of code, the amount of, like, building that will happen, I think is just going to go through an incredible transformation. Right.
I'm not sure what that means for software engineers like myself. Maybe there's, you know, an equivalent number or maybe there's even more. Of course, more Sherwin. Yeah, more of me. More of me specifically. Way more of you. That's right. But definitely a lot more software engineering and a lot of code.
Yeah. I buy that completely. Like, I buy completely the thesis that there is a massive software shortage. Yeah. Like, in the world. Like, we've been sort of accepting it, you know, for the past 20 years. But, like, the goal of software was never to be that super rigid, super hard to build, you know, artifact.
It was to be, like, you know, customized, like, malleable. Yeah. And so, I expect that we'll see, like, way more. A sort of a reconfiguration of, like, people's, like, jobs and skill sets where way more people code. Like, you know, I expect that product managers are going to code, like, more and more, for instance.
But, yeah. You made your PMs code recently, if you remember. Oh, yeah. We did that. That was really fun. We started, like, essentially not doing, like, PRDs, like, product requirements documents. You know, classic PM thing. You write, like, five pages, like, my product does that, et cetera. And, you know, PMs have basically been vibe coding prototypes instead.
And one, it's pretty fast with GPT-5 and, like, codex. Yeah. Just a couple hours, I think. Freaking fast. And second, like, it sort of conveys, like, so much more information than a document. Yeah, yeah, yeah. Like, you get a feel, essentially, for the feature. Like, is it right or not?
So, yeah. I expect that sort of, you know, behavior we're going to see more and more. Yeah. Instead of writing English, you can actually now write the actual thing you want. Yeah, yeah, yeah. And, yeah, that's amazing. Advice for high school students who are just starting out their career.
My advice is, I don't know, maybe it's evergreen. Like, prioritize critical thinking above anything else. If you go into a field which requires, like, extremely high critical thinking skills, I don't know, math, physics, or, you know, maybe philosophy is in that bucket, you will be fine regardless.
If you go into a field that, it turns out, again gets back to, like, memorization, like, you know, pattern matching, I think you will probably be less future-proof. Yeah. What's a good way to sharpen critical thinking? Use ChatGPT and have it test you.
That's true. Having, like, you know, a world-class tutor who essentially knows how to put the bar, like, 20% above what you can do all the time. Yeah, yeah, yeah. You know? Is actually probably, like, a really good way to do it. Yeah. Nice. Anything from you, sir? Mine is, I think it's just, I think we're actually in such an interesting, like, unique time period where the, like, younger, so, like, maybe this is more general advice for not just, like, high school students, but just, like, the younger generation, even, like, college students.
It's, like, I think the advice would be don't underestimate how much of an advantage you have relative to the rest of the world right now because of how AI native you might be or how, like, you know, in the weeds of the tools you are. My hunch is, like, high schoolers, college students, when they come into the workplace, they're going to have, actually, a huge leg up on how to use AI tools, how to actually transform the workplace.
And my push for, like, some of the younger, I guess, high school students is, like, one, like, just really immerse yourself in this thing. Yeah. And then, two, just, like, really take advantage of the fact that you're in a unique time where, like, no one else in the workforce really understands these tools as deeply probably as you do.
A good example of this is actually we had our first intern class recently at OpenAI, a lot of software interns. And some of them were just, like, the most incredible cursor power users I've, like, ever seen. They were so productive. Yeah. I was shocked in a good way. Yeah.
Yeah. I was like, yeah, I know we can get good interns, but, like, I didn't know they'd be, like, this good. Yeah. And I think part of it is, like, they've grown up using these tools for better or worse in college. Yeah. But I think the meta level point is they're so, like, AI native.
And even, like, I don't know, me and Olivier, we're, like, kind of AI native. We work at OpenAI. But, like, we haven't, like, been steeped in this and kind of grown up in this. And so, the advice here would just be, like, yeah, leverage that. Like, you know, don't be afraid to kind of, like, go in and spread this knowledge and take advantage of that in the workplace.
Because it is a pretty big advantage for them. Yeah. I can't remember who said this to us at Palantir. But every intern class was just getting faster, smarter, like laptops. Like, smarter every generation. You sure it didn't peak in 2013, you know, when I was an intern? That's right.
That's right. That's right. Yeah. Well, lots happened. You know, lots happened since you guys joined OpenAI, right? What, three years and almost three years? In your OpenAI journey, what has been the rose moment, your favorite moment? The bud moment where you're, like, most excited about something but still opportunity ahead?
And the thorn, toughest moment of your three-year journey? The thorn is easy for me. What we call the blip, which is, you know, the coup of the board. Like, that was a really tough moment. Yeah. It's funny because, you know, after the fact, it's actually reunited quite a bit, the company.
Yeah. Like, there was a feeling. OpenAI had a pretty strong culture before. But, you know, there was a feeling of, like, camaraderie, essentially, that was even stronger. Yeah. But, you know, sure, like, tough on the day of. It's very rare to see that anti-fragility. Most orgs after something like that break apart.
But I feel like OpenAI got stronger, OpenAI came back. It's a good point. I feel it made OpenAI stronger for real now, essentially, when I look after the fact. When I look at, you know, other, like, you know, news, like, departures or, you know, whatever, like, you know, bad news, essentially.
I feel the company has built, like, you know, a thicker skin and, you know, an ability to, like, recover, like, way quicker. I think it's definitely right. Part of it, too, I think, is also just the culture. I also think this is why it was such a low point for a lot of people.
So many people just at OpenAI care so deeply about what we're doing, which is why they work so hard. You just care a lot about the work. It almost feels like your life's work. Like, it's a very audacious mission and thing that you're doing, which is why I think the blip was, like, so tough on a lot of people.
But also is what I think helped bring people back together and why we were able to hold together and get that thick skin as well. Yeah. Yeah. I have a separate worst moment, which was the big outage that we had in December of last year, if you remember. Yeah, you remember.
I do. It was, like, a multi-hour outage. It really highlighted to us how essential, almost like a utility, the API was. So the background is I think we had, like, a three, four-hour outage sometime in November or December of last year. Really brutal, a full SEV0. No one could hit ChatGPT.
No one could hit the APIs. It was really rough. That was just really tough just from a, like, you know, customer trust perspective. I remember we, like, talked to a lot of our customers to kind of, like, post-mortem them on what happened and kind of our plan moving forward.
Thankfully, we haven't had anything close to that since then. And I've been actually really happy with all the investments we've made in reliability over the last six months. But in that moment, I think it was really tough. Yeah. On the happy side, like on the roses, I think I have two of them.
The first one would be GPT-5 was really good. Like the sprint up to GPT-5, I think really like, you know, showed like the best of OpenAI. Like, you know, having like cutting edge, like science research, like, you know, extreme customer focus, like, you know, extreme, like, you know, infrastructure and inference, like, you know, talent.
And the fact that we were able like to ship like such a big model and like scale it to like, you know, many, many, many, many tokens, you know, per minute, like, almost like immediately, I think speaks to it. So that one I really... With no outages. With no outages.
Yeah, really good reliability. Yeah. Like, I remember when we shipped, like, GPT-4 Turbo, like, a year ago, a year and a half ago, we were terrified by, you know, like, the influx of traffic. Yeah. And I feel we've really gotten, like, much better at, you know, shipping, like, those, you know, massive updates.
The second rose, like, you know, happy moment for me would be the first Dev Day, which was really fun. Yeah. It felt like a coming of age for OpenAI, like, you know, we are embracing that, you know, we have, like, a huge community of developers, you know, we are going to ship models, new products.
And I remember basically seeing like, you know, all my favorite people, OpenAI or not, you know, like, you know, essentially nerding out on, you know, what are you building? Like, you know, what's coming up next? It felt really like, you know, a special moment in time. No, that was actually going to be mine as well.
So I'll just piggyback off of that, which is the very first Dev Day, November 2023. I actually, I remember it. So, I mean, obviously a lot of good things have happened since then. There's just a very, I don't know why for me, it was a very memorable moment, which was one, it was actually quite a rush up to that day.
We were, we shipped a lot. So our team was just really, really sprinting. So it was like this high, you know, high stress environment kind of going up. To add to that, you know, of course, because we're OpenAI, we did a live demo in Sam's keynote of all the stuff that we shipped.
And I just remember being in the back of the audience, sitting with like the team and like waiting for the demo to happen. Once it finished happening, we all just like let out a huge sigh of relief. We were like, oh my God, thank you. And so there's, I think there's just like a lot of like, you know, build up to it.
For me, the most memorable thing is I remember right after Dev Day, all the demos worked well, all the talks worked well. We had the after party and then I was just in a Waymo driving home at night with the music playing. It was just like such a great end to the Dev Day.
That was what I remember. That was my rose for the last years. Love it. That's awesome. I assume you guys are, but please tell me if you're AGI Pilled, yes or no. And if so, what was the moment that got you there? What was your aha moment? When did you feel the AGI?
I think I'm AGI Pilled. I think I'm AGI Pilled. You're definitely AGI Pilled. I am? Okay. I've had a couple of them. The first one was the realization in 2023 that I would never need to code manually like ever, ever again. Like I'm not the best coder, frankly, you know, I chose my job like, you know, for a reason.
Yeah. But realizing that, you know, what I thought was a given, that we humans would have, like, to write, like, basically machine language, like, forever, is actually not a given. Yeah. And that, you know, the pace of progress is huge, feeling the AGI. The second, like, feel-the-AGI moment for me was maybe the progress on voice and multimodality.
Like, you know, text, like at some point you get used to it, like, okay, you know, the machine can write pretty good text. Yeah, voice makes it real. But once you start actually talking like, you know, to something that really understands your tone, like, you know, understand my accent like in French.
It felt like sort of a true moment like, okay, machines are going beyond, like, cold, mechanical, deterministic, like, you know, logic to something, like, much more, like, emotional and, like, you know, tangible. Yeah, that's a great one. Yeah, mine are, so I do think I am AGI-pilled, I probably gradually became AGI-pilled over the last couple of years.
I think there are two. And for me, yeah, I think I actually get more shocked from the text models. I know the multimodal ones are really great as well. For me, I think they actually line up with two like general breakthroughs. So the first one was right when I joined the company in September 2022.
It was pre-ChatGPT. Yeah, two months before. But at the time, GPT-4 already existed internally. Really? And I think we were trying to figure out how to deploy it. I think Nick Turley's talked about this a lot, the early days of ChatGPT. But it was the first time I talked to GPT-4.
And it was like going from nothing to GPT-4 was just the most mind blowing experience for me. I think for the rest of the world, maybe going from nothing to GPT-3.5 and chat was maybe the big one and then going from 3.5 to 4. But for me, and I think for a lot of maybe some other people who joined around that time, going from nothing to, or not nothing, but like what was publicly available at the time.
But going from that to GPT-4 was just incredible. Like I just remember asking, throwing so many things out. I was like, there's no way this thing is going to be able to give an intelligible answer. And it just like knocks it out of the park. It was absolutely incredible.
GPT-4 was insane. I remember GPT-4 came out when I was interviewing with OpenAI. And I was still, like, on the fence, should I join? I saw that thing. I was like, okay. I'm in. I'm in. There is no way I can work on anything else at that point. Yeah.
Yeah. So GPT-4 was just crazy. And then the other one, the other breakthrough, which is, like, the reasoning paradigm. I actually think the purest representation of that for me was deep research. And throwing things at it, like, asking it to, like, really look up things that I didn't think it would be able to know.
And seeing it, like, think through all of it, like, be really persistent with the search. Get really detailed with the write-up and all of that. That was pretty, pretty crazy. I don't remember the exact query that I threw it. But I just remember, like, I feel like the feel-the-AGI moments for me are, like, I'll throw something at the model that I was like, there's no way this thing will be able to get it.
And then it just, like, knocks it out of the park. Like, that is kind of the feel-the-AGI moment. I definitely had that with deep research with some of the things that I was asking. Yeah. Yeah. Well, this has been great. Thank you so much, folks. You guys are building the future.
You guys are inspiring us every day and appreciate the conversation. Yeah. Thank you so much. As a reminder to everybody, just our opinions, not investment advice.