back to index

Building AGI with OpenAI's Structured Outputs API


Chapters

0:00 Introductions
6:37 Joining OpenAI pre-ChatGPT
8:21 ChatGPT release and scaling challenges
9:58 Structured Outputs and JSON mode
11:52 Structured Outputs vs JSON mode vs Prefills
17:08 OpenAI API / research teams structure
18:12 Refusal field and why the HTTP spec is limiting
21:23 ChatML & Function Calling
27:42 Building agents with structured outputs
30:52 Use cases for structured outputs
38:36 Roadmap for structured outputs
42:06 Fine-tuning and model selection strategies
48:13 OpenAI's mission and the role of the API
49:32 War stories from the trenches
51:29 Assistants API updates
55:48 Relationship with the developer ecosystem
58:08 Batch API and its use cases
60:12 Vision API
62:07 Whisper API
64:30 Advanced voice mode and how that changes DX
65:27 Enterprise features and offerings
66:09 Personal insights on Waterloo and reading recommendations
70:53 Hiring and qualities that succeed at OpenAI

Whisper Transcript | Transcript Only Page

00:00:00.000 | (upbeat music)
00:00:02.580 | - Hey, everyone.
00:00:05.240 | Welcome to the Latent Space Podcast.
00:00:06.960 | This is Alessio, partner and CTO
00:00:08.800 | in residence at Decibel Partners,
00:00:10.320 | and I'm joined by my co-host swyx, founder of Smol.ai.
00:00:13.440 | - Hey, and today we're excited to be
00:00:15.320 | in the in-person studio with Michelle, welcome.
00:00:18.000 | - Thanks, thanks for having me, very excited to be here.
00:00:20.040 | - This has been a long time coming.
00:00:21.680 | I've been following your work
00:00:23.200 | on the API platform for a little bit,
00:00:25.140 | and I'm finally glad that we could make this happen
00:00:28.200 | after you shipped the structured outputs.
00:00:30.240 | How does that feel?
00:00:31.200 | - Yeah, it feels great.
00:00:32.520 | We've been working on it for quite a while,
00:00:34.000 | so very excited to have it out there
00:00:36.000 | and have people using it.
00:00:37.240 | - We'll tell the story soon,
00:00:38.720 | but I want to give people a little intro
00:00:40.520 | to your backgrounds.
00:00:41.360 | So you've interned and/or worked at Google, Stripe,
00:00:44.880 | Coinbase, Clubhouse, and obviously OpenAI.
00:00:47.760 | What was that journey like?
00:00:49.240 | You know, the one that has the most appeal to me
00:00:51.840 | is Clubhouse because that was a very,
00:00:53.720 | very hot company for a while.
00:00:55.440 | Basically, you seem to join companies
00:00:57.080 | when they're about to scale up really a lot,
00:00:59.120 | and obviously OpenAI has been the latest.
00:01:01.640 | But yeah, just what are your learnings
00:01:03.480 | and your history going into all these notable companies?
00:01:06.480 | - Yeah, totally.
00:01:07.320 | For a bit of my background, I'm Canadian.
00:01:09.200 | I went to the University of Waterloo,
00:01:10.920 | and there you do like six internships
00:01:12.600 | as part of your degree.
00:01:13.840 | So I started, actually, my first job was really rough.
00:01:16.600 | I worked at a bank, and I learned Visual Basic,
00:01:19.360 | and I like animated bond yield curves,
00:01:21.760 | and it was, you know, not--
00:01:23.200 | - Me too.
00:01:24.040 | - Oh, really?
00:01:24.860 | - Yeah, I was a derivatives trader.
00:01:26.200 | - Interest rate swaps, that kind of stuff, yeah.
00:01:28.200 | - Yeah, so I liked, you know, having a job,
00:01:30.000 | but I didn't love that job.
00:01:31.080 | And then my next internship was Google,
00:01:33.680 | and I learned so much there.
00:01:35.120 | It was tremendous.
00:01:36.160 | But I had a bunch of friends that were into startups more,
00:01:38.400 | and, you know, Waterloo has like a big startup culture,
00:01:40.720 | and one of my friends interned at Stripe,
00:01:44.240 | and he said it was super cool.
00:01:45.160 | So that was kind of my,
00:01:46.360 | I also was a little bit into crypto at the time,
00:01:48.880 | then I got into it on Hacker News,
00:01:50.320 | and so Coinbase was on my radar.
00:01:52.280 | And so that was like my first real startup opportunity
00:01:54.720 | was Coinbase.
00:01:55.720 | I think I've never learned more in my life
00:01:57.880 | than in the four-month period
00:01:59.080 | when I was interning at Coinbase.
00:02:00.640 | They actually put me on call.
00:02:02.080 | I worked on like the ACH rails there,
00:02:04.160 | and it was absolutely crazy.
00:02:05.440 | You know, crypto was a very formative experience.
00:02:07.960 | Yeah.
00:02:08.800 | - This is 2018 to 2020,
00:02:10.480 | kind of like the first big wave.
00:02:11.720 | - That was my full-time.
00:02:13.360 | But I was there as an intern in 2016.
00:02:16.480 | Yeah, and so that was the period
00:02:17.960 | where I really like learned to become an engineer,
00:02:19.960 | learned how to use Git, got on call right away,
00:02:22.480 | you know, managed production databases and stuff.
00:02:24.520 | So that was super cool.
00:02:25.640 | After that, I went to Stripe
00:02:26.640 | and kind of got a different flavor of payments
00:02:28.320 | on the other side.
00:02:29.320 | Learned a lot, was really inspired by the Collisons.
00:02:32.760 | And then my next internship after that,
00:02:34.600 | I actually started a company at Waterloo.
00:02:36.640 | So there's this thing you can do,
00:02:37.560 | it's an entrepreneurship co-op,
00:02:39.120 | and I did it with my roommate.
00:02:40.800 | The company's called Readwise, which still exists, but--
00:02:43.000 | - Yeah, yeah, yeah.
00:02:43.840 | - Everyone uses Readwise.
00:02:44.680 | - Yeah, awesome.
00:02:45.520 | - You co-founded Readwise?
00:02:46.600 | - Yeah.
00:02:47.440 | - I'm a premium user.
00:02:49.000 | - It's not even on your LinkedIn?
00:02:50.840 | - Yeah, I mean, I only worked on it for about a year,
00:02:52.640 | and so Tristan and Dan are the real founders,
00:02:54.800 | and I just had an interlude there.
00:02:56.520 | But yeah, really loved working on something
00:02:59.960 | very startup-focused, user-focused,
00:03:01.840 | and hacking with friends, it was super fun.
00:03:04.720 | Eventually, I decided to go back to Coinbase
00:03:06.120 | and really get a lot better as an engineer.
00:03:09.040 | I didn't feel like I was, you know,
00:03:10.680 | didn't feel equipped to be a CTO of anything at that point,
00:03:13.440 | and so just learned so much at Coinbase.
00:03:15.760 | And that was a really fun curve.
00:03:17.560 | But yeah, after that, I went to Clubhouse,
00:03:18.960 | which was a really interesting time.
00:03:21.920 | So I wouldn't say that I went there before it blew up.
00:03:24.920 | I would say I went there as it blew up,
00:03:26.600 | so not quite the startling track record that it might seem.
00:03:30.120 | But it was a super exciting place.
00:03:31.520 | I joined as the second or third backend engineer,
00:03:34.000 | and we were down every day, basically.
00:03:36.920 | One time, Oprah came on,
00:03:38.040 | and absolutely everything melted down,
00:03:40.120 | and so we would have a stand-up every morning,
00:03:41.640 | and we'd be like, "How do we make everything stay up?"
00:03:44.680 | Which is super exciting.
00:03:45.840 | Also, one of the first things I worked on there
00:03:47.600 | was making our notifications go out more quickly,
00:03:50.200 | because when you join a Clubhouse room,
00:03:51.800 | you need everyone to come in right away
00:03:53.640 | so that it's exciting,
00:03:54.600 | and the person speaking thinks a lot of my audience is here.
00:03:57.680 | But when I first joined, I think it would take 10 minutes
00:03:59.960 | for all the notifications to go out, which is insane.
00:04:02.920 | By the time you want to start talking
00:04:04.080 | to the time your audience is there,
00:04:05.400 | it's like you can totally kill the room.
00:04:07.120 | So that's one of the first things I worked on,
00:04:08.440 | is making that a lot faster and keeping everything up.
00:04:11.680 | - I mean, so already we have an audience of engineers.
00:04:14.040 | Those two things are useful.
00:04:15.160 | It's keeping things up and notifications out.
00:04:17.120 | Notifications, like is it a Kafka topic?
00:04:19.600 | - It was a Postgres shop,
00:04:20.800 | and you had all of the followers in Postgres,
00:04:23.920 | and you needed to iterate over the followers
00:04:25.760 | and figure out, is this a good notification to send?
00:04:28.080 | And so all of this logic,
00:04:29.240 | it wasn't well-batched and parallelized,
00:04:31.640 | and our job queuing infrastructure wasn't right.
00:04:34.000 | And so there was a lot of fixing all of these things.
00:04:36.400 | Eventually, there were a lot of database migrations,
00:04:38.200 | because Postgres just wasn't scaling well for us.
00:04:40.800 | - Interesting, and then keeping things up,
00:04:43.160 | that was more of a, I don't know, reliability issue,
00:04:46.560 | SRE type?
00:04:47.400 | - A lot of it, yeah, it goes down to database stuff.
00:04:50.920 | Everywhere I've worked--
00:04:51.760 | - It's on databases.
00:04:52.600 | (laughing)
00:04:54.280 | - Indexing.
00:04:55.120 | - Actually, at Coinbase, at Clubhouse, and at OpenAI,
00:04:57.760 | Postgres has been a perennial challenge.
00:04:59.760 | It's like, the stuff you learn at one job
00:05:02.400 | carries over to all the others,
00:05:03.520 | because you're always debugging
00:05:05.600 | a long-running Postgres query at 3 a.m. for some reason.
00:05:09.080 | So those skills have really carried me forward, for sure.
00:05:11.480 | - Why do you think that not as much of this is productized?
00:05:14.920 | Obviously, Postgres is an open-source project.
00:05:17.040 | It's not aimed at gigascale,
00:05:18.560 | but you would think somebody would come around
00:05:20.680 | and say, "Hey, we're like the--"
00:05:22.040 | - Yeah, I think that's what PlanetScale is doing.
00:05:24.560 | It's not on Postgres, I think.
00:05:25.480 | It's on MySQL, but I think that's the vision.
00:05:27.920 | It's like, they have zero downtime migrations,
00:05:31.320 | and that's a big pain point.
00:05:33.120 | I don't know why no one is doing this on Postgres,
00:05:35.400 | but I think it would be pretty cool.
00:05:36.920 | - Their connection poolers, like PgBouncer,
00:05:39.040 | are good enough, I don't know.
00:05:40.480 | - Yeah, well, even, I mean,
00:05:41.960 | I've run PgBouncer everywhere,
00:05:43.440 | and there's still a lot of problems.
00:05:45.800 | Your scale, it's something that not many people see.
00:05:48.760 | - Yeah, I mean, at some point,
00:05:49.720 | every successful company gets to the scale
00:05:52.440 | where Postgres is not cutting it,
00:05:54.000 | and then you migrate to some sort of NoSQL database.
00:05:56.720 | And that process I've seen happen a bunch of times now.
00:05:59.200 | - MongoDB, Redis, something like that.
00:06:01.120 | - Yeah, I mean, we're on Azure now,
00:06:04.000 | and so we use Cosmos DB.
00:06:06.040 | - Cosmos DB, hey!
00:06:07.680 | - At Clubhouse, I really love DynamoDB.
00:06:10.080 | That's probably my favorite database,
00:06:12.160 | which is like a very nerdy sentence,
00:06:13.360 | but that's the one I'm using
00:06:14.800 | if I need to scale something as far as it goes.
00:06:16.560 | - Yeah, DynamoDB, I, when I learned,
00:06:18.800 | I worked at AWS briefly,
00:06:20.120 | and it's kind of like the memory register for the web.
00:06:23.280 | Like, you know, if you treat it just as physical memory,
00:06:26.800 | you will use it well.
00:06:27.920 | If you treat it as a real database,
00:06:29.760 | you might run into problems.
00:06:30.840 | - Right, you have to totally change your mindset
00:06:32.600 | when you're going from Postgres to Dynamo.
00:06:34.200 | But I think it's a good mindset shift,
00:06:35.440 | and kind of makes you design things in a more scalable way.
00:06:37.760 | - Yeah, I'll recommend the DynamoDB book
00:06:39.440 | for people who need to use DynamoDB.
00:06:41.640 | But we're not here to talk about AWS,
00:06:43.240 | we're here to talk about OpenAI.
00:06:44.680 | You joined OpenAI pre-ChatGPT.
00:06:46.360 | I also had the opportunity to join and I didn't.
00:06:48.840 | What was your insight?
00:06:50.600 | - Yeah, I think a lot of people who joined OpenAI
00:06:52.800 | joined because of a product that really gets them excited.
00:06:55.280 | And for most people, it's ChatGPT.
00:06:56.720 | But for me, I was a daily user of Copilot, GitHub Copilot.
00:07:00.840 | And I was like so blown away at the quality of this thing.
00:07:03.240 | I actually remember the first time seeing it on Hacker News
00:07:05.200 | and being like, wow, this is absolutely crazy.
00:07:07.040 | Like, this is gonna change everything.
00:07:08.920 | And I started using it every day.
00:07:10.560 | It just really, even now when like I don't have service
00:07:14.400 | and I'm coding without Copilot,
00:07:16.120 | it's just like 10x difference.
00:07:18.040 | So I was really excited about that product.
00:07:19.320 | I thought now is maybe the time for AI.
00:07:21.320 | And I'd done some AI in college
00:07:22.760 | and thought some of those skills would transfer.
00:07:25.600 | And I got introduced to the team.
00:07:26.800 | I liked everyone I talked to.
00:07:28.280 | So I thought that'd be cool.
00:07:29.720 | Why didn't you join?
00:07:30.760 | - It was like, I was like, is DALL-E it?
00:07:32.880 | (laughing)
00:07:33.960 | - We were there.
00:07:35.320 | We were at the DALL-E like launch thing.
00:07:37.320 | And I think you were talking with Lenny
00:07:39.000 | and Lenny was at OpenAI at the time.
00:07:41.160 | And you were like--
00:07:42.000 | - We don't have to go into too much detail.
00:07:43.760 | - This is one of my biggest regrets of my life.
00:07:45.960 | - No, no, no.
00:07:47.440 | - But I was like, okay, I mean, I can create images.
00:07:50.680 | I don't know if like this is the thing to dedicate,
00:07:52.680 | but obviously you had a bigger vision than I did.
00:07:55.360 | - DALL-E was really cool too.
00:07:56.480 | I remember like first showing my family,
00:07:58.480 | I was like, I'm going to this company
00:08:00.000 | and here's like one of the things they do.
00:08:01.960 | And it like really helped bridge the gap.
00:08:03.320 | Whereas like, I still haven't figured out
00:08:05.600 | how to explain to my parents what crypto is.
00:08:08.160 | My mom for a while thought I worked at Bitcoin.
00:08:10.120 | So it's like, it's pretty different
00:08:12.160 | to be able to tell your family what you actually do
00:08:14.240 | and they can see it.
00:08:15.080 | - Yeah, and they can use it too, personally.
00:08:17.280 | So you were there, were you immediately on API platform?
00:08:20.080 | You were there for the ChatGPT moment.
00:08:21.640 | - Yeah, I mean, API platform is like a very grandiose term
00:08:24.880 | for what it was.
00:08:25.720 | There was like just a handful of us working on the API.
00:08:27.920 | - Yeah, it was like a closed beta, right?
00:08:29.160 | Not even everyone had access to the GPT-3 model.
00:08:32.000 | - A very different access model then,
00:08:34.040 | a lot more like tiered rollouts.
00:08:36.520 | But yeah, I would say the applied team
00:08:38.480 | was maybe like 30 or 40 people
00:08:40.760 | and yeah, probably closer to 30.
00:08:42.520 | And there was maybe like five-ish total
00:08:44.520 | working on the API at most.
00:08:45.880 | So yeah, we've grown a lot since then.
00:08:47.480 | - It's like 60, 70 now, right?
00:08:49.300 | - No, applied is much bigger than that.
00:08:51.320 | Applied now is bigger than the company when I joined.
00:08:53.400 | - Okay, all right.
00:08:54.240 | - Yeah, we've grown a lot.
00:08:55.060 | I mean, there's so much to build.
00:08:55.900 | So we need all the help we can get.
00:08:56.740 | - I'm a little out of date, yeah.
00:08:58.080 | - Any ChatGPT release, kind of like all hands on deck stories?
00:09:02.480 | I had lunch with Evan Morikawa a few months ago.
00:09:05.240 | It sounded like it was a fun time to get,
00:09:07.280 | build the APIs and have all these people
00:09:09.000 | trying to use the web thing.
00:09:10.080 | Like, how are you prioritizing internally?
00:09:12.160 | And like, what was the helping scaling
00:09:14.640 | when you're scaling non-GPU workloads
00:09:16.800 | versus like Postgres bouncers and things like that?
00:09:19.820 | - Yeah, actually surprisingly,
00:09:20.740 | there were a lot of Postgres issues when ChatGPT came out
00:09:24.320 | because the accounts for like ChatGPT
00:09:27.600 | were tied to the accounts in the API.
00:09:29.760 | And so you're basically creating a developer account
00:09:31.440 | to log into ChatGPT at the time.
00:09:33.000 | 'Cause it's just what we had.
00:09:33.820 | It was a low-key research preview.
00:09:35.360 | And so I remember there was just so much work scaling
00:09:37.480 | like our authorization system
00:09:38.840 | and that would be down a lot.
00:09:40.400 | Yeah, also GPU, you know,
00:09:42.040 | I never had worked in a place
00:09:43.880 | where you couldn't just scale the thing up.
00:09:45.960 | It's like everywhere I've worked, compute is like free
00:09:47.800 | and you just like auto-scale a thing
00:09:49.240 | and you like never think about it again.
00:09:50.880 | But here we're having like tough decisions every day.
00:09:53.000 | We're like discussing like, you know,
00:09:54.320 | should they go here or here?
00:09:55.720 | And we have to be principled about it.
00:09:57.080 | So that's a real mindset shift.
00:09:58.480 | - So you just released structured outputs, congrats.
00:10:00.560 | You also wrote the blog post for it,
00:10:01.680 | which was really well-written.
00:10:02.640 | And I loved all the examples that you put out.
00:10:04.200 | Like you really give the full story.
00:10:06.000 | Yeah, tell us about the whole story from beginning to end.
00:10:09.080 | - Yeah, I guess the story we should rewind quite a bit
00:10:11.720 | to Dev Day last year.
00:10:13.000 | Dev Day last year, exactly.
00:10:14.320 | We shipped JSON mode,
00:10:15.600 | which is our first foray into this area of product.
00:10:18.560 | So for folks who don't know,
00:10:19.680 | JSON mode is this functionality you can enable
00:10:22.000 | in our chat completions and other APIs,
00:10:24.160 | where if you opt in,
00:10:25.720 | we'll kind of constrain the output of the model
00:10:28.120 | to match the JSON language.
00:10:30.240 | And so you basically will always get something
00:10:32.800 | in a curly brace.
00:10:33.880 | And this is good.
00:10:34.720 | This is nice for a lot of people.
00:10:35.720 | You can describe your schema,
00:10:37.280 | what you want in prompt,
00:10:38.800 | and then we'll constrain it to JSON.
00:10:40.920 | But it's not getting you exactly where you want,
00:10:43.080 | because you don't want the model
00:10:44.160 | to kind of make up the keys
00:10:45.800 | or match different values than what you want.
00:10:47.520 | Like if you want an enum or a number
00:10:49.080 | and you get a string instead,
00:10:50.000 | it's pretty frustrating.
00:10:51.600 | So we've been ideating on this for a while,
00:10:53.200 | and people have been asking for basically this
00:10:55.600 | every time I talk to customers for maybe the last year.
00:10:58.120 | And so it was really clear that there's a developer need,
00:11:00.200 | and we started working on kind of making it happen.
00:11:02.720 | And this is a real collab
00:11:04.640 | between engineering and research, I would say.
00:11:06.520 | And so it's not enough to just kind of constrain the model.
00:11:09.960 | I think of that as the engineering side,
00:11:11.600 | whereas basically you mask the available tokens
00:11:14.960 | that are produced every time to only fit the schema.
00:11:17.640 | And so you can do this engineering thing,
00:11:19.080 | and you can force the model to do what you want,
00:11:20.760 | but you might not get good outputs.
00:11:22.200 | And sometimes with JSON mode,
00:11:23.360 | developers have seen that our models output
00:11:25.240 | like white space for a really long time,
00:11:27.280 | where they don't--
00:11:28.120 | - Because it's a legal character.
00:11:29.320 | - Right, it's legal for JSON,
00:11:31.240 | but it's not really what they want.
00:11:32.760 | And so that's what happens
00:11:33.600 | when you do kind of a very engineering-biased approach.
00:11:36.120 | But the modeling approach is to also train the model
00:11:38.360 | to do more of what you want.
00:11:39.640 | And so we did these together.
00:11:41.000 | We trained a model which is significantly better
00:11:42.720 | than our past models at following formats,
00:11:44.960 | and we did the end work to serve
00:11:46.400 | like this constrained decoding concept at scale.
00:11:48.960 | So I think marrying these two
00:11:50.080 | is why this feature is pretty cool.
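For readers following along in code, here is a minimal sketch of the two modes being contrasted, assuming the openai Python SDK; the model name, prompt, and schema are illustrative rather than taken from the episode.

```python
from openai import OpenAI

client = OpenAI()

# JSON mode: guarantees syntactically valid JSON, but the keys and types
# are only described in the prompt, so they are not enforced.
json_mode = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Return a user with name and age as JSON."}],
    response_format={"type": "json_object"},
)

# Structured Outputs: generation is constrained to the supplied JSON Schema,
# so the keys, types, and enums always match.
structured = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Return a user with name and age."}],
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "user",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "name": {"type": "string"},
                    "age": {"type": "integer"},
                },
                "required": ["name", "age"],
                "additionalProperties": False,
            },
        },
    },
)
print(structured.choices[0].message.content)  # JSON string matching the schema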
00:11:52.000 | - You just mentioned starts and ends with a curly brace,
00:11:54.640 | and maybe people's minds go to prefills in the Cloud API.
00:11:59.000 | How should people think about
00:12:00.240 | JSON mode structure output prefills?
00:12:02.640 | Because some of them are like,
00:12:03.840 | roughly starts with a curly brace
00:12:05.880 | and asks you for JSON, you should do it.
00:12:07.720 | And then Instructor is like,
00:12:08.720 | "Hey, here's a rough data scheme that you should use."
00:12:11.120 | And how do you think about them?
00:12:13.080 | - So I think we kind of designed structured outputs
00:12:15.160 | to be the easiest to use.
00:12:16.480 | So you just, like the way you use it in our SDK,
00:12:19.400 | I think is my favorite thing.
00:12:20.720 | So you just create like a Pydantic object or a Zod object,
00:12:23.440 | and you pass it in and you get back an object.
00:12:25.520 | And so you don't have to deal with any of the serialization.
00:12:27.840 | - With the parse helper.
00:12:29.000 | - Yeah, you don't have to deal with any of the serialization
00:12:31.080 | on the way in or out.
00:12:32.320 | So I kind of think of this as the feature
00:12:34.080 | for the developer who is like,
00:12:35.680 | I need this to plug into my system.
00:12:37.520 | I need the function call to be exact.
00:12:39.440 | I don't want to deal with any parsing.
00:12:41.120 | So that's where structured outputs is tailored.
00:12:43.600 | Whereas if you want the model to be more creative
00:12:46.160 | and use it to come up with a JSON schema
00:12:48.320 | that you don't even know you want,
00:12:49.440 | then that's kind of where JSON mode fits in.
00:12:51.480 | But I expect most developers
00:12:52.840 | are probably going to want to upgrade to structured outputs.
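A short sketch of the SDK flow described here, assuming the openai Python SDK's beta parse helper with Pydantic; the CalendarEvent model and prompts are illustrative.

```python
from openai import OpenAI
from pydantic import BaseModel

class CalendarEvent(BaseModel):
    name: str
    date: str
    participants: list[str]

client = OpenAI()
completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Extract the event information."},
        {"role": "user", "content": "Alice and Bob are going to a science fair on Friday."},
    ],
    response_format=CalendarEvent,  # the Pydantic model is the schema
)
event = completion.choices[0].message.parsed  # already a CalendarEvent instance
print(event.name, event.participants)
```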
00:12:55.480 | - The thing you just said,
00:12:56.320 | you just use interchangeable terms for the same thing,
00:12:59.480 | which is function calling and structured outputs.
00:13:02.680 | We've had disagreements or discussion before on the podcast
00:13:06.200 | about are they the same thing?
00:13:07.840 | Semantically, they're slightly different.
00:13:09.520 | - They are, yes.
00:13:10.360 | - Because I think function calling API came out first
00:13:12.760 | than JSON mode.
00:13:14.240 | And we used to abuse function calling for JSON mode.
00:13:18.480 | Do you think we should treat them as synonymous?
00:13:20.920 | - No. - Okay, yeah.
00:13:21.760 | Please clarify.
00:13:22.600 | (both laughing)
00:13:24.000 | And by the way, there's also tool calling.
00:13:26.000 | - Yeah, the history here is we started with function calling
00:13:29.120 | and function calling came from the idea of
00:13:31.760 | let's give the model access to tools
00:13:33.480 | and let's see what it does.
00:13:34.400 | And we basically had these internal prototypes
00:13:36.640 | of what a code interpreter is now.
00:13:38.520 | And we were like, this is super cool.
00:13:39.720 | Let's make it an API.
00:13:40.880 | But we're not ready to host code interpreter for everybody.
00:13:43.440 | So we're just going to expose the raw capability
00:13:45.560 | and see what people do with it.
00:13:47.040 | But even now, I think there's a really big difference
00:13:49.200 | between function calling and structured outputs.
00:13:51.040 | So you should use function calling
00:13:52.600 | when you actually have functions
00:13:53.920 | that you want the model to call.
00:13:55.600 | And so if you have a database
00:13:57.240 | that you want the model to be able to query from,
00:13:59.280 | or if you want the model to send an email
00:14:01.800 | or generate arguments for an actual action.
00:14:04.920 | And that's the way the model has been fine-tuned on,
00:14:07.720 | is to treat function calling
00:14:09.400 | for actually calling these tools and getting their outputs.
00:14:11.880 | The new response format
00:14:13.800 | is a way of just getting the model to respond to the user,
00:14:17.040 | but in a structured way.
00:14:18.400 | And so this is very different.
00:14:19.400 | Responding to a user versus I'm going to go send an email.
00:14:24.240 | A lot of people were hacking function calling
00:14:26.120 | to get the response format they needed.
00:14:28.440 | And so this is why we shipped this new response format.
00:14:31.040 | So you can get exactly what you want
00:14:32.640 | and you get more of the model's verbosity.
00:14:35.280 | It's responding in the way it would speak to a user.
00:14:38.560 | And so less just programmatic tool calling,
00:14:41.200 | if that makes sense.
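To make the split concrete, a hedged sketch of the tools side, assuming the Chat Completions tools parameter: define a function when the model should act, and use response_format when it should simply answer in a structured way. The send_email function and its fields below are illustrative, not from the transcript.

```python
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "send_email",          # an action the model can ask us to take
        "description": "Send an email on behalf of the user.",
        "strict": True,                # with Structured Outputs, arguments always match this schema
        "parameters": {
            "type": "object",
            "properties": {
                "to": {"type": "string"},
                "subject": {"type": "string"},
                "body": {"type": "string"},
            },
            "required": ["to", "subject", "body"],
            "additionalProperties": False,
        },
    },
}]

response = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[{"role": "user", "content": "Email Bob that the demo is at 3pm."}],
    tools=tools,
)
tool_call = response.choices[0].message.tool_calls[0]
print(tool_call.function.name, tool_call.function.arguments)
```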
00:14:42.080 | - Are you building something into the SDK
00:14:43.920 | to actually close the loop with the function calling?
00:14:46.880 | Because right now it returns the function,
00:14:48.360 | then you got to run it,
00:14:49.200 | then you got to fake another message
00:14:51.360 | to then continue the conversation.
00:14:53.640 | - They have that in beta, the runs.
00:14:55.160 | - Yes, we have this in beta in the Node SDK.
00:14:58.480 | So you can basically-
00:14:59.560 | - Oh, not Python.
00:15:00.840 | - It's coming to Python as well.
00:15:02.440 | - That's why I didn't know.
00:15:03.880 | - Yeah, I'm a Node guy.
00:15:04.840 | So I'm like, it's already existed.
00:15:08.760 | - It's coming everywhere.
00:15:09.720 | But basically what you do is you write a function
00:15:11.600 | and then you add a decorator to it.
00:15:13.720 | And then you can, basically there's this run tools method
00:15:16.480 | and it does the whole loop for you, which is pretty cool.
00:15:19.160 | - When I saw that in the Node SDK,
00:15:21.240 | I wasn't sure if that's,
00:15:22.760 | because it basically runs it in the same machine.
00:15:24.960 | - Yeah.
00:15:25.800 | - And maybe you don't want that to happen.
00:15:28.200 | - Yeah, I think of it as like,
00:15:29.600 | if you're prototyping and building something really quickly
00:15:32.240 | and just playing around,
00:15:33.080 | it's so cool to just create a function
00:15:34.840 | and give it this decorator.
00:15:35.880 | But you have the flexibility to do it however you like.
00:15:38.480 | - Like you don't want it in a critical path
00:15:39.840 | of a web request.
00:15:40.920 | - I mean, some people definitely will.
00:15:42.520 | (both laughing)
00:15:44.000 | It's just kind of the easiest way to get started.
00:15:46.080 | But let's say you want to like execute this function
00:15:48.320 | on a job queue async,
00:15:49.760 | then it wouldn't make sense to use that.
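For reference, the "close the loop" flow discussed above written out by hand, as a sketch assuming the openai Python SDK: call the model, execute any tool calls it returns yourself, append the results as tool messages, and ask the model to continue. The get_weather function and prompt are hypothetical.

```python
import json
from openai import OpenAI

client = OpenAI()

def get_weather(city: str) -> str:
    return f"It is sunny in {city}."  # hypothetical local function

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get the weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

messages = [{"role": "user", "content": "What's the weather in Toronto?"}]
response = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
msg = response.choices[0].message

if msg.tool_calls:
    messages.append(msg)  # keep the assistant's tool call in the transcript
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        result = get_weather(**args)  # run the function ourselves
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
    # second request lets the model turn the tool output into a user-facing answer
    final = client.chat.completions.create(model="gpt-4o-mini", messages=messages, tools=tools)
    print(final.choices[0].message.content)
```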
00:15:51.840 | - Prior art: Instructor, Outlines, JSONformer,
00:15:55.920 | what did you study?
00:15:56.760 | What did you credit or learn from these things?
00:15:59.640 | - Yeah, there's a lot of different approaches to this.
00:16:02.080 | There's more fill in the blank style sampling
00:16:04.880 | where you basically pre-form kind of the keys
00:16:08.880 | and then get the model to sample just the value.
00:16:11.480 | There's kind of a lot of approaches here.
00:16:13.200 | We didn't kind of use any of them wholesale,
00:16:15.280 | but we really loved what we saw from the community
00:16:17.480 | and like the developer experiences we saw.
00:16:19.680 | So that's where we took a lot of inspiration.
00:16:21.800 | - There was a question also just about constrained grammar.
00:16:24.880 | This is something that I first saw in Llama CPP,
00:16:27.920 | which seems to be the most,
00:16:29.520 | let's just say academically permissive.
00:16:32.080 | - It's kind of the lowest level.
00:16:33.320 | - Yeah.
00:16:34.240 | For those who don't know,
00:16:35.080 | maybe I don't know if you want to explain it,
00:16:36.040 | but they use Backus-Naur form,
00:16:37.800 | which you only learn in like college
00:16:39.200 | when you're working on programming languages and compilers.
00:16:41.440 | I don't know if you like use that under the hood
00:16:43.560 | or you explore that.
00:16:44.800 | - Yeah, we didn't use any kind of other stuff.
00:16:48.160 | We kind of built our solution from scratch
00:16:50.600 | to meet our specific needs.
00:16:52.160 | But I think there's a lot of cool stuff out there
00:16:54.440 | where you can supply your own grammar.
00:16:56.360 | Right now, we only allow JSON schema
00:16:58.120 | and the dialect of that.
00:16:59.320 | But I think in the future,
00:17:00.160 | it could be a really cool extension
00:17:01.360 | to let you supply a grammar more broadly.
00:17:04.200 | And maybe it's more token efficient than JSON.
00:17:06.520 | So a lot of opportunity there.
00:17:08.680 | - You mentioned before also training the model
00:17:11.040 | to be better at function calling.
00:17:12.720 | What's that discussion like internally for like resources?
00:17:15.400 | It's like, hey, we need to get better JSON mode.
00:17:17.360 | And it's like, well,
00:17:18.200 | can't you figure it out on the API platform
00:17:20.240 | without touching the model?
00:17:21.920 | Like is there a really tight collaboration
00:17:24.400 | between the two teams?
00:17:25.440 | - Yeah, so I actually work on the API models team.
00:17:27.520 | I guess we didn't quite get into what I do at API.
00:17:29.400 | (all laughing)
00:17:31.440 | - What do you say it is you do here?
00:17:33.960 | - Yeah, so yeah, I'm the tech lead for the API,
00:17:36.360 | but also I work on the API models team.
00:17:38.240 | And this team is really working on
00:17:39.760 | making the best models for the API.
00:17:41.560 | And a lot of common deployment patterns
00:17:43.520 | are research makes a model
00:17:45.560 | and then you kind of ship it in the API.
00:17:47.560 | But I think there's a lot you miss when you do that.
00:17:50.440 | You miss a lot of developer feedback
00:17:52.200 | and things that are not kind of immediately obvious.
00:17:54.840 | What we do is we get a lot of feedback from developers
00:17:57.440 | and we go and make the models better in certain ways.
00:17:59.800 | So our team does model training as well.
00:18:01.720 | We work very closely with our post-training team.
00:18:04.000 | And so for structured outputs,
00:18:05.400 | it was a collab between a bunch of teams,
00:18:07.320 | including safety systems to make a really great model
00:18:10.760 | that does structured outputs.
00:18:12.680 | - Mentioning safety systems, you have a refusal field.
00:18:15.320 | - Yes.
00:18:16.160 | - You want to talk about that?
00:18:17.000 | - Yeah, it's pretty interesting.
00:18:19.200 | So you can imagine basically if you constrain the model
00:18:22.520 | to follow a schema,
00:18:23.880 | you can imagine there being like a schema supplied
00:18:26.800 | that it would add some risk or be harmful for the model
00:18:29.720 | to kind of follow that schema.
00:18:31.800 | And we wanted to preserve our model's abilities to refuse
00:18:35.120 | when something doesn't match our policies
00:18:37.800 | or is harmful in some way.
00:18:39.320 | And so we needed to give the model an ability to refuse
00:18:42.200 | even when there is this schema.
00:18:44.000 | But also, you know, if you are a developer
00:18:46.280 | and you have this schema
00:18:47.400 | and you get back something that doesn't match it,
00:18:48.880 | you're like, "Oh, the feature's broken."
00:18:50.440 | So we wanted a really clear way
00:18:51.600 | for developers to program against this.
00:18:53.240 | So if you get something back in the content,
00:18:54.920 | you know it's valid, it's JSON parsable.
00:18:56.880 | But if you get something back in the refusal field,
00:18:59.520 | it makes for a much better UI
00:19:00.720 | for you to kind of display this to your user
00:19:02.680 | in a different way.
00:19:03.640 | And it makes it easier to program against.
00:19:05.320 | So really there was a few goals,
00:19:06.520 | but it was mainly to allow the model to continue to refuse,
00:19:09.160 | but also with a really good developer experience.
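A small sketch of programming against the refusal field just described, assuming the Python SDK's parse helper from earlier: schema-valid output arrives in content/parsed, while refusals arrive in a dedicated field instead of breaking the schema.

```python
def handle_structured_reply(message) -> None:
    """Render a parsed chat message, treating refusals differently from results."""
    if message.refusal:
        # surface the refusal in the UI as a refusal, not as malformed data
        print("Model refused:", message.refusal)
    else:
        print("Schema-valid result:", message.parsed)

# e.g. handle_structured_reply(completion.choices[0].message) after a .parse(...) call
```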
00:19:11.240 | - Yeah, why not offer it as like an error code?
00:19:14.320 | Because we have to display error codes anyway.
00:19:16.880 | - Yeah, we've waffled for a long time about API design,
00:19:20.040 | as we are wont to do.
00:19:21.920 | And there are a few reasons against an error code.
00:19:24.560 | Like you could imagine this being a 4xx error code
00:19:26.720 | or something, but you know,
00:19:28.120 | the developer's paying for the tokens.
00:19:29.620 | And that's kind of atypical for like a 4xx error code.
00:19:33.640 | - We pay with errors anyway, right?
00:19:36.840 | Or no? - So 4xx is-
00:19:38.120 | - Is not, that's a "you" error.
00:19:40.240 | - Right, and it doesn't make sense as a 5xx either,
00:19:43.500 | 'cause it's not our fault.
00:19:44.340 | It's the way the API, the model is designed.
00:19:46.880 | I think the HTTP spec is a little bit limiting
00:19:49.680 | for AI in a lot of ways.
00:19:51.520 | Like there are things that are in between
00:19:53.080 | your fault and my fault.
00:19:54.120 | There's kind of like the model's fault
00:19:56.040 | and there's no, you know, error code for that.
00:19:58.640 | So we really have to kind of invent
00:20:00.720 | a lot of the paradigm here.
00:20:02.160 | - We get 6xx.
00:20:03.840 | - Yeah, that's one option.
00:20:04.680 | There's actually some like esoteric error codes
00:20:06.600 | we've considered adopting.
00:20:07.880 | - 418, my favorite.
00:20:09.400 | - Yeah, there's the teapot one.
00:20:11.940 | - Hey! (laughs)
00:20:13.800 | - We're still figuring that out.
00:20:14.800 | But I think there are some things,
00:20:16.320 | like for example, sometimes our model will produce tokens
00:20:19.520 | that are invalid based on kind of our language.
00:20:22.600 | And when that happens, it's an error.
00:20:25.080 | But, you know, it doesn't, 500 is fine,
00:20:28.040 | which is what we return,
00:20:29.080 | but it's not as expressive as it could be.
00:20:31.320 | So yeah, just areas where, you know,
00:20:33.480 | web 2.0 doesn't quite fit with AI yet.
00:20:37.080 | - If you have to put in a spec,
00:20:38.520 | I was gonna-- - To just change.
00:20:39.440 | Yeah, yeah, yeah.
00:20:40.400 | What would be your number one proposal to like rehaul?
00:20:42.960 | - The HTTP committee to re-invent the world.
00:20:45.780 | - Yeah, that's a good one.
00:20:47.560 | I mean, I think we just need an error
00:20:48.960 | of like a range of model error.
00:20:51.400 | And we can have many different kinds of model errors.
00:20:53.480 | Like a refusal is a model error.
00:20:55.560 | - 601, auto refusal.
00:20:58.040 | - Yeah, again, like, so we've mentioned before
00:21:00.560 | that chat completions uses this chat ML format.
00:21:03.240 | So when the model doesn't follow chat ML, that's an error.
00:21:06.440 | And we're working on reducing those errors,
00:21:07.960 | but that's like, I don't know, 602, I guess.
00:21:10.600 | - A lot of people actually no longer know what chat ML is.
00:21:13.000 | - Yeah, fair enough. - Because that was
00:21:15.360 | briefly introduced by OpenAI and then like kind of deprecated.
00:21:18.280 | Everyone who implements this under the hood knows it,
00:21:21.200 | but maybe the API users don't know it.
00:21:23.360 | - Basically, the API started with just one endpoint,
00:21:25.960 | the completions endpoint.
00:21:27.320 | And the completions endpoint,
00:21:28.400 | you just put text in and you get text out.
00:21:30.920 | And you can prompt in certain ways.
00:21:33.240 | Then we released chat GPT,
00:21:35.000 | and we decided to put that in the API as well.
00:21:37.520 | And that became the chat completions API.
00:21:39.520 | And that API doesn't just take like a string input
00:21:41.880 | and produce an output.
00:21:42.840 | It actually takes in messages and produces messages.
00:21:45.720 | And so you can get a distinction
00:21:46.800 | between like an assistant message and a user message,
00:21:48.840 | and that allows all kinds of behavior.
00:21:50.720 | And so the format under the hood for that is called ChatML.
00:21:54.520 | Sometimes, you know, because the model
00:21:56.280 | is so out of distribution based on what you're doing,
00:21:58.400 | maybe the temperature is super high,
00:22:00.120 | then it can't follow ChatML.
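To make the distinction concrete, a rough illustration of the message-based input described here; the ChatML rendering in the comments is approximate and for intuition only.

```python
# The chat completions API takes role-tagged messages rather than one prompt string:
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello!"},
    {"role": "assistant", "content": "Hi! How can I help?"},
    {"role": "user", "content": "Tell me a joke."},
]

# Under the hood, the model sees something roughly like a ChatML template
# (token names here are illustrative):
# <|im_start|>system
# You are a helpful assistant.<|im_end|>
# <|im_start|>user
# Hello!<|im_end|>
# <|im_start|>assistant
# ...
```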
00:22:02.120 | - Yeah, I didn't know that there could be errors
00:22:04.520 | generated there.
00:22:05.360 | Maybe I'm not asking challenging enough questions.
00:22:07.920 | - It's pretty rare, and we're working on driving it down.
00:22:10.680 | But actually, this is a side effect
00:22:12.440 | of structured outputs now,
00:22:14.040 | which is that we have removed a class of errors.
00:22:16.280 | We didn't really mention this in the blog,
00:22:17.760 | just 'cause we ran out of space.
00:22:19.280 | But--
00:22:20.120 | - That's what we're here to do.
00:22:21.080 | - Yeah, the model used to occasionally pick a recipient
00:22:24.040 | that was invalid, and this would cause an error.
00:22:26.760 | But now we are able to constrain to ChatML
00:22:30.280 | in a more valid way.
00:22:31.520 | And this reduces a class of errors as well.
00:22:33.840 | - Recipient meaning, so there's this,
00:22:35.640 | like a few number of defined roles,
00:22:37.480 | like user, assistant, system.
00:22:39.480 | - Like recipient as in like picking the right tool.
00:22:41.840 | - Oh. - Oh.
00:22:42.760 | - So the model before was able to hallucinate a tool,
00:22:46.520 | but now it can't when you're using structured outputs.
00:22:49.680 | - Do you collaborate with other model developers
00:22:52.600 | to try and figure out this type of errors?
00:22:54.520 | Like how do you display them?
00:22:55.600 | Because a lot of people try to work with different models.
00:22:58.880 | - Yeah. - Yeah, is there any?
00:22:59.920 | - Yeah, not a ton.
00:23:01.160 | We're kind of just focused
00:23:02.240 | on making the best API for developers.
00:23:04.640 | - A lot of research and engineering, I guess,
00:23:07.160 | comes together with evals.
00:23:08.840 | You published some evals there.
00:23:10.920 | I think Gorilla is one of them.
00:23:12.720 | What is your assessment of like the state of evals
00:23:15.200 | for function calling and structured output right now?
00:23:17.720 | - Yeah, we've actually collaborated with BFCL a little bit,
00:23:21.360 | which is, I think, the same thing as Gorilla.
00:23:23.480 | - Function calling leaderboard.
00:23:24.840 | - Kudos to the team.
00:23:25.880 | Those evals are great, and we use them internally.
00:23:27.920 | Yeah, we've also sent some feedback
00:23:29.320 | on some things that are misgraded.
00:23:31.160 | And so we're collaborating to make those better.
00:23:34.320 | In general, I feel evals are kind of the hardest part of AI.
00:23:37.960 | Like when I talk to developers, it's so hard to get started.
00:23:41.040 | It's really hard to make a robust pipeline.
00:23:43.160 | And you don't want evals that are like 80% successful
00:23:46.440 | because, you know, things are gonna improve dramatically.
00:23:49.000 | And it's really hard to craft the right eval.
00:23:51.160 | You kind of want to hit everything on the difficulty curve.
00:23:53.720 | I find that a lot of these evals are mostly saturated,
00:23:56.680 | like for BFCL.
00:23:58.120 | All the models are near the top already,
00:24:00.560 | and kind of the errors are more, I would say,
00:24:02.880 | like just differences in default behaviors.
00:24:05.560 | I think most of the models on leaderboard
00:24:07.360 | can kind of get 100% with different prompting,
00:24:10.280 | but it's more kind of you're just pulling apart
00:24:12.320 | different defaults at this point.
00:24:14.400 | So yeah, I would say in general, we're missing evals.
00:24:16.200 | You know, we work on this a lot internally, but it's hard.
00:24:19.160 | - Did you, other than BFCL, would you call out any others
00:24:21.920 | just for people exploring the space?
00:24:23.600 | - SWE-bench is actually like a very interesting eval,
00:24:26.000 | if people don't know.
00:24:26.960 | You basically give the model a GitHub issue and like a repo
00:24:30.640 | and just see how well it does at the issue,
00:24:32.480 | which I think is super cool.
00:24:33.320 | It's kind of like an integration test,
00:24:35.360 | I would say, for models.
00:24:36.480 | - It's a little unfair, right?
00:24:38.280 | - What do you mean?
00:24:39.120 | - A little unfair, 'cause like usually as a human,
00:24:41.400 | you have more opportunity to like ask questions
00:24:43.400 | about what it's supposed to do.
00:24:44.880 | And you're giving the model like way too little information.
00:24:47.320 | - It's a hard job.
00:24:48.160 | - To do the job.
00:24:49.200 | - But yeah, SWE-bench targets like,
00:24:50.680 | how well can you follow the diff format
00:24:52.400 | and how well can you like search across files
00:24:54.400 | and how well can you write code?
00:24:55.920 | So I'm really excited about evals like that
00:24:57.640 | because the pass rate is low,
00:24:59.400 | so there's a lot of room to improve.
00:25:01.200 | And it's just targeting a really cool capability.
00:25:03.280 | - I've seen other evals for function calling
00:25:05.040 | where I think might be BFCL as well,
00:25:07.120 | where they evaluate different kinds of function calling.
00:25:10.120 | And I think the top one that people care about,
00:25:12.440 | for some reason,
00:25:13.280 | I don't know personally that this is so important to me,
00:25:15.480 | but it's parallel function calling.
00:25:17.520 | I think you confirmed that you don't support that yet.
00:25:20.600 | Why is that hard?
00:25:21.920 | Just more context about it.
00:25:23.320 | - So yeah, we put out parallel function calling
00:25:25.520 | in Dev Day last year as well.
00:25:26.960 | And it's kind of the evolution of function calling.
00:25:29.080 | So function calling V1, you just get one function back.
00:25:31.840 | Function calling V2, you can get multiple back
00:25:33.360 | at the same time and save latency.
00:25:35.000 | We have this in our API, all our models support it,
00:25:36.760 | or all of our newer models support it,
00:25:38.640 | but we don't support it with structured outputs right now.
00:25:41.640 | And there's actually a very interesting trade-off here.
00:25:44.360 | So when you basically call our API for structured outputs
00:25:48.320 | with a new schema,
00:25:49.520 | we have to build this artifact for fast sampling later on.
00:25:52.480 | But when you do parallel function calling,
00:25:54.320 | the kind of schema we follow
00:25:56.080 | is not just directly one of the function schemas.
00:25:58.360 | It's like this combined schema based on a lot of them.
00:26:01.360 | If we were to kind of do the same thing
00:26:02.680 | and build an index every time
00:26:03.840 | you pass in a list of functions,
00:26:05.240 | if you ever change the list,
00:26:06.480 | you would kind of incur more latency.
00:26:08.360 | And we thought it would be really unintuitive
00:26:09.920 | for developers and hard to reason about.
00:26:12.360 | So we decided to kind of wait
00:26:13.960 | until we can support a no-added-latency solution
00:26:16.600 | and not just kind of make it really confusing for developers.
00:26:19.400 | - Mentioning latency,
00:26:20.240 | that is something that people discovered,
00:26:21.800 | is that there is an increased cost and latency
00:26:24.160 | for the first token.
00:26:25.240 | - For the first request, yeah.
00:26:26.120 | - First request.
00:26:26.960 | Is that an issue?
00:26:27.800 | Is that going to go down over time?
00:26:28.680 | Is there just an overhead to parsing JSON
00:26:31.280 | that is just insurmountable?
00:26:33.160 | - It's definitely not insurmountable.
00:26:34.640 | And I think it will definitely go down over time.
00:26:37.040 | We just kind of take the approach of ship early and often.
00:26:41.440 | And if there's nothing in there you want to fix,
00:26:45.720 | then you probably shipped too late.
00:26:47.600 | So I think we will get that latency down over time.
00:26:49.960 | But yeah, I think for most developers,
00:26:51.200 | it's not a big concern.
00:26:52.480 | 'Cause you're testing out your integration,
00:26:54.000 | you're sending some requests while you're developing it,
00:26:56.280 | and then it's fast in prod.
00:26:58.000 | So it kind of works for most people.
00:26:59.560 | The alternative design space that we explored
00:27:02.480 | is like pre-registering your schema,
00:27:04.360 | so like a totally different endpoint,
00:27:06.120 | and then passing in like a schema ID.
00:27:08.320 | But we thought, you know, that was a lot of overhead
00:27:10.640 | and like another endpoint to maintain
00:27:12.400 | and just kind of more complexity for the developer.
00:27:15.120 | And we think this latency is going to come down over time.
00:27:17.520 | So it made sense to keep it kind of in chat completions.
00:27:20.440 | - I mean, hypothetically,
00:27:21.720 | if one were to ship caching at a future point,
00:27:24.480 | it would basically be the superset of that.
00:27:26.840 | - Maybe.
00:27:27.680 | I think the caching space is a little underexplored.
00:27:30.280 | Like we've seen kind of two versions of it.
00:27:32.800 | But I think, yeah,
00:27:33.640 | there's ways that maybe put less onus on the developer.
00:27:36.560 | But, you know, we haven't committed to anything yet,
00:27:38.600 | but we're definitely exploring opportunities
00:27:40.320 | for making things cheaper over time.
00:27:42.560 | - Is AGI and Agents just going to be
00:27:44.640 | a bunch of structure upload
00:27:45.800 | and function calling one next to each other?
00:27:47.640 | Like, how do you see, you know,
00:27:49.240 | there's like the model does everything.
00:27:50.760 | Where do you draw the line?
00:27:51.800 | Because you don't call these things like an agent API,
00:27:54.600 | but like if I were a startup trying to raise a C round,
00:27:57.120 | I would just do function calling and say,
00:27:58.480 | this is an agent API.
00:27:59.920 | So how do you think about the difference
00:28:01.280 | and like how people build on top of it
00:28:02.640 | for like agentic systems?
00:28:04.200 | - Yeah, love that question.
00:28:05.480 | One of the reasons we wanted to build structured outputs
00:28:07.520 | is to make agentic applications actually work.
00:28:09.880 | So right now it's really hard.
00:28:10.840 | Like if something is 95% reliable,
00:28:13.600 | but you're chaining together a bunch of calls,
00:28:15.680 | if you magnify that error rate,
00:28:17.120 | it makes your like application not work.
00:28:18.960 | So that's a really exciting thing here
00:28:20.440 | from going from like 95% to 100%.
00:28:23.080 | I'm very biased working in the API
00:28:24.760 | and working on function calling and structured outputs,
00:28:26.600 | but I think those are the building blocks
00:28:28.200 | that we'll be using kind of to distribute
00:28:30.000 | this technology very far.
00:28:31.400 | It's the way you connect like natural language
00:28:34.000 | and converting user intent
00:28:35.640 | into working with your application.
00:28:37.320 | And so I think like kind of,
00:28:38.920 | there's no way to build without it, honestly.
00:28:40.600 | Like you need your function calls to work.
00:28:42.560 | Like, yeah, we wanted to make that a lot easier.
00:28:44.960 | - Yeah, and do you think the assistance
00:28:47.240 | kind of like API thing will be a bigger part
00:28:50.000 | as people build agents?
00:28:51.280 | I think maybe most people just use messages and completion.
00:28:54.440 | - So I would say the assistance API
00:28:56.120 | was kind of a bet in a few areas.
00:28:58.200 | One bet is hosted tools.
00:28:59.880 | So we have the file search tool and code interpreter.
00:29:02.640 | Another bet was kind of statefulness.
00:29:04.640 | It's our first stateful API.
00:29:06.280 | It'll store threads and you can fetch them later.
00:29:09.160 | I would say the hosted tools aspect
00:29:11.280 | has been really successful.
00:29:12.760 | Like people love our file search tool
00:29:14.920 | and it's like saves a lot of time
00:29:16.960 | to not build your own rag pipeline.
00:29:19.000 | I think we're still iterating on the shape
00:29:20.680 | for the stateful thing to make it as useful as possible.
00:29:23.520 | Right now, there's kind of a few endpoints you need to call
00:29:26.160 | before you can get a run going.
00:29:27.800 | And we want to work to make that, you know,
00:29:29.360 | much more intuitive and easier over time.
00:29:31.400 | - One thing I'm just kind of curious about,
00:29:32.840 | did you notice any trade-offs
00:29:34.760 | when you add more structured output,
00:29:36.760 | it gets worse at some other thing that was like kind of,
00:29:39.400 | you didn't think was related at all?
00:29:41.120 | - Yeah, it's a good question.
00:29:42.600 | Yeah, I mean, models are very spiky
00:29:44.760 | and RL is hard to predict.
00:29:47.440 | And so every model kind of improves on some things
00:29:50.640 | and maybe is flat or neutral on other things.
00:29:52.880 | - Yeah, like it's like very rare to just add a capability
00:29:56.800 | and have no trade-offs in everything else.
00:29:58.320 | - So yeah, I don't have something off the top of my head,
00:30:00.240 | but I would say, yeah,
00:30:01.080 | every model is a special kind of its own thing.
00:30:04.000 | This is why we put them in the API, dated,
00:30:06.160 | so developers can choose for themselves
00:30:07.680 | which one works best for them.
00:30:09.120 | In general, we strive to continue improving on all evals,
00:30:12.320 | but it's stochastic.
00:30:14.040 | - Yeah, were you able to apply the structured output system
00:30:17.120 | on backdated models like 4o May,
00:30:21.080 | as well as Mini, as well as August.
00:30:23.280 | - Actually the new response format
00:30:25.680 | is only available on two models.
00:30:27.360 | It's 4o mini and the new 4o.
00:30:30.160 | So the old 4o doesn't have the new response format.
00:30:33.640 | However, for function calling,
00:30:35.000 | we were able to enable it for all models
00:30:36.640 | that support function calling.
00:30:38.000 | And that's because those models were already trained
00:30:40.320 | to follow these schemas.
00:30:41.880 | We basically just didn't wanna add the new response format
00:30:44.400 | to models that would do poorly at it
00:30:46.400 | because they would just kind of do infinite white space,
00:30:48.840 | which is the most likely token
00:30:50.600 | if you have no idea what's going on.
00:30:52.240 | - I just wanted to call out a little bit more
00:30:53.640 | in the stuff you've done in the blog posts.
00:30:55.600 | So in blog posts, just use cases, right?
00:30:57.280 | I just want people to be like,
00:30:58.640 | yeah, we're spelling it out for you.
00:30:59.960 | Use these for extracting structured data
00:31:02.080 | from unstructured data.
00:31:03.240 | By the way, it does vision too, right?
00:31:05.440 | So that's cool.
00:31:06.880 | Dynamic UI generation.
00:31:08.480 | Actually, let's talk about dynamic UI.
00:31:10.160 | I think gen UI, I think,
00:31:12.000 | is something that people are very interested in.
00:31:14.120 | As your first example, what did you find about it?
00:31:16.920 | - Yeah, I just thought it was a super cool capability
00:31:19.040 | we have now.
00:31:19.880 | So the schemas, we support recursive schemas,
00:31:22.600 | and this allows you to do really cool stuff.
00:31:24.280 | Like, every UI is a nested tree that has children.
00:31:28.080 | So I thought that was super cool.
00:31:29.160 | You can use one schema and generate tons of UIs.
00:31:33.520 | As a backend engineer who's always struggled
00:31:35.520 | with JavaScript and frontend,
00:31:37.280 | for me, that's super cool.
00:31:38.720 | We've now built a system where I can get
00:31:40.400 | any frontend that I want.
00:31:42.080 | So yeah, that's super cool.
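A sketch of the recursive-schema idea being described, assuming Pydantic models passed to the parse helper; the component fields are illustrative.

```python
from pydantic import BaseModel

class UIComponent(BaseModel):
    type: str                      # e.g. "div", "button", "header" (illustrative)
    label: str
    children: list["UIComponent"]  # a UI is a nested tree of components

UIComponent.model_rebuild()  # resolve the self-referencing forward reference

# Passed as response_format to the parse helper, this one schema can describe
# arbitrarily deep generated UI trees.
```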
00:31:43.120 | The extracting structured data,
00:31:45.600 | the reality of a lot of AI applications
00:31:47.440 | is you're plugging them into your enterprise business
00:31:50.600 | and you have something that works,
00:31:52.080 | but you want to make it a little bit better.
00:31:53.920 | And so the reliability gains you get here
00:31:55.880 | is you'll never get a classification
00:31:58.600 | using the wrong enum.
00:32:00.000 | It's just exactly your types.
00:32:02.200 | So really excited about that.
00:32:03.600 | - Like maybe hallucinate the actual values, right?
00:32:06.240 | So let's clearly state what the guarantees are.
00:32:08.520 | The guarantees is that this fits the schema,
00:32:09.960 | but the schema itself may be too broad
00:32:13.160 | because the JSON schema type system doesn't say like,
00:32:16.240 | I only want to range from one to 11.
00:32:19.040 | You might give me zero.
00:32:20.480 | You might give me 12.
00:32:21.520 | - So yeah, JSON schema.
00:32:22.680 | So this is actually a good thing to talk about.
00:32:24.200 | So JSON schema is extremely vast
00:32:26.560 | and we weren't able to support every corner of it.
00:32:29.600 | So we kind of support our own dialect
00:32:31.480 | and it's described in the docs.
00:32:33.280 | And there are a few trade-offs we had to make there.
00:32:35.040 | So by default,
00:32:36.080 | if you don't pass in additional properties in a schema,
00:32:39.520 | by default, that's true.
00:32:40.800 | And so that means you can get other keys,
00:32:43.200 | which you didn't spell out,
00:32:44.720 | which is kind of the opposite of what developers want.
00:32:47.000 | You basically want to supply the keys and values
00:32:48.600 | and you want to get those keys and values.
00:32:50.520 | And so then we had to decision to make.
00:32:51.920 | It's like, do we redefine what additional properties means
00:32:54.800 | as the default?
00:32:56.040 | And that felt really bad.
00:32:56.960 | It's like, there's a schema that's predated us.
00:32:58.840 | Like, it wouldn't be good.
00:33:00.240 | It'd be better to play nice with the community.
00:33:01.920 | And so we require that you pass it in as false.
00:33:04.560 | One of our design principles is to be very explicit
00:33:06.960 | and so developers know what to expect.
00:33:09.200 | And so this is one where we decided,
00:33:11.240 | it's a little harder to discover,
00:33:12.800 | but we think you should pass this thing in
00:33:14.960 | so that we can have a very clear definition
00:33:16.880 | of what you mean and what we mean.
00:33:18.360 | There's a similar one here with required.
00:33:20.560 | By default, every key in JSON schema is optional,
00:33:23.760 | but that's not what developers want, right?
00:33:25.520 | You'd be very surprised if you passed in a bunch of keys
00:33:28.480 | and you didn't get some of them back.
00:33:29.840 | And so that's the trade-off we made,
00:33:31.040 | is to make everything required
00:33:32.400 | and have the developers spell that out.
00:33:33.880 | - Is there a require false?
00:33:35.560 | Can people turn it off or they're just getting all--
00:33:38.160 | - So developers can, basically what we recommend for that
00:33:41.120 | is to make your actual key a union type.
00:33:43.920 | And so-- - Nullable.
00:33:45.040 | - Yeah, make it union of int and null
00:33:47.520 | and that gets you the same behavior.
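Putting the dialect rules just described into one small example schema (field names are illustrative): every key listed in required, additionalProperties passed explicitly as false, and optionality expressed as a union with null.

```python
user_schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": ["integer", "null"]},  # "optional" expressed as a null union
    },
    "required": ["name", "age"],       # every key must be listed as required
    "additionalProperties": False,     # must be passed explicitly as false
}
```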
00:33:48.920 | - Any other of the examples you want to dive into,
00:33:50.880 | math, chain of thought?
00:33:52.120 | - Yeah, you can now specify like a chain of thought field
00:33:55.120 | before a final answer.
00:33:56.520 | This is just like a more structured way
00:33:57.960 | of extracting the final answer.
00:33:59.680 | One example we have, I think we put up a demo app
00:34:02.800 | of this math tutoring example, or it's coming out soon.
00:34:05.760 | - Did I miss it?
00:34:06.600 | Oh, okay, well.
00:34:07.420 | - Basically, it's this math tutoring thing
00:34:08.920 | and you put in an equation
00:34:10.640 | and you can go step by step and answer it.
00:34:12.760 | This is something you can do now with Structured Outputs.
00:34:14.480 | In the past, a developer would have to specify their format
00:34:17.560 | and then write a parser and parse out the model's output,
00:34:20.800 | which could be pretty hard.
00:34:21.640 | But now you just specify steps and it's an array of steps
00:34:24.480 | and every step you can render and then the user can try it
00:34:27.160 | and you can see if it matches and go on that way.
00:34:29.720 | So I think it just opens up a lot of opportunities.
00:34:32.120 | Like for any kind of UI where you want to treat
00:34:34.440 | different parts of the model's responses differently,
00:34:36.720 | Structured Outputs is great for that.
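A hedged sketch of that step-by-step shape, using the Pydantic parse helper in the Python SDK; the field names (steps, explanation, final_answer) are illustrative, not the demo app's actual schema.

```python
from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

class Step(BaseModel):
    explanation: str   # the reasoning for this step
    output: str        # the intermediate expression to render

class MathResponse(BaseModel):
    steps: list[Step]
    final_answer: str

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "You are a math tutor. Solve step by step."},
        {"role": "user", "content": "Solve 8x + 31 = 2."},
    ],
    response_format=MathResponse,
)

parsed = completion.choices[0].message.parsed
for step in parsed.steps:
    print(step.explanation, "->", step.output)  # each step can be rendered separately in a UI
print("Answer:", parsed.final_answer)
```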
00:34:38.280 | - I remembered my question from earlier.
00:34:40.280 | I'm basically just using this to ask you all the questions
00:34:43.400 | as a user, as a daily user of the stuff that you put out.
00:34:46.160 | So one is a tip that people don't know
00:34:47.920 | and I confronted you on Twitter,
00:34:48.960 | which is you respect descriptions of JSON schemas, right?
00:34:52.240 | And you can basically use that as a prompt for the field.
00:34:54.760 | - Totally.
00:34:55.580 | - I assume that's blessed
00:34:56.420 | and people should do that. - Intentional, yeah.
00:34:58.560 | - One thing that I started to do,
00:35:00.320 | which I don't, it could be a hallucination of me,
00:35:02.160 | is I changed the property name to prompt the model
00:35:07.160 | to what I wanted to do.
00:35:08.200 | So for example, instead of saying topics as a property name,
00:35:12.600 | I would say like, "Brainstorm a list of topics up to five,"
00:35:16.800 | something like that as a property name.
00:35:19.280 | I could stick that in the description as well,
00:35:20.840 | but is that too much? (laughs)
00:35:23.240 | - Yeah, I would say, I mean, we're so early in AI
00:35:26.760 | that people are figuring out the best way to do things.
00:35:29.000 | And I love when I learn from a developer
00:35:30.880 | like a way they found to make something work.
00:35:33.280 | In general, I think there's like three
00:35:34.880 | or four places to put instructions.
00:35:37.200 | You can put instructions in the system message
00:35:39.000 | and I would say that's helpful
00:35:40.720 | for like when to call a function.
00:35:42.720 | So it's like, let's say you're building
00:35:44.840 | a customer support thing and you want the model
00:35:47.320 | to verify the user's phone number or something.
00:35:49.280 | You can tell the model in the system message,
00:35:51.000 | like here's when you should call this function.
00:35:52.680 | Then when you're within a function,
00:35:53.960 | I would say the descriptions there
00:35:55.160 | should be more about how to call a function.
00:35:57.400 | So really common is someone will have like a date
00:35:59.800 | as a string, but you don't tell the model,
00:36:01.880 | like, do you want year, year, month, month, day, day?
00:36:04.120 | Or do you want that backwards?
00:36:05.760 | And that's what a really good spot is
00:36:07.400 | for those kinds of descriptions.
00:36:08.480 | It's like, how do you call this thing?
00:36:10.200 | And then sometimes there's like really specific stuff,
00:36:12.880 | like what you're doing.
00:36:13.720 | It's like, name the key by what you want.
00:36:16.080 | So sometimes people put like, do not use.
00:36:18.200 | And you know, if they don't want, you know,
00:36:19.920 | this parameter to be used except only in some circumstances.
00:36:22.920 | And really, I think that's the fun nature of this.
00:36:24.840 | It's like, you're figuring out the best way
00:36:26.560 | to get something out of the model.
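Putting those three placements together, a sketch of how it might look (the support function, parameter names, and descriptions are made up for illustration): the system message covers when to call the function, the parameter descriptions cover how to call it, and the key names themselves carry intent.

```python
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "lookup_order",
        "description": "Look up the status of a customer's order.",
        "strict": True,
        "parameters": {
            "type": "object",
            "properties": {
                "phone_number": {
                    "type": "string",
                    "description": "Digits only, no spaces or dashes, e.g. 4155550123.",
                },
                "order_date": {
                    "type": "string",
                    "description": "Date the order was placed, formatted YYYY-MM-DD.",
                },
            },
            "required": ["phone_number", "order_date"],
            "additionalProperties": False,
        },
    },
}]

completion = client.chat.completions.create(
    model="gpt-4o-2024-08-06",
    messages=[
        # System message: *when* to call the function.
        {"role": "system", "content": "You are a support agent. Only call lookup_order "
                                      "after the user has confirmed their phone number."},
        {"role": "user", "content": "Where's my order? 415-555-0123, placed August 5th, 2024."},
    ],
    tools=tools,
)
print(completion.choices[0].message.tool_calls)
```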
00:36:27.760 | - Okay, so you don't have an official recommendation
00:36:29.720 | is what I'm hearing.
00:36:30.720 | - Well, the official recommendation is, you know,
00:36:32.800 | how to call a model, system instructions.
00:36:34.240 | - Exactly, exactly.
00:36:35.080 | - Or when to call a function, yeah.
00:36:36.600 | - Do you benchmark these type of things?
00:36:38.720 | So like, say with date, it's like description,
00:36:40.920 | it's like return it in like ISO 8601.
00:36:43.880 | Or if you call the key "date in ISO 8601,"
00:36:47.920 | I feel like the benchmarks don't go that deep,
00:36:50.080 | but then all the AI engineering kind of community,
00:36:53.360 | like all the work that people do, it's like,
00:36:55.320 | oh, actually this performs better,
00:36:56.920 | but then there's no way to verify, you know?
00:36:59.200 | Like even the, I'm gonna tip you $100,000 or whatever,
00:37:03.120 | like some people say it works, some people say it doesn't.
00:37:05.600 | Do you pay attention to this stuff as you build this?
00:37:08.280 | Or are you just like, the model is just gonna get better,
00:37:10.800 | so why waste my time running evals on these small things?
00:37:14.080 | - Yeah, I would say to that,
00:37:15.520 | I would say we basically pick our battles.
00:37:17.840 | I mean, there's so much surface area of LLMs
00:37:20.920 | that we could dig into, and we're just mostly focused
00:37:23.280 | on kind of raising the capabilities for everyone.
00:37:25.880 | I think for customers, and we work with a lot of customers,
00:37:28.720 | really developing their own evals is super high leverage,
00:37:31.800 | 'cause then you can upgrade really quickly
00:37:33.280 | when we have a new model,
00:37:34.160 | you can experiment with these things with confidence.
00:37:36.560 | So yeah, we're hoping to make making evals easier.
00:37:39.160 | I think that's really generally very helpful for developers.
00:37:42.560 | - For people, I would just kind of wrap up the discussion
00:37:44.840 | for structured outputs, I immediately implemented,
00:37:47.720 | we use structured outputs for AI News,
00:37:50.200 | I use Instructor, and I ripped it out,
00:37:52.360 | and I think I saved 20 lines of code,
00:37:55.200 | but more importantly, it was like,
00:37:56.360 | we cut it by 55% of API costs based on what I measured,
00:37:59.960 | because we saved on the retries.
00:38:02.320 | - Nice, yeah, love to hear that.
00:38:03.880 | - Yeah, which people I think don't understand,
00:38:05.480 | when you can't just simply add Instructor or add outlines,
00:38:10.160 | you can do that, but it's actually gonna cost you
00:38:12.280 | a lot of retries to get the model that you want,
00:38:14.800 | but you're kind of just kind of building
00:38:16.240 | that internally into the model.
00:38:17.720 | - Yeah, I think this is the kind of feature
00:38:19.640 | that works really well when it's integrated
00:38:21.400 | with the LLM provider.
00:38:23.320 | Yeah, actually, I had folks, even my husband's company,
00:38:25.840 | he works at a small startup,
00:38:26.800 | they thought we were just retrying,
00:38:28.560 | and so I had to set them straight.
00:38:31.800 | We are not retrying, you know, we're doing it in one shot,
00:38:34.080 | and this is how you save on latency and cost.
00:38:36.120 | - Awesome, any other behind-the-scenes stuff,
00:38:38.520 | just generally on structured outputs?
00:38:40.000 | We're gonna move on to the other models.
00:38:41.960 | - Yeah, I think that's it.
00:38:44.560 | - Oh, look, it's an excellent product,
00:38:44.560 | and I think everyone will be using it,
00:38:45.960 | and we have the full story now that people can try out.
00:38:49.280 | So Roadmap would be parallel function calling,
00:38:51.560 | anything else that you've called out as coming soon?
00:38:53.880 | - Not quite soon, but we're thinking about,
00:38:56.120 | does it make sense to expose custom grammars
00:38:59.000 | beyond JSON schema?
00:38:59.960 | - What would you want to hear from developers
00:39:01.880 | to give you information, whether it's custom grammars
00:39:04.320 | or anything else about structured outputs?
00:39:05.800 | What would you want to know more of?
00:39:06.960 | - Just always interested in feature requests,
00:39:09.440 | what's not working, but I'd be really curious,
00:39:11.360 | what specific grammars folks want.
00:39:13.200 | I know some folks want to match programming languages
00:39:15.840 | like Python.
00:39:17.000 | There's some challenges with the expressivity
00:39:20.400 | of our implementation, and so, yeah,
00:39:22.760 | just kind of the class of grammars folks want.
00:39:25.640 | - I have a very simple one,
00:39:26.680 | which is a lot of people try to use GPT as judge, right?
00:39:30.680 | Which means they end up doing a rating system,
00:39:32.720 | and then there's like 10 different kinds of rating systems,
00:39:34.600 | there's a Likert scale, there's whatever.
00:39:36.720 | If there was an officially blessed way
00:39:38.400 | to do a rating system with structured outputs,
00:39:41.320 | everyone would use it.
00:39:42.400 | - Yeah, yeah, that makes sense.
00:39:43.520 | I mean, we often recommend using log probs
00:39:47.120 | with classification tasks.
00:39:48.680 | So rather than like sampling,
00:39:51.000 | let's say you have four options,
00:39:52.480 | like red, yellow, blue, green,
00:39:54.080 | rather than sampling two tokens for yellow,
00:39:56.880 | you can just do like A, B, C, D,
00:39:58.880 | and get the log probs of those.
00:40:00.800 | The inherent randomness of each sampling
00:40:02.880 | isn't taken into account,
00:40:04.200 | and you can just actually look at
00:40:05.200 | what is the most likely token.
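A sketch of that single-token classification trick (the label letters and categories are illustrative): map each class to one letter, sample a single token, and read the alternatives from top_logprobs instead of trusting a single sample.

```python
from openai import OpenAI

client = OpenAI()

labels = {"A": "red", "B": "yellow", "C": "blue", "D": "green"}

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[
        {"role": "system", "content": "Classify the color. Answer with a single letter: "
                                      "A=red, B=yellow, C=blue, D=green."},
        {"role": "user", "content": "What color is a ripe banana?"},
    ],
    max_tokens=1,
    logprobs=True,
    top_logprobs=4,
)

# Inspect the distribution over the answer token rather than one sampled answer.
for candidate in completion.choices[0].logprobs.content[0].top_logprobs:
    print(labels.get(candidate.token, candidate.token), candidate.logprob)
```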
00:40:07.200 | - I think this is more of like a calibration question.
00:40:09.120 | Like if I asked you to rate things from one to 10,
00:40:11.640 | a non-calibrated model might always pick seven,
00:40:14.400 | just like a human would.
00:40:15.720 | - Right.
00:40:16.800 | - So like actually have a nice gradation from one to 10
00:40:19.840 | would be the rough idea.
00:40:21.920 | And then even for structured outputs,
00:40:23.400 | I can't just say have a field of rating from one to 10
00:40:26.240 | because I have to then validate it,
00:40:28.200 | and it might give me 11.
00:40:29.680 | - Yeah, absolutely.
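One possible workaround today, not an official recommendation, is to bound the rating with an enum, since the supported dialect has no minimum/maximum keywords. This sketch assumes integer enums are accepted in strict mode; if not, the same idea works with string values "1" through "10".

```python
from typing import Literal

from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

class Judgment(BaseModel):
    reasoning: str
    # Bounded by enumeration instead of min/max, which the dialect doesn't support.
    rating: Literal[1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

completion = client.beta.chat.completions.parse(
    model="gpt-4o-2024-08-06",
    messages=[
        {"role": "system", "content": "Rate how helpful the answer is, from 1 to 10."},
        {"role": "user", "content": "Q: What is 2 + 2?\nA: 4."},
    ],
    response_format=Judgment,
)
print(completion.choices[0].message.parsed.rating)
```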
00:40:31.080 | - So what about model selection?
00:40:33.040 | Now you have a lot of models.
00:40:34.360 | When you first started, you had one model endpoint.
00:40:37.200 | I guess you had like the DaVinci,
00:40:39.280 | but like most people were using one model endpoint.
00:40:42.760 | Today, you have like a lot of competitive models,
00:40:45.040 | and I think we're nearing the end of the 3.5 run RIP.
00:40:49.640 | How do you advise people to like experiment,
00:40:51.840 | select, both in terms of like tasks and like costs,
00:40:54.680 | like what's your playbook?
00:40:56.360 | - In general, I think folks should start with 4o mini.
00:40:59.280 | That's our cheapest model,
00:41:01.080 | and it's a great workhorse.
00:41:04.080 | Works for a lot of great use cases.
00:41:05.840 | If you're not finding the performance you need,
00:41:07.640 | like maybe it's not smart enough,
00:41:09.400 | then I would suggest going to 4o.
00:41:11.440 | And if 4o works well for you, that's great.
00:41:13.520 | Finally, there's some like really advanced
00:41:15.280 | frontier use cases, and maybe 4o is not quite cutting it,
00:41:18.880 | and there I would recommend our fine tuning API.
00:41:21.120 | Even just like 100 examples is enough to get started there,
00:41:24.400 | and you can really get the performance you're looking for.
00:41:26.360 | - We're recording this ahead of it,
00:41:27.840 | but like you're announcing other some fine tuning stuff
00:41:30.800 | that people should pay attention to.
00:41:32.280 | - Yeah, actually tomorrow we're dropping our GA
00:41:35.200 | for GPT-4o fine-tuning.
00:41:37.040 | So 4o mini has been available for a few weeks now,
00:41:39.640 | and 4o is now gonna be generally available.
00:41:42.240 | And we also have a free training offering for a bit.
00:41:45.320 | I think until September 23rd,
00:41:47.120 | you get one million free training tokens a day.
00:41:49.720 | - This is already announced, right?
00:41:51.720 | Am I talking about a different thing?
00:41:52.560 | - So that was for 4o mini, and now it's also for 4o.
00:41:55.120 | So we're really excited to see what people do with it.
00:41:57.000 | And it's actually a lot easier to get started
00:41:58.800 | than a lot of people expect.
00:42:00.000 | I think they might need tens of thousands of examples,
00:42:02.520 | but even 100 really high quality ones,
00:42:04.880 | or 1,000 is enough to get going.
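For reference, kicking off a 4o fine-tune from a small set of chat-formatted examples looks roughly like this (the file name is illustrative; train.jsonl holds one {"messages": [...]} conversation per line):

```python
from openai import OpenAI

client = OpenAI()

# Upload ~100-1,000 high-quality examples in chat JSONL format.
training_file = client.files.create(
    file=open("train.jsonl", "rb"),
    purpose="fine-tune",
)

job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-4o-2024-08-06",
)
print(job.id, job.status)
```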
00:42:06.880 | - Oh, well, we might get a separate podcast
00:42:08.960 | just specifically on that,
00:42:10.080 | but we haven't confirmed that yet.
00:42:13.000 | It basically seems like every time,
00:42:15.080 | I think people's concerns about fine tuning
00:42:17.440 | is that they're kind of locked into a model.
00:42:19.200 | And I think you're paving the path for migration of models.
00:42:22.480 | As long as they keep their original data set,
00:42:24.360 | they can at least migrate nicely.
00:42:26.200 | - Yeah, I'm not sure what we've said publicly there yet,
00:42:28.800 | but we definitely wanna make it easier for folks to migrate.
00:42:31.480 | - It's the number one concern.
00:42:32.920 | I'm just, it's obvious. (laughs)
00:42:34.800 | - Absolutely.
00:42:36.080 | I also wanna point people to,
00:42:37.560 | you have official model selection docs,
00:42:39.480 | where it's in the guide, we'll put it in the show notes,
00:42:42.560 | where it says to optimize for accuracy first,
00:42:44.960 | so prompt engineering, RAG, evals, fine tuning.
00:42:46.680 | This was done at Dev Day last year,
00:42:47.960 | so I'm just repeating things.
00:42:49.480 | And then optimize for cost and latency second,
00:42:51.880 | and there's a few sets of steps for optimizing latency,
00:42:55.120 | so people can read up on that stuff.
00:42:57.000 | - Yeah, totally.
00:42:58.280 | - We had one episode with Nicholas Carlini from DeepMind,
00:43:02.000 | and we actually talked about how some people
00:43:04.480 | don't actually get to the boundaries
00:43:06.320 | of the model performance.
00:43:07.360 | You know, they just kind of try one model,
00:43:08.720 | and it's like, "Oh, LLMs cannot do this," and they stop.
00:43:11.520 | How should people get over the hurdle?
00:43:13.160 | It's like, how do you know if you hit the model performance,
00:43:15.880 | or like you hit skill issues?
00:43:17.360 | You know, it's like, "Your prompt is not good,"
00:43:18.800 | or like, "Try another model," and whatnot.
00:43:20.600 | Is there an easy way to do that?
00:43:22.320 | - That's tough.
00:43:23.160 | Some people are really good at prompting,
00:43:24.760 | and they just kind of get it right away,
00:43:26.240 | and for others, it's more of a challenge.
00:43:28.280 | I think there's a lot we can do to make it easier
00:43:30.480 | to prompt our models, but for now,
00:43:32.000 | I think it requires a lot of creativity
00:43:33.640 | and not giving up right away, yeah.
00:43:35.320 | And a lot of people have experience now with ChatGPT.
00:43:38.040 | You know, before, ChatGPT, the easiest way to play
00:43:40.440 | with our models was in the playground,
00:43:42.640 | but now kind of everyone's played with it,
00:43:44.560 | with a model of some sort,
00:43:45.680 | and they have some sort of intuition.
00:43:47.160 | It's like, you know, if I tell you my grandma is sick,
00:43:50.040 | then maybe I'll get the right output,
00:43:51.600 | and we're hoping to kind of remove the need for that,
00:43:53.800 | but playing around with ChatGPT is a really good way
00:43:56.600 | to get a feel for, you know, how to use the API as well.
00:43:59.320 | - Will prompt engineering be here forever,
00:44:01.320 | or is it a dying art as the models get better?
00:44:04.520 | - I mean, it's like the perennial question
00:44:05.960 | of software engineering as well.
00:44:07.320 | It's like, as the models get better at coding, you know,
00:44:09.440 | if we hit a hundred on SWE Bench, what does that mean?
00:44:11.680 | I think there will always be alpha in people
00:44:13.520 | who are able to, like, clearly explain
00:44:16.040 | what they're trying to build.
00:44:17.280 | Most of engineering is like figuring out the requirements
00:44:20.120 | and stating what you're trying to do,
00:44:22.080 | and I believe this will be the case with AI as well.
00:44:24.480 | You're going to have to very clearly explain what you need,
00:44:26.600 | and some people are better than others at it,
00:44:29.080 | and people will always be building.
00:44:30.880 | It's just the tools are going to get far better.
00:44:32.960 | - Last two weeks, you released two models.
00:44:34.720 | There's gpt-4o-2024-08-06,
00:44:37.680 | and then there's also chatgpt-4o-latest.
00:44:39.680 | I think people were a little bit confused by that,
00:44:41.080 | and then you issued a clarification that it was,
00:44:43.600 | one's chat-tuned and the other is more
00:44:45.520 | function calling-tuned.
00:44:46.760 | Can you elaborate, just?
00:44:47.720 | - Yeah, totally.
00:44:49.120 | So part of the impetus here was to kind of very transparent
00:44:52.120 | with what's on ChatGPT and in the API.
00:44:54.960 | So basically, we're often training models,
00:44:57.520 | and there are different use cases.
00:44:58.880 | So you don't really need function calling
00:45:01.080 | for user-defined functions in ChatGPT.
00:45:03.280 | And so this gives us kind of the freedom
00:45:04.800 | to build the best model for each use case.
00:45:07.160 | So in ChatGPT latest, we're releasing
00:45:09.320 | kind of this rolling model.
00:45:11.320 | The weights aren't pinned.
00:45:12.520 | As we release new models--
00:45:14.320 | - This is literally what we use.
00:45:15.920 | - Yeah, so it's in what's in ChatGPT,
00:45:18.120 | so it's very good for like chat-style use cases.
00:45:20.800 | But for the API broadly, you know,
00:45:23.400 | we really tune our models to be good at things
00:45:25.640 | that developers want, like function calling
00:45:27.320 | and structured outputs, and when a developer builds
00:45:29.880 | their application, they want to know
00:45:31.400 | that kind of the weights are stable under them.
00:45:33.600 | And so we have this offering where it's like,
00:45:36.000 | if you're tuning to a specific model
00:45:37.640 | and you know your function works,
00:45:39.160 | you know it will never change the weights out from under you.
00:45:42.040 | And so those are the models we commit
00:45:43.240 | to supporting for a long time,
00:45:44.960 | and we think those are the best for developers.
00:45:47.160 | But we want to give it up, you know,
00:45:48.480 | we want to leave the choice to developers.
00:45:49.800 | Like, do you want the ChatGPT model
00:45:51.480 | or do you want the API model?
00:45:53.040 | And you have the freedom to choose what's best for you.
00:45:55.040 | - I think it's for people,
00:45:56.760 | they do want to pin model versions,
00:45:59.040 | so I don't know when they would use ChatGPT,
00:46:02.400 | like the rolling one, unless they're really
00:46:05.040 | just kind of cloning ChatGPT.
00:46:06.960 | Which is like, why would they?
00:46:08.600 | - I mean, I think there's a lot of interesting stuff
00:46:11.920 | that developers can do when unbounded,
00:46:14.120 | and so we don't want to limit them artificially.
00:46:16.400 | So it's kind of survival of the fittest,
00:46:18.920 | like whichever model is better, you know,
00:46:20.640 | that's the one that people should use.
00:46:21.880 | - Yeah, I talked about it to my friends,
00:46:23.680 | there's like, this isn't that new thing.
00:46:25.920 | And basically, OpenAI has never actually shared with you
00:46:29.000 | the actual ChatGPT model, and now they do.
00:46:32.040 | - Well, it's not necessarily true.
00:46:33.400 | Actually, a lot of the models we have shipped
00:46:35.240 | have been the same, but you know,
00:46:37.760 | sometimes they diverge and it's not a limitation
00:46:39.840 | we want to stick around.
00:46:41.520 | - Anything else we should know about the new model?
00:46:43.160 | I don't think there was no evals announced or anything,
00:46:47.160 | but people say it's better.
00:46:48.120 | I mean, obviously, LMSYS is like way better
00:46:50.040 | above on everything, right?
00:46:50.880 | It's like number one in the world on--
00:46:52.640 | - Yeah, we published some release notes.
00:46:55.800 | They're not as in-depth as we want to be yet,
00:46:57.840 | because it's still kind of a science
00:46:59.560 | and we're learning what actually changes with each model
00:47:02.160 | and how can we better understand the capabilities.
00:47:04.720 | But we are trying to do more release notes in the future
00:47:07.600 | and keep folks updated.
00:47:09.400 | But yeah, it's kind of an art and a science right now.
00:47:12.640 | - You need the best evals team in the world
00:47:15.600 | to help you figure this out.
00:47:16.960 | - Yeah, evals are hard.
00:47:17.880 | We're hiring if you want to come work on evals.
00:47:20.120 | - Hold that thought on hiring.
00:47:21.320 | We'll come back to the end on what you're looking for,
00:47:23.720 | 'cause obviously people want to join you
00:47:25.240 | and they want to know what qualities you're looking for.
00:47:27.960 | - So we just talked about API versus ChatGPT.
00:47:31.240 | What's, I guess, the vision for the interface?
00:47:34.200 | You know, the mission of OpenAI is to build AGI
00:47:36.800 | that is accessible.
00:47:37.640 | Like, where is it going to come from?
00:47:40.120 | - Totally, yeah.
00:47:41.080 | So I believe that the API is kind of our broadest vehicle
00:47:44.720 | for distributing AGI.
00:47:46.680 | You know, we're building some first-party products,
00:47:48.760 | but they'll never reach every niche in the world
00:47:50.680 | and kind of every corner and community.
00:47:52.520 | And so really love working with developers
00:47:54.680 | and seeing the incredible things they come up with.
00:47:56.840 | I often find that developers kind of see the future
00:47:58.880 | before anyone else,
00:47:59.880 | and we love working with them to make it happen.
00:48:02.320 | And so really the API is a bet on going really broad.
00:48:05.680 | And we'll go very deep as well in our first-party products,
00:48:08.240 | but I think just that our impact is absolutely magnified
00:48:11.240 | by every developer that we uplift.
00:48:13.280 | - They can do the last mile where you cannot.
00:48:15.840 | Like, ChatGPT is one type of product,
00:48:17.880 | but there's many other kinds.
00:48:19.400 | In fact, you know, I observed, I think in February,
00:48:22.520 | basically, ChatGPT's user growth stopped
00:48:25.360 | when the API was launched,
00:48:26.760 | because everyone was kind of able to take that
00:48:29.120 | and build other things.
00:48:30.680 | That has not become true anymore
00:48:32.760 | because ChatGPT's growth has continued.
00:48:34.840 | But then, you're not confirming any of this.
00:48:36.880 | This is me quoting similar web numbers,
00:48:38.800 | which have very high variance.
00:48:40.800 | - Well, the API predates ChatGPT.
00:48:42.440 | The API was actually OpenAI's first product,
00:48:44.680 | and the first idea for commercialization,
00:48:47.320 | that predates me as well.
00:48:48.560 | - Wide release.
00:48:49.520 | Like, GA, everyone can sign up and use it immediately.
00:48:52.320 | Yeah, that's what I'm talking about.
00:48:53.440 | But yeah, I mean, I do believe that.
00:48:55.520 | And, you know, that means you also have to expose
00:48:57.920 | all of open-end models, right?
00:48:59.920 | Like, all the multi-modal models.
00:49:02.200 | We'll ask you questions on that,
00:49:03.800 | but I think that API mission is important.
00:49:07.440 | It's interesting that the hottest new programming language
00:49:10.040 | is supposed to be English,
00:49:12.000 | but it's actually just software engineering, right?
00:49:14.640 | It's just, you know, we're talking about HTTP error codes.
00:49:17.480 | - Right.
00:49:18.320 | (laughs)
00:49:19.160 | - Yeah, I think, you know,
00:49:20.040 | engineering is still the way you access these models.
00:49:22.080 | And I think there are companies working on tools
00:49:25.640 | to make engineering more accessible for everyone,
00:49:28.360 | but there's still so much alpha
00:49:29.840 | in just writing code and deploying.
00:49:32.400 | - Yeah, one might even call it AI engineering.
00:49:34.560 | - Exactly. - I don't know.
00:49:35.880 | Yeah, so like, there's lots of war stories
00:49:37.960 | from building this platform.
00:49:39.360 | We started at the start of your career,
00:49:40.840 | and then we jumped straight to structured outputs.
00:49:42.760 | There's a whole thing, like two years,
00:49:44.440 | that we skipped in between.
00:49:45.760 | What have become your principles?
00:49:47.200 | What are your favorite stories that you like to tell?
00:49:50.120 | - We had so much fun working on the Assistants API
00:49:52.600 | and leading up to Dev Day.
00:49:53.760 | You know, things are always pretty chaotic
00:49:55.440 | when you have an externally, like a date--
00:49:58.120 | - Forcing function.
00:49:58.960 | - That is hard, and there's like a stage,
00:50:00.840 | and there's like 1,000 people coming.
00:50:03.200 | - You can always launch a wait list, I mean.
00:50:04.600 | (laughs)
00:50:05.960 | - We're trying hard not to,
00:50:07.680 | because, you know, we love it
00:50:08.760 | when people can access the thing on day one.
00:50:10.600 | And so, yeah, the Assistants API,
00:50:12.720 | we had like this really small team,
00:50:14.960 | and just working as hard as we could
00:50:18.200 | to make this come to life.
00:50:19.360 | But even, actually, the morning of,
00:50:21.320 | I don't know if you'll remember this,
00:50:22.360 | but Sam did this keynote,
00:50:24.360 | and Ramon came up,
00:50:25.960 | and they gave free credits to everybody.
00:50:28.480 | So that was live, fully live,
00:50:30.120 | as were all of the demos that day.
00:50:32.040 | But actually, maybe like two hours before that,
00:50:34.680 | we had a little outage,
00:50:36.080 | and everyone was like scrambling
00:50:37.600 | to make this thing work again.
00:50:39.320 | So, yeah, things are early and scrappy here,
00:50:42.160 | and, you know, we were really glad.
00:50:43.640 | We were a bit on the edge of our seat watching it live.
00:50:46.440 | - What's the plan B in that situation?
00:50:48.360 | If you can share.
00:50:49.200 | - Play a video.
00:50:50.040 | This is classic DevRel, right?
00:50:51.160 | I don't know.
00:50:52.000 | - I mean, I actually don't know what the plan B was.
00:50:55.120 | - No plan B.
00:50:55.960 | No failure.
00:50:56.800 | - But we just, you know, we fixed it.
00:50:58.120 | We got everything running again,
00:50:59.760 | and the demo went well.
00:51:01.640 | - Just hire cracked Waterloo grads.
00:51:02.960 | - Exactly.
00:51:03.800 | (laughs)
00:51:04.640 | - Skill issues, as usual.
00:51:05.720 | - Sometimes you just gotta make it happen.
00:51:07.960 | - I imagine it's actually very motivating,
00:51:09.560 | but I did hear that after Dev Day,
00:51:11.200 | like the whole company got like a few weeks off,
00:51:13.520 | just to relax a little bit.
00:51:15.760 | - Yeah, we sometimes get,
00:51:18.680 | like we just had the week of July 4th off, and yeah.
00:51:21.320 | It's hard to take vacation,
00:51:22.640 | because people are working on such exciting things,
00:51:24.320 | and it's like, you get a lot of FOMO on vacation,
00:51:26.640 | so it helps when the whole company's on vacation.
00:51:28.800 | - Mentioning the Assistants API,
00:51:30.400 | you actually announced a roadmap there,
00:51:32.600 | and things have developed.
00:51:34.200 | I think people may not be up to date.
00:51:36.440 | What's the offering today versus, you know, one year ago?
00:51:39.480 | - Yeah, so we've made a bunch of key improvements.
00:51:41.960 | I would say the biggest one is in the file search product.
00:51:44.520 | Before, we only supported, I think,
00:51:46.120 | like 20 files per assistant,
00:51:47.960 | and the way we used those files was like less effective.
00:51:51.120 | Basically, the model would decide based on the file name,
00:51:54.080 | whether to search a file,
00:51:55.040 | and there's not a ton of information in there.
00:51:57.360 | So our new offering, which we shipped a few months ago,
00:51:59.600 | I think now allows 10K files per assistant,
00:52:02.760 | which is like dramatically more.
00:52:04.640 | And also, it's a kind of different operation.
00:52:07.320 | So you can search semantically over all files at once,
00:52:09.600 | rather than just kind of the model choosing one up front.
00:52:12.080 | So a lot of customers have seen really good performance.
00:52:14.400 | We also have exposed more like chunking
00:52:16.400 | and re-ranking options.
00:52:18.000 | I think the re-ranking one is coming,
00:52:19.560 | I think, next week or very soon.
00:52:21.960 | So this kind of gives developers more control
00:52:24.200 | and more flexibility there.
00:52:25.320 | So we're trying to make it the easiest way
00:52:27.720 | to kind of do RAG at scale.
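A sketch of that file search flow on the Assistants API surface at the time of recording (file names are illustrative): files go into a vector store, and the assistant searches semantically across all of them at once.

```python
from openai import OpenAI

client = OpenAI()

vector_store = client.beta.vector_stores.create(name="product-docs")

# Upload files into the store and wait for them to be processed.
client.beta.vector_stores.file_batches.upload_and_poll(
    vector_store_id=vector_store.id,
    files=[open("handbook.pdf", "rb"), open("faq.md", "rb")],
)

assistant = client.beta.assistants.create(
    model="gpt-4o",
    instructions="Answer questions using the attached documentation.",
    tools=[{"type": "file_search"}],
    tool_resources={"file_search": {"vector_store_ids": [vector_store.id]}},
)
```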
00:52:29.560 | - Yeah, I think that visibility into the RAG system
00:52:33.360 | was the number one thing missing from Dev Day,
00:52:35.520 | and then people got their first impressions,
00:52:37.240 | and then they never looked at it again.
00:52:38.680 | So that's important.
00:52:39.840 | The re-ranker is a core feature of, let's say,
00:52:42.800 | some other Foundation Model Labs.
00:52:44.920 | Is OpenAI going to offer a re-ranking service,
00:52:47.760 | a re-ranker model?
00:52:49.360 | - So we do re-ranking as part of it.
00:52:51.560 | I think we're soon going to ship more controls for that.
00:52:54.360 | - Okay, got it.
00:52:55.200 | And if I'm an existing LangChain, LlamaIndex, whatever,
00:52:58.800 | how do you compare?
00:52:59.720 | Do you make different choices?
00:53:01.360 | Where does that exist in the spectrum of choices?
00:53:04.400 | - I think we are just coming at it
00:53:06.480 | trying to be the easiest option.
00:53:08.600 | And so ideally, you don't have to know what a re-ranker is,
00:53:12.080 | and you don't have to have a chunking strategy,
00:53:14.160 | and the thing just kind of works out of the box.
00:53:16.040 | So I would say that's where we're going,
00:53:18.160 | and then giving controls to the power users
00:53:20.400 | to make the changes they need.
00:53:22.480 | - Awesome.
00:53:23.320 | I'm going to ask about a couple other things,
00:53:24.320 | just updates on stuff also announced at Dev Day,
00:53:26.640 | and we talked about this before.
00:53:28.120 | Determinism, something that people really want.
00:53:30.480 | Dev Day will announce the Seed Program
00:53:32.240 | as well as System Fingerprint.
00:53:33.800 | And objectively, I've heard issues.
00:53:36.320 | - Yeah.
00:53:37.160 | - I don't know what's going on.
00:53:38.000 | - The Seed parameter is not fully deterministic,
00:53:41.280 | and it's kind of a best effort thing.
00:53:43.280 | - Yeah.
00:53:44.120 | - So you'll notice there's more determinism
00:53:45.320 | in the first few tokens.
00:53:46.440 | That's kind of the current implementation.
00:53:48.160 | We've heard a lot of feedback.
00:53:49.320 | We're thinking about ways to make it better,
00:53:51.080 | but it's challenging.
00:53:51.960 | It's kind of trading off against reliability and uptime.
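For anyone trying it, the best-effort setup looks like this sketch: keep the prompt and parameters fixed, pass the same seed, and compare system_fingerprint across calls to see whether the backend configuration changed underneath you.

```python
from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Pick a random animal."}],
    seed=42,
    temperature=0,
)

# Same seed + same system_fingerprint means outputs should mostly match,
# but as noted above it is best-effort, not a hard guarantee.
print(completion.system_fingerprint)
print(completion.choices[0].message.content)
```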
00:53:55.280 | - Other maybe underrated API-only thing,
00:53:58.440 | Logit Bias, that's another thing
00:54:00.600 | that kind of seems very useful,
00:54:02.160 | and then maybe most people are like,
00:54:03.680 | it's a lot of work, I don't want to use it.
00:54:05.320 | Do you have any examples of use cases
00:54:07.760 | or products that are made a lot better through using it?
00:54:11.360 | - So yeah, classification is the big one.
00:54:13.480 | So Logit Bias, your valid classification outputs,
00:54:17.000 | and you're more likely to get something that matches.
00:54:19.680 | We've seen people Logit Bias punctuation tokens,
00:54:23.160 | maybe trying to get more succinct writing.
00:54:26.000 | Yeah, it's generally very much a power user feature,
00:54:28.760 | and so not a ton of folks use it.
00:54:30.720 | - I actually wanted to use it
00:54:32.280 | to reduce the incidence of the word delve.
00:54:34.320 | - Yeah.
00:54:35.240 | - Have people done that?
00:54:36.640 | - Probably, I don't know, is delve one token?
00:54:38.920 | You're probably, you got to do a lot of permutations.
00:54:41.680 | - It's used so much.
00:54:42.520 | - Maybe it is, depends on the tokenizer.
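A sketch of what that would look like with tiktoken; whether it catches every form depends on how the word tokenizes, and banning a constituent token also bans it anywhere else it appears.

```python
import tiktoken
from openai import OpenAI

client = OpenAI()

# Collect token IDs for the variants we want to suppress.
enc = tiktoken.encoding_for_model("gpt-4o")
bias = {}
for variant in ["delve", " delve", "Delve", " Delve"]:
    for token_id in enc.encode(variant):
        bias[token_id] = -100  # -100 effectively bans the token

completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize our Q3 results."}],
    logit_bias=bias,
)
print(completion.choices[0].message.content)
```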
00:54:44.400 | - Are there non-public tokenizers?
00:54:46.160 | I guess you cannot answer or you would omit it.
00:54:48.480 | Are the 100K and 200K vocabs,
00:54:50.720 | like the ones that you use across all models, or?
00:54:53.240 | - Yeah, I think we have docs that publish more information.
00:54:57.000 | I don't have it off the top,
00:54:58.120 | but I think we publish which tokenizers for which model.
00:55:00.480 | - Okay, so those are the only two.
00:55:02.080 | - Rate, the tiering rate limiting system,
00:55:04.320 | I don't think there was an official blog post
00:55:06.200 | kind of announcing this,
00:55:07.040 | but it was kind of mentioned that you started tying
00:55:09.680 | fine tuning to tiering and feature rollouts.
00:55:13.960 | Just from your point of view, how do you manage that?
00:55:16.760 | And what should people know
00:55:17.680 | about the tiering system and rate limiting?
00:55:20.120 | - Yeah, I think basically the main changes here
00:55:22.280 | were to be more transparent and easier to use.
00:55:24.280 | So before developers didn't know what tier they're in,
00:55:26.280 | and now you can see that in the dashboard.
00:55:29.040 | I think it's also, I think we publish
00:55:31.400 | how you move from tier to tier.
00:55:33.400 | And so this just helps us do kind of gated rollouts
00:55:36.040 | for the fine tuning launch.
00:55:37.360 | I think everyone tier two and up has full access.
00:55:40.200 | - That makes sense.
00:55:41.040 | I would just advise people to just get to tier five
00:55:43.240 | as quickly as possible.
00:55:44.240 | (both laughing)
00:55:45.240 | - Sure.
00:55:46.080 | - Like a gold star customer, you know?
00:55:46.920 | Like, I don't know, it seems to make sense.
00:55:48.520 | - Do we want to maybe wrap with future things
00:55:51.200 | and kind of like how you think about designing and everything?
00:55:53.800 | So you just mentioned you want to be the easiest way
00:55:56.320 | to basically do everything.
00:55:58.240 | What's the relationship with other people building
00:56:00.800 | in the developer ecosystem?
00:56:02.400 | Like I think maybe in the early days, it's like, okay,
00:56:05.280 | we only have these APIs and then everybody helps us,
00:56:07.400 | but now you're kind of building a whole platform.
00:56:09.560 | How do you make decisions?
00:56:10.960 | - Yeah, I think kind of the 80/20 principle applies here.
00:56:13.960 | We'll build things that kind of capture, you know,
00:56:16.240 | 80% of the value and maybe leave the long tail
00:56:18.720 | to other developers.
00:56:19.960 | So we really prioritize by like,
00:56:21.400 | how much feedback are we getting?
00:56:22.560 | How much easier will this make something,
00:56:25.200 | like an integration for a developer?
00:56:27.040 | So yeah, we want to do more in this space
00:56:29.000 | and not just be an LLM as a service,
00:56:31.600 | but kind of AI development platform as a service.
00:56:34.320 | - Ooh, okay.
00:56:35.160 | That ties into a thing that I put in the notes
00:56:38.040 | that we prepped.
00:56:38.880 | There are other companies trying
00:56:40.120 | to be AI development platform.
00:56:42.280 | So you will compete with them
00:56:45.080 | or they just want to know what you won't build
00:56:48.200 | so that they can build it? (laughs)
00:56:50.080 | - Yeah, it's a tough question.
00:56:51.960 | I think we haven't, you know,
00:56:53.320 | determined what exactly we will and won't build,
00:56:55.600 | but you can think of something,
00:56:57.040 | if it makes it a lot easier for developers to integrate,
00:56:59.520 | you know, it's probably on our radar
00:57:01.200 | and we'll, you know, stack rank by impact.
00:57:03.440 | - Yeah, so there's like cost tracking and model fallbacks.
00:57:06.720 | Model fallbacks is an interesting one
00:57:08.240 | because people do it.
00:57:09.680 | I don't think it adds a ton of value,
00:57:11.680 | but like if you don't build it, I have to build it
00:57:14.160 | because if one API is down or something,
00:57:16.800 | I need to fall back to another one.
00:57:18.360 | - Yeah, I mean, the way we're targeting that user need
00:57:20.340 | is just by investing a lot in reliability.
00:57:22.880 | And so we- - Oh yeah.
00:57:24.320 | - We have- - Just don't fail.
00:57:25.960 | - I mean, we have improved our uptime
00:57:27.840 | like pretty dramatically over the last year
00:57:29.680 | and it's been, you know,
00:57:30.720 | the result of a lot of hard work from folks.
00:57:32.640 | So you'll see that on our status page
00:57:34.520 | and our continued commitment going forward.
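For developers who do build the client-side version anyway, the pattern is usually a thin wrapper like this sketch (the model list and timeout are arbitrary): try the primary model and only fall back if the call itself fails.

```python
from openai import OpenAI, APIError, APITimeoutError

client = OpenAI()

def complete_with_fallback(messages, models=("gpt-4o", "gpt-4o-mini")):
    last_error = None
    for model in models:
        try:
            return client.chat.completions.create(
                model=model,
                messages=messages,
                timeout=30,  # per-request timeout so a hung call triggers the fallback
            )
        except (APIError, APITimeoutError) as err:
            last_error = err
    raise last_error
```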
00:57:37.080 | - Is the important thing about owning the platform
00:57:39.720 | that it gives you the flexibility
00:57:41.200 | to put all the kind of messy stuff behind the scenes?
00:57:44.960 | Or yeah, how do you draw the line
00:57:46.440 | between what you want to include?
00:57:48.080 | - Yeah, I just think of it as like,
00:57:49.640 | how can we onboard the next generation of AI engineers,
00:57:53.400 | as you put it, right?
00:57:54.440 | Like what's the easiest way
00:57:55.600 | to get them building really cool apps?
00:57:57.440 | And I think it's by building stuff
00:57:58.800 | to kind of hide this complexity
00:58:00.480 | or just make it really easy to integrate.
00:58:02.640 | So I think of it a lot as like,
00:58:04.160 | what is the value add we can provide
00:58:05.720 | beyond just the models that makes the models really useful?
00:58:08.760 | - Okay, we'll touch on four more features
00:58:10.360 | of the API platform that we prepped.
00:58:12.320 | Batch, Vision, Whisper, and then Team Enterprise stuff.
00:58:15.120 | So you wanted to talk about Batch.
00:58:16.680 | - Yeah. - So the rough idea is
00:58:19.160 | you give a, the contract between you and me
00:58:21.120 | is that I give you the Batch job.
00:58:23.360 | You have 24 hours to run it.
00:58:25.120 | It's kind of like spot instances for the API.
00:58:28.960 | What should people know about it?
00:58:31.040 | - So it's half off, which is a great savings.
00:58:33.960 | It also works with like 4o mini.
00:58:36.200 | So the savings on top of 4o mini is pretty crazy.
00:58:39.840 | Like the stuff you can do-
00:58:40.680 | - Like 7.5 cents or something per million.
00:58:42.560 | - Yeah, I should really have that number top of mind,
00:58:44.640 | but it's like staggeringly cheap.
00:58:46.360 | And so I think this opens up a lot more use cases.
00:58:48.640 | Like let's say you have a user activation flow
00:58:51.800 | and you want to send them an email like maybe every day
00:58:54.240 | or like at certain points in their user journey.
00:58:56.520 | So now you can do this with the Batch API
00:58:58.400 | and something that was maybe a lot more expensive
00:59:00.440 | and not feasible is now very easy to do.
00:59:02.680 | So right now we have this 24 hour turnaround time
00:59:05.040 | for half off and curious,
00:59:07.160 | would love to hear from your community,
00:59:08.440 | like what kind of turnaround time do they want?
00:59:11.040 | - I would be an ideal user of Batch
00:59:12.440 | and I cannot use Batch because it's 24 hours.
00:59:14.520 | I need two to four.
00:59:15.720 | - Two to four hours, okay.
00:59:17.360 | Yeah, that's good to know.
00:59:18.400 | But yeah, just a lot of folks haven't heard about it.
00:59:20.200 | It's also really great for like evals, running them offline.
00:59:22.720 | You don't, generally don't need them to come back
00:59:24.640 | within, you know, two hours.
00:59:26.000 | - I think you could do a range, right?
00:59:27.440 | Two to four for me, like I need to produce a daily thing
00:59:29.840 | and then 24 for like the average use case.
00:59:32.280 | And then maybe like a week, a month, who cares?
00:59:34.280 | Like for people who just have a lot to do.
00:59:36.880 | - Yeah, absolutely.
00:59:37.960 | So yeah, that's Batch API.
00:59:38.920 | I think folks should use it more.
00:59:40.400 | It's pretty cool.
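A sketch of the Batch flow (the activation-email prompt is made up): write one request per JSONL line with a custom_id, upload the file with purpose "batch", and create the batch with the 24-hour completion window.

```python
import json

from openai import OpenAI

client = OpenAI()

requests = [
    {
        "custom_id": f"user-{i}",
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": "gpt-4o-mini",
            "messages": [{"role": "user", "content": f"Write a short activation email for user {i}."}],
        },
    }
    for i in range(3)
]

with open("batch.jsonl", "w") as f:
    for request in requests:
        f.write(json.dumps(request) + "\n")

batch_file = client.files.create(file=open("batch.jsonl", "rb"), purpose="batch")
batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",  # currently the only window; results come back at half price
)
print(batch.id, batch.status)
```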
00:59:41.240 | - Is there a future in which like six months is like free?
00:59:45.200 | You know, like is there like small,
00:59:47.200 | is there like super small like shards of like GPU runtime
00:59:51.040 | that like over a long enough timeline
00:59:53.320 | you can just run all these things for free?
00:59:55.400 | - Yeah, it's certainly possible.
00:59:56.760 | I think we're getting to the point
00:59:58.080 | where a lot of these are like almost free.
01:00:00.040 | - That's true.
01:00:00.880 | - Why would they work on something
01:00:02.200 | that's like completely free?
01:00:03.080 | I don't know.
01:00:03.920 | Okay, so Vision.
01:00:05.360 | Vision got GA'd.
01:00:06.560 | Last year, people were so wild by the GPT-4 demo
01:00:09.560 | and that was primarily Vision.
01:00:11.480 | What was it like building the Vision API?
01:00:13.080 | - Yeah, the Vision API is super cool.
01:00:14.640 | We have a great team working there.
01:00:16.800 | I think the cool thing about Vision
01:00:17.920 | is that it works across our APIs.
01:00:19.800 | So there's, you can use it in the Assistance API,
01:00:22.520 | you can use the Batch API in track completions.
01:00:24.160 | It works with structured outputs.
01:00:25.800 | I think it just helps a lot of folks
01:00:27.520 | with kind of data extraction
01:00:29.440 | where the spatial relationships between the data
01:00:32.760 | is too complicated and you can't get that over text.
01:00:35.360 | But yeah, there's a lot of really cool use cases.
01:00:37.080 | - I think the tricky thing for me is understanding
01:00:40.320 | how frequent to turn Vision from like single images
01:00:44.040 | into like effectively just always watching.
01:00:46.560 | And right now, I think people just like
01:00:48.120 | send a frame every second.
01:00:50.320 | Will that model ever change?
01:00:51.360 | Will there just be like, I stream you a video and then?
01:00:55.160 | - Yeah, I think it's very possible
01:00:56.560 | that we'll have an API where you stream video in
01:00:58.960 | and maybe, you know, to start,
01:01:00.840 | we'll do the frame sampling for you.
01:01:03.240 | - 'Cause the frame sampling is the default, right?
01:01:05.360 | - Right.
01:01:06.200 | - But I feel like it's hacky.
01:01:07.240 | - Yeah, I think it's hard for developers to do.
01:01:09.080 | And so, you know,
01:01:10.120 | we should definitely work on making that easier.
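Until then, the frame-sampling approach developers do by hand looks roughly like this sketch (OpenCV is just one way to pull frames; the one-frame-per-second rate and the cap of 10 frames are arbitrary):

```python
import base64

import cv2  # OpenCV, assumed available; any frame extractor works
from openai import OpenAI

client = OpenAI()

video = cv2.VideoCapture("clip.mp4")
fps = int(video.get(cv2.CAP_PROP_FPS)) or 30
frames, index = [], 0
while True:
    ok, frame = video.read()
    if not ok:
        break
    if index % fps == 0:  # roughly one frame per second
        _, buffer = cv2.imencode(".jpg", frame)
        frames.append(base64.b64encode(buffer).decode("utf-8"))
    index += 1
video.release()

content = [{"type": "text", "text": "Describe what happens in this clip."}]
content += [
    {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{frame}"}}
    for frame in frames[:10]  # keep the request small
]

completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": content}],
)
print(completion.choices[0].message.content)
```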
01:01:11.760 | - Is there in the Batch API,
01:01:12.920 | do you have like a time guarantees, like order guarantees?
01:01:17.120 | Like if I send you a Batch request of like a video analysis,
01:01:20.160 | I need every frame to be done in order?
01:01:22.240 | - For Batch, you send like a list of requests
01:01:24.880 | and each of them stand alone.
01:01:26.120 | So you'll get all of them finished,
01:01:27.400 | but they don't kind of chain off each other.
01:01:29.360 | - Well, if you're doing a video, you know,
01:01:31.440 | if you're doing like analyzing a video.
01:01:34.000 | - I wasn't linking video to Batch, but that's interesting.
01:01:36.920 | - Yeah, well, a video it's like, you know,
01:01:38.960 | if you have a very long video,
01:01:40.040 | you can just do a Batch of all the images
01:01:41.800 | and let it process.
01:01:42.640 | - Oh, that's a good idea.
01:01:43.480 | It's like Batch, but serially.
01:01:45.080 | - Sequential true.
01:01:46.040 | - Yeah, yeah, yeah, exactly.
01:01:46.880 | - You know.
01:01:48.120 | - But the whole point of Batch
01:01:49.080 | is you're just using kind of spare time to run it.
01:01:52.120 | Let's talk about my favorite model, Whisper.
01:01:54.520 | I built this thing called SmallPodcaster,
01:01:58.240 | which is an open source tool for podcasters.
01:02:00.200 | And why does Whisper API not have diarization
01:02:03.520 | when everybody is transcribing people talking?
01:02:06.320 | That's my main question.
01:02:07.520 | - Yeah, it's a good question.
01:02:08.520 | And you've come to the right person.
01:02:09.600 | I actually worked on the Whisper API and shipped that.
01:02:11.920 | That was one of my first APIs I shipped.
01:02:13.720 | Long story short is that like Whisper V3,
01:02:16.840 | which we open sourced,
01:02:18.080 | has I think the diarization feature,
01:02:20.960 | but there's some like performance trade-offs.
01:02:23.240 | And so Whisper V2 is better at some things than Whisper V3.
01:02:26.240 | And so it didn't seem that worthwhile to ship Whisper V3
01:02:29.880 | compared to like the other things in our priorities.
01:02:32.160 | I think we still will at some point.
01:02:33.840 | But yeah, it's just, you know,
01:02:35.160 | there's always so many things we could work on.
01:02:37.120 | It's tough to do everything.
01:02:38.680 | - We have a Python notebook
01:02:39.800 | that does the diarization for the pod,
01:02:41.800 | but I would just like,
01:02:42.800 | you can translate like 50 languages,
01:02:45.680 | but you cannot tell me who's speaking.
01:02:47.360 | That was like the funniest thing.
01:02:49.600 | - There's like an XKCD thing about this,
01:02:51.800 | about hard problems in AI.
01:02:53.560 | I forget the one. - Yeah, yeah, yeah, exactly.
01:02:55.480 | - Tell me if this was taken in a park.
01:02:57.480 | And like, that's easy.
01:02:58.320 | And it's like, tell me if there's a bird in this picture.
01:02:59.560 | And it's like, give me 10 people on a research team.
01:03:01.800 | It's like, you never know which things are challenging
01:03:04.040 | and diarization is, I think, you know,
01:03:06.560 | more challenging than expected.
01:03:08.800 | - Yeah, yeah.
01:03:09.640 | It still breaks a lot with like overlaps, obviously.
01:03:12.480 | Sometimes similar voices it struggles with.
01:03:14.960 | Like I need to like double read the thing.
01:03:17.240 | - Totally.
01:03:18.080 | - But yeah, great model.
01:03:18.920 | I mean, it would take us so long to do transcriptions.
01:03:21.600 | And I don't know why,
01:03:22.760 | like small podcasts has better transcription
01:03:24.640 | than like mostly every commercial tool.
01:03:27.720 | - It beats Descript.
01:03:28.920 | - And I'm like, I'm just using the model.
01:03:30.280 | I'm literally not doing anything.
01:03:31.600 | You know, it's just a notebook.
01:03:33.000 | So yeah, it just speaks to like,
01:03:35.120 | sometimes just using the simple OpenAI model
01:03:37.280 | is better than like figuring out your own pipeline thing.
01:03:40.640 | - I think the top feature request there just would be,
01:03:43.560 | I mean, again, you know,
01:03:44.920 | using you as a feature request dump,
01:03:47.240 | is like being able to bias the vocab.
01:03:49.760 | I think there is like in raw Whisper, you can do that.
01:03:51.840 | - You can pass a prompt in the API as well.
01:03:53.960 | - But you pass in the prompts, okay.
01:03:55.360 | - Yeah.
01:03:56.200 | - There's no more deterministic way to do it.
01:03:57.280 | - So this is really helpful when you have like acronyms
01:04:00.600 | that aren't very familiar to the model.
01:04:02.240 | And so you can put them in the prompt
01:04:03.440 | and you'll basically get the transcription
01:04:05.240 | using those correctly.
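A sketch of biasing the vocabulary through the prompt parameter (the file name and term list are illustrative): spell the names and acronyms the way you want them, and the transcript tends to follow.

```python
from openai import OpenAI

client = OpenAI()

with open("episode.mp3", "rb") as audio:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio,
        # Spellings listed here nudge the model toward these exact forms.
        prompt="LangChain, LlamaIndex, Latent Space, GPT-4o, RAG, SWE-bench",
    )
print(transcript.text)
```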
01:04:06.320 | - We have the AI engineer solution,
01:04:07.960 | which is just a dictionary.
01:04:09.640 | - Nice.
01:04:10.480 | - We're like all the way misspelled it in the past
01:04:12.000 | and then G-sub and like replace the thing.
01:04:14.400 | - If it works, it works.
01:04:15.320 | Like that's engineering.
01:04:17.240 | - It's like, you know, llama with like one L
01:04:19.720 | or like all these different things or like LangChain.
01:04:22.600 | It like transcribes LangChain and like-
01:04:24.640 | - Capitalization.
01:04:25.480 | - A bunch of like three or four different ways.
01:04:27.560 | - Yeah.
01:04:28.400 | - You guys should try the prompt feature.
01:04:29.720 | - I love these like kind of pro tip.
01:04:31.120 | Okay, fun question.
01:04:32.040 | I know we don't know yet,
01:04:33.720 | but I've been enjoying the advanced voice mode.
01:04:36.480 | It really streams back and forth
01:04:38.320 | and it handles interruptions.
01:04:39.800 | How would your audio endpoint change when that comes out?
01:04:43.640 | - We're exploring, you know, new shape of the API
01:04:46.600 | to see how it would work in this kind of
01:04:48.760 | speech to speech paradigm.
01:04:50.240 | I don't think we're ready to share quite yet,
01:04:51.840 | but we're definitely working on it.
01:04:53.160 | I think just the regular request response
01:04:55.360 | probably isn't going to be the right solution.
01:04:57.800 | - For those who are listening along,
01:04:59.040 | I think it's pretty public that OpenAI uses LiveKit
01:05:01.240 | for the ChatGPT app,
01:05:03.000 | which like seems to be the socket based approach
01:05:05.800 | that people should be at least up to speed on.
01:05:08.560 | Like I think a lot of developers only do request response
01:05:10.960 | and like that doesn't work for streaming.
01:05:12.920 | - Yeah.
01:05:13.760 | When we do put out this API,
01:05:14.600 | I think we'll make it really easy for developers
01:05:16.560 | to figure out how to use it.
01:05:17.680 | Yeah.
01:05:18.520 | It's hard to do audio. - It'll be a paradigm change.
01:05:20.160 | Okay.
01:05:21.000 | And then I think the last one on our list
01:05:22.080 | was team enterprise stuff.
01:05:23.160 | Audit logs, service accounts, API keys.
01:05:24.920 | What should people know using the enterprise offering?
01:05:27.480 | - Yeah, we recently shipped our admin and audit log APIs.
01:05:31.240 | And so a lot of enterprise users
01:05:32.960 | have been asking for this for a while.
01:05:34.520 | The ability to kind of manage API keys programmatically,
01:05:37.160 | manage your projects, get the audit log.
01:05:39.200 | So we've shipped this and for folks that need it,
01:05:41.320 | it's out there and happy for your feedback.
01:05:43.600 | - Yeah, awesome. I don't use them.
01:05:44.680 | So I don't know.
01:05:45.880 | I imagine it's just like build your own internal gateway
01:05:49.040 | for your internal developers
01:05:51.400 | to manage your deployment of OpenAI.
01:05:53.560 | - Yeah.
01:05:54.400 | I mean, if you work at like a company
01:05:55.480 | that needs to keep track of all the API keys,
01:05:57.720 | it was pretty hard in the past to do this in the dashboard.
01:06:01.080 | We've also improved our SSO offering.
01:06:02.800 | So that's much easier to use now.
01:06:04.680 | - The most important feature of an enterprise company.
01:06:06.680 | - Yeah, people love SSO.
01:06:08.760 | - All right, let's go outside of OpenAI.
01:06:11.960 | What about just you personally?
01:06:14.040 | So you mentioned Waterloo.
01:06:15.240 | Maybe let's just do, why is everybody at Waterloo cracked?
01:06:18.120 | And why are people so good?
01:06:19.800 | And like, why have people not replicated it?
01:06:22.040 | Or any other commentary on your experience?
01:06:24.440 | - The first is the co-op program.
01:06:25.880 | It's obviously really good.
01:06:27.120 | You know, I did six internships,
01:06:28.800 | learned so much in those.
01:06:30.920 | I think another reason is that Waterloo is like,
01:06:34.000 | you know, it's very cold in the winter.
01:06:35.920 | It's pretty miserable.
01:06:37.400 | There's like not that much to do apart from study
01:06:39.760 | and like hack on projects.
01:06:41.160 | And there's this big like hacker mentality.
01:06:43.320 | You know, there's a Hack the North
01:06:44.920 | is a very popular hackathon.
01:06:47.520 | And there's a lot of like startup incubators.
01:06:49.280 | It's kind of just has this like startup and hacker ethos.
01:06:52.040 | Then that combined with the six internships
01:06:53.720 | means that you get people who like graduate
01:06:55.880 | with two years of experience
01:06:57.360 | and they're very entrepreneurial.
01:06:58.680 | And you know, they're down to grind.
01:07:00.440 | - I do notice a correlation between climate
01:07:03.040 | and the crackiness of engineers.
01:07:05.280 | So, you know, it's no coincidence that Seattle
01:07:07.840 | is the birthplace of Microsoft and Amazon.
01:07:10.600 | I think I had this compilation of Denmark
01:07:12.920 | where people like,
01:07:14.360 | so it's the birthplace of C++, PHP, Turbo Pascal,
01:07:18.040 | Standard ML, BNF, the thing that we just talked about,
01:07:21.160 | MD5 Crypt, Ruby on Rails, Google Maps, and V8 for Chrome.
01:07:25.600 | And it's 'cause according to Bjarne Stroustrup,
01:07:27.600 | the creator of C++, there's nothing else to do.
01:07:29.720 | - Well, you have Linus Torvalds in Finland.
01:07:32.400 | - Yeah, I mean, you hear a lot about this,
01:07:34.680 | like in relation to SF.
01:07:35.920 | People say, you know, New York is way more fun.
01:07:37.880 | There's nothing to do in SF.
01:07:38.920 | And maybe it's a little by design
01:07:40.360 | that all the tech is here.
01:07:41.360 | - The climate is too good.
01:07:42.360 | - Yeah.
01:07:43.200 | - If we also have fun things to do.
01:07:44.640 | - Nature is so nice, you can touch grass.
01:07:46.360 | Why are we not touching grass?
01:07:47.640 | - You know, restaurants close at like 8 p.m.
01:07:50.440 | Like that's what people are referring to.
01:07:51.680 | There's not a lot of like late night dining culture.
01:07:55.240 | Yeah, so you have time to wake up early and get to work.
01:07:58.440 | - You are a book recommender or book enjoyer.
01:08:01.560 | What underrated books do you recommend most to others?
01:08:03.920 | - Yeah, I think a book I read somewhat recently
01:08:05.960 | that was very formative
01:08:07.040 | was "The Making of the Prince of Persia."
01:08:09.080 | It's a striped press book.
01:08:10.400 | That book just made me want to work hard.
01:08:12.720 | Like nothing I've ever read.
01:08:14.120 | It's just like this journal of what it takes
01:08:16.640 | to like build, you know, incredible things.
01:08:19.040 | So I'd recommend that.
01:08:20.160 | - Yeah, it's funny how video games are,
01:08:21.920 | for a lot of people, at least for me,
01:08:23.320 | kind of like the, some of the moments in technology.
01:08:26.880 | Like when I played "The Sands of Time" on PS2
01:08:29.040 | was like my first PlayStation 2 game.
01:08:31.760 | And I was like, man, this thing is so crazy
01:08:33.440 | compared to any PlayStation 1 game.
01:08:35.080 | And it's like, wow.
01:08:35.920 | My expectations for like the technology,
01:08:37.760 | I think like open AI is a lot of similar things,
01:08:39.720 | like the advanced voice.
01:08:41.600 | It's like, you see that thing and then you're like, okay,
01:08:44.040 | what I can expect from everybody else
01:08:45.600 | is kind of raised now, you know?
01:08:47.160 | - Totally.
01:08:48.000 | Another book I like to plug
01:08:48.920 | is called "Misbehaving" by Richard Thaler.
01:08:51.760 | He's a behavioral economist
01:08:53.240 | and talks a lot about how people act irrationally
01:08:56.080 | in terms of decision-making.
01:08:57.480 | And I actually think about that book like once a week,
01:08:59.240 | probably, at least when I'm making a decision
01:09:01.360 | and I realize that, you know, I'm falling into a fallacy
01:09:04.000 | or, you know, it could be a better decision.
01:09:06.080 | - Yeah, you did a minor in psych.
01:09:07.280 | - I did, yeah.
01:09:08.360 | I don't know if I learned that much there,
01:09:10.000 | but it was interesting.
01:09:11.200 | - Is there like an example of like a cognitive bias
01:09:14.760 | or misbehavior that you just love telling people about?
01:09:18.120 | - Yeah, people, so let's say you won tickets
01:09:20.840 | to like a Taylor Swift concert
01:09:22.720 | and I don't know how much they're going for,
01:09:23.960 | but it's probably like $10,000.
01:09:25.760 | - Oh, okay.
01:09:26.600 | - Or whatever, sure.
01:09:28.080 | And like a lot of people are like, oh, I have to keep these.
01:09:30.120 | Like I won them, it's $10,000.
01:09:31.640 | But really it's the same decision you're making
01:09:33.400 | if you have $10,000, like would you buy these tickets?
01:09:36.040 | And so people don't really think about it rationally.
01:09:37.720 | Like, would they rather have $10,000 or the tickets?
01:09:40.200 | For people who won it, a lot of the time
01:09:41.800 | it's going to be the $10,000,
01:09:43.040 | but they're biased because they won it.
01:09:45.360 | The world organized itself this way
01:09:47.240 | and you should keep it for some reason.
01:09:48.560 | - Yeah, oh, okay.
01:09:49.400 | I'm pretty familiar with this stuff.
01:09:50.800 | There's also a loss version of this
01:09:53.560 | where it's like, if I take it away from you,
01:09:55.800 | you respond more strongly than if I give it to you.
01:09:58.000 | - Yes.
01:09:58.840 | Like, people are really upset
01:10:00.320 | if they like don't get a promotion,
01:10:01.960 | but if they do get a promotion, they're like, okay, phew.
01:10:04.480 | It's like not even, you know, excitement.
01:10:06.440 | It's more like, we react a lot worse to losing something.
01:10:10.880 | - Which is why, like when you join like a new platform,
01:10:13.520 | they often give you points and then they'll take it away
01:10:15.760 | if you like don't do some action in like the first few days.
01:10:19.800 | - Yeah, totally.
01:10:20.640 | Yeah, the book references people
01:10:22.560 | who like operate very rationally as econs,
01:10:25.440 | as like a separate group to humans.
01:10:27.760 | And I often think like, you know,
01:10:29.760 | what would an econ do here in this moment
01:10:32.200 | and try to act that way.
01:10:34.000 | - Okay, let's do this.
01:10:36.160 | Are LLMs econs?
01:10:36.160 | (all laughing)
01:10:38.840 | - I mean, they are maximizing probability distributions.
01:10:42.440 | - Minimizing loss.
01:10:43.280 | - Yeah, so I think way more than all of us, they are econs.
01:10:46.320 | - Whoa, okay.
01:10:47.200 | So they're more rational than us?
01:10:49.720 | - I think their optimization functions
01:10:51.200 | are more clear than ours.
01:10:52.720 | - Yeah, just to wrap, you mentioned
01:10:54.520 | you need help on a lot of things.
01:10:56.320 | - Yeah.
01:10:57.160 | - Any specific roles, call outs,
01:10:58.240 | and also people's backgrounds.
01:10:59.680 | Like, is there anything that they need to have done before?
01:11:02.400 | Like what people fit well at OpenAI?
01:11:04.400 | - Yeah, we've hired people from all kinds of backgrounds,
01:11:07.200 | people who have a PhD in ML,
01:11:09.760 | or folks who've just done engineering like me.
01:11:12.560 | And we're really hiring for a lot of teams.
01:11:14.280 | We're hiring across the Applied Org,
01:11:15.960 | which is where I sit for engineering,
01:11:17.600 | and for a lot of researchers.
01:11:19.320 | And there's a really cool model behavior role
01:11:21.880 | that we just dropped.
01:11:23.160 | So yeah, across the board,
01:11:24.760 | we'd recommend checking out our careers page,
01:11:26.320 | and you don't need a ton of experience
01:11:28.320 | in AI specifically to join.
01:11:30.760 | - I think one thing that I'm trying to get at
01:11:32.600 | is like, what kind of person does well at OpenAI?
01:11:35.400 | I think objectively you have done well.
01:11:37.200 | And I've seen other people not do as well,
01:11:38.960 | and basically be managed out.
01:11:42.000 | I know it's an intense environment.
01:11:43.840 | - I mean, the people I enjoy working with the most
01:11:45.880 | are kind of low ego, do what it takes,
01:11:49.040 | ready to roll up their sleeves,
01:11:50.880 | do what needs to be done, and unpretentious about it.
01:11:54.120 | Yeah, I also think folks that are very user-focused
01:11:59.440 | do well on kind of API and ChatGPT.
01:11:59.440 | Like, the YC ethos of build something people want
01:12:02.520 | is very true at OpenAI as well.
01:12:04.760 | So I would say low ego, user-focused, driven.
01:12:08.720 | - Cool, yeah, this was great.
01:12:10.040 | Thank you so much for coming on.
01:12:10.880 | - Yeah, thanks for having me.
01:12:12.320 | (upbeat music)
01:12:14.900 | (upbeat music)
01:12:17.480 | (upbeat music)
01:12:20.060 | (upbeat music)