Pragmatic AI with TypeChat: Daniel Rosenwasser


00:00:00.000 | Good afternoon my AI engineering friends. How are we all feeling today?
00:00:18.280 | There we go. We got some energy even post-lunch. All right, you heard. I'm Daniel Rosenwasser.
00:00:25.020 | I'm the program manager on TypeScript as well as a new little experimental library I'm here
00:00:29.920 | to talk about today called TypeChat. Now, this is an AI engineering conference. Everybody
00:00:35.880 | here has used something like ChatGPT, right? We use it for this continuous flow of information.
00:00:41.440 | We've been able to prototype things with it, just get useful answers, just by having this
00:00:46.700 | adorable little chat interface, right? But that's this one end of the spectrum. And on
00:00:51.420 | the other end of the spectrum, we have our traditional apps. These apps that are looking for this more
00:00:56.760 | precise sort of data to work with. So the question is, how do we make all of the new AI
00:01:02.720 | tools, all of these language models that are so powerful, accessible to every engineer
00:01:07.680 | out there? And so, just to start things off, what if we had this cute -- you know, this little
00:01:13.480 | app right here. You have some basic user input at the very top, followed by these items. And
00:01:19.080 | each of these items has a venue name and description. So this just helps me figure out what I need
00:01:23.720 | to do on a rainy day in Seattle because this is every day in Seattle for me. A lot of weather
00:01:28.780 | apps at this conference. But the problem that you may find with trying to bridge together these
00:01:36.100 | language models and these traditional apps is that you find that you need to sort of massage
00:01:41.920 | the data. You need to sort of like really, really, really pamper the models to give you what you're
00:01:46.640 | looking for. And even after all that's said and done, by default, these apps will give you
00:01:53.200 | natural language. Which is great for people, but it's not great for code.
00:01:59.300 | So, if we just prototype this in, you know, something like a chat view, maybe you'd actually
00:02:03.860 | use the playground to do this. You'd find yourself saying certain things to pamper, like keep it
00:02:08.260 | short and do this and put everything on its different line and do whatever. You might find that you're
00:02:15.060 | starting to glom onto the patterns of what the language model gives you because you've seen it in
00:02:19.860 | a certain way, right? And you've noticed, oh, well, it gives me this format. Each of these things is on
00:02:24.100 | its own line. Each of the lines has a leading number. They're always separating the venue name from the
00:02:29.220 | description with a colon. So I'll just do some basic parsing. Split by newline. Remove the
00:02:35.300 | leading numbers, and then split on the colon. That is a disaster waiting to happen because you
00:02:43.460 | can't rely on the language model to always do this. And you can't know whether or not you're going to
00:02:49.380 | have something in the middle of that input that is going to just sort of wreck your parsing strategy,
00:02:54.180 | right? Parsing natural language is extremely hard, if not a fool's errand for most people.
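The fragility of that strategy is easy to reproduce. The following is a minimal sketch of the split-and-strip parser described above; the `Venue` shape, the `parseVenues` name, and both sample replies are made up for illustration and do not come from the talk's slides.

```typescript
// Hypothetical shape for one suggestion (an assumption, not the talk's code).
interface Venue {
  name: string;
  description: string;
}

// Naive strategy: split by newline, strip the leading number, split on the colon.
function parseVenues(reply: string): Venue[] {
  return reply
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => {
      const withoutNumber = line.replace(/^\s*\d+\.\s*/, "");
      const [name, description] = withoutNumber.split(":");
      return { name: (name ?? "").trim(), description: (description ?? "").trim() };
    });
}

// Works when the reply happens to follow the expected pattern.
const tidyReply =
  "1. Pike Place Market: Covered stalls and food.\n2. Seattle Art Museum: Indoor galleries.";
const tidyVenues = parseVenues(tidyReply);

// Breaks when the reply has a preamble line or a colon inside a description:
// the preamble becomes a bogus venue, and the second description is cut short.
const messyReply =
  "Here are some ideas:\n1. MoPOP: Exhibits on music: rock, pop, and more.";
const messyVenues = parseVenues(messyReply);

console.log(tidyVenues);
console.log(messyVenues);
```

Nothing in the parser signals that the second result is wrong, which is the real hazard: the app receives plausible-looking but corrupted data.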
00:02:59.140 | The thing that many people at this conference and elsewhere have discovered is you can say,
00:03:07.380 | "Pretty, pretty, please, give me some JSON." And it works pretty well, right? Here's an example of
00:03:14.740 | what I'm expecting. Please respond with the answer. And, voila, it comes right back. But there's two
00:03:21.620 | issues with this. One is just doing that on its own is not enough to guarantee that your app is actually
00:03:27.540 | going to get the data it's looking for. Because maybe there's an extra property that doesn't seem
00:03:34.180 | to align. Maybe there's not enough data in the actual response. So you need to do some level of
00:03:38.020 | validation. But not just that. You can't comprehensively describe all of the things that you want,
00:03:46.260 | practically. In this case, I have a really, really simple schema, or really, really simple example.
00:03:52.260 | All the objects are uniform. They all have the same properties. End of story, right? But what if
00:03:58.980 | something is optional? What if something is required but needs to be null in some cases? What if this could be a
00:04:04.500 | string or a number but never something else? I don't know. So you will not be able to get that far for
00:04:12.900 | more complex examples because you end up with this combinatorial explosion. So what we found is that you
00:04:18.980 | can use types. Types are this great description to actually guide the model. Here, I'm just using
00:04:27.140 | type definitions in TypeScript. These are just plain interfaces. All I want is a thing with a list and the
00:04:31.540 | list has these objects and the objects have these two properties that are both strings on them. And the
00:04:35.540 | beauty of these type definitions is that the types can guide the model, right? So you can actually use these
00:04:44.580 | types to tell a model, hey, here's some user input. Here's a user intent. Now use this with the types that I'm actually going to use in my application.
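As a rough illustration of types guiding the model, the sketch below keeps the type definitions as source text and splices them into a prompt together with the user input. The names `VenueList`, `Venue`, and `buildPrompt`, the extra optional, nullable, and union members, and the prompt wording are all assumptions made for this example; they are not TypeChat's actual prompt.

```typescript
// The schema is kept as source text so the same definitions the app compiles
// against can also be shown to the model. All names here are illustrative.
const schemaText = `
interface VenueList {
  items: Venue[];
}

interface Venue {
  name: string;
  description: string;
  // Optional, nullable, and union members are hard to convey with example JSON alone.
  website?: string;
  rating: number | null;
  price: string | number;
}`;

// Combine the types, the expected top-level type, and the user's request.
function buildPrompt(schema: string, typeName: string, userInput: string): string {
  return [
    `You translate user requests into JSON objects of type "${typeName}" according to these TypeScript definitions:`,
    schema.trim(),
    "The following is a user request:",
    userInput,
    "Respond only with the JSON object, with no extra commentary.",
  ].join("\n");
}

const prompt = buildPrompt(schemaText, "VenueList", "Things to do on a rainy day in Seattle");
console.log(prompt);
```

The point of the design is that one set of definitions serves two readers: the compiler that checks the application and the model that has to produce matching data.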
00:04:54.980 | Throw it through your cool AI service, whatever that is. That may be OpenAI, Cohere, Anthropic. Maybe it's a local model. Maybe it's Code
00:05:03.380 | Llama. I don't know. But the point is, what we found is that if you use a language model that is sufficiently trained on both
00:05:11.380 | human prose, natural language, and code, this actually bridges the two worlds together. But like I said, the guidance is not -- it's only half of the problem, right? You need to be able to actually validate what you're getting.
00:05:25.900 | And that's the key insight is that the types can also validate the results.
00:05:30.720 | And so what we found is in our experience, you know, we're using TypeScript.
00:05:35.260 | TypeScript's great for JSON because it's a superset of JavaScript, which is a superset
00:05:38.940 | of JSON, which means that you can actually construct a miniature little program that
00:05:43.660 | underneath the hood the TypeScript compiler is using to do that validation.
00:05:49.020 | And if that all goes well, then great, you have well-typed data from your language model.
00:05:56.500 | And if it doesn't go well, well, underneath the covers what we actually end up with is
00:06:00.240 | an error message, right, because it's actually using the TypeScript compiler under the hood.
00:06:04.880 | That error message can be used to perform a repair when you are reaching out to a language
00:06:09.780 | model to say, no, no, no, no, no, that's not what I wanted, try again.
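The validate-then-repair loop can be sketched as follows. This is not TypeChat's implementation: the talk describes validating with the TypeScript compiler, while this sketch swaps in a small hand-written check so that it stays self-contained. The `Model` type, `validateVenueList`, `translateWithRepair`, and the canned replies are hypothetical, and the model call is synchronous only to keep the example short.

```typescript
type Result<T> = { success: true; data: T } | { success: false; message: string };

interface Venue {
  name: string;
  description: string;
}

interface VenueList {
  items: Venue[];
}

// Hypothetical model interface; a real call would be asynchronous.
type Model = (prompt: string) => string;

// Stand-in validator (an assumption): checks the JSON by hand instead of
// running the TypeScript compiler as the talk describes.
function validateVenueList(jsonText: string): Result<VenueList> {
  let value: unknown;
  try {
    value = JSON.parse(jsonText);
  } catch {
    return { success: false, message: "Response is not valid JSON." };
  }
  const items = (value as { items?: unknown } | null)?.items;
  if (!Array.isArray(items)) {
    return { success: false, message: "Property 'items' must be an array." };
  }
  for (let i = 0; i < items.length; i++) {
    const item = items[i];
    if (typeof item?.name !== "string" || typeof item?.description !== "string") {
      return { success: false, message: `items[${i}] needs string 'name' and 'description'.` };
    }
  }
  return { success: true, data: { items: items as Venue[] } };
}

// Ask the model, validate, and on failure feed the error message back as a repair prompt.
function translateWithRepair(model: Model, request: string, maxAttempts = 2): Result<VenueList> {
  let prompt = request;
  let failure = "No attempts were made.";
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const reply = model(prompt);
    const result = validateVenueList(reply);
    if (result.success === true) return result;
    failure = result.message;
    prompt = `${request}\nYour previous reply was:\n${reply}\nIt was rejected with this error:\n${failure}\nPlease reply again with corrected JSON only.`;
  }
  return { success: false, message: failure };
}

// Fake model: the first reply is missing 'description', the second is correct.
const cannedReplies = [
  '{"items": [{"name": "Pike Place Market"}]}',
  '{"items": [{"name": "Pike Place Market", "description": "Covered stalls."}]}',
];
const promptsSeen: string[] = [];
const fakeModel: Model = (prompt) => {
  promptsSeen.push(prompt);
  return cannedReplies[promptsSeen.length - 1] ?? "{}";
};

const outcome = translateWithRepair(fakeModel, "List rainy-day venues as VenueList JSON.");
console.log(outcome.success, promptsSeen.length);
```

The second prompt carries the validator's error text, which is what gives the model something concrete to fix rather than a bare request to try again.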
00:06:15.080 | And so the key insight is types are all you need.
00:06:20.080 | Types can actually guide and validate, and it becomes a very powerful model because -- whoops!
00:06:27.620 | Well, yes, actually.
00:06:30.840 | That's the key insight that we have with TypeChat.
00:06:32.700 | It's a library on npm right now.
00:06:34.560 | It's a TypeScript library at the moment.
00:06:38.000 | And basically we have bundled this all together and made it easy to just guide a language model,
00:06:43.120 | perform these queries, and actually make sure that you're actually getting well-typed data
00:06:48.820 | from the language models.
00:06:50.900 | And so you can actually use much more complex examples as well.
00:06:53.680 | You might say, like, I have a coffee shop, and the coffee shop has this schema, these types.
00:06:58.720 | You define them like this, and basically you can use that to combine that with the user
00:07:04.320 | intent and input, and you get well-typed output.
00:07:08.400 | And I'll actually demo that right now.
00:07:12.460 | What I have here is my -- you know, the TypeChat repository cloned, npm installed, everything's
00:07:20.720 | set up, and we have an examples directory.
00:07:22.720 | And I think if you're just curious to get started with TypeChat, the examples directory
00:07:27.920 | gets you started.
00:07:29.240 | We have a table -- if you look at the readme, we have a table of all of our examples.
00:07:33.820 | They kind of increase in complexity and difficulty, and the first one is like a sentiment thing
00:07:39.200 | where we say if something is positive, negative, or neutral.
00:07:43.280 | But that's so basic, it's like our hello world, I actually want to go back to that coffee shop
00:07:46.960 | example that I showed you just now.
00:07:49.480 | So we have this coffee shop schema, and this is just a bunch of types, right?
00:07:57.320 | You probably have something similar in your preferred language as well.
00:08:01.920 | And what I can do here is I'm just going to run our entry points, and from the command prompt
00:08:08.620 | I actually have a little prompt, and I can actually just make orders here.
00:08:12.000 | So I can say one latte with foam, please.
00:08:19.840 | Ta-da!
00:08:20.840 | Right?
00:08:23.840 | Yeah.
00:08:24.840 | So, you know, it's -- this is the key thing is that it's actually so simple, and it actually
00:08:32.000 | just works very well in a surprising way.
00:08:35.840 | I could just tell you about that, and I could walk off, and that's not really good enough.
00:08:42.560 | I know.
00:08:43.680 | What happens if I say one latte and a medium purple gorilla named Bonsai?
00:08:53.680 | So what actually happened here is technically, when we ran this prompt, this thing succeeded.
00:09:02.400 | But even though we got a successful result, we were able to do this sort of recovery here.
00:09:08.220 | We actually, in our app, are able to say, "I didn't understand the following: a medium purple
00:09:13.160 | gorilla named Bonsai."
00:09:14.520 | And that actually showed up in the JSON.
00:09:16.880 | And the reason that it did is because we have this thing called "unknown text."
00:09:21.240 | So we've started to see these patterns in that instead of doing this sort of prompt engineering,
00:09:26.240 | you're doing schema engineering.
00:09:27.800 | You're able to sort of thread through these results into your app.
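A small sketch of that "schema engineering" idea: the schema reserves a place for input the model could not map onto a real item, and the app handles that case explicitly. The types below are a heavily simplified, hypothetical cart and are not the schema from the coffee shop example; the `cart` value is hand-written to stand in for a model response.

```typescript
// Simplified, hypothetical cart schema (assumption: the real example is much richer).
interface LineItem {
  type: "lineitem";
  product: string;
  quantity: number;
}

// Escape hatch: text the model could not translate into a known item.
interface UnknownText {
  type: "unknown";
  text: string;
}

interface Cart {
  items: (LineItem | UnknownText)[];
}

// The app threads the unknown parts through to the user instead of guessing.
function summarize(cart: Cart): string[] {
  return cart.items.map((item) =>
    item.type === "unknown"
      ? `I didn't understand the following: ${item.text}`
      : `${item.quantity} x ${item.product}`
  );
}

// Hand-written stand-in for a validated model response.
const cart: Cart = {
  items: [
    { type: "lineitem", product: "latte", quantity: 1 },
    { type: "unknown", text: "a medium purple gorilla named Bonsai" },
  ],
};

const summaryLines = summarize(cart);
console.log(summaryLines.join("\n"));
```

Because the unknown case is part of the type, the compiler forces the application code to decide what to do with it.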
00:09:31.920 | Because if you actually, you know, remove this stuff -- and let me show you what this actually
00:09:36.680 | looks like.
00:09:37.680 | If you look at the coffee shop example, this is under 40 lines of code, right?
00:09:42.300 | So the magic here actually comes from we create a model.
00:09:46.420 | We infer it based on your environment settings.
00:09:49.780 | And then the actual magic is that we have this JSON translator.
00:09:52.800 | You give us the contents of your types, you select the type that you're expecting, and then
00:09:57.080 | every single time you need to translate a user intent, you just run this translate function.
00:10:02.120 | Now I'm getting type errors because I removed the type and it's telling me, like, this will
00:10:05.020 | never happen.
00:10:06.020 | Whoops!
00:10:07.020 | Not that.
00:10:08.620 | So if I rerun this thing, and I say one cappuccino -- cappuccino, I can't spell anything today -- and
00:10:20.060 | a purple gorilla named Bonsai.
00:10:24.740 | I want to be precise here.
00:10:30.120 | So I got a bagel with butter because I asked for Bonsai.
00:10:35.240 | And the thing is that the -- what's going to happen is that the language model really doesn't
00:10:40.220 | want to disappoint you.
00:10:41.560 | It really wants to make sure you're getting what you want.
00:10:44.180 | So this is the thing is you can actually define a schema that is rich enough to anticipate
00:10:52.840 | failure, gives you a chance to recover, show that to the user, say, I got this and this
00:10:57.620 | and this and that.
00:10:58.620 | I wasn't so clear on that.
00:11:00.980 | And that's kind of the beauty of this approach.
00:11:03.020 | It's very simple, and it's really just about defining types, which you're going to use in
00:11:07.180 | your application anyway.
00:11:10.020 | Now, there's this other thing that we started encountering when we showed this off to teams
00:11:15.520 | internally.
00:11:17.680 | People said, well, that's all cool.
00:11:19.500 | You're turning coffee into code.
00:11:22.420 | I do, too.
00:11:23.620 | How do I actually do something more rich, like commands?
00:11:28.360 | What if I want to actually script my application in some way?
00:11:32.760 | Well, this approach I just showed you actually works for very simple stuff as well, right?
00:11:36.920 | You can imagine something where you say, schedule an appointment for me, and that turns into
00:11:41.720 | a specific command for a calendar app.
00:11:44.040 | In fact, in our examples, we actually have that.
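One way to picture commands as typed data is a discriminated union that the app dispatches on. This is a sketch under assumptions: the command names, fields, and the in-memory calendar are invented here and are not the schema from the TypeChat calendar example.

```typescript
// Hypothetical command types for a calendar app.
interface AddEvent {
  kind: "addEvent";
  title: string;
  day: string;
  time: string;
}

interface RemoveEvent {
  kind: "removeEvent";
  title: string;
}

type CalendarCommand = AddEvent | RemoveEvent;

// Toy calendar: maps an event title to "day time".
const calendar: Record<string, string> = {};

// Dispatch on the discriminant; each branch sees the fields of its own command.
function runCommand(command: CalendarCommand): void {
  switch (command.kind) {
    case "addEvent":
      calendar[command.title] = `${command.day} ${command.time}`;
      break;
    case "removeEvent":
      delete calendar[command.title];
      break;
  }
}

// Hand-written stand-in for what a model might return for a multi-part request.
const commands: CalendarCommand[] = [
  { kind: "addEvent", title: "Dentist", day: "Friday", time: "10:00" },
  { kind: "addEvent", title: "Standup", day: "Monday", time: "09:00" },
  { kind: "removeEvent", title: "Standup" },
];

for (const command of commands) {
  runCommand(command);
}
console.log(calendar);
```

A plain list like this runs each command independently; nothing here lets one command consume the result of another.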
00:11:48.620 | What if you want to string together multiple things?
00:11:50.780 | Hey, that's just a list of commands, right?
00:11:54.620 | Kind of.
00:11:57.880 | The problem with this is if I want these to kind of thread through to each other, this is
00:12:02.460 | a simple example, so it's just going, you know, run the command, get the output, go to
00:12:07.580 | the input, et cetera, et cetera, et cetera, et cetera.
00:12:11.980 | What if you have something that expects multiple arguments?
00:12:14.660 | What if you want to reuse a result?
00:12:17.140 | Sure seems like you need variables and other things like that here.
00:12:21.340 | So we ask ourselves, is there a thing here where you can imagine you can just generate code
00:12:27.420 | and just take the same approach where types are all you need?
00:12:32.540 | So what if you could just define, here's all the methods that I want you to be able to
00:12:35.340 | call, come back with some code that only calls those methods, and then generates a program
00:12:41.220 | like this.
00:12:43.060 | The problem is that you really want to have some sort of sandboxing and safety constraints
00:12:50.060 | in place, right?
00:12:51.060 | So you might start saying I need availability, I can't just endlessly loop here, so I'm not
00:12:57.180 | going to allow loops, I'm not going to allow lambdas and whatever.
00:13:00.820 | The problem is that even if you decide I'm going to pick a subset of a language like JavaScript
00:13:04.580 | or Python or whatever you have, the language models have seen so much of that code that they're
00:13:10.360 | going to draw outside the lines.
00:13:12.180 | And then you'll hit this failure case, and then you just won't get a result.
00:13:16.060 | You won't get a bad result, you just won't get a result that conforms to what you're expecting.
00:13:19.980 | And then you still have to worry about sandboxing, and then there's all these questions about
00:13:23.580 | synchronous versus asynchronous APIs and all this other stuff too that language models don't
00:13:28.620 | tend to understand because I guess most people don't either.
00:13:31.760 | So what we actually have been trying is we generate a fake language.
00:13:37.980 | We generate a fake language still based on the types, but it's in the form of JSON, actually.
00:13:45.420 | And so you have things like refs, and refs are just references to prior results.
00:13:49.880 | And if you're familiar with, like, if you're a compiler person, this may look like SSA, it might
00:13:55.000 | look like an AST, whatever.
00:13:59.060 | But we use that to construct a fake TypeScript program in memory as well, and use that to make
00:14:05.380 | sure that not just are you calling only the methods that are available to you, that you
00:14:11.100 | can only do certain actions, but also that the inputs from prior steps match up with
00:14:17.020 | the types that you're defining from your API.
00:14:20.260 | And so that kind of comes back to types are all you need.
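To make those checks concrete, here is a hand-rolled sketch of validating a JSON program before running it. It does not use the TypeScript compiler as the talk describes; it only checks that each step calls an allowed function with the right number of arguments and that every ref points at an earlier step. The field names (`steps`, `func`, `args`, `ref`) and the function table are assumptions for illustration. Every value is a number here, so matching argument types is trivial in this toy version.

```typescript
// Hypothetical JSON program shape: a list of calls whose arguments are
// literals or references to the results of earlier steps.
type Arg = number | { ref: number };

interface Step {
  func: string;
  args: Arg[];
}

interface Program {
  steps: Step[];
}

// The only functions a program may call, with the argument count each expects.
const allowedFunctions: Record<string, number> = { add: 2, sub: 2, mul: 2, div: 2 };

function checkProgram(program: Program): string[] {
  const errors: string[] = [];
  program.steps.forEach((step, index) => {
    // Own-property check so names like "toString" are not accepted by accident.
    if (!Object.prototype.hasOwnProperty.call(allowedFunctions, step.func)) {
      errors.push(`Step ${index}: unknown function '${step.func}'.`);
      return;
    }
    const arity = allowedFunctions[step.func];
    if (step.args.length !== arity) {
      errors.push(`Step ${index}: '${step.func}' expects ${arity} arguments.`);
    }
    step.args.forEach((arg) => {
      if (typeof arg !== "number" && (arg.ref < 0 || arg.ref >= index)) {
        errors.push(`Step ${index}: ref ${arg.ref} does not point to an earlier step.`);
      }
    });
  });
  return errors;
}

const goodProgram: Program = {
  steps: [
    { func: "add", args: [1, 41] },
    { func: "div", args: [{ ref: 0 }, 7] },
  ],
};

const badProgram: Program = {
  steps: [
    { func: "loop", args: [1, 2] },
    { func: "add", args: [{ ref: 1 }, 2] },
  ],
};

const goodErrors = checkProgram(goodProgram);
const badErrors = checkProgram(badProgram);
console.log(goodErrors, badErrors);
```

Because the program is data rather than code, rejecting a step is a matter of reporting an error message, which can then feed the same kind of repair prompt used for plain JSON.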
00:14:24.060 | We have another really simple example for -- we have a math schema.
00:14:30.380 | This is basically a calculator in sheep's clothing.
00:14:33.620 | So if you go back and we run this here, we have another prompt that's an abacus, that's
00:14:40.700 | the closest thing to a calculator I could get.
00:14:43.740 | We could say something like add 1 to 41 and then divide by 7.
00:14:51.380 | Now, basically what happened here is we made a language model good at math.
00:15:01.320 | So we've also solved a whole other set of -- yeah.
00:15:06.020 | More seriously, though.
00:15:08.040 | So at each of these steps, we're actually performing -- having the language model call a method, perform
00:15:12.780 | an operation, and if you actually look at the code here -- math main -- this is all under
00:15:21.220 | 50 lines of code.
00:15:23.060 | We are able to do the same sort of translation.
00:15:25.100 | We have a separate thing called a program translator.
00:15:28.120 | And in that program translator, when you are successfully able to validate your results, you know,
00:15:34.580 | you say if this thing is a success -- or not a success, just jump out, otherwise do some
00:15:38.960 | stuff with it.
00:15:39.900 | We have this evaluate function, and this evaluate function takes a callback, and that callback
00:15:43.840 | is just sort of like this instruction interpreter.
00:15:48.200 | And so you can do this with objects, you can do this with a function, with a switch case
00:15:51.380 | or whatever.
00:15:52.700 | But the point is that this actually allows you to do some richer tasks.
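The evaluate-with-a-callback idea can be sketched like this. The `evaluate` function, the program shape, and `handleCall` are written from scratch for illustration and are not TypeChat's own evaluator; the program is the hand-written equivalent of "add 1 to 41 and then divide by 7".

```typescript
// Hypothetical JSON program shape (same assumptions as any toy step/ref format).
type Arg = number | { ref: number };

interface Step {
  func: string;
  args: Arg[];
}

interface Program {
  steps: Step[];
}

// The callback is the instruction interpreter: it receives a function name
// and already-resolved arguments, and returns the result of that call.
type CallHandler = (func: string, args: number[]) => number;

function evaluate(program: Program, onCall: CallHandler): number | undefined {
  const results: number[] = [];
  for (const step of program.steps) {
    const resolved = step.args.map((arg) => {
      if (typeof arg === "number") return arg;
      const prior = results[arg.ref];
      if (prior === undefined) throw new Error(`ref ${arg.ref} has no result yet`);
      return prior;
    });
    results.push(onCall(step.func, resolved));
  }
  // The value of the program is the result of its last step.
  return results.length > 0 ? results[results.length - 1] : undefined;
}

// A switch-based interpreter for a four-function calculator.
function handleCall(func: string, args: number[]): number {
  const a = args[0] ?? NaN;
  const b = args[1] ?? NaN;
  switch (func) {
    case "add": return a + b;
    case "sub": return a - b;
    case "mul": return a * b;
    case "div": return a / b;
    default: throw new Error(`unknown function '${func}'`);
  }
}

const program: Program = {
  steps: [
    { func: "add", args: [1, 41] },
    { func: "div", args: [{ ref: 0 }, 7] },
  ],
};

const answer = evaluate(program, handleCall);
console.log(answer); // 6
```

The model never does arithmetic itself in this arrangement; it only decides which calls to make, and ordinary code computes the results.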
00:15:59.380 | Now there are other approaches for many of these things, and they overlap with what TypeChat
00:16:03.840 | does.
00:16:04.880 | But the cool thing is that TypeChat is able to actually give you this level of validation
00:16:08.100 | for both JSON and programs.
00:16:11.740 | And it's something that we're also experimenting with in other languages, too.
00:16:15.940 | So for example, people at this conference have been saying, yeah, you know, TypeScript is
00:16:21.140 | very cool, and I agree with them because I work on TypeScript.
00:16:25.320 | But how would I make this work with Python?
00:16:27.560 | And so we have been experimenting with this, and we have been getting fairly good results.
00:16:31.880 | I'm able to do something like the coffee shop with a very similar approach using types.
00:16:38.060 | I'm able to do something similar with the calculator app, just defining methods on a class with comments
00:16:45.100 | and all this other stuff that helps the model do a little bit better.
00:16:48.860 | And it works really well.
00:16:50.300 | We can do more complex examples, too, like we have this CSV example.
00:16:57.140 | Maybe I want to be able to -- well, I'm not going to get into -- oh, pipenv.
00:17:04.920 | The demo gods are going to kill me here.
00:17:10.500 | That.
00:17:11.500 | That.
00:17:11.500 | Brutal.
00:17:12.700 | Okay.
00:17:13.700 | I can just create a program that does this now.
00:17:22.880 | I have this entire API that grabs columns and is able to perform certain operations and then
00:17:28.160 | do joins that do filtering and joining and all this other stuff as well because it just
00:17:32.300 | sort of does this selection based on booleans, so read a CSV, find all the values that equal
00:17:37.060 | NA and then drop the rows.
00:17:39.400 | And so this becomes this sort of powerful approach, and this is just a prototype of the Python stuff
00:17:43.720 | that we've been working on as well.
00:17:45.460 | It's not prime time, and if you want to talk to me about it, I'm definitely game.
00:17:52.200 | So what I want from you all is to try TypeChat out.
00:17:57.220 | Reach out.
00:17:58.880 | What I'm here at this conference for is to learn about what you're all trying to build, trying
00:18:03.020 | to help bridge the gap as well between what we're all learning on the cutting edge and making
00:18:08.440 | that more accessible to everyday engineers who have been at this more precise end of the
00:18:11.840 | spectrum, bringing the power of these language models that are so rich to the traditional apps.
00:18:17.260 | Thank you very much.
00:18:18.580 | Come see me at the Microsoft booth, I'll be hanging out for a little bit, and thank you.