Pragmatic AI with TypeChat: Daniel Rosenwasser

Good afternoon, my AI engineering friends. How are we all feeling today? There we go. We got some energy even post-launch. All right, you heard: I'm Daniel Rosenwasser. I'm the program manager on TypeScript, as well as on a new little experimental library I'm here to talk about today, called TypeChat.

Now, this is an AI engineering conference. Everybody here has used something like ChatGPT, right? We use it for this continuous flow of information. We've been able to prototype things with it, just get useful answers, just by having this adorable little chat interface, right? But that's one end of the spectrum. On the other end of the spectrum, we have our traditional apps -- apps that are looking for a more precise sort of data to work with. So the question is: how do we make all of these new AI tools, all of these language models that are so powerful, accessible to every engineer out there?

And so, just to start things off, what if we had this cute -- you know, this little app right here? You have some basic user input at the very top, followed by these items, and each of these items has a venue name and a description. So this just helps me figure out what I need to do on a rainy day in Seattle, because that is every day in Seattle for me. A lot of weather apps at this conference.
But the problem that you may find with trying to bridge together these language models and these traditional apps is that you need to sort of massage the data. You need to really, really pamper the models to give you what you're looking for. And even after all that's said and done, by default, these apps will give you natural language, which is great for people, but it's not great for code.

So, if we just prototype this in something like a chat view -- maybe you'd actually use the playground to do this -- you'd find yourself saying certain things to pamper it: keep it short, put everything on its own line, and so on. You might find that you're starting to glom onto the patterns of what the language model gives you, because you've seen it respond a certain way, right? And you've noticed: oh, well, it gives me this format. Each of these things is on its own line. Each line has a leading number. The venue name is always separated from the description by a colon. So I'll just do some basic parsing: split by newline, remove the leading numbers, and then split on the colon. That is a disaster waiting to happen, because you can't rely on the language model to always do this, and you can't know whether something in the middle of that input is going to just wreck your parsing strategy, right? Parsing natural language is extremely hard, if not a fool's errand, for most people.
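To make that concrete, the brittle strategy being described looks something like this (a hypothetical sketch, not code from the talk's demo):

```typescript
// Assume every reply line looks like "1. Venue Name: description" and
// string-parse it. Any reply that deviates from that format breaks this.
function parseVenues(reply: string): { venue: string; description: string }[] {
    return reply
        .split("\n")                                      // one item per line
        .map(line => line.replace(/^\s*\d+[.)]\s*/, ""))  // strip the leading number
        .filter(line => line.trim().length > 0)
        .map(line => {
            const [venue, ...rest] = line.split(":");     // split name from description
            return { venue: venue.trim(), description: rest.join(":").trim() };
        });
}
```

The moment the model answers with a preamble sentence, a bulleted list, or any other variation in formatting, this quietly produces garbage.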
The thing that many people at this conference and elsewhere have discovered is that you can say, "Pretty, pretty please, give me some JSON." And it works pretty well, right? Here's an example of what I'm expecting; please respond with the answer. And, voila, it comes right back. But there are two issues with this. One is that just doing that on its own is not enough to guarantee that your app is actually going to get the data it's looking for. Maybe there's an extra property that doesn't seem to align. Maybe there's not enough data in the actual response. So you need to do some level of validation. But not just that: you can't comprehensively describe all of the things that you want, practically. In this case, I have a really, really simple schema, a really, really simple example. All the objects are uniform. They all have the same properties. End of story, right? But what if something is optional? What if something is required but needs to be null in some cases? What if this could be a string or a number, but never something else? I don't know. You will not be able to get that far for more complex examples, because you end up with this combinatorial explosion.
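Those are exactly the kinds of constraints that are easy to state as types but awkward to convey through a single example response. An illustrative sketch (these names are mine, not from the talk):

```typescript
// One example JSON object can't communicate any of these rules,
// but a type states them directly.
interface Venue {
    name: string;
    description?: string;       // optional: may be omitted entirely
    rating: number | null;      // required, but allowed to be null
    capacity: string | number;  // a string or a number, never anything else
}
```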
So what we found is that you can use types. Types are this great description to actually guide the model. Here, I'm just using type definitions in TypeScript. These are just plain interfaces: all I want is a thing with a list, and the list has these objects, and the objects have these two properties that are both strings on them. And the beauty of these type definitions is that the types can guide the model, right? So you can actually use these types to tell a model: hey, here's some user input, here's a user intent, now use it with the types that I'm actually going to use in my application.
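The shape being described is roughly this (the property names are my guesses, not the exact ones from the slide):

```typescript
// Plain interfaces: a list of items, each with a venue name and a
// description, both strings.
interface VenueItem {
    venue: string;
    description: string;
}

interface ActivityIdeas {
    items: VenueItem[];
}
```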
Throw it through your cool AI service, whatever that is. That may be OpenAI, Cohere, Anthropic. Maybe it's a local model. Maybe it's Code Llama. I don't know. But the point is, what we found is that if you use a language model that is sufficiently trained on both human prose -- natural language -- and code, this actually bridges the two worlds together. But like I said, the guidance is only half of the problem, right? You need to be able to actually validate what you're getting. And that's the key insight: the types can also validate the results.

And so, in our experience, you know, we're using TypeScript. TypeScript's great for JSON because it's a superset of JavaScript, which is a superset of JSON, which means that you can actually construct a miniature little program that, underneath the hood, the TypeScript compiler is using to do that validation. And if that all goes well, then great: you have well-typed data from your language model. And if it doesn't go well, well, underneath the covers what we actually end up with is an error message, right, because it's actually using the TypeScript compiler under the hood. That error message can be used to perform a repair when you are reaching back out to the language model to say: no, no, no, that's not what I wanted, try again.
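As a rough sketch of that guide/validate/repair loop (not TypeChat's actual internals -- `complete` and `validateAgainstSchema` here are hypothetical stand-ins for your model call and for running the TypeScript compiler over a tiny generated program):

```typescript
// Hypothetical stand-ins: a chat-completion call, and a validator that checks
// candidate JSON against a TypeScript type by compiling a small program.
declare function complete(prompt: string): Promise<string>;
declare function validateAgainstSchema<T>(
    json: string,
    schema: string,
    typeName: string
): { success: true; data: T } | { success: false; message: string };

// Ask for JSON matching the type, validate it with the compiler, and feed any
// error message back to the model as a repair prompt.
async function translateWithRepair<T>(
    request: string,
    schema: string,
    typeName: string,
    maxRepairs = 1
): Promise<T> {
    let prompt =
        `Translate the user request into a JSON object of type "${typeName}" ` +
        `according to these TypeScript definitions:\n${schema}\n` +
        `User request: "${request}"\nRespond with JSON only.`;
    for (let attempt = 0; ; attempt++) {
        const candidate = await complete(prompt);
        const result = validateAgainstSchema<T>(candidate, schema, typeName);
        if (result.success) {
            return result.data;                  // well-typed data
        }
        if (attempt >= maxRepairs) {
            throw new Error(`translation failed: ${result.message}`);
        }
        // "No, no, no, that's not what I wanted, try again."
        prompt +=
            `\n${candidate}\nThat JSON is invalid for the following reason:\n` +
            `${result.message}\nPlease respond with corrected JSON only.`;
    }
}
```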
And so the key insight is: types are all you need. Types can actually guide and validate, and it becomes a very powerful model because -- whoops! That's the key insight that we have with TypeChat. Basically, we have bundled this all together to make it easy to just guide a language model, perform these queries, and actually make sure that you're getting well-typed data back.

And so you can use much more complex examples as well. You might say: I have a coffee shop, and the coffee shop has this schema, these types. You define them like this, and basically you can combine that with the user intent and input, and you get well-typed output.
What I have here is my -- you know, the TypeChat repository cloned, npm installed, everything's set up. And I think if you're just curious to get started with TypeChat, the examples directory is the place to go. If you look at the readme, we have a table of all of our examples. They kind of increase in complexity and difficulty, and the first one is like a sentiment thing where we say whether something is positive, negative, or neutral. But that's so basic -- it's like our hello world -- that I actually want to go back to that coffee shop example. So we have this coffee shop schema, and this is just a bunch of types, right? You probably have something similar in your preferred language as well.
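A heavily trimmed sketch in the spirit of the example's coffeeShopSchema.ts (the real file is much richer, with specific drink and option types, and the names here are approximate):

```typescript
// A cart is just a list of line items; each line item names a product and a
// quantity. The actual example models products with detailed union types.
export interface LineItem {
    type: "lineitem";
    product: string;      // e.g. "latte", "bagel"
    quantity: number;
}

export interface Cart {
    items: LineItem[];
}
```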
And what I can do here is just run our entry point, and from the command prompt I have a little prompt where I can make orders. So, you know, this is the key thing: it's actually so simple, and it actually works. But I could just tell you about that and walk off, and that's not really good enough. What happens if I say one latte and a medium purple gorilla named Bonsai? So what actually happened here is that, technically, when we ran this prompt, the thing succeeded. But even though we got a successful result, we were able to do this sort of recovery here. We actually, in our app, are able to say, "I didn't understand the following: a medium purple gorilla named Bonsai." And the reason it can do that is because we have this thing called "unknown text."
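The idea, roughly, is to give the model an explicit, well-typed escape hatch in the schema for anything it can't map onto the menu, so the app can surface exactly that text instead of getting a guess. Building on the LineItem sketch above (names approximate):

```typescript
// An item the model couldn't interpret; it carries the original text along.
export interface UnknownText {
    type: "unknown";
    text: string;          // the part of the request that wasn't understood
}

// Revisiting the Cart sketch: the union invites the model to park anything it
// doesn't recognize in an UnknownText item instead of forcing a guess.
export interface Cart {
    items: (LineItem | UnknownText)[];
}

// The app can then report what it couldn't handle, as in the demo's
// "I didn't understand the following" message.
export function reportUnknown(cart: Cart): void {
    const unknown = cart.items.filter(
        (item): item is UnknownText => item.type === "unknown"
    );
    if (unknown.length > 0) {
        console.log("I didn't understand the following:");
        for (const item of unknown) {
            console.log(`  ${item.text}`);
        }
    }
}
```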
So we've started to see these patterns where, instead of doing this sort of prompt engineering, you're able to thread these results through into your app. Because if you actually, you know, remove all that stuff -- and let me show you what this actually looks like. If you look at the coffee shop example, this is under 40 lines of code, right? So the magic here comes from: we create a model, and we infer it based on your environment settings. And then the actual magic is that we have this JSON translator. You give us the contents of your types, you select the type that you're expecting, and then every single time you need to translate a user intent, you just run this translate function.
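Condensed, the example's entry point looks roughly like this, using the typechat package's API from around the time of the talk (the API has continued to evolve, so check the repo for the current shape):

```typescript
import fs from "fs";
import path from "path";
import { createJsonTranslator, createLanguageModel } from "typechat";
import { Cart } from "./coffeeShopSchema";

// Create a model from environment settings (e.g. OpenAI or Azure OpenAI keys).
const model = createLanguageModel(process.env);

// Hand the translator the text of your types and the type you expect back.
const schema = fs.readFileSync(path.join(__dirname, "coffeeShopSchema.ts"), "utf8");
const translator = createJsonTranslator<Cart>(model, schema, "Cart");

async function handleOrder(request: string) {
    // Every user intent goes through the same translate call.
    const response = await translator.translate(request);
    if (!response.success) {
        console.log(response.message);
        return;
    }
    console.log(JSON.stringify(response.data, undefined, 2)); // a well-typed Cart
}

handleOrder("one latte and a blueberry muffin");
```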
Now I'm getting type errors because I removed the type, and it's telling me this isn't going to work. So if I rerun this thing, and I say one cappuccino -- cappuccino, I can't spell anything today. So I got a bagel with butter because I asked for Bonsai. And the thing is, what's going to happen is that the language model really doesn't want to let you down. It really wants to make sure you're getting what you want. So this is the thing: you can actually define a schema that is rich enough to anticipate failure, gives you a chance to recover, and lets you show that to the user and say, I got this and this, but I didn't understand that. And that's kind of the beauty of this approach. It's very simple, and it's really just about defining types, which you're going to use in your application anyway.
Now, there's this other thing that we started encountering when we showed this off to teams: how do I actually do something richer, like commands? What if I want to actually script my application in some way? Well, the approach I just showed you actually works for very simple cases of that, too, right? You can imagine something where you say, schedule an appointment for me, and that turns into a typed request. In fact, in our examples, we actually have that. But what if you want to string together multiple things? The problem is, if I want these to kind of thread through to each other -- this is a simple example, so it's just: run the command, get the output, feed that into the next input, et cetera. But what if you have something that expects multiple arguments? It sure seems like you need variables and other things like that here.
So we asked ourselves: is there a thing here where you can just generate code and take the same approach, where types are all you need? What if you could define: here are all the methods that I want you to be able to call; come back with some code that only calls those methods, and then generate a program out of that? The problem is that you really want to have some sort of sandboxing and safety constraints around this. So you might start saying: I need availability, I can't just endlessly loop here, so I'm not going to allow loops, I'm not going to allow lambdas, and so on. But even if you decide to pick a subset of a language like JavaScript or Python or whatever you have, the language models have seen so much of that code that they're going to wander outside your subset anyway. And then you'll hit this failure case, and you just won't get a result. You won't get a bad result; you just won't get a result that conforms to what you're expecting. And then you still have to worry about sandboxing, and then there are all these questions about synchronous versus asynchronous APIs and all this other stuff, too, that language models don't tend to understand -- because I guess most people don't either.
So what we actually have been trying is this: we generate a fake language. We generate a fake language still based on the types, but it's in the form of JSON, actually. And so you have things like refs, and refs are just references to prior results. If you're familiar with compilers, this may look a bit like SSA. But we use that to construct a fake TypeScript program in memory as well, and use that to make sure not just that you're calling only the methods that are available to you -- that you can only do certain actions -- but also that the inputs from prior steps match up with the types that you're defining in your API.
And so that kind of comes back to: types are all you need. We have another really simple example for this -- we have a math schema. This is basically a calculator in sheep's clothing. So if we go back and run this here, we have another prompt that's an abacus, which is the closest thing to a calculator I could get.
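The schema for that example is essentially just a list of operations the model is allowed to call, along these lines (names approximate; see the repo's math example for the real file):

```typescript
// The generated program may only call these functions.
export type API = {
    add(x: number, y: number): number;
    sub(x: number, y: number): number;
    mul(x: number, y: number): number;
    div(x: number, y: number): number;
    neg(x: number): number;
    // An escape hatch for anything that can't be expressed with the above.
    unknown(text: string): number;
};
```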
If we say something like add 1 to 41 and then divide by 7 -- now, basically what happened here is we made a language model good at math. So we've also solved a whole other set of problems -- yeah.
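For that request, the JSON "program" the model comes back with looks roughly like this; the "@ref" is the reference-to-a-prior-result idea mentioned earlier (treat the exact property names as approximate):

```typescript
// Step 0 adds 1 and 41; step 1 divides step 0's result (referenced by index) by 7.
const program = {
    "@steps": [
        { "@func": "add", "@args": [1, 41] },
        { "@func": "div", "@args": [{ "@ref": 0 }, 7] },
    ],
};
```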
So at each of these steps, we're actually having the language model call a method, perform an operation. And if you look at the code here -- the math main file -- this is all just as short. We are able to do the same sort of translation; we have a separate thing called a program translator. And with that program translator, when you validate your results, you say: if this thing is not a success, just jump out; otherwise, do something with the program. We have this evaluate function, and this evaluate function takes a callback, and that callback is just sort of like an instruction interpreter. You can do this with objects, you can do this with a function, with a switch statement, whatever you like.
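Condensed, that looks something like the following, again using the package's API from around the time of the talk (treat the exact signatures as approximate):

```typescript
import fs from "fs";
import path from "path";
import { createLanguageModel, createProgramTranslator, evaluateJsonProgram } from "typechat";

const model = createLanguageModel(process.env);
const schema = fs.readFileSync(path.join(__dirname, "mathSchema.ts"), "utf8");
const translator = createProgramTranslator(model, schema);

async function handleRequest(request: string) {
    const response = await translator.translate(request);
    if (!response.success) {
        console.log(response.message);   // not a success: just jump out
        return;
    }
    // The callback is the instruction interpreter: it is handed each call in
    // the program, in order, with prior results already substituted in.
    const result = await evaluateJsonProgram(response.data, async (func: string, args: any[]) => {
        switch (func) {
            case "add": return args[0] + args[1];
            case "sub": return args[0] - args[1];
            case "mul": return args[0] * args[1];
            case "div": return args[0] / args[1];
            case "neg": return -args[0];
            default:    return NaN;      // unknown operation
        }
    });
    console.log(result);
}

handleRequest("add 1 to 41 and then divide by 7");
```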
But the point is that this actually allows you to do some richer tasks. Now, there are other approaches for many of these things, and they overlap with what TypeChat does. But the cool thing is that TypeChat is able to give you this level of validation as well. And it's something that we're also experimenting with in other languages, too. For example, people at this conference have been saying, yeah, you know, TypeScript is very cool -- and I agree with them, because I work on TypeScript -- but not everybody is writing TypeScript. And so we have been experimenting with this in Python, and we have been getting fairly good results. I'm able to do something like the coffee shop with a very similar approach using types. I'm able to do something similar with the calculator app, just defining methods on a class with comments and all this other stuff that helps the model do a little bit better.
We can do more complex examples, too, like this CSV example we have. Maybe I want to be able to -- well, I'm not going to get into all of it -- oh, pipenv. I can just create a program that does this now. I have this entire API that grabs columns and is able to perform certain operations, and then do joins -- filtering and joining and all this other stuff as well -- because it just sort of does this selection based on booleans: read a CSV, find all the values that equal something, and so on.
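The demo runs on the Python prototype, but the schema idea is the same; a hypothetical TypeScript rendering of that kind of column API might look like this (names are illustrative, not the actual example's):

```typescript
// Opaque handles the generated program passes between steps.
interface Table {}
interface Column {}
interface BooleanMask {}

// The operations the model is allowed to string together into a program.
interface CsvApi {
    readCsv(path: string): Table;                                // load a CSV file
    getColumn(table: Table, name: string): Column;               // grab a column
    equals(column: Column, value: string | number): BooleanMask; // boolean selection
    filterRows(table: Table, mask: BooleanMask): Table;          // keep matching rows
    join(left: Table, right: Table, on: string): Table;          // join two tables
}
```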
And so this becomes a sort of powerful approach, and this is just a prototype of the Python stuff. It's not prime time, and if you want to talk to me about it, I'm definitely game. So what I want from you all is to try TypeChat out. What I'm here at this conference for is to learn about what you're all trying to build, and to help bridge the gap between what we're all learning on the cutting edge and making that more accessible to the everyday engineers who have been at this more precise end of the spectrum -- bringing the power of these language models, which are so rich, to traditional apps. Come see me at the Microsoft booth, I'll be hanging out for a little bit, and thank you.