Good afternoon, my AI engineering friends. How are we all feeling today? There we go. We got some energy even post-launch. All right, you heard. I'm Daniel Rosenwasser. I'm the program manager on TypeScript, as well as a new little experimental library I'm here to talk about today, called TypeChat. Now, this is an AI engineering conference.
Everybody here has used something like ChatGPT, right? We use it for this continuous flow of information. We've been able to prototype things with it, just get useful answers, just by having this adorable little chat interface, right? But that's this one end of the spectrum. And on the other end of the spectrum, we have our traditional apps.
These apps are looking for a more precise sort of data to work with. So the question is, how do we make all of these new AI tools, all of these language models that are so powerful, accessible to every engineer out there? And so, just to start things off, what if we had this cute -- you know, this little app right here.
You have some basic user input at the very top, followed by these items. And each of these items has a venue name and description. So this just helps me figure out what I need to do on a rainy day in Seattle because this is every day in Seattle for me.
A lot of weather apps at this conference. But the problem that you may find with trying to bridge together these language models and these traditional apps is that you find that you need to sort of massage the data. You need to sort of like really, really, really pamper the models to give you what you're looking for.
And even after all that's said and done, by default, these models will give you natural language. Which is great for people, but it's not great for code. So, if we just prototype this in, you know, something like a chat view -- maybe you'd actually use the playground to do this.
You'd find yourself saying certain things to pamper, like keep it short and do this and put everything on its different line and do whatever. You might find that you're starting to glom onto the patterns of what the language model gives you because you've seen it in a certain way, right?
And you've noticed, oh, well, it gives me this format. Each of these things is on its own line. Each of the lines has a leading number. They're always separating the venue name from the description with a colon. So I'll just do some basic parsing: split by newline, remove the leading numbers, and then split on the colon.
That is a disaster waiting to happen because you can't rely on the language model to always do this. And you can't know whether or not you're going to have something in the middle of that input that is going to just sort of wreck your parsing strategy, right? Parsing natural language is extremely hard, if not a fool's errand for most people.
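To make the fragility concrete, that brittle strategy can be sketched like this (the line format and venue names here are illustrative, not from the talk's actual demo):

```typescript
// The brittle strategy: parse "1. Venue Name: description" lines out of
// the model's free-text reply. Any formatting drift silently breaks it.
interface Venue {
  name: string;
  description: string;
}

function parseVenues(reply: string): Venue[] {
  return reply
    .split("\n")
    .filter((line) => line.trim().length > 0)
    .map((line) => {
      const withoutNumber = line.replace(/^\s*\d+[.)]\s*/, "");
      const colon = withoutNumber.indexOf(":");
      // If the model drops the colon, adds a preamble line like
      // "Sure! Here are some ideas:", or puts a colon inside the
      // description, this quietly produces garbage instead of failing.
      return {
        name: withoutNumber.slice(0, colon).trim(),
        description: withoutNumber.slice(colon + 1).trim(),
      };
    });
}
```

It works on the happy-path format, but a chatty preamble line becomes a bogus "venue" with an empty description, and nothing ever throws.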
The thing that many people at this conference and elsewhere have discovered is you can say, "Pretty, pretty, please, give me some JSON." And it works pretty well, right? Here's an example of what I'm expecting. Please respond with the answer. And, voila, it comes right back. But there are two issues with this.
One is just doing that on its own is not enough to guarantee that your app is actually going to get the data it's looking for. Because maybe there's an extra property that doesn't seem to align. Maybe there's not enough data in the actual response. So you need to do some level of validation.
But not just that. You can't comprehensively describe all of the things that you want, practically. In this case, I have a really, really simple schema -- a really, really simple example. All the objects are uniform. They all have the same properties. End of story, right? But what if something is optional?
What if something is required but needs to be null in some cases? What if this could be a string or a number but never something else? I don't know. So you will not be able to get that far for more complex examples because you end up with this combinatorial explosion.
So what we found is that you can use types. Types are this great description to actually guide the model. Here, I'm just using type definitions in TypeScript. These are just plain interfaces. All I want is a thing with a list, and the list has these objects, and the objects have these two properties on them, both strings.
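The type definitions being described might look something like this (a sketch; the property names are my approximation, not the exact slide contents):

```typescript
// A response schema for the rainy-day app: the model is asked to reply
// with JSON matching this shape and nothing else.
interface VenueSuggestion {
  venue: string;       // name of the place
  description: string; // short blurb on why it fits the request
}

interface VenueResponse {
  items: VenueSuggestion[];
}

// Well-typed data the app can consume directly, with no string parsing:
const example: VenueResponse = {
  items: [
    { venue: "Seattle Art Museum", description: "Spend a rainy afternoon with the exhibits." },
  ],
};
```

The schema text itself gets included in the prompt, which is what lets the model fill in a matching object.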
And the beauty of these type definitions is that the types can guide the model, right? So you can actually use these types to tell a model, hey, here's some user input. Here's a user intent. Now use this with the types that I'm actually going to use in my application.
Throw it through your cool AI service, whatever that is. That may be OpenAI, Cohere, Anthropic. Maybe it's a local model. Maybe it's Code Llama. I don't know. But the point is, what we found is that if you use a language model that is sufficiently trained on both human prose -- natural language -- and code, this actually bridges the two worlds together.
But like I said, the guidance is only half of the problem, right? You need to be able to actually validate what you're getting. And that's the key insight: the types can also validate the results. And so what we found, in our experience -- you know, we're using TypeScript.
TypeScript's great for JSON because it's a superset of JavaScript, which is a superset of JSON, which means that you can actually construct a miniature little program that underneath the hood the TypeScript compiler is using to do that validation. And if that all goes well, then great, you have well-typed data from your language model.
And if it doesn't go well, well, underneath the covers what we actually end up with is an error message, right, because it's actually using the TypeScript compiler under the hood. That error message can be used to perform a repair when you are reaching out to a language model to say, no, no, no, no, no, that's not what I wanted, try again.
And so the key insight is types are all you need. Types can actually guide and validate, and it becomes a very powerful model because -- whoops! Well, yes, actually. That's the key insight that we have with TypeChat. It's a library on NPM right now. It's a TypeScript library at the moment.
And basically we have bundled this all together to make it easy to guide a language model, perform these queries, and actually make sure that you're getting well-typed data from the language models. And so you can actually use much more complex examples as well. You might say, like, I have a coffee shop, and the coffee shop has this schema, these types.
You define them like this, and basically you can combine that with the user intent and input, and you get well-typed output. And I'll actually demo that right now. What I have here is the TypeChat repository cloned, npm installed, everything's set up, and we have an examples directory.
And I think if you're just curious to get started with TypeChat, the examples directory gets you started. We have a table -- if you look at the readme, we have a table of all of our examples. They kind of increase in complexity and difficulty, and the first one is like a sentiment thing where we say if something is positive, negative, or neutral.
But that's so basic, it's like our hello world, I actually want to go back to that coffee shop example that I showed you just now. So we have this coffee shop schema, and this is just a bunch of types, right? You probably have something similar in your preferred language as well.
And what I can do here is I'm just going to run our entry point, and from the command line I actually have a little prompt, and I can just make orders here. So I can say one latte with foam, please. Ta-da! Right? Yeah. So this is the key thing: it's actually so simple, and it actually just works surprisingly well.
I could just tell you about that and walk off, but that's not really good enough, I know. What happens if I say one latte and a medium purple gorilla named Bonsai? So what actually happened here is, technically, when we ran this prompt, this thing succeeded. But even though we got a successful result, we were able to do this sort of recovery here.
We actually, in our app, are able to say, "I didn't understand the following: a medium purple gorilla named Bonsai." And that actually showed up in the JSON. And the reason that it did is because we have this thing called "unknown text." So we've started to see these patterns in that instead of doing this sort of prompt engineering, you're doing schema engineering.
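A cut-down sketch of that schema-engineering pattern, loosely modeled on the coffee shop example (the type names and members here are approximations):

```typescript
// Schema engineering: give the model a safe place to put input it
// doesn't understand, instead of forcing it to invent an order item.
interface UnknownText {
  type: "unknown";
  text: string; // the part of the request that wasn't understood
}

interface LatteDrink {
  type: "latte";
  size: "short" | "tall" | "grande";
  foam?: boolean;
}

interface Cart {
  items: (LatteDrink | UnknownText)[];
}

// The app can then surface the unknown parts back to the user:
function unknownParts(cart: Cart): string[] {
  return cart.items
    .filter((item): item is UnknownText => item.type === "unknown")
    .map((item) => item.text);
}
```

Because `UnknownText` is part of the declared return type, "I didn't understand the following" becomes ordinary typed data rather than a parsing failure.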
You're able to sort of thread through these results into your app. Because if you actually, you know, remove this stuff -- and let me show you what this actually looks like. If you look at the coffee shop example, this is under 40 lines of code, right? So the magic here actually comes from we create a model.
We infer it based on your environment settings. And then the actual magic is that we have this JSON translator. You give us the contents of your types, you select the type that you're expecting, and then every single time you need to translate a user intent, you just run this translate function.
Now I'm getting type errors because I removed the type and it's telling me, like, this will never happen. Whoops! Not that. So if I rerun this thing, and I say one cappuccino -- cappuccino, I can't spell anything today -- and a purple gorilla named Bonsai. I want to be precise here.
So I got a bagel with butter because I asked for Bonsai. And the thing is, the language model really doesn't want to disappoint you. It really wants to make sure you're getting what you want. So this is the thing: you can actually define a schema that is rich enough to anticipate failure, gives you a chance to recover, and lets you show that to the user -- say, I got this and this and this and that.
It wasn't so clear on that. And that's kind of the beauty of this approach. It's very simple, and it's really just about defining types, which you're going to use in your application anyway. Now, there's this other thing that we started encountering when we showed this off to teams internally.
People said, well, that's all cool. You're turning coffee into code. I do, too. How do I actually do something more rich, like commands? What if I want to actually script my application in some way? Well, this approach I just showed you actually works for very simple stuff as well, right?
You can imagine something where you say, schedule an appointment for me, and that turns into a specific command for a calendar app. In fact, in our examples, we actually have that. What if you want to string together multiple things? Hey, that's just a list of commands, right? Kind of.
The problem with this is if I want these to kind of thread through to each other -- this is a simple example, so it's just going, you know, run the command, get the output, pass it to the next input, et cetera. What if you have something that expects multiple arguments?
What if you want to reuse a result? Sure seems like you need variables and other things like that here. So we asked ourselves, is there a thing here where you can just generate code and take the same approach where types are all you need? So what if you could just define: here are all the methods that I want you to be able to call; come back with some code that only calls those methods; and then generate a program like this.
The problem is that you really want to have some sort of sandboxing and safety constraints in place, right? So you might start saying I need availability, I can't just endlessly loop here, so I'm not going to allow loops, I'm not going to allow lambdas and whatever. The problem is that even if you decide I'm going to pick a subset of a language like JavaScript or Python or whatever you have, the language models have seen so much of that code that they're going to draw outside the lines.
And then you'll hit this failure case, and then you just won't get a result. You won't get a bad result, you just won't get a result that conforms to what you're expecting. And then you still have to worry about sandboxing, and then there's all these questions about synchronous versus asynchronous APIs and all this other stuff too that language models don't tend to understand because I guess most people don't either.
So what we actually have been trying is we generate a fake language. We generate a fake language still based on the types, but it's in the form of JSON, actually. And so you have things like refs, and refs are just references to prior results. And if you're a compiler person, this may look like SSA, it might look like an AST, whatever.
But we use that to construct a fake TypeScript program in memory as well, and use that to make sure not just that you're calling only the methods that are available to you -- that you can only do certain actions -- but also that the inputs from prior steps match up with the types that you're defining in your API.
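Concretely, the generated "program" is plain JSON where each step calls a named function and can reference an earlier step's result. Based on TypeChat's published examples, the shape looks roughly like this (treat the exact `@`-prefixed keys as an approximation):

```typescript
// "(1 + 41) / 7" expressed as a JSON program: the second step refers
// back to step 0's result instead of re-stating it.
const program = {
  "@steps": [
    { "@func": "add", "@args": [1, 41] },
    { "@func": "div", "@args": [{ "@ref": 0 }, 7] },
  ],
};
```

Because this isn't JavaScript or Python, the model can't smuggle in loops, lambdas, or arbitrary calls; every step is just a typed function application.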
And so that kind of comes back to types are all you need. We have another really simple example for -- we have a math schema. This is basically a calculator in sheep's clothing. So if you go back and we run this here, we have another prompt that's an abacus, that's the closest thing to a calculator I could get.
If we say something like add 1 to 41 and then divide by 7 -- now, basically what happened here is we made a language model good at math. So we've also solved a whole other set of -- yeah. More seriously, though. At each of these steps, we're actually having the language model call a method, perform an operation, and if you actually look at the code here -- math main -- this is all under 50 lines of code.
We are able to do the same sort of translation. We have a separate thing called a program translator. And in that program translator, when you are successfully able to validate your results, you know, you say if this thing is a success -- or not a success, just jump out, otherwise do some stuff with it.
We have this evaluate function, and this evaluate function takes a callback, and that callback is just sort of like this instruction interpreter. And so you can do this with objects, you can do this with a function, with a switch case or whatever. But the point is that this actually allows you to do some richer tasks.
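An `evaluate`-style interpreter for such a program can be sketched as a small loop that resolves ref arguments against prior results and dispatches on the function name (this illustrates the pattern; it is not TypeChat's implementation):

```typescript
// Minimal interpreter for a JSON "program": each step calls a named
// math operation; { "@ref": n } arguments resolve to step n's result.
type Arg = number | { "@ref": number };
type Step = { "@func": string; "@args": Arg[] };

const ops: Record<string, (a: number, b: number) => number> = {
  add: (a, b) => a + b,
  sub: (a, b) => a - b,
  mul: (a, b) => a * b,
  div: (a, b) => a / b,
};

function evaluate(steps: Step[]): number {
  const results: number[] = [];
  for (const step of steps) {
    const args = step["@args"].map((arg) =>
      typeof arg === "number" ? arg : results[arg["@ref"]],
    );
    results.push(ops[step["@func"]](args[0], args[1]));
  }
  return results[results.length - 1];
}

// "add 1 to 41 and then divide by 7"
evaluate([
  { "@func": "add", "@args": [1, 41] },
  { "@func": "div", "@args": [{ "@ref": 0 }, 7] },
]); // → 6
```

Swapping the `ops` table for an object of calendar commands, CSV operations, or anything else is what makes the same approach work for scripting an application.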
Now there are other approaches for many of these things, and they overlap with what TypeChat does. But the cool thing is that TypeChat is able to give you this level of validation for both JSON and programs. And we're also experimenting with other languages. So for example, people at this conference have been saying, yeah, you know, TypeScript is very cool, and I agree with them because I work on TypeScript.
But how would I make this work with Python? And so we have been experimenting with this, and we have been getting fairly good results. I'm able to do something like the coffee shop with a very similar approach using types. I'm able to do something similar with the calculator app, just defining methods on a class with comments and all this other stuff that helps the model do a little bit better.
And it works really well. We can do more complex examples, too, like we have this CSV example. Maybe I want to be able to -- well, I'm not going to get into -- oh, pipenv. The demo gods are going to kill me here. That. That. Brutal. Okay. I can just create a program that does this now.
I have this entire API that grabs columns and is able to perform certain operations and then do joins that do filtering and joining and all this other stuff as well because it just sort of does this selection based on booleans, so read a CSV, find all the values that equal NA and then drop the rows.
And so this becomes this sort of powerful approach, and this is just a prototype of the Python stuff that we've been working on as well. It's not prime time, and if you want to talk to me about it, I'm definitely game. So what I want from you all is to try TypeChat out.
Reach out. What I'm here at this conference for is to learn about what you're all trying to build, and to help bridge the gap between what we're all learning on the cutting edge and the everyday engineers who have been at this more precise end of the spectrum -- bringing the power of these language models, which are so rich, to traditional apps.
Thank you very much. Come see me at the Microsoft booth, I'll be hanging out for a little bit. Thank you.