Running AI Application in Minutes w/ AI Templates: Gabriela de Queiroz, Pamela Fox, Harald Kirschner

00:00:00.000 |
Thank you so much for coming to the workshop. My name is Gabriela de Queiroz and I'm Director 00:00:21.240 |
of AI at Microsoft. I have Pamela here. I'm Pamela and I'm a Python cloud advocate, 00:00:28.140 |
so well done on those people who said Python, but I also worked in JavaScript for quite a long time and I generally like lots of languages. 00:00:40.140 |
Hi, I'm Harald. I'm a PM on VS Code and GitHub Copilot chat. 00:00:46.140 |
Awesome. So today we are going to be talking or showing you how to run an AI application in minutes. 00:00:57.140 |
So we are going to have a lot of like hands on. So be ready to do like some coding, not coding, but like going through some coding using different tools, 00:01:09.140 |
GitHub code spaces, Azure, and other tools that we are going to be talking about. 00:01:15.140 |
But just to give an overview of like the agenda, I'm going to be talking about Microsoft for startups a little bit, some of the partnerships, some of the pain points, and then we go through the AI templates and hands on. 00:01:30.140 |
So Microsoft has a program for startups. So if you have an idea, if you have a startup, you can apply to this program. 00:01:42.140 |
And what I always tell people is you don't have to have a startup per se. But if you have an idea, that's enough to apply for this program. And you get a lot of benefits - benefits that can be, 00:01:54.140 |
for example, credits. So you get up to $150,000 in Azure credits. You also have third-party benefits, like a lot of different tools that you can use. 00:02:06.140 |
And then of course, GitHub, Microsoft 365, LinkedIn Premium, and more. You can use all the different models from OpenAI, but also Llama, models from Cohere, Mistral, and so on. 00:02:21.140 |
And the piece that I like the most is about the sessions that you can get one on one sessions with people like me, or Pamela, that we volunteer our time to share our knowledge with founders. 00:02:35.140 |
We can talk about maybe like, I don't know, you are hiring and then I'm an expert in hiring. So you come and talk to me and I say, hey, 00:02:42.140 |
these are some of the best practice for you when you are building your team, or you can go to technical sessions and ask more like technical pieces as well. 00:02:52.140 |
And inside this platform, we have several things other than the benefits and the guidance that I just mentioned - one of them is what we call Build with AI. 00:03:02.140 |
And inside, we have some AI templates. The idea is that we can help you accelerate the AI application piece with some kind of skeleton, in a way. 00:03:17.140 |
So you have something up and running in a few minutes. 00:03:21.140 |
So again, you get cloud credits, you have access to dev tools, you have the AI templates, you have the one-on-one guidance. 00:03:34.140 |
And no matter where you are in your journey, if you have an idea, if you are already building or if you're scaling, this program is for you. 00:03:47.140 |
You have access to all the cutting edge AI tools, so you can innovate and streamline your AI development. 00:03:55.140 |
And on top of the founders program that we have, there are also different programs that are kind of the next step. 00:04:03.140 |
Like let's say you are now scaling, growing, and then you use all the credits. What is next? There is a next. Like, you know, we try to guide you through the whole process. 00:04:14.140 |
So there is something called the Pegasus program, where we help you to co-sell, go to market and so on. 00:04:22.140 |
And then there are some like strategic VC partners and like accelerators that we partner with. 00:04:27.140 |
So we have partnership with Y Combinator, Neo, The Alchemist, etc. 00:04:33.140 |
Pain points for startups, there are a bunch of them. One of them is like, you don't have time. You cannot wait to go to markets. You have to go like as fast as you can. 00:04:46.140 |
You have a lot of resource constraints. You have issues with scalability. You don't have the support and guidance. And that's what we are trying to help you with. 00:04:56.140 |
So now we are going to go to the fun part. It's like the AI template. So that's where Pamela is going to show you all the amazing things that you can do with all the different tools. 00:05:10.140 |
All right. So our goal today is potentially having you deploy maybe even three different templates. 00:05:22.140 |
Okay. So we have three different ones - let me just show in the browser which ones we're going to be deploying. We're going to start simple with this chat application here, just to make sure everything's up and working. 00:05:38.140 |
And then we've got two different rag applications. One of them is rag on a Postgres database - rag on a Postgres table that does SQL filter building. 00:05:47.140 |
And then we have rag on unstructured documents. So here I've got a rag on my personal blog, or a rag on, you know, internal company documents - whatever kind of documents you're going to rag on. 00:05:59.140 |
So those are the three templates we're going to be looking at today. And we have it all set up so that you should be able to deploy those templates without spending any of your own money and doing it all through our credits, which is yay. 00:06:12.140 |
Yay. All right. So, um, the first thing you need to do is get this URL. So everybody open this URL on your computer. So it's aka.ms/aie-workshop. 00:06:24.140 |
It should open up a, a word document in the browser that looks like the screenshot you see here. So you can either type in the URL or scan that QR code and get that open on your machine. 00:06:39.140 |
So let's make sure everyone's got it open. Welcome, welcome. So go ahead. Once you've got your computer ready, put this, uh, put this URL in your browser. 00:06:56.140 |
Harald, maybe you can just memorize it and then help anyone who doesn't have it. Yeah. Aie-workshop. Uh, okay. So then let me go to that actual doc here. 00:07:10.140 |
So the first thing you need is a GitHub account. Does anybody here not have a GitHub account? Okay. So everyone here has a GitHub account. Great. If you don't have a GitHub account, you can sign up for one for free right now. And that's it. 00:07:23.140 |
Um, and, um, and that should be fine. Um, the next thing you need is an Azure pass. So this is something that we've got for this workshop for this conference. And this is going to let you deploy stuff on Azure without spending any of your own money. 00:07:40.140 |
So we've got passes for 50 bucks and they're valid for seven days. So if you do want to keep hacking after the workshop, you can keep using your pass. And, uh, after seven days, it'll disappear just like Cinderella and the pumpkin. Uh, so in order to get that Azure pass, you do need to have some sort of Microsoft account. 00:08:00.140 |
Microsoft account. So you can use your, like, uh, you can use a personal Microsoft account if you have one. Uh, so if you're, if you like, how do you tell which one you're logged into right now? 00:08:10.140 |
I guess if you just go to outlook.office.com, maybe you know what Microsoft account you're currently logged into. Um, and then you can see some people in the last workshop were like logged into their kid's Minecraft account. 00:08:21.140 |
So just, uh, just, uh, just, you, you need a Microsoft account and you might want to double check to see which one you're currently signed into. If you are signed into a Microsoft account, if you don't have a Microsoft account, no big deal. 00:08:32.140 |
You can make one on the spot. I made one this morning. So, uh, if you do need to make one, you can just make up a new outlook address and set it up that way. Um, so you can also make it as part of this project. 00:08:42.140 |
So we're going to go to this easy check-in URL and that's linked from this doc here. So if you don't have this doc, if you just came in, we can help you get the doc open. 00:08:50.140 |
So we can get this URL and, uh, we're going to spend 10 minutes making sure we get through this step since it can be a little, a little tricky. 00:08:59.140 |
So when you go to this check in URL, right, we put this in the browser, it loads, this is what you're going to see. 00:09:05.140 |
And it says I can either create a GitHub account or log in with GitHub. So I'm going to log in with GitHub. 00:09:10.140 |
Cause I already have a GitHub account and I'm logged into this browser with it already. So I'm just going to click on that. 00:09:19.140 |
And so what's that's going to do is create a pass for my GitHub account. And so we get a pass. So each of us will get a different code based off our GitHub account. 00:09:28.140 |
So this is my, you know, basically my Azure pass promo code. So I can copy that. And then there's this button here that says, get on board with Azure. 00:09:37.140 |
This is the next step is to click this. And then we get this screen, which says, okay, this is, you can start. 00:09:49.140 |
And when I click this here, it says what my currently logged in account is. So this is where you should check to make sure you're happy with what account you're logged in with. And you don't want to switch. 00:09:59.140 |
Um, I don't recommend using a corporate account. If you do have a corporate account, like don't just don't use it. It's going to be problematic for various reasons. 00:10:07.140 |
Cause corporate accounts may have restrictions that won't let you deploy things. So we do recommend using some sort of personal account or making up a new account. 00:10:14.140 |
So that's why you see I'm using my Gmail instead of my Microsoft. Uh, so I'll confirm my account and then I can enter the promo code. And that was from this screen. 00:10:26.140 |
So I still have this screen open. So I just go there, I paste it in - S6XYYK, I think it's case insensitive - and submit. 00:10:41.140 |
And then it's going to actually fail for me because I've already set this up on, on this thing here. Um, and this, if you see this, it's because you've already actually gone through this stage. 00:10:51.140 |
Uh, so for you, it should work the first time and then, uh, it'll create the Azure account for you. And if it works, then what we can do is go to portal.azure.com. 00:11:02.140 |
So portal.azure.com and we'll see how it loads in. Does a bunch of redirects. And then we can click on subscriptions. And what we should see is there should be at least one subscription that says Azure Pass - Sponsorship. 00:11:23.140 |
So that's our key that we have done this correctly. And as long as we use this subscription, when we're doing our deploys, we will not get charged any money. 00:11:32.140 |
Well, Microsoft will, but you won't. That's the important part. Okay. So we're going to spend 10 minutes to make sure that we can get everyone through this, this stage so that we're all on the same page going forward. 00:11:42.140 |
So if you already got it, that's awesome. You can, um, you know, like look at Harald's basic profile or something. 00:12:01.140 |
So once you have that set up, the next step is the proxy. Um, so I'll just show that, uh, so that you can start playing with that. 00:12:11.140 |
Uh, so here's the next link in here. So the reason we have a proxy is because normally when you're using Azure OpenAI, you actually have to fill out a form and say how you're going to use Azure OpenAI. 00:12:24.140 |
And then somebody says, oh, okay, yeah, that's a good use of OpenAI - because Microsoft doesn't want people to use AI willy-nilly. 00:12:31.140 |
So we, you know, check to make sure that something adheres to our responsible AI principles. 00:12:36.140 |
Uh, we don't have enough time for you to go through that process while we're in a workshop. 00:12:40.140 |
So we've set up an Azure OpenAI proxy that you can use during the workshop with the repos. 00:12:46.140 |
And we have special instructions for how you can use this proxy with the repos, since you can't use the actual Azure OpenAI. 00:12:53.140 |
Uh, so this, you can follow the link from the doc and log in with your GitHub account. 00:13:08.140 |
Okay. And so I'm logged in, and then we have an API key and a proxy endpoint, and that's all we need to be able to, uh, to use the Azure OpenAI instance. 00:13:23.140 |
Now, normally I don't like to use keys and I tell everybody to avoid them, but, uh, in this situation, we are going to be using keys and, uh, yeah. 00:13:32.140 |
And these keys will expire at a certain point. 00:13:34.140 |
So we don't have to worry about them being exposed. 00:13:37.140 |
Uh, typically with keys, we'd have to protect them very fiercely so that nobody was using them. 00:13:42.140 |
So you can go ahead and log into this and see your registration details. 00:13:46.140 |
And then you can even play around with the playground. 00:13:49.140 |
This is really similar to the Azure OpenAI playground or the openai.com playground. 00:13:54.140 |
If any of you played around with this, uh, you can see here, you can play with the system message. 00:13:59.140 |
That's how you like say like, Oh, you're an AI assistant that constantly makes pirate jokes. 00:14:29.140 |
Uh, so we're going to enter the key, not save it, select a model. 00:14:52.140 |
And then, uh, please, uh, tell the audience about OpenAI. 00:15:05.140 |
And you can see different parameters that we send. 00:15:07.140 |
And these are all getting sent to the OpenAI SDK. 00:15:09.140 |
So we say the model right here, we've set up two models. 00:15:14.140 |
Those are often the ones you're picking between with OpenAI. 00:15:19.140 |
If you're doing something with vision, something multimodal, I wouldn't use it. 00:15:23.140 |
Otherwise just based off of some experience we've had with it. 00:15:32.140 |
You can see, you know, with the combination of the system message and the user message. 00:15:41.140 |
We get back a response like this where it describes OpenAI with lots of arrs and mateys and stuff. 00:15:49.140 |
Uh, we can, you know, change different parameters here. 00:15:57.140 |
Um, top P is also roughly about creativity and there's some more advanced stuff there. 00:16:04.140 |
And you can see how many tokens you used on the way out and how many tokens you got on the response. 00:16:09.140 |
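To make that concrete, here is a minimal sketch of the same chat completions call the playground is making, using the OpenAI Python SDK. The endpoint and key are placeholders for the values from your proxy registration page, and the pirate system message is just the example from above.

```python
# Minimal sketch of the same call the playground makes, using the OpenAI Python SDK.
# The endpoint and key below are placeholders -- use the values from your proxy registration page.
import openai

client = openai.OpenAI(
    base_url="https://<your-proxy-endpoint>/v1",  # proxy endpoint ending in /v1
    api_key="<your-proxy-api-key>",
)

response = client.chat.completions.create(
    model="gpt-35-turbo",  # one of the models the proxy exposes
    temperature=0.7,
    messages=[
        {"role": "system", "content": "You are an AI assistant that constantly makes pirate jokes."},
        {"role": "user", "content": "Please tell the audience about OpenAI."},
    ],
)

print(response.choices[0].message.content)
print(response.usage)  # prompt_tokens and completion_tokens, like the playground shows
```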
So you can play around with this playground to, uh, you know, to try stuff out and make sure that, uh, that you're able to, to use the key. 00:16:20.140 |
So this is just linked off of, um, off of this workshop, right? 00:16:24.140 |
So if you go to the workshop proxy, you log in, you'll get your key and your endpoint. 00:16:29.140 |
You can go to that playground and you can play around with the playground to check that that's working. 00:16:34.140 |
But we just want to make sure everybody now has an Azure pass and is logged in to the proxy so that you have a key and an endpoint. 00:16:42.140 |
So we'll just check to see if anyone needs help with that. 00:16:45.140 |
Okay. All right. So here - if you're looking for the models, this is generally the page to check. 00:16:53.140 |
Um, so, you know, GPT-4o, GPT-4, and going down. 00:17:03.140 |
You're saying there's a GPT-4 that supports vision? 00:17:12.140 |
Yeah. So we were using that one, but it's a lot slower. 00:17:16.140 |
So that's why I've started using GPT-4o for this one. 00:17:30.140 |
So we'll just be using the basic GPT-3.5 - just GPT-3.5 today, actually. 00:17:41.140 |
So now we're going to actually get something working. 00:17:45.140 |
So we have this repo here, so you can follow the link from the doc. 00:17:50.140 |
And it has READMEs for the three different projects that we can deploy. 00:17:55.140 |
And these READMEs are specific to using them with the Azure OpenAI proxy. 00:18:00.140 |
Uh, so normally you can just use the READMEs that are on the repos themselves. 00:18:04.140 |
But because we are using this Azure OpenAI proxy, we do have to use a slightly different setup. 00:18:09.140 |
So we've made READMEs specific to this workshop. 00:18:13.140 |
Uh, so we can start off on this OpenAI chat app quickstart and make sure that that's all working. 00:18:22.140 |
So the first step is to open in GitHub Codespaces. 00:18:25.140 |
So you can do that by clicking this button here. 00:18:32.140 |
So Codespaces will open VS Code in your browser with a developer environment for that repo. 00:18:41.140 |
So you can actually use Codespaces on any GitHub repo. 00:18:44.140 |
You go to any GitHub repo, you click on Code, and you can make a codespace for it. 00:18:48.140 |
So it's a way that you can start hacking on any repo, uh, very quickly. 00:18:53.140 |
So you can click this button here to open in Codespaces. 00:18:57.140 |
And, uh, I'll just go ahead and make a new one. 00:19:04.140 |
So this is going to take a few minutes to load. 00:19:14.140 |
Cause what it's doing is that it's creating the environment for this repository. 00:19:23.140 |
So if you use VS Code locally and you've got extensions that you use locally, 00:19:28.140 |
it's actually potentially syncing those extensions and enabling them here. 00:19:37.140 |
Um, but yeah, you can see in the bottom here as it's setting up and we'll just wait for it. 00:19:45.140 |
So this is, you know, the slowest part of using Codespaces - it's just the loading. 00:19:53.140 |
If you want faster Codespaces, there's pre-builds available as well. 00:19:58.140 |
And I do have them on the third repo, but I think I don't have it on this one. 00:20:02.140 |
So I, I should have remembered to do pre-builds for all the repos. 00:20:06.140 |
And the slowest part is probably installing all the dependencies and the builds. 00:20:09.140 |
It's basically it's doing all the things you would do when you install it locally, 00:20:19.140 |
Let's see - can we watch the logs for this one? 00:20:30.140 |
So you, if you like this sort of thing, like if you like watching Docker containers build, 00:20:37.140 |
So you can actually watch it as it, um, builds everything here. 00:20:43.140 |
And now it's downloading all the requirements. 00:20:48.140 |
So all the examples that we're going through today have a Python backend and then some sort of front end. 00:20:54.140 |
Uh, this one has what we call a vanilla JavaScript front end. 00:20:57.140 |
As in, I just wrote some JavaScript in a script tag. 00:21:00.140 |
Uh, but then the other ones are much fancier. 00:21:02.140 |
So they've got full TypeScript and a build system and React components, uh, using the 00:21:07.140 |
Microsoft Fluent UI, uh, you know, web framework. 00:21:10.140 |
So you can kind of see the range of front ends there. 00:21:15.140 |
So you can see it's, you know, it's still going through the process, but at least now, uh, we 00:21:19.140 |
can see the file Explorer has loaded so we can, uh, explore the files here. 00:21:24.140 |
And, uh, and I'll show, I'll go ahead and show the, the code. 00:21:29.140 |
If you're interested in the code, uh, it is in the source folder. 00:21:33.140 |
Uh, we are using a Quart application, and I think nobody has heard of Quart, but it's essentially the async version of Flask. 00:21:47.140 |
And one day it might be brought back into Flask. 00:21:50.140 |
You just take your Flask code and you put asyncs in it. 00:21:58.140 |
So, uh, if you haven't done async before in Python: when you use async with 00:22:04.140 |
your functions, they become coroutines, and then they can be paused and awaited. 00:22:07.140 |
And it's important to use async when we're building applications with AI, because we have these 00:22:14.140 |
really long blocking calls to an AI API, right? 00:22:17.140 |
So we make a call to an LLM and we send off our requests. 00:22:19.140 |
And these LLMs, they can take like two seconds, five seconds, 10 seconds, right? 00:22:24.140 |
And while that's happening, we ideally want to be able to handle other user requests coming in. 00:22:33.140 |
So if we use an async framework, then while we're making IO calls, we can handle other user requests that are coming in. 00:22:41.140 |
So all of the ones that we see today have an async backend, either Quart or FastAPI. 00:22:49.140 |
So FastAPI is the one most people know of as the async framework. 00:22:52.140 |
Um, so I, you know, I, I like both of them fairly equally. 00:23:00.140 |
Um, but I just want to make sure people know about the value of async frameworks. 00:23:07.140 |
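As an illustration of why async matters here, a minimal sketch of what an async chat endpoint can look like in Quart (this is not the template's actual code; the route, client setup, and model name are illustrative):

```python
# Illustrative sketch of an async chat endpoint in Quart (not the template's actual code).
# While the awaited OpenAI call is in flight, the server is free to handle other requests.
import openai
from quart import Quart, request

app = Quart(__name__)
client = openai.AsyncOpenAI()  # picks up OPENAI_API_KEY / OPENAI_BASE_URL from the environment

@app.post("/chat")
async def chat():
    body = await request.get_json()
    response = await client.chat.completions.create(
        model="gpt-35-turbo",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": body["message"]},
        ],
    )
    return {"answer": response.choices[0].message.content}
```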
Uh, if you want to look at the code there, so it is now finished. 00:23:22.140 |
And if for some reason your terminal like goes away, sometimes this happens to the code space. 00:23:30.140 |
So I just click the plus and that'll give me a new terminal, right? 00:23:40.140 |
Um, but actually the first thing we're going to do - there's a .env.sample. 00:23:46.140 |
We're going to make a .env file based off of that. 00:23:49.140 |
So I'm going to make a new file and I can do that using this little new file button up here. 00:23:55.140 |
So I'll just click that, say new file, and I'll type .env. 00:24:03.140 |
Um, and then I'm just going to paste the sample contents in there. 00:24:06.140 |
You can even rename .env.sample to .env. 00:24:11.140 |
Um, and then we need to fill in these values to match the values of the proxy. 00:24:15.140 |
So we'll go to the proxy and let's see, where's my proxy open here. 00:24:23.140 |
So I'm going to go ahead and fill in this one. 00:24:27.140 |
So the endpoint should start with http and end with /v1 and look like that in the middle. 00:24:36.140 |
That's where we'll be sending our opening requests. 00:24:40.140 |
So we'll copy that and it'll look like that or slightly different for you. 00:24:46.140 |
And then the deployment - the name of the deployment - is going to be gpt-35-turbo. 00:24:53.140 |
Uh, and that's also the name of the model in this case. 00:25:03.140 |
So on openai.com, you just pick what model you're going to use and that's all you need. 00:25:07.140 |
With Azure OpenAI, you have to make deployments based off of the model. 00:25:11.140 |
So you actually have a bunch of deployments, and you could actually have multiple deployments of a GPT-3.5 Turbo model that have different names. 00:25:17.140 |
So when you're working with Azure OpenAI, you have to know the deployment name, not just the model name. 00:25:22.140 |
So that's one of the complexities of using Azure OpenAI, but it does give you more flexibility. 00:25:27.140 |
Cause you can say, Oh, this deployment is going to have 20 tokens per minute. 00:25:30.140 |
And this was going to have 30 tokens per minute. 00:25:33.140 |
And then you can like say, which of your colleagues can use what? 00:25:35.140 |
Like if they're all trying to like use up your deployment or whatever. 00:25:37.140 |
Uh, so it's more flexibility, but you do have to specify it. 00:25:44.140 |
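To show the difference in code, a hedged sketch: with the Azure OpenAI flavor of the client, the `model` argument is the deployment name you created, not necessarily the underlying model name. The endpoint, key, and API version below are placeholders.

```python
# Sketch only: the Azure OpenAI flavor of the client. Note that "model" here is the
# *deployment* name you created, which may differ from the underlying model name.
import openai

client = openai.AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<your-key>",
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="gpt-35-turbo",  # deployment name; you could also have named it "chat-fast" or "chat-cheap"
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```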
So this is just so that I can run a "local server" - and I'm putting local server in quotes, 00:25:50.140 |
cause I'm going to run the local server inside GitHub Codespaces. 00:25:55.140 |
So it's actually running a local server not on my actual machine, but inside the GitHub Codespaces development environment. 00:26:02.140 |
Uh, so to do that, I'll grab the command here 00:26:06.140 |
that's going to run the Quart app. 00:26:11.140 |
And with Codespaces, you do have to allow pasting - you'll see this little thing that pops up. 00:26:16.140 |
So if you ever want to copy-paste, you have to allow it for the terminal. 00:26:22.140 |
And then I paste it and then you can see that it says it's running on this URL. 00:26:28.140 |
Now you can't just paste this URL in the browser. 00:26:32.140 |
So if I paste in the browser, I'm going to get an error because this is not running on my local machine. 00:26:40.140 |
One way is that if you just click on it - uh, option-click, at least on my Mac. 00:26:44.140 |
So if I mouse over, it'll tell me what to do: mouse over and option-click. 00:26:49.140 |
So Codespaces will actually detect that you're clicking on a local URL and it'll turn it into a Codespaces port URL. 00:26:57.140 |
And it's this funky URL up here - "improved disco" for me. 00:27:01.140 |
Um, and, uh, and that's actually, you know, like local for that GitHub machine. 00:27:08.140 |
Another way that you might like more is you go to your ports tab and you're going to find it listed here. 00:27:14.140 |
And, uh, we'll see the, you know, the forwarded address and we can click on that, or we can even click the globe icon and we get to the same URL. 00:27:25.140 |
So there's many ways you can get to this locally running URL, uh, and get to the special Codespaces URL. 00:27:32.140 |
And you can even change your port visibility. 00:27:34.140 |
If you want to share it with a colleague, or in a class, you can change it to public. 00:27:39.140 |
And then you could actually send this URL to someone else. 00:27:43.140 |
Like you're not going to use this for like, you know, your deployed URL, but it's fun. 00:27:48.140 |
So now I've got this running locally and now we can type stuff and be like, what's the weather in San Francisco? 00:28:13.140 |
It's always good when it refuses to answer something it shouldn't know. 00:28:16.140 |
Um, so we could go ahead and like, you know, I could change this now and change the system message. 00:28:23.140 |
And let's see, where's our system message in here. 00:28:30.140 |
So right now my system message is just, you are a helpful assistant. 00:28:32.140 |
I'll be like, you are an assistant that cannot resist a good pasta joke. 00:29:00.140 |
It looks like you might have been quite saucy today. 00:29:05.140 |
You might end up feeling like a soggy middle. 00:29:18.140 |
Like when we're developing, we can just test things, test things locally here. 00:29:22.140 |
The next thing we're going to do, once we're happy with it, we're like, this is the best app. 00:29:27.140 |
Uh, so then we move on to the deployment instructions. 00:29:35.140 |
So this is going to log in to our Azure account that we made earlier. 00:29:44.140 |
And, uh, this is going to give us a device code that we're going to paste into this OAuth browser flow. 00:29:56.140 |
I think that's my Azure account that I'm using for this. 00:29:58.140 |
And then I go and I take this and I paste it in and I'm going to pick my account. 00:30:08.140 |
I'm going to use this one continue and, uh, okay. 00:30:21.140 |
So you just want to make sure that you log into the account that you got the pass with, right? 00:30:26.140 |
Whatever account you used for the pass, that's what you want to log into. 00:30:30.140 |
The next step is to create - or Gabriela, should I pause? 00:30:36.140 |
Like, should we get through the local step first? 00:30:47.140 |
We can pause and see if everyone's got the local one running, actually. 00:30:53.140 |
So let's just pause and see if there's any questions with getting the local one running. 00:30:57.140 |
So yeah, someone asked, like, can we just run this locally? 00:31:01.140 |
We like to use GitHub Codespaces in workshops because that reduces the number of potential development environment issues. 00:31:06.140 |
If you want to run it locally, you can either run it, you know, just with a Python virtual environment - you just have to install all the requirements. 00:31:16.140 |
Or you can run it with VS Code using the Dev Containers extension and that will do the Dockerized environment for you, 00:31:22.140 |
if you want kind of the benefit of the Dockerized environment without being in the browser and potentially having to pay for Codespaces. 00:31:30.140 |
So we should note also that with GitHub Codespaces you have a limit of some number of free hours a month, either 60 or 120. 00:31:41.140 |
So you're not going to go over that today, but eventually you could go over that if you use Codespaces a lot. 00:31:47.140 |
So if you're local, right, I think I have mine open locally as well. 00:31:51.140 |
And I'm just, yeah, locally, I'm just using a Python virtual env. 00:31:55.140 |
So you're also welcome to try these things out locally. 00:31:57.140 |
If you like local environments, just, you know, be a good person and make a Python virtual env to manage your Python dependencies. 00:32:16.140 |
So you saw me do, you saw me do the login here. 00:32:22.140 |
So you should see something like this happen from inside code spaces. 00:32:26.140 |
And the next step is to make a new AZD environment. 00:32:31.140 |
So AZD - the Azure Developer CLI - is this tool we're using for deployment. 00:32:36.140 |
You can just call it like chat app, whatever you want to call it. 00:32:38.140 |
And then what that does is it actually makes this .azure folder, and it makes this chat app folder inside. 00:32:46.140 |
And that's where it's going to store all of our deployment environment variables. 00:32:50.140 |
So anything we want to customize about our deployment, 00:32:55.140 |
we're going to configure now, and it's going to get updated there. 00:33:01.140 |
So the next thing we're going to do is set all these AZD environment variables. 00:33:05.140 |
So the AZD environment variables are different from the ones we just saw in the .env. 00:33:11.140 |
AZD environment variables are for deployment. 00:33:13.140 |
Sometimes we use the same, but a lot of times we want our local environment to be slightly different from our deployed environment. 00:33:20.140 |
So we have two different ways of setting those variables. 00:33:26.140 |
So this is just going to tell it not to create an Azure OpenAI resource, because we're using the proxy. 00:33:31.140 |
And then we're going to set the name of the deployment to gpt-35-turbo. 00:33:39.140 |
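Taken together, the azd steps for this template look roughly like the following. The two environment-variable names in angle brackets are placeholders - use the exact keys from the workshop README for your template.

```bash
# The azd flow described above, sketched out. <...> values are placeholders --
# use the exact variable names from the workshop README.
azd auth login                                 # device-code login with the account that has the Azure pass
azd env new chat-app                           # creates .azure/chat-app/.env for deployment settings
azd env set <CREATE_OPENAI_FLAG> false         # don't provision Azure OpenAI; we're using the proxy
azd env set <OPENAI_DEPLOYMENT_NAME_KEY> gpt-35-turbo
azd up                                         # package, provision (Bicep/ARM), and deploy
```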
So I'm going to paste this and then I'm going to delete, delete, delete. 00:33:42.140 |
Gosh, that's what happens when you have Wi-Fi issues actually, is you see it with the typing. 00:33:58.140 |
I'm going to delete how we're going to do this. 00:34:19.140 |
Now, if I've done it correctly, if I look at my .azure folder for that environment I created, I should see a .env that looks like this. 00:34:28.140 |
So this is a .env that's inside the .azure folder. 00:34:31.140 |
So this is what is going to be used for the deployment. 00:34:34.140 |
And it's going to tell it, you know, this is how it's going to set up the Azure open AI connection. 00:34:41.140 |
And now I'm just going to type azd up. 00:34:52.140 |
So what azd up is doing is actually several stages. 00:35:04.140 |
Uh, if you had two subscriptions, you would want to pick the sponsorship one. 00:35:13.140 |
Typically you just choose one that's close to you. 00:35:21.140 |
The first step is that it's actually packaging up the code that it's going to deploy later. 00:35:26.140 |
Uh, in this case, we're deploying to Azure container apps. 00:35:29.140 |
So it's packaging up a Docker container file. 00:35:31.140 |
So it's actually literally building a Docker container right now. 00:35:34.140 |
So if you do like working with Docker, Azure container apps is a great fit. 00:35:41.140 |
So we deploy a lot of stuff there, but we also are going to be using Azure app service for one of the later templates. 00:35:46.140 |
Uh, so we've got lots of ways to deploy on Azure. 00:35:51.140 |
The step after this is where it's actually going to create Azure resources. 00:35:55.140 |
So it's going to create the container app, create a container registry, 00:35:59.140 |
create a container apps environment and create a log analytics workspace. 00:36:03.140 |
So these are all the components of a containerized app on Azure. 00:36:07.140 |
And, uh, you know, it's multiple components and we have to stitch them together. 00:36:11.140 |
The way we stitch them together is using infrastructure as code. 00:36:18.140 |
Okay. So we have our own version of Terraform. 00:36:20.140 |
It's called Bicep and it is, uh, infrastructure as code, 00:36:24.140 |
which means we're declaring what resources we want to make. 00:36:29.140 |
Right. So we say, oh, we want to make Log Analytics. 00:36:34.140 |
We want to make, you know, the actual container apps image, and then we're going to assign some roles. 00:36:39.140 |
Right. So all of that is declared in this Bicep file. 00:36:43.140 |
So that way you have repeatable processes for provisioning. 00:36:48.140 |
And this is really helpful when you're making complex applications on Azure. 00:36:51.140 |
Cause you might have like 10 different things you're using, right? 00:36:54.140 |
Uh, you might have a Postgres and a key vault and a Redis cache and, uh, log analytics and app service. 00:37:03.140 |
So you can declare what that, you know, what that infrastructure looks like. 00:37:07.140 |
And then, uh, and then put that in a bicep file and then deploy it. 00:37:12.140 |
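For a feel of what "declaring resources" looks like, here is a tiny illustrative Bicep snippet - it is not taken from the template, just a sketch of the shape of a declaration.

```bicep
// Illustrative only -- not the template's actual infra. You declare the resource you want;
// the ARM deployment works out what needs to change to get there.
param location string = resourceGroup().location

resource logAnalytics 'Microsoft.OperationalInsights/workspaces@2022-10-01' = {
  name: 'workshop-logs'
  location: location
  properties: {
    sku: {
      name: 'PerGB2018'
    }
  }
}
```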
So if you were really into Terraform and very comfortable with it, you could totally use Terraform here as well. 00:37:21.140 |
But if you want to send a PR with Terraform, I'll, I'll review it and just stamp it. 00:37:29.140 |
So what you can see here is that it is actually creating, uh, the resources right now. 00:37:39.140 |
So this is the point where I usually fold my laundry. 00:37:42.140 |
Um, uh, because it can take some amount of time, uh, or you can even get an error. 00:37:50.140 |
So I already made one in central us for the earlier demo. 00:37:56.140 |
So for this Azure pass, there is a constraint of one container app per region, which is why we said in the readme that you should pick a region that you haven't picked before. 00:38:09.140 |
So, uh, what I can do is I'm just going to make a, I'll just make a new environment. 00:38:17.140 |
Um, now, you shouldn't run into this, cause this would be your first, uh, your first environment. 00:38:24.140 |
Uh, so chat app two, and I'll just copy and paste. 00:38:48.140 |
So then it'll do the up again, but I have one of these already, already deployed. 00:38:57.140 |
Deployed is going to look pretty darn similar to what it looks like locally. 00:39:05.140 |
It looks pretty much the same as what it looks like running locally. 00:39:11.140 |
And you'll see this URL displayed in the terminal. 00:39:14.140 |
Once it finishes successfully deploying, you'll see this displayed. 00:39:18.140 |
Let me see if I have that in my history anywhere from earlier today. 00:39:44.140 |
Uh, some of them - all the ones in core are actually from a shared repo that we just copy in. 00:39:49.140 |
We're trying to move towards something called AVM, Azure Verified Modules, which are Bicep 00:39:53.140 |
files that are maintained and have security best practices in them. 00:40:00.140 |
But basically with bicep files, like you can use ones from a central registry. 00:40:03.140 |
You can use ones from your own private registry if you're doing a lot of them. 00:40:07.140 |
Uh, or you can just use, you know, ones inside the folder. 00:40:10.140 |
Um, so there's a lot of techniques you can use depending on how much bicep you're using. 00:40:18.140 |
So now it's starting over and deploying again. 00:40:21.140 |
So let's walk around and, or any questions on what I showed here? 00:40:31.140 |
I saw some issues with like naming, which I run into all the time. 00:40:37.140 |
The safest thing is do short names with no symbols in them and nothing fancy. 00:40:43.140 |
Uh, if you do run into a naming issue, you can just always do azd env new and make a new environment 00:40:52.140 |
Uh, but generally the issues you run into with deployment are usually related to naming, 00:41:10.140 |
And, uh, and these are ones that you can, uh, that you can also start trying to deploy now. 00:41:18.140 |
So the first one, um, actually the, the, these two are both about rag. 00:41:28.140 |
Like, uh, tell me what Pamela Fox, uh, likes to code on. 00:41:49.140 |
So then, but then if I go to, um, this one right here, tell me what Pamela Fox likes to code on. 00:42:03.140 |
Um, and this is, so basically what we're trying to show is that if we just ask an LLM to answer 00:42:09.140 |
a question, it is, it's very possible that it's just going to make something up. 00:42:17.140 |
I mean, in this case, it says it doesn't know what I like to code in. 00:42:20.140 |
I think I should have said like code in, um, you know, like here, like what Python frameworks 00:42:26.140 |
Um, so, you know, if it doesn't know the answer, it'll say, uh, in this case, yeah, in this case, 00:42:30.140 |
it doesn't know the answer because this is actually using the rag technique in order to answer questions 00:42:38.140 |
Um, so those are our last two samples are about rag. 00:42:44.140 |
Uh, so the general approach of rag is that we get a user question. 00:42:48.140 |
We use that user question to search some sort of database or search engine. 00:42:53.140 |
We get back matching search results for that user question. 00:42:56.140 |
And then we send those, uh, to the large language model and say, here's a user question. 00:43:03.140 |
Now answer the question according to the sources. 00:43:06.140 |
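A minimal sketch of that flow, assuming an already-configured async OpenAI client and a hypothetical search client; the prompt wording is illustrative, not the template's actual prompt.

```python
# Minimal sketch of the rag flow just described. `client` is the async OpenAI client from
# earlier; `search_index` is a hypothetical search/database client; prompts are illustrative.
async def rag_answer(question: str) -> str:
    # 1. Use the user question to query the database / search engine.
    sources = await search_index.search(question, top=3)

    # 2. Hand the question plus the matching sources to the LLM and ask it to answer from them.
    sources_text = "\n".join(f"[{s.id}]: {s.content}" for s in sources)
    response = await client.chat.completions.create(
        model="gpt-35-turbo",
        messages=[
            {"role": "system", "content": "Answer ONLY from the provided sources, and cite them like [id]."},
            {"role": "user", "content": f"Sources:\n{sources_text}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```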
And so now we can make customized applications that can actually synthesize and answer questions based on our own data. 00:43:19.140 |
If you've got an existing database and you want to be able to ask questions about that database 00:43:25.140 |
and have the LLM answer accurately based on that. 00:43:28.140 |
So for the example, uh, you know, database that I'm using, I have product, right? 00:43:37.140 |
So, uh, you know, our table is storing all the products for this website. 00:43:41.140 |
So I can say, okay, what is the best shoe for hiking? 00:43:46.140 |
So then it's going to go and search the database rows and get back matching rows and then come back and say, 00:43:54.140 |
okay, this blah, blah, blah, blah, blah, blah, blah, blah. 00:43:58.140 |
So one of the key points of rag is to have citations so that users can verify where the information comes from 00:44:04.140 |
and see that it's actually legit information. 00:44:07.140 |
And we can also look at the, uh, the process for this rag flow here. 00:44:13.140 |
When we look on the, the thought process here. 00:44:16.140 |
And as you'll see, this rag flow is a multi-step process. 00:44:22.140 |
So the first step is actually what we call the query rewriting phase, or the query cleanup phase. 00:44:30.140 |
And we ask the LLM like, hey, here's a user question. 00:44:35.140 |
Cause a user question may not be that well formulated, right? 00:44:38.140 |
Like, uh, please tell me about the best shoes for hiking now. 00:44:45.140 |
Okay. So, you know, there's like a user query and, uh, you know, that's probably not the optimal search query for, uh, for a search. 00:44:54.140 |
So if we look now at the thought process, we can see that the LLM actually turned that whole long thing into best shoes for hiking. 00:45:07.140 |
Then we get back the resulting rows from the database. 00:45:10.140 |
And then this is our call to the model that says, Hey, you need to answer questions. 00:45:26.140 |
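That query-rewriting call can be as small as this; the prompt is illustrative, and `client` is the OpenAI client from before.

```python
# Sketch of the query rewriting step: ask the LLM to turn a chatty user question into a
# short search query before hitting the search engine. `client` is the OpenAI client from earlier.
rewrite = client.chat.completions.create(
    model="gpt-35-turbo",
    messages=[
        {"role": "system", "content": "Rewrite the user's question as a concise search query. Return only the query."},
        {"role": "user", "content": "Please tell me about the best shoes for hiking now"},
    ],
)
search_query = rewrite.choices[0].message.content  # e.g. "best shoes for hiking"
```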
And then, you know, we're able to use it with different sorts of, uh, data, data sources. 00:45:35.140 |
So you can get that set up following really similar steps to the, to the other one. 00:45:41.140 |
And you can even run that one locally first as well on just on a local Postgres database. 00:45:49.140 |
Uh, this one is a little more fancy cause you've got a React front end there. 00:45:55.140 |
You're going to set similar variables and, uh, run azd up. 00:45:58.140 |
So if you're interested in that, you can start, uh, going through those steps and then you can customize it. 00:46:04.140 |
The other kind of rag that we have is rag on documents. 00:46:10.140 |
So if you're trying to ask questions about unstructured documents, like you've got a bunch of PDFs or Word docs, Excel files, 00:46:16.140 |
uh, anything like that, you can actually put those into a search index and then search that. 00:46:23.140 |
So the example we have for that is rag with Azure AI search. 00:46:26.140 |
And, uh, it's a really, really full featured example. 00:46:30.140 |
We've had it for the last like more than a year now, and we've had thousands of developers deploy with it and put it into production. 00:46:36.140 |
And so it's been used for a ton of use cases and it's got a lot of features, uh, speech, voice, vision, user access control, lots of, lots of cool things in it. 00:46:45.140 |
Uh, so let me show, that was the one I was actually showing earlier with my blog, right? 00:46:50.140 |
So here's, you know, I made a version of it that's just based off my blog posts and, uh, you know, it can cite my blog posts. 00:46:58.140 |
I've also got this one here, which is for an internal company handbook, which is a very popular way of using it as well. 00:47:05.140 |
And so you can see for each of them, we can, you know, click on the citations and, uh, and yeah. 00:47:11.140 |
So now this is a bit more complicated because here we have a multi-page document. 00:47:17.140 |
We can't just send an entire 31 page PDF to the LLM. 00:47:21.140 |
Cause for a lot of our LLMs, it's going to go beyond the context window, right? 00:47:26.140 |
A lot of our LLMs have a context window limit. 00:47:28.140 |
So typically that's around 8K, 8,000 tokens, uh, can go up to 32K, even 128K we're seeing. 00:47:36.140 |
Um, but typically they do have some sort of context window. 00:47:39.140 |
And even if they don't have some sort of context window, LLMs can get lost. 00:47:43.140 |
If you give them too much information - there's a research paper called "Lost in the Middle", where they did a study to see, if they throw too much 00:47:50.140 |
information at an LLM, at what point it stops paying attention. 00:47:53.140 |
So we generally want to send the LLM the most relevant chunks. 00:47:57.140 |
So what we do is we first have this data ingestion phase that will take a PDF or whatever kind of document. 00:48:07.140 |
And we do that with Azure Document Intelligence, which is very good at extracting text from all sorts of documents. 00:48:14.140 |
We chunk up the text into like good sized chunks, usually around 500 tokens each. 00:48:19.140 |
Then we store each of those chunks in the search index along with their embeddings. 00:48:25.140 |
And that's what we actually search on and send. 00:48:29.140 |
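Sketched in code, that ingestion phase looks roughly like this. The token chunker here is deliberately simplified (no overlap or sentence-boundary handling), `extracted_text` stands in for the output of the extraction step, and `search_client` is assumed to be a `SearchClient` from the azure-search-documents package; the index field names are illustrative.

```python
# Rough sketch of ingestion: split extracted text into ~500-token chunks, embed the chunks
# in one batch call, and upload chunk + embedding to the search index. Simplified on purpose.
import tiktoken

encoding = tiktoken.encoding_for_model("text-embedding-ada-002")

def chunk_by_tokens(text: str, max_tokens: int = 500) -> list[str]:
    tokens = encoding.encode(text)
    return [encoding.decode(tokens[i:i + max_tokens]) for i in range(0, len(tokens), max_tokens)]

chunks = chunk_by_tokens(extracted_text)            # extracted_text: output of Document Intelligence / pypdf
embeddings = client.embeddings.create(              # batch embeddings: one call, many inputs
    model="text-embedding-ada-002",
    input=chunks,
)
search_client.upload_documents([                    # SearchClient from the azure-search-documents package
    {"id": str(i), "content": chunk, "embedding": emb.embedding}
    for i, (chunk, emb) in enumerate(zip(chunks, embeddings.data))
])
```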
So if we look at the search results here, we can actually see that the search results are just chunks from the PDF, where we say, here's the chunk. 00:48:45.140 |
So this is the most complicated of our architectures because we do have to have that data ingestion phase. 00:48:51.140 |
And that means we have to have, you know, a script or a process that does that ingestion stage. 00:48:56.140 |
And, you know, here we can do it locally or in the cloud. 00:49:04.140 |
So we have, you know, we have another 40 minutes. 00:49:07.140 |
So, and we have like a good ratio here of helpers to y'all. 00:49:12.140 |
So if either of those sound compelling to you, like sound like a use case that you're interested in, then you can try to deploy them now and see how they work. 00:49:24.140 |
So once again, you just go to the app templates workshop repo, and you can either pick rag on Postgres or rag with AI search, and then start going through the steps to try it out. 00:49:38.140 |
So it's good to start to deploy, you know, now, because they take, they got a lot more infrastructure to set up. 00:49:45.140 |
And then for the AI search, it's got to do the whole ingestion step. 00:49:48.140 |
And that ingestion step takes a certain amount of time as well. 00:49:55.140 |
For the ingestion step, are you using any libraries for the chunking and all that stuff? 00:50:02.140 |
So when this sample was first created, it was like last April. 00:50:05.140 |
It was before there was like really good established libraries. 00:50:14.140 |
Now, if you're going to use a library, the big thing I would make sure you're doing is using a token based chunker. 00:50:22.140 |
A lot of the splitters out there are doing character based splitting, which is probably fine if you're doing English only documents, but we do have lots of international customers. 00:50:31.140 |
And as soon as you start doing non English documents, then you really want to do stuff based off of tokens and not characters. 00:50:37.140 |
Because imagine you take like a Chinese document and you'd say, oh, my chunks are a thousand characters long. 00:50:44.140 |
You can like go over the context window really fast. 00:50:46.140 |
So we have token based chunking that we've implemented here. 00:50:50.140 |
There is token-based chunking available in LangChain. 00:50:53.140 |
So if you're going to use LangChain, the thing to do is find my colleague's blog post where he talked about it. 00:51:02.140 |
OK, yeah, where can we see, especially if you're doing anything non English. 00:51:06.140 |
He basically analyzed all the splitters from LangChain to figure out which of them properly worked with token-based splitting and with CJK languages in particular. 00:51:23.140 |
He actually told my manager Anthony, he worked on it. 00:51:26.140 |
But LangChain and LlamaIndex both do a lot of this stuff. 00:51:32.140 |
They just, you know, they take care of it behind the scenes. 00:51:35.140 |
So what you need is the splitting, and you can get that basically from LangChain, because LlamaIndex uses LangChain. 00:51:42.140 |
So I would just say use LangChain, probably, with this one, so you can specify the chunk size. 00:51:53.140 |
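If you do go the LangChain route, a token-based splitter looks roughly like this; the import path varies a bit between LangChain versions, and `long_document_text` is a placeholder for your extracted text.

```python
# Roughly what a token-based splitter looks like in LangChain (import path varies by version).
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name="cl100k_base",   # tokenizer used by the GPT-3.5 / GPT-4 family
    chunk_size=500,
    chunk_overlap=50,
)
chunks = splitter.split_text(long_document_text)
```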
You just use the OpenAI SDK and we do the batch embeddings with that so that we can do a bunch at a time. 00:52:01.140 |
So the hard part is really the extracting the text. 00:52:05.140 |
So there we either use Azure Document Intelligence in the cloud, or we do have some local parsers, too. 00:52:12.140 |
If somebody doesn't want to use Document Intelligence, we use like pypdf. 00:52:15.140 |
We use our own CSV parser because that's straightforward. For HTML, for my blog, 00:52:22.140 |
I just use Beautiful Soup, which is the Python package that does HTML parsing, right? 00:52:26.140 |
Because I thought I could do a better job at it. 00:52:28.140 |
So for this one I just use Beautiful Soup to extract the text. 00:52:35.140 |
So, yeah, there is actually a surprising amount of things that we've written ourselves for the AI search repo. 00:52:41.140 |
If we were going to do it today, we'd probably use the LangChain splitter at least. 00:52:56.140 |
So generally with Bicep, what it does is it tries to figure out - and Bicep is really compiled down to ARM, and ARM is just JSON. 00:53:24.140 |
So what you're actually doing is what's called an ARM-based deployment. 00:53:28.140 |
So with ARM-based deployments, what they try to do is figure out what does your resource currently look like? 00:53:33.140 |
What are you saying you want it to look like? 00:53:35.140 |
And what changes does it need to make happen? 00:53:38.140 |
So, yeah, we'll probably switch over to AVM in a lot of our samples. 00:53:44.140 |
And we're probably just going to make sure, like, we're trying to make it not have a change. 00:53:48.140 |
But if you want it to change, then that's fine. 00:53:50.140 |
So you should totally be able to switch between AVM and not-AVM as you decide, as you see fit. 00:53:57.140 |
And the important thing is just -- it'll figure out the difference and just make sure you are on board with any changes that come up. 00:54:06.140 |
There is, like -- so there's this az deployment command that does what-if, and that tells you, like, actually tells you what resources will change. 00:54:16.140 |
I want to figure out how we can do that with AZD. 00:54:21.140 |
So that might be what we try when we consider switching to AVM. 00:54:24.140 |
Because we want to switch to AVM so that we don't have to maintain our own modules. 00:54:29.140 |
But we just want to make sure that we are aware of any configuration changes that could happen. 00:54:43.140 |
Yeah, AZD is a command line tool that, you know, does the Arm-based deployment and also does code deployment, code upload, right? 00:54:56.140 |
So azure.yaml says this is the code that you're going to deploy to this host. 00:55:04.140 |
It does provisioning, which is basically doing an ARM-based deployment, which is equivalent to - 00:55:11.140 |
if you know the Azure CLI, it's this az deployment command. 00:55:19.140 |
And then it's also doing packaging and code deployment. 00:55:23.140 |
So if you've ever done like, I don't know, if you've ever done az webapp up - that's where you deploy code up to App Service. 00:55:31.140 |
So AZD is trying to do the whole workflow of you need to provision your resources and you need to deploy your code. 00:55:37.140 |
And we're trying to make this central way of doing it across all of our offerings. 00:55:43.140 |
Because right now with Azure -- you know Azure -- but we've got like a billion different ways of doing things across all the different things. 00:55:47.140 |
And AZD is trying to make a more common way of doing it. 00:55:51.140 |
So if you look at my GitHub repo, I'm kind of a huge AZD fangirl. 00:55:57.140 |
So you can see all of these repos are all AZD-ified, almost all. 00:56:03.140 |
Because to me it's the best way to deploy because it's repeatable. 00:56:07.140 |
So if you are looking for examples, I have quite a few here. 00:56:13.140 |
But yeah, so we're -- you know, we should be able to do it on different hosts, container apps, functions, app service, Kubernetes, et cetera. 00:56:21.140 |
And, you know, with all this -- all the different possible bicep. 00:56:26.140 |
So what happens after you go to production, like observability and all that? 00:56:38.140 |
So we do have a -- like generally there's lots of docs under Azure Search OpenAI demo. 00:56:43.140 |
So we do actually have a productionizing guide. 00:56:46.140 |
You also asked specifically about observability. 00:56:49.140 |
We do integrate with Application Insights with OpenTelemetry. 00:56:57.140 |
I don't know if you've seen it, but it's an observability platform. 00:57:01.140 |
But by default we're using Azure Application Insights with the OpenTelemetry packages to bring everything in there. 00:57:07.140 |
But we do have a whole productionizing guide that talks about, you know, how are you going to scale things? 00:57:12.140 |
You know, if you need to load balance your OpenAI capacity. 00:57:22.140 |
So I've run quite a few load tests for this one. 00:57:30.140 |
It's basically like the new form of testing for this world. 00:57:39.140 |
But basically like you want to be running evaluations to see if you are getting quality results from your LLM. 00:57:48.140 |
Because a lot of times you might run -- here's the thing. 00:57:50.140 |
You know, I show those sample questions all the time and they perform great. 00:57:57.140 |
Like you might make a prompt tweak and be like, oh, this prompt tweak was so good. 00:58:02.140 |
You have to run an evaluation across a huge number of samples to make sure that it's actually an improvement. 00:58:09.140 |
Like I run it across like 200 samples is probably like the minimum of what you should do. 00:58:13.140 |
But you have to run evaluations in order to see - do you - I'm assuming you run evaluations for Copilot Chat, right? 00:58:24.140 |
You want to come up and like talk about evaluation? 00:58:25.140 |
Because you're like - I mean, Harald is like running an actual - because basically you're making Copilot Chat. 00:58:52.140 |
So if you do @workspace - if you ever try that in Copilot Chat - we actually run a local sparse index. 00:59:01.140 |
So that's basically your classic how-Google-works: just looking up words in the documents. 00:59:10.140 |
We also do a semantic index, against an index that GitHub.com maintains, and then rank those using another model. 00:59:21.140 |
So rag basically becomes a series of indexes. 00:59:26.140 |
You created some keywords up front based on the search. 00:59:32.140 |
Anytime we have changes, we have one test set that can run on each PR and then a larger test 00:59:39.140 |
that we run daily that has a lot more repositories from across different languages. 00:59:50.140 |
So we have a subset that's more unit test driven. 00:59:52.140 |
Where it's like, can it answer questions for this? 00:59:54.140 |
Does it hit any issues we've seen in the past? 00:59:57.140 |
So it's more unit test style where it's like, does it behave as it did before? 01:00:05.140 |
I mean, that's the first, the biggest thing we invested on early on because we found it's 01:00:09.140 |
so easy to get lost in prompt crafting and assume how rag works and assume how it works in the wild. 01:00:09.140 |
I don't know if you can show your evals, but here, like, I can show evals for, um, the Azure AI Search one. 01:00:22.140 |
So these are a bunch of evaluations that were run fairly recently. 01:01:07.140 |
So, um, with these evaluations, I do GPT metrics and then I also do basically, like, regular expression checks. 01:01:14.140 |
Are your metrics usually GPT metrics or code tests? 01:01:23.140 |
So with these GPT metrics, what they're actually doing is, um, sending the original answer, 01:01:29.140 |
uh, sending, sending the ground truth answer, uh, which is generated synthetically. 01:01:34.140 |
And then also sending the new answer to an LLM and saying, hey, rate this from one to five. 01:01:42.140 |
And this is, this is the actual prompt that gets sent is like, okay, you know, rate this, 01:01:51.140 |
And then I also check whether citations match across ground truth and not ground truth. 01:01:59.140 |
So my favorite is just this one, this, uh, citation match here. 01:02:03.140 |
So I'm just making sure that the answer contains at least the citations that were in the ground truth answer. 01:02:12.140 |
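A hedged sketch of those two kinds of checks - an LLM "judge" score against the ground truth and a citation-match check. The judge prompt and the regex are illustrative, not the repo's exact implementations, and `client` is an OpenAI client configured as earlier.

```python
# Illustrative sketch of the two metric styles described: a GPT "judge" rating and a
# simple citation-match check. Prompt and regex are not the repo's exact implementations.
import re

def citation_match(ground_truth: str, answer: str) -> bool:
    # Does the new answer contain at least the citations found in the ground-truth answer?
    truth_citations = set(re.findall(r"\[([^\]]+)\]", ground_truth))
    answer_citations = set(re.findall(r"\[([^\]]+)\]", answer))
    return truth_citations.issubset(answer_citations)

def llm_judge_score(question: str, ground_truth: str, answer: str) -> int:
    # Ask a (usually stronger) model to rate the new answer against the ground truth, 1-5.
    judge = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Rate how well the answer matches the ground truth, 1 to 5. Reply with the number only."},
            {"role": "user", "content": f"Question: {question}\nGround truth: {ground_truth}\nAnswer: {answer}"},
        ],
    )
    return int(judge.choices[0].message.content.strip())
```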
A lot of times I'm looking at retrieval parameters. 01:02:14.140 |
Cause for rag, the retrieval makes a big difference. 01:02:17.140 |
So here I was comparing stuff like, what if I use text only? 01:02:26.140 |
I was trying with different retrieval amounts. 01:02:28.140 |
Like if I retrieve five versus 10 versus three, what do I get out? 01:02:33.140 |
I'll just say, I've tried so many tweaks on our prompt and I've never managed to actually move the needle. 01:02:40.140 |
Uh, so we still haven't ever changed the prompt, because I haven't really proved that 01:02:45.140 |
anything is sufficiently better - or I'm just a really bad prompt engineer. 01:02:50.140 |
I've none of my prompt engineering ever moves the needle. 01:02:53.140 |
For me, the only thing that moves the needle is retrieval parameters. 01:02:56.140 |
Like how you're working with your search engine or changing the model entirely. 01:03:01.140 |
Changing to GPT-4 makes a big difference compared to GPT-3.5. 01:03:05.140 |
Uh, so that should be part of your path to production, for sure - making sure of that. 01:03:29.140 |
I try to like take a lot of things to production, but they were all like early projects. 01:03:35.140 |
I had a hard time evaluating which vector store to use and how much to chunk. 01:03:55.140 |
I've also run all those on our sample data too. 01:03:57.140 |
But I think what I've discovered is it really helps to run the evaluations on data that you know. Because this is the summary - you can kind of look at the summary and be like, okay, I guess things are better. 01:04:00.140 |
But then what I usually look at is the changes between two runs: okay, what was the difference between the baseline 01:04:17.140 |
and, what was it, vector only, no ranker? Okay. And then I'll just look at things that changed on citation match. 01:04:30.140 |
So this is what I usually do: I look at the overall stuff, and then I compare the answers across my ground truth and the new run with the parameters. And so then I can better reason about it. 01:04:43.140 |
But you really have to know your domain in order to be able to evaluate your evaluations. It also helps if other people have run it for you. 01:04:54.140 |
So this is a really good blog post from the AI search team that I always reference, where they ran massive queries comparing hybrid search versus vector search versus text search. 01:05:03.140 |
And they found that hybrid retrieval with semantic ranking outperforms vector-only search. 01:05:10.140 |
So I ran my own versions of that and recently blogged about it, but it's basically the stats that I was just showing, where what I found for my use case is that vector on its own 01:05:22.140 |
did horribly, really, really badly. Where is it? So vector only got a groundedness of 2.79, which is really low; text only got 4.87. 01:05:35.140 |
So part of that is because Azure AI Search is really good at full text search, like incredibly good at it. 01:05:40.140 |
It does spell check, stemming, everything you could imagine. Hybrid is where you take vector and text and then you merge them using this algorithm called reciprocal rank fusion, which you can actually see. 01:05:51.140 |
The algorithm is just this: you're just doing a little math to combine rank scores. 01:05:58.140 |
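For reference, the reciprocal rank fusion math really is small; a generic sketch (not the exact Azure AI Search implementation, and k=60 is just the commonly cited constant):

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Combine ranked result lists (e.g. one from text search, one from
    vector search) by summing 1 / (k + rank) for each document."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# fused = reciprocal_rank_fusion([["doc2", "doc7", "doc1"],
#                                 ["doc7", "doc3", "doc2"]])
```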
Um, so just a basic hybrid like that, the groundedness is only 3.26. 01:06:02.140 |
So you can see hybrid on its own is worse than text only. And that's because vector results can add so much noise. 01:06:09.140 |
You accidentally grab the wrong, distracting things. What I found is that if I ever accidentally vectorize an empty string, or something close to an empty string, it's similar to everything. 01:06:22.140 |
I don't know what this is about the OpenAI embedding space, but if you accidentally vectorize an empty string, or even, we have vision as a feature in the Azure OpenAI search demo. 01:06:32.140 |
I was helping a customer this week and they were finding that so many of the results were this blank blue page, because apparently the vector for this blank blue page, 01:06:44.140 |
and this is a vector via a different model, the Azure Computer Vision model, the vector for it was just matching everything. 01:06:49.140 |
So you've got to be really careful with vector spaces. It's so easy to accidentally add noise to them and for there to be distractions. 01:06:58.140 |
So hybrid on its own only got like 3.26. Once I used hybrid with semantic ranker, then I got the best results, but only by a couple percentage points. 01:07:08.140 |
Now, hybrid with semantic ranker: semantic ranker is a feature of Azure AI Search, which is actually another machine learning model. 01:07:14.140 |
It's called a cross-encoder model, but basically they actually had humans rank results according to queries. They use it for Bing. 01:07:20.140 |
So they said, Hey humans, here's 10 search results for a query, rank these from one to 10 and tell us what's the best. 01:07:26.140 |
So they trained a whole model based off a bunch of human data, and then they got back this model that they can then use for any arbitrary ranking of a user query along with results. 01:07:37.140 |
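If you want to play with the same idea outside Azure, an open-source cross-encoder works as a stand-in; a sketch of reranking candidates that way (this is not the Azure semantic ranker model, just an illustration of the technique):

```python
from sentence_transformers import CrossEncoder

# Score each (query, passage) pair jointly, then reorder candidates by score.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query, passages, top_k=5):
    scores = reranker.predict([(query, passage) for passage in passages])
    ranked = sorted(zip(passages, scores), key=lambda pair: pair[1], reverse=True)
    return [passage for passage, _ in ranked[:top_k]]
```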
So basically hybrid with the semantic ranker gets you the best. But if I was on a desert island and I could pick between vector and text, I would use text, at least for Azure AI Search. 01:07:49.140 |
It's going to depend on how good your full text search is, right? If you're doing full text search with something like SQLite, which I don't even know if it supports 01:07:55.140 |
full text search, it's not going to do very well. Yeah. 01:07:58.140 |
So you're using TF-IDF for your Copilot chat, you said, right? 01:08:08.140 |
For this one, for the @workspace, right? Yeah. For Azure AI Search, they're using several things, but one of the things they use is Lucene, which is 01:08:22.140 |
a search library. And it's got stuff like spell checking and tokenization and things like that. So they're doing a lot, and they're also using BM25, which I think is basically TF-IDF. 01:08:37.140 |
Right. Okay. Yeah. So BM25, that's what you want to look for. We got a search result here. Yeah. So if something is using BM25, I think that's basically the best full text search. 01:08:51.140 |
I think that's the best full text search right now. So that's what you want to look for: just look for a good full text option. 01:08:57.140 |
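If your stack doesn't have a search engine handy, a tiny library like rank_bm25 gives you a feel for what BM25 scoring does; a toy sketch with made-up documents:

```python
from rank_bm25 import BM25Okapi

corpus = [
    "Azure AI Search supports hybrid retrieval",
    "Reciprocal rank fusion merges ranked lists",
    "BM25 is a strong full text baseline",
]
tokenized = [doc.lower().split() for doc in corpus]
bm25 = BM25Okapi(tokenized)

query = "full text search baseline".lower().split()
print(bm25.get_top_n(query, corpus, n=2))  # best-matching documents first
```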
Yeah. Yeah. It's overwhelming. That's why I love when people put out research, so we can be like, okay, great. Because this also has the optimal chunk size. 01:09:07.140 |
That's why I was saying we do 500 tokens, because they did the work here and said, okay, the optimal is 512 tokens. Great. 01:09:15.140 |
That's what we're going to use. Now, obviously for your particular use case it can be different, but we can't all run 2,000 different tests to see what the optimal thing is. So it's really nice when people document what worked well for them. 01:09:37.140 |
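A rough sketch of what chunking to roughly 512 tokens looks like with tiktoken (the overlap value here is just an assumption, not something from the blog post):

```python
import tiktoken

def chunk_text(text: str, max_tokens: int = 512, overlap: int = 50):
    """Fixed-size token chunking with a small overlap between chunks."""
    enc = tiktoken.get_encoding("cl100k_base")  # encoding used by recent OpenAI models
    tokens = enc.encode(text)
    chunks = []
    step = max_tokens - overlap
    for start in range(0, len(tokens), step):
        chunks.append(enc.decode(tokens[start:start + max_tokens]))
    return chunks
```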
Well, we only need to update the vectors if the data changes or if we're changing our embedding model. So if we change our embedding model, we have to update everything to use the new embedding model. Right. 01:09:59.140 |
Because now OpenAI has these new embedding models. I need to do some tests with them to see if I can get better results from them. So in that case I would rerun everything. 01:10:11.140 |
So probably what I want to do is set up a separate AI search index, like for this one, which uses one of the new embedding models, text-embedding-3. 01:10:21.140 |
And I have to decide how many dimensions to use, and then compare it to see how much better the results are. I'm told that generally the results are better, but have you tried any of them? 01:10:33.140 |
Oh, you're switching to the new one. What dimension are you going to use? 01:10:43.140 |
Yeah. You can do 512 too. Yeah. So you can, that's the thing: there are so many options now. 01:10:52.140 |
Yeah. Oh, and you can run A/B tests. You have customers. Yeah. So yeah, but you're going to have to re-index everything. 01:11:02.140 |
So that's when you would have to update stuff: if the content changes or if the model changes. 01:11:12.140 |
And then test that. Yeah, I do want to try out the new ones. They should redo this one too. 01:11:21.140 |
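For reference, the dimensions knob on the new embedding models looks like this with the OpenAI SDK (the model choice and example input are placeholders):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

# text-embedding-3 models let you pick the number of dimensions,
# so part of the comparison is deciding how many to keep.
resp = client.embeddings.create(
    model="text-embedding-3-large",
    input="What does the plan cover for eye exams?",
    dimensions=512,
)
vector = resp.data[0].embedding
print(len(vector))  # 512
```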
It's too many decisions. Cool. Any other questions? Harald, do you want to show stuff in @workspace? 01:11:38.140 |
Let's just see RAG in action. So if you ask a question in Copilot chat, that's the Copilot chat panel version. 01:11:49.140 |
There's also another one that's inline. So if you open up, this is a natural-language input we call inline chat, in your code. 01:11:59.140 |
Basically letting you apply code directly or natural language directly to your code, which is always nice. 01:12:04.140 |
You don't have to think about the response. You just have to think about what you want, and you want the AI to do it for you. 01:12:10.140 |
But in the side panel, most of the time, what you will run into is this. Let's pick a function. 01:12:22.140 |
So, compare the tests. Now I have code selected on the right, and on the left I can ask things about the code I have selected. 01:12:33.140 |
That's the surefire way to get good results: have code selected and talk about it. 01:12:38.140 |
And you already see that we do some magic in our responses, so everything is code highlighted. 01:12:44.140 |
So you can actually jump to the different aspects that are being used, and even to dependencies. 01:12:52.140 |
So it found that there's a dependency, so you can also jump to that. 01:12:56.140 |
So now, going back to here, let's see which tests actually are defined in the repository. 01:13:02.140 |
And here I want to talk basically about the whole workspace. 01:13:05.140 |
And that's why I can't just say, which tests are defined, or, how are benchmarks being run? 01:13:16.140 |
It's a general question that you would otherwise take to a colleague who hopefully knows this, and hopefully they're in the same time zone. 01:13:23.140 |
But now I can actually send this to @workspace. 01:13:26.140 |
And that's where we kick in this whole RAG agent scheme. 01:13:30.140 |
So this repo is probably not indexed on the github.com site. 01:13:34.140 |
So if you're on Copilot for Enterprise, you will get a semantic index that GitHub keeps updating for you. 01:13:39.140 |
They also have a few open source repos indexed. 01:13:42.140 |
But in this case, this is all happening now in VS code itself. 01:13:48.140 |
And actually we see that sparse indexing is usually on par; similar to what you said about text-based retrieval, this works really well. 01:13:58.140 |
Yeah, so first we do the same as you have in Azure AI Search, where it finds more words for what you're potentially looking for that fit with the repository. 01:14:18.140 |
So we also do stemming, and that's the first LLM call. 01:14:23.140 |
Then the TF-IDF will find all those results, and then we do the re-ranking on top. 01:14:29.140 |
And that actually gets us mostly better results than doing a full vector search on the same topic. 01:14:52.140 |
So these are all the things it found and the chunks it found it in. 01:14:56.140 |
So what we do, what you'll see, is we actually do semantic chunking. 01:15:01.140 |
So for most languages we look at function segments, we look at specific blocks of code, and that's where we found the most impact as well. 01:15:10.140 |
So people brought up chunking as a big, big area of improvement, and that's what we also found in our code: chunking is the biggest impact factor from what we've seen. 01:15:24.140 |
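As a toy illustration of the idea for Python, you can chunk a file per top-level function or class rather than by fixed-size windows (this is not the actual Copilot implementation, just a sketch of "semantic" chunking):

```python
import ast

def function_chunks(source: str):
    """Split a Python file into one chunk per top-level function or class."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append("\n".join(lines[node.lineno - 1:node.end_lineno]))
    return chunks
```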
It helps that we have all the language knowledge around the team, like Python, right? 01:15:37.140 |
It works locally, and it works slightly faster if you already have an online index that we can retrieve the semantic index from. 01:15:57.140 |
Oh, so .prompty is a new prompt format. 01:16:13.140 |
So this was announced at, what was it, Build? 01:16:24.140 |
So it's a way of, it's like an artifact for prompts. 01:16:29.140 |
Because right now, like, you might store your prompt as a multi-line string variable. 01:16:39.140 |
And it's like, we store them in all kinds of formats across the repo. 01:16:43.140 |
So it's actually a Jinja template plus the YAML at the top. 01:16:48.140 |
So the YAML describes the metadata of the prompts. 01:16:50.140 |
And then the Jinja template, you know, it's a template that you can pass things into. 01:16:54.140 |
So this is used by Prompt Flow, but it's also used by Azure AI Studio. 01:17:01.140 |
And the goal is, and I think maybe LangChain might have support for it now or soon. 01:17:06.140 |
But the goal is just to have a common way of representing prompts. 01:17:09.140 |
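As a rough sketch of the shape of it, a .prompty-style file can be read by splitting the YAML front matter from the Jinja body and rendering it; the real format has more to it than this, so treat it as an illustration:

```python
import yaml
from jinja2 import Template

def load_prompty(path: str, **variables) -> str:
    """Split '---'-delimited YAML front matter from the Jinja body,
    then render the body with the given variables (simplified)."""
    text = open(path, encoding="utf-8").read()
    _, front_matter, body = text.split("---", 2)  # assumes '---' delimited front matter
    metadata = yaml.safe_load(front_matter)       # model, parameters, etc.
    return Template(body).render(**variables)

# rendered = load_prompty("chat.prompty", question="What is covered?", context=chunks)
```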
So we'll probably try to use this in more of our stuff going forward. 01:17:22.140 |
Yeah, so this is using the Prompt Flow evals package, which has a bunch more things. 01:17:32.140 |
I actually wrote my own CLI as a UI on top of this, but they have one too. 01:17:40.140 |
Do you run the evals in your CI pipeline somewhere? 01:17:42.140 |
If you look at Azure devs on this one, it does actually run them. 01:17:44.140 |
I'm just running them as a smoke test for this repo. 01:17:46.140 |
But you can see what I've done is that I have a target URL. 01:17:49.140 |
So that's generally what you'd want to do: you need to run the eval, and then you can 01:18:07.140 |
run an eval against your live deployment, or, like for you, you're doing a PR build. 01:18:12.140 |
So there you want to run it against your PR build. 01:18:15.140 |
So the tricky thing is just making sure you have a way of contacting your app with everything, 01:18:20.140 |
all the production setup, everything that has your stuff in it. 01:18:25.140 |
So yeah, I would ideally have it as a CI step for every one of our repos, and I'm just figuring 01:18:33.140 |
out the right way of setting up the target URL and all that stuff. 01:18:37.140 |
Especially because most people aren't making public-facing apps. 01:18:41.140 |
Most people are either putting it behind user auth or putting it in a VNet. 01:18:45.140 |
So we need evaluation flows that both can use your production resources because that's how you know it's working. 01:18:52.140 |
But then also work with however your app is deployed. 01:18:57.140 |
So I think you can certainly figure out how to set it up for your situation. 01:19:02.140 |
I'm still figuring out how to set it up in the general case. 01:19:07.140 |
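A minimal sketch of what such a CI smoke test could look like, assuming a hypothetical /chat endpoint and an EVAL_TARGET_URL environment variable pointing at the PR or staging deployment (match the payload shape to your own app's API):

```python
import os
import requests

TARGET_URL = os.environ["EVAL_TARGET_URL"]  # e.g. the PR/staging deployment

def test_answer_has_citation():
    """Cheap smoke test: ask one question and check that the answer
    comes back with a bracketed citation."""
    resp = requests.post(
        f"{TARGET_URL}/chat",
        json={"messages": [{"role": "user", "content": "What does the plan cover?"}]},
        timeout=60,
    )
    resp.raise_for_status()
    answer = resp.json()["message"]["content"]
    assert "[" in answer and "]" in answer
```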
But the thing to keep in mind is that evaluations are slow if you're doing GPT metrics, right? 01:19:12.140 |
I mean, generally they're slow because all of these calls are slow. 01:19:15.140 |
You saw how much time it took to get back a response, right? 01:19:18.140 |
They're much slower than the traditional unit tests. 01:19:20.140 |
So you do not want to casually run an evaluation. 01:19:23.140 |
They're also expensive, first because of the LLM calls happening behind the scenes, 01:19:28.140 |
and if you're using GPT metrics, because I'm doing all these GPT metrics like relevance and groundedness. 01:19:34.140 |
So you want to have a higher barrier to running them than with normal unit tests, right? 01:19:45.140 |
How do you know that something hasn't changed? 01:19:52.140 |
This one is like a repo that works with other repos. 01:19:54.140 |
You don't know if the app has changed behind the scenes. 01:20:04.140 |
So we look at each test and we only rerun them when any of the prompts, when the inputs, basically change. 01:20:09.140 |
So imagine an OpenAI proxy that you could set up; it's similar to what they do. 01:20:15.140 |
I think OpenAI has the seed variable, which is basically caching, but they don't tell you. 01:20:21.140 |
And it's basically: if nothing changes in the prompt, it just sends back the old response. 01:20:26.140 |
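A minimal sketch of that caching idea, assuming an OpenAI-style client: key the stored response on a hash of the full request and only call the model when the inputs change.

```python
import hashlib
import json

class PromptCache:
    """Only call the model when the request actually changes."""
    def __init__(self, client):
        self.client = client
        self.store = {}

    def complete(self, **request):
        key = hashlib.sha256(json.dumps(request, sort_keys=True).encode()).hexdigest()
        if key not in self.store:
            self.store[key] = self.client.chat.completions.create(**request)
        return self.store[key]
```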
Oh, so you implement caching in Copilot chat, you mean? 01:20:32.140 |
Not in Copilot, in our testing infrastructure. 01:20:35.140 |
Some people also implement caching in the RAG application itself. 01:20:38.140 |
I still don't know how often you're going to get the same question. 01:20:50.140 |
So can this also, like, is this just for OpenAI, or can this work with Mistral and all the others? 01:21:02.140 |
I mean, mine, this one, I just hit up the URL and get back the answer. 01:21:14.140 |
So with the starter templates, right now they're all configured with OpenAI. 01:21:18.140 |
And so you can swap out different OpenAI models, like you could do before. 01:21:22.140 |
But they don't work with the new non-OpenAI models, because we can't necessarily use the OpenAI SDK with them. 01:21:31.140 |
I think there is actually a way to use the OpenAI SDK with them, but we're supposed to pretend there isn't. 01:21:35.140 |
So there is this new SDK, and I haven't messed with it yet. 01:21:40.140 |
I don't know if you have, but Azure AI Inference, have you seen it? 01:21:49.140 |
And yeah, so this is, this is what to use for everything that's not OpenAI. 01:22:00.140 |
The thing I don't love about this is that this is Azure specific. 01:22:03.140 |
Because right now we use the OpenAI SDK, which is not Azure specific exactly. 01:22:12.140 |
So if we ported to this, then probably it would just work with everything. 01:22:20.140 |
So we just have to decide whether to port everything over to this so that we can use all the models. 01:22:39.140 |
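Roughly, using that SDK looks something like this; the endpoint, key, and messages are placeholders, and the library may have shifted since, so treat it as a sketch rather than the definitive API:

```python
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

# Placeholder endpoint and key for a deployed non-OpenAI model.
client = ChatCompletionsClient(
    endpoint="https://<your-endpoint>.inference.ai.azure.com",
    credential=AzureKeyCredential("<your-key>"),
)

response = client.complete(
    messages=[
        SystemMessage(content="You answer questions about the employee handbook."),
        UserMessage(content="What does the plan cover for eye exams?"),
    ],
)
print(response.choices[0].message.content)
```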
But we would also need to make the Bicep for it. 01:22:41.140 |
That's the other thing I haven't done, because I try to set up Bicep for everything. 01:22:46.140 |
So typically Bicep creates your Azure OpenAI instance. 01:22:49.140 |
If you're using Mistral or Llama, you'd probably want Bicep to create that as well. 01:22:56.140 |
And so that would be a different Bicep addition. 01:23:02.140 |
Because basically what you do is you go to the issue tracker and you file a request. 01:23:05.140 |
And then if enough people ask for it, we're like, okay, guess we're going to do it then. 01:23:13.140 |
Um, but that's how we figure out what, you know, what it is that people are looking for. 01:23:17.140 |
Because it is really nice to be able to swap out models. 01:23:19.140 |
Because right now all of the samples do work with Ollama. 01:23:21.140 |
So if you have Ollama running locally, here's my little Ollama up there. 01:23:26.140 |
You know, you can run Phi-3 and stuff like that. 01:23:29.140 |
You just go to your terminal and you're like, ollama run. 01:23:33.140 |
I don't know if I typed phi3 correctly, but they do all run with Ollama, 01:23:42.140 |
but none of the Ollama models have really been sufficient for RAG in my experience. 01:23:47.140 |
Like, I run them just to check, but they all fail to follow directions, 01:23:54.140 |
because I just think they don't have enough parameters. 01:23:56.140 |
Like, these are 3B, 7B, et cetera. 01:23:59.140 |
So they don't provide citations correctly. 01:24:03.140 |
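For reference, those local checks are easy to wire up, because Ollama exposes an OpenAI-compatible endpoint, so the same client code can just be pointed at a local model; a sketch, with the model name being whatever you've pulled:

```python
from openai import OpenAI

# Ollama serves an OpenAI-compatible API locally; the API key is ignored.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="phi3",  # whatever model you've pulled with `ollama run`
    messages=[{"role": "user", "content": "Summarize what RAG is in one sentence."}],
)
print(response.choices[0].message.content)
```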
Have you had more success with, like, Phi-3 mini? 01:24:06.140 |
You see, with every model, of course, the prompt changes. 01:24:11.140 |
So out of the gate, I haven't had success using any of the small language models for RAG. 01:24:18.140 |
I'm sure the big versions of them would work much better. 01:24:21.140 |
So I do want to try out, like, the 70B. I've done up to 7B, because that's as far as I can go locally. 01:24:29.140 |
I can't go much more than that, just for space reasons. 01:24:32.140 |
So for them, what happens is that like they'll answer the questions fine. 01:24:36.140 |
The issue is that we need citations to be in a good format, 01:24:39.140 |
because these actually come back in square brackets. 01:24:42.140 |
And they just don't reliably come back with square-bracketed citations, which doesn't sound like a big deal, but we're trying to make clickable citations here. 01:24:50.140 |
So that's the issue I've had: I think they're fine at synthesizing the information, but they don't follow the syntax directions in terms of the citations. 01:24:59.140 |
And they're maybe more likely to make stuff up if I ask an off-topic question. 01:25:15.140 |
I think that's where finding that one thing helps, maybe not the expert full answers that follow the format, but one of the smaller parts. 01:25:26.140 |
But most of them don't support function calling out of the box. 01:25:30.140 |
So would you do re-ranking with just a simple prompt? You'd have to figure out what syntax they come back with. 01:25:40.140 |
If you can turn something into a coding task, you're good. 01:25:44.140 |
That's another form of RAG. Like I was telling someone last time, these are all doing kind of RAG on just a few documents at a time. 01:25:54.140 |
If you're trying to analyze a whole database or a huge number of documents, then you really want to actually use a SQL query, like with aggregate functions, or do a pandas query. 01:26:06.140 |
So at PyCon, we did a demo where you like upload a CSV and then you, you say like, oh, I want to count the top restaurants in it. 01:26:13.140 |
And then it just comes up with the pandas code and then it runs the pandas code in a sandbox environment. 01:26:19.140 |
So that's another increasingly common form of RAG: if you want to come up with insights and analysis and that sort of thing, then you want to consider a different architecture where you actually have the LLM generate pandas code or SQL code. 01:26:35.140 |
It's very good at both of those, and then you run those in a safe way. 01:27:07.140 |
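A minimal sketch of that flow, with the model name, CSV, and question all as placeholder assumptions; in a real app the generated code has to run in a proper sandbox (a container or restricted worker), and the exec() here is only to show the shape:

```python
import io
import contextlib
import pandas as pd
from openai import OpenAI

client = OpenAI()
df = pd.read_csv("restaurants.csv")  # hypothetical uploaded CSV

# Ask the model for pandas code that answers the question over this dataframe.
prompt = (
    "You are given a pandas DataFrame named df with columns "
    f"{list(df.columns)}. Write Python code that prints the top 5 "
    "restaurants by number of reviews. Return only code, no markdown."
)
code = client.chat.completions.create(
    model="gpt-4o",  # placeholder model choice
    messages=[{"role": "user", "content": prompt}],
).choices[0].message.content

# WARNING: only for illustration; run generated code in a sandbox in practice.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    exec(code, {"df": df, "pd": pd})
print(buffer.getvalue())
```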
So yeah, what you're describing, that's the same, the same approach. 01:27:11.140 |
So that would be, yeah, we're trying. I know Daniel's actually experimenting with TypeChat; Daniel's the creator of TypeChat. 01:27:21.140 |
He is experimenting with TypeChat with the local models. 01:27:26.140 |
Because we did also try TypeChat with Phi-3 locally, to see if we could use it instead of function calling with OpenAI. 01:27:38.140 |
But I think Daniel maybe has to tweak the prompts, and maybe they'll end up working better. 01:27:46.140 |
Maybe the bigger one, but not, not the smaller one. 01:28:18.140 |
Well, you have the passes for seven days, so feel free to keep deploying. 01:28:26.140 |
If you have any feedback for the workshop, tell us, or we have a survey, which I assume