Running AI Application in Minutes w/ AI Templates: Gabriela de Queiroz, Pamela Fox, Harald Kirschner

00:00:00.000 |
Thank you so much for coming to the workshop. My name is Gabriela de Queiroz and I'm Director 00:00:21.240 |
of AI at Microsoft. I have Pamela here. I'm Pamela and I'm a Python cloud advocate, 00:00:28.140 |
so well done on those people who said Python, but I also worked in JavaScript for quite a long time and I generally like lots of languages. 00:00:40.140 |
Hi, I'm Harald. I'm a PM on VS Code and GitHub Copilot chat. 00:00:46.140 |
Awesome. So today we are going to be talking or showing you how to run an AI application in minutes. 00:00:57.140 |
So we are going to have a lot of like hands on. So be ready to do like some coding, not coding, but like going through some coding using different tools, 00:01:09.140 |
GitHub code spaces, Azure, and other tools that we are going to be talking about. 00:01:15.140 |
But just to give an overview of like the agenda, I'm going to be talking about Microsoft for startups a little bit, some of the partnerships, some of the pain points, and then we go through the AI templates and hands on. 00:01:30.140 |
So Microsoft has a program for startups. So if you have an idea, if you have a startup, you can apply to this program. 00:01:42.140 |
And what I always tell people is you don't have to have a startup per se. But if you have an idea, that's enough to apply for this program. And you get a lot of benefits - benefits that can be, 00:01:54.140 |
for example, credits. So you get up to $150,000 in Azure credits. You also have third-party benefits, like a lot of different tools that you can use. 00:02:06.140 |
And then of course, GitHub, Microsoft 365, LinkedIn Premium, and more. You can use all the different models from OpenAI, but also Llama, models from Cohere, Mistral, and so on. 00:02:21.140 |
And the piece that I like the most is about the sessions that you can get one on one sessions with people like me, or Pamela, that we volunteer our time to share our knowledge with founders. 00:02:35.140 |
We can talk about maybe like, I don't know, you are hiring and then I'm an expert in hiring. So you come and talk to me and I say, hey, 00:02:42.140 |
these are some of the best practice for you when you are building your team, or you can go to technical sessions and ask more like technical pieces as well. 00:02:52.140 |
And inside this platform, we have several things other than the benefits and the guidance that I just mentioned - one of them is what we call Build with AI. 00:03:02.140 |
And inside, we have some AI templates. The idea is that we can help you accelerate the AI application piece with some kind of skeleton, in a way. 00:03:17.140 |
So you have something up and running in a few minutes. 00:03:21.140 |
So again, you get cloud credits, you have access to dev tools, you have the AI templates, you have the one-on-one guidance. 00:03:34.140 |
And no matter where you are in your journey, if you have an idea, if you are already building or if you're scaling, this program is for you. 00:03:47.140 |
You have access to all the cutting edge AI tools, so you can innovate and streamline your AI development. 00:03:55.140 |
And on top of the founders program that we have, there are also different programs that are kind of the next step. 00:04:03.140 |
Like let's say you are now scaling, growing, and then you use all the credits. What is next? There is a next. Like, you know, we try to guide you through the whole process. 00:04:14.140 |
So there is something called the Pegasus program, where we help you to co-sell, go to market and so on. 00:04:22.140 |
And then there are some like strategic VC partners and like accelerators that we partner with. 00:04:27.140 |
So we have partnership with Y Combinator, Neo, The Alchemist, etc. 00:04:33.140 |
Pain points for startups, there are a bunch of them. One of them is like, you don't have time. You cannot wait to go to markets. You have to go like as fast as you can. 00:04:46.140 |
You have a lot of resource constraints. You have issues with scalability. You don't have the support and guidance. And that's what we are trying to help you with. 00:04:56.140 |
So now we are going to go to the fun part. It's like the AI template. So that's where Pamela is going to show you all the amazing things that you can do with all the different tools. 00:05:10.140 |
All right. So our goal today is potentially having you deploy maybe even three different templates. 00:05:22.140 |
Okay. So we have three different ones - let me just show in the browser which ones we're going to be deploying. We're going to start simple with this chat application here, just to make sure everything's up and working. 00:05:38.140 |
And then we've got two different rag applications. One of them is rag on a Postgres database - rag on a Postgres table that does SQL filter building. 00:05:47.140 |
And then we have rag on unstructured documents. So here I've got a rag on my personal blog, or a rag on, you know, internal company documents - whatever kind of documents you're going to rag on. 00:05:59.140 |
So those are the three templates we're going to be looking at today. And we have it all set up so that you should be able to deploy those templates without spending any of your own money and doing it all through our credits, which is yay. 00:06:12.140 |
Yay. All right. So, um, the first thing you need to do is get this URL. So everybody open this URL on your computer. So it's aka.ms/aie-workshop. 00:06:24.140 |
It should open up a, a word document in the browser that looks like the screenshot you see here. So you can either type in the URL or scan that QR code and get that open on your machine. 00:06:39.140 |
So let's make sure everyone's got it open. Welcome, welcome. So go ahead. Once you've got your computer ready, put this, uh, put this URL in your browser. 00:06:56.140 |
Harald, maybe you can just memorize it and then help anyone who doesn't have it. Yeah. Aie-workshop. Uh, okay. So then let me go to that actual doc here. 00:07:10.140 |
So the first thing you need is a GitHub account. Does anybody here not have a GitHub account? Okay. So everyone here has a GitHub account. Great. If you don't have a GitHub account, you can sign up for one for free right now. And that's it. 00:07:23.140 |
Um, and, um, and that should be fine. Um, the next thing you need is an Azure pass. So this is something that we've got for this workshop for this conference. And this is going to let you deploy stuff on Azure without spending any of your own money. 00:07:40.140 |
So we've got passes for 50 bucks and they're valid for seven days. So if you do want to keep hacking after the workshop, you can keep using your pass. And, uh, after seven days, it'll disappear just like Cinderella and the pumpkin. Uh, so in order to get that Azure pass, you do need to have some sort of Microsoft account. 00:08:00.140 |
Microsoft account. So you can use your, like, uh, you can use a personal Microsoft account if you have one. Uh, so if you're, if you like, how do you tell which one you're logged into right now? 00:08:10.140 |
I guess if you just go to outlook.office.com, maybe you know what Microsoft account you're currently logged into. Um, and then you can see some people in the last workshop were like logged into their kid's Minecraft account. 00:08:21.140 |
So just, uh, just, uh, just, you, you need a Microsoft account and you might want to double check to see which one you're currently signed into. If you are signed into a Microsoft account, if you don't have a Microsoft account, no big deal. 00:08:32.140 |
You can make one on the spot. I made one this morning. So, uh, if you do need to make one, you can just make up a new outlook address and set it up that way. Um, so you can also make it as part of this project. 00:08:42.140 |
So we're going to go to this easy check-in URL and that's linked from this doc here. So if you don't have this doc, if you just came in, we can help you get the doc open. 00:08:50.140 |
So we can get this URL and, uh, we're going to spend 10 minutes making sure we get through this step since it can be a little, a little tricky. 00:08:59.140 |
So when you go to this check in URL, right, we put this in the browser, it loads, this is what you're going to see. 00:09:05.140 |
And it says I can either create a GitHub account or log in with GitHub. So I'm going to log in with GitHub. 00:09:10.140 |
Cause I already have a GitHub account and I'm logged into this browser with it already. So I'm just going to click on that. 00:09:19.140 |
And so what's that's going to do is create a pass for my GitHub account. And so we get a pass. So each of us will get a different code based off our GitHub account. 00:09:28.140 |
So this is my, you know, basically my Azure pass promo code. So I can copy that. And then there's this button here that says, get on board with Azure. 00:09:37.140 |
This is the next step is to click this. And then we get this screen, which says, okay, this is, you can start. 00:09:49.140 |
And when I click this here, it says what my currently logged in account is. So this is where you should check to make sure you're happy with what account you're logged in with. And you don't want to switch. 00:09:59.140 |
Um, I don't recommend using a corporate account. If you do have a corporate account, like don't just don't use it. It's going to be problematic for various reasons. 00:10:07.140 |
Cause corporate accounts may have restrictions that won't let you deploy things. So we do recommend using some sort of personal account or making up a new account. 00:10:14.140 |
So that's why you see I'm using my Gmail instead of my Microsoft. Uh, so I'll confirm my account and then I can enter the promo code. And that was from this screen. 00:10:26.140 |
So I still have this screen open. So I just go there, I paste it in - S6XYYK, I think it's case insensitive - and submit. 00:10:41.140 |
And then it's going to actually fail for me because I've already set this up on, on this thing here. Um, and this, if you see this, it's because you've already actually gone through this stage. 00:10:51.140 |
Uh, so for you, it should work the first time and then, uh, it'll create the Azure account for you. And if it works, then what we can do is go to portal.azure.com. 00:11:02.140 |
So portal.azure.com and we'll see how it loads in. Does a bunch of redirects. And then we can click on subscriptions. And what we should see is there should be at least one subscription that says Azure Pass - Sponsorship. 00:11:23.140 |
So that's our key that we have done this correctly. And as long as we use this subscription, when we're doing our deploys, we will not get charged any money. 00:11:32.140 |
Well, Microsoft will, but you won't. That's the important part. Okay. So we're going to spend 10 minutes to make sure that we can get everyone through this, this stage so that we're all on the same page going forward. 00:11:42.140 |
So if you already got it, that's awesome. You can, um, you know, like look at Harald's basic profile or something. 00:12:01.140 |
So once you have that set up, the next step is the proxy. Um, so I'll just show that, uh, so that you can start playing with that. 00:12:11.140 |
Uh, so here's the next link in here. So the reason we have a proxy is because normally when you're using Azure OpenAI, you actually have to fill out a form and say how you're going to use Azure OpenAI. 00:12:24.140 |
And then somebody says, oh, okay, yeah, that's a good use of OpenAI - because Microsoft doesn't want people to use AI willy-nilly. 00:12:31.140 |
So we, you know, check to make sure that something adheres to our responsible AI principles. 00:12:36.140 |
Uh, we don't have enough time for you to go through that process while we're in a workshop. 00:12:40.140 |
So we've set up an Azure OpenAI proxy that you can use during the workshop with the repos. 00:12:46.140 |
And we have special instructions for how you can use this proxy with the repos, since you can't use the actual Azure OpenAI. 00:12:53.140 |
Uh, so this, you can follow the link from the doc and log in with your GitHub account. 00:13:08.140 |
Okay. And so I'm logged in, and then we have an API key and a proxy endpoint, and that's all we need to be able to, uh, to use the Azure OpenAI instance. 00:13:23.140 |
Now, normally I don't like to use keys and I tell everybody to avoid them, but, uh, in this situation, we are going to be using keys and, uh, yeah. 00:13:32.140 |
And these keys will expire at a certain point. 00:13:34.140 |
So we don't have to worry about them being exposed. 00:13:37.140 |
Uh, typically with keys, we'd have to protect them very fiercely so that nobody was using them. 00:13:42.140 |
So you can go ahead and log into this and see your registration details. 00:13:46.140 |
And then you can even play around with the playground. 00:13:49.140 |
This is really similar to the Azure OpenAI playground or the openai.com playground. 00:13:54.140 |
If any of you played around with this, uh, you can see here, you can play with the system message. 00:13:59.140 |
That's how you like say like, Oh, you're an AI assistant that constantly makes pirate jokes. 00:14:29.140 |
Uh, so we're going to enter the key, not save it, select a model. 00:14:52.140 |
And then, uh, please, uh, tell the audience about OpenAI. 00:15:05.140 |
And you can see different parameters that we send. 00:15:07.140 |
And these are all getting sent to the OpenAI SDK. 00:15:09.140 |
So we say the model right here, we've set up two models. 00:15:14.140 |
Those are often the ones you're picking between with OpenAI. 00:15:19.140 |
If you're doing something with vision, something multimodal, I wouldn't use it. 00:15:23.140 |
Otherwise just based off of some experience we've had with it. 00:15:32.140 |
You can see, you know, with the combination of the system message and the user message. 00:15:41.140 |
We get back a response like this where it describes OpenAI with lots of arrs and mateys and stuff. 00:15:49.140 |
Uh, we can, you know, change different parameters here. 00:15:57.140 |
Um, top P is also roughly about creativity and there's some more advanced stuff there. 00:16:04.140 |
And you can see how many tokens you used on the way out and how many tokens you got on the response. 00:16:09.140 |
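To make that concrete, here is a minimal sketch of the same chat completions call the playground is making, using the OpenAI Python SDK. The endpoint and key are placeholders for the values from your proxy registration page, and the pirate system message is just the example from above.

```python
# Minimal sketch of the same call the playground makes, using the OpenAI Python SDK.
# The endpoint and key below are placeholders -- use the values from your proxy registration page.
import openai

client = openai.OpenAI(
    base_url="https://<your-proxy-endpoint>/v1",  # proxy endpoint ending in /v1
    api_key="<your-proxy-api-key>",
)

response = client.chat.completions.create(
    model="gpt-35-turbo",  # one of the models the proxy exposes
    temperature=0.7,
    messages=[
        {"role": "system", "content": "You are an AI assistant that constantly makes pirate jokes."},
        {"role": "user", "content": "Please tell the audience about OpenAI."},
    ],
)

print(response.choices[0].message.content)
print(response.usage)  # prompt_tokens and completion_tokens, like the playground shows
```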
So you can play around with this playground to, uh, you know, to try stuff out and make sure that, uh, that you're able to, to use the key. 00:16:20.140 |
So this is just linked off of, um, off of this workshop, right? 00:16:24.140 |
So if you go to the workshop proxy, you log in, you'll get your key and your endpoint. 00:16:29.140 |
You can go to that playground and you can play around with the playground to check that that's working. 00:16:34.140 |
But we just want to make sure everybody now has an Azure pass and is logged in to the proxy so that you have a key and an endpoint. 00:16:42.140 |
So we'll just check to see if anyone needs help with that. 00:16:45.140 |
Okay. All right. So here - if you're looking for the models, this is generally the page to check. 00:16:53.140 |
Um, so, you know, GPT-4o, GPT-4, and going down. 00:17:03.140 |
You're saying there's a GPT-4 that supports vision? 00:17:12.140 |
Yeah. So we were using that one, but it's a lot slower. 00:17:16.140 |
So that's why I've started using GPT-4o for this one. 00:17:30.140 |
So we'll just be using the basic GPT-3.5 - just GPT-3.5 today, actually. 00:17:41.140 |
So now we're going to actually get something working. 00:17:45.140 |
So we have this repo here, so you can follow the link from the doc. 00:17:50.140 |
And it has READMEs for the three different projects that we can deploy. 00:17:55.140 |
And these READMEs are specific to using them with the Azure OpenAI proxy. 00:18:00.140 |
Uh, so normally you can just use the READMEs that are on the repos themselves. 00:18:04.140 |
But because we are using this Azure OpenAI proxy, we do have to use a slightly different setup. 00:18:09.140 |
So we've made READMEs specific to this workshop. 00:18:13.140 |
Uh, so we can start off on this OpenAI chat app quickstart and make sure that that's all working. 00:18:22.140 |
So the first step is to open in GitHub Codespaces. 00:18:25.140 |
So you can do that by clicking this button here. 00:18:32.140 |
So Codespaces will open VS Code in your browser with a developer environment for that repo. 00:18:41.140 |
So you can actually use Codespaces on any GitHub repo. 00:18:44.140 |
You go to any GitHub repo, you click on Code, and you can make a codespace for it. 00:18:48.140 |
So it's a way that you can start hacking on any repo, uh, very quickly. 00:18:53.140 |
So you can click this button here to open in Codespaces. 00:18:57.140 |
And, uh, I'll just go ahead and make a new one. 00:19:04.140 |
So this is going to take a few minutes to load. 00:19:14.140 |
Cause what it's doing is that it's creating the environment for this repository. 00:19:23.140 |
So if you use VS Code locally and you've got extensions that you use locally, 00:19:28.140 |
it's actually potentially syncing those extensions and enabling them here. 00:19:37.140 |
Um, but yeah, you can see in the bottom here as it's setting up and we'll just wait for it. 00:19:45.140 |
So this is, you know, the slowest part of using Codespaces - it's just the loading. 00:19:53.140 |
If you want faster Codespaces, there's pre-builds available as well. 00:19:58.140 |
And I do have them on the third repo, but I think I don't have it on this one. 00:20:02.140 |
So I, I should have remembered to do pre-builds for all the repos. 00:20:06.140 |
And the slowest part is probably installing all the dependencies and the builds. 00:20:09.140 |
It's basically it's doing all the things you would do when you install it locally, 00:20:19.140 |
Let's see - can we watch the logs for this one? 00:20:30.140 |
So you, if you like this sort of thing, like if you like watching Docker containers build, 00:20:37.140 |
So you can actually watch it as it, um, builds everything here. 00:20:43.140 |
And now it's downloading all the requirements. 00:20:48.140 |
So all the examples that we're going through today have a Python backend and then some sort of front end. 00:20:54.140 |
Uh, this one has what we call a vanilla JavaScript front end. 00:20:57.140 |
As in, I just wrote some JavaScript in a script tag. 00:21:00.140 |
Uh, but then the other ones are much fancier. 00:21:02.140 |
So they've got full TypeScript and a build system and React components, uh, using the 00:21:07.140 |
Microsoft Fluent UI, uh, you know, web framework. 00:21:10.140 |
So you can kind of see the range of front ends there. 00:21:15.140 |
So you can see it's, you know, it's still going through the process, but at least now, uh, we 00:21:19.140 |
can see the file Explorer has loaded so we can, uh, explore the files here. 00:21:24.140 |
And, uh, and I'll show, I'll go ahead and show the, the code. 00:21:29.140 |
If you're interested in the code, uh, it is in the source folder. 00:21:33.140 |
Uh, we are using a Quart application, and I think nobody has heard of Quart, but it's essentially the async version of Flask. 00:21:47.140 |
And one day it might be brought back into Flask. 00:21:50.140 |
You just take your Flask code and you put asyncs in it. 00:21:58.140 |
So, uh, if you haven't done async before in Python: when you use async with 00:22:04.140 |
your functions, they become coroutines, and then they can be paused and awaited. 00:22:07.140 |
And it's important to use async when we're building applications with AI, because we have these 00:22:14.140 |
really long blocking calls to an AI API, right? 00:22:17.140 |
So we make a call to an LLM and we send off our requests. 00:22:19.140 |
And these LLMs, they can take like two seconds, five seconds, 10 seconds, right? 00:22:24.140 |
And while that's happening, we ideally want to be able to handle other user requests coming in. 00:22:33.140 |
So if we use an async framework, then while we're making IO calls, we can handle other user requests that are coming in. 00:22:41.140 |
So all of the ones that we see today have an async backend, either Quart or FastAPI. 00:22:49.140 |
So FastAPI is the one most people know of as the async framework. 00:22:52.140 |
Um, so I, you know, I, I like both of them fairly equally. 00:23:00.140 |
Um, but I just want to make sure people know about the value of async frameworks. 00:23:07.140 |
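As an illustration of why async matters here, a minimal sketch of what an async chat endpoint can look like in Quart (this is not the template's actual code; the route, client setup, and model name are illustrative):

```python
# Illustrative sketch of an async chat endpoint in Quart (not the template's actual code).
# While the awaited OpenAI call is in flight, the server is free to handle other requests.
import openai
from quart import Quart, request

app = Quart(__name__)
client = openai.AsyncOpenAI()  # picks up OPENAI_API_KEY / OPENAI_BASE_URL from the environment

@app.post("/chat")
async def chat():
    body = await request.get_json()
    response = await client.chat.completions.create(
        model="gpt-35-turbo",
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": body["message"]},
        ],
    )
    return {"answer": response.choices[0].message.content}
```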
Uh, if you want to look at the code there, so it is now finished. 00:23:22.140 |
And if for some reason your terminal like goes away, sometimes this happens to the code space. 00:23:30.140 |
So I just click the plus and that'll give me a new terminal, right? 00:23:40.140 |
Um, but actually the first thing we're going to do - there's a .env.sample. 00:23:46.140 |
We're going to make a .env file based off of that. 00:23:49.140 |
So I'm going to make a new file and I can do that using this little new file button up here. 00:23:55.140 |
So I'll just click that, say new file, and I'll type .env. 00:24:03.140 |
Um, and then I'm just going to paste the sample contents in there. 00:24:06.140 |
You can even rename .env.sample to .env. 00:24:11.140 |
Um, and then we need to fill in these values to match the values of the proxy. 00:24:15.140 |
So we'll go to the proxy and let's see, where's my proxy open here. 00:24:23.140 |
So I'm going to go ahead and fill in this one. 00:24:27.140 |
So the endpoint should start with http and end with /v1 and look like that in the middle. 00:24:36.140 |
That's where we'll be sending our opening requests. 00:24:40.140 |
So we'll copy that and it'll look like that or slightly different for you. 00:24:46.140 |
And then the deployment - the name of the deployment - is going to be gpt-35-turbo. 00:24:53.140 |
Uh, and that's also the name of the model in this case. 00:25:03.140 |
So on openai.com, you just pick what model you're going to use and that's all you need. 00:25:07.140 |
With Azure OpenAI, you have to make deployments based off of the model. 00:25:11.140 |
So you actually have a bunch of deployments, and you could actually have multiple deployments of a GPT-3.5 Turbo model that have different names. 00:25:17.140 |
So when you're working with Azure OpenAI, you have to know the deployment name, not just the model name. 00:25:22.140 |
So that's one of the complexities of using Azure OpenAI, but it does give you more flexibility. 00:25:27.140 |
Cause you can say, Oh, this deployment is going to have 20 tokens per minute. 00:25:30.140 |
And this was going to have 30 tokens per minute. 00:25:33.140 |
And then you can like say, which of your colleagues can use what? 00:25:35.140 |
Like if they're all trying to like use up your deployment or whatever. 00:25:37.140 |
Uh, so it's more flexibility, but you do have to specify it. 00:25:44.140 |
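To show the difference in code, a hedged sketch: with the Azure OpenAI flavor of the client, the `model` argument is the deployment name you created, not necessarily the underlying model name. The endpoint, key, and API version below are placeholders.

```python
# Sketch only: the Azure OpenAI flavor of the client. Note that "model" here is the
# *deployment* name you created, which may differ from the underlying model name.
import openai

client = openai.AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<your-key>",
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="gpt-35-turbo",  # deployment name; you could also have named it "chat-fast" or "chat-cheap"
    messages=[{"role": "user", "content": "Hello!"}],
)
print(response.choices[0].message.content)
```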
So this is just so that I can run a "local server" - and I'm putting local server in quotes, 00:25:50.140 |
cause I'm going to run the local server inside GitHub Codespaces. 00:25:55.140 |
So it's actually running a local server not on my actual machine, but inside the GitHub Codespaces development environment. 00:26:02.140 |
Uh, so to do that, I'll grab the command here 00:26:06.140 |
that's going to run the Quart app. 00:26:11.140 |
And with Codespaces, you do have to allow pasting - you'll see this little thing that pops up. 00:26:16.140 |
So if you ever want to copy-paste, you have to allow it for the terminal. 00:26:22.140 |
And then I paste it and then you can see that it says it's running on this URL. 00:26:28.140 |
Now you can't just paste this URL in the browser. 00:26:32.140 |
So if I paste in the browser, I'm going to get an error because this is not running on my local machine. 00:26:40.140 |
One way is that if you just click on it - uh, option-click, at least on my Mac. 00:26:44.140 |
So if I mouse over, it'll tell me what to do: mouse over and option-click. 00:26:49.140 |
So Codespaces will actually detect that you're clicking on a local URL and it'll turn it into a Codespaces port URL. 00:26:57.140 |
And it's this funky URL up here - "improved disco" for me. 00:27:01.140 |
Um, and, uh, and that's actually, you know, like local for that GitHub machine. 00:27:08.140 |
Another way that you might like more is you go to your ports tab and you're going to find it listed here. 00:27:14.140 |
And, uh, we'll see the, you know, the forwarded address and we can click on that, or we can even click the globe icon and we get to the same URL. 00:27:25.140 |
So there's many ways you can get to this locally running URL, uh, and get to the special Codespaces URL. 00:27:32.140 |
And you can even change your port visibility. 00:27:34.140 |
If you want to share it with a colleague, or in a class, you can change it to public. 00:27:39.140 |
And then you could actually send this URL to someone else. 00:27:43.140 |
Like you're not going to use this for like, you know, your deployed URL, but it's fun. 00:27:48.140 |
So now I've got this running locally and now we can type stuff and be like, what's the weather in San Francisco? 00:28:13.140 |
It's always good when it refuses to answer something it shouldn't know. 00:28:16.140 |
Um, so we could go ahead and like, you know, I could change this now and change the system message. 00:28:23.140 |
And let's see, where's our system message in here. 00:28:30.140 |
So right now my system message is just, you are a helpful assistant. 00:28:32.140 |
I'll be like, you are an assistant that cannot resist a good pasta joke. 00:29:00.140 |
It looks like you might have been quite saucy today. 00:29:05.140 |
You might end up feeling like a soggy middle. 00:29:18.140 |
Like when we're developing, we can just test things, test things locally here. 00:29:22.140 |
The next thing we're going to do, once we're happy with it, we're like, this is the best app. 00:29:27.140 |
Uh, so then we move on to the deployment instructions. 00:29:35.140 |
So this is going to log in to our Azure account that we made earlier. 00:29:44.140 |
And, uh, this is going to give us a device code that we're going to paste into this OAuth browser flow. 00:29:56.140 |
I think that's my Azure account that I'm using for this. 00:29:58.140 |
And then I go and I take this and I paste it in and I'm going to pick my account. 00:30:08.140 |
I'm going to use this one continue and, uh, okay. 00:30:21.140 |
So you just want to make sure that you log into the account that you got the pass with, right? 00:30:26.140 |
Whatever account you used for the pass, that's what you want to log into. 00:30:30.140 |
The next step is to create - or Gabriela, should I pause? 00:30:36.140 |
Like, should we get through the local step first? 00:30:47.140 |
We can pause and see if everyone's got the local one running, actually. 00:30:53.140 |
So let's just pause and see if there's any questions with getting the local one running. 00:30:57.140 |
So yeah, someone asked, like, can we just run this locally? 00:31:01.140 |
We like to use GitHub Codespaces in workshops because that reduces the number of potential development environment issues. 00:31:06.140 |
If you want to run it locally, you can either run it, you know, just with a Python virtual environment - you just have to install all the requirements. 00:31:16.140 |
Or you can run it with VS Code using the Dev Containers extension and that will do the Dockerized environment for you, 00:31:22.140 |
if you want kind of the benefit of the Dockerized environment without being in the browser and potentially having to pay for Codespaces. 00:31:30.140 |
So we should note also that with GitHub Codespaces you have a limit of some number of free hours a month, either 60 or 120. 00:31:41.140 |
So you're not going to go over that today, but eventually you could go over that if you use Codespaces a lot. 00:31:47.140 |
So if you're local, right, I think I have mine open locally as well. 00:31:51.140 |
And I'm just, yeah, locally, I'm just using a Python virtual env. 00:31:55.140 |
So you're also welcome to try these things out locally. 00:31:57.140 |
If you like local environments, just, you know, be a good person and make a Python virtual env to manage your Python dependencies. 00:32:16.140 |
So you saw me do, you saw me do the login here. 00:32:22.140 |
So you should see something like this happen from inside code spaces. 00:32:26.140 |
And the next step is to make a new AZD environment. 00:32:31.140 |
So AZD - the Azure Developer CLI - is this tool we're using for deployment. 00:32:36.140 |
You can just call it like chat app, whatever you want to call it. 00:32:38.140 |
And then what that does is it actually makes this .azure folder, and it makes this chat app folder inside. 00:32:46.140 |
And that's where it's going to store all of our deployment environment variables. 00:32:50.140 |
So anything we want to customize about our deployment, 00:32:55.140 |
we're going to configure now, and it's going to get updated there. 00:33:01.140 |
So the next thing we're going to do is set all these AZD environment variables. 00:33:05.140 |
So the AZD environment variables are different from the ones we just saw in the .env. 00:33:11.140 |
AZD environment variables are for deployment. 00:33:13.140 |
Sometimes we use the same, but a lot of times we want our local environment to be slightly different from our deployed environment. 00:33:20.140 |
So we have two different ways of setting those variables. 00:33:26.140 |
So this is just going to tell it not to create an Azure OpenAI resource, because we're using the proxy. 00:33:31.140 |
And then we're going to set the name of the deployment to gpt-35-turbo. 00:33:39.140 |
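Taken together, the azd steps for this template look roughly like the following. The two environment-variable names in angle brackets are placeholders - use the exact keys from the workshop README for your template.

```bash
# The azd flow described above, sketched out. <...> values are placeholders --
# use the exact variable names from the workshop README.
azd auth login                                 # device-code login with the account that has the Azure pass
azd env new chat-app                           # creates .azure/chat-app/.env for deployment settings
azd env set <CREATE_OPENAI_FLAG> false         # don't provision Azure OpenAI; we're using the proxy
azd env set <OPENAI_DEPLOYMENT_NAME_KEY> gpt-35-turbo
azd up                                         # package, provision (Bicep/ARM), and deploy
```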
So I'm going to paste this and then I'm going to delete, delete, delete. 00:33:42.140 |
Gosh, that's what happens when you have Wi-Fi issues actually, is you see it with the typing. 00:33:58.140 |
I'm going to delete how we're going to do this. 00:34:19.140 |
Now, if I've done it correctly, if I look at my .azure folder for that environment I created, I should see a .env that looks like this. 00:34:28.140 |
So this is a .env that's inside the .azure folder. 00:34:31.140 |
So this is what is going to be used for the deployment. 00:34:34.140 |
And it's going to tell it, you know, this is how it's going to set up the Azure open AI connection. 00:34:41.140 |
And now I'm just going to type azd up. 00:34:52.140 |
So what azd up is doing is actually several stages. 00:35:04.140 |
Uh, if you had two subscriptions, you would want to pick the sponsorship one. 00:35:13.140 |
Typically you just choose one that's close to you. 00:35:21.140 |
The first step is that it's actually packaging up the code that it's going to deploy later. 00:35:26.140 |
Uh, in this case, we're deploying to Azure container apps. 00:35:29.140 |
So it's packaging up a Docker container file. 00:35:31.140 |
So it's actually literally building a Docker container right now. 00:35:34.140 |
So if you do like working with Docker, Azure container apps is a great fit. 00:35:41.140 |
So we deploy a lot of stuff there, but we also are going to be using Azure app service for one of the later templates. 00:35:46.140 |
Uh, so we've got lots of ways to deploy on Azure. 00:35:51.140 |
The step after this is where it's actually going to create Azure resources. 00:35:55.140 |
So it's going to create the container app, create a container registry, 00:35:59.140 |
create a container apps environment and create a log analytics workspace. 00:36:03.140 |
So these are all the components of a containerized app on Azure. 00:36:07.140 |
And, uh, you know, it's multiple components and we have to stitch them together. 00:36:11.140 |
The way we stitch them together is using infrastructure as code. 00:36:18.140 |
Okay. So we have our own version of Terraform. 00:36:20.140 |
It's called Bicep and it is, uh, infrastructure as code, 00:36:24.140 |
which means we're declaring what resources we want to make. 00:36:29.140 |
Right. So we say, oh, we want to make Log Analytics. 00:36:34.140 |
We want to make, you know, the actual container apps image, and then we're going to assign some roles. 00:36:39.140 |
Right. So all of that is declared in this Bicep file. 00:36:43.140 |
So that way you have repeatable processes for provisioning. 00:36:48.140 |
And this is really helpful when you're making complex applications on Azure. 00:36:51.140 |
Cause you might have like 10 different things you're using, right? 00:36:54.140 |
Uh, you might have a Postgres and a key vault and a Redis cache and, uh, log analytics and app service. 00:37:03.140 |
So you can declare what that, you know, what that infrastructure looks like. 00:37:07.140 |
And then, uh, and then put that in a bicep file and then deploy it. 00:37:12.140 |
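For a feel of what "declaring resources" looks like, here is a tiny illustrative Bicep snippet - it is not taken from the template, just a sketch of the shape of a declaration.

```bicep
// Illustrative only -- not the template's actual infra. You declare the resource you want;
// the ARM deployment works out what needs to change to get there.
param location string = resourceGroup().location

resource logAnalytics 'Microsoft.OperationalInsights/workspaces@2022-10-01' = {
  name: 'workshop-logs'
  location: location
  properties: {
    sku: {
      name: 'PerGB2018'
    }
  }
}
```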
So if you were really into Terraform and very comfortable with it, you could totally use Terraform here as well. 00:37:21.140 |
But if you want to send a PR with Terraform, I'll, I'll review it and just stamp it. 00:37:29.140 |
So what you can see here is that it is actually creating, uh, the resources right now. 00:37:39.140 |
So this is the point where I usually fold my laundry. 00:37:42.140 |
Um, uh, because it can take some amount of time, uh, or you can even get an error. 00:37:50.140 |
So I already made one in central us for the earlier demo. 00:37:56.140 |
So for this Azure pass, there is a constraint of one container app per region, which is why we said in the readme that you should pick a region that you haven't picked before. 00:38:09.140 |
So, uh, what I can do is I'm just going to make a, I'll just make a new environment. 00:38:17.140 |
Um, now, you shouldn't run into this, cause this would be your first, uh, your first environment. 00:38:24.140 |
Uh, so chat app two, and I'll just copy and paste. 00:38:48.140 |
So then it'll do the up again, but I have one of these already, already deployed. 00:38:57.140 |
Deployed is going to look pretty darn similar to what it looks like locally. 00:39:05.140 |
It looks pretty much the same as what it looks like running locally. 00:39:11.140 |
And you'll see this URL displayed in the terminal. 00:39:14.140 |
Once it finishes successfully deploying, you'll see this displayed. 00:39:18.140 |
Let me see if I have that in my history anywhere from earlier today. 00:39:44.140 |
Uh, some of them - all the ones in core are actually from a shared repo that we just copy in. 00:39:49.140 |
We're trying to move towards something called AVM, Azure Verified Modules, which are Bicep 00:39:53.140 |
files that are maintained and have security best practices in them. 00:40:00.140 |
But basically with bicep files, like you can use ones from a central registry. 00:40:03.140 |
You can use ones from your own private registry if you're doing a lot of them. 00:40:07.140 |
Uh, or you can just use, you know, ones inside the folder. 00:40:10.140 |
Um, so there's a lot of techniques you can use depending on how much bicep you're using. 00:40:18.140 |
So now it's starting over and deploying again. 00:40:21.140 |
So let's walk around and, or any questions on what I showed here? 00:40:31.140 |
I saw some issues with like naming, which I run into all the time. 00:40:37.140 |
The safest thing is do short names with no symbols in them and nothing fancy. 00:40:43.140 |
Uh, if you do run into a naming issue, you can just always do azd env new and make a new environment 00:40:52.140 |
Uh, but generally the issues you run into with deployment are usually related to naming, 00:41:10.140 |
And, uh, and these are ones that you can, uh, that you can also start trying to deploy now. 00:41:18.140 |
So the first one, um, actually the, the, these two are both about rag. 00:41:28.140 |
Like, uh, tell me what Pamela Fox, uh, likes to code on. 00:41:49.140 |
So then, but then if I go to, um, this one right here, tell me what Pamela Fox likes to code on. 00:42:03.140 |
Um, and this is, so basically what we're trying to show is that if we just ask an LLM to answer 00:42:09.140 |
a question, it is, it's very possible that it's just going to make something up. 00:42:17.140 |
I mean, in this case, it says it doesn't know what I like to code in. 00:42:20.140 |
I think I should have said like code in, um, you know, like here, like what Python frameworks 00:42:26.140 |
Um, so, you know, if it doesn't know the answer, it'll say, uh, in this case, yeah, in this case, 00:42:30.140 |
it doesn't know the answer because this is actually using the rag technique in order to answer questions 00:42:38.140 |
Um, so those are our last two samples are about rag. 00:42:44.140 |
Uh, so the general approach of rag is that we get a user question. 00:42:48.140 |
We use that user question to search some sort of database or search engine. 00:42:53.140 |
We get back matching search results for that user question. 00:42:56.140 |
And then we send those, uh, to the large language model and say, here's a user question. 00:43:03.140 |
Now answer the question according to the sources. 00:43:06.140 |
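A minimal sketch of that flow, assuming an already-configured async OpenAI client and a hypothetical search client; the prompt wording is illustrative, not the template's actual prompt.

```python
# Minimal sketch of the rag flow just described. `client` is the async OpenAI client from
# earlier; `search_index` is a hypothetical search/database client; prompts are illustrative.
async def rag_answer(question: str) -> str:
    # 1. Use the user question to query the database / search engine.
    sources = await search_index.search(question, top=3)

    # 2. Hand the question plus the matching sources to the LLM and ask it to answer from them.
    sources_text = "\n".join(f"[{s.id}]: {s.content}" for s in sources)
    response = await client.chat.completions.create(
        model="gpt-35-turbo",
        messages=[
            {"role": "system", "content": "Answer ONLY from the provided sources, and cite them like [id]."},
            {"role": "user", "content": f"Sources:\n{sources_text}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```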
And so now we can make customized applications that can actually synthesize and answer questions based on our own data. 00:43:19.140 |
If you've got an existing database and you want to be able to ask questions about that database 00:43:25.140 |
and have the LLM answer accurately based on that. 00:43:28.140 |
So for the example, uh, you know, database that I'm using, I have product, right? 00:43:37.140 |
So, uh, you know, our table is storing all the products for this website. 00:43:41.140 |
So I can say, okay, what is the best shoe for hiking? 00:43:46.140 |
So then it's going to go and search the database rows and get back matching rows and then come back and say, 00:43:54.140 |
okay, this blah, blah, blah, blah, blah, blah, blah, blah. 00:43:58.140 |
So one of the key points of rag is to have citations so that users can verify where the information comes from 00:44:04.140 |
and see that it's actually legit information. 00:44:07.140 |
And we can also look at the, uh, the process for this rag flow here. 00:44:13.140 |
When we look on the, the thought process here. 00:44:16.140 |
And as you'll see, this rag flow is a multi-step process. 00:44:22.140 |
So the first step is actually what we call the query rewriting phase, or the query cleanup phase. 00:44:30.140 |
And we ask the LLM like, hey, here's a user question. 00:44:35.140 |
Cause a user question may not be that well formulated, right? 00:44:38.140 |
Like, uh, please tell me about the best shoes for hiking now. 00:44:45.140 |
Okay. So, you know, there's like a user query and, uh, you know, that's probably not the optimal search query for, uh, for a search. 00:44:54.140 |
So if we look now at the thought process, we can see that the LLM actually turned that whole long thing into best shoes for hiking. 00:45:07.140 |
Then we get back the resulting rows from the database. 00:45:10.140 |
And then this is our call to the model that says, Hey, you need to answer questions. 00:45:26.140 |
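That query-rewriting call can be as small as this; the prompt is illustrative, and `client` is the OpenAI client from before.

```python
# Sketch of the query rewriting step: ask the LLM to turn a chatty user question into a
# short search query before hitting the search engine. `client` is the OpenAI client from earlier.
rewrite = client.chat.completions.create(
    model="gpt-35-turbo",
    messages=[
        {"role": "system", "content": "Rewrite the user's question as a concise search query. Return only the query."},
        {"role": "user", "content": "Please tell me about the best shoes for hiking now"},
    ],
)
search_query = rewrite.choices[0].message.content  # e.g. "best shoes for hiking"
```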
And then, you know, we're able to use it with different sorts of, uh, data, data sources. 00:45:35.140 |
So you can get that set up following really similar steps to the, to the other one. 00:45:41.140 |
And you can even run that one locally first as well on just on a local Postgres database. 00:45:49.140 |
Uh, this one is a little more fancy cause you've got a React front end there. 00:45:55.140 |
You're going to set similar variables and, uh, run azd up. 00:45:58.140 |
So if you're interested in that, you can start, uh, going through those steps and then you can customize it. 00:46:04.140 |
The other kind of rag that we have is rag on documents. 00:46:10.140 |
So if you're trying to ask questions about unstructured documents, like you've got a bunch of PDFs or Word docs, Excel files, 00:46:16.140 |
uh, anything like that, you can actually put those into a search index and then search that. 00:46:23.140 |
So the example we have for that is rag with Azure AI search. 00:46:26.140 |
And, uh, it's a really, really full featured example. 00:46:30.140 |
We've had it for the last like more than a year now, and we've had thousands of developers deploy with it and put it into production. 00:46:36.140 |
And so it's been used for a ton of use cases and it's got a lot of features, uh, speech, voice, vision, user access control, lots of, lots of cool things in it. 00:46:45.140 |
Uh, so let me show, that was the one I was actually showing earlier with my blog, right? 00:46:50.140 |
So here's, you know, I made a version of it that's just based off my blog posts and, uh, you know, it can cite my blog posts. 00:46:58.140 |
I've also got this one here, which is for an internal company handbook, which is a very popular way of using it as well. 00:47:05.140 |
And so you can see for each of them, we can, you know, click on the citations and, uh, and yeah. 00:47:11.140 |
So now this is a bit more complicated because here we have a multi-page document. 00:47:17.140 |
We can't just send an entire 31 page PDF to the LLM. 00:47:21.140 |
Cause for a lot of our LLMs, it's going to go beyond the context window, right? 00:47:26.140 |
A lot of our LLMs have a context window limit. 00:47:28.140 |
So typically that's around 8K, 8,000 tokens, uh, can go up to 32K, even 128K we're seeing. 00:47:36.140 |
Um, but typically they do have some sort of context window. 00:47:39.140 |
And even if they don't have some sort of context window, LLMs can get lost. 00:47:43.140 |
If you give them too much information - there's a research paper called "Lost in the Middle", where they did a study to see, if they throw too much 00:47:50.140 |
information at an LLM, at what point it stops paying attention. 00:47:53.140 |
So we generally want to send the LLM the most relevant chunks. 00:47:57.140 |
So what we do is we first have this data ingestion phase that will take a PDF or whatever kind of document. 00:48:07.140 |
And we do that with Azure Document Intelligence, which is very good at extracting text from all sorts of documents. 00:48:14.140 |
We chunk up the text into like good sized chunks, usually around 500 tokens each. 00:48:19.140 |
Then we store each of those chunks in the search index along with their embeddings. 00:48:25.140 |
And that's what we actually search on and send. 00:48:29.140 |
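Sketched in code, that ingestion phase looks roughly like this. The token chunker here is deliberately simplified (no overlap or sentence-boundary handling), `extracted_text` stands in for the output of the extraction step, and `search_client` is assumed to be a `SearchClient` from the azure-search-documents package; the index field names are illustrative.

```python
# Rough sketch of ingestion: split extracted text into ~500-token chunks, embed the chunks
# in one batch call, and upload chunk + embedding to the search index. Simplified on purpose.
import tiktoken

encoding = tiktoken.encoding_for_model("text-embedding-ada-002")

def chunk_by_tokens(text: str, max_tokens: int = 500) -> list[str]:
    tokens = encoding.encode(text)
    return [encoding.decode(tokens[i:i + max_tokens]) for i in range(0, len(tokens), max_tokens)]

chunks = chunk_by_tokens(extracted_text)            # extracted_text: output of Document Intelligence / pypdf
embeddings = client.embeddings.create(              # batch embeddings: one call, many inputs
    model="text-embedding-ada-002",
    input=chunks,
)
search_client.upload_documents([                    # SearchClient from the azure-search-documents package
    {"id": str(i), "content": chunk, "embedding": emb.embedding}
    for i, (chunk, emb) in enumerate(zip(chunks, embeddings.data))
])
```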
So if we look at the search results here, we can actually see that the search results are just chunks from the PDF, where we say, here's the chunk. 00:48:45.140 |
So this is the most complicated of our architectures because we do have to have that data ingestion phase. 00:48:51.140 |
And that means we have to have, you know, a script or a process that does that ingestion stage. 00:48:56.140 |
And, you know, here we can do it locally or in the cloud. 00:49:04.140 |
So we have, you know, we have another 40 minutes. 00:49:07.140 |
So, and we have like a good ratio here of helpers to y'all. 00:49:12.140 |
So if either of those sound compelling to you, like sound like a use case that you're interested in, then you can try to deploy them now and see how they work. 00:49:24.140 |
So once again, you just go to the app templates workshop repo, and you can either pick rag on Postgres or rag with AI search, and then start going through the steps to try it out. 00:49:38.140 |
So it's good to start to deploy, you know, now, because they take, they got a lot more infrastructure to set up. 00:49:45.140 |
And then for the AI search, it's got to do the whole ingestion step. 00:49:48.140 |
And that ingestion step takes a certain amount of time as well. 00:49:55.140 |
For the ingestion step, are you using any libraries for the chunking and all that stuff? 00:50:02.140 |
So when this sample was first created, it was like last April. 00:50:05.140 |
It was before there was like really good established libraries. 00:50:14.140 |
Now, if you're going to use a library, the big thing I would make sure you're doing is using a token based chunker. 00:50:22.140 |
A lot of the splitters out there are doing character based splitting, which is probably fine if you're doing English only documents, but we do have lots of international customers. 00:50:31.140 |
And as soon as you start doing non English documents, then you really want to do stuff based off of tokens and not characters. 00:50:37.140 |
Because imagine you take like a Chinese document and you'd say, oh, my chunks are a thousand characters long. 00:50:44.140 |
You can like go over the context window really fast. 00:50:46.140 |
So we have token based chunking that we've implemented here. 00:50:50.140 |
There is token-based chunking available in LangChain. 00:50:53.140 |
So if you're going to use LangChain, the thing to do is find my colleague's blog post where he talked about it. 00:51:02.140 |
OK, yeah, where can we see, especially if you're doing anything non English. 00:51:06.140 |
He basically analyzed all the splitters from LangChain to figure out which of them properly worked with token-based splitting and with CJK languages in particular. 00:51:23.140 |
He actually told my manager Anthony, he worked on it. 00:51:26.140 |
But LangChain and LlamaIndex both do a lot of this stuff. 00:51:32.140 |
They just, you know, they take care of it behind the scenes. 00:51:35.140 |
So what you need is the splitting, and you can get that basically from LangChain, because LlamaIndex uses LangChain. 00:51:42.140 |
So I would just say use LangChain, probably, with this one, so you can specify the chunk size. 00:51:53.140 |
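If you do go the LangChain route, a token-based splitter looks roughly like this; the import path varies a bit between LangChain versions, and `long_document_text` is a placeholder for your extracted text.

```python
# Roughly what a token-based splitter looks like in LangChain (import path varies by version).
from langchain.text_splitter import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter.from_tiktoken_encoder(
    encoding_name="cl100k_base",   # tokenizer used by the GPT-3.5 / GPT-4 family
    chunk_size=500,
    chunk_overlap=50,
)
chunks = splitter.split_text(long_document_text)
```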
You just use the OpenAI SDK and we do the batch embeddings with that so that we can do a bunch at a time. 00:52:01.140 |
So the hard part is really the extracting the text. 00:52:05.140 |
So there we either use Azure Document Intelligence in the cloud, or we do have some local parsers, too. 00:52:12.140 |
If somebody doesn't want to use Document Intelligence, we use like pypdf. 00:52:15.140 |
We use our own CSV parser because that's straightforward. For HTML, for my blog, 00:52:22.140 |
I just use Beautiful Soup, which is the Python package that does HTML parsing, right? 00:52:26.140 |
Because I thought I could do a better job at it. 00:52:28.140 |
So for this one I just use Beautiful Soup to extract the text. 00:52:35.140 |
So, yeah, there is actually a surprising amount of things that we've written ourselves for the AI search repo. 00:52:41.140 |
If we were going to do it today, we'd probably use the LangChain splitter at least. 00:52:56.140 |
So generally with Bicep, what it does is it tries to figure out - and Bicep is really compiled down to ARM, and ARM is just JSON. 00:53:24.140 |
So what you're actually doing is what's called an ARM-based deployment. 00:53:28.140 |
So with ARM-based deployments, what they try to do is figure out what does your resource currently look like? 00:53:33.140 |
What are you saying you want it to look like? 00:53:35.140 |
And what changes does it need to make happen? 00:53:38.140 |
So, yeah, we'll probably switch over to AVM in a lot of our samples. 00:53:44.140 |
And we're probably just going to make sure, like, we're trying to make it not have a change. 00:53:48.140 |
But if you want it to change, then that's fine. 00:53:50.140 |
So you should totally be able to switch between AVM and not-AVM as you decide, as you see fit. 00:53:57.140 |
And the important thing is just -- it'll figure out the difference and just make sure you are on board with any changes that come up. 00:54:06.140 |
There is, like -- so there's this az deployment command that does what-if, and that tells you, like, actually tells you what resources will change. 00:54:16.140 |
I want to figure out how we can do that with AZD. 00:54:21.140 |
So that might be what we try when we consider switching to AVM. 00:54:24.140 |
Because we want to switch to AVM so that we don't have to maintain our own modules. 00:54:29.140 |
But we just want to make sure that we are aware of any configuration changes that could happen. 00:54:43.140 |
Yeah, AZD is a command line tool that, you know, does the Arm-based deployment and also does code deployment, code upload, right? 00:54:56.140 |
So azure.yaml says this is the code that you're going to deploy to this host. 00:55:04.140 |
It does provisioning, which is basically doing an ARM-based deployment, which is equivalent to - 00:55:11.140 |
if you know the Azure CLI, it's this az deployment command. 00:55:19.140 |
And then it's also doing packaging and code deployment. 00:55:23.140 |
So if you've ever done like, I don't know, if you've ever done az webapp up - that's where you deploy code up to App Service. 00:55:31.140 |
So AZD is trying to do the whole workflow of you need to provision your resources and you need to deploy your code. 00:55:37.140 |
And we're trying to make this central way of doing it across all of our offerings. 00:55:43.140 |
Because right now with Azure -- you know Azure -- but we've got like a billion different ways of doing things across all the different things. 00:55:47.140 |
And AZD is trying to make a more common way of doing it. 00:55:51.140 |
So if you look at my GitHub repo, I'm kind of a huge AZD fangirl. 00:55:57.140 |
So you can see all of these repos are all AZD-ified, almost all. 00:56:03.140 |
Because to me it's the best way to deploy because it's repeatable. 00:56:07.140 |
So if you are looking for examples, I have quite a few here. 00:56:13.140 |
But yeah, so we're -- you know, we should be able to do it on different hosts, container apps, functions, app service, Kubernetes, et cetera. 00:56:21.140 |
And, you know, with all this -- all the different possible bicep. 00:56:26.140 |
So what happens after you go to production, like observability and all that? 00:56:38.140 |
So we do have a -- like generally there's lots of docs under Azure Search OpenAI demo. 00:56:43.140 |
So we do actually have a productionizing guide. 00:56:46.140 |
You also asked specifically about observability. 00:56:49.140 |
We do integrate with Application Insights with OpenTelemetry. 00:56:57.140 |
I don't know if you've seen it, but it's an observability platform. 00:57:01.140 |
But by default we're using Azure Application Insights with the OpenTelemetry packages to bring everything in there. 00:57:07.140 |
But we do have a whole productionizing guide that talks about, you know, how are you going to scale things? 00:57:12.140 |
You know, if you need to load balance your OpenAI capacity. 00:57:22.140 |
So I've run quite a few load tests for this one. 00:57:30.140 |
It's basically like the new form of testing for this world. 00:57:39.140 |
But basically like you want to be running evaluations to see if you are getting quality results from your LLM. 00:57:48.140 |
Because a lot of times you might run -- here's the thing. 00:57:50.140 |
You know, I show those sample questions all the time and they perform great. 00:57:57.140 |
Like you might make a prompt tweak and be like, oh, this prompt tweak was so good. 00:58:02.140 |
You have to run an evaluation across a huge number of samples to make sure that it's actually an improvement. 00:58:09.140 |
Like I run it across like 200 samples is probably like the minimum of what you should do. 00:58:13.140 |
But you have to run evaluations in order to see - do you - I'm assuming you run evaluations for Copilot Chat, right? 00:58:24.140 |
You want to come up and like talk about evaluation? 00:58:25.140 |
Because you're like - I mean, Harald is like running an actual - because basically you're making Copilot Chat. 00:58:52.140 |
So if you do @workspace - if you ever try that in Copilot Chat - we actually run a local sparse index. 00:59:01.140 |
So that's basically your classic how-Google-works: just looking up words in the documents. 00:59:10.140 |
We also do a semantic index, against an index that GitHub.com maintains, and then rank those using another model. 00:59:21.140 |
So rag basically becomes a series of indexes. 00:59:26.140 |
You created some keywords up front based on the search. 00:59:32.140 |
Anytime we have changes, we have one test set that can run on each PR and then a larger test 00:59:39.140 |
that we run daily that has a lot more repositories from across different languages. 00:59:50.140 |
So we have a subset that's more unit test driven. 00:59:52.140 |
Where it's like, can it answer questions for this? 00:59:54.140 |
Does it hit any issues we've seen in the past? 00:59:57.140 |
So it's more unit test style where it's like, does it behave as it did before? 01:00:05.140 |
I mean, that's the first, the biggest thing we invested on early on because we found it's 01:00:09.140 |
so easy to get lost in prompt crafting and assume how rag works and assume how it works in the wild. 01:00:09.140 |
I don't know if you can show your evals, but here, like, I can show evals for, um, the Azure AI Search one. 01:00:22.140 |
So these are a bunch of evaluations that were run fairly recently. 01:01:07.140 |
So, um, with these evaluations, I do GPT metrics and then I also do basically, like, regular expression checks. 01:01:14.140 |
Are your metrics usually GPT metrics or code tests? 01:01:23.140 |
So with these GPT metrics, what they're actually doing is, um, sending the original answer, 01:01:29.140 |
uh, sending, sending the ground truth answer, uh, which is generated synthetically. 01:01:34.140 |
And then also sending the new answer to an LLM and saying, hey, rate this from one to five. 01:01:42.140 |
And this is, this is the actual prompt that gets sent is like, okay, you know, rate this, 01:01:51.140 |
And then I also check whether citations match across ground truth and not ground truth. 01:01:59.140 |
So my favorite is just this one, this, uh, citation match here. 01:02:03.140 |
So I'm just making sure that the answer contains at least the citations that were in the ground truth answer. 01:02:12.140 |
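A hedged sketch of those two kinds of checks - an LLM "judge" score against the ground truth and a citation-match check. The judge prompt and the regex are illustrative, not the repo's exact implementations, and `client` is an OpenAI client configured as earlier.

```python
# Illustrative sketch of the two metric styles described: a GPT "judge" rating and a
# simple citation-match check. Prompt and regex are not the repo's exact implementations.
import re

def citation_match(ground_truth: str, answer: str) -> bool:
    # Does the new answer contain at least the citations found in the ground-truth answer?
    truth_citations = set(re.findall(r"\[([^\]]+)\]", ground_truth))
    answer_citations = set(re.findall(r"\[([^\]]+)\]", answer))
    return truth_citations.issubset(answer_citations)

def llm_judge_score(question: str, ground_truth: str, answer: str) -> int:
    # Ask a (usually stronger) model to rate the new answer against the ground truth, 1-5.
    judge = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "Rate how well the answer matches the ground truth, 1 to 5. Reply with the number only."},
            {"role": "user", "content": f"Question: {question}\nGround truth: {ground_truth}\nAnswer: {answer}"},
        ],
    )
    return int(judge.choices[0].message.content.strip())
```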
A lot of times I'm looking at retrieval parameters. 01:02:14.140 |
Cause for rag, the retrieval makes a big difference. 01:02:17.140 |
So here I was comparing stuff like, what if I use text only? 01:02:26.140 |
I was trying with different retrieval amounts. 01:02:28.140 |
Like if I retrieve five versus 10 versus three, what do I get out? 01:02:33.140 |
I'll just say, I've tried so many tweaks on our prompt and I've never managed to actually move the needle. 01:02:40.140 |
Uh, so we still haven't ever changed the prompt, because I haven't really proved that 01:02:45.140 |
anything is sufficiently better - or I'm just a really bad prompt engineer. 01:02:50.140 |
I've none of my prompt engineering ever moves the needle. 01:02:53.140 |
For me, the only thing that moves the needle is retrieval parameters. 01:02:56.140 |
Like how you're working with your search engine or changing the model entirely. 01:03:01.140 |
Changing to GPT-4 makes a big difference compared to GPT-3.5. 01:03:05.140 |
Uh, so that should be part of your path to production, for sure - making sure of that. 01:03:29.140 |
I try to like take a lot of things to production, but they were all like early projects. 01:03:35.140 |
I had a hard time evaluating which vector store to use and how much to chunk. 01:03:55.140 |
I've also run all those on our sample data too. 01:03:57.140 |
But I think what I've discovered is it really helps to run the evaluations on data that you know. Because this is the summary - you can kind of look at the summary and be like, okay, I guess things are better. 01:04:00.140 |
But then what I usually look at is the changes between two runs: okay, what was the difference between the baseline 01:04:17.140 |
and, what was it, vector only, no ranker? Okay. And then I'll just look at things that changed on citation match. 01:04:30.140 |
So this is what I usually do: I look at the overall stuff, and then I compare the answers across my ground truth and the new run with the parameters. And so then I can better reason about it. 01:04:43.140 |
But you really have to know your domain in order to be able to evaluate your evaluations. It also helps if other people have run it for you. 01:04:54.140 |
So this is a really good blog post from the AI search team that I always reference, where they ran massive queries comparing hybrid search versus vector search versus text search. 01:05:03.140 |
And they found that hybrid retrieval with semantic ranking outperforms vector-only search. 01:05:10.140 |
So I ran my own versions of that and recently blogged about it, but it's basically the stats that I was just showing, where what I found for my use case is that vector on its own 01:05:22.140 |
did horribly, really, really badly. Where is it? So vector only got a groundedness of 2.79, which is really low; text only got 4.87. 01:05:35.140 |
So part of that is because Azure AI Search is really good at full text search, like incredibly good at it. 01:05:40.140 |
It does spell check, stemming, everything you could imagine. Hybrid is where you take vector and text and then you merge them using this algorithm called reciprocal rank fusion, which you can actually see. 01:05:51.140 |
The algorithm is just this: you're just doing a little math to combine rank scores. 01:05:58.140 |
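For reference, the reciprocal rank fusion math really is small; a generic sketch (not the exact Azure AI Search implementation, and k=60 is just the commonly cited constant):

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Combine ranked result lists (e.g. one from text search, one from
    vector search) by summing 1 / (k + rank) for each document."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

# fused = reciprocal_rank_fusion([["doc2", "doc7", "doc1"],
#                                 ["doc7", "doc3", "doc2"]])
```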
Um, so just a basic hybrid like that, the groundedness is only 3.26. 01:06:02.140 |
So you can see hybrid on its own is worse than text only. And that's because vector results can add so much noise. 01:06:09.140 |
You accidentally grab the wrong, distracting things. What I found is that if I ever accidentally vectorize an empty string, or something close to an empty string, it's similar to everything. 01:06:22.140 |
I don't know what this is about the OpenAI embedding space, but if you accidentally vectorize an empty string, or even, we have vision as a feature in the Azure OpenAI search demo. 01:06:32.140 |
I was helping a customer this week and they were finding that so many of the results were this blank blue page, because apparently the vector for this blank blue page, 01:06:44.140 |
and this is a vector via a different model, the Azure Computer Vision model, the vector for it was just matching everything. 01:06:49.140 |
So you've got to be really careful with vector spaces. It's so easy to accidentally add noise to them and for there to be distractions. 01:06:58.140 |
So hybrid on its own only got like 3.26. Once I used hybrid with semantic ranker, then I got the best results, but only by a couple percentage points. 01:07:08.140 |
Now, hybrid with semantic ranker: semantic ranker is a feature of Azure AI Search, which is actually another machine learning model. 01:07:14.140 |
It's called a cross-encoder model, but basically they actually had humans rank results according to queries. They use it for Bing. 01:07:20.140 |
So they said, Hey humans, here's 10 search results for a query, rank these from one to 10 and tell us what's the best. 01:07:26.140 |
So they trained a whole model based off a bunch of human data, and then they got back this model that they can then use for any arbitrary ranking of a user query along with results. 01:07:37.140 |
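If you want to play with the same idea outside Azure, an open-source cross-encoder works as a stand-in; a sketch of reranking candidates that way (this is not the Azure semantic ranker model, just an illustration of the technique):

```python
from sentence_transformers import CrossEncoder

# Score each (query, passage) pair jointly, then reorder candidates by score.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

def rerank(query, passages, top_k=5):
    scores = reranker.predict([(query, passage) for passage in passages])
    ranked = sorted(zip(passages, scores), key=lambda pair: pair[1], reverse=True)
    return [passage for passage, _ in ranked[:top_k]]
```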
So basically hybrid with the semantic ranker gets you the best. But if I was on a desert island and I could pick between vector and text, I would use text, at least for Azure AI Search. 01:07:49.140 |
It's going to depend on how good your full text search is, right? If you're doing full text search with something like SQLite, which I don't even know if it supports 01:07:55.140 |
full text search, it's not going to do very well. Yeah. 01:07:58.140 |
So you're using TF-IDF for your Copilot chat, you said, right? 01:08:08.140 |
For this one, for the @workspace, right? Yeah. For Azure AI Search, they're using several things, but one of the things they use is Lucene, which is 01:08:22.140 |
a search library. And it's got stuff like spell checking and tokenization and things like that. So they're doing a lot, and they're also using BM25, which I think is basically TF-IDF. 01:08:37.140 |
Right. Okay. Yeah. So BM25, that's what you want to look for. We got a search result here. Yeah. So if something is using BM25, I think that's basically the best full text search. 01:08:51.140 |
I think that's the best full text search right now. So that's what you want to look for: just look for a good full text option. 01:08:57.140 |
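If your stack doesn't have a search engine handy, a tiny library like rank_bm25 gives you a feel for what BM25 scoring does; a toy sketch with made-up documents:

```python
from rank_bm25 import BM25Okapi

corpus = [
    "Azure AI Search supports hybrid retrieval",
    "Reciprocal rank fusion merges ranked lists",
    "BM25 is a strong full text baseline",
]
tokenized = [doc.lower().split() for doc in corpus]
bm25 = BM25Okapi(tokenized)

query = "full text search baseline".lower().split()
print(bm25.get_top_n(query, corpus, n=2))  # best-matching documents first
```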
Yeah. Yeah. It's overwhelming. That's why I love when people put out research, so we can be like, okay, great. Because this also has the optimal chunk size. 01:09:07.140 |
That's why I was saying we do 500 tokens, because they did the work here and said, okay, the optimal is 512 tokens. Great. 01:09:15.140 |
That's what we're going to use. Now, obviously for your particular use case it can be different, but we can't all run 2,000 different tests to see what the optimal thing is. So it's really nice when people document what worked well for them. 01:09:37.140 |
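A rough sketch of what chunking to roughly 512 tokens looks like with tiktoken (the overlap value here is just an assumption, not something from the blog post):

```python
import tiktoken

def chunk_text(text: str, max_tokens: int = 512, overlap: int = 50):
    """Fixed-size token chunking with a small overlap between chunks."""
    enc = tiktoken.get_encoding("cl100k_base")  # encoding used by recent OpenAI models
    tokens = enc.encode(text)
    chunks = []
    step = max_tokens - overlap
    for start in range(0, len(tokens), step):
        chunks.append(enc.decode(tokens[start:start + max_tokens]))
    return chunks
```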
Well, we only need to update the vectors if the data changes or if we're changing our embedding model. So if we change our embedding model, we have to update everything to use the new embedding model. Right. 01:09:59.140 |
Because now OpenAI has these new embedding models. I need to do some tests with them to see if I can get better results from them. So in that case I would rerun everything. 01:10:11.140 |
So probably what I want to do is set up a separate AI search index, like for this one, which uses one of the new embedding models, text-embedding-3. 01:10:21.140 |
And I have to decide how many dimensions to use, and then compare it to see how much better the results are. I'm told that generally the results are better, but have you tried any of them? 01:10:33.140 |
Oh, you're switching to the new one. What dimension are you going to use? 01:10:43.140 |
Yeah. You can do 512 too. Yeah. So you can, that's the thing: there are so many options now. 01:10:52.140 |
Yeah. Oh, and you can run A/B tests. You have customers. Yeah. So yeah, but you're going to have to re-index everything. 01:11:02.140 |
So that's when you would have to update stuff: if the content changes or if the model changes. 01:11:12.140 |
And then test that. Yeah, I do want to try out the new ones. They should redo this one too. 01:11:21.140 |
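For reference, the dimensions knob on the new embedding models looks like this with the OpenAI SDK (the model choice and example input are placeholders):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set

# text-embedding-3 models let you pick the number of dimensions,
# so part of the comparison is deciding how many to keep.
resp = client.embeddings.create(
    model="text-embedding-3-large",
    input="What does the plan cover for eye exams?",
    dimensions=512,
)
vector = resp.data[0].embedding
print(len(vector))  # 512
```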
It's too many decisions. Cool. Any other questions? Harald, do you want to show stuff in @workspace? 01:11:38.140 |
Let's just see RAG in action. So if you ask a question in Copilot chat, that's the Copilot chat panel version. 01:11:49.140 |
There's also another one that's inline. So if you open up, this is a natural-language input we call inline chat, in your code. 01:11:59.140 |
Basically letting you apply code directly or natural language directly to your code, which is always nice. 01:12:04.140 |
You don't have to think about the response. You just have to think about what you want, and you want the AI to do it for you. 01:12:10.140 |
But in the side panel, most of the time, what you will run into is this. Let's pick a function. 01:12:22.140 |
So, compare the tests. Now I have code selected on the right, and on the left I can ask things about the code I have selected. 01:12:33.140 |
That's the surefire way to get good results: have code selected and talk about it. 01:12:38.140 |
And you already see that we do some magic in our responses, so everything is code highlighted. 01:12:44.140 |
So you can actually jump to the different aspects that are being used, and even to dependencies. 01:12:52.140 |
So it found that there's a dependency, so you can also jump to that. 01:12:56.140 |
So now, going back to here, let's see which tests actually are defined in the repository. 01:13:02.140 |
And here I want to talk basically about the whole workspace. 01:13:05.140 |
And that's why I can't just say, which tests are defined, or, how are benchmarks being run? 01:13:16.140 |
It's a general question that you would otherwise take to a colleague who hopefully knows this, and hopefully they're in the same time zone. 01:13:23.140 |
But now I can actually send this to @workspace. 01:13:26.140 |
And that's where we kick in this whole RAG agent scheme. 01:13:30.140 |
So this repo is probably not indexed on the github.com site. 01:13:34.140 |
So if you're on Copilot for Enterprise, you will get a semantic index that GitHub keeps updating for you. 01:13:39.140 |
They also have a few open source repos indexed. 01:13:42.140 |
But in this case, this is all happening now in VS code itself. 01:13:48.140 |
And actually we see that sparse indexing is usually on par; similar to what you said about text-based retrieval, this works really well. 01:13:58.140 |
Yeah, so first we do the same as you have in Azure AI Search, where it finds more words for what you're potentially looking for that fit with the repository. 01:14:18.140 |
So we also do stemming, and that's the first LLM call. 01:14:23.140 |
Then the TF-IDF will find all those results, and then we do the re-ranking on top. 01:14:29.140 |
And that actually gets us mostly better results than doing a full vector search on the same topic. 01:14:52.140 |
So these are all the things it found and the chunks it found it in. 01:14:56.140 |
So what we do, what you'll see, is we actually do semantic chunking. 01:15:01.140 |
So for most languages we look at function segments, we look at specific blocks of code, and that's where we found the most impact as well. 01:15:10.140 |
So people brought up chunking as a big, big area of improvement, and that's what we also found in our code: chunking is the biggest impact factor from what we've seen. 01:15:24.140 |
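As a toy illustration of the idea for Python, you can chunk a file per top-level function or class rather than by fixed-size windows (this is not the actual Copilot implementation, just a sketch of "semantic" chunking):

```python
import ast

def function_chunks(source: str):
    """Split a Python file into one chunk per top-level function or class."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            chunks.append("\n".join(lines[node.lineno - 1:node.end_lineno]))
    return chunks
```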
It helps that we have all the language knowledge around the team, like Python, right? 01:15:37.140 |
It works locally, and it works slightly faster if you already have an online index that we can retrieve the semantic index from. 01:15:57.140 |
Oh, so .prompty is a new prompt format. 01:16:13.140 |
So this was announced at, what was it, Build? 01:16:24.140 |
So it's a way of, it's like an artifact for prompts. 01:16:29.140 |
Because right now, like, you might store your prompt as a multi-line string variable. 01:16:39.140 |
And it's like, we store them in all kinds of formats across the repo. 01:16:43.140 |
So it's actually a Jinja template plus the YAML at the top. 01:16:48.140 |
So the YAML describes the metadata of the prompts. 01:16:50.140 |
And then the Jinja template, you know, it's a template that you can pass things into. 01:16:54.140 |
So this is used by Prompt Flow, but it's also used by Azure AI Studio. 01:17:01.140 |
And the goal is, and I think maybe LangChain might have support for it now or soon. 01:17:06.140 |
But the goal is just to have a common way of representing prompts. 01:17:09.140 |
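As a rough sketch of the shape of it, a .prompty-style file can be read by splitting the YAML front matter from the Jinja body and rendering it; the real format has more to it than this, so treat it as an illustration:

```python
import yaml
from jinja2 import Template

def load_prompty(path: str, **variables) -> str:
    """Split '---'-delimited YAML front matter from the Jinja body,
    then render the body with the given variables (simplified)."""
    text = open(path, encoding="utf-8").read()
    _, front_matter, body = text.split("---", 2)  # assumes '---' delimited front matter
    metadata = yaml.safe_load(front_matter)       # model, parameters, etc.
    return Template(body).render(**variables)

# rendered = load_prompty("chat.prompty", question="What is covered?", context=chunks)
```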
So we'll probably try to use this in more of our stuff going forward. 01:17:22.140 |
Yeah, so this is using the Prompt Flow evals package, which has a bunch more things. 01:17:32.140 |
I actually wrote my own CLI as a UI on top of this, but they have one too. 01:17:40.140 |
Do you run the evals in your CI pipeline somewhere? 01:17:42.140 |
If you look at Azure devs on this one, it does actually run them. 01:17:44.140 |
I'm just running them as a smoke test for this repo. 01:17:46.140 |
But you can see what I've done is that I have a target URL. 01:17:49.140 |
So that's generally what you'd want to do: you need to run the eval, and then you can 01:18:07.140 |
run an eval against your live deployment, or, like for you, you're doing a PR build. 01:18:12.140 |
So there you want to run it against your PR build. 01:18:15.140 |
So the tricky thing is just making sure you have a way of contacting your app with everything, 01:18:20.140 |
all the production setup, everything that has your stuff in it. 01:18:25.140 |
So yeah, I would ideally have it as a CI step for every one of our repos, and I'm just figuring 01:18:33.140 |
out the right way of setting up the target URL and all that stuff. 01:18:37.140 |
Especially because most people aren't making public-facing apps. 01:18:41.140 |
Most people are either putting it behind user auth or putting it in a VNet. 01:18:45.140 |
So we need evaluation flows that both can use your production resources because that's how you know it's working. 01:18:52.140 |
But then also work with however your app is deployed. 01:18:57.140 |
So I think you can certainly figure out how to set it up for your situation. 01:19:02.140 |
I'm still figuring out how to set it up in the general case. 01:19:07.140 |
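A minimal sketch of what such a CI smoke test could look like, assuming a hypothetical /chat endpoint and an EVAL_TARGET_URL environment variable pointing at the PR or staging deployment (match the payload shape to your own app's API):

```python
import os
import requests

TARGET_URL = os.environ["EVAL_TARGET_URL"]  # e.g. the PR/staging deployment

def test_answer_has_citation():
    """Cheap smoke test: ask one question and check that the answer
    comes back with a bracketed citation."""
    resp = requests.post(
        f"{TARGET_URL}/chat",
        json={"messages": [{"role": "user", "content": "What does the plan cover?"}]},
        timeout=60,
    )
    resp.raise_for_status()
    answer = resp.json()["message"]["content"]
    assert "[" in answer and "]" in answer
```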
But the thing to keep in mind is that evaluations are slow if you're doing GPT metrics, right? 01:19:12.140 |
I mean, generally they're slow because all of these calls are slow. 01:19:15.140 |
You saw how much time it took to get back a response, right? 01:19:18.140 |
They're much slower than the traditional unit tests. 01:19:20.140 |
So you do not want to casually run an evaluation. 01:19:23.140 |
They're also expensive, first because of the LLM calls happening behind the scenes, 01:19:28.140 |
and if you're using GPT metrics, because I'm doing all these GPT metrics like relevance and groundedness. 01:19:34.140 |
So you want to have a higher barrier to running them than with normal unit tests, right? 01:19:45.140 |
How do you know that something hasn't changed? 01:19:52.140 |
This one is like a repo that works with other repos. 01:19:54.140 |
You don't know if the app has changed behind the scenes. 01:20:04.140 |
So we look at each test and we only rerun them when any of the prompts, when the inputs, basically change. 01:20:09.140 |
So imagine an OpenAI proxy that you could set up; it's similar to what they do. 01:20:15.140 |
I think OpenAI has the seed variable, which is basically caching, but they don't tell you. 01:20:21.140 |
And it's basically: if nothing changes in the prompt, it just sends back the old response. 01:20:26.140 |
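A minimal sketch of that caching idea, assuming an OpenAI-style client: key the stored response on a hash of the full request and only call the model when the inputs change.

```python
import hashlib
import json

class PromptCache:
    """Only call the model when the request actually changes."""
    def __init__(self, client):
        self.client = client
        self.store = {}

    def complete(self, **request):
        key = hashlib.sha256(json.dumps(request, sort_keys=True).encode()).hexdigest()
        if key not in self.store:
            self.store[key] = self.client.chat.completions.create(**request)
        return self.store[key]
```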
Oh, so you implement caching in Copilot chat, you mean? 01:20:32.140 |
Not in Copilot, in our testing infrastructure. 01:20:35.140 |
Some people also implement caching in the RAG application itself. 01:20:38.140 |
I still don't know how often you're going to get the same question. 01:20:50.140 |
So can this also, like, is this just for OpenAI, or can this work with Mistral and all the others? 01:21:02.140 |
I mean, mine, this one, I just hit up the URL and get back the answer. 01:21:14.140 |
So with the starter templates, right now they're all configured with OpenAI. 01:21:18.140 |
And so you can swap out different OpenAI models, like you could do before. 01:21:22.140 |
But they don't work with the new non-OpenAI models, because we can't necessarily use the OpenAI SDK with them. 01:21:31.140 |
I think there is actually a way to use the OpenAI SDK with them, but we're supposed to pretend there isn't. 01:21:35.140 |
So there is this new SDK, and I haven't messed with it yet. 01:21:40.140 |
I don't know if you have, but Azure AI Inference, have you seen it? 01:21:49.140 |
And yeah, so this is, this is what to use for everything that's not OpenAI. 01:22:00.140 |
The thing I don't love about this is that this is Azure specific. 01:22:03.140 |
Because right now we use the OpenAI SDK, which is not Azure specific exactly. 01:22:12.140 |
So if we ported to this, then probably it would just work with everything. 01:22:20.140 |
So we just have to decide whether to port everything over to this so that we can use all the models. 01:22:39.140 |
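Roughly, using that SDK looks something like this; the endpoint, key, and messages are placeholders, and the library may have shifted since, so treat it as a sketch rather than the definitive API:

```python
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

# Placeholder endpoint and key for a deployed non-OpenAI model.
client = ChatCompletionsClient(
    endpoint="https://<your-endpoint>.inference.ai.azure.com",
    credential=AzureKeyCredential("<your-key>"),
)

response = client.complete(
    messages=[
        SystemMessage(content="You answer questions about the employee handbook."),
        UserMessage(content="What does the plan cover for eye exams?"),
    ],
)
print(response.choices[0].message.content)
```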
But we would also need to make the Bicep for it. 01:22:41.140 |
That's the other thing I haven't done, because I try to set up Bicep for everything. 01:22:46.140 |
So typically Bicep creates your Azure OpenAI instance. 01:22:49.140 |
If you're using Mistral or Llama, you'd probably want Bicep to create that as well. 01:22:56.140 |
And so that would be a different Bicep addition. 01:23:02.140 |
Because basically what you do is you go to the issue tracker and you file a request. 01:23:05.140 |
And then if enough people ask for it, we're like, okay, guess we're going to do it then. 01:23:13.140 |
Um, but that's how we figure out what, you know, what it is that people are looking for. 01:23:17.140 |
Because it is really nice to be able to swap out models. 01:23:19.140 |
Because right now all of the samples do work with Ollama. 01:23:21.140 |
So if you have Ollama running locally, here's my little Ollama up there. 01:23:26.140 |
You know, you can run Phi-3 and stuff like that. 01:23:29.140 |
You just go to your terminal and you're like, ollama run. 01:23:33.140 |
I don't know if I typed phi3 correctly, but they do all run with Ollama, 01:23:42.140 |
but none of the Ollama models have really been sufficient for RAG in my experience. 01:23:47.140 |
Like, I run them just to check, but they all fail to follow directions, 01:23:54.140 |
because I just think they don't have enough parameters. 01:23:56.140 |
Like, these are 3B, 7B, et cetera. 01:23:59.140 |
So they don't provide citations correctly. 01:24:03.140 |
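For reference, those local checks are easy to wire up, because Ollama exposes an OpenAI-compatible endpoint, so the same client code can just be pointed at a local model; a sketch, with the model name being whatever you've pulled:

```python
from openai import OpenAI

# Ollama serves an OpenAI-compatible API locally; the API key is ignored.
client = OpenAI(base_url="http://localhost:11434/v1", api_key="ollama")

response = client.chat.completions.create(
    model="phi3",  # whatever model you've pulled with `ollama run`
    messages=[{"role": "user", "content": "Summarize what RAG is in one sentence."}],
)
print(response.choices[0].message.content)
```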
Have you had more success with, like, Phi-3 mini? 01:24:06.140 |
You see, with every model, of course, the prompt changes. 01:24:11.140 |
So out of the gate, I haven't had success using any of the small language models for RAG. 01:24:18.140 |
I'm sure the big versions of them would work much better. 01:24:21.140 |
So I do want to try out, like, the 70B. I've done up to 7B, because that's as far as I can go locally. 01:24:29.140 |
I can't go much more than that, just for space reasons. 01:24:32.140 |
So for them, what happens is that like they'll answer the questions fine. 01:24:36.140 |
The issue is that we need citations to be in a good format, 01:24:39.140 |
because these actually come back in square brackets. 01:24:42.140 |
And they just don't reliably come back with square-bracketed citations, which doesn't sound like a big deal, but we're trying to make clickable citations here. 01:24:50.140 |
So that's the issue I've had: I think they're fine at synthesizing the information, but they don't follow the syntax directions in terms of the citations. 01:24:59.140 |
And they're maybe more likely to make stuff up if I ask an off-topic question. 01:25:15.140 |
I think that's where finding that one thing helps, maybe not the expert full answers that follow the format, but one of the smaller parts. 01:25:26.140 |
But most of them don't support function calling out of the box. 01:25:30.140 |
So would you do re-ranking with just a simple prompt? You'd have to figure out what syntax they come back with. 01:25:40.140 |
If you can turn something into a coding task, you're good. 01:25:44.140 |
That's another form of RAG. Like I was telling someone last time, these are all doing kind of RAG on just a few documents at a time. 01:25:54.140 |
If you're trying to analyze a whole database or a huge number of documents, then you really want to actually use a SQL query, like with aggregate functions, or do a pandas query. 01:26:06.140 |
So at PyCon, we did a demo where you like upload a CSV and then you, you say like, oh, I want to count the top restaurants in it. 01:26:13.140 |
And then it just comes up with the pandas code and then it runs the pandas code in a sandbox environment. 01:26:19.140 |
So that's another increasingly common form of RAG: if you want to come up with insights and analysis and that sort of thing, then you want to consider a different architecture where you actually have the LLM generate pandas code or SQL code. 01:26:35.140 |
It's very good at both of those, and then you run those in a safe way. 01:27:07.140 |
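A minimal sketch of that flow, with the model name, CSV, and question all as placeholder assumptions; in a real app the generated code has to run in a proper sandbox (a container or restricted worker), and the exec() here is only to show the shape:

```python
import io
import contextlib
import pandas as pd
from openai import OpenAI

client = OpenAI()
df = pd.read_csv("restaurants.csv")  # hypothetical uploaded CSV

# Ask the model for pandas code that answers the question over this dataframe.
prompt = (
    "You are given a pandas DataFrame named df with columns "
    f"{list(df.columns)}. Write Python code that prints the top 5 "
    "restaurants by number of reviews. Return only code, no markdown."
)
code = client.chat.completions.create(
    model="gpt-4o",  # placeholder model choice
    messages=[{"role": "user", "content": prompt}],
).choices[0].message.content

# WARNING: only for illustration; run generated code in a sandbox in practice.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    exec(code, {"df": df, "pd": pd})
print(buffer.getvalue())
```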
So yeah, what you're describing, that's the same, the same approach. 01:27:11.140 |
So that would be, yeah, we're trying. I know Daniel's actually experimenting with TypeChat; Daniel's the creator of TypeChat. 01:27:21.140 |
He is experimenting with TypeChat with the local models. 01:27:26.140 |
Because we did also try TypeChat with Phi-3 locally, to see if we could use it instead of function calling with OpenAI. 01:27:38.140 |
But I think Daniel maybe has to tweak the prompts, and maybe they'll end up working better. 01:27:46.140 |
Maybe the bigger one, but not, not the smaller one. 01:28:18.140 |
Well, you have the passes for seven days, so feel free to keep deploying. 01:28:26.140 |
If you have any feedback for the workshop, tell us, or we have a survey, which I assume