This is the workshop on developing a production-level RAG workflow, so you're in the right place if you want to learn how to build the backend for a chat application that works with OpenAI and builds its answers from information we draw out of databases and vector databases. We'll see all about that in this presentation today.

My name is David Smith. I'm a principal AI advocate at Microsoft. I've been with Microsoft for about eight years now, since my startup in the big data space was acquired, and I've been there ever since. My background is in data science; I also did a lot of work as a statistician, and these days I'm a specialist in AI engineering. I have with me today two other members from Microsoft who are also specialists in AI engineering. First I'd like to introduce Cedric Vidal, who's on my team in AI advocacy. Would you mind introducing yourself, Cedric?

Hello everyone. Like David said, I'm Cedric Vidal, a principal AI advocate at Microsoft. I have a background in AI, self-driving cars, software design, architecture, and everything in between. I've been working in this space for 20 years, and today I'm going to help David with the workshop. Welcome, everyone.

Thank you, Cedric. We've also got Miguel Martinez, who has come all the way from Houston and is a technical specialist at Microsoft. Miguel, tell the crowd a little bit about yourself.

Absolutely. Hello everyone, and welcome. My name is Miguel Martinez, and I'm a senior technical specialist for data and AI at Microsoft. A lot of our clients, whether they're startups or established businesses, hear about OpenAI and ChatGPT and all of those things, and they think: how can I actually use this for my business? How can I use these tools to drive business value? That's where my team and I come in; we help our clients develop new solutions to drive that value.

All right. Cedric and Miguel will be here for the next two hours helping you as you go through this workshop, so they'll be wandering around, and if you need any help during the workshop, raise your hand and one of the three of us will come and help you. Thanks, guys.

All right, let's jump right in. If you would like to get started, you can use the URL on your screen right there; just pop open a browser on your laptop. I'll give you more information about what's going on when you get there, but you should be able to follow along and get started if you like. This workshop is hands-on, so you will need your own laptop, and it looks like everybody has one, which is great. This is not something you'll be able to do on your phone or a tablet, because there's a lot of work we'll be doing in a GitHub repository. And that's the second thing you'll need to participate: a GitHub account. If you don't yet have one, please go ahead now to github.com/signup and create a new account. We'll be using the GitHub Codespaces feature to provide our development environment, and the free tier of GitHub Codespaces is more than sufficient for the work we'll be doing today.

We are going to be building an application in the Azure cloud, but you do not need an Azure account for this workshop. We're going to provide you with a login to an Azure account, and we will have already set up all the resources you need to work with this application. Specifically, that's things like Azure AI Search, the vector database; Cosmos DB, the database we're going to use for our customer information; connections to OpenAI, which we're going to use for our LLM; and various other resources and tools in Azure. You can do all of this on your own if you have your own Azure account and are willing to spend a few dollars in credits to run the resources for a few hours. By the end of this workshop you will have all of the information, code, and data you need to recreate everything I show you here today. In fact, one of the first things we'll do is fork a repository into your own GitHub account, and everything you need will be right there. If you would like to run through this at home, you will need an Azure account, and you can create one at the link you see on the screen.

Before we jump in and I start giving you some demos of how this works, let me orient you a little. The link I gave you on the last slide will have launched you into a virtual machine. It's a Windows virtual machine, but we're not going to use Windows at all; the only thing we're using the virtual machine for is the workshop instructions, which you can see on the right-hand side of the screen in front of you. You'll go through those page by page. The instruction guide has some nice tools: wherever you see green text, you can click on it and it will be pasted directly into your browser, which saves you time when entering passwords, URLs, and so forth. I will mention that because we're in a virtual machine, if you find it slow or laggy over the Wi-Fi, or if you just prefer to use your own browser on your own laptop, you can totally do that. Just open a browser and follow the same instructions; the only difference is that you'll have to open the virtual machine now and again to read the instructions, and manually copy and paste the green parts from the guide into your own browser. It's your choice which way you go.

When we come to actually running shell commands and so on, the dev environment we're going to have you use is GitHub Codespaces. You could run all of this on your local desktop, but then you'd have to make sure you have VS Code installed, all the right Python libraries installed, and so on. To keep things easy, we're asking everybody to go straight into GitHub Codespaces, where everything is set up, and you can do exactly the same thing for yourself at home. If you're not familiar with GitHub or with GitHub Codespaces, just pop your hand up when we get started and we'll come and chat with you about how it works. If you've used Codespaces before, it should be pretty straightforward.

All right, so this is what we're going to do today. We are going to build this app right here, or at least the backend to it, and have you interact with that backend through a little UI. If you want to build the frontend as well, we provide the code for the frontend and all the data; that's also linked from the GitHub repository, so you'll have access to it.
But I'm going to give you a little demo of the website just to set the scene, so let me go over to my browser; I think I have it open. The idea here is that we are engineers working for a retail company, something like REI, that sells camping equipment, backpacks, trail hiking shoes, things like that. They already have a website, which you can scroll through to see the products they have available. It's a pretty limited selection, just to keep things simple, and we can click through to see product information for each of the products this company sells. For these TrailBlaze hiking pants we've got lots of detail about the features; there are reviews, FAQs, a return policy, some cautions, technical specifications, a user guide, and care and maintenance instructions. There's lots and lots of information available to the customer about all the products in this store.

In addition, this storefront has a customer login feature. In this particular example, the customer Sarah Lee is already logged into the system. What we're going to build here is a chatbot that operates on this website. You'll be able to click on the chatbot button and ask a question, for example "What can you do?", and that connects to the system we're going to build to get an answer. It's a pretty simple question, answered straight from the LLM and its context, which tells it that as an AI agent it can provide information about our products to help with purchases, and so on. Now, we can also ask questions about specific products. For example, let me paste one in: "What is a good tent that goes with the TrailWalker shoes?" In this case it's actually consulting all the product information available on the website to formulate the answer with the LLM, and it comes up with: for your upcoming trip to Andalusia, I recommend pairing your TrailWalker hiking shoes with the TrailMaster X4 tent. We're also going to provide the LLM with information about the customers themselves: their name, their status in the loyalty program, where they live, and their order history. So it can answer questions like "What have I already purchased from you?" Here the LLM consults the customer's purchase history and answers that Sarah Lee is a valued customer who has purchased the CozyNights sleeping bag, the TrailBlaze hiking tent, and so on. This is the system you'll build on the backend, so the chatbot can use an LLM like OpenAI's to answer these kinds of questions based on the customer's purchase history and the products available from this particular retail store. Any questions on that so far? Very straightforward? Okay, great.

All right, we are going to build this on Azure, using various Azure resources which we will provide to you. We're going to use the Azure AI Studio platform to manage the large language models and the various resources we're working with. We're going to use Azure AI Services, which provides our various tools, in particular the OpenAI models the chatbot uses to generate its responses. We'll build an Azure AI project, which manages all the flows we need for the chatbot to gather its information and generate its responses. The product database is going to be stored in Azure AI Search. Azure AI Search is a vector database, or at least it has a vector database feature, which we're going to use to match the customer's question to the most relevant products, and we'll use those to provide context to the chatbot. Lastly, the customer information is going to live in a regular rows-and-columns relational-style database, from which we will extract the customer's information and order history.

Okay, so let's look at the architecture we're building for the backend in a little more detail. The customer types a question into the chatbot on the website, and that gets extracted and sent to an endpoint as a simple string, along with the customer's chat history. The question gets sent to Azure OpenAI, where it's embedded into a vector format. If you're not familiar with embedding, it's basically the idea of converting a piece of text into a point in multi-dimensional space, in such a way that among the other points we have already defined, by embedding the product information pages, the ones closest to the question are the products most relevant to it. We use that to extract the most relevant products and push them into the context of the large language model. At the same time, we also extract information about the customer and their purchase history from Cosmos DB. So: we do the embedding, we do the search in Azure AI Search to get the product information, we extract the customer information from Cosmos DB, and we feed all of that into a prompt. We create a large prompt with information about the problem to solve, the customer information, the relevant products from the Azure AI Search vector search, and the customer's purchase history, and then we generate a response based on the customer's question. Any questions so far?

All right, we'll get right into it shortly. I'll come back to the details a little later on, but we're going to use a tool within Azure AI Studio called Prompt Flow. What we're doing is taking that same retrieval augmented generation process I just outlined: take the question, retrieve related data (in this case our product data), augment the prompt with the information that's come from the knowledge base, generate a response with the large language model, and then send the result back to the user. We'll create a version of that retrieval augmented generation flow using Prompt Flow, a development tool in Azure AI Studio that streamlines the process of putting all this data together: we take the inputs, we embed them, we retrieve information from AI Search, we look up customer information from the database, and we put it all together with the customer prompt to generate the response the customer sees on the website. That's what you will be doing.
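As a reference point, the end-to-end flow we just described boils down to something like the following Python sketch. It is illustrative only: the index name, field names, database and container names, deployment names, and environment variable names are placeholders rather than the exact ones in the workshop repo, and in the lab this logic lives inside a Prompt Flow rather than a standalone script.

```python
# Minimal RAG sketch (illustrative): embed the question, find related products
# in Azure AI Search, pull the customer record from Cosmos DB, then ask
# Azure OpenAI to answer with all of that pushed into the prompt.
import os

from azure.core.credentials import AzureKeyCredential
from azure.cosmos import CosmosClient
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery
from openai import AzureOpenAI

aoai = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_KEY"],
    api_version="2024-02-01",
)
search = SearchClient(
    endpoint=os.environ["AZURE_SEARCH_ENDPOINT"],
    index_name="contoso-products",  # placeholder index name
    credential=AzureKeyCredential(os.environ["AZURE_SEARCH_KEY"]),
)
cosmos = CosmosClient(os.environ["COSMOS_ENDPOINT"], os.environ["COSMOS_KEY"])
customers = cosmos.get_database_client("contoso-outdoor").get_container_client("customers")


def answer(question: str, customer_id: str) -> str:
    # 1. Embed the customer's question (deployment name is a placeholder).
    vector = aoai.embeddings.create(
        model="text-embedding-ada-002", input=question
    ).data[0].embedding

    # 2. Retrieve the most relevant product documents from the vector index.
    hits = search.search(
        search_text=None,
        vector_queries=[VectorizedQuery(vector=vector, k_nearest_neighbors=3,
                                        fields="contentVector")],
    )
    products = "\n\n".join(hit["content"] for hit in hits)

    # 3. Look up the customer and their order history in Cosmos DB.
    customer = customers.read_item(item=customer_id, partition_key=customer_id)

    # 4. Augment the prompt and generate the response.
    messages = [
        {"role": "system",
         "content": "You help customers of an outdoor-gear store.\n"
                    f"Customer record: {customer}\n"
                    f"Relevant products:\n{products}"},
        {"role": "user", "content": question},
    ]
    chat = aoai.chat.completions.create(model="gpt-35-turbo", messages=messages)
    return chat.choices[0].message.content
```

In the workshop, Prompt Flow splits these same four steps into separate nodes, which is what lets you inspect, debug, and later evaluate each stage individually.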
So with that, let me go back to the start. Has anybody not yet been able to log in to that website? Is anybody having trouble? If so, Cedric and Miguel will help you; the password will come up in just a second. All right, let me actually show you. I've gone to that website, and you should have come to a page just like this, and we can go ahead and launch the virtual machine. On the right-hand side of the virtual machine are the instructions; in the bottom right-hand corner is the Next button that takes you to the next page of the instructions, and that's the page where you'll find the password for the Windows machine, which is, curiously, "password". From there you should be able to log into the Windows virtual machine.

The next step after that, and I can go to the next step here, is to open up the browser. I'm doing this directly within the virtual machine. We're going to browse to a particular GitHub repository that we've provided for this workshop. You can click on the link there and have it typed into the browser in the virtual machine, or you can copy and paste it into your own browser. You'll need to log in to GitHub at this point. Just a slight warning: if you use single sign-on to access GitHub (this happens to us at Microsoft), that does not work through the virtual machine. So I'm going to show you doing this directly from my own browser, where I've already logged in to GitHub, and we're going to use Codespaces. It looks like I've actually already started a Codespace here, so let me go ahead.

The instructions will tell you to launch a Codespace, but I forgot one very important step before you launch any Codespaces: we're going to switch to a different branch. Rather than using the main branch, which is the one you'd use if you want to do this at home, we've got a special branch for this lab called msbuild-lab322. The only difference is that this version of the lab skips all the deployment instructions, because we've done that for you already. Then, from that branch of the repository, we launch a Codespace on that specific branch. I'll show you in just a moment; let me open one I already have here. This is what happens once a Codespace is set up. It takes a couple of minutes for a Codespace to warm up, but this one should already be warm. What we have here, in case you've never seen it before, is basically an instance of Visual Studio Code running in the GitHub cloud, directly in your browser. It's the same user interface as Visual Studio Code, just running in your browser. If you're experienced with Codespaces, you probably know you can also run this from within Visual Studio Code on your desktop and connect to that instance; if you've got questions about that, I'm happy to show you.

One of the things we are going to do with Codespaces (and it looks like I've already closed the terminal) is log in to your Azure account at the terminal. The instructions for that, and if you've been following along you might be ahead of me already, are to log in to the portal, and I'll do this directly within the local browser in this case. This is the username we provided to you for the temporary Azure account; it will only exist for the duration of the workshop, and everything will be deleted once we're finished. One thing you'll probably find, because we're in a brand new Windows virtual machine, is that it thinks you're a brand new user to Windows and to Azure, so it keeps popping up hints. If you're familiar with Azure you can just dismiss them, or if you'd like an introduction to Azure you can go through them here or at home. The trick to getting back to the starting place is to go to the Home button.
I won't go through these steps on the screen for you, but you'll be following the steps here to have a look at the resource groups we have available, including the resource group we provided to you in Azure, and you'll be able to inspect all the resources we've already deployed for you and that you'll be working with.

Lastly, let me show you this last part: logging in from Visual Studio Code. Let me show an example of copying; sorry, I don't normally do it this way, and I'm not using my usual laptop, so where's my copy button? What I'll do, and this might be a neat trick, is paste it directly up here and copy it back into my other Visual Studio Code window. There we go; it's usually easier than that. Just to show you the process: the way you connect your Visual Studio Code instance in GitHub Codespaces with your Azure account is with az login. After a moment it gives me an eight-digit code, which I copy, and then I go to the device login page to actually log into Azure. Now all the commands that I run within the Visual Studio Code terminal will run against the Azure account we've set up for you. Those are the main things to watch out for as you get started. Just continue to work through the instructions as you go; Cedric, Miguel, and I will be wandering around to help you, so put your hand up if you get stuck, and in about 20 minutes or so I'll jump in and give you a bit more context about the prompt flow we're going to define.

If you're already logged into GitHub, it'll just continue to use that login. As long as you're logged in, you're good. And when you forked it, did you uncheck "Copy the main branch only"? Oh, I haven't forked it yet. Okay, good. So this is what you've got to fork, the contoso-chat repo. Click the Fork button first of all, which is up here, and be sure to uncheck that box. The nice thing is that whatever GitHub account you're using, you'll have this in that account to work from when you get home. Oh yeah, I love that; the setup's awesome. Okay, so for the repo, we just did that a second ago. Yep, you just skipped ahead. Oh, I did, right. Now go to that branch, the msbuild-lab322 one, then go to Code, and then Codespaces. You've never tried Codespaces before? So click on the Codespaces tab up there, and then click Create codespace. Is that available because of my private account? It's limited in how much you can use, something like 20 hours' worth of compute, but it's a nice consistent environment, and what we do on our end, which we do for all of our samples, is we make
sure that when you launch a Codespace, you get a command line that has everything installed that you need, so there's no setup on your side. Yep, it's doing it. It's going to take probably 60 seconds or so, so what we actually recommend is that you go on to the next step and come back to it. All right, I'll let you get to it.

So you gave me an Azure account, with the username and password, and I was able to log in. Oh wait, it's right here. Yep, and you have all the resources already deployed for you, because some of them take a little while to provision; for these workshop environments it makes it easy for people to jump straight in. And none of this is AI Studio, though? One of them will be. You should have eleven resources, and one of them is an Azure AI project, so that's an AI Studio project. And in the next step we can actually log into AI Studio.
I'm not familiar with that one; I think you will just have to wait for it to become available.
Nope, there you go. Thank you. Everything going all right so far? Cool. It should be the second tab across, I think. Yep, that one is Codespaces, the one before; this tab right here, where it says it's loading the preview, that's Codespaces right now. So it's showing you the contents of the repository, with a command line connected to that file system. Sorry, I'm so used to just using VS Code locally. It doesn't seem like you're inside the repository, though? Yeah, I'm not. I don't know what happened; it used to show all the files, it showed the Explorer and everything else, right? I think it just stopped. I did it earlier, so I think it just wiped itself or something.

It's really odd, because I did that earlier. Okay, that's okay. No, I had it earlier. Incorrect? Yeah, I did that the first time. I had it, actually, but I logged into this ages ago, so maybe that's the reason it's not loading. Potentially. Here we go, now you can open the browser. Oh, I did that earlier; I don't know why it wiped everything. Because of the tab that you opened outside? Or no, that's okay. Yeah, that's okay. I'm just saying, I did this earlier; that's why I'm thinking I already have it. Yeah, I guess you need to do it again, I'm sorry. I said you should be able to just click on it and it should fill it in for you.

You don't have to click on it. Oh, cool. Well, no, the reason I'm saying it is that I had this load, I had everything else load; the only place I stopped was after I did this. No, I understand that. I'm just saying your stuff is buggy, because it shouldn't stop working across windows. Say that again? Oh, no, I'm just catching up to you guys; I'm just logging into GitHub right now. Yeah, that's where we are. There we go. Just to avoid confusion, you might want to close the other tabs, otherwise it gets confusing. No, that's why I'm confusing it; these are other tests.
The issue I'm having is you're saying this thing should be there, and I already have that. Okay, I've got it; that's what I was trying to do. Got it. So, copy that and then join me. Yeah, so that's why I was saying that. Oh, because I didn't lose this. Got it, got it.
Okay, that's what's confusing. Yes. No, you could just open the one you had already created. Here. Yes, that worked; that's mine. The issue I have is... It sometimes takes about 60 seconds for the Codespace to pop up, and you do need to hit Enter in the terminal afterwards as well to confirm. Yeah, I'll take a look. Yes, in this browser, and then open the portal; basically redo what you've done, but do it inside the VM instead of your macOS browser, is what I'm saying. Okay. Yeah, we've had... the Azure domain. Yeah, we haven't used Azure AI Studio; I think that's the reason, because there's been like a big data breach. Okay. Can you... yeah. Oh, you tried that, by the way? Okay, thank you. So, can you... yes. Oh, I said copy that. Yeah, that's right: you're connected, and you can close that tab now; you won't need it again. And just press Enter on that. So right now it's going to create the tables and the data. Yeah, now you should be good. Yeah, except you're in your macOS browser. I mean, yes, but do it inside the VM browser. Yeah, but you already have the Codespace opened here.
Let me show you where you'll get to once you reach about step seven. I've gone through the steps of logging into GitHub, cloning the repository, launching Codespaces, logging into the Azure portal and AI Studio from the browser, and also logging into Azure from the Visual Studio Code command line. So there I can run some commands, and I've run this post-provision command here. You're welcome to have a look at what that script does, but let me actually show you what it did. If you go to the home page of the Azure portal, or use the Azure CLI if you prefer,
you can have a look at the list of resources we've deployed for you into a single resource group called Contoso Chat RG, and you can see there are 11 resources we've launched for you. What that script did was populate some data into Cosmos DB and into Azure AI Search.
Let's have a look first at Cosmos DB, the Azure Cosmos DB account. Here in the portal I can go in and open the Data Explorer. Okay, there's a video to watch if you're interested; I'm not. What you can see is that within our Contoso Outdoor Cosmos database we now have a customers container, and you can drill in there, if you're familiar with using databases, and look at the data. There are about 12 customers in there, along with their purchase history.
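For reference, the customer side of that data-population step amounts to something like the sketch below: one JSON document per customer, with the order history embedded, upserted into Cosmos DB. The database name, container name, and file path are placeholders rather than the exact values the workshop script uses.

```python
# Rough sketch of populating the customers container in Cosmos DB.
import json
import os
import pathlib

from azure.cosmos import CosmosClient, PartitionKey

client = CosmosClient(os.environ["COSMOS_ENDPOINT"], os.environ["COSMOS_KEY"])
database = client.create_database_if_not_exists("contoso-outdoor")
container = database.create_container_if_not_exists(
    id="customers", partition_key=PartitionKey(path="/id")
)

for doc in pathlib.Path("data/customer_info").glob("*.json"):  # placeholder path
    # Each document carries the customer profile plus their order history.
    container.upsert_item(json.loads(doc.read_text()))
```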
Likewise, I can go back to the resource group and open the Azure AI Search resource, which is the one called Search Service. You can go to Indexes in the sidebar, under Search Management. By the way, if you can't see the sidebar (it happens if you're using a very small laptop screen or a large font), it might be hidden behind the hamburger menu in the corner. Go to Indexes. Oh, that's a problem: I've got "no indexes found", so I must have missed a step. Hopefully you've got indexes there; if not, we'll come back and check. I don't think I went through all the steps to actually pre-provision things, so we'll go back and look at that again.
Okay, I just wanted to give you a preview of where you're heading. As always, any questions, just pop your hand up. Did I? Yeah, probably; I skipped ahead a bunch of stuff. Oh, there we go, a bunch of it. Did you get errors too? Yeah, I got errors as well. Oh, okay, permission errors. Let's see if I can figure out what's going on here. Right. Yes. I wonder why that happened. Oh, you know what? I know. I skipped a step, and I know which one. We have one good question from the audience. Yes.
What's the question? So, I was just looking at the prompt flow example toward the end of step seven, where we're looking at the graph of the pre-existing prompt flow that's been created. One of the questions I had is: can the graph also be cyclical? Can the graph be cyclical? No, it's a directed acyclic graph, and I don't think there's any support for any kind of looping like that. Is there a reason why you'd want it to be cyclical? Yeah, for example, ReAct agents. Well, within a single component of that prompt flow, you can do anything you want.
It could be any Python code, so if you need to do that kind of interactivity within one of those nodes, you can do that. But the flow itself is acyclic. Thank you very much.
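To make that concrete: a node in the DAG is just a Python function marked as a Prompt Flow tool, so loops or agent-style logic can live inside a single node even though the graph around it stays acyclic. A minimal, hypothetical sketch; the function and its inputs are made up for illustration:

```python
# Hypothetical Prompt Flow node: arbitrary Python (including loops) runs inside
# one DAG node, while the graph connecting nodes remains acyclic.
from promptflow import tool


@tool
def pick_best_match(question: str, product_names: list) -> str:
    """Return the product name sharing the most words with the question."""
    best_name, best_overlap = "", -1
    question_words = set(question.lower().split())
    for name in product_names:  # plain Python loop inside the node
        overlap = len(question_words & set(name.lower().split()))
        if overlap > best_overlap:
            best_name, best_overlap = name, overlap
    return best_name
```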
Yeah, it worked on the second time. I realized the step that I skipped, because I was jumping ahead, was the bit where we create the .env file. So let me check; I'm trying to remember which step that was. Done that one. All right, this is the one I didn't do. Yeah, I skipped this step. Oops, clicked too late. Hm? Okay. Oh, what is that now? Did I log into this one? Let me make sure I did.
We have another good question here. Sure, go ahead. Again, about the Prompt Flow DAG: there's a lot of power in the DAG description we can build. One thing we'd like to explore, and that we're currently exploring with a couple of our clients, is how to call these kinds of DAGs from enterprise GPTs using the Actions framework, going through a middleware layer where the DAG resides. What I'd like to understand is: what are our options for exposing this as an endpoint to external callers other than our web app? That's actually going to be about step 11 or 12; we're going to deploy it as an endpoint and then connect it to the website.
Okay, I figured out what I did wrong: I completely skipped step six, which is an important step. You can confirm whether you've done step six by having a look in the Explorer in Visual Studio Code. Once you've done that, you should have a file called .env, and that is where we've set up all of the endpoints and keys you will need to access the resources we provided for you. You'll also have a config.json file, which does a similar kind of thing for AI Studio.
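As a rough illustration of how the later scripts can pick those values up, here is a small sketch; the variable names below are examples, not necessarily the exact keys used in the repo's .env.

```python
# Illustrative only: load the .env created in step six and read a few of the
# endpoints and keys from the environment. Key names here are examples.
import os

from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # picks up .env from the repository root

search_endpoint = os.environ["AZURE_SEARCH_ENDPOINT"]   # example key name
cosmos_endpoint = os.environ["COSMOS_ENDPOINT"]         # example key name
openai_endpoint = os.environ["AZURE_OPENAI_ENDPOINT"]   # example key name
```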
And then, once that is all set up, I should be able to go back and run this pre-provision script.
Are you in the right directory? It looks all right. I think it's because I ran the script already once; I'll just have to do it again. Yeah, that's right. So one of the things we did in that pre-provision script was to take each of the markdown files that are in the repository; there's one markdown file per product that the company sells. We then script indexing those into the AI Search database, which essentially converts each entire markdown file into one point in multi-dimensional space. Is it actually chunking it? Actually, no; in this example we don't chunk, just for simplicity. If you do it through AI Studio, with search over your own data, it does do the chunking for you there.
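For reference, that indexing step looks roughly like the sketch below: embed each product markdown file as a whole (no chunking) and upload it as one searchable document with a vector. The index name, field names, file path, and embedding deployment are placeholders, and the index itself is assumed to already exist with matching fields.

```python
# Rough sketch of the product indexing step from the pre-provision script.
import os
import pathlib

from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from openai import AzureOpenAI

aoai = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_KEY"],
    api_version="2024-02-01",
)
search = SearchClient(
    endpoint=os.environ["AZURE_SEARCH_ENDPOINT"],
    index_name="contoso-products",  # placeholder index name
    credential=AzureKeyCredential(os.environ["AZURE_SEARCH_KEY"]),
)

documents = []
for md in pathlib.Path("data/product_info").glob("*.md"):  # placeholder path
    text = md.read_text()
    vector = aoai.embeddings.create(          # one embedding per whole file
        model="text-embedding-ada-002", input=text
    ).data[0].embedding
    documents.append({"id": md.stem, "content": text, "contentVector": vector})

search.upload_documents(documents=documents)
```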
Question: while we're getting going, how do Prompt Flow, AutoGen, and Semantic Kernel all come together? Are they all competing projects at Microsoft, or...? Not so much competing; they have slightly different purposes. Let's start with Semantic Kernel and AutoGen. Both of those are orchestrators, and they come from different parts of Microsoft. Semantic Kernel is the one we view as the enterprise product: it's designed for enterprise settings, with strict versioning, managed API changes, all those kinds of things. Like LangChain and other orchestrators, you can use it to connect different tools together in different environments. AutoGen serves a similar purpose, but it comes out of Microsoft Research, so it's a little more cutting edge and a little more flexible, based on a slightly different paradigm. But that's not the one we recommend for enterprise applications today. Prompt Flow is a different beast again: it lives directly within the AI Studio product, and it's purely for orchestrating within an endpoint that you deploy through AI Studio. One endpoint. The whole purpose is to create one endpoint that runs through, in this case, a RAG process.
But it's designed to be more flexible than just RAG. You could use Semantic Kernel to manage multiple Prompt Flows as endpoints. And also, you can embed Prompt Flow as a library: deploying it as an endpoint is one possibility, but it can also be used to write integration tests and evaluation tests, or be embedded in your application. If you have a Python-native application and you want to embed a DAG in it, that's also interesting. Is there anything like Prompt Flow that sits on top of Semantic Kernel to manage multiple endpoints? Or just the graphical side of it? I don't think so. And honestly, when you work with Prompt Flow for any length of time, you'll be working with the YAML files. It's nice to have that picture; it's great for these workshops, because I can point to things and show you how they connect, and it's great for debugging, because you can actually see how the data is flowing through the thing. But in terms of an editing environment, that's not really what it's for. We also have the Python DSL version now: in addition to the graphical, YAML-based DAG, you can use Prompt Flow in a more programmatic, code-first way, where you write the Prompt Flow configuration in Python instead of graphically, as an alternative. To be honest, I'm not sure how generally available that is yet; to be verified. But ultimately, this is the representation of the prompt flow: a YAML file that defines each node, with a set of tags for its inputs and outputs, how it connects to the various endpoints, and the types of nodes that we provide
in AI Studio. If you're running that as a Python library, at that point is it fairly similar to using LangChain in Python? Exactly. So there's a command line, which is what we're doing in this workshop: you run that prompt flow with a given set of inputs to generate the outputs. Correct, and that provides a more LangChain-like experience. Right. Cool. AutoGen could, in theory, do similar things, but it's more of a labs project, not ready for prime time. Yeah, like David was saying, the teams at Microsoft that work on those projects are very different, and they have very different goals. Prompt Flow and, I'm blanking, Semantic Kernel are very product-oriented, so they follow a strict software release lifecycle, whereas the other team, the AutoGen team, is really research: they try the latest cutting-edge AI ideas, and they also use it for papers. So if you use AutoGen you're literally on the bleeding edge, and things might break, and things are experimental. So you might want to do that, and it might work for your application, but you need to know what you're getting into. Yeah, we used it for a hackathon and were able to put together an agentic-type flow really quickly with it, so we liked it, but it's good to know we need to be a little careful with it.
Yeah, definitely. Okay, thank you. So Semantic Kernel is more for the orchestration, and it would be closer to the app layer, where I'm building a backend and using Semantic Kernel to orchestrate the LLM and the calls on that side. Prompt Flow is more: I want to build a DAG, manage its lifecycle, potentially do evaluation. Hopefully we'll get to that in the workshop. Yeah. And then deploy it as one endpoint. Exactly. Is that the right way to look at things? Sure. And like I was saying, you can also embed Prompt Flow as a framework, as a library; you don't have to deploy it as an endpoint. But the endpoint is the use case we'll be using in this workshop. And to make things even more confusing, sorry: with Semantic Kernel you can also build agentic applications, so it covers some of AutoGen's use cases too. Simple ones, but you can build agentic applications with it. Personally, I like to use it that way, because in development it makes it easier to write tests and to write snippets of code that you can reuse. It's very convenient for that; it's more of a development convenience than anything else. If you wanted to build, I don't know, a rich Python desktop app where you happen to want to orchestrate LLMs, that would also be a relevant use case.
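Here is a rough sketch of that "embed it as a library" idea: running the flow locally through the promptflow SDK rather than calling a deployed endpoint. It assumes the promptflow package's PFClient.test interface; the flow folder path and input names are placeholders and in practice come from the flow's own flow.dag.yaml.

```python
# Run the flow locally as a library call instead of hitting a deployed endpoint.
from promptflow import PFClient

pf = PFClient()
result = pf.test(
    flow="./contoso-chat",  # folder containing flow.dag.yaml (placeholder path)
    inputs={
        "question": "What tent goes well with the TrailWalker hiking shoes?",
        "customerId": "1",
        "chat_history": [],
    },
)
print(result)
```

The same client is what you would reach for in integration or evaluation tests, since it exercises the full DAG without any deployment step.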
I would say... actually, I can jump in there, Cedric. You'll find Prompt Flow useful when you're managing all your resources within Azure AI Studio. I haven't actually gotten to show you that yet; we'll do that in a minute. But within Azure AI Studio you can create endpoints for OpenAI models, Mistral models, Llama models, anything from the model catalog. You can create connections to other Azure resources like Cosmos DB and AI Search. You can hook it up to our evaluation framework; we'll see that a little later on. It has features for managing the entire lifecycle of the endpoint itself and all the resources required to make that endpoint work. That's where you'll mainly be using Prompt Flow: basically managing the connections between all those resources, with the goal of creating an endpoint. In our example here you'll be calling it from a web app, through a regular REST API endpoint, but likewise, if you wanted to call that same endpoint from within Semantic Kernel or any other orchestrator, you could do that as well.

And if I may add, what I also like about Prompt Flow is that it's very well integrated into AI Studio, so you have the visual aspect of it. And if you go to the playground in AI Studio, you can configure a RAG application, or a code interpreter application, and export it to Prompt Flow. So instead of having to come up with a whole design for a Prompt Flow based application, you can just use the playground, configure it the way you want, export it, and have a ready-to-use Prompt Flow application that you can reuse and embed in your project. So you also have a workflow that goes from the UI to the code that way.
Which you don't have with Semantic Kernel. If you'd like to play around with that side of it, it's not part of the workshop, but feel free, because you already have an instance of Azure AI Studio running in your virtual machine. You go through the steps of selecting the one project we have there, which is called Contoso Chat SF AI project. One of the things you might want to play around with is the playground. We've provided you with GPT-4 and GPT-3.5 Turbo deployments, I believe, and this is the place where you can use the playground to test out the endpoints you've created. It's the same place where, when you get into the prompt flow a little later on, you can test out the connections between those endpoints and the databases and everything else you've built your RAG application around. From there you have the Prompt Flow button, where you can export to Prompt Flow; that's what I was talking about. Yeah, absolutely, that's what we're here for.

So Azure AI Studio is mainly for two purposes: creating an endpoint that works against an LLM, and evaluating those endpoints. Those are the main two use cases of AI Studio. Copilot Studio, on the other hand, is about building applications, not endpoints: either building a complete application like a chatbot, or any kind of user interface that has an LLM element, or integrating those types of applications into applications like Teams. So Copilot Studio is for building entire apps, and AI Studio is just for building endpoints. And as you might have guessed, Copilot Studio was built with AI Studio. Yeah, we're getting into a bit of a big question exploring that. Could we build an application in Copilot Studio and then have it call an endpoint that we build in AI Studio? If you want that level of customization, yes, although you don't need to with Copilot Studio; it's designed so you can build complete apps without having to customize your own endpoints. But if that's the position you're in for a particular use case, then absolutely: you can call any endpoint from Copilot Studio, including ones created with AI Studio. I'm not so familiar with that product, but I don't believe it has evaluation features. Yeah, you would evaluate the LLM through its endpoint here in AI Studio.

Yep, we'll come to that in probably about 20 minutes or so. But here is the very simplest prompt flow, which I just generated from the chat playground. All it does is take an input, pass it through to the LLM node called chat, and return the output directly back to the caller. That's a completely unfiltered AI endpoint, whereas if you use an application like ChatGPT, there's a whole lot more going on with your prompt and the context and everything else, in a more RAG-like style, than just passing it straight to an endpoint. I'll keep going to catch up to where you all are, so I can demo the good bits when we get to the end.
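That single chat node boils down to one raw chat-completion call against the model deployment, with no retrieval or grounding around it. A minimal sketch; the deployment name is a placeholder:

```python
# The "chat" node in that minimal flow is essentially one chat-completion call:
# no retrieval, no grounding, just the model.
import os

from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_KEY"],
    api_version="2024-02-01",
)
response = client.chat.completions.create(
    model="gpt-35-turbo",  # placeholder deployment name
    messages=[{"role": "user", "content": "What can you do?"}],
)
print(response.choices[0].message.content)
```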
That's where I would expect the documentation to make it a little bit clearer. Okay, cool. And then, let's see. So you're just missing the part of the document that makes that clear. Oh, it couldn't do it. So, you're already there; there's no file. I think you're already there. Yep, so just set that up, and then you're going to execute the database setup. That's it. Thank you. Yeah, just one sec.

The question was: are we able to use another vector database besides Cosmos DB?
Yeah, absolutely we can. When we get the prompt flow up, I'll show you, but if we go into the prompt flow node (I think it's the retrieve-product-information node), you can see it's set up directly as a connection to AI Search. But you can set it up as a connection to other vector databases as well, or even just to an endpoint.

What do you think is the biggest differentiator between AWS and Microsoft? Why should I go with Microsoft for Startups versus AWS for startups? That's a different question, because I was going to say the main differentiator between Azure and AWS is the enterprise scale. Azure is a platform used by big companies, Microsoft included, and it's designed with a lot of features around authentication, security, scaling, and monitoring that big companies running production apps really need. Honestly, that actually makes it a little more difficult for startups, because the first thing you do when you start working with Azure is start dealing with things like resource groups and security; you know more about this than I do, we have to deal with that a lot. That's why we set up Microsoft for Startups: to help startups get used to the process of working with resources in Azure, which is quite different from AWS, in the sense that in AWS you can just spin up a single VM and you're done.
In Azure, when you spin up a VM, there are actually six different resources that get created, because it's there to support the enterprise use case, as opposed to just "I want a single VM." And I will say Microsoft for Startups has a pretty cool program, the Microsoft for Startups Founders Hub. They can give you up to $100,000 in credits for Azure, in different tiers: at level one they give you this many credits, at level two this many, and so on, so you really get a lot of credits for developing your first startup application. They also have a pretty good ecosystem: different mentors you can get paired up with, and different sessions you can attend to learn things like how to use a particular tool or implement a particular technology. So they give you the credits and a lot of the guidance to make it happen. And the whole Microsoft for Startups team is here at the booth in Salon 9, so chat with them about the startups program and they can get you in at the right level.
Thank you. That's correct, yeah. Yeah, I had to start over, because I ran that script before I did all the configuration. There we go. Yeah, so, no, don't do that, actually. Can you click on the text embedding deployment? Yeah, it is connected. So, yeah, I'm not sure; let me talk to David and ask him. Can you click the deployment? Yeah.

You could, absolutely. The way Prompt Flow is set up, it actually manages the chat history for you as it comes through the flow, but if you want to store it beyond that interaction, then yes, you can store it in a database. Oh, yes, there might be documentation that we missed. What's that? It's going to go in and put together the actual call going to the LLM? Some of them do, yep. Okay. Maybe you ask about products, or about a specific item, so it's going to search for those and then put them in as part of the workflow, right? Actually it's the AOAI connection, and by default it is correctly set. Okay, I missed it when I went through it. Okay. But a few of them were wrong, and what's confusing is that in addition to the Azure OpenAI connection, you do have a default one.
Right, yeah, there are two sets of resources. Yeah, so when you switch to the default, you actually break things. Oh, okay. All right, when folks get to that, I'll have you show them. Yep.

For those of you who haven't done step eight yet, the custom connection for Cosmos DB, I'm going to go through that now. Go to the connected resources, view them all, and then create a new connection. This is one of the things Prompt Flow does for you: it has standardized connections to all of these tools, but if you want to connect to any other service, you can use a custom connection, and that in fact is what we're going to do here. Cedric, it looks like we have a question over at the back right there. I'm going to add the key-value pairs for our connection to Cosmos DB, and I'm going to grab those values from our .env file.
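The same custom connection can also be created from code rather than through the portal. A rough sketch, assuming the promptflow SDK's connection entities; the connection name, config keys, and environment variable names are placeholders:

```python
# Rough sketch of creating the Cosmos DB custom connection programmatically.
import os

from promptflow import PFClient
from promptflow.entities import CustomConnection

pf = PFClient()
connection = CustomConnection(
    name="contoso-cosmos",  # placeholder connection name
    configs={
        "endpoint": os.environ["COSMOS_ENDPOINT"],
        "databaseId": "contoso-outdoor",
        "containerId": "customers",
    },
    secrets={"key": os.environ["COSMOS_KEY"]},  # secrets are stored separately
)
pf.connections.create_or_update(connection)
```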
So if you have a million vectors and you want to perform a nearest neighbor search against a query vector, what are some of the strategies to prune that? And then how does that impact recall as well? Yeah, well, first of all, that's the entire reason why vector databases exist, is to exactly do that search quickly.
They're indexed in such a way that they can do that nearest neighbor search at scale and at speed. So you can certainly do that. It's not very difficult to do it yourself, to do an embedding for a bunch of documents or a bunch of chunks of documents, and then run a nearest neighbor algorithm to find what is closest to the embedding for the customer's question.
But the advantage of doing it within a vector database is you can do that quickly and at scale. And what was the second part of your question? I guess there are different nearest neighbor algorithms, right? Yeah. There's ANN, approximate nearest neighbor, as well. And then, I think there are parameters that you have to tune.
That affects recall as well, depending on how far you want the tree search to go. Yeah. And so what are some strategies for balancing that and getting the highest recall instead of just doing brute force? Because brute force, I'm assuming, takes a while at scale. Interestingly, and this might not be the answer you expect: at least in our experience working with real-world applications, vector search by itself actually isn't enough, even after playing around with the algorithms and choosing the parameters so you can expand out the search and not miss documents.
What we've found is that a combination of keyword search and vector search together actually outperforms either one alone. And that's a feature that's built straight into Azure AI Search. I think these days it actually defaults to a hybrid search. In this particular example, just to keep things simple, we're doing only a vector search.
But for practical applications, we actually recommend a combo of keyword and vector search. And I'm assuming that the metadata that you can search, for example keywords, can be updated for a product at any point in time for any vector, right? Yeah, that's right. And then, last question: is there an upper limit on the dimension size of the embeddings?
I'm sure there is an upper limit somewhere, but we haven't come across it for AI Search, at least. Yeah, I don't think there's one in there by design; at any rate, we haven't come across it so far.
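For reference, here's a minimal sketch of what that hybrid keyword-plus-vector query can look like with the azure-search-documents Python SDK (11.4 or later). The index name, vector field name, result field, and embedding deployment below are placeholders for illustration, not necessarily the names this workshop's resources use.

```python
import os
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery
from openai import AzureOpenAI

# Placeholder resource names and credentials.
search_client = SearchClient(
    endpoint=os.environ["AZURE_SEARCH_ENDPOINT"],
    index_name="contoso-products",
    credential=AzureKeyCredential(os.environ["AZURE_SEARCH_KEY"]),
)
aoai = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

question = "waterproof hiking jacket"
embedding = aoai.embeddings.create(
    model="text-embedding-ada-002",  # assumed deployment name
    input=question,
).data[0].embedding

# Supplying both search_text and vector_queries makes this a hybrid query;
# drop search_text for a pure vector search.
results = search_client.search(
    search_text=question,
    vector_queries=[VectorizedQuery(
        vector=embedding, k_nearest_neighbors=5, fields="contentVector")],
    top=5,
)
for doc in results:
    print(doc["id"])  # "id" is a placeholder field name
```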
Yeah. What about using Entra ID with this, specifically external Entra ID, if you're trying to develop this for an externally facing application, multi-tenanted, and everybody having their own little space, shall we say? Yeah. Are there any frameworks or things like that set up to show how to set that up across all levels?
Because you're going across several systems: from what I'm talking about, you're going from Copilot Studio into this, into AI Search, into Cosmos DB. Is there anything written down anywhere? I haven't been able to find it, basically. Okay, that's an area I'm not an expert in, but Cedric or Miguel might be, either of you two.
Entra ID. Using external Entra IDs, I know it only came out a couple of months ago, but being able to create a scope across all of those systems, so that the scope you can see is for that external Entra ID: is there anything written down to show how to do that across each of the individual systems in the right way, or is anybody working on that, do you know?
When you said Entra ID, so from an authentication standpoint? So, say again? Yeah, you're talking about the Entra authentication mechanism? Yeah, Entra, yeah, specifically the external ID, so using external IDs, social logins and things like that. So, if you're facing this to the outside world, enabling external users, external IDs, to actually get scope across the whole thing.
Yeah, no, to be honest, it's not a domain where I've spent much time yet, so I'm not going to be able to talk much about it. I can add a little bit of color. You wouldn't have external users log into AI Studio; that's for the developers and for the IT managers.
But what you are exposing to the app, which the end users are then accessing, is those endpoints. And you can manage those endpoints either by tokens or by managed identity. So, whichever way you want the app to talk to the endpoint, that identity controls, obviously, what the endpoint is then able to do.
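As a hedged sketch of that app-to-endpoint hop: once the flow is deployed as an online endpoint, the app simply POSTs to the scoring URL with whatever credential you've configured, a key or an Entra ID access token. The URL, header names, and payload shape below are assumptions based on this workshop's flow taking a question and a customer ID.

```python
import requests

# Placeholders: use your deployed endpoint's scoring URL and key or token.
SCORING_URL = "https://<your-endpoint>.<region>.inference.ml.azure.com/score"
CREDENTIAL = "<endpoint key or Entra ID access token>"

payload = {
    "question": "What did I order last time?",
    "customer_id": "3",   # assumed input name; check the flow's declared inputs
    "chat_history": [],
}
response = requests.post(
    SCORING_URL,
    json=payload,
    headers={"Authorization": f"Bearer {CREDENTIAL}",
             "Content-Type": "application/json"},
    timeout=30,
)
print(response.json())
```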
And then all the way through. Exactly, yeah. Okay. And then, at the moment, if I'm reading this correctly, the retrieve-documentation-from-AI-Search step is retrieving the vector, and then handing it off to the actual document itself. So, it's only the vector that's getting passed, not the document itself?
No, well, the vector is then used to retrieve the document, in this case a markdown file. And then the markdown file actually gets inserted into the prompt so that the LLM, the OpenAI model, can see it and form its answers based on that information. Okay. And so that chunk is being sent?
Exactly, yeah. And I'll show you how that gets put together in a sec. There is one file there that has the prompt, and then it has the variable for the document, so that variable simply gets replaced by the text that it grabs from the search. So, just to continue building on the Entra ID piece: one thing that we're looking into is, for example, if we want to do granular access control, making sure that we don't give our prompt flow the ability to search any database and retrieve any user's data, so that a nefarious actor can't get data from other users through something like prompt injection.
Yeah. I noticed that in the authentication type for prompt flow, we're able to use token-based, so Azure AD tokens. Let's say that we do have a user log into our app, pass through that token to call the prompt flow endpoint. Can we then use that token authentication of the user to call subsequent components of our Azure stack?
So, to pass through that user's delegated token, and just make sure that we're retrieving data for that user? I don't think so. Go ahead, Cedric. I was going to say yes, but... I'm not a security expert. These guys know more about that than I do. So, the only thing is that I don't know much about Entra.
So, any Entra-specific things, I don't know much, because I joined Microsoft not so long ago. But when it comes to normal security, you could use any kind of token, such as a JWT token, just like you would encode a JWT token for any kind of REST API.
You can pass the JWT token to the prompt flow endpoint, and use it inside the prompt flow definition to pass on to whatever internal service you would like. So, I think a very concrete example of that would be, let's say we have the Cosmos DB endpoint. And I want to ensure that I can only access a specific user's data in that Cosmos DB.
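A minimal sketch of the pass-through Cedric describes, assuming the flow exposes the caller's JWT as a regular input and a Python node forwards it to a downstream service that enforces the user's own permissions. The node, the orders API URL, and the input wiring are all hypothetical.

```python
import requests
from promptflow import tool


@tool
def get_customer_orders(user_token: str, customer_id: str) -> dict:
    """Forward the calling user's JWT so the downstream service scopes
    data access to that user; the orders API below is hypothetical."""
    response = requests.get(
        f"https://orders.example.com/customers/{customer_id}/orders",
        headers={"Authorization": f"Bearer {user_token}"},
        timeout=10,
    )
    response.raise_for_status()
    return response.json()
```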
So, I want to actually use, like, row-based access control, where that user is only allowed to see certain rows in that. Oh, yeah. I can actually show you something to do with that right here, now that I understand. So, what we have here is the simple prompt flow for this particular application, where I do the input.
There's the embedding to retrieve the product documentation. There's nothing secret in that. What's relevant here to your question is, let me make this a bit bigger so we can see. There we go. And the customer lookup, which is using information that is provided by the customer through them having logged into the website.
So, for example, when I go to this website, do I have it open still? Did I close it? I go to aka.ms, slash, AITour, Contoso Web. That link is given in one of the last steps of the workshop. That's not right, Contoso Chat. So, in this case, when the LLM gets the prompt, it already knows who you are.
It already knows you are David. And it gives the LLM only David's information. It would be different if the LLM asked, you know, what's your name, and you said: David. Never mind, I'm Miguel. Because then it could be jailbroken. But in this case, all of that is set up so that when the prompt gets to the LLM, it already has your name, and it's authenticated.
And it has only your information. And I think that's just about what you were going to show. Yeah, this is what I was trying to get to. All right. So, when we're at this chat application, we can see Sarah Lee is logged in already. So, as we go through the prompt flow, one of the inputs there is the customer ID.
And that's come from this app through the token that's being provided to the endpoint. Actually, no, it's a parameter to the endpoint in this particular case. It's not set up exactly that way. But when we ask the question, what did I order last time? What's important to understand there is that there's nothing in this app that is searching any database.
All it's doing is passing the user ID and that question, what did I order last time, into that whole prompt flow. And then as part of that prompt flow process, which runs as a privileged account, it then queries the database with that user ID to get back her list of product purchases.
Then the LLM is operating on that information with that question to generate that bit of text that you see. And that bit of text is the only thing that actually goes back to the app. So, the app doesn't have any direct access to databases at all. Is that what you're getting at there?
I think that is what we're getting at. And I think you hit the nail on the head when you're saying that prompt flow has privileged access to the database. What I'm trying to avoid is for prompt flow to have privileged access. What I want to do is to inherit the access of the calling user through, for example, his AD token.
Right. So, I'm speculating here. This is not my domain. But I think the way that would work is through the features of the database, where you give that authentication information with your query, and it prevents access otherwise. Would that do what you're trying to do? Yeah, absolutely. I'm just wondering whether or not that's already something that you're looking into with, for example, the prompt flow connection to Cosmos DB.
That I'm not sure of, but I'm sure they are. Because that's the whole purpose for this thing existing in the first place. This is what Copilot Chat was built on, for example. And sort of that's all based on enterprise logins and things like that. Questions there? Yeah. Like, if you want to, like, have variable loops in the control DAG, is that possible?
Yeah, the question on that was, can you have variable control points in the loop? Yeah, absolutely. If you run this within Visual Studio Code, you can set breakpoints in the Python code that runs within each of the nodes. Is that the question you're asking? I guess, like, depending on the results of one node, for example.
Oh, conditional results. You might want to route to, like, different nodes. Yeah, absolutely. Because within, what's actually being run within each of those nodes, let's actually even take a look at that. If I go over to the prompt flow itself, let's have a look, for example, at the LLM response node, I think.
Actually, that one's not very exciting. Let's have a look at the customer prompt node instead. So, what's actually happening at this point: this is just the prompt that gets built together. But you can see it's got this metaprogramming-style templating language, you know, "for item in documentation" and so forth.
So, what that's doing at that point is looping through all of the matched products that are related to the user query, extracted from Azure AI Search as vectors, then pulling out the markdown files that relate to those vector indices and putting them directly into this prompt. So, when I ask the app, you know, what's a good pair of shoes, that's not the only bit of text that is going to OpenAI at that point.
What is, in fact, going to OpenAI is a whole bunch of text defined by this customer prompt here, including telling OpenAI: you're an AI agent for the Contoso Outdoor products retailer, you should always reference factual statements, the following documentation should be used in the response, and this is where the individual relevant products are inserted into the prompt.
And, to our question earlier on, for this particular customer, this is where their previous orders are inserted into the prompt. And then, finally, the question, you know, what's a good pair of shoes, is sent to OpenAI. So, it has all that context from that RAG process to formulate a meaningful response based on that particular customer, their purchase history, the question, and the products that are related to their question.
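To make that concrete, here's a simplified, hypothetical stand-in for that customer prompt template, rendered with Jinja from Python. The wording, field names, and product names are placeholders rather than what the workshop's repo actually contains, but the shape is the same idea: instructions, retrieved documentation, order history, then the question.

```python
from jinja2 import Template

customer_prompt = Template("""
You are an AI agent for the Contoso Outdoor products retailer.
Always reference factual statements from the documentation below.

# Documentation
{% for item in documentation %}
## {{ item.title }}
{{ item.content }}
{% endfor %}

# Previous orders for {{ customer.firstName }}
{% for order in customer.orders %}
- {{ order.name }}
{% endfor %}

Question: {{ question }}
""")

print(customer_prompt.render(
    documentation=[{"title": "Example Hiking Shoes",
                    "content": "Lightweight waterproof hiking shoes..."}],
    customer={"firstName": "Sarah",
              "orders": [{"name": "Summer Breeze Jacket"}]},
    question="What's a good pair of shoes?",
))
```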
Yeah, let me give you another example, a better example of that kind of thing: in this case, this particular node is just running Python code. So, you can put conditionals into that Python code, for example, based on the inputs, to do different kinds of things, anything you like, in fact.
Maybe, I think I know the specifics that you have in mind. What David showed, he showed two things: in the template nodes, he showed looping logic and conditionals, but those are looping and conditional string-rendering constructs. And in the Python code, you can have any Python: you can have conditions, loops, whatever. But to your point, all the nodes in the DAG are going to be executed.
You cannot have conditional node execution, but what you can have is inside a node, in the Python code, you can conditionally execute something. But all the nodes are systematically going to be executed. It is not a business process orchestration system. It is really tailored towards building LLM applications. So, it is simplified, it is not generic.
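As a small illustration of that distinction, conditional logic lives inside a node's Python rather than in the DAG itself. This sketch uses a hypothetical node and field names and simply branches on whether the customer has any order history.

```python
from promptflow import tool


@tool
def build_context(customer: dict, documentation: list) -> dict:
    # The node itself always runs; the branching happens inside it.
    if not customer.get("orders"):
        # No purchase history: fall back to product documentation only.
        return {"documentation": documentation, "orders": []}
    # Otherwise include, say, the last ten orders alongside the documentation.
    return {"documentation": documentation, "orders": customer["orders"][:10]}
```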
Does that make sense? Yeah, it looked like in that Python, though, that's where it was doing the customer lookup. So, how does that tie together? I mean, I see the line connecting it, and then I see the Jinja template for the prompt, and the Jinja template was iterating over customers, or, sorry, iterating over the orders.
So, how does that tie together? So, where are you at right now? This one? Yeah, there was, like, the customer lookup Python that we were just looking at. Trying to make it bigger so I can see it. Oh, yeah. The one on the right, yeah, the customer lookup.
You have one node which queries the database, fetches all the information from the database, stores it into a variable, into the context, and then the Jinja template uses the previously set collection of results for the rendering. Right, that makes sense. So, when you say it stores it into it, is that where the response, the orders on line 13, is that doing it?
Correct. Okay, and then if we click on the next one down, the customer prompt, and we go to that loop again, there it is... oh, okay, customer.orders, so that's how it ties together, then, eh? Correct. Yeah. Okay, thank you. Using input and output bindings on each node.
Yeah, and the arrows that are coming into the top of the representation in this graph, those are the inputs, and the arrow coming out of the back is the outputs, and there could be multiple of those. Thank you. Yes. It's called a visual editor, but it's really more of a visual reader, and that is absolutely true.
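Putting that exchange together, here's a hedged sketch of what a customer-lookup node can look like: a Python tool that reads the customer record, including its orders list, out of Cosmos DB and returns it, so the flow's binding can feed it into the template's customer input. Database, container, and field names are assumptions, not necessarily what the workshop's repo uses.

```python
from azure.cosmos import CosmosClient
from promptflow import tool


@tool
def customer_lookup(customer_id: str, endpoint: str, key: str) -> dict:
    """Fetch the customer document, including its orders, from Cosmos DB."""
    client = CosmosClient(endpoint, credential=key)
    container = (client
                 .get_database_client("contoso-outdoor")   # assumed name
                 .get_container_client("customers"))       # assumed name
    # The returned dict is what the Jinja template iterates over
    # (e.g. customer.orders) via the flow's input/output bindings.
    return container.read_item(item=customer_id, partition_key=customer_id)
```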
I want to highlight a little subtlety, too. When you get to step 10, when you first run your prompt flow in Visual Studio Code, you're going to click the run button once you've viewed the prompt flow itself in the Visual Studio Code environment. You can see the commands it's running; it's just running a little Python command to launch the flow defined in the YAML file. But what I want to emphasize here is that, in reality, everything here is running locally, and in fact, in the usual developer environment, it will be running directly on your laptop or a shared machine.
In this case, it's running in the GitHub Codespaces environment. The whole idea behind this is that you have a very fast, responsive place to try out different prompts, to make sure your connections are working, perhaps to test different types of LLMs by swapping them into the LLM steps, so you can figure out which pieces of the puzzle go together to give you a good experience for the endpoint that you're trying to create, all in a local environment.
Now, I'll say local because, of course, the database is still in the cloud, and the OpenAI endpoint is still in the cloud, but all the orchestration is happening directly on your local machine. Our next step after this is going to be to publish that prompt flow into Azure, inside its own container app as it turns out, and then that's going to be a hosted cloud version of that same prompt flow, which is going to support the production use of that endpoint in your application.
Yeah, a side effect of what David just said is that because it's building a Docker container, you can actually customize the environment and add packages. So, earlier we were talking about the differences between Semantic Kernel and prompt flow; one of the nice things with prompt flow is that it's very interesting for web developers, because they don't have to care about creating an environment, deploying a Docker environment, or scaling it. The whole scaling is done automatically by the platform.
So, you just need to add packages, so you can combine an LLM with some packages for some specific processing, and the whole deployment is done automatically, so you can focus on the UI and the user experience. So, all right, and then let me get to the evaluation.
You just need the purchase history for that question. So, in that specific DAG, we will systematically query both the vector database and the customer information. So, yes, when it comes to answering the question of, can you repeat the question?
Yeah, it was "what else did I purchase". Then, because we will query the order history from the relational database, the LLM is going to pay more attention to that part of the context than the product documentation side of things. But really, we are relying on the ability of the LLM to pay attention to what matters.
One more question. Is there a form of relational query to...? Not in that specific prompt flow DAG. Because that specific prompt flow DAG just returns, I believe, the last ten orders from the history; that's what we do, I think.
And that's it. And that's what we put in the context, because that RAG application is for workshops and demos. What you're talking about is doing something else, which is text-to-SQL, where you take a query in natural language, transform it into a SQL query, and execute it against a database with a filter where the date is one month ago, or whatever.
So that's a similar use case, but a different implementation. And that's also an area to be wary of, too, because that's an area where prompt injection could come into play. If you're forming a SQL query on the basis of user input, you've got to recognize that there might be malicious input in that process, which might end up in the generated SQL.
There's still an intermediate step, it's not directly pasting the string into a SQL query, but there still is an opportunity there for bad actors to control what happens at that SQL generation step. And I believe we have a template for that, I believe Pamela has created a template called RAG on PostgreSQL, which is now in the same Azure samples GitHub account.
That does exactly what you're saying. It takes a natural language query, transforms it into a SQL query, and executes it on a PostgreSQL database, but you could do the same with Cosmos DB. So that actually leads me into another topic, which I wanted to get to before we run out of time here today, which is about evaluation.
So this is an important step any time that you put any kind of LLM-based application into production where users are going to be providing input to it. And in the context of a chatbot, the kind of questions you want to ask are: did my chatbot give a relevant answer to my user's question?
Was the chatbot's answer grounded in the information that is available in my databases as part of my RAG flow? Was the chatbot's answer coherent? Was it good English? Was it understandable? And the other metric that is in that list, which I'm trying to remember right now... I'll get back to it in a minute.
But when you get to step number 13, we're going to take you through a Python notebook, which shows you a process for answering these questions manually, essentially. And then I'm going to show you how that's built into the Azure AI Studio platform itself. But think about debugging in regular apps and the tests that we write for applications.
And it's very simple. It's a yes/no. Like, did the application return a positive value when it should be a positive value? Very easy thing to test for in a programming style. It's a much more difficult test to answer the question: is the answer generated by my chatbot relevant? How would you even program such a thing?
And the answer is, you get an LLM to answer that question. Now, this particular chatbot application we have running on GPT-3.5 Turbo. Very cheap, very fast, very reliable LLM. GPT-4o just came out recently. I haven't played around with it a lot myself. But I imagine that will probably take the place of GPT-3.5 Turbo in a lot of these applications pretty soon.
Next time we run this workshop, we're going to switch it over to using GPT-4o. But you've also heard of GPT-4. That's a very large, very powerful LLM that has reasoning capabilities in some sense. Now, you wouldn't want to use GPT-4 in a production application like this, because every time the user types in a chat, not only are they going to have to wait quite a long time for a response, but it's going to cost you a lot of money on the endpoint.
In this RAG architecture, GPT-3.5 works great, as long as you give it the context it needs to answer that question. But for this testing paradigm, for asking the question: is the answer "TrailMaster jackets are good" to the question "what jackets should I buy?" relevant?
That is the kind of question a powerful LLM like GPT-4 can answer quite readily. So think about how you might automate that process. You might send a prompt to GPT-4: given this question, and this context (the stuff that we put into the RAG), and this answer, ask GPT-4, on a scale of 0 to 5, how relevant is this answer?
How grounded is this answer in the data that I've also provided here? Is that answer coherent? These are all things that GPT-4 can do quite readily. And we can use the scores that GPT-4 provides as a ranking of how well GPT-3.5 is doing in our endpoint at generating its answers based on the RAG process.
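A rough sketch of that scoring idea, using the openai Python package against an Azure OpenAI GPT-4 deployment. The judge prompt here is a simplified stand-in for the evaluation prompts the notebook actually uses, and the deployment name is an assumption.

```python
import os
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

JUDGE_PROMPT = """On a scale of 0 to 5, how relevant is the ANSWER to the QUESTION,
given only the CONTEXT? Reply with a single integer.

QUESTION: {question}
CONTEXT: {context}
ANSWER: {answer}"""


def score_relevance(question: str, context: str, answer: str) -> int:
    response = client.chat.completions.create(
        model="gpt-4",  # your GPT-4 deployment name (assumed)
        temperature=0,
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            question=question, context=context, answer=answer)}],
    )
    return int(response.choices[0].message.content.strip())
```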
And that's exactly what goes on here. In this notebook, at the top of it, you can put in a question. I just ran it on, can you tell me about your jackets? You can have a look at all the code. You can even see the prompts that it's using to GPT-4 to answer these questions.
And you can see the actual answers that came back in the next cell up here. Here we go. Hey, Sarah Lee, let me tell you about our jackets. We have two awesome options that will go well with your previous purchase. Summer Breeze Jacket, et cetera, et cetera.
So that's the answer that the LLM came back with. This is the context that the RAG process provided, which it used to generate that answer. And then with that information, we can ask those questions we just asked. Was that answer about jackets grounded in Contoso's product database? And GPT-4 ranked that at a scale of five.
Sorry, a rank of five on a scale of zero to five. And likewise, we can ask questions about coherence, fluency, groundedness, and relevance. And we get the answers. This particular question is doing really well. You probably also want to test out your LLM on some adversarial types of responses.
For example, you might ask the question, you know, I want to buy a toothbrush. Now, remember, this is a camping store. They don't have toothbrushes. Nothing is going to come up in the database when we do the RAG search. Well, actually, something will come back because we always get back some responses that are somewhat close to our query.
But let's see how our LLM actually does here. When I run this notebook, it's going to run through those scripts. It's going to pass that question to our RAG flow, generate the response with GPT-3.5, and then ask GPT-4 to rank it on those four metrics using the prompts that are linked to in this script.
And when we come back to it, we can see the answer it came back with was, hi, Sarah. Since you're into outdoor adventures, I recommend the Fresh Breeze -- where's my scroll bar? Here we go. Fresh Breeze Travel something, something like that. There it is. There's my scroll bar.
Fresh Breeze Travel Toothbrush. All right. This is interesting. Contoso does not sell a Fresh Breeze Travel Toothbrush. GPT-3.5 just made that up out of whole cloth. But this is what LLMs do. And we have to test to see whether or not they're doing these kinds of things for the type of use cases that we anticipate.
And we can detect that particular test is not going well. Groundedness. Score of one out of five. It really wasn't grounded in our data because there was nothing about toothbrushes in our context data that we provided through RAG. And similarly, coherence -- well, it was in nice English. So I've got a score of four for coherence, a score of five for fluency, but one for groundedness and one for relevance.
And so now you can think about automating this process. You can think about what are the types of questions that we want our application to do well at. What are the types of questions that we might want to, say, not give any responses to at all and score accordingly.
And I won't go through all the details of this, but when we get into AI Studio, there's a whole section here on evaluation. Let me just discard that. And this is the process where you can actually load into it a bunch of tests, which in this case are not Python code or C# code or whatever.
It's responses -- so questions, responses, and context. And then automate the process of evaluating how your endpoint, how your RAG process, does on all those questions. So that next time when you add new products, or next time when you decide to upgrade from GPT-3.5 to GPT-4o, you've got a series of tests ready to go to evaluate how well your application does in the face of those changes.
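For example, the kind of test set you'd load might just be a JSONL file with one question/context/answer row per case, something like the sketch below. The exact field names depend on the evaluator you configure in AI Studio, so treat these as placeholders.

```python
import json

test_cases = [
    {"question": "Can you tell me about your jackets?",
     "context": "...product documentation returned by the RAG search...",
     "answer": "We have two options, including the Summer Breeze Jacket..."},
    {"question": "I want to buy a toothbrush.",
     "context": "...whatever the RAG search returned, likely unrelated products...",
     "answer": "Sorry, we only carry outdoor gear."},
]

# Write one JSON object per line so the evaluation tooling can stream it.
with open("eval_dataset.jsonl", "w") as f:
    for case in test_cases:
        f.write(json.dumps(case) + "\n")
```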
All right. Questions here? Quick question on that. Yeah. So for -- after you've evaluated the model and you sort of understand the performance of it, what typically are your next steps and what actions do you take to drive improvement on the measures that you see there? That is an excellent question.
I have a slide just for that. Honestly, this was not planned. But this is essentially the LLM Ops process, which is essentially the same as DevOps, but with a fancier name that gets you lots of funding. Here's the life cycle. So, it's exactly the same idea as when we build applications.
We go through the ideation process. We figure out our use case. We do some exploration, testing it against our data. We build our basic prompt flow in the LLM case. And we develop our first version of that flow. And then we actually run it against sample data. This is the evaluation step.
We're still very early in the process here. If the evaluations don't give the scores that we're looking for, then before we put it into production, the next step is to modify our prompts. You saw that Jinja template with a bunch of prompts around do this, don't do that.
You would modify those until you get the behavior that you're looking for. Maybe you change the RAG process. Maybe you chunk the data differently. Maybe you present it differently to the RAG process. Then once you're satisfied with that process, you would keep on testing it, against perhaps a live user cohort, or bring in some testers.
Bring in a red team to try and break it. And again, go through that same evaluation process until you're satisfied. And then finally you'll be ready to actually deploy that to production. You would put monitoring in. You would actually monitor, live, probably a sample of actual user questions and responses, and have, you know, real-time charts.
Not real-time charts actually, probably daily charts of how your model is doing in scores like groundedness for the types of questions that users ask. And that might be detecting, you know, maybe things are drifting because your product set has changed and there are trigger words in your products that are making the GPT model do strange things.
Maybe you've got some adversaries that are coming in to try and hack into your system. That might come up in some of your monitoring scores. And then you go back through that iteration process to go back and build and augment the model for its next deployment. Is that the kind of question you're looking for?
Yes. Great. Any other questions? I have one question. Yeah. You know, right now when we look at the input from the user, you put text. You know, you put what you have purchased or something like that. Mm-hmm. Can this be improved to take, like, you know, graph or PDF file?
I'm not sure my question is-- Yeah. I understand how graph fits in there, yeah. Oh, a PDF file, yeah. I'd say, you know-- PDF file, yeah. PDF file, let's say, I want to input my PO rather than type something. Yeah. Can this accommodate that kind of thing? Absolutely, yeah.
Azure AI Search, for example, can index PDF files, and then you can do a search to find the PDF file that's most relevant to that user's question. Mm-hmm. You can then extract from that PDF file the context, which is put into the prompt, which is then used to generate the response.
And then you can put references back to those source files if it's a trusted user kind of a situation, so they can come back to see them. That's how Copilot works, for example. Okay. Yeah. But still, you know, from the, you know, prompt, you can only input text, right?
Right. The prompt, well, yes and no. In this particular example, everything is converted into text. But today, we have what we call multimodal models. Yeah. GPT-4o, for example: as the prompt, you can input not just text, but also images, even audio. Not video yet. But you could set up that RAG application to insert into the prompt the images or the audio or whatever it is you want the LLM to be able to reference.
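Once those multimodal capabilities are available on your deployment, the prompt itself can carry an image alongside the text, roughly like this sketch with the openai package; the deployment name and image URL are placeholders.

```python
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<your-resource>.openai.azure.com",
    api_key="<key>",
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="gpt-4o",  # a vision-capable deployment (assumed name)
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What product is shown in this photo?"},
            # The URL can also be a base64 data: URI for an uploaded file.
            {"type": "image_url",
             "image_url": {"url": "https://example.com/tent.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```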
Great, great. That's still relatively new, because 4o doesn't have all of its multimodal capabilities out yet, but the principle exists. Okay. Yeah. Good. And we have Florence that we can use. Florence-2 was actually released earlier this week, which is a model that allows you to do image-to-text.
So you can analyze an image, generate text out of it, and then take the text and give it to GPT 3.5 or something else. Are you giving a talk about that tomorrow? Yes. Well, I mean, briefly. It's one of the things I'm talking about, yes. Okay. And what time is your talk tomorrow?
What time? That's a very good question. I forgot. It's in my calendar. That's a killer. Yeah. Because that's one of the primary requests from our team. And right now, I have built, you know, this customized prompt window. But it can only take text. And now they want to say, okay, I want to use a PDF file or even an image file.
So, well, like we said, for an image file, when we make GPT-4o's multimodality capabilities available on Azure, then you will be able to use it directly. For now, you can use another image-to-text model, such as Florence-2 or something else. For PDFs, it really depends exactly what your use case is.
Like, is it a transient use case? Are you storing the PDF long term? Because if it's the latter... Okay. So if it's transient, then an alternative approach, instead of taking the PDF and storing it into the vector database and indexing it long term, which you could do...
What you can do is this: you upload the PDF, you chunk it, and the algorithms for embedding the chunks, you can actually run them in Python, in memory. You don't have to do it, you know, in a long-term vector database.
So you can do the chunking and the embedding in memory. And actually, the vector similarity search functions to find what's relevant, you can execute those in memory too, to provide your users with a transient, in-memory experience where they upload the PDF, and you query the PDF just for the sake of the current conversation.
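Here's a minimal sketch of that transient, in-memory approach, assuming you've already extracted text chunks from the uploaded PDF; the embedding deployment name is a placeholder, and a real app would batch and cache more carefully.

```python
import os
import numpy as np
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)


def embed(texts):
    # "text-embedding-ada-002" is an assumed deployment name.
    result = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([item.embedding for item in result.data])


chunks = ["...text of page 1...", "...text of page 2..."]  # from the uploaded PDF
chunk_vectors = embed(chunks)

question_vector = embed(["What is the total on this purchase order?"])[0]
similarities = chunk_vectors @ question_vector / (
    np.linalg.norm(chunk_vectors, axis=1) * np.linalg.norm(question_vector))

# Put the top-scoring chunks into the prompt for this one conversation,
# then throw everything away; nothing is persisted to a vector database.
top_chunks = [chunks[i] for i in np.argsort(similarities)[::-1][:3]]
print(top_chunks)
```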
Okay. I think I need to take this offline with you. Yeah. My use case is a little bit different. I'll explain a little bit, you know, in PDF and then you can... Sure. Oh, yeah. And you can do it in memory or in a PostgreSQL database or a Cosmos DB that works too.
And delete the data once you are done with it. And the last thing I wanted to say, because that's a good question, is that right now, today, you can go to Azure OpenAI, you can deploy a GPT-4 model, and they have a chat section where you can chat with your model.
Yes. There you can enter images. There, today, right now, you can go there. There you can enter a picture and you can do things with it. Like, hey, here is a picture of a website. Can you write the code for the website? And it will write code according to the picture.
I know this one, I have done that already. And this is more related to the RAG prompt window. Mm-hm. And so far, you know, what I have developed can only take text. It cannot take a picture or a PDF file. And not to cut you off, because we love these detailed questions, but I've been told that I'm going to get cut off up here in just a minute.
And before I do that, I just want to let you all know that we'll be around here for a few minutes for in-person questions, but also come to the Microsoft booth in Salon 9. Lots of people there that you can ask exactly these kinds of questions of, so please go ahead and do that.
Cedric is also giving a talk tomorrow about multimodal models at 12:30 PM in Salon 10. So please come on and check that. But thank you, everybody, for coming today. If you'd like to do this at home, the repository is already in your GitHub account. And if you happen to miss that step, there's a QR code where you can get to it there as well.
But thank you, everybody, for coming today. And enjoy the rest of the conference. Thank you. We'll see you next time.