Build, Evaluate and Deploy a RAG-Based Retail Copilot with Azure AI: Cedric Vidal and David Smith

00:00:00.000 |
This is the workshop on developing a production level RAG workflow so you're 00:00:20.240 |
in the right place if you want to learn how to build the backend for a chat 00:00:24.840 |
application that works off of OpenAI and builds its answers based on 00:00:29.400 |
information that we draw from databases and vector databases we'll see all about 00:00:34.320 |
that in this presentation today my name is David Smith I'm a principal AI 00:00:40.680 |
advocate at Microsoft I've been with Microsoft for about eight years now after 00:00:46.200 |
my startup was acquired back in the big data space and I've been at Microsoft 00:00:49.680 |
ever since my background is in data science i also did a lot of work as a 00:00:55.500 |
statistician and these days I'm a specialist in AI engineering and I have 00:00:59.760 |
with me today two other members from Microsoft that are also specialists in 00:01:04.560 |
AI engineering first I'd like to introduce Cedric Vidal who's on my team in AI 00:01:08.820 |
advocacy would you mind introducing yourself Cedric hello everyone so like 00:01:13.920 |
David said I'm Cedric Vidal I'm a principal AI advocate at Microsoft and I 00:01:19.780 |
have a background in AI self-driving cars software design architectures and 00:01:27.820 |
everything in between I've been working in the space for 20 years and today I'm gonna 00:01:33.640 |
help David with the workshop welcome everyone thank you Cedric and we've also got 00:01:40.180 |
Miguel Martinez who's come all the way from Houston he's a technical specialist at 00:01:44.860 |
Microsoft Miguel tell the crowd a little bit about yourself absolutely hello 00:01:49.240 |
everyone welcome today my name is Miguel Martinez I am a senior technical 00:01:53.860 |
specialist for data and AI at Microsoft so a lot of our clients you know this can be 00:01:59.440 |
startups businesses they hear about open AI and chat GPT and all of those things so 00:02:05.500 |
they think about well how can I actually use it for my business how can I use those 00:02:10.540 |
tools to drive business value and that's where me and my team come in and we 00:02:15.140 |
help all of our clients develop some new solutions to drive that value all right 00:02:20.440 |
well Cedric and Miguel will be here for the next two hours helping you as you go 00:02:24.400 |
through this workshop so they'll be wandering around to help you guys out 00:02:27.640 |
and if you need any help during the workshop raise your hand and one of the 00:02:31.120 |
three of us will come up and help you so thanks guys all right let's jump right in if 00:02:36.020 |
you would like to get started you can use that URL on your screen right there just pop 00:02:41.540 |
up a browser on your laptop I'll give you more information about what's going on when 00:02:46.120 |
you get there but you should be able to follow along and get started if you like now to participate 00:02:53.840 |
in this workshop it is going to be hands-on so you will need to have your own laptop and 00:02:57.880 |
I can see looks like everybody does have their own laptop that's great this is not something 00:03:02.060 |
you'll be able to do on your phone or on a tablet because there's lots and lots of work that we'll be 00:03:07.740 |
going through as we work through a github repository and that is the second thing that 00:03:12.540 |
you'll need to have to participate in this workshop is a github account if you don't yet have a github 00:03:19.360 |
account please go ahead now to github.com/signup and create yourself a brand new account we'll be using the 00:03:27.060 |
github code spaces feature to provide our development environment but the free github 00:03:33.060 |
code spaces is more than sufficient for the work that we'll be doing here today we are going to be 00:03:38.820 |
building an application in the Azure cloud but you do not need to have an Azure account for this 00:03:45.780 |
workshop we're going to provide you with a login to an Azure account and we will have already set up all the 00:03:52.020 |
resources that you need to work with this application and that specifically is things like 00:03:58.020 |
Azure AI search the vector database cosmos DB the database which we're going to be using for our 00:04:03.940 |
customer information connections to open AI which we're going to be using for our LLM and various other 00:04:10.020 |
resources and tools that we'll be using in Azure you can do all of this on your own if you do have your 00:04:16.980 |
own Azure accounts and are willing to spend a few dollars in credits to run the resources over a few 00:04:22.420 |
hours by the end of this workshop you will have all of the 00:04:30.020 |
information code data and everything you need to recreate everything that i show you here today one of the 00:04:35.860 |
first things that we will do in fact is to fork a repository into your own github 00:04:40.980 |
account and everything you need will be right there all right but if you would like to run 00:04:47.380 |
through this at home you will need an Azure account and you can create one at the link you see on the screen 00:04:52.420 |
right there so before we jump in and i start giving you some demos about how this works let me orient 00:04:58.420 |
you a little bit that link that i gave you on the last slide will have launched you into a virtual machine 00:05:06.420 |
it is a windows virtual machine but we're not going to be using windows at all in fact the only thing 00:05:12.260 |
we're going to be using this virtual machine for is the instructions for the workshop which you can see 00:05:17.460 |
on the right hand side of the screen there in front of you you'll be going through that page after page 00:05:22.100 |
this workshop instruction page has some nice tools you can use for example wherever you see the green 00:05:29.940 |
text you can click on the green text and it will paste that directly into your browser that'll save 00:05:36.660 |
you some time when you're entering in the passwords and urls and so forth i will 00:05:42.660 |
mention that we are in a virtual machine environment it looks like the wi-fi is pretty good here but if you 00:05:48.420 |
find that the virtual machine is slow or there's lots of lagginess because of the wi-fi or if you just 00:05:54.020 |
prefer to use your own browser on your own laptop you can totally do that you can just open up a browser 00:05:59.780 |
and follow all the same set of instructions the only difference is you'll have to open up that virtual 00:06:03.940 |
machine every now and again to look at those instructions and you'll have to manually cut and 00:06:07.860 |
paste the green parts from the guide into your own browser so it's your choice about which direction you go 00:06:14.580 |
when we actually come to running shell commands and things like that the dev environment that we're going 00:06:26.820 |
to have you use is github code spaces you could totally run all this on your local desktop 00:06:32.740 |
but then you've got to make sure you've got vs code installed you've got to have all the right python 00:06:37.060 |
libraries installed etc etc so to make things easy here we're just asking everybody to go straight into 00:06:42.820 |
github code spaces where everything is set up and you can do exactly the same process for yourself at home 00:06:49.060 |
if you're not familiar with github or not familiar with github code spaces just pop your hand up when 00:06:54.340 |
we get started we'll come and chat to you and talk to you about how that works and what's going on 00:06:58.660 |
there but if you have used github code spaces before it should be pretty straightforward right there 00:07:07.620 |
all right so this is what we're going to do today we are going to build this app right here or at least 00:07:14.980 |
we're going to build the back end to this app and have you interact with that back end through a little 00:07:20.500 |
ui that you'll be working through if you do want to build the front end to this as well we do provide 00:07:25.460 |
the code for the front end and all the data that's also linked from the github repository so you'll have 00:07:30.180 |
access to that but i'm going to give you a little demo about this website just to set the scene 00:08:36.340 |
so let me go over to my browser here i think i have the website open so 00:08:45.140 |
the idea here is that we are engineers working for a retail company you know something like rei that sells 00:07:53.860 |
camping equipment and backpacks and trail hiking shoes things like that they already have a website 00:07:59.460 |
which you can scroll through and you can see the products that they have available it's pretty 00:08:04.340 |
limited selection just to keep things simple and we can click through to see product information for 00:08:09.380 |
each of the products that this company sells so for these trailblaze hiking pants we've got 00:08:15.300 |
lots of details about the features there are some reviews faqs there's a return policy 00:08:20.580 |
some cautions technical specifications user guide care maintenance lots and lots of information 00:08:26.580 |
available to the customer about all the products that are available at this store in addition this 00:08:34.180 |
storefront has a customer login feature in this particular example the customer sarah lee is already 00:08:40.820 |
logged into this system what we're going to be building here is a chat bot that operates on this website 00:08:50.740 |
you'll be able to access that click on the chat bot button and ask the question for example 00:08:55.540 |
what can you do and this is going to connect to the system that we're going to build 00:09:03.300 |
to get an answer to that question this is a pretty simple question it's coming straight from the llm and 00:09:08.580 |
its context to tell it that as an ai agent i can provide you with information about our products to help 00:09:13.460 |
you with your purchases so on and so forth now we can also ask questions about specific products for 00:09:21.540 |
example i'm going to pull up my paste buffer here let's try one here we go what is a good tent that goes 00:09:31.620 |
with the trail walker shoes okay so in this case it's actually consulting all the product 00:09:44.420 |
information that's available on that website to formulate that answer with 00:09:49.140 |
the llm and it comes up with for your upcoming trip to andalusia i recommend pairing your trail walker 00:09:54.180 |
hiking shoes with the trail master x4 tent we're also going to provide the llm with information about the 00:10:01.620 |
customer themselves their name their status in the loyalty program where they live and their order history 00:10:08.180 |
so it's able to answer questions like this what have i already purchased from you so in this case the llm is 00:10:15.940 |
able to consult the customer's purchase history and give back the answer to say that sarah lee is a valued 00:10:21.620 |
customer who's purchased the cozy night sleeping bag the trailblazer hiking tent and so on and so forth 00:10:27.620 |
so this is the system that you'll be building on the back end so that that chat bot can use an llm like open ai 00:10:36.420 |
to answer those kinds of questions based on the customer's purchase history and the products available 00:10:44.260 |
from this particular retail store any questions on that so far very straightforward okay great 00:10:52.420 |
all right we are going to be building this on azure and using various azure resources which we will provide 00:11:01.220 |
to you to do that we're going to be using the azure ai studio platform to manage the large language models 00:11:08.820 |
and the various resources that we're working with we're going to be using azure ai services which are 00:11:14.820 |
going to provide our various tools to us but in particular the open ai models that the chat bot is 00:11:20.740 |
going to use to generate its responses we'll be building an azure ai project which is going to manage all the 00:11:27.780 |
flows that we have to to have the chat bot gather its information and generate its responses 00:11:33.540 |
the product database is going to be stored in azure ai search and azure ai search is a vector database 00:11:42.100 |
or at least it has a vector database feature which we're going to be using to match the customer's 00:11:47.700 |
questions to the nearest or the most relevant products that are related to those customer questions and using that to provide 00:11:55.540 |
context to the chat bot and lastly the customer information is going to be stored in a database 00:12:02.180 |
a regular rows and columns type of database a relational database from which we will extract the customer's information 00:12:16.020 |
okay so let's have a look at the architecture we're building for the back end in a little bit more detail 00:12:23.460 |
the customer types a question into the chat bot on the website and that gets extracted out and sent to an 00:12:32.180 |
endpoint just as a simple string along with the customer's chat history as well 00:12:37.140 |
that question gets sent to azure open ai where it will get embedded into a vector format if you're 00:12:44.340 |
not familiar with embedding it's basically the idea of converting a piece of text piece of string into a 00:12:50.260 |
point in multi-dimensional space in such a way that the other points in multi-dimensional space we have 00:12:56.900 |
already defined by embedding the product information pages the ones that are closest to the question 00:13:03.540 |
are the products that are most relevant to that and we'll use that to extract out the most relevant products 00:13:08.260 |
and push them into the context of the large language model at the same time we'll also extract out 00:13:14.660 |
information about the customer and their purchase history from cosmos db 00:13:20.180 |
so we do the embedding do the search in azure search to get the product information we extract out the 00:13:25.540 |
customer information with cosmos db we feed all that information into a prompt so we create a large 00:13:32.340 |
prompt with information about the problem to solve the customer's information information about our products 00:13:37.220 |
information about the relevant products from the azure ai vector search and information about the 00:13:42.980 |
customer's purchase history and then generate a response based on the customer's question any questions so far 00:13:49.860 |
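As a rough sketch of what that retrieval step looks like in code, assuming the openai and azure-search-documents packages; the endpoint placeholders, index name ("contoso-products"), and vector field ("contentVector") are illustrative, not the workshop's exact values:

```python
# Hedged sketch: embed the customer's question, then ask Azure AI Search for
# the nearest product documents. Index/field names here are illustrative.
from openai import AzureOpenAI
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient
from azure.search.documents.models import VectorizedQuery

aoai = AzureOpenAI(azure_endpoint="https://<aoai>.openai.azure.com",
                   api_key="<key>", api_version="2024-02-01")
search = SearchClient(endpoint="https://<search>.search.windows.net",
                      index_name="contoso-products",
                      credential=AzureKeyCredential("<key>"))

question = "What is a good tent that goes with the TrailWalker shoes?"
vector = aoai.embeddings.create(model="text-embedding-ada-002",
                                input=question).data[0].embedding

# The 3 products whose embeddings sit closest to the question in vector space.
results = search.search(search_text=None,
                        vector_queries=[VectorizedQuery(
                            vector=vector, k_nearest_neighbors=3,
                            fields="contentVector")])
context = [doc["content"] for doc in results]  # later spliced into the prompt
```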
all right we'll get right into it just shortly 00:13:54.500 |
i'll come into all this a little bit later on but we're going to be using a tool within azure ai studio 00:14:01.540 |
called prompt flow i'll come back into the details a little bit later on but what we're going to be doing 00:14:06.820 |
is taking that same retrieval augmented generation process that i just outlined to you 00:14:14.980 |
the steps are to take the question retrieve related data which in this case is our product data 00:14:20.260 |
augment the prompt with that information that's come from the knowledge base generate a response with the 00:14:25.700 |
large language model and then send the results back to the user we'll be creating a version of that 00:14:32.580 |
retrieval augmented generation flow using prompt flow which is a development tool that's in azure 00:14:38.580 |
ai studio to streamline this process of putting all that data together we take the inputs we embed it 00:14:44.340 |
we retrieve information from ai search we look up customer information from the database put all that 00:14:49.620 |
together with the customer prompt to generate the response and that's what the customer sees in the 00:14:54.420 |
website so that's what you will be doing so with that let me go back to the start here 00:15:01.380 |
has anybody not yet been able to log in to that website is anybody having trouble if so cedric 00:15:09.540 |
and miguel will help you out the password will come up in just a second all right so let me 00:15:16.180 |
actually show you that okay so i've gone to that website and you should have come to a page just like 00:15:24.420 |
this we can go ahead and launch the virtual machine 00:15:27.540 |
so on the right hand side of the virtual machine is the instructions the bottom right hand corner is 00:15:44.500 |
the next button that'll take you to the next page of the instructions and that is the page where you'll find 00:15:51.460 |
the password for the windows machine which is curiously password and from then you should be 00:15:59.620 |
able to log into the windows virtual machine right there 00:16:09.940 |
the next step after that and i can actually go to the next step here 00:16:13.220 |
is to open up the browser i'm doing this directly within the virtual machine 00:16:21.780 |
we're going to browse to a particular github repository that we provided to you for this workshop you can 00:16:29.940 |
type it into the browser in the virtual machine or you can cut and paste it into your own browser as well 00:16:34.260 |
okay you'll need to log in to github at this point just a slight warning is that if you use single sign-on 00:16:50.420 |
this happens to us at microsoft we use single sign-on to access github that does not work through the 00:16:55.380 |
virtual machine so i'm going to show you actually doing that directly from the browser 00:17:04.260 |
and we're going to do code spaces it looks like i've actually already started a code space here 00:17:12.660 |
the instructions will tell you to launch a code space oh i forgot one very important step 00:17:18.740 |
before you launch any code spaces we're going to switch to a different branch 00:17:21.380 |
rather than doing the main branch which is the one that you'll use if you want to do this at home 00:17:27.060 |
we've got a special branch here for this lab it's called ms build dash lab 322 00:17:32.180 |
the only difference is this version of the lab skips all the deployment instructions because we've done them for you 00:17:41.460 |
and then from that branch of the repository we're going to launch a code space on that specific 00:17:49.700 |
branch i'll show you that in just a moment i'll launch one that i already have here 00:17:55.860 |
this is what happens once code space is set up it takes a couple of minutes for code spaces to warm up 00:18:05.060 |
and what we have here in case you've never seen this before is basically an instance of visual studio 00:18:16.740 |
code running in the github cloud directly in your browser so it's the same user interface as visual 00:18:24.740 |
studio code but it's just running in your browser if you're experienced using code spaces you probably 00:18:30.020 |
know that you can also run this directly from within visual studio code in your desktop and connect to 00:18:34.740 |
that instance if you've got questions about that happy to show you 00:18:37.700 |
one of the things that we are going to be doing with code spaces and it looks like i've already 00:18:45.220 |
closed the terminal is logging in to your azure account 00:18:53.540 |
at the terminal and the instructions to that if you've been following along you might be ahead of me 00:18:59.460 |
is to log in to the portal and actually do this directly within the local browser in this case 00:19:18.740 |
okay this is the username we provided to you for this temporary azure account 00:19:23.460 |
it'll only exist um for the duration of this workshop and everything will be deleted once we're finished 00:19:28.900 |
one thing you'll probably find is because we're in a brand new windows virtual machine it thinks 00:19:40.340 |
you're a brand new user to windows it thinks you're a brand new user to azure and keeps on popping up all these hints 00:19:47.220 |
if you're familiar with azure you can just dismiss them or if you'd like to do an introduction to azure 00:19:51.300 |
you can do it here or you could do it at home and the trick to get back to the starting place is to 00:19:57.380 |
go to the home button there and then i won't go through these steps on the screen for you but you'll be 00:20:02.580 |
following through the steps on here to have a look at the resource groups we have available also the 00:20:08.020 |
resource group we provided to you in azure and you'll be able to inspect all the resources that we've already 00:20:12.340 |
deployed for you that you'll be working with lastly i'll just show you this last part 00:20:22.580 |
is when we log into visual studio code let me show an example of copying 00:20:28.740 |
sorry i don't normally do it this way there we go 00:20:42.660 |
where's my copy button i'm not using my usual laptop here 00:20:46.020 |
all right so what i will do this might be a neat trick is just paste it directly up here 00:20:57.540 |
and copy that back into my other visual studio window 00:21:09.140 |
all right it's usually easier than that just to show you that process of how you're going to connect 00:21:20.180 |
your visual studio code instance in github code spaces with your azure account is with this az login 00:21:26.660 |
after a moment it will give me an eight digit code which i'll copy 00:21:31.860 |
and i'll go to this device login page to actually log into azure 00:21:38.180 |
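Once that device-code login has completed, code running in the same Codespace can piggyback on the CLI session; a minimal sketch using the azure-identity package:

```python
# Sketch: reuse the `az login` session instead of handling credentials by hand.
from azure.identity import AzureCliCredential

credential = AzureCliCredential()  # picks up the existing az CLI login
token = credential.get_token("https://management.azure.com/.default")
print("authenticated:", token.token[:12], "...")  # we can now call Azure APIs
```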
and now all the commands that i run within the visual studio code command line terminal 00:21:47.380 |
will run against the azure account that we've set up for you so that's kind of the main things to 00:21:51.620 |
watch out for as you get started just continue to work through the instructions as you go 00:21:56.020 |
cedric miguel and i'll be wandering around to help you and put your hand up if you get stuck and 00:22:00.180 |
in about 20 minutes or so i'll jump in and give you a bit more context about the prompt flow that we're 00:22:04.340 |
going to be building 00:23:08.820 |
so if you're already logged into github it'll just continue to use that login 00:23:46.820 |
it looks like you're logged in yeah yeah yeah as long as you're logged in you're good 00:23:54.820 |
yeah and when you forked it did you um did you uncheck this copy of the main branch only 00:24:00.820 |
uh oh i haven't forked it yet okay oh good yep 00:24:04.820 |
oh wait the this is what i got to fork then the contoso chat 00:24:08.820 |
yep yep so click the fork button first of all yeah which is up here yeah 00:24:20.820 |
and the nice thing is now whatever github account you're using you'll have this in that account 00:24:34.820 |
i mean the setup's awesome i love that github and that's everything 00:24:46.820 |
just telling you that you just skipped ahead yeah 00:25:16.820 |
so click on the code spaces tab up there yeah 00:25:24.820 |
so that's available because of my like private account 00:25:33.820 |
it's a nice consistent environment and what we can do on our end 00:25:36.820 |
which we do for all of our samples is we make sure that when you launch code spaces 00:25:40.820 |
you get a command line that has everything installed that you need to go 00:25:53.820 |
so it's gonna take probably 60 seconds or so to do things 00:25:56.820 |
so what we actually recommend is you go to the next step and then come back to it later 00:26:05.820 |
uh... so i mean it's like you gave me an azure account 00:26:20.820 |
yeah cause some of them take a little while to provision so it's it's for these workshop environments it's easy for people to be able to jump straight in 00:26:39.820 |
and the next step we can actually log into ai studio 00:26:58.820 |
you will have to wait for something you should be 00:28:26.880 |
Yep, that one is code spaces, the one before. 00:28:33.440 |
This tab right here where it says preview reading, that's code spaces right now. 00:28:37.260 |
So it's showing you the contents of the repository with a command line connected to that file 00:28:45.200 |
I'm so used to using, sorry, oh I don't do that, I'm sorry, I'm so used to just using VS code, 00:28:52.900 |
I don't even, sorry, so it doesn't seem like you are inside the, yeah, I'm not. 00:29:04.840 |
I don't know, I don't know, I don't know what happened, it used to have all the iPhones and 00:29:14.780 |
it used to have all the, it looked, it had showed I explore and everything else, right? 00:29:26.720 |
I think it just stopped but, yeah, I did it earlier so I think it just wiped itself or something. 00:29:46.540 |
I had it actually, hmm, like it, hmm, Apple, but I really logged into this like ages ago 00:30:07.480 |
Because, you know, the tab that you opened outside or no, that's okay. 00:30:15.480 |
Yeah, so that's why I'm thinking that I already have it. 00:30:24.480 |
Yeah, I guess you need to do it again. I'm sorry. I said you should be able to just click on it. It should do it for you. You don't have to click on it. 00:30:43.480 |
Oh, cool. Well, no. The reason I'm saying is because I had this load. I had everything else load. The only thing I stopped at was after I did this. 00:31:11.480 |
No, I understand that. I'm just saying your stuff is buggy because it shouldn't fail across windows. 00:31:29.480 |
Say that again? Oh, no, no. I'm just catching up to you guys. Yeah. 00:31:39.480 |
I'm just logging into GitHub right now. Yeah, that's where we are. There we go. 00:31:45.480 |
Just to avoid confusion, you might want to close the other tabs. Otherwise, it's confusing. 00:31:51.480 |
No, that's why I'm confusing it. Like, these are other testing. 00:31:57.480 |
The issue. The issue I'm having is you're saying this thing should be . 00:32:15.480 |
Yeah, and this will, I already have that. So, after that, what? 00:32:43.480 |
Okay, I got it. That's what I was trying to do. 00:32:45.480 |
Got it. So, copy and then join me. Yeah. So, that's why I was saying that . 00:32:57.480 |
Yes. No, you could just open the one you had already created. Here. Yes. Yes. 00:33:25.480 |
It takes about 60 seconds sometimes for the code to pop up. 00:33:37.480 |
And you do need to hit enter in the terminal afterwards as well to confirm. 00:33:49.480 |
Basically redo what you've done, but do it inside the VM instead of your macOS browser. 00:34:33.480 |
That's right. You're connected and you can close that tab now, you won't need that one again. 00:35:02.880 |
So, right now it's going to create the attribute table and the data. 00:35:46.880 |
Yeah, but you already have the codespace opened here. 00:35:51.880 |
Let me show you where you'll get to once you get to about step seven. 00:35:56.880 |
I've gone through the steps of logging into GitHub, cloning the repository, launching code spaces, 00:36:01.880 |
logging into the Azure portal and AI studio from the browser, and also logging into the 00:36:07.880 |
Azure from the visual studio code command line. 00:36:12.880 |
So, then we can run some commands, and I've run this post-provision command here. 00:36:17.880 |
You're welcome to have a look at what that script does. 00:36:23.880 |
What we have done is, if you go to the home page of the Azure portal, you can also do this from the 00:36:30.880 |
Azure CLI if you prefer. 00:36:31.880 |
You can have a look at the list of resources we've deployed into this single resource group for you, called Contoso Chat RG. 00:37:04.880 |
And you can see that there's 11 resources that we have launched for you. 00:37:09.880 |
What that script did was to populate some data into Cosmos DB and to Azure AI Search. 00:37:16.880 |
Let's have a look first at Cosmos DB, which is the Azure Cosmos DB account. 00:37:23.880 |
And here in the portal, I can go in and have a look at the data explorer. 00:37:31.880 |
Okay, there's a video to watch if you're interested. 00:37:40.880 |
Okay, and what you can see here is now within our Contoso Outdoor Cosmos database, we have a customers database. 00:37:48.880 |
And you can actually drill in there if you're familiar with using databases and have a look at the tables. 00:37:53.880 |
There are about 12 customers in the table, along with their purchase history. 00:38:00.880 |
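For reference, here is roughly what a customer lookup looks like with the azure-cosmos package; the database/container names ("contoso-outdoor", "customers") and field names are illustrative:

```python
# Sketch of the customer lookup: a parameterized query so only the requested
# customer's record (and order history) comes back.
from azure.cosmos import CosmosClient

client = CosmosClient(url="https://<account>.documents.azure.com",
                      credential="<key>")
container = (client.get_database_client("contoso-outdoor")
                   .get_container_client("customers"))

items = container.query_items(
    query="SELECT * FROM c WHERE c.id = @id",
    parameters=[{"name": "@id", "value": "1"}],
    enable_cross_partition_query=True)
for customer in items:
    print(customer["firstName"], customer.get("orders", []))
```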
Likewise, if I go back to the resource group and have a look at the Azure AI Search resource, which is called Search Service. 00:38:12.880 |
You can go to the indexes in the sidebar under Search Management. 00:38:20.880 |
By the way, if you can't see this sidebar, it happens if you're using a very small laptop screen or if you've got a large font. 00:38:27.880 |
It might be hidden behind this hamburger menu here in the corner. 00:38:46.880 |
I don't think I went through all the steps to actually pre-provision things there. 00:38:49.880 |
So we'll go back and have a look at that again. 00:38:52.880 |
I just want to give you a preview of where you're coming to. 00:38:54.880 |
Again, always any questions, just pop your hands up. 00:39:24.880 |
Let's see if I can figure what's going on here. 00:40:03.880 |
So I was just looking at the prompt flow example on step, sort of end of step seven, where we're 00:40:08.880 |
looking at the graph of the pre-existing prompt flow that's been created. 00:40:13.880 |
One of the questions that I had is, can the graph also be cyclical? 00:40:19.880 |
And the answer is no, because I don't think there's any support for any kind of looping like that. 00:40:31.880 |
Is there a reason why you'd want it to be cyclical? 00:40:38.880 |
Well, within a single component of that prompt flow, you can run anything you want. 00:40:46.880 |
So if you need to do that kind of interactivity within one of those nodes, you can do that. 00:41:02.880 |
I realized the step that I skipped, because I was getting ahead, was I didn't do the bit 00:43:25.880 |
This time, there's a lot of power in the DAG description that we can build. 00:43:32.880 |
One thing that we'd like to explore, that we're currently exploring with a couple of our clients 00:43:36.880 |
is how to hit these kinds of DAGs from enterprise ChatGPT, to use the Actions framework 00:43:43.880 |
and then go to a middleware layer where a DAG resides. 00:43:46.880 |
What I'd like to understand is, what are our options to expose this as an endpoint to external sources other than our web app? 00:43:56.880 |
That's actually going to be about step 11 or 12. 00:43:58.880 |
We're actually going to deploy it as an endpoint and then connect it to the website. 00:46:06.880 |
I completely skipped step six, which is an important step. 00:46:10.880 |
You can confirm if you've done step six by having a look in your codespace. 00:46:15.880 |
Once you've done that, you should have a file called .env, 00:46:19.880 |
and that is where we've set up all of the endpoints and keys 00:46:23.880 |
that you will need to access the resources that we provided for you. 00:46:27.880 |
And you'll also have a config.json file, which does similar kinds of things for AI Studio. 00:46:36.880 |
And then once that is all set up, I should be able to go back and run this pre-provision script. 00:48:15.880 |
I think it's because I ran the script already once. 00:48:50.880 |
So one of the things we did in that pre-provision script 00:48:54.880 |
was to take each of the markdown files, which are in the repository. 00:48:57.880 |
There's one markdown file per product that the company sells. 00:49:01.880 |
And then we just scripted indexing that into the AI Search database, 00:49:05.880 |
which essentially converts that entire markdown file into one point in the embedding space. 00:49:16.880 |
Actually, in this example, we don't chunk it, just for simplicity. 00:49:19.880 |
If you actually do it through AI Studio's search-on-your-own-data feature, it will chunk the documents for you. 00:49:28.880 |
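Conceptually, the provisioning script's indexing step looks something like the sketch below: one embedding per whole markdown file, uploaded as a single search document (again, names and paths are illustrative):

```python
# Sketch: embed each product markdown file whole (no chunking) and upload it
# to the search index as one document.
import pathlib
from openai import AzureOpenAI
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

aoai = AzureOpenAI(azure_endpoint="https://<aoai>.openai.azure.com",
                   api_key="<key>", api_version="2024-02-01")
search = SearchClient(endpoint="https://<search>.search.windows.net",
                      index_name="contoso-products",
                      credential=AzureKeyCredential("<key>"))

docs = []
for i, md in enumerate(sorted(pathlib.Path("data/products").glob("*.md"))):
    text = md.read_text()
    emb = aoai.embeddings.create(model="text-embedding-ada-002",
                                 input=text).data[0].embedding
    docs.append({"id": str(i), "content": text, "contentVector": emb})

search.upload_documents(documents=docs)  # one vector per product file
```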
While we're getting going, how do PromptFlow, Autogen, and Semantic Kernel relate? 00:49:38.880 |
Are they all competing projects at Microsoft, or are they...? 00:49:42.880 |
Not so much competing, and they have slightly different purposes. 00:49:48.880 |
Let's start with Semantic kernel and Autogen. 00:49:56.880 |
Semantic kernel we kind of view as the enterprise product. 00:49:59.880 |
that's the one that is designed for use in enterprise settings, has strict versioning, 00:50:05.880 |
you know, API changes, all those kinds of stuff. 00:50:08.880 |
But like Langchain and other orchestrators, you can use it to connect different tools together 00:50:16.880 |
Autogen serves a similar kind of purpose, but it comes out of Microsoft Research. 00:50:22.880 |
It's a little bit more flexible, based on a slightly different paradigm. 00:50:25.880 |
But that's not the one that we recommend for enterprise applications today. 00:50:32.880 |
PromptFlow is directly within the AI Studio product, and it's purely for orchestrating 00:50:38.880 |
within an endpoint that you deploy through the AI Studio product. 00:50:47.880 |
The whole purpose is to create one endpoint that goes through, in this case, a RAG process. 00:50:51.880 |
But it's designed to be more flexible than just for RAG. 00:50:54.880 |
You could use semantic kernel to manage multiple PromptFlows as endpoints. 00:51:02.880 |
And also, you can embed PromptFlow as a library. 00:51:08.880 |
Deploying it as an endpoint is one possibility. 00:51:11.880 |
It can be used to write integration tests, evaluation tests, embed in your application. 00:51:19.880 |
Like, if you have a Python native application and you want to embed a DAG in it, it's also possible. 00:51:26.880 |
Is there anything like PromptFlow that sits on top of semantic kernel to manage multiple flows? 00:51:50.880 |
And honestly, when you work with PromptFlow for any length of time, you'll be working with the underlying files anyway. 00:52:00.880 |
It's great for these workshops, because I can point to things and show you how these pieces fit together. 00:52:04.880 |
And it's great for debugging, because you can actually see how the data is flowing through it. 00:52:08.880 |
In terms of an editing environment, that's not really what it's for. 00:52:17.880 |
In addition to the graphical YAML-based DAG, now you can also use PromptFlow in a more programmatic way, 00:52:28.880 |
where you write the PromptFlow configuration in Python instead of graphically. 00:52:39.880 |
To be honest, I'm not sure how generally available it is. 00:52:54.880 |
But ultimately, this is the representation of the PromptFlow. 00:52:59.880 |
It's a YAML file, which just defines each node with a bunch of tags associated with its 00:53:04.880 |
inputs and outputs, and how it connects to the various endpoints, and the types of nodes that are available. 00:53:12.880 |
If you're running that as a Python library, at that point it's fairly similar to using LangChain. 00:53:25.880 |
So there's also a command line, which is what we're using in this workshop, to run that PromptFlow 00:53:28.880 |
with a given set of inputs to generate the outputs. 00:53:32.880 |
Which provides a more LangChain-like experience. 00:53:47.880 |
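For instance, assuming the promptflow package, driving a flow from Python rather than the CLI looks roughly like this (the flow path and input names are illustrative):

```python
# Sketch: run a prompt flow locally from Python with a given set of inputs.
from promptflow import PFClient

pf = PFClient()
result = pf.test(
    flow="contoso-chat",  # directory containing flow.dag.yaml
    inputs={"customer_id": "1",
            "question": "What tent goes with the TrailWalker shoes?",
            "chat_history": []})
print(result)
```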
Autogen could, in theory, do similar stuff, but it's more of a labs thing, not ready for production. 00:53:58.880 |
Like David was saying, the teams at Microsoft that work on the two projects are very different. 00:54:11.880 |
Semantic Kernel is very product-oriented, so they follow a very strict software release lifecycle, 00:54:19.880 |
whereas the other team, the Autogen team, is really research. 00:54:26.880 |
So they try the latest cutting-edge AI things. 00:54:35.880 |
So if you use Autogen, you're literally on the bleeding edge, and things might break. 00:54:47.880 |
It might work for your application, but you need to know what you're getting into. 00:54:54.880 |
We had used it for a hackathon, and we were able to put together an agentic-type flow really 00:55:01.000 |
quickly with it, so we liked it, but it's good to know that we need to be a little careful 00:55:10.540 |
So semantic-kernel is more for the orchestration, and this would be closer to the app layer, 00:55:21.540 |
where I'm building a back-end, where I'd be using semantic-kernel to orchestrate the LLM, 00:55:42.540 |
Prompt flow is more on, I want to build a DAG, manage its lifecycle, potentially evaluation. 00:55:50.540 |
And like I was saying, you can also embed Prompt flow as a framework, as a library. 00:56:09.260 |
But that's the use case we'll be using in this one, as an endpoint. 00:56:15.260 |
And to make things even more confusing, sorry, with semantic-kernel you can also build agentic 00:56:36.260 |
applications. Simple ones, but you can build agentic applications with it. 00:56:53.260 |
Personally, I like to use it that way because, when you're in development, 00:56:58.260 |
it makes things easier: to write tests, to write snippets of code that you can reuse. 00:57:09.260 |
So it's more like development convenience rather than anything else. 00:57:18.260 |
If you wanted to build, I don't know, a rich Python desktop app where you happen to 00:57:25.260 |
want to orchestrate LLMs, that would also be a relevant use case. 00:57:51.260 |
Prompt flow you'll find useful when you're managing all your resources within Azure AI Studio. 00:57:57.260 |
But within Azure AI Studio, you can launch-- you can create endpoints for open AI models, 00:58:04.260 |
Mistral models, LLAMA models, anything from the model catalog. 00:58:07.260 |
You can create connections into other Azure resources like Cosmos DB and AI Search. 00:58:14.260 |
You can hook it up to our evaluation framework. 00:58:18.260 |
There are all the features in this for managing the entire lifecycle of the endpoint itself and all the 00:58:25.260 |
resources that are required to make that endpoint work. 00:58:27.260 |
That's the situation where you'll mainly be using Prompt Flow is in basically managing 00:58:33.260 |
that connection between all those resources with the goal of creating an endpoint. 00:58:38.260 |
You'll be, in our example here, calling it from a web app, as a regular REST API endpoint. 00:58:46.260 |
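From the web app's point of view that is just one HTTP call; a sketch with an illustrative URL, key, and payload shape:

```python
# Sketch: calling the deployed flow endpoint like any other REST API.
import requests

resp = requests.post(
    "https://<endpoint>.<region>.inference.ml.azure.com/score",
    headers={"Authorization": "Bearer <endpoint-key>",
             "Content-Type": "application/json"},
    json={"customer_id": "1",
          "question": "What did I order last time?",
          "chat_history": []})
print(resp.json())  # the generated answer goes back to the chat UI
```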
But likewise, if you wanted to call that same endpoint from within semantic kernel or any other orchestrator, you could. 00:58:56.260 |
And if I may add, what I like also about Prompt Flow is that when you go to AI Studio, 00:59:01.260 |
because it's very, very well integrated in AI Studio, so you have the visual aspect of it. 00:59:06.260 |
But if you go to the playground of AI Studio, you can configure a RAG application and also 00:59:14.260 |
build a code interpreter application and export it to Prompt Flow. 00:59:21.260 |
So instead of having to come up with a whole design of a Prompt Flow-based application, 00:59:27.260 |
you can just use the playground, configure it as you want, and export it and have a ready-to-use 00:59:33.260 |
Prompt Flow application that you can re-use and embed in your project. 00:59:38.260 |
So you also have a workflow that goes from the UI to the code that way. 00:59:46.260 |
If you'd like to play around with that side of it, it's not part of the workshop. 00:59:58.260 |
But feel free, because you have an instance of Azure AI Studio running already in your virtual machine. 01:00:04.260 |
You go through the steps of selecting the one project we have here, which is called Contoso Chat SF AI project. 01:00:12.260 |
One of the things you might want to play around is the playground. 01:00:15.260 |
We've provided you with GPT-4 and GPT-3.5 Turbo endpoints, I believe. 01:00:22.260 |
And this is the place where you can interface through a playground to test out the endpoints you've created. 01:00:28.260 |
Same place as when you get into the Prompt Flow thing a little bit later on, you can then test out the connections 01:00:32.260 |
between those endpoints and the databases and everything else you've built your RAG application around. 01:00:38.260 |
From there you have the Prompt Flow button where you can export to Prompt Flow. 01:00:58.260 |
Yeah, so Azure AI Studio is mainly for the purpose of creating an endpoint that works against your models and data. 01:01:27.260 |
Copilot Studio, on the other hand, is about building applications, not endpoints. 01:01:31.260 |
Either building a complete application like a chatbot application or any kind of user interface 01:01:37.260 |
that has an LLM element or integrating those types of applications into applications like teams. 01:01:43.260 |
So Copilot Studio is for building entire apps. 01:01:56.260 |
And as you might have guessed, Copilot Studio was built with AI Studio. 01:02:01.260 |
In terms of that, we're getting into a bit of a big question; we're exploring that. 01:02:11.260 |
Would you see it that we build an application in Copilot Studio and then call this endpoint from it? 01:02:21.260 |
If you want that level of customization, yes, though you don't need to with Copilot Studio. 01:02:25.260 |
It's designed in such a way that you can build complete apps without having to create your own endpoints. 01:02:30.260 |
But if that's the position you're in for a particular use case, then yes, absolutely. 01:02:35.260 |
You can call any endpoint from Copilot Studio, including ones created with AI Studio. 01:02:40.260 |
I don't believe, I'm not so familiar with that product, but I don't believe they have evaluation tooling built in. 01:02:49.260 |
Yeah, you would evaluate the LLM through its endpoint here in AI Studio. 01:02:57.260 |
We'll come to that in probably about 20 minutes or so. 01:02:59.260 |
But here is the very simplest prompt flow I just generated from the chat playground. 01:03:15.260 |
All it does is take an input, pass it through to the LLM called chat there, and then generate the output. 01:03:27.260 |
Whereas if you use an application like ChatGPT, there's a whole lot more going on with your 01:03:33.260 |
prompt and the context and everything else, in a more RAG style, than just passing it directly to the model. 01:03:39.260 |
I'll keep on going to catch up to where you all are so I can demo the good bits when we get there. 01:03:53.260 |
That's where I would get the documentation to make it a little bit clearer. 01:04:00.260 |
So, you're just missing that in the document that makes it clear. 01:04:16.260 |
And then, you're going to query the database. 01:05:32.300 |
The question was, are we able to use another vector database besides Cosmos DB? 01:05:44.900 |
When we get the prompt flow up, I'll show you that. 01:05:47.380 |
But if we go into the prompt flow node, which I think is the retrieve product information 01:05:53.480 |
node, you can see it's set up directly as a connection to AI search. 01:05:58.240 |
But you can set it up as a connection to other vector databases as well, or even just an 01:06:04.300 |
What do you think is the biggest differentiator between AWS and Microsoft? 01:06:09.500 |
What do I think is the biggest differentiator between AWS and Microsoft? 01:06:12.660 |
Like why should I go with using Microsoft for startups versus AWS for startups? 01:06:18.080 |
That's a different question, because I was going to go into the main differentiator between the two clouds. 01:06:25.040 |
You know, Azure is a platform that's used by big companies like Microsoft and so forth and 01:06:30.180 |
is designed with a lot of features in that around authentication, security, scaling, monitoring 01:06:36.120 |
that big companies that run production apps really need. 01:06:39.140 |
That actually makes it a little more difficult for startups, honestly. 01:06:43.100 |
Because the first thing you do when you start working with Azure is start dealing with things 01:06:46.560 |
like resource groups and security, you know more about this than I do. 01:06:53.240 |
That's why we set up Microsoft for startups, is to help startups get into the process of understanding 01:06:59.200 |
the different kind of process of working with resources in Azure, which is quite different 01:07:03.820 |
from AWS in the sense that you can just spin up a single VM in AWS and you're done. 01:07:10.080 |
Microsoft, when you spin up a VM, there's actually six different resources that get created 01:07:14.020 |
because it's there to support the enterprise use case as opposed to just, I want a single VM. 01:07:19.180 |
And I will say Microsoft for startups, they have a pretty cool program, the Microsoft for Startups Founders Hub. 01:07:27.240 |
They can give you anywhere from $1,000 up to $100,000 in credits for Azure. 01:07:33.920 |
So, you know, level one, they give you this many credits. 01:07:37.480 |
So, you really get a lot of credits for developing your first startup application. 01:07:43.860 |
They have, like, different mentors that you can get paired up with. 01:07:47.120 |
They have, like, different sessions that you can attend to learn, like, how do I use this 01:07:52.280 |
So, they give you the credits and a lot of the guidance to make that happen. 01:07:56.420 |
And the whole Microsoft for startups team is here at the booth in salon nine. 01:08:02.480 |
So, chat to them about the startups program and they can get you in at the right level. 01:08:06.620 |
Yeah, I had to start over because I ran that script before I did all the configuration. 01:11:50.660 |
Yeah, the way PromptFlow is set up is it actually manages the chat history for you as it comes through the flow. 01:11:56.520 |
But if you want to restore it beyond that interaction, then yeah, you can store it in the database. 01:12:00.880 |
There might be the documentation that we missed. 01:12:06.660 |
It's going to go in and put that into the actual call going to the LLM. 01:12:13.120 |
Maybe you might ask about like products or like a specific item. 01:12:17.900 |
And then it's going to put those as a part of the workflow, right? 01:12:21.840 |
Actually it's AOAI, and by default it is correctly set. 01:12:34.480 |
But a few of them weren't, and what's confusing is that in addition to the Azure OpenAI (AOAI) connection, you do have a default. 01:12:48.900 |
Yeah, so when you switch to default, actually you break things. 01:12:53.620 |
All right, when folks get to that, I'll have you show them. 01:12:57.540 |
For those of you who haven't done step eight yet, the custom connection for Cosmos DB, I'm going to go through that now. 01:13:11.760 |
Go to the connected resources and view them all, and then create a new connection. 01:13:28.980 |
So this is one of the things that the Prompt Flow does for you, is it has standardized connections to all these tools. 01:13:40.500 |
But if you want to connect to any other service, you can use a custom connection, and that in fact is what we're going to be doing here. 01:13:46.200 |
Cedric looks like we have a question over the back right there. 01:13:50.760 |
I'm going to add the key value pairs for our connection to Cosmos DB. 01:14:02.100 |
I'm going to grab that value from our .env file. 01:14:33.920 |
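In other words, the values pasted into the custom connection's key/value fields come straight out of .env; a small sketch assuming the python-dotenv package and illustrative variable names:

```python
# Sketch: read the Cosmos DB endpoint and key out of the .env file.
import os
from dotenv import load_dotenv  # pip install python-dotenv

load_dotenv()  # reads .env from the current directory
cosmos_endpoint = os.environ["COSMOS_ENDPOINT"]  # illustrative variable names
cosmos_key = os.environ["COSMOS_KEY"]
# These are the two values entered as key/value pairs in the custom connection.
```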
So if you have a million vectors and you want to perform a nearest neighbor search against a query vector, what are some of the strategies to prune that? 01:14:43.980 |
And then how does that impact recall as well? 01:14:46.480 |
Yeah, well, first of all, that's the entire reason why vector databases exist, is to exactly do that search quickly. 01:14:53.260 |
They're indexed in such a way that they can do that nearest neighbor search at scale and at speed. 01:15:00.960 |
It's not very difficult to do it yourself, to do an embedding for a bunch of documents or a bunch of chunks of documents, 01:15:08.000 |
and then do a nearest neighbor algorithm to find what is closest to the embedding for the customer's question. 01:15:13.760 |
But the advantage of doing it within a vector database is you can do that quickly and at scale. 01:15:17.060 |
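The do-it-yourself version mentioned above is just brute-force cosine similarity; a minimal numpy sketch:

```python
# Sketch: brute-force nearest-neighbor search over document embeddings.
import numpy as np

def top_k(query_vec, doc_vecs, k=3):
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sims = d @ q                   # cosine similarity to every document
    return np.argsort(-sims)[:k]   # indices of the k closest documents

docs = np.random.rand(1000, 1536)  # stand-ins for real embeddings
query = np.random.rand(1536)
print(top_k(query, docs))          # fine at this scale; a vector database
                                   # exists to do this fast at millions
```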
And what was the second part of your question? 01:15:18.740 |
I guess there's different neighbor algorithms, right? 01:15:23.360 |
And then you have to, I think there's parameters that you have to tune. 01:15:26.920 |
That affects recall as well, depending on how far you want the tree search to occur. 01:15:32.200 |
And so what are some strategies to balancing that and getting the highest recall instead of just doing brute force? 01:15:39.420 |
Because brute force, I'm assuming, takes a while at scale. 01:15:43.500 |
Interestingly, and this might not be the answer you expect, is that, at least in our experience working with real-world applications, 01:15:50.640 |
vector search by itself actually isn't enough, despite playing around with the algorithms and choosing the parameters so you can expand out the search and not miss documents. 01:16:00.440 |
What we've actually found is actually a combination of keyword search and vector search together actually outperforms either. 01:16:08.300 |
And that's a feature that's built straight into Azure AI search. 01:16:10.840 |
I think by default these days, it actually defaults to a hybrid search. 01:16:15.360 |
In this particular example, just to keep things simple, we're just doing a vector search. 01:16:19.060 |
But for practical applications, we actually recommend a combo of keyword and vector search. 01:16:25.880 |
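Continuing the earlier retrieval sketch, a hybrid query in Azure AI Search is the same call with the keyword half filled in (search_text) alongside the vector half:

```python
# Sketch: hybrid search = keyword relevance + vector similarity in one query.
from azure.search.documents.models import VectorizedQuery

results = search.search(
    search_text="tent trailwalker",                  # keyword half
    vector_queries=[VectorizedQuery(vector=vector,   # vector half
                                    k_nearest_neighbors=3,
                                    fields="contentVector")])
```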
And I'm assuming that metadata that you can search, like, for example, keywords can be updated for a product, in this case, at any point in time for any vector, right? 01:16:36.100 |
And then, is there, like, the last question, is there an upper limit on the dimension size of the embeddings? 01:16:42.280 |
I'm sure there is an upper limit somewhere, but we haven't come across it for AI search, at least. 01:16:50.300 |
Yeah, I don't think there's one sort of in there by design. 01:16:52.920 |
What about using Entra ID with this, specifically external Entra ID, if you're trying to develop this for an externally facing application, multi-tenanted, and everybody having their own little space, shall we say? 01:17:38.600 |
Is there any sort of frameworks or things like this that are set up to show how to sort of set that up across all levels? 01:17:51.200 |
Because you're going across systems: from Copilot Studio into this, into AI Search, into Cosmos DB. 01:18:00.100 |
Is there any sort of, is there anything written down anywhere? 01:18:03.660 |
Okay, that's an area I'm not an expert in, but Cedric or Miguel might be, either of you, too. 01:18:08.400 |
And using external Entra IDs, I know it's only come out a couple of months ago, but being able to create a scope across all of those systems, 01:18:19.740 |
so that the scope that you can see is for that external Entra ID, is there anything written down to show how to do that across each of the individual systems in the right way, 01:18:31.020 |
or is there anybody working on that, do you know? 01:18:33.360 |
When you say Entra ID, so from an authentication standpoint? 01:18:41.460 |
Yeah, you're talking about the Entra authentication mechanism? 01:18:44.480 |
Yeah, Entra, yeah, specifically the external ID, so using external IDs, so social logins and things like that. 01:18:51.060 |
So, if you're facing this to the outside world, enabling external users, external IDs to actually get scope across the whole thing. 01:18:59.240 |
Yeah, no, it's, to be honest, it's not a domain where I've spent much time yet, so I'm not going to be able to talk much about it. 01:19:06.940 |
You wouldn't have external users log into AI Studio; that's for the developers and for the IT managers. 01:19:13.960 |
But what you are exposing to the app, which the end users are then accessing, is those endpoints. 01:19:20.680 |
And you can manage those endpoints either by tokens or by managed identity. 01:19:24.700 |
So, whatever way that you want the app to talk to the endpoint based on that identity controls, obviously, what the endpoint is then able to do. 01:19:38.940 |
And then, at the moment, if I'm reading this correctly, the retrieve-documentation-from-AI-search step is retrieving the vector, and then it's handing that off to the actual document itself. 01:19:56.260 |
So, it's only the vector that's getting passed, not the document itself? 01:19:58.740 |
No, well, the vector is then used to retrieve the document, in this case a markdown file. 01:20:03.700 |
And then the markdown file actually gets inserted into the prompt so that the LLM can see it, so the OpenAI can see it, and form its answers based on that information. 01:20:16.060 |
And I'll show you how that gets put together in a sec. 01:20:17.900 |
There is one file there that has the prompt, and then it has the variable for the document, so that variable simply gets replaced by the text that it grabs from the search. 01:20:45.820 |
So, just on, to continue building on the entry ID piece, because one thing that we're looking into is, for example, if we want to do granular access control, and make sure that we don't pass to our prompt flow, you know, the ability for it to search any database and retrieve any data from a user, just to make sure that a nefarious actor might not be able to get data from other users by something, through something like prompt injection. 01:21:13.140 |
I noticed that in the authentication type for prompt flow, we're able to use token-based, so Azure AD tokens. 01:21:21.380 |
Let's say that we do have a user log into our app, pass through that token to call the prompt flow endpoint. 01:21:30.120 |
Can we then use that token authentication of the user to call subsequent components of our Azure stack? 01:21:39.960 |
So, to pass through that user's delegated token, and just make sure that we're retrieving data for that user? 01:21:57.400 |
So, the only thing is that I don't know much about Entra. 01:22:01.080 |
So, any Entra-specific things, I don't know much about. 01:22:05.140 |
I joined Microsoft not so long ago. But when it comes to normal security, you could use any kind of token, such as a JWT token, like you would for any kind of REST API. 01:22:24.100 |
You can pass the JWT token to the prompt flow endpoint, and use it inside the prompt flow definition to pass on to whatever internal service you would like to. 01:22:38.760 |
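As an illustration of that idea (not the workshop's setup): a flow node could decode the caller's JWT and use its subject claim, never anything from the prompt text, to scope the data it fetches. A real service must verify the token signature against the identity provider's keys; it is skipped here only to keep the sketch short:

```python
# Illustrative only: derive the customer id from the caller's bearer token.
import jwt  # pip install PyJWT

def customer_id_from_token(bearer_token: str) -> str:
    # Signature verification disabled for brevity; NEVER do this in production.
    claims = jwt.decode(bearer_token, options={"verify_signature": False})
    return claims["sub"]  # identity comes from the token, not the prompt

# The parameterized Cosmos query shown earlier would then use this id, so a
# prompt-injected "show me other users' orders" has nothing to work with.
```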
So, I think a very concrete example of that would be, let's say we have the Cosmos DB endpoint. 01:22:44.900 |
And I want to ensure that I can only access the specifics user data in that Cosmos DB. 01:22:52.840 |
So, I want to actually use, like, row-based access control, where that user is only allowed to see certain rows in that. 01:23:00.540 |
I can actually show you something to do with that right here, now that I understand. 01:23:04.640 |
So, what we have here is the simple prompt flow for this particular application, where I do the input. 01:23:11.520 |
There's the embedding to retrieve the product documentation. 01:23:16.220 |
What's relevant here to your question is, let me make this a bit bigger so we can see. 01:23:22.600 |
And the customer lookup, which is using information that is provided by the customer through them having logged into the website. 01:23:36.160 |
So, for example, when I go to this website, do I have it open still? 01:23:53.920 |
This link is given in one of the last steps of the lab instructions. 01:24:04.680 |
So, in this case, when the LLM gets the prompt, it already knows who you are. 01:24:13.020 |
And it gives to the LLM only David's information. 01:24:16.360 |
It would be different if the LLM had to ask, you know, what's your name? 01:24:24.240 |
But in this case, all of that is set up so that when the LLM gets there, it already has your name and it's authenticated. 01:24:30.240 |
And it has only your information. 01:24:33.440 |
And I think that's just about what you were going to show. 01:24:39.960 |
So, when we're at this chat application, we can see Sarah Lee is logged in already. 01:24:45.720 |
So, as we go through the prompt flow, one of the inputs there is the customer ID. 01:24:50.980 |
And that's come from this app through the token that's being provided to the endpoint. 01:24:56.100 |
Actually, no, it's a parameter to the endpoint in this particular case. 01:25:00.940 |
But when we ask the question, what did I order last time? 01:25:05.480 |
What's important to understand there is that there's nothing in this app that is searching any database. 01:25:16.380 |
All it's doing is passing the user ID and that question, what did I order last time, into that whole prompt flow. 01:25:25.000 |
And then as part of that prompt flow process, which runs under a privileged account, it then queries the database with that user ID to get back her list of product purchases. 01:25:37.760 |
Then the LLM is operating on that information with that question to generate that bit of text that you see. 01:25:43.660 |
And that bit of text is the only thing that actually goes back to the app. 01:25:46.720 |
So, the app doesn't have any direct access to databases at all. 01:25:53.760 |
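As a minimal sketch of the app side of that contract: the app sends only the question and the customer ID, and gets back only text. The endpoint URL, field names, and response shape are assumptions, not the deployed flow's actual schema.

```python
import requests

# Hypothetical scoring endpoint; the deployed prompt flow endpoint
# in the workshop will have a different URL and payload schema.
ENDPOINT = "https://contoso-chat.example.azurecontainerapps.io/score"

def ask(question: str, customer_id: str) -> str:
    payload = {"question": question, "customer_id": customer_id}
    resp = requests.post(ENDPOINT, json=payload, timeout=30)
    resp.raise_for_status()
    # Only the generated text comes back; the app itself never
    # touches the search index or the customer database.
    return resp.json()["answer"]

print(ask("What did I order last time?", "7"))
```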
And I think you hit the nail on the head when you're saying that prompt flow has privileged access to the database. 01:25:58.400 |
What I'm trying to avoid is for prompt flow to have privileged access. 01:26:02.100 |
What I want to do is to inherit the access of the calling user through, for example, his AD token. 01:26:12.140 |
But I think the way that would work is through the features of the database, where you pass that authentication information along with your query, and the database itself prevents access you wouldn't otherwise have. 01:26:20.480 |
Would that accomplish what you're trying to do? 01:26:23.440 |
I'm just wondering whether or not that's already something that you're looking into with, for example, the prompt flow connection to Cosmos DB. 01:26:33.900 |
Because that's the whole purpose for this thing existing in the first place. 01:26:37.640 |
This is what Copilot Chat was built on, for example. 01:26:41.020 |
And sort of that's all based on enterprise logins and things like that. 01:26:50.860 |
Like, if you want to, like, have variable loops in the control DAG, is that possible? 01:27:02.540 |
Yeah, the question on that was: can you have variable control points in the flow? 01:27:05.720 |
If you run this within Visual Studio Code, you can set breakpoints in the Python code that runs within each of the nodes. 01:27:13.380 |
I guess, like, depending on the results, like, of one node, for example. 01:27:19.340 |
You might want to route to, like, different nodes. 01:27:22.660 |
Because, within each of those nodes, what's actually being run, let's actually take a look at that. 01:27:27.640 |
If I go over to the prompt flow itself, let's have a look, for example, at the LLM response node, I think. 01:27:40.960 |
Let's have a look at the content of the prompt node. 01:27:44.800 |
So, what's actually happening at this point is, this is actually just the prompt that gets assembled. 01:27:54.740 |
But you can see, like, it's got this, like, metaprogramming language, you know, for item and documentation and so forth. 01:28:00.040 |
So, what that's doing through at that point is looping through all of the matched products that are related to the user query, extracting them out from Azure AI search as vectors, then extracting out the markdown files that relate to those vector indices, putting that directly into this prompt. 01:28:19.040 |
So, when I ask the question of the app, you know, what's a good pair of shoes, that's not the only bit of text that is going to OpenAI at that point. 01:28:39.340 |
What is, in fact, going to OpenAI is a whole bunch of text defined by this customer prompt here, including telling OpenAI, you're an AI agent for the Contoso outdoor product retailer, you should always reference factual statements, the following documentation should be used in the response, and this is where the individual relevant products are inserted into the prompt. 01:29:04.040 |
And, to our question earlier on, for this particular customer, this is where their previous orders are inserted into the prompt. 01:29:10.100 |
And then, finally, the question, you know, what's a good pair of shoes, is sent to OpenAI. 01:29:15.720 |
So, it has all that context from that RAG process to formulate a meaningful response based on that particular customer, their purchase history, the question, and the products that are related to their question. 01:29:28.780 |
Yeah, let me give you another example, a better example of that kind of thing; in this case, this particular node is just running Python code. 01:29:40.900 |
So, you can put conditionals into that Python code, for example, based on the inputs to do different kinds of things, anything you like, in fact. 01:29:48.360 |
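For instance, a node's function might branch on its inputs like this; a minimal sketch assuming the `@tool` decorator from the promptflow package of that era, with purely illustrative helper functions.

```python
from promptflow import tool  # prompt flow's node decorator

def format_orders(orders: list) -> str:
    return "\n".join(f"- {order}" for order in orders)

def format_docs(docs: list) -> str:
    return "\n\n".join(docs)

@tool
def route_context(question: str, orders: list, docs: list) -> str:
    # Every node in the DAG still executes, but inside a node you
    # can branch on the inputs to decide what this node contributes.
    if "order" in question.lower() and orders:
        return format_orders(orders)
    return format_docs(docs)
```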
Maybe, I think I know the specifics that you have in mind. 01:29:57.460 |
What David showed, he showed two things: in the template nodes, he showed looping logic and conditionals, but those are looping and conditional string-rendering constructs. 01:30:14.780 |
And in the Python code, you can have any Python, like you can have conditions, loops, whatever, but to your point, all the nodes in the DAG are going to be executed. 01:30:27.700 |
You cannot have conditional node execution, but what you can have is inside a node, in the Python code, you can conditionally execute something. 01:30:41.540 |
But all the nodes are systematically going to be executed. 01:30:47.280 |
It is not a business process orchestration system. 01:30:52.120 |
It is really tailored towards building LLM applications. 01:31:03.840 |
Yeah, it looked like in that Python, though, that you had, that's where you were doing the customer lookup. 01:31:16.620 |
I mean, I see the line connecting it, and then I see the Jinja template for the prompt, and the Jinja template was iterating over customers, and that, you know, for, or sorry, iterating over the orders. 01:31:29.000 |
So, how does that, how does that tie together? 01:31:38.380 |
Yeah, there was, like, the customer lookup Python that we were just looking at. 01:31:44.740 |
The one on the right, yeah, the customer lookup. 01:31:47.660 |
You have one node which queries the database, fetches all the information from the database, stores it into a variable, into the context, and then the Jinja template uses the previously set collection of results for the rendering. 01:32:09.340 |
So, when you say it stores it into it, is that where the response, the orders on line 13, is that doing it? 01:32:17.760 |
Okay, and then if we, if we click on the next one down, the customer prompt, and we go to that loop again, there it is, well, but, oh, okay, customer.orders, so that's, that's how it ties, then, eh? 01:32:31.140 |
Using input and output bindings on each node. 01:32:33.980 |
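Concretely, the tie-in works something like this sketch: the lookup node's return value is bound as an input to the prompt node, and the Jinja loop renders it. The field names are illustrative, not the exact schema in the workshop repo.

```python
from jinja2 import Template

# Hypothetical shape of the customer-lookup node's output.
customer = {
    "firstName": "Sarah",
    "orders": [
        {"name": "TrailMaster X4 Tent", "price": 250},
        {"name": "CozyNights Sleeping Bag", "price": 100},
    ],
}

# The downstream prompt node binds that output to a template input,
# and the Jinja loop renders it into the prompt text.
template = Template("""\
Previous orders for {{ customer.firstName }}:
{% for order in customer.orders %}
- {{ order.name }} (${{ order.price }})
{% endfor %}""")
print(template.render(customer=customer))
```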
Yeah, and the arrows that are coming into the top of the representation in this graph, those are the inputs, and the arrow coming out of the back is the outputs, and there could be multiple of those. 01:33:44.200 |
It's called a visual editor, but it's really more of a visual reader, and that is absolutely true. 01:33:49.540 |
I want to highlight a little subtlety, too, when you get to step 10, when you first run your prompt flow in Visual Studio Code, you're going to be clicking on the run button once you've viewed the prompt flow itself in the Visual Studio Code environment. 01:34:05.020 |
You can see the commands it's running; it's just running a little Python command to launch the flow defined in the YAML file, but what I want to emphasize here is that, in reality, everything here is running locally, and in fact, in the usual developer environment, it will be running directly on your laptop or a shared machine. 01:34:22.360 |
In this case, in this case, it's running on the GitHub code space environment, and the whole idea behind this is you have a very fast responsive place to try out different prompts, to make sure your connections are working, perhaps testing different types of LLMs, replacing them in the LLM steps, so you can actually figure out what are the bits of the puzzle that go together to give you a good experience for the endpoint that you're trying to create, just in a local environment. 01:34:51.220 |
Now, I'll say local because, of course, the database is still in the cloud, and the OpenAI endpoint is still in the cloud, but all the orchestration is happening directly on your local machine. 01:35:00.460 |
Our next step after this is going to be then publish that prompt flow into Azure, inside its own container app, as it turns out, and then that's going to be a hosted cloud version of that same prompt flow, which is going to support the production use of that endpoint in your application. 01:35:17.960 |
Yeah, a side effect of what David just said is that, because it's building a Docker container, you can actually customize the environment and add packages. 01:35:43.820 |
So, earlier we were talking about the differences between Semantic Kernel and prompt flow; one of the nice things with prompt flow is that it's very interesting for web developers, because they don't have to care about creating an environment, deploying a Docker environment, or scaling it; the whole scaling is done automatically by the platform. 01:36:10.040 |
So, you just need to add packages, so you can combine an LLM with some packages for some specific processing, and the whole deployment is done automatically, so you can focus on the UI and the user experience. 01:36:26.300 |
So, all right, and then let me get to the evaluation. 01:36:38.080 |
You just need the purchase history for that question. 01:38:03.900 |
So, in that specific DAG, we will systematically query both the vector database and the customer database. 01:38:19.100 |
So, yes, when it comes to answering the question of, can you repeat the question? 01:38:37.580 |
Yeah, it was "what else did I purchase?" Then, because we will query the order history from the relational database, the LLM is going to pay more attention to that part of the context than to the product documentation side of things. 01:39:00.060 |
But really, we are relying on the ability of the LLM to pay attention to what matters. 01:39:21.020 |
Is there a form of relational query to filter that? 01:39:27.500 |
Because that specific prompt flow DAG just returns, I believe, the last ten orders from the history. 01:39:48.460 |
And that's what we put in the context, because that RAG application is for workshops and demos. 01:39:48.460 |
What you're talking about is to do something else, which is text to SQL, where you take a query in natural language, you transform it into a SQL query that you execute against a database where you have a filter where date is one month ago, or whatever. 01:40:10.940 |
So that's a similar use case, but a different implementation. 01:40:18.140 |
And that's also an area to be wary of, too, because that's an area where prompt injection could come into play. 01:40:25.100 |
If you're forming a SQL query on the basis of user input, you've got to recognize that there might be malicious input in that process, which might generate harmful SQL. 01:40:34.300 |
There's still an intermediate step, it's not directly pasting the string into a SQL query, but there still is an opportunity there for bad actors to control what happens at that SQL generation step. 01:40:45.500 |
And I believe we have a template for that, I believe Pamela has created a template called RAG on PostgreSQL, which is now in the same Azure samples GitHub account. 01:41:05.500 |
It takes a natural language query, transforms it into a SQL query, and executes it on a PostgreSQL database, but you could do the same with Cosmos DB. 01:41:18.700 |
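As a hedged sketch of what that text-to-SQL step could look like, with the defensive check mentioned above; the prompt, schema, and deployment name are assumptions, and a real system would also run the query under a read-only database role.

```python
import re
from openai import AzureOpenAI

client = AzureOpenAI()  # endpoint, key, API version from environment variables

def question_to_sql(question: str) -> str:
    # Ask the model to translate natural language into a single
    # read-only query against a known (hypothetical) schema.
    resp = client.chat.completions.create(
        model="gpt-4",  # must match your Azure OpenAI deployment name
        messages=[{"role": "user", "content":
            "Write one SQL SELECT over orders(customer_id, name, order_date) "
            "answering: " + question}],
    )
    return resp.choices[0].message.content.strip()

def guard_generated_sql(sql: str) -> str:
    # One defensive layer against prompt injection at the SQL
    # generation step: allow only a single SELECT statement and
    # reject multi-statement payloads.
    if not re.match(r"^\s*SELECT\b", sql, re.IGNORECASE) or ";" in sql.rstrip("; \n"):
        raise ValueError("Generated SQL rejected")
    return sql
```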
So that actually leads me into another topic, which I wanted to get to before we run out of time here today, which is about evaluation. 01:41:32.020 |
Any time that you put any kind of an LLM-based application into production, where users are going to be providing input to it, you need to evaluate how it behaves. 01:41:40.020 |
And in this context of a chatbot, the kind of questions you want to ask are, did my chatbot give a relevant answer to my user's question? 01:41:52.020 |
Was the chatbot's answer grounded in the information that is available in my databases that is part of my RAG flow? 01:42:08.020 |
And the other metrics that are in that list, which I'm trying to remember right now, I'll get back to in a minute. 01:42:15.020 |
But when you get to step number 13, we're going to take you through a Python notebook, which shows you a process for answering these questions manually, essentially. 01:42:31.020 |
And then I'm going to show you how that's built into the Azure AI Studio platform itself. 01:42:36.020 |
But think of debugging in just regular apps, and the tests that we write for applications. 01:42:46.020 |
Like, did the application return a positive value when it should be a positive value? 01:42:53.020 |
Very easy thing to test for in programming style. 01:42:56.020 |
It's a much more difficult test to answer the question: is the answer generated by my chatbot relevant? 01:43:08.020 |
And the answer is, you get an LLM to answer that question. 01:43:13.020 |
Now, this particular chatbot application we have running on GPT-3.5 Turbo. 01:43:28.020 |
I haven't played around with GPT-4o a lot myself. 01:43:30.020 |
But I imagine it will probably take the place of GPT-3.5 Turbo in a lot of these applications pretty soon. 01:43:37.020 |
Next time we run this workshop, we're going to switch it over to using GPT-4o. 01:43:44.020 |
There are very large, very powerful LLMs that have reasoning capabilities in some sense. 01:43:53.020 |
Now, you wouldn't want to use GPT-4 in a production application like this. 01:43:58.020 |
Because every time the user types in a chat, not only are they going to have to wait quite a long time for a response, 01:44:03.020 |
but it's going to cost you a lot of money on the endpoint. 01:44:05.020 |
In this RAG architecture, GPT-3.5 works great. 01:44:08.020 |
As long as you give it the context it needs to answer that question. 01:44:12.020 |
But for this testing paradigm, for asking whether the answer, "TrailMaster jackets are good," is relevant to the question, "what jackets should I buy?" 01:44:26.020 |
That is the kind of question a powerful LLM like GPT-4 can answer quite readily. 01:44:31.020 |
So think about how you might automate that process. 01:44:37.020 |
Given this question, and this context, the stuff that we put into the RAG, and this answer, ask GPT-4 on a scale of 0 to 5, how relevant is this answer? 01:44:53.020 |
How grounded is this answer in the data that I've also provided here? 01:45:01.020 |
And these are all things that GPT-4 can do quite readily. 01:45:04.020 |
And we can use the scores that GPT-4 provides in this case as a ranking of how well GPT-3.5 is doing in our endpoint at generating its answers based on the RAG process. 01:45:17.020 |
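Here is a minimal sketch of that LLM-as-judge idea, assuming the Azure OpenAI Python client; the judging prompt is illustrative, and the real workshop notebook links to its own evaluation prompts.

```python
from openai import AzureOpenAI

client = AzureOpenAI()  # endpoint, key, API version from environment variables

JUDGE_PROMPT = """On a scale of 0 to 5, how grounded is the answer in the
provided context? Reply with a single integer.

Context: {context}
Question: {question}
Answer: {answer}"""

def groundedness(question: str, context: str, answer: str) -> int:
    # "gpt-4" must match a deployment name in your Azure OpenAI resource.
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(
            context=context, question=question, answer=answer)}],
    )
    return int(resp.choices[0].message.content.strip())
```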
In this notebook, at the top of it, you can put in a question. 01:45:23.020 |
I just ran it on, can you tell me about your jackets? 01:45:27.020 |
You can even see the prompts that it's sending to GPT-4 to answer these questions. 01:45:31.020 |
And you can see the actual answers that came back are in the next node up here. 01:45:37.020 |
Hey, Sarah Lee, let me tell you about our jackets. 01:45:39.020 |
We have two awesome options that will go well with your previous purchase. 01:45:48.020 |
This is the context that the RAG process was provided with, which it used to generate that answer. 01:45:54.020 |
And then with that information, we can ask those questions we just asked. 01:45:58.020 |
Was that answer about jackets grounded in Contoso's product database? 01:46:05.020 |
Sorry, a rank of five on a scale of zero to five. 01:46:09.020 |
And likewise, we can ask questions about coherence, fluency, groundedness, and relevance. 01:46:17.020 |
This particular question is doing really well. 01:46:19.020 |
You probably also want to test out your LLM on some adversarial types of inputs. 01:46:26.020 |
For example, you might ask the question, you know, I want to buy a toothbrush. 01:46:38.020 |
Nothing is going to come up in the database when we do the RAG search. 01:46:42.020 |
Well, actually, something will come back, because we always get back some responses that are somewhat related. 01:46:48.020 |
But let's see how our LLM actually does here. 01:46:51.020 |
When I run this notebook, it's going to run through those scripts. 01:46:55.020 |
It's going to pass that question to our RAG flow, generate the response with GPT-3.5, 01:47:02.020 |
and then ask GPT-4 to rank it on those four metrics using the prompts that are linked to in this script. 01:47:08.020 |
And when we come back to it, we can see the answer it came back with was, hi, Sarah. 01:47:13.020 |
Since you're into outdoor adventures, I recommend the Fresh Breeze -- where's my scroll bar? 01:47:20.020 |
Fresh Breeze Travel something, something like that. 01:47:27.020 |
Contoso does not sell a Fresh Breeze Travel Toothbrush. 01:47:31.020 |
GPT-3.5 just made that up out of whole cloth. 01:47:37.020 |
And we have to test to see whether or not our models are doing these kinds of things for the types of use cases that we anticipate. 01:47:43.020 |
And we can detect that particular test is not going well. 01:47:50.020 |
It really wasn't grounded in our data because there was nothing about toothbrushes in our context data that we provided through RAG. 01:47:57.020 |
And similarly, coherence -- well, it was in nice English. 01:48:00.020 |
So I've got a score of four for coherence, a score of five for fluency, but one for groundedness and one for relevance. 01:48:07.020 |
And so now you can think about automating this process. 01:48:10.020 |
You can think about what are the types of questions that we want our application to do well at. 01:48:16.020 |
What are the types of questions that we might want to, say, not give any responses to at all and score accordingly. 01:48:22.020 |
And I won't go through all the details of this, but when we get into AI Studio, there's a whole section here on evaluation. 01:48:36.020 |
And this is the process where you can actually load into it a bunch of tests, which in this case are not Python code or C# code or whatever. 01:48:45.020 |
It's responses -- so questions, responses, and context. 01:48:49.020 |
And then automate the process of evaluating how your end point, how your RAG process does on all those questions. 01:48:57.020 |
So that next time, when you add new products, or next time, when you decide to upgrade from GPT-3.5 to GPT-4o, you've got a series of tests ready to go to evaluate how well your application does in the face of those changes. 01:49:14.020 |
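A minimal sketch of what automating that could look like, assuming a JSONL test file where each line holds a question, the RAG context, and the generated answer, plus judge functions like the groundedness() sketch above; the file format and field names are assumptions.

```python
import json

def evaluate_dataset(path: str, metrics: dict) -> dict:
    # Each metric is an LLM-judge function taking
    # (question, context, answer) and returning a 0-5 score.
    rows = [json.loads(line) for line in open(path)]
    totals = {name: 0 for name in metrics}
    for row in rows:
        for name, judge in metrics.items():
            totals[name] += judge(row["question"], row["context"], row["answer"])
    # Average score per metric across the whole test set.
    return {name: total / len(rows) for name, total in totals.items()}

# e.g. evaluate_dataset("testset.jsonl", {"groundedness": groundedness})
```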
So for -- after you've evaluated the model and you sort of understand the performance of it, what typically are your next steps and what actions do you take to drive improvement on the measures that you see there? 01:49:33.020 |
But it's -- this is essentially the LLM Ops process, which is essentially the same as DevOps, but with a fancier name that gets you lots of funding. 01:49:49.020 |
So exactly the same idea as when we build applications. 01:49:56.020 |
We do some exploration, testing it against our data. 01:49:59.020 |
We build our basic prompt flow in the LLM case. 01:50:02.020 |
And we develop our first version of that flow. 01:50:05.020 |
And then we actually run it against sample data. 01:50:12.020 |
If the evaluations don't give the scores that we're looking for before we put it to production, the next step then is to modify our prompts. 01:50:19.020 |
You saw that Jinja template with a bunch of prompts around do this, don't do that. 01:50:23.020 |
You would modify those until you get the behavior that you're looking for. 01:50:30.020 |
Maybe you present the data differently to the RAG process. 01:50:33.020 |
Then once you get satisfied in that process, you would keep on testing that against perhaps a live user cohort or bring in some testers. 01:50:43.020 |
And again, go through that same evaluation process until you're satisfied. 01:50:47.020 |
And then finally you'll be ready to actually deploy that to production. 01:50:52.020 |
You would actually monitor, live, probably a sample of actual user questions and responses, and have, you know, real-time charts. 01:51:01.020 |
Not real-time charts actually, probably daily charts of how your model is doing on scores like groundedness for the types of questions being asked. 01:51:08.020 |
And that might be detecting, you know, maybe things are drifting because your product set has changed and there are trigger words in your products that are making the GPT model do strange things. 01:51:19.020 |
Maybe you've got some adversaries that are coming in to try and hack into your system. 01:51:22.020 |
That might come up in some of your monitoring scores. 01:51:25.020 |
And then you go back through that iteration process to go back and build and augment the model for its next deployment. 01:51:31.020 |
Is that the kind of question you're looking for? 01:51:39.020 |
You know, right now when we look at the input from the user, you put text. 01:51:44.020 |
You know, you put what you have purchased or something like that. 01:51:48.020 |
Can this be improved to take, like, you know, a graphic or a PDF file? 01:52:02.020 |
A PDF file, let's say; I want to input my PO rather than type something. 01:52:09.020 |
Azure Search, for example, can index PDF files and then you can do a search to find the PDF file that's most relevant to that user's question. 01:52:16.020 |
You can then extract down from that PDF file context, which is put into the prompt, which is then used to generate the response. 01:52:22.020 |
And then you can put references back to those source files if it's a trusted user kind of a situation, so they can come back to see them. 01:52:32.020 |
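As a minimal sketch of that retrieval step, assuming the PDFs have already been cracked and chunked into an Azure AI Search index; the index name and field names ("content", "sourcefile") are assumptions.

```python
from azure.core.credentials import AzureKeyCredential
from azure.search.documents import SearchClient

# Hypothetical index of chunked PDF content; building the index
# (e.g. with an indexer that cracks the PDFs) is a separate step.
client = SearchClient(
    endpoint="https://<your-search>.search.windows.net",
    index_name="pdf-docs",
    credential=AzureKeyCredential("<key>"),
)

def pdf_context(question: str, top: int = 3) -> str:
    results = client.search(search_text=question, top=top)
    # Each hit's extracted text goes into the prompt, keeping the
    # source file so a trusted user can be pointed back to it.
    return "\n\n".join(f"[{r['sourcefile']}] {r['content']}" for r in results)
```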
But still, you know, from the, you know, prompt, you can only input text, right? 01:52:40.020 |
This particular example, everything is converted into text. 01:52:43.020 |
But today, we have what we call multimodal models. 01:52:46.020 |
GPT-4o, for example: as the prompt, you can input not just text, but also images, even audio. 01:52:55.020 |
But you could set up that RAG application to insert into the prompt the images or the audio or whatever it is you want the LLM to be able to reference. 01:53:04.020 |
That's still relatively new, because GPT-4o doesn't have all of its multimodal capabilities out yet, but the principle exists. 01:53:18.020 |
And Florence version 2 was actually released earlier this week, which is a model which allows you to do image to text. 01:53:26.020 |
So you can analyze an image, generate text out of it, and then take the text and give it to GPT 3.5 or something else. 01:53:39.020 |
It's one of the things I'm talking about, yes. 01:53:50.020 |
Because that's one kind of primary request from our team. 01:53:58.020 |
And right now, I have built, you know, this customized prompt window. 01:54:06.020 |
And now they want to say, okay, I want to use a PDF file or even an image file. 01:54:12.020 |
So, well, like we said, for image files, when we make GPT-4o's multimodality 01:54:25.020 |
capabilities available on Azure, then you will be able to use it directly. 01:54:28.020 |
For now, you can use another image to text model, such as Florence 2 or something else. 01:54:34.020 |
For PDFs, it really depends exactly what your use case is. 01:54:51.020 |
So if it's transient, then an alternative approach, instead of taking the PDF and storing it into 01:54:57.020 |
the vector database and indexing it long term, is this: 01:55:07.020 |
you upload the PDF, you chunk it, and then... 01:55:11.020 |
The algorithms for embedding the chunks, you can actually run them in Python, in memory. 01:55:19.020 |
You don't have to do it like, you know, in a long term vector database. 01:55:23.020 |
So you can do the chunking and the embedding in memory. 01:55:26.020 |
And actually, the vector similarity search functions that find what's relevant, you can execute 01:55:34.020 |
those functions in memory too, to provide your users with a transient RAG experience where they 01:55:42.020 |
upload the PDF, and you query the PDF just for the sake of the current conversation. 01:55:50.020 |
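Here is a minimal sketch of that transient, in-memory approach, assuming the Azure OpenAI Python client and an embedding deployment name; nothing is persisted to a vector database.

```python
import numpy as np
from openai import AzureOpenAI

client = AzureOpenAI()  # endpoint, key, API version from environment variables

def embed(texts: list) -> np.ndarray:
    # "text-embedding-ada-002" must match your embedding deployment name.
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([d.embedding for d in resp.data])

def top_chunks(question: str, chunks: list, k: int = 3) -> list:
    # Embed the freshly uploaded PDF's chunks in memory, score them
    # by cosine similarity against the question, and keep only the
    # best k for the prompt; discard everything when the chat ends.
    doc_vecs = embed(chunks)
    q_vec = embed([question])[0]
    sims = doc_vecs @ q_vec / (
        np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q_vec))
    return [chunks[i] for i in np.argsort(sims)[::-1][:k]]
```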
I think I need to take this offline with you. 01:55:55.020 |
I'll explain a little bit, you know, in PDF and then you can... 01:56:00.020 |
And you can do it in memory, or in a PostgreSQL database, or a Cosmos DB; that works too. 01:56:04.020 |
And delete the data once you are done with it. 01:56:09.020 |
And the last thing I wanted to say regarding that, because that's a good question, is 01:56:13.020 |
that right now, today, you can go to Azure OpenAI, you can deploy 01:56:19.020 |
a GPT-4 model, and they have, like, a chat section where you can chat with it. 01:56:30.020 |
There you can enter a picture and you can do things with it. 01:56:37.020 |
And it will write code according to the picture. 01:56:40.020 |
And this is more related to our prompt window. 01:56:45.020 |
And so far, you know, what I have, you know, developed can only take text. 01:56:55.020 |
And not to cut you off, because we love these detailed questions, but I've been told that 01:56:58.020 |
I'm going to get cut off up here in just a minute. 01:57:00.020 |
And before I do that, I just want to let you all know that we'll be here for a few minutes 01:57:05.020 |
for in-person questions, but also come to the Microsoft booth in Salon 9. 01:57:10.020 |
Lots of people there you can ask exactly these kinds of questions of, so please go ahead. 01:57:15.020 |
Cedric is also giving a talk tomorrow about multimodal models at 12:30 PM in Salon 10. 01:57:26.020 |
If you'd like to do this at home, the repository is already in your GitHub account. 01:57:30.020 |
And if you happen to miss that step, there's a QR code where you can get to it there as well.