Building Code First AI Agents with Azure AI Agent Service — Cedric Vidal, Microsoft

00:00:00.000 |
Well, this is very exciting because, I mean, 2025 is clearly the year of agents. 00:00:19.900 |
Compared to the past two years, things have moved so fast. 00:00:24.320 |
We went from very simple prompts, which were already incredible, 00:00:29.100 |
but now we move to the next step where we have agents that can autonomously achieve goals 00:00:37.400 |
without us knowing exactly how they do that, which is quite incredible. 00:00:49.280 |
I am Cedric Vidal. I am a Principal AI Advocate at Microsoft. 00:00:57.200 |
And today I'm going to be your host, and I am going to be helped today by proctors, by Mark. 00:01:06.780 |
And I'm sorry, I forget your name. I feel terrible. 00:01:16.480 |
So a big thank you to you two for helping me today. 00:01:21.280 |
So during the workshop, if you have any questions, please raise your hand. 00:01:26.020 |
And Mark or Argmar will come and help answer questions. 00:01:40.780 |
Anyway, so today, like I said, I'm going to set the scene first. 00:01:52.500 |
So in order to put our hands on the keyboard, to create and show you how to create an agent, 00:02:03.080 |
imagine that you are working for an outdoor and hiking equipment company that sells equipment online. 00:02:11.560 |
So what you want to do is build a system that can analyze your sales data 00:02:22.260 |
mixed with product information and generate ad hoc diagrams, basically a UX that your salespeople can use 00:02:31.160 |
very easily, where we move away from the old paradigm where we had to hard-code every single use case, 00:02:41.080 |
every single view, every single query, and where now the database queries are going to be generated automatically. 00:02:49.360 |
The UX is going to be generated automatically to accommodate the type of information that you are displaying. 00:02:56.360 |
And we are going to see how to create such an application. 00:03:07.120 |
Because like the definition of an agent has changed so often. 00:03:12.780 |
Let's be honest, even the specialists in the industry don't agree exactly on what they are. 00:03:18.460 |
And even the definition of what an agent is has evolved over time over the past three years, 00:03:23.600 |
as people have got more acquainted and were discovering what we could do with it. 00:03:28.640 |
You would imagine that a definition should be set in stone, but in that case, it's been difficult to agree. 00:03:34.820 |
But the definition we're going to use today is that it's semi-autonomous software 00:03:45.360 |
using tools and information that it can pull from databases and data stores at large, and iterating until it achieves its goal. 00:04:03.320 |
So until the system stabilizes and the goals are met. 00:04:11.520 |
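The loop described here can be sketched in a few lines of Python. Everything below is illustrative: `llm_decide` is a stub standing in for a real model call, and the "world" is just an integer.

```python
# A minimal sketch of the agent loop: a (stubbed) model repeatedly picks
# an action, the agent acts on the world, and the loop stops once the
# goal is met. llm_decide is a stand-in for a real LLM call.

def llm_decide(goal: int, state: int) -> str:
    # Stub policy: keep incrementing until we reach the goal, then stop.
    return "done" if state >= goal else "increment"

def run_agent(goal: int, max_steps: int = 20) -> int:
    state = 0
    for _ in range(max_steps):          # safety cap: never loop forever
        action = llm_decide(goal, state)
        if action == "done":            # goal reached, system stabilized
            break
        state += 1                      # "act on the world"
    return state

print(run_agent(5))  # 5
```

The `max_steps` cap matters in practice: a real agent loop needs a hard bound in case the model never declares the goal reached.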
But we're going to see that, depending on the context, it can be more or less simple or complex. 00:04:19.860 |
So in order to do that, an agent should be able to do three things, do reasoning over a provided context, 00:04:27.960 |
to provide a cognitive function such as deduction, correlation, understanding cause and effect. 00:04:34.000 |
So that's all that domain of cognition, which the LLM has now proven it is able to do a lot of. 00:04:41.580 |
Not perfectly, but it's getting better every day. 00:04:45.480 |
The second one is integrate with data sources for context, and the last one is act on the world. 00:04:51.620 |
Because in order to stabilize the system, in order to be useful-- like, the first generations of LLM-powered systems 00:04:59.260 |
were just about pulling information and displaying it. 00:05:01.500 |
But now we are moving a step forward where we are acting on the world and modifying the environment 00:05:07.360 |
until it stabilizes and reaches the expected goal. 00:05:11.900 |
So, what kind of application are we going to build today? 00:05:20.280 |
It will look like this, so this is a screenshot of a slightly different application. 00:05:25.240 |
What you're going to build today does not look exactly like this. 00:05:28.020 |
But the idea is that you can ask a question in plain English, such as show the sales of backpacking tents by region, 00:05:35.880 |
and include a brief description in the table about each tent. 00:05:38.920 |
And it's going to pull information from the database, as well as the product information, 00:05:46.020 |
and mix all that information together, reason about it, and display the content, 00:05:50.920 |
and the shape and form of the display of the UX will depend on the type of information which is requested. 00:06:02.540 |
Because the question here, yeah, create a pie chart of sales by region. 00:06:06.920 |
So, the system is going to understand that we want to create a visualization. 00:06:16.820 |
What technology are we going to use today to build our system? 00:06:25.960 |
So, before I dig more into the details of what this is, you have so many ways to build an agent today. 00:06:35.640 |
So many frameworks: LangChain, LangGraph, Semantic Kernel, and so many others. 00:06:42.840 |
The Azure AI agent service has the advantage that it's stateful and quite easy to put together. 00:06:53.580 |
Because usually, when you build any kind of LLM application, I don't know if you're aware, but it's stateless. 00:06:59.360 |
You need to manage the state client-side, so it's the responsibility of the application developer to store the conversations and to handle all the logic of pulling information from various systems, as well as executing the functions, the tools. 00:07:15.240 |
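A rough sketch of that client-side responsibility, with a stub standing in for the completion endpoint: the application has to accumulate the conversation and resend all of it on every call.

```python
# Sketch of what "stateless" means in practice: with a plain completion
# API, the client must resend the whole conversation on every turn.
# fake_completion is a stand-in for the model endpoint.

def fake_completion(messages: list[dict]) -> str:
    # Stub: reply number based on how many user turns were shipped.
    return f"reply #{sum(1 for m in messages if m['role'] == 'user')}"

history = [{"role": "system", "content": "You are a sales assistant."}]

def chat(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    answer = fake_completion(history)   # full history sent on every call
    history.append({"role": "assistant", "content": answer})
    return answer

chat("Total sales by region?")
chat("Now as a pie chart, please.")
print(len(history))  # 5: one system + two user + two assistant messages
```

A stateful service moves exactly this bookkeeping, plus tool execution, to the server side.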
So, agent service moves all the responsibility to the cloud on the Azure platform, and everything is managed. 00:07:26.620 |
But when it's about the pros, the big advantage is that it provides a very simple development workflow, because all the state and the context and the agent configuration is managed in the cloud. 00:07:42.460 |
The integration with back-end data and data sources is also managed in the cloud. 00:07:48.180 |
So, even if you may also mix and match, you can mix things that are managed by agent service in the cloud with things that are managed locally, it's possible. 00:08:00.060 |
And it supports all the model families that are supported on the Azure AI Foundry model catalog, plus the Microsoft Enterprise Security, which is very well known to be very robust. 00:08:13.180 |
It's sometimes a bit difficult to set up, but that's the price to pay for security. 00:08:21.060 |
So, the application that I just showed was using Chainlit. 00:08:26.860 |
The one we are going to build today is going to be command line. 00:08:29.220 |
It's going to be slightly easier, but so basically at the top, you have the application layer with the framework. 00:08:39.180 |
So here, Chainlit, in our case, it's going to become a command line, very simple. 00:08:43.180 |
very basic, with a query function, which is going to use Azure AI agent service with instructions and models. 00:08:55.060 |
And actions, so function calling, and I'm going to explain what it is, code interpreter, and I'm also going to explain what it is. 00:09:04.060 |
File search, as well as Bing search for web information grounding. 00:09:09.940 |
And we're going to go through each one of those during the workshop. 00:09:15.940 |
So, like I said, Azure AI agent service comes with pros and cons. 00:09:24.940 |
The pros is that it manages everything for you. 00:09:30.820 |
The con is that you need to understand the diagram. 00:09:35.820 |
You don't need to understand all the details, but one of the... 00:09:42.820 |
So, the first thing, you're going to have to follow a sequence of steps. 00:09:48.700 |
And you have quite a few steps to follow in order for the agent to work. 00:09:56.700 |
Once you have created and configured your agent, your agent exists in the cloud. 00:10:02.700 |
It's kind of weird at first, because when you're used to stateless way of doing things, that stateful programming model is not so common those days anymore. 00:10:14.580 |
But it becomes relevant again in the age of agents. 00:10:20.580 |
Which means that if you have an application that wants to reuse an agent, if you have created the agent before, you need to reconnect to an existing agent. 00:10:29.220 |
It's kind of like in SQL, the CREATE TABLE IF NOT EXISTS. 00:10:33.700 |
You only create the schema if it does not exist yet. 00:10:38.100 |
So, it's kind of a create or update agent for most applications. 00:10:43.700 |
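That create-or-reuse pattern might be sketched like this; the dict is a stand-in for the cloud-side agent registry, and the names are invented, not the actual SDK calls.

```python
# Hedged sketch of the "create or reuse" step. A plain dict stands in
# for the agents stored in the cloud; the real SDK call names differ.

registry: dict[str, dict] = {}  # stand-in for the cloud-side registry

def get_or_create_agent(name: str, instructions: str) -> dict:
    # Like CREATE TABLE IF NOT EXISTS: create only when absent,
    # otherwise reconnect to the agent that already exists.
    if name not in registry:
        registry[name] = {"name": name, "instructions": instructions}
    return registry[name]

a1 = get_or_create_agent("sales-agent", "You analyze sales data.")
a2 = get_or_create_agent("sales-agent", "ignored on reuse")
print(a1 is a2)  # True: the second call reused the existing agent
```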
Then, you create a thread or you reuse the thread. 00:10:55.780 |
Those are the big steps that you need to get familiar with when building with Azure AI agent service. 00:11:02.260 |
Then, you are going to configure instructions. 00:11:06.660 |
Those instructions are going to be attached to the agent and they are stateful. 00:11:14.660 |
And once they are there, you can reuse your agent and you don't have to send the instructions every single time. 00:11:21.380 |
Which in terms of bandwidth and network is interesting. 00:11:29.460 |
So, one of the pros of using Azure AI agent service is that you can attach data sources directly to the agent. 00:11:36.900 |
And you can do that either graphically through Azure AI Foundry or you can do that programmatically through the SDK. 00:12:01.620 |
Today, we're going to see file search, code interpreter, function calling, and Bing search. 00:12:11.860 |
Here's an example of a thread, of what a thread looks like. 00:12:14.740 |
So, the user's message is going to be, "Tell me the total sales by region." 00:12:19.940 |
So, what's going to happen is that in order to get the total sales by region, we need first to get the sales. 00:12:29.380 |
I mean, we need to query, sorry, we need to query the sales data store. 00:12:35.620 |
And it happens that in this case, the sales data store is a SQL relational database. 00:12:42.340 |
And as you know it, the way to interact is using SQL queries. 00:12:47.380 |
So, we are going to generate a SQL query dynamically, depending on the user's request. 00:12:55.220 |
And then, we are going to send that SQL query, execute it on the database, get the list of records back, 00:13:04.020 |
re-inject those records into the LLM, which is going to generate a message in plain text from that list of records. 00:13:14.980 |
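That round trip can be sketched with an in-memory SQLite database; the table schema here is invented, and the "generated" SQL is hard-coded where the LLM would produce it.

```python
# Runnable sketch of the flow: question -> generated SQL -> records ->
# plain-text answer. The schema and data are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, revenue REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("EMEA", 120.0), ("EMEA", 80.0), ("APAC", 50.0)])

# Step 1: the LLM turns "total sales by region" into SQL (stubbed here).
generated_sql = ("SELECT region, SUM(revenue) AS total FROM sales "
                 "GROUP BY region ORDER BY region")

# Step 2: execute the query on the database and get the records back.
records = conn.execute(generated_sql).fetchall()

# Step 3: the records go back into the LLM, which would write the
# plain-text message; a join stands in for that last model call.
answer = ", ".join(f"{region}: {total:g}" for region, total in records)
print(answer)  # APAC: 50, EMEA: 200
```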
SQLite is just an example. 00:13:22.740 |
It's used in the agent we're going to build today just because it's convenient, but you can, of course, 00:13:27.380 |
connect it to any data source you want: a database, a document store, an API, it really doesn't matter. 00:13:34.420 |
For the ones which are managed locally, for the ones which are managed by agent service on the backend, 00:13:40.740 |
the list is more restrictive, and to be honest, I don't have it on the top of my mind. 00:13:45.780 |
You're not going to see any RBAC or anything. 00:13:49.220 |
You're not going to see any role-based access control applied. 00:14:01.380 |
So then, show as a pie chart, which is the second question we asked in the previous screen I showed you. 00:14:08.660 |
This one is going to be quite interesting, because we're going to use a tool called Code Interpreter. 00:14:18.100 |
What it does is that it's going to take the query, generate Python code, and the Python code is going to be 00:14:26.900 |
executed in a sandbox, in a secure, safe sandbox. 00:14:31.060 |
And it can be whatever is supported, whatever Python packages are available in the environment. 00:14:43.460 |
And it's going to generate the Python code, which is going to generate that visual representation. 00:14:52.660 |
The code is going to save the image somewhere on the file system inside the sandbox. 00:14:57.140 |
Then the agent is going to pull that image out of the sandbox and send it back to the client application. 00:15:05.060 |
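A heavily simplified sketch of that data flow; a real code interpreter runs in an isolated process or container, while here a temp directory and `exec` only illustrate the save-then-pull-out step. The generated code and file names are invented.

```python
# Simplified sketch of the code-interpreter idea: "generated" code runs
# in a separate namespace, writes an artifact into a sandbox directory,
# and the host pulls the artifact back out for the client application.
import tempfile, pathlib

generated_code = """
from pathlib import Path
# Stand-in for matplotlib code that would save a pie chart image:
Path(out_dir, "chart.txt").write_text("pie chart bytes")
"""

with tempfile.TemporaryDirectory() as sandbox:
    # Execute the generated code with only the sandbox path in scope.
    exec(generated_code, {"out_dir": sandbox})
    # The "agent" pulls the artifact out of the sandbox.
    artifact = pathlib.Path(sandbox, "chart.txt").read_bytes()

print(artifact)  # b'pie chart bytes'
```

Note that `exec` is not a security boundary; it only stands in here for the isolated execution environment the service provides.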
Okay, so like what I said, this is quite a lot to digest, right? 00:15:10.340 |
But the thing is, you just have to go through and get a mental model of how that works. 00:15:15.940 |
And once you understand that, you don't have to manage it yourself, which is quite interesting. 00:15:36.180 |
Because we ingest the documents, and I think we ingest them inside an AI Search instance. 00:15:42.420 |
One more very important thing, function calling. 00:15:47.860 |
So to be honest, function calling is not new when it comes to LLMs. 00:15:52.580 |
I was doing function calling literally when the first version of ChatGPT was announced, by 00:15:58.020 |
asking the LLM to give me answers separated by commas, basically saying: generate a function name and an argument. 00:16:07.860 |
But nowadays, it's much more efficient to have structured output. 00:16:14.900 |
Even the LLMs are optimized under the hood inside the data center to generate structured output. 00:16:27.060 |
So the principle of function calling, actually, the name is bad. 00:16:31.940 |
I've hated that name ever since it was coined because it's not function calling. 00:16:45.780 |
So what it does, rather, is that it generates a JSON representation telling you 00:16:53.860 |
what function to call with what parameter values. 00:16:59.860 |
And it's going to map the natural language sentence and extract 00:17:06.420 |
values that it's going to pass as parameter to the function to be called. 00:17:11.940 |
And then it's the responsibility of the application code to take that function 00:17:16.420 |
call specification and actually execute the code. 00:17:25.140 |
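A minimal sketch of that contract; the function name, arguments, and JSON shape are invented, but the division of labor is the point: the model only emits the call specification, and the application dispatches it.

```python
# Sketch of the function-calling contract: the model returns JSON naming
# a function and its argument values; the application code does the
# actual call. Function names and values here are illustrative.
import json

def get_sales_by_region(region: str) -> float:
    return {"EMEA": 200.0, "APAC": 50.0}.get(region, 0.0)

TOOLS = {"get_sales_by_region": get_sales_by_region}

# What the LLM emits for "what are the EMEA sales?" (it never runs code):
llm_output = '{"name": "get_sales_by_region", "arguments": {"region": "EMEA"}}'

call = json.loads(llm_output)
result = TOOLS[call["name"]](**call["arguments"])  # app executes the call
print(result)  # 200.0
```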
How does the LLM know which functions to use, versus your own classification model? 00:17:31.860 |
Like, do you know under the hood how does it know? 00:17:34.900 |
Oh, that is an excellent-- that is a very good question. 00:17:40.100 |
Well, it's exactly the same way when you ask a question. 00:17:48.100 |
For example, when you ask, what is the color of the sky? 00:17:57.460 |
Obviously, the answer is going to be blue most of the time. 00:18:01.380 |
The way LLMs work is that it's a statistical distribution. 00:18:07.540 |
You have the probability that the word that comes after-- when you ask the question, 00:18:14.100 |
The most probable answer is going to be the color of the sky is. 00:18:19.220 |
And after is, the most probable color is going to be blue. 00:18:22.020 |
Because that's how the model has been pre-trained, right? 00:18:26.660 |
But if you say, what is the color of the sky? 00:18:32.020 |
And in order to get me the answer, you can only use, for example, color equal and the value of the color. 00:18:46.180 |
You tell the LLM, hey, here's how-- here's the output that I want. 00:18:49.940 |
And then it's going to generate color equals blue. 00:18:53.060 |
Just because you constrained, like statistically, in the world of all the possibilities that you can answer, 00:19:01.220 |
You're narrowing down the type of output that you want to be generated. 00:19:13.540 |
And you're going to interpret color equals blue into whatever you want with it. 00:19:17.860 |
Except in the world of function calling, it's not color equals blue. 00:19:20.980 |
It's set_color-- you have a tool, say, set_color, which takes a parameter with the value of the color. 00:19:29.780 |
And instead of having just one, you have many possible functions. 00:19:37.780 |
Imagine you have one function to set the color and another function to order a pizza. 00:19:45.460 |
If you ask what the color of the sky is, it's not going to generate an order pizza call for the same reason. 00:19:54.500 |
So how do you know when to use function calling versus creating your own classification model? 00:20:00.420 |
Okay, well, because a classification model cannot extract values, entities, out of a context and 00:20:11.140 |
like answer many, many answers across multiple dimensions at the same time. 00:20:17.540 |
Classification is just one out of n possibilities. 00:20:30.100 |
Okay, so I want you to, on your laptops, to open the following URL. 00:20:54.660 |
So Microsoft events, all attached, dot events, sorry, oh my god, dot learn on demand dot net. 00:21:31.380 |
Well, usually you're going to use Microsoft account. 00:21:36.980 |
Huh, I thought we had all the types of, okay, so if it's a personal account, you need Microsoft account. 00:21:43.860 |
If you have a corporate account, like if you have an account as part of your company, 00:21:48.740 |
which is managed by Entra, you need to select Entra ID. 00:21:56.820 |
Do not select Microsoft account if you use a corporate account. 00:22:16.260 |
That page, you're going to have a redeem training key link. 00:22:23.140 |
Click on it and then you're going to be asked to enter a training key. 00:22:32.980 |
So, hopefully this is big enough and everybody can see, but let me know. 00:22:38.100 |
Remember, raise your hands if you have any questions. 00:23:00.820 |
You can use also a personal Microsoft account. 00:23:19.860 |
You can use, if you have an Xbox, you can use an Xbox account. 00:23:26.980 |
Any Microsoft account or any of the consumer domains out there. 00:23:33.140 |
So, in your case, you might want to try a personal account. 00:23:38.260 |
And to be honest, in that case today, it's usually easier. 00:23:45.300 |
Like, you do not need to use a corporate account today. 00:23:50.340 |
I just have a question about that first slide you had up. 00:23:56.260 |
So, you're saying an agent can go do a SQL query, 00:23:59.140 |
come back, use the code interpreter, and then make a graph. 00:24:03.380 |
Is there any reason you're using agents for that, 00:24:06.180 |
as opposed to just chaining all the workflows and injecting the tools? 00:24:14.100 |
I'm just wondering if there's a reason why we're using agents for that sort of stuff. 00:24:20.020 |
Well, like I said, we need to go back to the definition of what an agent is. 00:24:25.460 |
And like I said, it's a bit of an overloaded term those days. 00:24:34.180 |
It encapsulates a wide range of definitions, including the simplest. 00:24:46.340 |
One of the things that an agent can do is be goal-driven and iterate until it achieves the goal. 00:24:54.260 |
Today, we are not going to see that specific thing. 00:24:58.820 |
We're going to be in the middle in terms of complexity. 00:25:02.980 |
We are going to be above the simple completion. 00:25:12.980 |
We are going to be seeing a mix of using tools and code interpreter with multiple data sources, 00:25:22.740 |
where the information from all those is mixed together to do reasoning and act on the world. 00:25:30.580 |
And yes, you could do that using just an LLM locally. 00:25:37.140 |
Except today, we're using agent service, which manages everything server-side. 00:25:48.900 |
Today, I'm going to give examples so that you get a sense and you touch exactly when the current 00:25:58.580 |
architecture hits its limits and when you need to go further and use more orchestrated 00:26:09.780 |
planning and orchestration, which loops until it reaches the goal. 00:26:15.940 |
And I'm going to give an example where you see where it breaks. 00:27:10.340 |
So that means we have eight minutes to answer questions. 00:27:24.820 |
If an agent is just an LLM running on a loop until it thinks that it's done what you've asked it to do. 00:27:34.420 |
Yeah, how does it know when to kick back out of that recursion? 00:27:38.500 |
Yeah, so today, we are not going to-- we are going to see the limits of not-- 00:27:47.460 |
Okay, you have two types of agents, two levels of complexity. 00:27:52.020 |
An agent does not have to do the looping to be called an agent, in the simpler sense. 00:28:02.180 |
In order to go one step further and do that looping, you need to use something like AutoGen, for example. 00:28:15.540 |
For those types of agents, you need to define a criterion of done, a definition of done, basically. 00:28:22.100 |
And you're going to have-- and the criteria can be implemented programmatically, deterministically, 00:28:35.860 |
And the workflow engine is going to loop until the goal is reached. 00:28:43.380 |
But it is the most tricky thing, in my opinion, frankly. 00:28:47.060 |
That's when you're going to develop an agent: figuring out when to stop is tricky. 00:28:54.340 |
For example, I often-- I've done a couple of prototypes with Browser Use, which is a famous 00:29:02.580 |
open source agentic system for browser navigation. 00:29:06.020 |
When you ask it to complete a task on the web, it's going to navigate from website to website, 00:29:12.980 |
The evaluation of when the task is done is clearly not always perfect. 00:29:32.820 |
The other one is, so my first question is, are we going to learn today, like, 00:29:37.540 |
how to send feedback to the agent, like, in case the agent gives, like, incorrect answers or incomplete answers? 00:29:53.700 |
And my second question is, what's the difference between, like, an AI agent and an MCP? 00:30:05.380 |
So, basically, MCP is just a tool, a function. 00:30:13.460 |
The LLM decides which function to call when you have a question. 00:30:20.020 |
MCP is just that plus management of the lifecycle of the program which completes the-- which executes the function. 00:30:33.780 |
Because normal function tooling, if you just take, like, a GPT or LAMA and you do function 00:30:41.940 |
calling, what the LLM is going to return is just a JSON telling you the name of the function 00:30:46.340 |
and the list of the values for each parameter, nothing more. 00:30:49.220 |
It's your responsibility as an application developer to execute the function. 00:30:59.380 |
It's one of the existing technologies out there that you can use as a tool. 00:31:06.180 |
And the advantage of MCP, at least as a client-side AI application developer, 00:31:13.780 |
is that the MCP protocol takes care of downloading the executable, the binary, 00:31:20.740 |
whether it's a node, Python, or whatever, download or a Docker image. 00:31:25.380 |
Like, pull the executable on your machine and execute it automatically. 00:31:32.980 |
And it's an overall protocol which comes with-- and also, it encapsulates the possibility for the tool to declare its own capabilities. 00:31:42.020 |
So, from the standpoint of the user, you just have to declare, oh, I want to use a file system MCP server 00:31:51.380 |
or I want to use a blender MCP server or whatnot, and that's all you have to do. 00:31:57.380 |
You can select it from a catalog and it's going to auto-declare what tools it has 00:32:06.260 |
Also, when you're done with your MCP client, it's going to stop all the MCP servers and clean up everything. 00:32:47.620 |
I was going to wait and see kind of how the demo played out later, but one of the 00:32:55.380 |
questions I often find making agents is the balance between 00:32:59.300 |
making a fairly general agent that can do lots of things and just kind of giving it the tools, 00:33:04.340 |
not giving it much direction versus having to be fairly controlled with it. 00:33:08.020 |
You're kind of trading off autonomy for a bit of reliability. 00:33:12.660 |
The one that you set up before, I couldn't quite work out if you've literally just given it the tools 00:33:17.620 |
and then it can do everything with those tools or if you've been quite controlled with it. 00:33:22.500 |
How do you think about the trade-off and when people are using your tools do they tend to fall more on one side than the other? 00:33:27.140 |
It's a more complex answer than it looks. 00:33:32.900 |
A more complex question than it looks, because of two things. 00:33:40.100 |
When you give instructions, it's like I was talking about the space of probabilities. 00:33:48.180 |
The more vague you are, the more you leave options open. 00:33:53.380 |
The more your agent is going to be able to do a wide range of things, but it might get it wrong. 00:34:00.580 |
The more specific you are, the more you're restricting the things that it's going to do well, 00:34:10.420 |
And then there is, you didn't really ask this, but I'm assuming that's 00:34:16.180 |
on your mind, is how many tools can I give to my agent? 00:34:21.700 |
And also another common question that we often get is, 00:34:25.300 |
should I give all my tools to one agent or should I split tools on multiple agents or should I create 00:34:40.900 |
And the answer is the same, it's complex, but kind of, imagine, same thing: you need to think in terms of probabilities. 00:34:57.220 |
And so when, imagine you have one agent and you give all the tools. 00:35:03.700 |
And at every single instance, for any question you ask, the LLM has to decide which one of all the tools to call, so 00:35:12.580 |
the probability that it gets it wrong is higher, right? 00:35:15.140 |
So the solution to that is to create agents which are more specialized by like areas of expertise 00:35:25.620 |
and do some kind of routing and multi-step selection. 00:35:29.300 |
Where instead of having one agent and giving it all the tools, you, for example, have a first agent which is going to 00:35:42.500 |
classify the question and say, oh, this is a sales question or this is a product question. 00:35:48.820 |
And then route to an agent which is more specialized to answer things about sales or things about products. 00:35:56.180 |
And then when the answer comes back, the first agent can say, oh, do I need to use another agent? 00:36:06.100 |
So you can imagine like a tree of tools and each tool can be composite or leaf. 00:36:10.900 |
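One way to sketch such a routing topology, with keyword matching standing in for the classifier LLM and canned strings standing in for the specialized agents:

```python
# Sketch of the routing topology: a first-stage classifier (stubbed;
# really another LLM call) picks the specialized agent, so each agent
# only has to choose among a few tools. Names are invented.

def classify(question: str) -> str:
    # Stub router: keyword match where an LLM would classify.
    return "sales" if "sales" in question.lower() else "product"

def sales_agent(question: str) -> str:
    return "sales-answer"     # would query the sales database

def product_agent(question: str) -> str:
    return "product-answer"   # would search product documents

AGENTS = {"sales": sales_agent, "product": product_agent}

def route(question: str) -> str:
    return AGENTS[classify(question)](question)

print(route("Show sales by region"))  # sales-answer
print(route("Describe this tent"))    # product-answer
```

Nesting `route`-like nodes gives exactly the tree of composite and leaf tools described above.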
You can see it that way, and AutoGen allows you to build such topologies. 00:36:17.700 |
And they may be specialized, each one working with a different database or a different stack? 00:36:33.220 |
Yes, but you can also have topologies where the different agents share memory. 00:36:40.820 |
Because the thing is, sometimes you also have the situation where you have like a team of agents 00:36:51.780 |
And it's good that they are specialized because you don't want them to pick the wrong tool. 00:37:01.460 |
Yeah, but what's going to prevent the hallucination is the grounding. 00:37:12.340 |
So each one of those agents is going to be grounded in something. 00:37:14.900 |
But something which is some kind of a grounding is memory. 00:37:21.380 |
Like for example, if you have multiple tools that answer questions about a consumer, 00:37:28.260 |
and you want to memorize, remember the preferences of your consumer, of your user. 00:37:34.500 |
Like for example, what's the name of his or her pet? 00:37:39.060 |
And you bet you want each one of your agents potentially to be able to use that information. 00:37:44.980 |
So what you want to do is take that memory, connect that memory to each one of those agents. 00:37:49.700 |
Even if you have multiple agents, they can-- each one of them have access to that shared memory. 00:37:54.500 |
Because you want all of them to be able to access the name of the pet. 00:38:02.500 |
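A toy sketch of that shared-memory topology; the `remember:` convention and agent behavior are invented, but it shows two specialized agents reading and writing one store:

```python
# Sketch of agents sharing one memory store (e.g. user preferences such
# as a pet's name), so every specialized agent grounds on the same facts.

shared_memory: dict[str, str] = {}

def make_agent():
    def agent(question: str) -> str:
        if question.startswith("remember:"):           # write to memory
            key, value = question[len("remember:"):].split("=")
            shared_memory[key] = value
            return "noted"
        return shared_memory.get(question, "unknown")  # ground on memory
    return agent

sales = make_agent()
support = make_agent()

sales("remember:pet_name=Rex")  # one agent stores the preference
print(support("pet_name"))      # Rex -- another agent can read it back
```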
Like you have many topologies, which make sense depending on your use case. 00:38:19.140 |
Can you please raise your hand when it's still building? 00:38:57.940 |
Can you raise your hand again for those for which it's still building? 00:40:00.020 |
So I'm going to show you how to make the most of it. 00:40:09.540 |
So when you have the black screen, the username should be pre-selected. 00:40:24.860 |
So there, you can click on password here, which is going to fill the password box. 00:40:36.060 |
Every time you see a T like this, you do not need to type. 00:40:39.900 |
You just click on it and it's going to auto type. 00:41:03.660 |
So what I'm going to do here, what you can do, it's not mandatory, but you can click here and go to split windows to get more real estate. 00:41:11.580 |
So that way here, I'm moving the instructions away, and here, that allows me to have more space. 00:41:31.020 |
So we're going to move on to getting started. 00:41:35.500 |
What we are going to need to do is az login here. 00:42:05.260 |
Even if you connected the first time, even if you connected with your Microsoft account, 00:42:14.620 |
in this instance now, we are not going to use your account. 00:42:19.580 |
We are going to use a temporarily generated account, which is a work account. 00:42:34.140 |
So in the resources tab here, in the instructions here, in the pop-up, you click user. 00:43:23.660 |
And we're going to have to type that command. 00:43:28.380 |
Because in Azure, you have some roles that need to be assigned. 00:43:32.060 |
And that's something we could not automate as part of the lab provisioning. 00:43:35.740 |
So that's something you need to execute manually. 00:43:43.260 |
You're going to have to agree to the warning and say paste anyway. 00:43:53.820 |
And at the end, normally everything should be fine. 00:44:12.860 |
So now, finally, we're going to be able to open the workshop. 00:44:15.980 |
So you type that command, which starts with git clone. 00:44:19.740 |
So what we're going to do is we're going to check out the git repository from GitHub. 00:44:23.660 |
And we're going to build the project and open it and install some code extension and open it in VS Code. 00:44:33.420 |
So I'm going to go back here, type the command, paste anyway, enter. 00:44:40.460 |
So like I said, it's cloning the repository, creating the Python virtual environment. 00:44:49.580 |
So today we're going to use Python for this workshop. 00:44:52.300 |
So when you get into the VM, you just open Edge, you open the browser, 00:45:08.300 |
and the instructions should be displayed right away. 00:45:12.620 |
You also have a form on the desktop right here. 00:45:14.620 |
So when you get into the VM, you just open Edge, you open the browser, and the instructions should be displayed 00:45:57.100 |
I don't know how you get here, and I'm not sure what the ... Can I see? 00:46:24.460 |
So I was at this step, where I cloned the repository, and I installed the PDF extension. 00:46:37.100 |
Then we're going to move on, and we're going to open VS Code. 00:46:56.940 |
So now we still have a few setup steps before we can get to coding. 00:47:09.020 |
So you're going to have to go to the Azure AI Foundry here. 00:47:22.460 |
For the sign-in, use that user from the instructions pane. 00:47:55.580 |
Then what we're going to do is we are going to search for ... 00:48:23.020 |
Once you are on the project, you can ignore those pop-ups. 00:48:36.940 |
And we're going to need that project connection string here. 00:48:49.980 |
So once you have copied the project connection string, 00:48:56.220 |
you can go back to VS Code, search for the .env.sample file, and rename it. 00:49:08.060 |
And we're going to paste the connection string here. 00:49:17.420 |
So that connection string is what allows us to connect to the AI Foundry project, 00:49:25.980 |
Be careful when you paste it not to miss any characters, and to paste between the double quotes. 00:49:40.060 |
So as you can see, we're going to use GPT-4o. 00:49:50.620 |
This is important because now, finally, we have configured all the-- 00:50:41.500 |
So, the first thing that we are going to look at is-- I'm going to explain quickly the structure of the project. 00:50:51.180 |
So the main file that we are going to look at today is the main.py file, 00:50:55.420 |
which is the entry point for the application. 00:50:59.820 |
The goal of this workshop is not to give you, like, what code should look like in production. 00:51:07.580 |
The goal of this workshop is for you to understand how it works, to understand all the pieces, 00:51:13.260 |
because once you understand, you're going to be able to use any of those other frameworks. 00:51:19.740 |
I mean, this one or another one, it's going to be the same. 00:51:22.380 |
What matters is to understand how an LLM works, how function calling works, how grounding works. 00:51:27.820 |
So, and the sales data, yeah, that's where the SQL query generation logic is. 00:51:42.220 |
Another directory which is very important is the shared/instructions directory. 00:51:54.860 |
So the first step, the most important thing, is to understand function calling. 00:52:00.700 |
So the first example is going to show you what it does. 00:52:07.660 |
So in sales data, let's look at-- where am I? 00:52:35.260 |
So as you can see here, that function takes a SQL query. 00:53:03.740 |
I mean, let me scroll so that you can see it. 00:53:08.380 |
So the LLM is going to be the one generating the SQL query, 00:53:12.380 |
compliant with SQLite syntax, and pass it to the function. 00:53:20.860 |
It just uses the SQLite driver and executes the query, nothing more. 00:53:28.860 |
That's because all the smartness of generating SQL query is done by the LLM. 00:53:33.900 |
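The tool function described here is, in spirit, just a thin wrapper around the SQLite driver. Here is a minimal sketch of that shape; the function name is illustrative, and an in-memory database is seeded inline so the snippet is self-contained (the workshop uses a file-based Contoso sales database):

```python
import json
import sqlite3

def fetch_sales_data_using_sqlite_query(sqlite_query: str) -> str:
    """Execute a SQLite query generated by the LLM and return rows as JSON.

    :param sqlite_query: A SELECT statement compliant with SQLite syntax.
    """
    # Self-contained stand-in for the workshop's contoso-sales database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales_data (region TEXT, revenue REAL)")
    conn.executemany("INSERT INTO sales_data VALUES (?, ?)",
                     [("Africa", 5.2), ("Asia-Pacific", 5.3)])
    # No smartness here: the LLM wrote the query, we only execute it.
    cursor = conn.execute(sqlite_query)
    columns = [d[0] for d in cursor.description]
    rows = cursor.fetchall()
    conn.close()
    return json.dumps({"columns": columns, "rows": rows})
```

All the intelligence lives in the query the model generates; the tool stays deliberately dumb.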
When you think about it, LLM is very good at language. 00:53:37.020 |
And SQL query is a kind of language, like coding, like Python. 00:53:41.420 |
And so-- and it happens that when the large language models are pre-trained, they are pre-trained on the massive amount of GitHub repository, and a lot of them contain SQLite queries. 00:53:54.700 |
So if you use a common database, it's going to work, like PostgreSQL, MySQL, SQLite, MongoDB. 00:54:06.140 |
If you use an exotic database that nobody has ever heard of, it's not going to work. 00:54:11.580 |
Because the model needs to have been pre-trained on it. 00:54:35.740 |
So those are the packages which are required to connect to AI Foundry. 00:54:41.580 |
To get the models to authenticate with Azure identity. 00:54:46.700 |
dotenv is the package that is used to load the environment variables from the .env file that we edited previously. 00:54:59.340 |
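For intuition, what loading the .env file does is roughly the following; this is a stdlib-only sketch of what python-dotenv's load_dotenv does, not the library itself:

```python
import os

def load_env_file(path: str = ".env") -> None:
    """Minimal sketch of load_dotenv: read KEY="value" lines into os.environ."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines, comments, and anything that isn't KEY=VALUE.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Strip the surrounding double quotes mentioned in the setup step.
            os.environ[key.strip()] = value.strip().strip('"')
```

This is why a missing character or a misplaced quote in the connection string breaks the connection: the value lands in the environment exactly as pasted.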
So here, that's how you connect to the AI Foundry workspace. 00:55:11.580 |
And so at line 59, what you're going to do is uncomment the first instructions file. 00:55:23.260 |
So if you go to Instructions here, you open the function calling one here. 00:55:31.580 |
Those are the instructions that are going to be passed to the agent. 00:55:37.740 |
You are a sales analysis agent for Contoso, a retailer of outdoor camping gear. 00:55:46.540 |
So it explains what personality the agent should have, what the mission of the agent should be, 00:55:54.380 |
help users by answering sales-related questions, and it lists what tools are available. 00:56:01.740 |
So here, sales data tool, use only the Contoso sales database via the provided tool, 00:56:09.980 |
So to be honest, those instructions are optional. 00:56:15.900 |
They are made to-- actually, this relates to the question you were asking earlier, 00:56:25.980 |
Where you were asking how specific you must be. 00:56:32.300 |
And the LLM is going to understand the JSON schema of the tool, which explains the function name, 00:56:38.300 |
the parameters, and you have an explanation of what each parameter is. 00:56:43.180 |
Here, in the system instructions, you can add some more information that are more specific 00:56:49.420 |
to your application or to the agent using the tool. 00:56:52.460 |
Then you have information regarding formatting and localization and examples, et cetera. 00:57:04.700 |
And here, so we are specifying the instruction files, and we are also specifying which tools to use. 00:57:15.020 |
So here, we are going to use the functions, which contains only one function tool, which is async, 00:57:22.460 |
which is the function I showed to you earlier. 00:57:28.540 |
And also, the documentation here, the Python doc, is passed to the LLM. 00:57:37.820 |
So what you write here is important, because the LLM is going to interpret that documentation, 00:57:44.460 |
as well as the name of the parameters, plus the documentation of the parameters, 00:57:57.900 |
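To make concrete how a function's name, parameters, and docstring end up in front of the LLM, here is a hedged sketch of deriving an OpenAI-style tool schema from a Python function. The helper name and the exact schema layout are illustrative, not the SDK's internals:

```python
import inspect
from typing import get_type_hints

def fetch_sales_data(sqlite_query: str) -> str:
    """Run a SQLite SELECT query against the Contoso sales database.

    :param sqlite_query: A query compliant with SQLite syntax.
    """
    return ""  # stub: the real implementation executes the query

def build_tool_schema(func) -> dict:
    """Turn a function's name, type hints, and docstring into a tool schema."""
    hints = get_type_hints(func)
    py_to_json = {str: "string", int: "integer", float: "number", bool: "boolean"}
    params = {
        name: {"type": py_to_json.get(hint, "string")}
        for name, hint in hints.items() if name != "return"
    }
    return {
        "type": "function",
        "function": {
            # The function name and first docstring line are what the LLM reads,
            # which is why what you write there matters.
            "name": func.__name__,
            "description": inspect.getdoc(func).split("\n")[0],
            "parameters": {
                "type": "object",
                "properties": params,
                "required": list(params),
            },
        },
    }
```

The model never sees your code body, only this schema, so the docstring and parameter names are effectively part of your prompt.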
And then I'm going to open a terminal, command palette. 00:58:05.900 |
And to be honest, I never remember where the terminal is. 00:58:08.540 |
So what I do is that I cheat, and I create a terminal here. 00:58:25.660 |
You need to remember we are on Windows, usually I have a Mac. 00:58:28.220 |
Okay, we have looked at this, I have explained to you all of this. 00:58:56.860 |
Yeah, you don't even need to open the terminal. 00:58:59.660 |
You don't even need to do what I just did before. 00:59:01.500 |
It's just a habit I have, but you can just click run here, it's going to work. 00:59:09.900 |
So it's doing what I mentioned earlier, which is that first you need to create the agent. 00:59:20.140 |
Once you have created an agent, it's going to be assigned an ID, because it's stateful. 00:59:24.700 |
Like the agent is actually an entity which lives in the AI Foundry project. 00:59:30.940 |
Then we're going to enable the auto function calls. 00:59:36.940 |
So the thread is basically the conversation, which is stateful too, because you don't have to save the 00:59:44.780 |
messages yourself, and then we can finally enter a query. 00:59:48.860 |
So I'm going to go back because we have a list of questions we can ask here. 01:00:16.780 |
So to be honest, every time I see that, it still amazes me, even if I've been doing that for a long 01:00:24.060 |
Because basically from the simple question that we ask in plain English, based on the knowledge of the 01:00:33.500 |
schema of the database, which is somewhere in the code, it can automatically generate such a query. 01:00:43.340 |
Like, let me go back to the question I asked. 01:00:59.420 |
From sales_data, because we are pulling from the sales_data table, and we want to group 01:01:09.980 |
So we want to sum the revenue, group by region, and automatically it limits to three. 01:01:18.460 |
And I believe it's because in the instructions, there is an instruction that says that by default, 01:01:33.660 |
And we can see that for Africa, we had 5.2 million; Asia-Pacific, 5.3 million; et cetera. 01:01:52.460 |
To be honest, I don't know the schema of the database very well. 01:01:59.740 |
So even for me, like I would have to look at it. 01:02:02.620 |
If I had to write a SQL query, I'm sure you've all done that. 01:02:05.580 |
Remember, the goal of such an agent is to enable non-technical people to use technical tools. 01:02:11.100 |
And in this case, the technical tool is a SQL database. 01:02:49.100 |
so the question was, what was last quarter's revenue? 01:02:55.260 |
yeah, select the sum of the revenues from sales data, where year equals 2024. 01:03:18.540 |
So I'm not sure why he selected those three months, 01:03:38.380 |
Every time we do that, obviously, we get different responses, right? 01:03:43.980 |
I think it's because 4o's last training date was November of '24. 01:03:49.500 |
So it's thinking last quarter from November is months 7, 8, 9 before November. 01:03:56.860 |
So it's possible, but actually, let's say we're in April. 01:04:18.060 |
which means that the last quarter is January, February, March. 01:04:21.100 |
And as you can see, it changed the request with 123. 01:04:25.580 |
So one thing we could do to improve this example is add a date tool. 01:04:30.620 |
We could add a tool that allows the LLM to ask what the current date is, 01:04:37.340 |
so that we would not have to enter it manually. 01:05:03.340 |
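Such a date tool could be as small as the following sketch (the function name is illustrative, not part of the workshop code):

```python
from datetime import datetime, timezone

def fetch_current_utc_date() -> str:
    """Return today's date in ISO format (YYYY-MM-DD), UTC.

    Exposing this as a function tool lets the LLM resolve relative phrases
    like "last quarter" instead of guessing from its training cutoff.
    """
    return datetime.now(timezone.utc).date().isoformat()
```

Registered as one more function tool, the model would call it first and then generate the correct month range in the SQL.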
So this time, it is interesting because we are generating a query against a different dimension, 01:05:16.700 |
So as you can see here, we are grouping by product type. 01:05:19.740 |
Okay, I'm going to pass on because you understand the concept. 01:05:25.260 |
I'm going to ask the last one because I want to move on to the next examples 01:05:31.740 |
and see what happens when you mix multiple data sources. 01:06:56.380 |
The way it works is that at initialization of the agent, so we are loading the instructions 01:07:05.300 |
So if I go back to load, replace, if I search for this in the function calling instruction 01:07:18.580 |
So, the sales data tool-- so here, and I went too fast over this when I was going through 01:07:26.040 |
the file and explaining it to you, but what's going to happen here is that the instructions 01:07:33.020 |
of the agents is going to contain what the schema of the database is. 01:07:38.580 |
That's how the LLM knows what query to generate and what tables and columns are in the database. 01:07:58.560 |
That means you're grounding, basically, the LLM with what the schema of the database is. 01:08:03.640 |
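A minimal sketch of that grounding step, assuming the instructions file contains a placeholder for the schema (the function and placeholder names here are illustrative; the workshop ships a similar substitution, and this version derives the schema live from SQLite):

```python
import sqlite3

def get_database_schema(conn: sqlite3.Connection) -> str:
    """Read table and column definitions to inject into the agent instructions."""
    lines = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'").fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, default, pk)
        cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
        col_desc = ", ".join(f"{c[1]} {c[2]}" for c in cols)
        lines.append(f"Table {table}: {col_desc}")
    return "\n".join(lines)

# Illustrative instructions template with a schema placeholder.
template = ("You are a sales analysis agent for Contoso.\n"
            "Database schema:\n{database_schema_string}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_data (region TEXT, revenue REAL)")
instructions = template.format(database_schema_string=get_database_schema(conn))
```

Once the schema text is inside the instructions, the model has everything it needs to write valid queries against those tables.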
Can I do more specific, like, what if, just saying, like, what if one has some confidential data, 01:08:19.020 |
And actually, you don't even have to be too fancy for that. 01:08:22.160 |
You can literally add it to the instructions file. 01:08:24.380 |
You can say, hey, the column X of table Y is confidential. 01:08:34.500 |
It is not guaranteed that the LLM is going to follow those instructions. 01:08:39.220 |
If you have confidentiality concerns, problems in that space where you want to restrict 01:08:48.980 |
access, you should-- somebody mentioned IAM, I think it was you-- you should implement this. 01:08:58.360 |
You should add column-level restrictions, and the best is to do what you would do with normal code. 01:09:04.380 |
Because even if normal code is deterministic, it's still possible for a user 01:09:11.740 |
to get access to something he's not supposed to get access to. 01:09:14.180 |
You can still use SQL injection, like all sorts of hacking, to exploit normal code. 01:09:22.200 |
Obviously, with LLMs, it's a whole different area. 01:09:26.940 |
So you should implement the exact same safety measures you would for a normal program. 01:09:36.220 |
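One way to enforce that in code rather than in instructions is a guard inside the tool itself, before the query ever touches the database. This is a sketch; the column names are invented examples, and a production system would parse the SQL properly instead of substring matching:

```python
# Example confidential columns; these names are hypothetical.
CONFIDENTIAL_COLUMNS = {"customer_email", "cost_price"}

def assert_query_allowed(sqlite_query: str) -> None:
    """Reject LLM-generated queries that mention confidential columns.

    Instructions alone are not guaranteed to be followed, so the tool
    enforces the restriction deterministically before executing anything.
    """
    lowered = sqlite_query.lower()
    for column in CONFIDENTIAL_COLUMNS:
        if column in lowered:
            raise PermissionError(
                f"Query references confidential column: {column}")
```

The tool would call this check first and only execute the query if it passes, exactly the same defense-in-depth you would apply against SQL injection in normal code.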
So you were supposing that the schema is kind of perfect-- a perfect, well-documented schema that 01:09:41.220 |
will point, I mean, our agent to the right table and the right-- 01:09:45.220 |
I'm not sure what you mean by perfect, because the schema of a SQL database is, I mean, has to 01:09:58.220 |
Yeah, but I mean, it depends on the one that created, I mean, depends on the name of the 01:10:23.220 |
He asked, how can I make sure that confidential information is not returned if I want to? 01:10:29.360 |
You add an instruction saying, do not use that column, it is confidential. 01:10:34.400 |
To answer your question, what you would do is document your schema. 01:10:38.400 |
It would say, that table which has a cryptic name that nobody understands is actually the 01:11:55.400 |
Because to answer the previous question, which was total shipping costs by region. 01:12:06.400 |
Remember that the LLM has access to the context of the discussion, of the past messages. 01:12:13.400 |
It has all that context of the answers and questions that were previously asked. 01:12:25.400 |
For example, when I asked total shipping costs by region, it did not have the information required 01:12:38.400 |
But now, when I asked what regions have the highest sales, it can actually use a table that 01:12:48.400 |
There is no need to execute one more SQL query because it has everything it needs. 01:12:55.400 |
I mean, without anthropomorphizing, but exactly as you would if you were reasoning about the problem 01:13:01.400 |
and you were like, do I need to go write a SQL query in order to get that information? 01:13:13.400 |
That's why here, you don't see any SQL query executed. 01:13:20.400 |
How do we know it's doing math the right way? 01:13:29.400 |
We asked it to-- oh, you mean because there is a top. 01:13:37.400 |
So, so highest-- okay, that's a very interesting question. 01:13:53.400 |
And that's easy because that's something LLMs are known to be bad at-- math. 01:14:01.400 |
They are bad at arithmetic: bad at addition, multiplication, division. 01:14:08.400 |
And it's funny because I have an anecdote that's from this morning. 01:14:13.400 |
We had a team meeting and we were wondering-- and you were there-- what would be, after a year, the compound effect of a daily rate of increase of 1%. 01:14:32.400 |
And in my mind, I remember the formula for this, which is you take the percentage, you add 1, and you put it to the power of the number of periods, right? 01:14:57.400 |
It looked like it was doing it right, but it was not. 01:15:00.400 |
The original answer, which I typed in Google Sheets with a formula, was correct. 01:15:07.400 |
So to answer your question, when you need to do math, you need to use a tool. 01:15:12.400 |
Here, we're using code interpreter, which can do math, but to be honest, I've tried. 01:15:17.400 |
It's not the best way to do math, to use code interpreter. 01:15:20.400 |
It's good at generating diagrams, at reading files, extracting information from a CSV, from other types of structured documents. 01:15:31.400 |
If you try to do math with code interpreter for whatever reason, which I cannot explain, it does not work very well. 01:15:37.400 |
When you want to do math, it's better to create your own tool, like calculate. 01:15:42.400 |
And it takes like a LaTeX expression or some kind of mathematical formalism to represent an expression. 01:15:48.400 |
You have plenty of mathematical calculators out there that you can use to do the actual and accurate math calculation. 01:16:00.400 |
And so that the LLM can delegate to a tool the responsibility of doing calculations right. 01:16:11.400 |
Because me, I don't know, but 1.01 to the power of 365, I don't know how to do that myself. 01:16:55.400 |
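A calculate tool of the kind suggested above can be sketched like this; the tool name and the safe-subset evaluator are illustrative, and it uses Python expressions rather than LaTeX for simplicity:

```python
import ast
import operator

# Safe arithmetic subset the LLM is allowed to delegate to: + - * / ** and negation.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
        ast.Div: operator.truediv, ast.Pow: operator.pow, ast.USub: operator.neg}

def calculate(expression: str) -> float:
    """Evaluate an arithmetic expression passed by the LLM, without eval()."""
    def eval_node(node):
        if isinstance(node, ast.Expression):
            return eval_node(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](eval_node(node.left), eval_node(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](eval_node(node.operand))
        raise ValueError("unsupported expression")
    return eval_node(ast.parse(expression, mode="eval"))

# The compound-growth question from the anecdote:
# (1 + rate) ** periods, for a 1% daily increase over a year.
growth = calculate("(1 + 0.01) ** 365")  # roughly 37.78x
```

With a tool like this, the LLM only has to translate the question into the formula; the accurate arithmetic is delegated.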
We need to ask, again, to do a query on the database. 01:17:04.400 |
Because in the past information, we do not have that information. 01:17:07.400 |
We don't know the drill-down of tents in the United States in April 2022. 01:17:31.400 |
No, I don't want to meet and chat with friends and family right now. 01:17:51.400 |
You're asking if we're using a vector database, for whatever reason. 01:18:10.400 |
And so we're going to uncomment those lines in main.py. 01:18:22.400 |
Let me search for-- yeah, it's now the instructions. 01:18:54.400 |
I was trying to select multiple lines at once. 01:19:12.400 |
So now we're defining-- we're creating a vector store. 01:19:16.400 |
I'm not going to go into the details of the creation. 01:19:18.400 |
There is a utility which does all the heavy lifting. 01:19:22.400 |
But basically what it does is that it creates an Azure AI Search vector store. 01:20:27.400 |
So like I was saying, I was going to show you the files. 01:20:38.400 |
So it's a PDF which contains all the product information for my products. 01:20:55.400 |
And so what this function does is that it reads the PDF, chunks it, cuts it in small pieces, 01:21:02.400 |
and uploads all those pieces into the vector database. 01:21:07.400 |
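"Cutting it in small pieces" can be sketched as a simple overlapping chunker; the agent service and AI Search handle this for you, so this is only an illustration of the idea, with made-up chunk sizes:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split document text into overlapping fixed-size pieces for embedding.

    Overlap keeps sentences that straddle a boundary retrievable
    from both neighboring chunks.
    """
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        # Stop once a chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Each chunk would then be embedded and uploaded to the vector store, which is the heavy lifting the utility function does for you.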
If we have time at the end, I can explain to you in excruciating details how that works. 01:21:28.400 |
I wanted to show you the difference between the instructions file we were using before and 01:21:49.400 |
So I'm using the diff tool here, included in VS Code, to show you the difference. 01:21:57.400 |
And so here, what's interesting is that you can see the difference between the instructions. 01:22:03.400 |
So in the file search instruction, we have new instructions. 01:22:17.400 |
So now we have a Contoso product information vector store. 01:22:21.400 |
We have a search tool which allows to search into the vector database. 01:22:28.400 |
And we have a few different things when it comes to content and clarification guidelines. 01:22:34.400 |
Such as the kind of questions that you can ask. 01:22:37.400 |
You know, with new questions like what brand of tents do we sell, which you could not ask before. 01:22:43.400 |
Because you didn't have the brand information into the sales database. 01:22:52.400 |
So what's interesting with the use case here, and really the colleague that created that content, 01:22:59.400 |
created something very interesting because it allows to show many things. 01:23:04.400 |
Like when you have a source of information, a source of data-- think about how it was before, when 01:23:10.400 |
we had to mix different data stores, different databases, and we had to inject them into some 01:23:22.400 |
whatever, or, even worse, do some very complex aggregated queries to link things together. 01:23:31.400 |
Like here, when I want to link to join information from a database with information in a PDF, 01:23:37.400 |
which is not structured, I can use the LLM for that, which is quite extraordinary. 01:23:43.400 |
So when I hear all the skeptics about AI, I just don't understand because, I mean, that's just insane what you can do. 01:23:51.400 |
Anyway, I'm just going to skip for the rest of the difference between those instructions. 01:24:13.400 |
And actually, the chunking, I said it was chunking. 01:24:17.400 |
Actually, the chunking is done by AI search, I think. 01:24:20.400 |
I forgot if it's client-side or server-side, but I think it's server-side. 01:24:23.400 |
It's one of the features of agent services that you do not have to take care of chunking yourself. 01:25:03.400 |
And so this time-- so what's interesting is that this is not using the SQL database. 01:25:12.400 |
Nothing else because there is no sales information required to answer that question. 01:25:17.400 |
Also, remember that I killed the previous instance of the agent. 01:25:23.400 |
So all the past conversation was lost on purpose. 01:25:27.400 |
This is-- I could have reused the same conversation if I wanted to keep the previous context, but I 01:25:39.400 |
So outdoor living and alpine gear plus some information about it. 01:26:09.400 |
But what's interesting here is that there is no information about hiking shoes in the PDF. 01:26:15.400 |
So now what's interesting is that it's referring to the previous brands we talked about. 01:26:27.400 |
So what product type and categories are these brands associated with? 01:26:40.400 |
To be honest, it could have summarized it since both do the same, but it did not. 01:26:48.400 |
Because now we are asking for specific sales information about a specific year. 01:26:53.400 |
So now it's going to need to query the SQL database. 01:26:56.400 |
So we are asking for the sales of tents in 2024 by product type. 01:27:09.400 |
So what's very interesting here is that the product type and the total sales, I think, come-- yes. 01:27:17.400 |
The product type and the revenue come from the SQL database. 01:27:27.400 |
So because in the questions before, we read the mapping between the product type and the brand, 01:27:35.400 |
that's how it's capable of adding the brand into the table here. 01:27:38.400 |
And that's also-- I'm not going to do the demonstration right now. 01:27:50.400 |
And I'm going to re-ask the exact same question. 01:27:52.400 |
And you're going to see the difference in the question. 01:27:55.400 |
Hint, because I'm going to ask that question without the previous question. 01:28:00.400 |
And because what I said earlier-- remember when I said that an agent can be goal-oriented 01:28:06.400 |
and relentlessly work until it achieves the goal. 01:28:10.400 |
In this instance, we have not implemented the loop. 01:28:20.400 |
It's not going to be able to say, oh, but in order to answer that question, I need to look 01:28:25.400 |
into the PDF and into the-- well, you might get lucky to be honest. 01:28:30.400 |
It might do it out of luck, because sometimes it does. 01:28:33.400 |
Because sometimes the planning is simple enough that it doesn't need multiple steps. 01:28:36.400 |
But if it gets slightly complicated and you need multiple-step planning, it's going to fall short. 01:28:50.400 |
What were the sales of Alpine Gear in 2024 by region? 01:28:58.400 |
So very interesting, too, because the database does not contain the brand. 01:29:15.400 |
So here, what it does is say WHERE product_type LIKE-- it automatically generates a LIKE matching 01:29:24.400 |
criterion based on the brand, because it knows that the brand makes family camping tents. 01:29:33.400 |
So now we're going to generate charts, and we're going to use a code interpreter for this. 01:29:39.400 |
Oh, and like I told you before, I'm going to make an experiment. 01:29:45.400 |
And I'm going to ask again-- the last question actually, the most-- yeah, this one. 01:29:59.400 |
And this one most likely is going to fall short. 01:30:11.400 |
Okay, and I'm going to create-- to ask the last question. 01:30:45.400 |
I guess it figured out that it needed to use the product database. 01:30:48.400 |
Or maybe there is an instruction in the file search that says that when you ask a sales question, 01:30:57.400 |
requiring product information to go read the PDF. 01:31:26.400 |
I got-- it gave me Alpine and Alp-- it doubled up on the second result. 01:31:32.400 |
It doubled up on the-- when you did it, it said it was just-- it was Outdoor 01:31:37.400 |
Living for backpacking, and the camping tent was Alpine Gear. 01:31:41.400 |
But from my-- it said I wouldn't put a little bit of hope. 01:32:02.400 |
Like, I guess that's where I struggle sometimes. 01:32:06.400 |
So I give a breakout session, by the way, on evals. 01:32:14.400 |
But I'm going to give a breakout session tomorrow specifically on how to evaluate agents. 01:32:22.400 |
It's a little bit tricky because it's like, well, the answer might be right. 01:32:35.400 |
So now, we're going to go back to our main.py. 01:32:57.400 |
So we can also just have all the tools running, right? 01:33:02.400 |
At the end, when I'm going to uncomment everything, all the tools will be working at the same time. 01:33:07.400 |
But what's very interesting is to uncomment-- it's a workshop. 01:33:11.400 |
It's so that you get an understanding of exactly how everything works. 01:33:40.400 |
And we're going to compare code interpreter with file search. 01:33:50.400 |
So here, with code interpreter, we add visualizations. 01:34:01.400 |
We add a chapter, a section in the instruction, which explains how to do visualization. 01:34:07.400 |
So when you have questions involving visualization, we basically say, hey, go use the code interpreter 01:34:29.400 |
Remember, in the previous-- in the slide I showed before, we were first asking for the sales by region, and then for a pie chart. 01:34:44.400 |
Here, we're asking for both at the same time. 01:35:00.400 |
And so it's tricky because sometimes when you need-- because basically what this is doing is that it's calling two tools. 01:35:13.400 |
And sometimes, it's simple enough that the LLM, in, like, one question and answer, 01:35:23.400 |
has enough reasoning power to say, oh, I need to call that tool and then this one. 01:35:30.400 |
And sometimes it falls short and it does not have enough reasoning power. 01:35:43.400 |
So here, we saved the results here, so I'm going to-- OK. 01:35:50.400 |
With our answer, I can try to zoom to show-- OK. 01:35:56.400 |
So we have the drill down of revenue by region. 01:36:06.400 |
I mean, it is literally generating Python code to generate the pie charts. 01:36:12.400 |
So basically, what that means is that you can generate some pretty crazy diagrams. 01:36:18.400 |
You can be pretty imaginative in the type of information that you want to display. 01:36:27.400 |
So for the anecdote, last year at this conference, I gave a talk on cognitive pressure because at the time it was a hot topic. 01:36:41.400 |
And I had extracted one of my sessions where, you know, you're on the water and you do tacks, you do turns, you do jumps. 01:36:50.400 |
And I had the XML file of my session recorded from my watch, exported. 01:36:55.400 |
I imported it into a code interpreter, and I asked to calculate how many times I turned or how many times I jumped. 01:37:04.400 |
And from-- it was able to read the XML because XML is a very rich, strict, self-describing format. 01:37:15.400 |
And extracted all the information and the structure and generated the Python code to go through all the data points in the file and calculate how many turns and jumps and height I had during my session. 01:37:31.400 |
It's a pretty incredible tool when used right. 01:38:11.400 |
Continue asking questions about-- yeah, because there was no tools involved. 01:38:16.400 |
Yeah, just interpreted whatever was in the context before and downloaded the JSON. 01:38:40.400 |
By the way, you hear a lot about one of the latest things in AI is deep research. 01:38:48.400 |
Under the hood, deep research is an agent that has those tools. 01:38:54.400 |
Just that it's been developed by an army of engineers and they've made sure that everything works well. 01:39:04.400 |
I don't know how it's implemented, but I'm assuming it's some kind of orchestration system that mixes tools together and loops until it achieves the goal. 01:39:19.400 |
So what would be the impact of a shock event, 20% sales drop in one region? 01:39:30.400 |
Can you elaborate on the difference between what this can do, how it's not looping, but if you use AutoGen you can loop through the different tools that it needs? 01:39:46.400 |
It's able to use several tools at the same time. 01:39:52.400 |
And, like for example, let me try to find an example and, like for example, if you ask a question, you have two data sources, right? 01:40:05.400 |
Imagine you ask a question for which it is obvious that you need to query one, but it's not obvious that you need to query the other. 01:40:14.400 |
The information is in the other, but you have no hint in the phrasing of the question that you should be querying the second in order to answer the question as a whole. 01:40:29.400 |
This system would fall short, but not AutoGen? 01:40:35.400 |
And this one, the falling short, would also depend on what instructions you gave it, because you can hack your way through those kinds of problems. 01:40:44.400 |
Because you can just, in your instructions, add something saying, hey, if the user asks for that kind of question, you should also go look into the other data source. 01:40:54.400 |
But, the difference, sorry, the difference with a multi-agent, multi-step planning system, with a more complex topology, is that you have an LLM, which every time you get an answer, looks at the answer and asks itself the question, is the answer correct? 01:41:13.400 |
Is this answer answering the question or not? 01:41:18.400 |
It can do something such as, huh, this is not answering the question. 01:41:24.400 |
What could I do more to try to answer the question? 01:41:27.400 |
And then, maybe the first step, it did not query the second data store, but it's going to say, actually, maybe we should go look into the data store, because actually, maybe the answer is in there. 01:41:47.400 |
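That self-checking loop can be sketched as follows; this is not any framework's API, just the pattern, with stub functions standing in for the two LLM calls (the answer generator and the "is this answering the question?" judge):

```python
def agentic_loop(question, llm_answer, llm_is_complete, max_steps=3):
    """Goal-oriented loop: answer, self-check, retry until satisfied.

    llm_answer(question, history) -> str and
    llm_is_complete(question, answer) -> bool stand in for real LLM calls.
    """
    history = []
    answer = ""
    for _ in range(max_steps):
        answer = llm_answer(question, history)
        history.append(answer)
        # The judge step: is this answer actually answering the question?
        if llm_is_complete(question, answer):
            return answer
    return answer  # best effort after max_steps

# Stub LLMs: the first pass queries only the SQL store;
# on retry, the second pass also consults the PDF store.
def stub_answer(question, history):
    return "sql-only answer" if not history else "sql + pdf answer"

def stub_check(question, answer):
    return "pdf" in answer  # judged incomplete until the second source is used

result = agentic_loop("sales of tents by brand?", stub_answer, stub_check)
```

The single-shot agent in this workshop stops after the first pass; a multi-step planner is exactly this outer loop around it.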
Semantic Kernel or other more, like, collaborative multi-agent systems. 01:41:58.400 |
It's a workshop of its own, just looking at multi-agent systems. 01:42:09.400 |
So you said that sometimes, I mean, the answer is not obvious, or the question is the prompt itself. 01:42:18.400 |
It doesn't point to the right, maybe, set of data. 01:42:35.400 |
And a coordinator, when you use the word coordinator, it's actually one of the patterns, we call it a topology, it's one of the topologies of multi-agent systems. 01:42:45.400 |
And you can build such a topology with AutoGen. 01:42:56.400 |
The hardest question in multi-agent systems is, in my opinion, how to specify the definition of done. 01:43:27.400 |
Impact of a 20% sales drop in North America on global sales distribution. 01:43:41.400 |
Because in North America, here it's dropping. 01:43:50.400 |
So the percentage, obviously, of North America goes down, while the percentage of others goes up. 01:44:03.400 |
I mean, I'm going to skip because it's pretty obvious what it's going to do. 01:44:08.400 |
So, which regions have sales above or below their range? 01:44:16.400 |
So maybe the previous questions-- the previous question was-- no, I don't think so. 01:44:33.400 |
Interesting. So maybe the previous question was-- no, I don't think so. I think it's-- huh. Do we have a problem? Connectivity problem, maybe? 01:45:31.400 |
OK. Let me just-- so we are-- OK. I want to show you real quick the Bing grounding with Bing Search. 01:45:40.400 |
So I'm going to skip over that one because this is important. Very often you want to add internet search capabilities. 01:45:50.400 |
So I'm going to comment-- uncomment this. Use the other instruction file. 01:46:14.400 |
And the last one, code interpreter multilingual. We skip it. It's important because that's how to configure code interpreter. 01:46:22.400 |
Once you want to use-- you want to work on non-English languages. So with specific encoding, specific fonts, that kind of thing. 01:46:34.400 |
But we're going to skip it for today. Resource not found? Huh. 01:46:43.400 |
Well, that's interesting. Did you have that problem? Did you try to that point? Did you have that issue? 01:46:52.400 |
Is it working for you? Is it working for you? I have that error. 01:47:01.400 |
Maybe we've had to change. OK. And I'm not sure what happened. It may be-- because Build was recently, and a lot of APIs have changed, and maybe that one changed too? I'm not sure. 01:47:23.400 |
Anyway, what would have happened is that-- let's look at the question. 01:47:42.400 |
OK. So here: what beginner tents do we sell? That queries our products PDF. What beginner tents do our competitors sell, include prices? That needs to go query the internet. 01:47:57.400 |
You need to go query information about what competitors are out there and what they sell and for how much. 01:48:06.400 |
So that would use the Bing grounding tool, et cetera. So same logic as before, we showed-- and I'm going to wrap up and go back to my slide. 01:48:35.400 |
OK. But just to finish my sentence: Bing grounding is just one more tool. 01:48:47.400 |
The same as the database query tool and the vector database file indexing tool, it is one more source of data that the agent can use to make informed decisions. 01:49:09.400 |
So we've seen how to do more with function calling, like with a bunch of very powerful tools. 01:49:25.400 |
It's always a question when you do workshops like this: do we go with a web UI, which is complicated and adds some React and web stuff that some people might not be familiar with? 01:49:32.400 |
This one is bare-bones, just CLI, the bare minimum of code, focusing explicitly on making sure you understand how to build such an application. 01:49:48.400 |
That one works because it's a whole new paradigm. 01:49:54.400 |
And as you mentioned, there is a question of evaluation. 01:49:57.400 |
And so when it comes to evaluating just an LLM, question and answer, there have been many frameworks out there for some time now. 01:50:09.400 |
But when it comes to evaluating agents, it's a whole new world again. 01:50:14.400 |
Because not only are you evaluating one answer, you need to evaluate a whole conversation. 01:50:23.400 |
So you need to evaluate the accuracy of which tools were selected. 01:50:30.400 |
Anyway, we're going to see all of this tomorrow during the breakout session where I'm going to introduce the Azure AI evaluation SDK with the agent evaluation capabilities, which is fascinating. 01:50:44.400 |
You have on that QR code a bunch of additional resources. 01:50:56.400 |
And I'm going to point back to my contact if you want to. 01:51:27.400 |
The tools which are used by agents, should they be part of the Azure ecosystem? 01:51:36.400 |
When you create an agent in AI Foundry, you have a bunch of pre-made tools that are readily available. 01:51:43.400 |
You just have to click on it and configure it. 01:51:46.400 |
Like I said, I can't remember off the top of my head. 01:51:51.400 |
But the tools we define today, they are client side. 01:51:58.400 |
Well, except Bing grounding, because Bing grounding is declared client side, but the actual Bing grounding call is happening on the server side. 01:52:16.400 |
So I know that the agent service and AI Foundry are crazy right now. 01:52:22.400 |
So what are the-- why should we use the agent service versus just the serving service for, like, GPT-4o, 4.1? 01:52:31.400 |
So I guess if we can instantiate the tools in the application itself, do we need to use the agent service, then, inside of Azure? 01:52:46.400 |
Or can we just stick with the regular old LLM kind of thing? 01:52:54.400 |
Like I said earlier, everything I just showed today with Azure AI Agent Service is just a managed AI system, 01:53:08.400 |
managed in the sense that it takes care of things-- like most developers, every time you build an AI application these days, you need a vector database, you need conversations, you need to remember those, to store those, you need to search on the web. 01:53:23.400 |
I mean, those are the basic features that every single AI application out there needs. 01:53:29.400 |
So Azure AI agent service makes it super easy, manages everything for you. 01:53:39.400 |
You can use the bare-bones LLM completion from two years ago, and you can do whatever you want with it.