Building Code First AI Agents with Azure AI Agent Service — Cedric Vidal, Microsoft

00:00:00.000 |
Well, this is very exciting because, I mean, 2025 is clearly the year of agents. 00:00:19.900 |
Compared to the past two years, things have moved so fast. 00:00:24.320 |
We went from very simple prompts, which were already incredible, 00:00:29.100 |
but now we move to the next step where we have agents that can autonomously achieve goals 00:00:37.400 |
without us knowing exactly how they do that, which is quite incredible. 00:00:49.280 |
I am Cedric Vidal. I am a Principal AI Advocate at Microsoft. 00:00:57.200 |
And today I'm going to be your host, and I am going to be helped today by proctors, by Mark. 00:01:06.780 |
And I'm sorry, I forget your name. I feel terrible. 00:01:16.480 |
So a big thank you to you two for helping me today. 00:01:21.280 |
So during the workshop, if you have any questions, please raise your hand. 00:01:26.020 |
And Mark or Argmar will come and help answer questions. 00:01:40.780 |
Anyway, so today, like I said, I'm going to set the scene first. 00:01:52.500 |
So in order to put our hands on the keyboard, to create and show you how to create an agent, 00:02:03.080 |
imagine that you are working for an outdoor and hiking equipment company that sells equipment online. 00:02:11.560 |
So what you want to do is build a system that can analyze your sales data 00:02:22.260 |
mixed with product information and generate ad hoc diagrams, basically a UX that your salespeople can use 00:02:31.160 |
very easily, where we move away from the old paradigm where we had to hard-code every single use case, 00:02:41.080 |
every single view, every single query, and where now the database queries are going to be generated automatically. 00:02:49.360 |
The UX is going to be generated automatically to accommodate the type of information that you are displaying. 00:02:56.360 |
And we are going to see how to create such an application. 00:03:07.120 |
Because like the definition of an agent has changed so often. 00:03:12.780 |
Let's be honest, even the specialists in the industry don't agree exactly on what they are. 00:03:18.460 |
And even the definition of what an agent is has evolved over time over the past three years, 00:03:23.600 |
as people have got more acquainted and were discovering what we could do with it. 00:03:28.640 |
You would imagine that a definition should be set in stone, but in that case, it's been difficult to agree. 00:03:34.820 |
But the definition we're going to use today is that it's semi-autonomous software 00:03:45.360 |
using tools and information that it can pull from databases and data stores at large, and iterating until it achieves its goal. 00:04:03.320 |
So until the system stabilizes and the goals are met. 00:04:11.520 |
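The loop described here can be sketched in a few lines of Python. Everything below is illustrative: `llm_decide` is a stub standing in for a real model call, and the "world" is just an integer.

```python
# A minimal sketch of the agent loop: a (stubbed) model repeatedly picks
# an action, the agent acts on the world, and the loop stops once the
# goal is met. llm_decide is a stand-in for a real LLM call.

def llm_decide(goal: int, state: int) -> str:
    # Stub policy: keep incrementing until we reach the goal, then stop.
    return "done" if state >= goal else "increment"

def run_agent(goal: int, max_steps: int = 20) -> int:
    state = 0
    for _ in range(max_steps):          # safety cap: never loop forever
        action = llm_decide(goal, state)
        if action == "done":            # goal reached, system stabilized
            break
        state += 1                      # "act on the world"
    return state

print(run_agent(5))  # 5
```

The `max_steps` cap matters in practice: a real agent loop needs a hard bound in case the model never declares the goal reached.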
But we're going to see that, depending on the context, it can be more or less simple or complex. 00:04:19.860 |
So in order to do that, an agent should be able to do three things, do reasoning over a provided context, 00:04:27.960 |
to provide a cognitive function such as deduction, correlation, understanding cause and effect. 00:04:34.000 |
So that's all that domain of cognition, which the LLM has now proven it is able to do a lot of. 00:04:41.580 |
Not perfectly, but it's getting better every day. 00:04:45.480 |
The second one is integrate with data sources for context, and the last one is act on the world. 00:04:51.620 |
Because in order to stabilize the system, in order to be useful-- like, the first generations of LLM-powered systems 00:04:59.260 |
were just about pulling information and displaying it. 00:05:01.500 |
But now we are moving a step forward where we are acting on the world and modifying the environment 00:05:07.360 |
until it stabilizes and reaches the expected goal. 00:05:11.900 |
So, what kind of application are we going to build today? 00:05:20.280 |
It will look like this, so this is a screenshot of a slightly different application. 00:05:25.240 |
What you're going to build today does not look exactly like this. 00:05:28.020 |
But the idea is that you can ask a question in plain English, such as show the sales of backpacking tents by region, 00:05:35.880 |
and include a brief description in the table about each tent. 00:05:38.920 |
And it's going to pull information from the database, as well as the product information, 00:05:46.020 |
and mix all that information together, reason about it, and display the content, 00:05:50.920 |
and the shape and form of the display of the UX will depend on the type of information which is requested. 00:06:02.540 |
Because the question here, yeah, create a pie chart of sales by region. 00:06:06.920 |
So, the system is going to understand that we want to create a visualization. 00:06:16.820 |
What technology are we going to use today to build our system? 00:06:25.960 |
So, before I dig more into the details of what this is, you have so many ways to build an agent today. 00:06:35.640 |
So many frameworks: LangChain, LangGraph, Semantic Kernel, and so many others. 00:06:42.840 |
The Azure AI agent service has the advantage that it's stateful and quite easy to put together. 00:06:53.580 |
Because usually, when you build any kind of LLM application, I don't know if you're aware, but it's stateless. 00:06:59.360 |
You need to manage the state client-side, so it's the responsibility of the application developer to store the conversations and to handle all the logic of pulling information from various systems, as well as executing the functions, the tools. 00:07:15.240 |
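A rough sketch of that client-side responsibility, with a stub standing in for the completion endpoint: the application has to accumulate the conversation and resend all of it on every call.

```python
# Sketch of what "stateless" means in practice: with a plain completion
# API, the client must resend the whole conversation on every turn.
# fake_completion is a stand-in for the model endpoint.

def fake_completion(messages: list[dict]) -> str:
    # Stub: reply number based on how many user turns were shipped.
    return f"reply #{sum(1 for m in messages if m['role'] == 'user')}"

history = [{"role": "system", "content": "You are a sales assistant."}]

def chat(user_text: str) -> str:
    history.append({"role": "user", "content": user_text})
    answer = fake_completion(history)   # full history sent on every call
    history.append({"role": "assistant", "content": answer})
    return answer

chat("Total sales by region?")
chat("Now as a pie chart, please.")
print(len(history))  # 5: one system + two user + two assistant messages
```

A stateful service moves exactly this bookkeeping, plus tool execution, to the server side.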
So, agent service moves all the responsibility to the cloud on the Azure platform, and everything is managed. 00:07:26.620 |
But when it's about the pros, the big advantage is that it provides a very simple development workflow, because all the state and the context and the agent configuration is managed in the cloud. 00:07:42.460 |
The integration with back-end data and data sources is also managed in the cloud. 00:07:48.180 |
So, even if you may also mix and match, you can mix things that are managed by agent service in the cloud with things that are managed locally, it's possible. 00:08:00.060 |
And it supports all the model families that are supported on the Azure AI Foundry model catalog, plus the Microsoft Enterprise Security, which is very well known to be very robust. 00:08:13.180 |
It's sometimes a bit difficult to set up, but that's the price to pay for security. 00:08:21.060 |
So, the application that I just showed was using Chainlit. 00:08:26.860 |
The one we are going to build today is going to be command line. 00:08:29.220 |
It's going to be slightly easier, but so basically at the top, you have the application layer with the framework. 00:08:39.180 |
So here, Chainlit, in our case, it's going to become a command line, very simple. 00:08:43.180 |
very basic, with a query function, which is going to use Azure AI agent service with instructions and models. 00:08:55.060 |
And actions, so function calling, and I'm going to explain what it is, code interpreter, and I'm also going to explain what it is. 00:09:04.060 |
File search, as well as Bing search for web information grounding. 00:09:09.940 |
And we're going to go through each one of those during the workshop. 00:09:15.940 |
So, like I said, Azure AI agent service comes with pros and cons. 00:09:24.940 |
The pros is that it manages everything for you. 00:09:30.820 |
The con is that you need to understand the diagram. 00:09:35.820 |
You don't need to understand all the details, but one of the... 00:09:42.820 |
So, the first thing, you're going to have to follow a sequence of steps. 00:09:48.700 |
And you have quite a few steps to follow in order for the agent to work. 00:09:56.700 |
Once you have created and configured your agent, your agent exists in the cloud. 00:10:02.700 |
It's kind of weird at first, because when you're used to stateless way of doing things, that stateful programming model is not so common those days anymore. 00:10:14.580 |
But it becomes relevant again in the age of agents. 00:10:20.580 |
Which means that if you have an application that wants to reuse an agent, if you have created the agent before, you need to reconnect to an existing agent. 00:10:29.220 |
It's kind of like in SQL, the CREATE TABLE IF NOT EXISTS. 00:10:33.700 |
You only create the schema if it does not exist yet. 00:10:38.100 |
So, it's kind of a create or update agent for most applications. 00:10:43.700 |
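That create-or-reuse pattern might be sketched like this; the dict is a stand-in for the cloud-side agent registry, and the names are invented, not the actual SDK calls.

```python
# Hedged sketch of the "create or reuse" step. A plain dict stands in
# for the agents stored in the cloud; the real SDK call names differ.

registry: dict[str, dict] = {}  # stand-in for the cloud-side registry

def get_or_create_agent(name: str, instructions: str) -> dict:
    # Like CREATE TABLE IF NOT EXISTS: create only when absent,
    # otherwise reconnect to the agent that already exists.
    if name not in registry:
        registry[name] = {"name": name, "instructions": instructions}
    return registry[name]

a1 = get_or_create_agent("sales-agent", "You analyze sales data.")
a2 = get_or_create_agent("sales-agent", "ignored on reuse")
print(a1 is a2)  # True: the second call reused the existing agent
```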
Then, you create a thread or you reuse the thread. 00:10:55.780 |
Those are the big steps that you need to get familiar with when building with Azure AI agent service. 00:11:02.260 |
Then, you are going to configure instructions. 00:11:06.660 |
Those instructions are going to be attached to the agent and they are stateful. 00:11:14.660 |
And once they are there, you can reuse your agent and you don't have to send the instructions every single time. 00:11:21.380 |
Which in terms of bandwidth and network is interesting. 00:11:29.460 |
So, one of the pros of using Azure AI agent service is that you can attach data sources directly to the agent. 00:11:36.900 |
And you can do that either graphically through Azure AI Foundry or you can do that programmatically through the SDK. 00:12:01.620 |
Today, we're going to see file search, code interpreter, function calling, and Bing search. 00:12:11.860 |
Here's an example of a thread, of what a thread looks like. 00:12:14.740 |
So, the user's message is going to be, "Tell me the total sales by region." 00:12:19.940 |
So, what's going to happen is that in order to get the total sales by region, we need first to get the sales. 00:12:29.380 |
I mean, we need to query, sorry, we need to query the sales data store. 00:12:35.620 |
And it happens that in this case, the sales data store is a SQL relational database. 00:12:42.340 |
And as you know it, the way to interact is using SQL queries. 00:12:47.380 |
So, we are going to generate a SQL query dynamically, depending on the user's request. 00:12:55.220 |
And then, we are going to send that SQL query, execute it on the database, get the list of records back, 00:13:04.020 |
re-inject those records into the LLM, which is going to generate a message in plain text from that list of records. 00:13:14.980 |
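That round trip can be sketched with an in-memory SQLite database; the table schema here is invented, and the "generated" SQL is hard-coded where the LLM would produce it.

```python
# Runnable sketch of the flow: question -> generated SQL -> records ->
# plain-text answer. The schema and data are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, revenue REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("EMEA", 120.0), ("EMEA", 80.0), ("APAC", 50.0)])

# Step 1: the LLM turns "total sales by region" into SQL (stubbed here).
generated_sql = ("SELECT region, SUM(revenue) AS total FROM sales "
                 "GROUP BY region ORDER BY region")

# Step 2: execute the query on the database and get the records back.
records = conn.execute(generated_sql).fetchall()

# Step 3: the records go back into the LLM, which would write the
# plain-text message; a join stands in for that last model call.
answer = ", ".join(f"{region}: {total:g}" for region, total in records)
print(answer)  # APAC: 50, EMEA: 200
```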
SQLite is just an example. 00:13:22.740 |
It's used in the agent we're going to build today just because it's convenient, but you can, of course, 00:13:27.380 |
connect it to any data source you want: a database, a document store, an API, it really doesn't matter. 00:13:34.420 |
For the ones which are managed locally, for the ones which are managed by agent service on the backend, 00:13:40.740 |
the list is more restrictive, and to be honest, I don't have it on the top of my mind. 00:13:45.780 |
You're not going to see any RBAC or anything. 00:13:49.220 |
You're not going to see any role-based access control applied. 00:14:01.380 |
So then, show as a pie chart, which is the second question we asked in the previous screen I showed you. 00:14:08.660 |
This one is going to be quite interesting, because we're going to use a tool called Code Interpreter. 00:14:18.100 |
What it does is that it's going to take the query, generate Python code, and the Python code is going to be 00:14:26.900 |
executed in a sandbox, in a secure, safe sandbox. 00:14:31.060 |
And it can be whatever is supported, whatever Python packages are available in the environment. 00:14:43.460 |
And it's going to generate the Python code, which is going to generate that visual representation. 00:14:52.660 |
The code is going to save the image somewhere on the file system inside the sandbox. 00:14:57.140 |
Then the agent is going to pull that image out of the sandbox and send it back to the client application. 00:15:05.060 |
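A heavily simplified sketch of that data flow; a real code interpreter runs in an isolated process or container, while here a temp directory and `exec` only illustrate the save-then-pull-out step. The generated code and file names are invented.

```python
# Simplified sketch of the code-interpreter idea: "generated" code runs
# in a separate namespace, writes an artifact into a sandbox directory,
# and the host pulls the artifact back out for the client application.
import tempfile, pathlib

generated_code = """
from pathlib import Path
# Stand-in for matplotlib code that would save a pie chart image:
Path(out_dir, "chart.txt").write_text("pie chart bytes")
"""

with tempfile.TemporaryDirectory() as sandbox:
    # Execute the generated code with only the sandbox path in scope.
    exec(generated_code, {"out_dir": sandbox})
    # The "agent" pulls the artifact out of the sandbox.
    artifact = pathlib.Path(sandbox, "chart.txt").read_bytes()

print(artifact)  # b'pie chart bytes'
```

Note that `exec` is not a security boundary; it only stands in here for the isolated execution environment the service provides.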
Okay, so like what I said, this is quite a lot to digest, right? 00:15:10.340 |
But the thing is, you just have to go through and get a mental model of how that works. 00:15:15.940 |
And once you understand that, you don't have to manage it yourself, which is quite interesting. 00:15:36.180 |
Because we ingest the documents, and I think we ingest them inside an AI Search instance. 00:15:42.420 |
One more very important thing, function calling. 00:15:47.860 |
So to be honest, function calling is not new when it comes to LLMs. 00:15:52.580 |
I was doing function calling literally when the first version of ChatGPT was announced, by 00:15:58.020 |
asking the LLM to give me answers separated by commas, basically saying: generate a function name and an argument. 00:16:07.860 |
But nowadays, it's much more efficient to have structured output. 00:16:14.900 |
Even the LLMs are optimized under the hood inside the data center to generate structured output. 00:16:27.060 |
So the principle of function calling, actually, the name is bad. 00:16:31.940 |
I've hated that name ever since it was coined because it's not function calling. 00:16:45.780 |
So what it does, rather, is that it generates a JSON representation telling you 00:16:53.860 |
what function to call with what parameter values. 00:16:59.860 |
And it's going to map the natural language sentence and extract 00:17:06.420 |
values that it's going to pass as parameter to the function to be called. 00:17:11.940 |
And then it's the responsibility of the application code to take that function 00:17:16.420 |
call specification and actually execute the code. 00:17:25.140 |
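A minimal sketch of that contract; the function name, arguments, and JSON shape are invented, but the division of labor is the point: the model only emits the call specification, and the application dispatches it.

```python
# Sketch of the function-calling contract: the model returns JSON naming
# a function and its argument values; the application code does the
# actual call. Function names and values here are illustrative.
import json

def get_sales_by_region(region: str) -> float:
    return {"EMEA": 200.0, "APAC": 50.0}.get(region, 0.0)

TOOLS = {"get_sales_by_region": get_sales_by_region}

# What the LLM emits for "what are the EMEA sales?" (it never runs code):
llm_output = '{"name": "get_sales_by_region", "arguments": {"region": "EMEA"}}'

call = json.loads(llm_output)
result = TOOLS[call["name"]](**call["arguments"])  # app executes the call
print(result)  # 200.0
```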
How does the LLM know which functions to use, versus your own classification model? 00:17:31.860 |
Like, do you know under the hood how does it know? 00:17:34.900 |
Oh, that is an excellent-- that is a very good question. 00:17:40.100 |
Well, it's exactly the same way when you ask a question. 00:17:48.100 |
For example, when you ask, what is the color of the sky? 00:17:57.460 |
Obviously, the answer is going to be blue most of the time. 00:18:01.380 |
The way LLMs work is that it's a statistical distribution. 00:18:07.540 |
You have the probability that the word that comes after-- when you ask the question, 00:18:14.100 |
The most probable answer is going to be the color of the sky is. 00:18:19.220 |
And after is, the most probable color is going to be blue. 00:18:22.020 |
Because that's how the model has been pre-trained, right? 00:18:26.660 |
But if you say, what is the color of the sky? 00:18:32.020 |
And in order to get me the answer, you can only use, for example, color equal and the value of the color. 00:18:46.180 |
You tell the LLM, hey, here's how-- here's the output that I want. 00:18:49.940 |
And then it's going to generate color equals blue. 00:18:53.060 |
Just because you constrained, like statistically, in the world of all the possibilities that you can answer, 00:19:01.220 |
You're narrowing down the type of output that you want to be generated. 00:19:13.540 |
And you're going to interpret color equals blue into whatever you want with it. 00:19:17.860 |
Except in the world of function calling, it's not color equals blue. 00:19:20.980 |
It's set_color-- you have a tool, say, set_color, which takes a parameter with the value of the color. 00:19:29.780 |
And instead of having just one, you have many possible functions. 00:19:37.780 |
Imagine you have one function to set the color and another function to order a pizza. 00:19:45.460 |
If you ask what the color of the sky is, it's not going to generate an order pizza call for the same reason. 00:19:54.500 |
So how do you know when to use function calling versus creating your own classification model? 00:20:00.420 |
Okay, well, because a classification model cannot extract values, entities, out of a context and 00:20:11.140 |
like answer many, many answers across multiple dimensions at the same time. 00:20:17.540 |
Classification is just one out of n possibilities. 00:20:30.100 |
Okay, so I want you to, on your laptops, to open the following URL. 00:20:54.660 |
So Microsoft events, all attached, dot events, sorry, oh my god, dot learn on demand dot net. 00:21:31.380 |
Well, usually you're going to use Microsoft account. 00:21:36.980 |
Huh, I thought we had all the types of, okay, so if it's a personal account, you need Microsoft account. 00:21:43.860 |
If you have a corporate account, like if you have an account as part of your company, 00:21:48.740 |
which is managed by Entra, you need to select Entra ID. 00:21:56.820 |
Do not select Microsoft account if you use a corporate account. 00:22:16.260 |
That page, you're going to have a redeem training key link. 00:22:23.140 |
Click on it and then you're going to be asked to enter a training key. 00:22:32.980 |
So, hopefully this is big enough and everybody can see, but let me know. 00:22:38.100 |
Remember, raise your hands if you have any questions. 00:23:00.820 |
You can use also a personal Microsoft account. 00:23:19.860 |
You can use, if you have an Xbox, you can use an Xbox account. 00:23:26.980 |
Any Microsoft account or any of the consumer domains out there. 00:23:33.140 |
So, in your case, you might want to try a personal account. 00:23:38.260 |
And to be honest, in that case today, it's usually easier. 00:23:45.300 |
Like, you do not need to use a corporate account today. 00:23:50.340 |
I just have a question about that first slide you had up. 00:23:56.260 |
So, you're saying an agent can go do a SQL query, 00:23:59.140 |
come back, use the code interpreter, and then make a graph. 00:24:03.380 |
Is there any reason you're using agents for that, 00:24:06.180 |
as opposed to just chaining all the workflows and injecting the tools? 00:24:14.100 |
I'm just wondering if there's a reason why we're using agents for that sort of stuff. 00:24:20.020 |
Well, like I said, we need to go back to the definition of what an agent is. 00:24:25.460 |
And like I said, it's a bit of an overloaded term those days. 00:24:34.180 |
It encapsulates a wide range of definitions, including the simplest. 00:24:46.340 |
One of the things that an agent can do is be goal-driven and iterate until it achieves the goal. 00:24:54.260 |
Today, we are not going to see that specific thing. 00:24:58.820 |
We're going to be in the middle in terms of complexity. 00:25:02.980 |
We are going to be above the simple completion. 00:25:12.980 |
We are going to be seeing a mix of using tools and code interpreter with multiple data sources, 00:25:22.740 |
where the information from all those is mixed together to do reasoning and act on the world. 00:25:30.580 |
And yes, you could do that using just an LLM locally. 00:25:37.140 |
Except today, we're using agent service, which manages everything server-side. 00:25:48.900 |
Today, I'm going to give examples so that you get a sense and you touch exactly when the current 00:25:58.580 |
architecture hits its limits and when you need to go further and use more orchestrated 00:26:09.780 |
planning and orchestration, which loops until it reaches the goal. 00:26:15.940 |
And I'm going to give an example where you see where it breaks. 00:27:10.340 |
So that means we have eight minutes to answer questions. 00:27:24.820 |
If an agent is just an LLM running on a loop until it thinks that it's done what you've asked it to do. 00:27:34.420 |
Yeah, how does it know when to kick back out of that recursion? 00:27:38.500 |
Yeah, so today, we are not going to-- we are going to see the limits of not-- 00:27:47.460 |
Okay, you have two types of agents, two levels of complexity. 00:27:52.020 |
An agent does not have to do the looping to be called an agent, in the simpler sense. 00:28:02.180 |
In order to go one step further and do that looping, you need to use something like AutoGen, for example. 00:28:15.540 |
For those types of agents, you need to define a criterion of done, a definition of done, basically. 00:28:22.100 |
And you're going to have-- and the criteria can be implemented programmatically, deterministically, 00:28:35.860 |
And the workflow engine is going to loop until the goal is reached. 00:28:43.380 |
But it is the most tricky thing, in my opinion, frankly. 00:28:47.060 |
That's when you're going to develop an agent: figuring out when to stop is tricky. 00:28:54.340 |
For example, I often-- I've done a couple of prototypes with Browser Use, which is a famous 00:29:02.580 |
open source agentic system for browser navigation. 00:29:06.020 |
When you ask it to complete a task on the web, it's going to navigate from website to website, 00:29:12.980 |
The evaluation of when the task is done is clearly not always perfect. 00:29:32.820 |
The other one is, so my first question is, are we going to learn today, like, 00:29:37.540 |
how to send feedback to the agent, like, in case the agent gives, like, incorrect answers or incomplete answers? 00:29:53.700 |
And my second question is, what's the difference between, like, an AI agent and an MCP? 00:30:05.380 |
So, basically, MCP is just a tool, a function. 00:30:13.460 |
The LLM decides which function to call when you have a question. 00:30:20.020 |
MCP is just that plus management of the lifecycle of the program which completes the-- which executes the function. 00:30:33.780 |
Because normal function tooling, if you just take, like, a GPT or LAMA and you do function 00:30:41.940 |
calling, what the LLM is going to return is just a JSON telling you the name of the function 00:30:46.340 |
and the list of the values for each parameter, nothing more. 00:30:49.220 |
It's your responsibility as an application developer to execute the function. 00:30:59.380 |
It's one of the existing technologies out there that you can use as a tool. 00:31:06.180 |
And the advantage of MCP, at least as a client-side AI application developer, 00:31:13.780 |
is that the MCP protocol takes care of downloading the executable, the binary, 00:31:20.740 |
whether it's a node, Python, or whatever, download or a Docker image. 00:31:25.380 |
Like, pull the executable on your machine and execute it automatically. 00:31:32.980 |
And it's an overall protocol which comes with-- and also, it encapsulates the possibility for the tool to declare its own capabilities. 00:31:42.020 |
So, from the standpoint of the user, you just have to declare, oh, I want to use a file system MCP server 00:31:51.380 |
or I want to use a blender MCP server or whatnot, and that's all you have to do. 00:31:57.380 |
You can select it from a catalog and it's going to auto-declare what tools it has 00:32:06.260 |
Also, when you're done with your MCP client, it's going to stop all the MCP servers and clean up everything. 00:32:47.620 |
I was going to wait and see kind of how the demo played out later, but one of the 00:32:55.380 |
questions I often find making agents is the balance between 00:32:59.300 |
making a fairly general agent that can do lots of things and just kind of giving it the tools, 00:33:04.340 |
not giving it much direction versus having to be fairly controlled with it. 00:33:08.020 |
You're kind of trading off autonomy for a bit of reliability. 00:33:12.660 |
The one that you set up before, I couldn't quite work out if you've literally just given it the tools 00:33:17.620 |
and then it can do everything with those tools or if you've been quite controlled with it. 00:33:22.500 |
How do you think about the trade-off and when people are using your tools do they tend to fall more on one side than the other? 00:33:27.140 |
It's a more complex answer than it looks. 00:33:32.900 |
A more complex question than it looks, because of two things. 00:33:40.100 |
When you give instructions, it's like I was talking about the space of probabilities. 00:33:48.180 |
The more vague you are, the more you leave options open. 00:33:53.380 |
The more your agent is going to be able to do a wide range of things, but it might get it wrong. 00:34:00.580 |
The more specific you are, the more you're restricting the things that it's going to do well, 00:34:10.420 |
And then there is, you didn't really ask this, but I'm assuming that's 00:34:16.180 |
on your mind, is how many tools can I give to my agent? 00:34:21.700 |
And also another common question that we often get is, 00:34:25.300 |
should I give all my tools to one agent or should I split tools on multiple agents or should I create 00:34:40.900 |
And the answer is the same, it's complex, but kind of, imagine, same thing: you need to think in terms of probabilities. 00:34:57.220 |
And so when, imagine you have one agent and you give all the tools. 00:35:03.700 |
And at every single instance, for any question you ask, the LLM has to decide which one of all the tools to call, so 00:35:12.580 |
the probability that it gets it wrong is higher, right? 00:35:15.140 |
So the solution to that is to create agents which are more specialized by like areas of expertise 00:35:25.620 |
and do some kind of routing and multi-step selection. 00:35:29.300 |
Where instead of having one agent and giving it all the tools, you, for example, have a first agent which is going to 00:35:42.500 |
classify the question and say, oh, this is a sales question or this is a product question. 00:35:48.820 |
And then route to an agent which is more specialized to answer things about sales or things about products. 00:35:56.180 |
And then when the answer comes back, the first agent can say, oh, do I need to use another agent? 00:36:06.100 |
So you can imagine like a tree of tools and each tool can be composite or leaf. 00:36:10.900 |
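One way to sketch such a routing topology, with keyword matching standing in for the classifier LLM and canned strings standing in for the specialized agents:

```python
# Sketch of the routing topology: a first-stage classifier (stubbed;
# really another LLM call) picks the specialized agent, so each agent
# only has to choose among a few tools. Names are invented.

def classify(question: str) -> str:
    # Stub router: keyword match where an LLM would classify.
    return "sales" if "sales" in question.lower() else "product"

def sales_agent(question: str) -> str:
    return "sales-answer"     # would query the sales database

def product_agent(question: str) -> str:
    return "product-answer"   # would search product documents

AGENTS = {"sales": sales_agent, "product": product_agent}

def route(question: str) -> str:
    return AGENTS[classify(question)](question)

print(route("Show sales by region"))  # sales-answer
print(route("Describe this tent"))    # product-answer
```

Nesting `route`-like nodes gives exactly the tree of composite and leaf tools described above.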
You can see it that way, and AutoGen allows you to build such topologies. 00:36:17.700 |
And they may be specialized, each one working with a different database or a different stack? 00:36:33.220 |
Yes, but you can also have topologies where the different agents share memory. 00:36:40.820 |
Because the thing is, sometimes you also have the situation where you have like a team of agents 00:36:51.780 |
And it's good that they are specialized because you don't want them to pick the wrong tool. 00:37:01.460 |
Yeah, but what's going to prevent the hallucination is the grounding. 00:37:12.340 |
So each one of those agents is going to be grounded in something. 00:37:14.900 |
But something which is some kind of a grounding is memory. 00:37:21.380 |
Like for example, if you have multiple tools that answer questions about a consumer, 00:37:28.260 |
and you want to memorize, remember the preferences of your consumer, of your user. 00:37:34.500 |
Like for example, what's the name of his or her pet? 00:37:39.060 |
And you bet you want each one of your agents potentially to be able to use that information. 00:37:44.980 |
So what you want to do is take that memory, connect that memory to each one of those agents. 00:37:49.700 |
Even if you have multiple agents, they can-- each one of them have access to that shared memory. 00:37:54.500 |
Because you want all of them to be able to access the name of the pet. 00:38:02.500 |
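A toy sketch of that shared-memory topology; the `remember:` convention and agent behavior are invented, but it shows two specialized agents reading and writing one store:

```python
# Sketch of agents sharing one memory store (e.g. user preferences such
# as a pet's name), so every specialized agent grounds on the same facts.

shared_memory: dict[str, str] = {}

def make_agent():
    def agent(question: str) -> str:
        if question.startswith("remember:"):           # write to memory
            key, value = question[len("remember:"):].split("=")
            shared_memory[key] = value
            return "noted"
        return shared_memory.get(question, "unknown")  # ground on memory
    return agent

sales = make_agent()
support = make_agent()

sales("remember:pet_name=Rex")  # one agent stores the preference
print(support("pet_name"))      # Rex -- another agent can read it back
```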
Like you have many topologies, which make sense depending on your use case. 00:38:19.140 |
Can you please raise your hand when it's still building? 00:38:57.940 |
Can you raise your hand again for those for which it's still building? 00:40:00.020 |
So I'm going to show you how to make the most of it. 00:40:09.540 |
So when you have the black screen, the username should be pre-selected. 00:40:24.860 |
So there, you can click on password here, which is going to fill the password box. 00:40:36.060 |
Every time you see a T like this, you do not need to type. 00:40:39.900 |
You just click on it and it's going to auto type. 00:41:03.660 |
So what I'm going to do here, what you can do, it's not mandatory, but you can click here and go to split windows to get more real estate. 00:41:11.580 |
So that way here, I'm moving the instructions away, and here, that allows me to have more space. 00:41:31.020 |
So we're going to move on to getting started. 00:41:35.500 |
What we are going to need to do is az login here. 00:42:05.260 |
Even if you connected the first time, even if you connected with your Microsoft account, 00:42:14.620 |
in this instance now, we are not going to use your account. 00:42:19.580 |
We are going to use a temporarily generated account, which is a work account. 00:42:34.140 |
So in the resources tab here, in the instructions here, in the pop-up, you click user. 00:43:23.660 |
And we're going to have to type that command. 00:43:28.380 |
Because in Azure, you have some roles that need to be assigned. 00:43:32.060 |
And that's something we could not automate as part of the lab provisioning. 00:43:35.740 |
So that's something you need to execute manually. 00:43:43.260 |
You're going to have to agree to the warning and say paste anyway. 00:43:53.820 |
And at the end, normally everything should be fine. 00:44:12.860 |
So now, finally, we're going to be able to open the workshop. 00:44:15.980 |
So you type that command, which starts with git clone. 00:44:19.740 |
So what we're going to do is we're going to check out the git repository from GitHub. 00:44:23.660 |
And we're going to build the project and open it and install some code extension and open it in VS Code. 00:44:33.420 |
So I'm going to go back here, type the command, paste anyway, enter. 00:44:40.460 |
So like I said, it's cloning the repository, creating the Python virtual environment. 00:44:49.580 |
So today we're going to use Python for this workshop. 00:44:52.300 |
So when you get into the VM, you just open Edge, you open the browser, 00:45:08.300 |
and the instructions should be displayed right away. 00:45:12.620 |
You also have a form on the desktop right here. 00:45:14.620 |
So when you get into the VM, you just open Edge, you open the browser, and the instructions should be displayed 00:45:57.100 |
I don't know how you get here, and I'm not sure what the ... Can I see? 00:46:24.460 |
So I was at this step, where I cloned the repository, and I installed the PDF extension. 00:46:37.100 |
Then we're going to move on, and we're going to open VS Code. 00:46:56.940 |
So now we still have a few setup steps before we can get to coding. 00:47:09.020 |
So you're going to have to go to the Azure AI Foundry here. 00:47:22.460 |
For the sign-in, use that user from the instructions pane. 00:47:55.580 |
Then what we're going to do is we are going to search for ... 00:48:23.020 |
Once you are on the project, you can ignore those pop-ups. 00:48:36.940 |
And we're going to need that project connection string here. 00:48:49.980 |
So once you have copied the project connection string, 00:48:56.220 |
you can go back to VS Code, search for the .env.sample file, and rename it. 00:49:08.060 |
And we're going to paste the connection string here. 00:49:17.420 |
So that connection string is what allows us to connect to the AI Foundry project, 00:49:25.980 |
Be careful when you paste it not to miss any characters, and to paste between the double quotes. 00:49:40.060 |
So as you can see, we're going to use GPT-4o. 00:49:50.620 |
This is important because now, finally, we have configured all the-- 00:50:41.500 |
So, the first thing that we are going to look at is-- I'm going to explain quickly the structure of the project. 00:50:51.180 |
So the main file that we are going to look at today is the main.py file, 00:50:55.420 |
which is the entry point for the application. 00:50:59.820 |
The goal of this workshop is not to give you, like, what code should look like in production. 00:51:07.580 |
The goal of this workshop is for you to understand how it works, to understand all the pieces, 00:51:13.260 |
because once you understand, you're going to be able to use any of those other frameworks. 00:51:19.740 |
I mean, this one or another one, it's going to be the same. 00:51:22.380 |
What matters is to understand how an LLM works, how function calling works, how grounding works. 00:51:27.820 |
So, and the sales data, yeah, that's where the SQL query generation logic is. 00:51:42.220 |
Another directory which is very important is the shared/instructions directory. 00:51:54.860 |
So the first step, the most important thing, is to understand function calling. 00:52:00.700 |
So the first example is going to show you what it does. 00:52:07.660 |
So in sales data, let's look at-- where am I? 00:52:35.260 |
So as you can see here, that function takes a SQL query. 00:53:03.740 |
I mean, let me scroll so that you can see it. 00:53:08.380 |
So the LLM is going to be the one generating the SQL query, 00:53:12.380 |
compliant with SQLite syntax, and pass it to the function. 00:53:20.860 |
It just uses the SQLite driver and executes the query, nothing more. 00:53:28.860 |
That's because all the smartness of generating SQL query is done by the LLM. 00:53:33.900 |
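The tool function described here is, in spirit, just a thin wrapper around the SQLite driver. Here is a minimal sketch of that shape; the function name is illustrative, and an in-memory database is seeded inline so the snippet is self-contained (the workshop uses a file-based Contoso sales database):

```python
import json
import sqlite3

def fetch_sales_data_using_sqlite_query(sqlite_query: str) -> str:
    """Execute a SQLite query generated by the LLM and return rows as JSON.

    :param sqlite_query: A SELECT statement compliant with SQLite syntax.
    """
    # Self-contained stand-in for the workshop's contoso-sales database.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales_data (region TEXT, revenue REAL)")
    conn.executemany("INSERT INTO sales_data VALUES (?, ?)",
                     [("Africa", 5.2), ("Asia-Pacific", 5.3)])
    # No smartness here: the LLM wrote the query, we only execute it.
    cursor = conn.execute(sqlite_query)
    columns = [d[0] for d in cursor.description]
    rows = cursor.fetchall()
    conn.close()
    return json.dumps({"columns": columns, "rows": rows})
```

All the intelligence lives in the query the model generates; the tool stays deliberately dumb.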
When you think about it, LLM is very good at language. 00:53:37.020 |
And SQL query is a kind of language, like coding, like Python. 00:53:41.420 |
And so-- and it happens that when the large language models are pre-trained, they are pre-trained on the massive amount of GitHub repository, and a lot of them contain SQLite queries. 00:53:54.700 |
So if you use a common database, it's going to work, like PostgreSQL, MySQL, SQLite, MongoDB. 00:54:06.140 |
If you use an exotic database that nobody has ever heard of, it's not going to work. 00:54:11.580 |
Because the model needs to have been pre-trained on it. 00:54:35.740 |
So those are the packages which are required to connect to AI Foundry. 00:54:41.580 |
To get the models to authenticate with Azure identity. 00:54:46.700 |
dotenv is the package that is used to load the environment variables from the .env file that we edited previously. 00:54:59.340 |
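For intuition, what loading the .env file does is roughly the following; this is a stdlib-only sketch of what python-dotenv's load_dotenv does, not the library itself:

```python
import os

def load_env_file(path: str = ".env") -> None:
    """Minimal sketch of load_dotenv: read KEY="value" lines into os.environ."""
    with open(path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines, comments, and anything that isn't KEY=VALUE.
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            # Strip the surrounding double quotes mentioned in the setup step.
            os.environ[key.strip()] = value.strip().strip('"')
```

This is why a missing character or a misplaced quote in the connection string breaks the connection: the value lands in the environment exactly as pasted.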
So here, that's how you connect to the AI Foundry workspace. 00:55:11.580 |
And so at line 59, what you're going to do is uncomment the first instructions file. 00:55:23.260 |
So if you go to Instructions here, you open the function calling one here. 00:55:31.580 |
Those are the instructions that are going to be passed to the agent. 00:55:37.740 |
You are a sales analysis agent for Contoso, a retailer of outdoor camping gear. 00:55:46.540 |
So it explains what personality the agent should have, what the mission of the agent should be, 00:55:54.380 |
help users by answering sales-related questions, and it lists what tools are available. 00:56:01.740 |
So here, sales data tool, use only the Contoso sales database via the provided tool, 00:56:09.980 |
So to be honest, those instructions are optional. 00:56:15.900 |
They are made to-- actually, this relates to the question you were asking earlier, 00:56:25.980 |
Where you were asking how specific you must be. 00:56:32.300 |
And the LLM is going to understand the JSON schema of the tool, which explains the function name, 00:56:38.300 |
the parameters, and you have an explanation of what each parameter is. 00:56:43.180 |
Here, in the system instructions, you can add some more information that are more specific 00:56:49.420 |
to your application or to the agent using the tool. 00:56:52.460 |
Then you have information regarding formatting and localization and examples, et cetera. 00:57:04.700 |
And here, so we are specifying the instruction files, and we are also specifying which tools to use. 00:57:15.020 |
So here, we are going to use the functions, which contains only one function tool, which is async, 00:57:22.460 |
which is the function I showed to you earlier. 00:57:28.540 |
And also, the documentation here, the Python doc, is passed to the LLM. 00:57:37.820 |
So what you write here is important, because the LLM is going to interpret that documentation, 00:57:44.460 |
as well as the name of the parameters, plus the documentation of the parameters, 00:57:57.900 |
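To make concrete how a function's name, parameters, and docstring end up in front of the LLM, here is a hedged sketch of deriving an OpenAI-style tool schema from a Python function. The helper name and the exact schema layout are illustrative, not the SDK's internals:

```python
import inspect
from typing import get_type_hints

def fetch_sales_data(sqlite_query: str) -> str:
    """Run a SQLite SELECT query against the Contoso sales database.

    :param sqlite_query: A query compliant with SQLite syntax.
    """
    return ""  # stub: the real implementation executes the query

def build_tool_schema(func) -> dict:
    """Turn a function's name, type hints, and docstring into a tool schema."""
    hints = get_type_hints(func)
    py_to_json = {str: "string", int: "integer", float: "number", bool: "boolean"}
    params = {
        name: {"type": py_to_json.get(hint, "string")}
        for name, hint in hints.items() if name != "return"
    }
    return {
        "type": "function",
        "function": {
            # The function name and first docstring line are what the LLM reads,
            # which is why what you write there matters.
            "name": func.__name__,
            "description": inspect.getdoc(func).split("\n")[0],
            "parameters": {
                "type": "object",
                "properties": params,
                "required": list(params),
            },
        },
    }
```

The model never sees your code body, only this schema, so the docstring and parameter names are effectively part of your prompt.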
And then I'm going to open a terminal, command palette. 00:58:05.900 |
And to be honest, I never remember where the terminal is. 00:58:08.540 |
So what I do is that I cheat, and I create a terminal here. 00:58:25.660 |
You need to remember we are on Windows, usually I have a Mac. 00:58:28.220 |
Okay, we have looked at this, I have explained to you all of this. 00:58:56.860 |
Yeah, you don't even need to open the terminal. 00:58:59.660 |
You don't even need to do what I just did before. 00:59:01.500 |
It's just a habit I have, but you can just click run here, it's going to work. 00:59:09.900 |
So it's doing what I mentioned earlier, which is that first you need to create the agent. 00:59:20.140 |
Once you have created an agent, it's going to be assigned an ID, because it's stateful. 00:59:24.700 |
Like the agent is actually an entity which lives in the AI Foundry project. 00:59:30.940 |
Then we're going to enable the auto function calls. 00:59:36.940 |
So the thread is basically the conversation, which is stateful too, because you don't have to save the 00:59:44.780 |
messages yourself, and then we can finally enter a query. 00:59:48.860 |
So I'm going to go back because we have a list of questions we can ask here. 01:00:16.780 |
So to be honest, every time I see that, it still amazes me, even if I've been doing that for a long 01:00:24.060 |
Because basically from the simple question that we ask in plain English, based on the knowledge of the 01:00:33.500 |
schema of the database, which is somewhere in the code, it can automatically generate such a query. 01:00:43.340 |
Like, let me go back to the question I asked. 01:00:59.420 |
From sales_data, because we are pulling from the sales_data table, and we want to group 01:01:09.980 |
So we want to sum the revenue, group by region, and automatically it limits to three. 01:01:18.460 |
And I believe it's because in the instructions, there is an instruction that says that by default, 01:01:33.660 |
And we can see that for Africa, we had 5.2 million; Asia-Pacific, 5.3 million; et cetera. 01:01:52.460 |
To be honest, I don't know the schema of the database very well. 01:01:59.740 |
So even for me, like I would have to look at it. 01:02:02.620 |
If I had to write a SQL query, I'm sure you've all done that. 01:02:05.580 |
Remember, the goal of such an agent is to enable non-technical people to use technical tools. 01:02:11.100 |
And in this case, the technical tool is a SQL database. 01:02:49.100 |
so the question was, what was last quarter's revenue? 01:02:55.260 |
yeah, select the sum of the revenues from sales data, where year equals 2024. 01:03:18.540 |
So I'm not sure why he selected those three months, 01:03:38.380 |
Every time we do that, obviously, we get different responses, right? 01:03:43.980 |
I think it's because 4o's last training date was November of '24. 01:03:49.500 |
So it's thinking last quarter from November is months 7, 8, 9 before November. 01:03:56.860 |
So it's possible, but actually, let's say we're in April. 01:04:18.060 |
which means that the last quarter is January, February, March. 01:04:21.100 |
And as you can see, it changed the request with 123. 01:04:25.580 |
So one thing we could do to improve this example is add a date tool. 01:04:30.620 |
We could add a tool that allows the LLM to ask what the current date is, 01:04:37.340 |
so that we would not have to enter it manually. 01:05:03.340 |
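Such a date tool could be as small as the following sketch (the function name is illustrative, not part of the workshop code):

```python
from datetime import datetime, timezone

def fetch_current_utc_date() -> str:
    """Return today's date in ISO format (YYYY-MM-DD), UTC.

    Exposing this as a function tool lets the LLM resolve relative phrases
    like "last quarter" instead of guessing from its training cutoff.
    """
    return datetime.now(timezone.utc).date().isoformat()
```

Registered as one more function tool, the model would call it first and then generate the correct month range in the SQL.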
So this time, it is interesting because we are generating a query against a different dimension, 01:05:16.700 |
So as you can see here, we are grouping by product type. 01:05:19.740 |
Okay, I'm going to pass on because you understand the concept. 01:05:25.260 |
I'm going to ask the last one because I want to move on to the next examples 01:05:31.740 |
and see what happens when you mix multiple data sources. 01:06:56.380 |
The way it works is that at initialization of the agent, so we are loading the instructions 01:07:05.300 |
So if I go back to load, replace, if I search for this in the function calling instruction 01:07:18.580 |
So, the sales data tool-- so here, and I went too fast over this when I was going through 01:07:26.040 |
the file and explaining it to you, but what's going to happen here is that the instructions 01:07:33.020 |
of the agents is going to contain what the schema of the database is. 01:07:38.580 |
That's how the LLM knows what query to generate and what tables and columns are in the database. 01:07:58.560 |
That means you're grounding, basically, the LLM with what the schema of the database is. 01:08:03.640 |
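A minimal sketch of that grounding step, assuming the instructions file contains a placeholder for the schema (the function and placeholder names here are illustrative; the workshop ships a similar substitution, and this version derives the schema live from SQLite):

```python
import sqlite3

def get_database_schema(conn: sqlite3.Connection) -> str:
    """Read table and column definitions to inject into the agent instructions."""
    lines = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type='table'").fetchall()
    for (table,) in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, default, pk)
        cols = conn.execute(f"PRAGMA table_info({table})").fetchall()
        col_desc = ", ".join(f"{c[1]} {c[2]}" for c in cols)
        lines.append(f"Table {table}: {col_desc}")
    return "\n".join(lines)

# Illustrative instructions template with a schema placeholder.
template = ("You are a sales analysis agent for Contoso.\n"
            "Database schema:\n{database_schema_string}")

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales_data (region TEXT, revenue REAL)")
instructions = template.format(database_schema_string=get_database_schema(conn))
```

Once the schema text is inside the instructions, the model has everything it needs to write valid queries against those tables.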
Can I do more specific, like, what if, just saying, like, what if one has some confidential data, 01:08:19.020 |
And actually, you don't even have to be too fancy for that. 01:08:22.160 |
You can literally add it to the instructions file. 01:08:24.380 |
You can say, hey, the column X of table Y is confidential. 01:08:34.500 |
It is not guaranteed that the LLM is going to follow those instructions. 01:08:39.220 |
If you have confidentiality concerns, problems in that space where you want to restrict 01:08:48.980 |
access, you should-- somebody mentioned IAM, I think it was you-- you should implement this. 01:08:58.360 |
You should add column-level restrictions, and the best is to do what you would do with normal code. 01:09:04.380 |
Because even if normal code is deterministic, it's still possible for a user 01:09:11.740 |
to get access to something he's not supposed to get access to. 01:09:14.180 |
You can still use SQL injection, like all sorts of hacking, to exploit normal code. 01:09:22.200 |
Obviously, with LLMs, it's a whole different area. 01:09:26.940 |
So you should implement the exact same safety measures you would for a normal program. 01:09:36.220 |
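One way to enforce that in code rather than in instructions is a guard inside the tool itself, before the query ever touches the database. This is a sketch; the column names are invented examples, and a production system would parse the SQL properly instead of substring matching:

```python
# Example confidential columns; these names are hypothetical.
CONFIDENTIAL_COLUMNS = {"customer_email", "cost_price"}

def assert_query_allowed(sqlite_query: str) -> None:
    """Reject LLM-generated queries that mention confidential columns.

    Instructions alone are not guaranteed to be followed, so the tool
    enforces the restriction deterministically before executing anything.
    """
    lowered = sqlite_query.lower()
    for column in CONFIDENTIAL_COLUMNS:
        if column in lowered:
            raise PermissionError(
                f"Query references confidential column: {column}")
```

The tool would call this check first and only execute the query if it passes, exactly the same defense-in-depth you would apply against SQL injection in normal code.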
So you were supposing that the schema is kind of perfect-- a perfect, well-documented schema that 01:09:41.220 |
will point, I mean, our agent to the right table and the right-- 01:09:45.220 |
I'm not sure what you mean by perfect, because the schema of a SQL database is, I mean, has to 01:09:58.220 |
Yeah, but I mean, it depends on the one that created, I mean, depends on the name of the 01:10:23.220 |
He asked, how can I make sure that confidential information is not returned if I want to? 01:10:29.360 |
You add an instruction saying, do not use that column, it is confidential. 01:10:34.400 |
To answer your question, what you would do is document your schema. 01:10:38.400 |
It would say, that table which has a cryptic name that nobody understands is actually the 01:11:55.400 |
Because to answer the previous question, which was total shipping costs by region. 01:12:06.400 |
Remember that the LLM has access to the context of the discussion, of the past messages. 01:12:13.400 |
It has all that context of the answers and questions that were previously asked. 01:12:25.400 |
For example, when I asked total shipping costs by region, it did not have the information required 01:12:38.400 |
But now, when I asked what regions have the highest sales, it can actually use a table that 01:12:48.400 |
There is no need to execute one more SQL query because it has everything it needs. 01:12:55.400 |
I mean, without anthropomorphizing, but exactly as you would if you were reasoning about the problem 01:13:01.400 |
and you were like, do I need to go write a SQL query in order to get that information? 01:13:13.400 |
That's why here, you don't see any SQL query executed. 01:13:20.400 |
How do we know it's doing math the right way? 01:13:29.400 |
We asked it to-- oh, you mean because there is a top. 01:13:37.400 |
So, so highest-- okay, that's a very interesting question. 01:13:53.400 |
And that's easy because that's something LLMs are known to be bad at-- math. 01:14:01.400 |
They are bad at arithmetic: bad at addition, multiplication, division. 01:14:08.400 |
And it's funny because I have an anecdote that's from this morning. 01:14:13.400 |
We had a team meeting and we were wondering-- and you were there-- what would be, after a year, the compound effect of a daily rate of increase of 1%. 01:14:32.400 |
And in my mind, I remember the formula for this, which is you take the percentage, you add 1, and you put it to the power of the number of periods, right? 01:14:57.400 |
It looked like it was doing it right, but it was not. 01:15:00.400 |
The original answer, which I typed in Google Sheets with a formula, was correct. 01:15:07.400 |
So to answer your question, when you need to do math, you need to use a tool. 01:15:12.400 |
Here, we're using code interpreter, which can do math, but to be honest, I've tried. 01:15:17.400 |
It's not the best way to do math, to use code interpreter. 01:15:20.400 |
It's good at generating diagrams, at reading files, extracting information from a CSV, from other types of structured documents. 01:15:31.400 |
If you try to do math with code interpreter for whatever reason, which I cannot explain, it does not work very well. 01:15:37.400 |
When you want to do math, it's better to create your own tool, like calculate. 01:15:42.400 |
And it takes like a LaTeX expression or some kind of mathematical formalism to represent an expression. 01:15:48.400 |
You have plenty of mathematical calculators out there that you can use to do the actual and accurate math calculation. 01:16:00.400 |
And so that the LLM can delegate to a tool the responsibility of doing calculations right. 01:16:11.400 |
Because me, I don't know, but 1.01 to the power of 365, I don't know how to do that myself. 01:16:55.400 |
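A calculate tool of the kind suggested above can be sketched like this; the tool name and the safe-subset evaluator are illustrative, and it uses Python expressions rather than LaTeX for simplicity:

```python
import ast
import operator

# Safe arithmetic subset the LLM is allowed to delegate to: + - * / ** and negation.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
        ast.Div: operator.truediv, ast.Pow: operator.pow, ast.USub: operator.neg}

def calculate(expression: str) -> float:
    """Evaluate an arithmetic expression passed by the LLM, without eval()."""
    def eval_node(node):
        if isinstance(node, ast.Expression):
            return eval_node(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](eval_node(node.left), eval_node(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](eval_node(node.operand))
        raise ValueError("unsupported expression")
    return eval_node(ast.parse(expression, mode="eval"))

# The compound-growth question from the anecdote:
# (1 + rate) ** periods, for a 1% daily increase over a year.
growth = calculate("(1 + 0.01) ** 365")  # roughly 37.78x
```

With a tool like this, the LLM only has to translate the question into the formula; the accurate arithmetic is delegated.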
We need to ask, again, to do a query on the database. 01:17:04.400 |
Because in the past information, we do not have that information. 01:17:07.400 |
We don't know the drill-down of tents in the United States in April 2022. 01:17:31.400 |
No, I don't want to meet and chat with friends and family right now. 01:17:51.400 |
You're asking if we're using a vector database, for whatever reason. 01:18:10.400 |
And so we're going to uncomment those lines in main.py. 01:18:22.400 |
Let me search for-- yeah, it's now the instructions. 01:18:54.400 |
I was trying to select multiple lines at once. 01:19:12.400 |
So now we're defining-- we're creating a vector store. 01:19:16.400 |
I'm not going to go into the details of the creation. 01:19:18.400 |
There is a utility which does all the heavy lifting. 01:19:22.400 |
But basically what it does is that it creates an Azure AI Search vector store. 01:20:27.400 |
So like I was saying, I was going to show you the files. 01:20:38.400 |
So it's a PDF which contains all the product information for my products. 01:20:55.400 |
And so what this function does is that it reads the PDF, chunks it, cuts it in small pieces, 01:21:02.400 |
and uploads all those pieces into the vector database. 01:21:07.400 |
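"Cutting it in small pieces" can be sketched as a simple overlapping chunker; the agent service and AI Search handle this for you, so this is only an illustration of the idea, with made-up chunk sizes:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split document text into overlapping fixed-size pieces for embedding.

    Overlap keeps sentences that straddle a boundary retrievable
    from both neighboring chunks.
    """
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk:
            chunks.append(chunk)
        # Stop once a chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks
```

Each chunk would then be embedded and uploaded to the vector store, which is the heavy lifting the utility function does for you.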
If we have time at the end, I can explain to you in excruciating details how that works. 01:21:28.400 |
I wanted to show you the difference between the instructions file we were using before and 01:21:49.400 |
So I'm using the diff tool here, included in VS Code, to show you the difference. 01:21:57.400 |
And so here, what's interesting is that you can see the difference between the instructions. 01:22:03.400 |
So in the file search instruction, we have new instructions. 01:22:17.400 |
So now we have a Contoso product information vector store. 01:22:21.400 |
We have a search tool which allows to search into the vector database. 01:22:28.400 |
And we have a few different things when it comes to content and clarification guidelines. 01:22:34.400 |
Such as the kind of questions that you can ask. 01:22:37.400 |
You know, with new questions like what brand of tents do we sell, which you could not ask before. 01:22:43.400 |
Because you didn't have the brand information into the sales database. 01:22:52.400 |
So what's interesting with the use case here, and really the colleague that created that content, 01:22:59.400 |
created something very interesting because it allows to show many things. 01:23:04.400 |
Like when you have a source of information, a source of data-- think about how it was before, when 01:23:10.400 |
we had to mix different data stores, different databases, and we had to inject them into some 01:23:22.400 |
whatever, or, even worse, do some very complex aggregated queries to link things together. 01:23:31.400 |
Like here, when I want to link to join information from a database with information in a PDF, 01:23:37.400 |
which is not structured, I can use the LLM for that, which is quite extraordinary. 01:23:43.400 |
So when I hear all the skeptics about AI, I just don't understand because, I mean, that's just insane what you can do. 01:23:51.400 |
Anyway, I'm just going to skip for the rest of the difference between those instructions. 01:24:13.400 |
And actually, the chunking, I said it was chunking. 01:24:17.400 |
Actually, the chunking is done by AI search, I think. 01:24:20.400 |
I forgot if it's client-side or server-side, but I think it's server-side. 01:24:23.400 |
It's one of the features of agent services that you do not have to take care of chunking yourself. 01:25:03.400 |
And so this time-- so what's interesting is that this is not using the SQL database. 01:25:12.400 |
Nothing else because there is no sales information required to answer that question. 01:25:17.400 |
Also, remember that I killed the previous instance of the agent. 01:25:23.400 |
So all the past conversation was lost on purpose. 01:25:27.400 |
This is-- I could have reused the same conversation if I wanted to keep the previous context, but I 01:25:39.400 |
So outdoor living and alpine gear plus some information about it. 01:26:09.400 |
But what's interesting here is that there is no information about hiking shoes in the PDF. 01:26:15.400 |
So now what's interesting is that it's referring to the previous brands we talked about. 01:26:27.400 |
So what product type and categories are these brands associated with? 01:26:40.400 |
To be honest, it could have summarized it since both do the same, but it did not. 01:26:48.400 |
Because now we are asking for specific sales information about a specific year. 01:26:53.400 |
So now it's going to need to query the SQL database. 01:26:56.400 |
So we are asking for the sales of tents in 2024 by product type. 01:27:09.400 |
So what's very interesting here is that the product type and the total sales, I think, come-- yes. 01:27:17.400 |
The product type and the revenue come from the SQL database. 01:27:27.400 |
So because in the questions before, we read the mapping between the product type and the brand, 01:27:35.400 |
that's how it's capable of adding the brand into the table here. 01:27:38.400 |
And that's also-- I'm not going to do the demonstration right now. 01:27:50.400 |
And I'm going to re-ask the exact same question. 01:27:52.400 |
And you're going to see the difference in the question. 01:27:55.400 |
Hint, because I'm going to ask that question without the previous question. 01:28:00.400 |
And because what I said earlier-- remember when I said that an agent can be goal-oriented 01:28:06.400 |
and relentlessly work until it achieves the goal. 01:28:10.400 |
In this instance, we have not implemented the loop. 01:28:20.400 |
It's not going to be able to say, oh, but in order to answer that question, I need to look 01:28:25.400 |
into the PDF and into the-- well, you might get lucky to be honest. 01:28:30.400 |
It might do it out of luck, because sometimes it does. 01:28:33.400 |
Because sometimes the planning is simple enough that it doesn't need multiple steps. 01:28:36.400 |
But if it gets slightly complicated and you need multiple-step planning, it's going to fall short. 01:28:50.400 |
What were the sales of Alpine Gear in 2024 by region? 01:28:58.400 |
So very interesting, too, because the database does not contain the brand. 01:29:15.400 |
So here, what it does is say WHERE product_type LIKE-- it automatically generates a LIKE matching 01:29:24.400 |
criterion based on the brand, because it knows that the brand makes family camping tents. 01:29:33.400 |
So now we're going to generate charts, and we're going to use a code interpreter for this. 01:29:39.400 |
Oh, and like I told you before, I'm going to make an experiment. 01:29:45.400 |
And I'm going to ask again-- the last question actually, the most-- yeah, this one. 01:29:59.400 |
And this one most likely is going to fall short. 01:30:11.400 |
Okay, and I'm going to create-- to ask the last question. 01:30:45.400 |
I guess it figured out that it needed to use the product database. 01:30:48.400 |
Or maybe there is an instruction in the file search that says that when you ask a sales question, 01:30:57.400 |
requiring product information to go read the PDF. 01:31:26.400 |
I got-- it gave me Alpine and Alp-- it doubled up on the second result. 01:31:32.400 |
It doubled up on the-- when you did it, it said it was just-- it was Outdoor 01:31:37.400 |
Living for backpacking, and the camping tent was Alpine Gear. 01:31:41.400 |
But from my-- it said I wouldn't put a little bit of hope. 01:32:02.400 |
Like, I guess that's where I struggle sometimes. 01:32:06.400 |
So I give a breakout session, by the way, on evals. 01:32:14.400 |
But I'm going to give a breakout session tomorrow specifically on how to evaluate agents. 01:32:22.400 |
It's a little bit tricky because it's like, well, the answer might be right. 01:32:35.400 |
So now, we're going to go back to our main.py. 01:32:57.400 |
So we can also just have all the tools running, right? 01:33:02.400 |
At the end, when I'm going to uncomment everything, all the tools will be working at the same time. 01:33:07.400 |
But what's very interesting is to uncomment-- it's a workshop. 01:33:11.400 |
It's so that you get an understanding of exactly how everything works. 01:33:40.400 |
And we're going to compare code interpreter with file search. 01:33:50.400 |
So here, with code interpreter, we add visualizations. 01:34:01.400 |
We add a chapter, a section in the instruction, which explains how to do visualization. 01:34:07.400 |
So when you have questions involving visualization, we basically say, hey, go use the code interpreter 01:34:29.400 |
Remember, in the previous-- in the slide I showed before, we were first asking for the sales by region, and then for a pie chart. 01:34:44.400 |
Here, we're asking for both at the same time. 01:35:00.400 |
And so it's tricky because sometimes when you need-- because basically what this is doing is that it's calling two tools. 01:35:13.400 |
And sometimes, it's simple enough that the LLM, in, like, one question and answer, 01:35:23.400 |
has enough reasoning power to say, oh, I need to call that tool and then this one. 01:35:30.400 |
And sometimes it falls short and it does not have enough reasoning power. 01:35:43.400 |
So here, we saved the results here, so I'm going to-- OK. 01:35:50.400 |
With our answer, I can try to zoom to show-- OK. 01:35:56.400 |
So we have the drill down of revenue by region. 01:36:06.400 |
I mean, it is literally generating Python code to generate the pie charts. 01:36:12.400 |
So basically, what that means is that you can generate some pretty crazy diagrams. 01:36:18.400 |
You can be pretty imaginative in the type of information that you want to display. 01:36:27.400 |
So for the anecdote, last year at this conference, I gave a talk on cognitive pressure because at the time it was a hot topic. 01:36:41.400 |
And I had extracted one of my sessions where, you know, you're on the water and you do tacks, you do turns, you do jumps. 01:36:50.400 |
And I had the XML file of my session recorded from my watch, exported. 01:36:55.400 |
I imported it into a code interpreter, and I asked to calculate how many times I turned or how many times I jumped. 01:37:04.400 |
And from-- it was able to read the XML because XML is a very rich, strict, self-describing format. 01:37:15.400 |
And extracted all the information and the structure and generated the Python code to go through all the data points in the file and calculate how many turns and jumps and height I had during my session. 01:37:31.400 |
It's a pretty incredible tool when used right. 01:38:11.400 |
Continue asking questions about-- yeah, because there was no tools involved. 01:38:16.400 |
Yeah, just interpreted whatever was in the context before and downloaded the JSON. 01:38:40.400 |
By the way, you hear a lot about one of the latest things in AI is deep research. 01:38:48.400 |
Under the hood, deep research is an agent that has those tools. 01:38:54.400 |
Just that it's been developed by an army of engineers and they've made sure that everything works well. 01:39:04.400 |
I don't know how it's implemented, but I'm assuming it's some kind of orchestration system that mixes tools together and loops until it achieves the goal. 01:39:19.400 |
So what would be the impact of a shock event, 20% sales drop in one region? 01:39:30.400 |
Can you elaborate on the difference between what this can do, how it's not looping, but if you use AutoGen you can loop through the different tools that it needs? 01:39:46.400 |
It's able to use several tools at the same time. 01:39:52.400 |
And, like for example, let me try to find an example and, like for example, if you ask a question, you have two data sources, right? 01:40:05.400 |
Imagine you ask a question for which it is obvious that you need to query one, but it's not obvious that you need to query the other. 01:40:14.400 |
The information is in the other, but you have no hint in the phrasing of the question that you should be querying the second in order to answer the question as a whole. 01:40:29.400 |
This system would fall short, but not AutoGen? 01:40:35.400 |
And this one, the falling short, would also depend on what instructions you gave it, because you can hack your way through those kinds of problems. 01:40:44.400 |
Because you can just, in your instructions, add something saying, hey, if the user asks for that kind of question, you should also go look into the other data source. 01:40:54.400 |
But, the difference, sorry, the difference with a multi-agent, multi-step planning system, with a more complex topology, is that you have an LLM, which every time you get an answer, looks at the answer and asks itself the question, is the answer correct? 01:41:13.400 |
Is this answer answering the question or not? 01:41:18.400 |
It can do something such as, huh, this is not answering the question. 01:41:24.400 |
What could I do more to try to answer the question? 01:41:27.400 |
And then, maybe the first step, it did not query the second data store, but it's going to say, actually, maybe we should go look into the data store, because actually, maybe the answer is in there. 01:41:47.400 |
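That self-checking loop can be sketched as follows; this is not any framework's API, just the pattern, with stub functions standing in for the two LLM calls (the answer generator and the "is this answering the question?" judge):

```python
def agentic_loop(question, llm_answer, llm_is_complete, max_steps=3):
    """Goal-oriented loop: answer, self-check, retry until satisfied.

    llm_answer(question, history) -> str and
    llm_is_complete(question, answer) -> bool stand in for real LLM calls.
    """
    history = []
    answer = ""
    for _ in range(max_steps):
        answer = llm_answer(question, history)
        history.append(answer)
        # The judge step: is this answer actually answering the question?
        if llm_is_complete(question, answer):
            return answer
    return answer  # best effort after max_steps

# Stub LLMs: the first pass queries only the SQL store;
# on retry, the second pass also consults the PDF store.
def stub_answer(question, history):
    return "sql-only answer" if not history else "sql + pdf answer"

def stub_check(question, answer):
    return "pdf" in answer  # judged incomplete until the second source is used

result = agentic_loop("sales of tents by brand?", stub_answer, stub_check)
```

The single-shot agent in this workshop stops after the first pass; a multi-step planner is exactly this outer loop around it.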
Semantic Kernel or other more, like, collaborative multi-agent systems. 01:41:58.400 |
It's a workshop of its own, just looking at multi-agent systems. 01:42:09.400 |
So you said that sometimes, I mean, the answer is not obvious, or the question is the prompt itself. 01:42:18.400 |
It doesn't point to the right, maybe, set of data. 01:42:35.400 |
And a coordinator, when you use the word coordinator, it's actually one of the patterns, we call it a topology, it's one of the topologies of multi-agent systems. 01:42:45.400 |
And you can build such a topology with AutoGen. 01:42:56.400 |
The hardest question in multi-agent systems is, in my opinion, how to specify the definition of done. 01:43:27.400 |
Impact of a 20% sales drop in North America on global sales distribution. 01:43:41.400 |
Because in North America, here it's dropping. 01:43:50.400 |
So the percentage, obviously, of North America goes down, while the percentage of others goes up. 01:44:03.400 |
I mean, I'm going to skip because it's pretty obvious what it's going to do. 01:44:08.400 |
So, which regions have sales above or below their range? 01:44:16.400 |
So maybe the previous questions-- the previous question was-- no, I don't think so. 01:44:33.400 |
Interesting. So maybe the previous question was-- no, I don't think so. I think it's-- huh. Do we have a problem? Connectivity problem, maybe? 01:45:31.400 |
OK. Let me just-- so we are-- OK. I want to show you real quick the Bing grounding with Bing Search. 01:45:40.400 |
So I'm going to skip over that one because this is important. Very often you want to add internet search capabilities. 01:45:50.400 |
So I'm going to comment-- uncomment this. Use the other instruction file. 01:46:14.400 |
And the last one, code interpreter multilingual. We skip it. It's important because that's how to configure code interpreter. 01:46:22.400 |
Once you want to use-- you want to work on non-English languages. So with specific encoding, specific fonts, that kind of thing. 01:46:34.400 |
But we're going to skip it for today. Resource not found? Huh. 01:46:43.400 |
Well, that's interesting. Did you have that problem? Did you try to that point? Did you have that issue? 01:46:52.400 |
Is it working for you? Is it working for you? I have that error. 01:47:01.400 |
Maybe we've had to change. OK. And I'm not sure what happened. It may be-- because Build was recently, and a lot of APIs have changed, and maybe that one changed too? I'm not sure. 01:47:23.400 |
Anyway, what would have happened is that-- let's look at the question. 01:47:42.400 |
OK. So here: what beginner tents do we sell? That queries our products PDF. What beginner tents do our competitors sell, include prices? That needs to go query the internet. 01:47:57.400 |
You need to go query information about what competitors are out there and what they sell and for how much. 01:48:06.400 |
So that would use the Bing grounding tool, et cetera. So same logic as before, we showed-- and I'm going to wrap up and go back to my slide. 01:48:35.400 |
OK. But just to finish my sentence: Bing grounding is just one more tool. 01:48:47.400 |
The same as the database query tool and the vector database file indexing tool, it is one more source of data that the agent can use to make informed decisions. 01:49:09.400 |
So we've seen how to do more with function calling, like with a bunch of very powerful tools. 01:49:25.400 |
It's always a question when you do workshops like this: do we go with a web UI, which is complicated and adds some React and web stuff that some people might not be familiar with? 01:49:32.400 |
This one is bare-bones, just CLI, the bare minimum of code, focusing explicitly on making sure you understand how to build such an application. 01:49:48.400 |
That one works because it's a whole new paradigm. 01:49:54.400 |
And as you mentioned, there is a question of evaluation. 01:49:57.400 |
And so when it comes to evaluating just an LLM, question and answer, there have been many frameworks out there for some time now. 01:50:09.400 |
But when it comes to evaluating agents, it's a whole new world again. 01:50:14.400 |
Because not only are you evaluating one answer, you need to evaluate a whole conversation. 01:50:23.400 |
So you need to evaluate the accuracy of which tools were selected. 01:50:30.400 |
Anyway, we're going to see all of this tomorrow during the breakout session where I'm going to introduce the Azure AI evaluation SDK with the agent evaluation capabilities, which is fascinating. 01:50:44.400 |
You have on that QR code a bunch of additional resources. 01:50:56.400 |
And I'm going to point back to my contact if you want to. 01:51:27.400 |
The tools which are used by agents, should they be part of the Azure ecosystem? 01:51:36.400 |
When you create an agent in AI Foundry, you have a bunch of pre-made tools that are readily available. 01:51:43.400 |
You just have to click on it and configure it. 01:51:46.400 |
Like I said, I can't remember off the top of my head. 01:51:51.400 |
But the tools we define today, they are client side. 01:51:58.400 |
Well, except Bing grounding, because Bing grounding is declared client side, but the actual Bing grounding call is happening on the server side. 01:52:16.400 |
So I know that the agent service and AI Foundry are crazy right now. 01:52:22.400 |
So what are the-- why should we use the agent service versus just the serving service for, like, GPT-4o, 4.1? 01:52:31.400 |
So I guess if we can instantiate the tools in the application itself, do we need to use the agent service, then, inside of Azure? 01:52:46.400 |
Or can we just stick with the regular old LLM kind of thing? 01:52:54.400 |
Like I said earlier, everything I just showed today with Azure AI Agent Service is just a managed AI system, 01:53:08.400 |
managed in the sense that it takes care of things-- like most developers, every time you build an AI application these days, you need a vector database, you need conversations, you need to remember those, to store those, you need to search on the web. 01:53:23.400 |
I mean, those are the basic features that every single AI application out there needs. 01:53:29.400 |
So Azure AI agent service makes it super easy, manages everything for you. 01:53:39.400 |
You can use the bare-bones LLM completion from two years ago, and you can do whatever you want with it.