Data is Your Differentiator: Building Secure and Tailored AI Systems

00:00:00.000 | So today we are going to talk about data as your differentiator or maybe about coconuts.

00:00:21.760 | Let's see, right? So that's why I was like I'm confident that I'll wake you up a little bit.

00:00:28.080 | And by the way, these coconuts dance also. Courtesy of Amazon models.

00:00:33.760 | Okay, enough about coconuts and we'll connect the dots and revisit our coconuts because they're dancing and I love dancing coconuts.

00:00:42.960 | Right? Remind us of the beach and the, you know, vacation and everything.

00:00:48.000 | But let's come back from vacation now. Serious work to do.

00:00:51.280 | Okay, so we have been talking about generative AI.

00:00:55.200 | And we all know that generative AI adds a lot of business value.

00:01:00.880 | If done right. Right? That's a big if.

00:01:04.880 | And it can take your business to the highest level as you can see in this image.

00:01:10.880 | But however, when you are building something that big that can change how your users interact with your applications,

00:01:20.560 | then think about the foundation. Right?

00:01:24.560 | The foundation has to be even deeper. And what is that foundation?

00:01:30.240 | That's your data. Because your data is representing your company, your brand, your organization.

00:01:38.240 | So you have to get the data together.

00:01:40.240 | And we all know this since the very beginning of machine learning world.

00:01:44.320 | And you would say, Mani, what is new that you are presenting over here?

00:01:49.920 | So the new thing is that the data requirements for building generative AI applications

00:01:55.600 | are different. Generative AI is different. And so there is special treatment that you need to give

00:02:02.720 | to the data based on your application and the business requirements.

00:02:07.600 | So that is exactly what we are going to talk about today.

00:02:10.480 | And data is not just about, okay, I have to transform my data.

00:02:15.200 | I have to load my data. I have to parse it. I have to load it. You know, all that stuff.

00:02:18.800 | That is still there and very important. But the most important thing is how is your data

00:02:24.240 | interacting with the technology? How is it interacting with your people?

00:02:28.960 | Do you still have data silos? If yes, then what about your applications? Because these models can take up

00:02:35.520 | lots and lots of data, right? So how are we doing with all those interactions? Because we can no longer afford to be in a silo.

00:02:43.520 | And that's the exact reason why we have to get it right. And our foundation is

00:02:50.560 | data. So let's see.

00:02:53.280 | Now there are certain common applications. Let's take an example. And there are so many

00:02:58.960 | applications. The possibilities are infinite, right?

00:03:03.120 | However, let's, you know, break it down and see with these three applications how your

00:03:08.800 | data requirements may differ. The first example is the travel agent. So if you're building a travel

00:03:15.760 | agent, what kind of data would you need? First, with that travel agent, it's going to interact with

00:03:21.040 | your end users. So it needs to know about your customer profile, right? Because if it doesn't know about

00:03:28.720 | your customer profile, it's literally like remembering somebody's name when they have left the room.

00:03:35.200 | How weird would that be, right? Because you want to give a personalized experience, and a personalized

00:03:41.200 | experience requires personalized data. And it also brings a lot of responsibility, because you can no

00:03:47.040 | longer afford to have PII disclosed. You have to be responsible. You have to maintain your brand

00:03:54.320 | image. In addition to that, you also need company data. For example, your travel policies. If somebody

00:04:00.560 | asks for a refund, whether that person qualifies for the refund of the ticket or not, it will be defined

00:04:06.560 | by your travel policies, which airlines, so on and so forth, right? So a lot goes into when you are

00:04:12.720 | building like a virtual agent, for example, a travel agent in that scenario. However, if you're building a

00:04:17.840 | conversational contextual chat bot for improving the employee productivity for common questions,

00:04:24.000 | what do you really need? You need data about your company? Absolutely. You need to make sure that that

00:04:29.600 | employee has access to that data and you are not by mistake giving more access than is

00:04:37.200 | required. So super important. And then you need other integrations like how this conversational

00:04:43.280 | chat bot will be presented, whether it's a Slack integration. It can be anything, maybe a custom

00:04:49.920 | application, so on and so forth. And the data can reside in different data sources. So that's another

00:04:55.200 | thing. The third use case talks about the marketing. If you're building it for a brand, so obviously

00:05:00.240 | those data requirements will be different. So now that we have established that our data requirements

00:05:07.200 | will be different for different use cases. So now let's take a deeper dive and double click on the

00:05:13.840 | travel agent. Let's say I have a travel agent. What do I need? First, we all know about the prompting.

00:05:20.320 | I'm not going to repeat that in this session. So you need a prompt, you need a system prompt, plus your user

00:05:26.320 | queries, and your user queries will become part of your prompt, right? So the query will be passed in.

00:05:31.920 | That's, again, your data. The instructions are your data. And sometimes you can have a prompt

00:05:38.480 | catalog or a template. Based on different requirements, your agent might choose to pick

00:05:43.360 | up a specific prompt. The way you design, or the way your business requirements are,

00:05:49.600 | will drive what the design looks like. Then the second part is context. Now this context is no longer

00:05:55.840 | static. It's coming from your data, again, right? And the data can come from different data services and data sources.

00:06:02.480 | And then the model that you are using: maybe you're using an out-of-the-box model, but maybe you have

00:06:08.560 | fine-tuned the model. Again, you need data for training your model representing your company. So you need data

00:06:14.640 | in every step before you can get meaningful output. Now one thing that is not mentioned over here is the

00:06:21.440 | responsible AI part, which I'm not losing sight of, and I'm going to talk about it, but just in a moment.
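
The three inputs described above (a system prompt picked from a catalog, context retrieved from company data, and the user query) can be sketched as follows. This is only an illustration: the catalog keys, the template wording, and the refund sentence are made-up placeholders, not anything shown in the talk.

```python
from dataclasses import dataclass, field

# Hypothetical prompt catalog: template names and wording are illustrative only.
PROMPT_CATALOG = {
    "refund": "You are a travel agent. Answer refund questions using only the policy context.",
    "booking": "You are a travel agent. Help the traveler book a trip within company policy.",
}


@dataclass
class PromptInputs:
    intent: str  # key into PROMPT_CATALOG
    user_query: str  # the end user's question
    context_chunks: list[str] = field(default_factory=list)  # retrieved company data


def build_prompt(inputs: PromptInputs) -> str:
    """Combine the system prompt, retrieved context, and user query into one prompt."""
    system_prompt = PROMPT_CATALOG[inputs.intent]
    if inputs.context_chunks:
        context = "\n".join(f"- {chunk}" for chunk in inputs.context_chunks)
    else:
        context = "- (no context retrieved)"
    return f"{system_prompt}\n\nContext:\n{context}\n\nQuestion: {inputs.user_query}"


prompt = build_prompt(
    PromptInputs(
        intent="refund",
        user_query="Can I get a refund on my ticket?",
        context_chunks=["Tickets are refundable within 24 hours of purchase."],  # made-up policy
    )
)
print(prompt)
```

Every part of the final string is data you own: the template, the policy text, and the question, which is the point being made here.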

00:06:27.120 | So now we have established what we need, but you will be like, okay, we all know what we need,

00:06:33.440 | we have talked enough, how will we solve this problem? So that's where we have Amazon Bedrock.

00:06:39.200 | So Amazon Bedrock not just provides you with the choice of the models, but it also provides you with

00:06:46.080 | some of the amazing features such as Bedrock data automation using which you can build your custom

00:06:51.360 | data pipelines, transform your data, and we'll talk about that. Or if you want to fine tune the model,

00:06:58.080 | you have model customizations, you can evaluate the models, you have model evaluation. If you want to

00:07:03.600 | build a RAG application very quickly, rather than writing the code from scratch, you can use Bedrock

00:07:09.600 | knowledge bases to do that for you and reduce your time to go to market. And again, I'm not forgetting

00:07:15.920 | responsible AI, and we have Amazon Bedrock guardrails, which can help you with that. So we have to do

00:07:21.600 | whatever we are doing, whether we are building an agentic RAG application, or we are building a,

00:07:26.880 | let's say, summarization application, or a classification application, or detecting a fraud,

00:07:32.480 | or it's our same contextual chatbot. We have to do all of this in a very secure and

00:07:41.040 | private manner with safety guardrails in place. So that's what we are going to do. So now bear with

00:07:48.000 | me and let's take another example of a simple RAG application for building a contextual chatbot.

00:07:54.720 | Let's simplify a bit so that we can go deeper, right? So let's say you want to build a chat application,

00:08:01.280 | what would you need? Data. Now with data, we have Amazon Bedrock data automation using which you can

00:08:08.080 | build your custom data pipelines just with a single API, right? That's amazing. Maybe you have data

00:08:15.120 | which is in the form of the videos or text or images. You need to comprehend those images because they have

00:08:24.080 | information. When I say images, I'm not talking about describing a portrait over here. I'm talking about

00:08:30.640 | financial documents you might have, with charts and line graphs, that you need to make business sense of.

00:08:37.440 | How would you do that? Right? So you need some mechanism to do it. Of course, you can do everything on your own,

00:08:43.600 | but how can we make it faster, right? That's where Bedrock Data Automation comes into play.

00:08:48.480 | Then next we have Amazon Bedrock Knowledge Bases. Now you would say that, yeah, I can build, you know, a context-based chatbot,

00:08:56.640 | a RAG application, and there are so many tools out there. But let's talk about how Knowledge Bases can help you.

00:09:02.560 | One way is, what do you really need when you are building an application? We have talked about data processing and Bedrock

00:09:09.600 | knowledge bases have native support for Bedrock data automation. Then we need to define a chunking strategy.

00:09:15.440 | Do you want to build a logic for your own chunking strategy? Or do you want to just leverage some of the things which are out of the box?

00:09:22.240 | Such as if you have complex tables and stuff, you can use hierarchical chunking, semantic chunking.

00:09:28.080 | But then you would say, yeah, that's good, Mani, but sometimes we need custom logic for chunking. So you can actually do that.
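
To make the idea of a chunking strategy concrete, here is a minimal sketch of hierarchical (parent and child) chunking written by hand. It is not how Bedrock Knowledge Bases implements its built-in option; the function names, the word-based splitting, and the sizes are assumptions chosen for illustration.

```python
def split_words(text: str, size: int, overlap: int = 0) -> list[str]:
    """Split text into windows of `size` words, sharing `overlap` words between neighbors."""
    if size <= 0 or overlap < 0 or overlap >= size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    chunks = []
    start = 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
        start += size - overlap
    return chunks


def hierarchical_chunks(text: str, parent_size: int, child_size: int, overlap: int = 0) -> list[dict]:
    """Split into large parent chunks, then split each parent into small child chunks.

    Children are what you would embed and search; each keeps a pointer to its
    parent so the larger surrounding passage can be handed to the model.
    """
    records = []
    for parent_id, parent_text in enumerate(split_words(text, parent_size)):
        for child_text in split_words(parent_text, child_size, overlap):
            records.append(
                {"parent_id": parent_id, "parent_text": parent_text, "child_text": child_text}
            )
    return records


document = " ".join(f"w{i}" for i in range(10))  # toy 10-word document
chunks = hierarchical_chunks(document, parent_size=5, child_size=3, overlap=1)
for chunk in chunks:
    print(chunk["parent_id"], "|", chunk["child_text"])
```

Small children tend to match queries more precisely, while the parent text gives the model more surrounding context. That trade-off is the reason to pick this strategy over fixed-size chunks.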

00:09:35.120 | And then you can augment the prompt, you would need to vectorize it, you can select a particular foundation model, embeddings model for creating the embeddings.

00:09:45.600 | And we give you the choice of which vector store to use for your embeddings. Once all that has been set up, the data ingestion is set up and the incremental updates to the data are set up, because that's what Knowledge Bases brings out of the box.

00:09:58.720 | And then you can do that. You don't have to write that custom logic, right? Once that is all set up, what do you need? You need APIs to retrieve the information, right?

00:10:07.200 | So when I say retrieve, there is actually a retrieve API using which you can retrieve the similar content. And then when you are saying, I'm doing the search.

00:10:18.080 | Yeah, semantic search is good. But how do I optimize on search? So there is hybrid search. So we provide you those options and you can pass it as parameters rather than implementing it on your own.

00:10:28.640 | Right? And now that we have established on how this retrieve API works, you can augment it with a model and then get the response back.
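
As a rough sketch of passing hybrid search as a parameter, the request below follows the shape of the boto3 `bedrock-agent-runtime` `retrieve` call as I understand it. Treat the field names as something to verify against the current API reference. The knowledge base ID is a placeholder, and the helper function names are my own.

```python
def build_retrieve_request(knowledge_base_id: str, query: str, top_k: int = 5, hybrid: bool = True) -> dict:
    """Build keyword arguments for a knowledge base Retrieve call."""
    vector_config = {"numberOfResults": top_k}
    if hybrid:
        # Ask for combined keyword + semantic search instead of the default.
        vector_config["overrideSearchType"] = "HYBRID"
    return {
        "knowledgeBaseId": knowledge_base_id,
        "retrievalQuery": {"text": query},
        "retrievalConfiguration": {"vectorSearchConfiguration": vector_config},
    }


def retrieve_texts(request: dict) -> list[str]:
    """Send the request. Needs boto3, AWS credentials, and an existing knowledge base."""
    import boto3  # imported lazily so the request builder works without AWS access

    client = boto3.client("bedrock-agent-runtime")
    response = client.retrieve(**request)
    return [result["content"]["text"] for result in response["retrievalResults"]]


# "KB_ID_PLACEHOLDER" is not a real ID; substitute your own knowledge base.
request = build_retrieve_request("KB_ID_PLACEHOLDER", "What is the refund policy?", top_k=3)
print(request)
```

Keeping the request builder separate from the network call makes the search settings easy to unit test and to vary while you tune retrieval.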

00:10:37.120 | But there are so many different techniques like re-ranking, post-processing, query decomposition. So we have another API, which we call RetrieveAndGenerate, which can do everything out of the box, as the name suggests, and also provide you with controls such as, oh, I want re-ranking, I want query decomposition. So all you have to do is pass in the package.

00:10:58.560 | So if you have the right parameters, if you have complex queries, it can handle that.

00:11:03.040 | Okay, so now we have the RAG application, we have processed our data, we have the RAG application.

00:11:08.960 | But remember, I talked about the guardrails, right? We need to generate responses responsibly, right?

00:11:16.320 | So that's where Amazon Bedrock Guardrails provides you with so many features. And with this limited time, I cannot go into each and every feature. But think of it: we are taking the example of a context-based chatbot, right?

00:11:30.800 | I want the users or the information that is being presented to my end user to not have the PII, right? Or maybe not have any keywords, which are not good, because sometimes models can say stuff that they are not supposed to, right?

00:11:51.280 | In a casual coffee chat, that's okay, but not in a formal setting when you are being recorded, right?

00:11:57.760 | So that holds for our models as well. So that's where Amazon Bedrock Guardrails comes into play. You can create your own policy, you can define your own custom guardrails, give examples of what is not to be shared,

00:12:13.120 | even ground your responses, reduce hallucinations, and then figure out how your users are interacting when the guardrails are triggered, because everything is logged. And you can identify your user patterns as well with that, right?

00:12:27.040 | So now we have this chatbot, right? Bear with me, just visualize it. You have established the data processing using BDA, you have established a RAG application using Amazon Bedrock Knowledge Bases, and you have created guardrails, and you have integrated that with

00:12:42.480 | your knowledge bases because there is a native integration. So you would think that, yeah, I'm all set, right? Of course, you're all set. But how about the coconuts that we talked about, right? They have to come into play because we mentioned that. So we will talk about coconuts. But before that, I was supposed to actually transition it when I say your application is ready. So now I'd say your application is ready, and then you have images, and it's able to give you the responses.

00:13:11.840 | So you have the responses and search back in the form of the images, right? You have ingested the documents, so everything is ready. You're getting the responses back. But yes, now let me come back to the coconuts.

00:13:23.840 | So yeah, the dance. I like the dancing coconuts, but let's derive meaning out of it. So far, we have been talking about data processing, how data is important, and which Bedrock features can help. And now coconuts? Yes,

00:13:41.200 | because that is one thing that I want you to remember when you get out of this room, because coconuts are super important, and you will see how. So actually what happens is a lot of people have talked with me about the best practices of building RAG applications.

00:14:03.200 | And these are not just for RAG applications, actually. Some of these concepts, or most of these concepts, can be extended to other generative AI applications as well. Right? So we'll talk about that. So I want to, you know, have some time to go into this, because this is important.

00:14:21.200 | So now you have this RAG application, but remember, we started with the data, and we talked about how chunking strategy is so important. Right? And having the right strategy will make the difference in the accuracy of the generated responses.

00:14:40.200 | So this is an important step. Right? But this is only a step. The second part is optimization. I talked about re-ranking, I talked about parsing, hybrid search, but there are so many other techniques like query reformulation, decomposition. You need to understand how to optimize your application.

00:15:00.200 | But tell me one thing. Can you optimize without knowing what is wrong with your application? Can you do that? Any raise of hands? No, right? I can see people nodding heads, so I'll take no.

00:15:13.200 | So we cannot, because we need to evaluate our application. So we will come to evaluation, because before evaluation we have to see what's going on. And also, when we are talking about optimization, one is optimizing the accuracy,

00:15:29.200 | the second is optimizing the cost. Right? And also the latency. How do we do that? Because these are the three pillars of any generative application. Performance, latency, cost. Right? So how do we do that? One way of doing that is caching your results. So you can use a semantic cache. Why semantic? Because in the generative AI world, we don't always have the

00:15:59.200 | exact, deterministic questions. Nor are the responses deterministic, except to a certain extent, and not in the exact

00:16:06.200 | wordings. So if we have to do it, what we are looking in our cache is not the exact question, but if similar questions have been asked, I don't need to go and invoke my

00:16:16.200 | foundation model. That will charge me money. Right? And that's number one. Second, that will also, because it will take time for the model to

00:16:26.200 | synthesize, it will add latency. Right? So we need to take care of both those things. That's why semantic caching, similar questions,

00:16:33.200 | quickly retrieve it, if this has been asked before. So now we have got three things. The second is observability. We need to see what's going on. Right? That's where you need to log everything: user queries, retrieval hits, model responses, because that's the only way you can monitor your application to improve it.
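
The semantic cache described a moment ago can be sketched as below. To stay self-contained it uses a toy bag-of-words vector in place of a real embeddings model, and the 0.8 similarity threshold is an arbitrary assumption. In practice you would embed with your chosen model and tune the threshold against real traffic.

```python
import math
from collections import Counter


def toy_embed(text: str) -> Counter:
    """Stand-in for an embeddings model: lowercase word counts."""
    return Counter(text.lower().replace("?", "").split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold
        self.entries: list[tuple[Counter, str]] = []

    def get(self, question: str):
        """Return the cached answer of the most similar question, or None on a miss."""
        query_vec = toy_embed(question)
        best_score, best_answer = 0.0, None
        for vec, answer in self.entries:
            score = cosine(query_vec, vec)
            if score > best_score:
                best_score, best_answer = score, answer
        return best_answer if best_score >= self.threshold else None

    def put(self, question: str, answer: str) -> None:
        self.entries.append((toy_embed(question), answer))


def answer_with_cache(question: str, cache: SemanticCache, call_model) -> str:
    cached = cache.get(question)
    if cached is not None:
        return cached  # similar question seen before: no model cost, no model latency
    result = call_model(question)
    cache.put(question, result)
    return result
```

A cache hit skips the model call entirely, which is where both the cost saving and the latency saving come from. The risk to watch is a threshold that is too loose and returns an answer to a question that was only superficially similar.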

00:16:55.200 | So that's what we are doing with observability. So earlier when my, you know, customers used to ask me about observability or best practices, I would say, you know, observability is the critical component. But now I say, please don't go into production or even do a pilot without observability. How will you troubleshoot? How will you improve? How will you figure out

00:17:24.200 | what's going on in the application if somebody complains, right? So you need observability. But now when we are talking about observability, now we know we have this additional data.
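
A minimal sketch of the kind of record worth logging per request (user query, retrieval hits, model response, latency) is shown below. The field names and the JSON-lines format are assumptions for illustration, not a Bedrock logging schema.

```python
import io
import json
import time
import uuid


def log_interaction(sink, query, retrieved_ids, response, latency_ms):
    """Append one JSON line describing a single request to the given text sink."""
    record = {
        "request_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "query": query,
        "retrieved_ids": list(retrieved_ids),
        "response": response,
        "latency_ms": latency_ms,
    }
    sink.write(json.dumps(record) + "\n")
    return record


sink = io.StringIO()  # stand-in for a file or log stream
log_interaction(sink, "Can I get a refund?", ["doc-12", "doc-40"], "Yes, within 24 hours.", 830.0)
logged = [json.loads(line) for line in sink.getvalue().splitlines()]
print(logged[0]["query"], logged[0]["retrieved_ids"])
```

Records like these are the additional data that the evaluation step consumes next.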

00:17:33.200 | What are we going to do with it? We have to do something with it. We need to evaluate it. Right? We need to figure out based on my application. Like for a RAG application, we have context relevance, because if my context is not relevant, my search results are not good, it doesn't make sense for me to put them in front of the model and pay the money, because everything has a cost associated with it, just to get a bad response.

00:18:04.200 | I would first evaluate my search results. If my search results are good, only then am I going to augment my prompt and give it to the model. Right? So those are the common things that we sometimes forget about because we are, you know, quickly doing everything.
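
One simple way to act on this, sketched under assumptions: treat the retrieval score as a rough proxy for context relevance and skip the model call when nothing clears a threshold. The result shape (`text`, `score`) and the 0.5 cutoff are illustrative. A fuller evaluation might use a dedicated relevance metric or a judge model instead.

```python
def relevant_chunks(results: list[dict], min_score: float = 0.5) -> list[dict]:
    """Keep only retrieved chunks whose score clears the threshold."""
    return [r for r in results if r["score"] >= min_score]


def answer_if_grounded(question: str, results: list[dict], call_model, min_score: float = 0.5):
    """Invoke the model only when at least one retrieved chunk looks relevant."""
    kept = relevant_chunks(results, min_score)
    if not kept:
        return None  # nothing relevant: do not pay for a likely bad response
    context = "\n".join(r["text"] for r in kept)
    return call_model(f"Context:\n{context}\n\nQuestion: {question}")
```

Returning `None` lets the caller decide what to do on a miss, for example asking the user to rephrase or falling back to a canned reply.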

00:18:20.200 | So we have to rethink and evaluate our thinking as well as our application. So that's super important. But if you have a summarization use case, you can have metrics related to summarization. Right?

00:18:32.200 | So that's why I was saying these coconuts are actually good. They're good for other applications as well. And then never go blind. So we have talked about that. But what are we going to do? Let's say in our evaluation, something happened, we figured it out, we need to update. We need to update what? Sometimes our data, because if our data is stale, our answers are stale. Sometimes we have to update our strategy. So again, coming back to optimization, right?

00:19:01.200 | And once you have done the updates, please don't feel that you are done and you have updated, it's all good. You need to test again. So one is you are evaluating your application, you have updated it, but then you also need to have a test suite.

00:19:18.200 | Because whenever you are doing this update, these evaluations have to be automatic. And if you have a defined path on how you are creating this test suite, the chances of you going wrong in your production application reduce. And the chances that you have a high-quality application out there in production increase manyfold.
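
An automated test suite of this kind can start very small. The sketch below assumes a keyword check per question, which is a deliberately crude stand-in for real evaluation metrics. The case contents and the `app` callable are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalCase:
    question: str
    must_contain: tuple[str, ...]  # keywords a correct answer is expected to mention


def run_suite(cases: list[EvalCase], app) -> list[tuple[str, list[str]]]:
    """Run every case through the application and return (question, missing keywords) failures."""
    failures = []
    for case in cases:
        output = app(case.question).lower()
        missing = [kw for kw in case.must_contain if kw.lower() not in output]
        if missing:
            failures.append((case.question, missing))
    return failures


# Hypothetical regression cases; rerun them after every data or strategy update.
CASES = [
    EvalCase("Can I get a refund?", ("refund", "24 hours")),
    EvalCase("Which airlines are covered?", ("airline",)),
]
```

Wiring `run_suite` into the deployment pipeline, so that an update ships only when the failure list is empty, is what makes the evaluation automatic rather than something done once by hand.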

00:19:42.200 | So test. And once you have tested and tested and tested your application, you're ready to crack your coconut. And that's how you will scale. So not just the coconut, we need coconuts.

00:19:54.200 | And once you have cracked your coconuts, you can decorate it and you can have your application into production and bear the fruits of success.

Data is Your Differentiator: Building Secure and Tailored AI Systems — Mani Khanuja, AWS
