
NVIDIA NeMo Guardrails: Full Walkthrough for Chatbots / AI


Chapters

0:00 NVIDIA's NeMo Guardrails
2:16 How Typical Chatbots Work
5:54 Dialogue Flows
7:53 Code Intro to NeMo Guardrails
12:06 How Guardrails Works Under the Hood
14:33 NeMo Guardrails Chatbot in Python
18:28 Speaking with Guardrails Chatbot
19:50 Future NeMo Guardrails Content

Transcript

In the past year we've seen the unparalleled adoption of chatbots across many industries. There hasn't really been another technology that has been adopted and become so widespread so quickly as chatbots have. In fact, according to a couple of reports from Gartner, they expect chatbots to be the primary communication channel for 25% of all organizations by 2027, which is not that far away.

This adoption is pretty amazing, but it's also dangerous. Chatbots make things up, and they do so very convincingly, and it's harder to give a chatbot guidelines like we would an actual human. If you have a human behind some chat, they've been trained on how to talk about your company, on what not to say, what to say, and to be polite and so on.

It's a little more difficult with AI chatbots, particularly if we're just using the default approach of calling the OpenAI API. And when we want a chatbot to actually represent an organization, that is simply not enough. In short, we need something more before we can actually deploy conversational AI. To do that, we will be using Guardrails.

Now, Guardrails is a fairly new library from NVIDIA, and the main focus of the library is to help us deploy chatbots safely. But there's actually a lot more we can do with it. We can use it for safety and for topical guidelines, but we can also use it for more advanced things.

We can use it to build agents. We can use it in retrieval-augmented generation, and naturally also to define more deterministic dialogue where relevant. And honestly, if a company is going to production and deploying a chatbot without using NeMo Guardrails or some alternative guardrails system, I'm just surprised they're allowing it.

Because things can go wrong very easily if you don't have these sorts of checks in place. In most conversational AI systems at the moment, we have a direct path between our conversational AI, or our agent, and our users. That's fine, until something goes wrong. If the user begins asking about things we don't want our chatbot to respond to, like politics, or if our chatbot begins talking about something we don't want it to discuss, or begins responding in a way that doesn't represent what we would like it to represent, we have an issue.

There are no checks here; nothing is happening in between. Now, we can improve this scenario a little through prompt engineering, but prompt engineering can only get us so far. There will always be cases where issues come up. So ideally, what we want is something in the middle. We want what are called guardrails, which can check what is being transferred between the user and the chatbot and react accordingly.

So if the user begins talking about politics, we can return a prebuilt message, or we can instruct the bot to generate a message that says, sorry, I cannot talk about politics. Now, that is the core idea behind guardrails. It's very simple, but what we can do with this is far more than just add some safety to our chatbots.

What we're essentially doing here is creating deterministic rules that say: if the user begins talking about, let's say, politics, we want to do something, so we trigger some action. That action can be a safety measure, or maybe our user is asking a question about our product.

So we have a product question. If they ask that, we obviously don't want to say, sorry, I can't talk about our product, but we may want to do something other than just generate an answer. We may want to, for example, bring in some information from our database so that our chatbot can answer the question more accurately.

So we do retrieval-augmented generation in that case. We can also specify more deterministic dialogues. Maybe we'll see that many users are asking the same questions, going through the same dialogue paths. If we have common dialogues like that, we can create rails for them, and those rails allow us to catch the question.

So the question would come in, and rather than going straight to the bot, we could specify a particular dialogue flow. We can say: given the user is asking about X, we should respond with a particular response. We can write that response ourselves, or we can ask the bot to generate one.

And from there, the dialogue could go multiple different ways until we reach some sort of final resolution for our user. Now, this sort of deterministic dialogue flow is how chatbots used to work. Before ChatGPT, there would be a set path; you'd have to select options within your dialogue.

So the chatbot would introduce itself and say, what can I help you with? You'd say, I have a problem with, and then it would give you, like, three options to choose from. You'd click on one and go through almost like a path of dialogue.

You couldn't really chat with the chatbot because it couldn't support that. That deterministic dialogue flow is actually useful, but it's restrictive. We do want it in some scenarios, particularly for those common dialogue flows where it actually helps. But at the same time, we don't want to restrict our users to just those dialogue flows.

We want the more flexible behavior of conversational AI like ChatGPT. Now, another thing we can use these guardrails for, which I've hinted at a little with the RAG example, is giving the bot access to tools. So maybe our user says something like: how is the weather today?

An LLM is not going to be able to answer that question, because it doesn't know what the weather is like today, but an LLM agent or conversational agent would. The reason it can is that it has access to tools, such as weather APIs. The agent can identify that this question needs the weather API tool, go to that tool, and ask: how is the weather?

Give me the weather. Then it would formulate a response back to the user based on that. So we can also include tool usage. Now let's take a look at a quick example of how all of this works. Here on the left, we have the NeMo Guardrails folder, and inside I have this config directory.

In here, I have a config and a topics.co. The .co extension denotes a Colang file, which we'll talk about a little more in a moment. Within the config, we're essentially specifying the configuration details for our chatbot, for our guardrails. So here I'm saying I want to use text-davinci-003.

We're using this model because it's a little easier to set up with Guardrails. But of course we can also use GPT-3.5 and GPT-4, and other models as well, from Hugging Face, Llama 2, and so on. So we have this config YAML file, and we also have this Colang file.
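
As a sketch, the config YAML described here would look something like this, following the library's documented models spec:

```yaml
# config.yaml: which LLM the guardrails should drive.
models:
  - type: main
    engine: openai
    model: text-davinci-003
```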

Now, the Colang file is where we set up the flow of a dialogue, a dialogue flow, or the guardrails for particular topics or issues. Here I'm defining a few things. We're expressing greetings from the user, and we're also expressing a greeting from the bot. Now, this is actually a hard-coded greeting.

So when we use this, the chatbot will return exactly this text, but we don't have to do that. For now, we just have the greeting, and we'll talk a little more about the syntax soon. Then we also have a guardrail here, where we want to define our limits.

If a user begins asking about politics, the bot is going to respond with: I'm a shopping assistant, I don't like to talk of politics. There was also a second line, sorry, I can't talk about politics, but we can remove that, so the first message will be the response. Now let's take a look at how we would actually use these files.
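
As a sketch, the topics.co file described here might look like the following; the exact utterances are illustrative, and the bot messages are the hard-coded responses mentioned above:

```
define user express greeting
    "hello"
    "hi"
    "hey there"

define bot express greeting
    "Hey there!"

define bot ask how are you
    "How are you doing?"

define flow greeting
    user express greeting
    bot express greeting
    bot ask how are you

define user ask politics
    "what do you think about the president?"
    "who should I vote for?"

define bot answer politics
    "I'm a shopping assistant, I don't like to talk of politics."

define bot offer help
    "How can I help you today?"

define flow politics
    user ask politics
    bot answer politics
    bot offer help
```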

So over in our terminal, we're going to navigate to this directory. I'm going to cd into documents, projects, examples, learn, generation, chatbots, nemo-guardrails, intro. In here, we just have that config directory I mentioned before. To use it, the first thing we need to do is pip install the library, then start the chat; a sketch of the commands:
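
```
pip install nemoguardrails

nemoguardrails chat --config=config/
```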

So, pip install nemoguardrails, like so, and then nemoguardrails chat with the config set. This allows us to chat within our bash terminal. So we've now started our chat and we can say something. I'm just going to say, hey there.

And you see that we actually get two messages: hey there, and, how are you doing? That's because within our Colang file, we specified in the greeting flow that the bot will produce two responses: it will express a greeting and then ask how you are, which is exactly what it's doing here.

Now, let's continue and ask something political. So: can you tell me your thoughts on the president of the USA? We should see this get blocked, and indeed it responds with: I'm a shopping assistant, I don't like to talk politics. How can I help you today?

So we've successfully blocked that political question using the guardrails we created in our Colang file. Now let's talk a little about how that Colang file was able to identify that the message we sent should be blocked, and that it belonged to the user ask politics rail, despite us never specifying this exact question.

The way this works is through canonical forms and utterances. Here, this line is the canonical form, and these lines are the utterances, and all of these are coming from the user. So we say define user ask politics, and we give some examples.

What would count as political? Then we say define user ask llm, for asking a question about large language models: what would constitute such a question? All of these sentences get passed to our embedding model, which by default is a MiniLM model, and they get encoded into a semantic vector space.

Then, when the user comes along, they ask their question. Maybe they ask what I asked, like, what are your opinions on the president of the US. That question from the user goes into the embedding model too, and it would probably land over here.

It creates an embedding, and then we can see which utterances it's most similar to; in this case, the ones belonging to the ask politics canonical form. We see that here as well. It's the same visual: these are our political utterances, and these are our LLM utterances.

We have our user query: are there any government-built language models? It's almost in between the two groups, but we're definitely asking about language models here, and hopefully the embedding model understands that. The embedding model will encode the query into the vector space and see that it has more similarity with the utterances that come from the user ask llm canonical form.
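
This isn't Guardrails' internal code, but here's a minimal sketch of the matching idea, using the sentence-transformers package and the all-MiniLM-L6-v2 model (the default embedding model mentioned above); the canonical form names and utterances are illustrative:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

# Example utterances grouped by canonical form (illustrative).
utterances = {
    "user ask politics": [
        "what do you think about the president?",
        "who should I vote for?",
    ],
    "user ask llm": [
        "what is a large language model?",
        "are LLMs hard to train?",
    ],
}

# Embed the incoming user query.
query = "are there any government-built language models?"
query_emb = model.encode(query, convert_to_tensor=True)

# Score the query against every utterance; keep the best-matching form.
best_form, best_score = None, -1.0
for form, examples in utterances.items():
    example_embs = model.encode(examples, convert_to_tensor=True)
    score = util.cos_sim(query_emb, example_embs).max().item()
    if score > best_score:
        best_form, best_score = form, score

print(best_form)  # expected: "user ask llm"
```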

Based on that similarity, we know that our query should activate a flow where user ask llm is defined. Now, there's a lot to talk about when it comes to Guardrails, but I want to give just one more example before we finish this video. In future videos, we will talk more about Colang, which is the modeling language that Guardrails uses, and about Guardrails itself.

So let's go through this Python example. It's in Colab, so you can just follow along. We're going to first install NeMo Guardrails and also OpenAI. Then we need to set our OpenAI API key, so we import os and set os.environ["OPENAI_API_KEY"].
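
A sketch of that setup; the key value is a placeholder for your own:

```python
# In Colab, install first: !pip install nemoguardrails openai

import os

os.environ["OPENAI_API_KEY"] = "sk-..."  # paste your OpenAI API key here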

And in here, you just pass in your API key. Once that's done, the first thing we want to do is define a Colang file; it's what we saw before, that .co file. We can either define it from a file, or we can define it from a string in our code.

Here, I'm going to define it as a string in our code, since we're working within Colab. In this string, we have the three main types of blocks within Colang. Those are the define user blocks, which are user message blocks, and the define bot blocks, which are bot message blocks.

And if we come down here, we also have a flow block. This is how we define the dialogue flow. These here are our canonical forms, and these are the utterances, and it's using those that we populate that vector space.

Then, based on that vector space, we can decide, when a user sends a message, which of these should be activated. If the user says, "Hey, how are you?", it will probably activate the user express greeting form. Actually, in here we can remove the hard-coded bot messages, since those are just responses the bot can generate itself.
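
As a sketch, the string might look something like this; it mirrors the topics.co file from earlier, with the hard-coded bot greeting messages removed (as just described) so the LLM generates them:

```python
colang_content = """
define user express greeting
    "hello"
    "hi"
    "hey there"

define flow greeting
    user express greeting
    bot express greeting
    bot ask how are you

define user ask politics
    "what do you think about the president?"
    "who should I vote for?"

define bot answer politics
    "I'm a shopping assistant, I don't like to talk of politics."

define flow politics
    user ask politics
    bot answer politics
    bot offer help
"""
```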

Okay, cool. Once we have that, we can initialize our rails through the Python API. We need: from nemoguardrails import LLMRails and RailsConfig. The RailsConfig is basically our configuration. It takes our Colang content, and if we have a configuration YAML it will take that as well, and uses them to initialize everything.

Now, alongside our Colang content, we also need the config content, so we'll put that in yaml_content. This is where we pass in our configuration details, which for now is basically just which model we want to use. There are more things we can populate this with, but this is enough for what we're doing here.
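
A sketch of that string, reusing the models spec from the config file shown earlier:

```python
yaml_content = """
models:
  - type: main
    engine: openai
    model: text-davinci-003
"""
```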

Then we initialize our config with both of those. We use from_content, which means we're loading these from strings in our code rather than from files, passing in colang_content and yaml_content. That initializes our config, and from that, we can initialize our rails.

So rails = LLMRails, and we just pass in our config. We run that, and then we can generate; this is where we're actually talking with our rails. Within a notebook, we need to use async functions. That's just how it works, because Guardrails is built to enable async.
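
A sketch of those steps together, assuming the colang_content and yaml_content strings defined above:

```python
from nemoguardrails import LLMRails, RailsConfig

# Build the config from strings rather than from files on disk.
config = RailsConfig.from_content(
    colang_content=colang_content,
    yaml_content=yaml_content,
)
rails = LLMRails(config)

# In a notebook, we can await the async API directly.
res = await rails.generate_async(prompt="Hi there!")
print(res)
```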

So that's what we write. We'll just say, hi there, and run it. And we get this response: hey there, how are you doing? So again, we can see that the chatbot is going to bot express greeting and then bot ask how are you.

We can see, hey there, and, how are you doing, which is exactly what we expect. By the way, if you want to run this without async, in, like, a plain Python file, you just use the synchronous call; there's a sketch of it after the next response. So we can try again with something else. I can't remember what the last question was.

Ah yes: what is your opinion on the president? Cool, let's run that. And we can see that activates the guardrail, which says, I'm a shopping assistant, I don't like to talk about politics, and then, how can I help you today? So, that is a very simple example of how we would use Guardrails.
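
And the synchronous version mentioned above, for use outside a notebook:

```python
# In a plain Python file, the synchronous call does the same thing
# without any async machinery.
res = rails.generate(prompt="What is your opinion on the president?")
print(res)
# Expected (per the politics rail above): "I'm a shopping assistant,
# I don't like to talk of politics." plus an offer to help.
```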

This really doesn't even begin to scratch the surface of what we can do with Guardrails. There are many other examples I'll be sharing with you in the coming days and weeks, where we'll dive into a lot more detail. We'll take a look at the Colang language, things like variables and actions.

And on the Guardrails side of things, we'll dive into more detail on how we can set up agents, how we can do retrieval augmentation, and all the other really cool things that Guardrails allows us to do. For now, that's it for this introduction. I hope this has all been useful and interesting.

I know I covered a lot, but there is a lot to cover. So, thank you very much for watching, and I will see you again in the next one. Bye.