Okay, so Google has released the new Gemini 2 model, or Gemini 2 Flash, the experimental model, and it seems very good. One thing that is super interesting about these models is that DeepMind are focusing on their agentic ability, which in my opinion is not even just the future of these models, it's the now, and I think it's also what we'll be using them for in the foreseeable future.
So really focusing on that agentic ability is, I think, really good, and it's something you find is missing in a lot of models. A lot of LLMs are good at generating text, but if they can't generate text reliably in a structured manner, you can't really integrate the models with code, right?
And that really limits what you're actually able to build with them. So I do like this focus from DeepMind on the agentic component, which, in the announcement post, is one of the main things they talk about. So that is really cool.
Let's have a look at an example. Obviously this is Google DeepMind, so they have support for integrating all of this with Google Search, which is kind of interesting. So we have this example in the Aurelio Labs Cookbook: you go into Gen AI, Google AI, Gemini 2, there's another example in here, and I'm going to add a few more soon as well.
And you can either run this in Colab if you prefer, or you can run it locally, it's up to you. You can go and open it in Colab, which is the easiest thing to do for sure, but I'm going to be running it locally here. Running it locally, there are setup instructions here; I've set it up to use uv, which is a Python package manager that I think is pretty good.
And well, let's jump straight into it. So the first thing we're going to need to do is get a Google AI Studio account. That is pretty straightforward, it's not hard. You go to aistudio.google.com, you create an account, and then you need to get your API key, right?
So for the API key, I think it's under settings, API plan information. And actually, for just using Gemini and generating stuff, that's all you need, right? So you go to billing here, and you can get your API key. However, we're also going to be using the Google Search API.
And for that, you actually need to add your billing information. If you haven't already done that, you should see it somewhere around here; it will say something like "add billing information", and you need to do that. I don't think you're actually going to need to pay anything to run through this example, because you have a certain amount of free usage, I think every month, although I'm not 100% sure. Actually, you can see here, yeah, free of charge, you have all of this. The Google Search component, though, you might have to pay a small amount for; I haven't seen the bill yet, so we'll see. Okay, so I go in here, and it opens this.
And then I need to go to, where was it, API credentials, I think. So when you create your account with Google AI Studio, they will create this project in GCP for you. That's why I have the Gemini API project here; I didn't actually create this project, they just did it for me.
And then you go in here, you create your credentials, you create an API key, that's it. So you need to copy that API key, and then we're going to use it in the notebook. Okay, so once we're in the notebook, we would run this cell. Because I'm running locally, I'll be using this one here.
Then when I run this cell, we have this getpass; it's basically going to have a little text box pop up and ask you to enter your Google API key, your AI Studio API key. And it's going to use that to initialize your client. You can also just enter your API key directly as a string if you prefer, it's up to you.
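If you're following along outside the notebook, that cell looks roughly like this; it's a minimal sketch assuming the google-genai SDK and an environment variable, not necessarily the exact code in the cookbook:

```python
import os
from getpass import getpass

from google import genai

# Ask for the AI Studio API key if it isn't already set in the environment
os.environ["GOOGLE_API_KEY"] = os.getenv("GOOGLE_API_KEY") or getpass(
    "Enter your Google AI Studio API key: "
)

# Initialize the google-genai client with that key
client = genai.Client(api_key=os.environ["GOOGLE_API_KEY"])
```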
Okay, so we're going to start. One thing with Gemini is that it generally generates everything in Markdown. So we're going to use IPython.display, import Markdown, and display everything as Markdown; it looks better. The model we're using is, of course, the first of the Gemini 2 models, Flash, which, at least going from the previous versions of Gemini, is like the second fastest of the Gemini family.
I haven't used Gemini models before this, to be honest, but I believe that is true. And then exp here just means experimental, right? And then we can see, okay, we have this nice little Markdown output. I'm sure it can probably tell us about itself relatively well.
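As a rough sketch of that first generation cell (the model ID and prompt here are my own choices rather than copied from the notebook, so treat them as assumptions):

```python
from IPython.display import Markdown

# Experimental Gemini 2.0 Flash model ID
model_id = "gemini-2.0-flash-exp"

response = client.models.generate_content(
    model=model_id,
    contents="Tell me about the Gemini family of models.",
)

# Gemini tends to answer in Markdown, so render it as Markdown
Markdown(response.text)
```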
And you can see here, actually, that it mentions Ultra, Pro, and Nano, but it doesn't mention Flash. So thanks, Gemini, for that lack of information. I'm going to quickly ask it: what are the Flash models in Gemini? Okay. Yep. Quick processing, and so on. So that just tells us it's efficient with slightly reduced accuracy, although honestly it does do pretty well, but I am looking forward to seeing the better ones.
But yeah, it's generally a faster model, and obviously with faster models you always have that trade-off. So, getting to the Google Search tool. They have it kind of baked into the library here. One thing to say about Google's AI libraries, generative AI, gen AI, whatever: they have various different versions of libraries that are all supposed to give you access to this, and it's a bit of a mess. The one that you should be using is this google-genai one. There are others, like google-generativeai and so on. Google is a big company, and it feels like they just got various teams to build the SDKs and those teams didn't communicate.
So the one that seems to work is this google-genai. Let me make sure I run this, run this again. So we've initialized our tool here. If you look into it, interestingly, it's basically just empty; I believe it's just an object type.
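For reference, initializing that tool with the google-genai SDK looks something like this (a minimal sketch, not necessarily the exact cell):

```python
from google.genai import types

# The built-in Google Search tool; the object itself carries no configuration,
# Gemini handles the actual searching on its side once the tool is passed along
search_tool = types.Tool(google_search=types.GoogleSearch())
```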
And then when it is being passed through the SDK over to Gemini on the other end, it's seeing that you're using the Google search tool, or you're wanting to use it. And then it handles everything on that side. We come down to here, we have this generate content config object.
This is important for generating anything with Gemini in general. The system instruction here is literally your system prompt. So I'm just saying something like: you are a helpful assistant, you provide up-to-date information, and you help the user in their research. I'm not really specifying anything about using the tools, but it works; it uses them all the time anyway.
And then we're passing in the tools, right? So it's just the search tool; of course, you could pass in more tools if you have them. We'll have a look at using Gemini in a more agentic fashion later. We are using it kind of agentically here, but I mean more agentically with your own custom tools and so on.
We want to respond with text. We can set the temperature and the candidate count. The candidate count is basically how many responses it's going to return to you, how many candidate responses, essentially. By default, that is one, and I think you can't actually set it to anything else for now.
But I just wanted to outline those other parameters that you can set. There are also the other typical LLM generation parameters, and I think a few more as well; there's a frequency penalty, for example, which I was using in another example. So there are quite a few things you can mess around with there.
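Put together, the config looks roughly like this (the exact prompt wording and parameter values are my paraphrase of the notebook, so treat them as assumptions):

```python
# Generation config: system prompt, tools, and the usual sampling parameters
config = types.GenerateContentConfig(
    system_instruction=(
        "You are a helpful assistant. Provide up-to-date information "
        "and help the user in their research."
    ),
    tools=[search_tool],
    response_modalities=["TEXT"],  # we want a text response
    temperature=0.0,
    candidate_count=1,  # currently only a single candidate is supported
)
```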
Right. So once you've done that, we can pass everything in with our config. I'm going to say, "tell me the latest news in AI", okay? And it's going to go ahead and, hopefully, use the Google Search tool, and we'll have a look at what it responds with, and also what we can actually do with all of the other stuff that comes back.
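That call, roughly (again a sketch, assuming the client, model ID, and config from the earlier snippets):

```python
response = client.models.generate_content(
    model=model_id,
    contents="Tell me the latest news in AI.",
    config=config,
)

Markdown(response.text)
```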
So OpenAI have introduced these models, which I don't think is actually a new thing, is it? Okay, it's not that new, unless I'm wrong. I actually don't know what Veo is, and Imagen, obviously, is fine, generative AI. Okay. I mean, that's just super useless; it could have been a bit better there. Maybe. I mean, it's fine, let's just go with it. So Veo and Imagen, relatively new stuff, I think. Right, so that's fine. But what I really care about here is, okay, what else can we get from our search there? Can we confirm it searched, for one? That's probably important.
But what can we actually get from our search? So we're going to run this. And, as I mentioned, you can have multiple candidates in a response; we're just generating one. So if I index with one here, it will just break; we're going with the single candidate that we generated.
We're looking at this grounding metadata. So what is grounding? Essentially the concept here is that your LLM is grounding what it responds with in external information. So we're grounding it with information from the Google search. And within that grounding metadata, you have a ton of stuff.
All right. So we have these grounding chunks. This is basically: okay, where has it pulled information from? You can see that we have a few websites with their links. All these links are essentially links through Vertex AI Search. Why do they do that? I don't know, but they do.
So there's that. And you also have the title, which is kind of nice. So the title of the website, it can just make it a little bit cleaner. Although that being said, it's not, like, a nice title. It's just the URL. It's just a cleaned URL. So you have that.
Anyway, the retrieved context is just always None at the moment; I don't know exactly why that is. Then we have all these grounding supports. These grounding supports tell you, for your segments, which are defined by the character indices here, where that information came from.
One thing that Gemini is actually relatively good at is saying, okay, this has come from here and that has come from there. So when it generates some information, you can map it back to an actual source. And then you also have the text contained within those segments; this is literally the generated text, and it's saying this generated text came from here. That's useful, particularly when you're giving people information they need to trust; it can be hard for them to trust it if they can't see where the information is coming from.
And then you can also see how the search was done. We can see that Gemini did a search for "recent AI developments" and "latest news in AI"; those are the search queries that Gemini provided to the Google Search tool.
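To pull all of that out of the response, the access pattern looks roughly like this; the attribute names are what I'd expect from the google-genai SDK, so double-check them against the notebook:

```python
# Grounding metadata lives on the (single) candidate we generated
metadata = response.candidates[0].grounding_metadata

# Where the information was pulled from (Vertex AI Search redirect links plus titles)
for chunk in metadata.grounding_chunks:
    print(chunk.web.title, chunk.web.uri)

# Which generated segments are supported by which chunks, and how confidently
for support in metadata.grounding_supports:
    print(
        support.segment.text,
        support.grounding_chunk_indices,
        support.confidence_scores,
    )

# The search queries Gemini sent to the Google Search tool
print(metadata.web_search_queries)
```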
Cool. So we have all of that, and we can obviously extract it all out. We can format our links nicely, so we can actually click through on these, and it will go to the page where it got the information from. This one went to New Dasik or whatever; it goes to the blog with the latest AI news, September 2024. You can see information here; ah, yeah, maybe it came through from here. Let's see: scoring 83% on the International Mathematics Olympiad qualifying exam with an estimated IQ of 120, which is exactly what Gemini told us here. There we go, that's where the information came from. So it is good to see that it really is pulling in this information, which is nice. Great. So we have that, it's pulling in those sources.
We can format them nicely. But you know what I really like in these interfaces, when you have a chat bar and you ask some questions? When it tells you the information is coming from a specific source: you have the text, and next to it there's a little one with a link to the source that you can click through.
I really like that. So we're going to replicate that sort of interface here using these grounding support objects and their segments. What we need to do is basically look at the start index and insert the Markdown links there. Now this could get quite messy, because we've got all these values and metadata everywhere.
So what I'm doing here is just keeping things a bit cleaner. We're going to use Pydantic with a base model, and we're going to use that to define a citation object. Each of those grounding supports we had will be represented by a single citation within our text. There can be multiple citations for each source, so it's not a one-to-one mapping; there are many more of these segments than there are of the actual sources themselves. In any case, each citation will have a title and a link, which maps to one of these.
And then each one of them also has its own individual score, start index, end index, and chunk index. The chunk index is here; that just maps it back to one of these chunks. The start index and end index are basically the character indices within our text. And then we also have the confidence scores here. I'm not actually doing anything with those, but you could, right? If it's not particularly confident about something, you could say, okay, maybe we shouldn't include this information, and because you know the start and end index, you could actually pull that segment out if you wanted to.
If you want to be super precise, or just careful with what you're telling people, that could be pretty useful. But anyway, I'm not doing that here. So we have this citation base model. Now, two methods that are useful here: the first is get_link. Here I'm just building a link; it's just Markdown, in italics, giving you the square brackets with a one (or a two, or a three) and a link. The other is count_characters. You can imagine that if we're inserting these citations and links into our Markdown, the size of the overall Markdown increases.
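The citation model looks roughly like this; the field and method names are my reconstruction of what's described here, so treat it as an approximation of the notebook rather than a copy of it:

```python
from pydantic import BaseModel


class Citation(BaseModel):
    """One citation to insert into the generated text."""

    title: str        # title of the source page
    link: str         # (Vertex AI Search) URL of the source
    score: float      # confidence score for this supported segment
    start_index: int  # character position where the supported segment starts
    end_index: int    # character position where the supported segment ends
    chunk_index: int  # index into grounding_chunks, i.e. which source this maps to

    def get_link(self) -> str:
        # Render as a small italic Markdown link, e.g. *[1](https://...)*
        return f" *[{self.chunk_index + 1}]({self.link})*"

    def count_characters(self) -> int:
        # How many characters inserting this citation adds to the text
        return len(self.get_link())
```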
So if you're iterating through and adding citations, you're increasing the number of characters every time, which means you're modifying where the next citation needs to land. You could just go in the other direction and avoid that, but I didn't; you could probably simplify it, it doesn't matter. So what I'm doing is going from the start of the text and working through, adjusting the next start index each time I add a citation. I've done it this way now, so let's just make sure we run this.
Then run this. So what am I doing here? I'm just creating a list of citation objects based on those grounding supports. Then we sort those citations by their start index, so that we go through them one by one from the start to the end.
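Building that list from the grounding metadata looks something like this; the exact attribute names and the handling of the confidence scores are assumptions on my part, building on the Citation model sketched above:

```python
citations = []

for support in metadata.grounding_supports:
    # Each support segment can reference one or more grounding chunks
    for i, chunk_index in enumerate(support.grounding_chunk_indices):
        chunk = metadata.grounding_chunks[chunk_index]
        citations.append(
            Citation(
                title=chunk.web.title,
                link=chunk.web.uri,
                score=support.confidence_scores[i],
                # the SDK may leave a zero start index unset, hence the `or 0`
                start_index=support.segment.start_index or 0,
                end_index=support.segment.end_index,
                chunk_index=chunk_index,
            )
        )

# Sort by where each citation lands so we can insert them front to back
citations.sort(key=lambda c: c.start_index)
```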
Again, as I alluded to just now, you could go in the other direction and then you wouldn't need to adjust for the character count, but it's fine. Now we've got our citation objects, and we can use those methods I defined, like get_link, to very easily create a Markdown link here.
So the text you see here is just a Markdown link when it gets rendered, and it looks fine. So I'm going to run this, and we're going to take a look. What we're doing here is going through our citations and taking the final response; this is the bit where I overcomplicated it a little, I think. You take the chunk of your final response up to where you want to insert your citation, you add your citation, and then you add the rest of the response after that. You could do it in the other direction, working through the citations in reverse, and then you wouldn't need this bit here, which adds an offset to the next citation.
So if you see here, we have this offset. So basically every time we insert stuff, we're adding this offset so that we put the next citation in the correct place. But as I said, we don't actually need to do that if you do it differently. So you can see here that we now have these inserted citations, and you can click through on these.
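That insertion loop, as a sketch (variable names are mine, and as noted you could avoid the offset entirely by inserting in reverse order):

```python
final_text = response.text
offset = 0  # grows as inserted links push later indices forward

# Insert each citation at the end of the segment it supports, front to back
for citation in citations:
    insert_at = citation.end_index + offset
    final_text = (
        final_text[:insert_at] + citation.get_link() + final_text[insert_at:]
    )
    offset += citation.count_characters()

Markdown(final_text)
```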
So we'll go to number two here. It's going to take us to this AI News page, and we must have got something from here. Or we click on this one, it takes you through, and you have all of your links there. Of course, in an actual application, you could make all this a lot more dynamic and nice, but you can see it didn't take us long to do that.
It was pretty straightforward, and we're getting nicely grounded responses with our citations. Now, finally, we have all of this, but one thing I do quite like having is a list of citations at the end. So we're just going to go ahead and add that, pretty straightforward again, all in Markdown.
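One simple way to do that, roughly (again a sketch with names of my own choosing, not the notebook's exact code):

```python
# Append a de-duplicated source list at the end, still in Markdown
references = "\n\n---\n\n**Sources**\n\n"
seen = set()
for citation in citations:
    if citation.link not in seen:
        seen.add(citation.link)
        references += f"{citation.chunk_index + 1}. [{citation.title}]({citation.link})\n"

Markdown(final_text + references)
```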
So now we have all the stuff we had before, all of our links. Then I just added these citations at the end, right? Nice. So pretty straightforward, nice quick introduction to using Gemini 2 with the Google search tool, which of course they've integrated quite easily, given it's Google. And it just gives you these nicely grounded results.
And I think this is pretty important. Well, it's important for various reasons, like the typical RAG argument: without augmenting your LLM, it will only know information up to the date it was trained on. The world of the LLM is just what it was trained on, right?
Which is fine, it's not a problem, of course. But if you want your LLM to give you accurate answers about recent events, it needs access to some external information, which is what we do with RAG, right? We're creating a custom database, putting our data in there, and then giving our LLM access to it. With this web search, we're doing a similar thing, but we're just using Google Search directly and giving our LLM access to it, so it has access to up-to-date information. And also, when you're asking about specific things, you have the ability, like we did here with these links, to provide references to where the information came from, which I think is pretty important because LLMs can just make stuff up, and they do it super convincingly.
So just having those links there, if the LLM said something and you're like, "I'm not entirely sure," you can go in and check. And also for us, like developing these things, we have this confidence score, right? That confidence score, I think, could be pretty useful. I haven't tried using it, but I think that could also be pretty useful in just flagging things.
Either flagging things to the user where you're like, "Ah, I'm not too sure this is actually true," so maybe you might want to go check that link, or just saying if it's below a certain threshold, maybe we just don't return it to the user at all. So that is potentially pretty useful, I think, though I haven't thought too much about it.
But yeah, I'll leave it there for now. So thank you very much for watching, and I will see you again in the next one. Bye.