Gemini 2 Agent + Google Search and Citations
Chapters
0:00 Gemini with Google Search
1:36 Gemini Web Search in Python
4:20 Using Gemini 2 Flash
6:07 Google GenAI Libraries
6:59 Using Google Search with Gemini
9:51 Grounding Gemini Responses
13:46 Inserting Citations for Gemini
20:30 Why use Citations
00:00:00.000 |
Okay, so Google released the new Gemini 2 model, or rather 2 Flash, the experimental model. 00:00:10.040 |
And one thing that is super interesting about these models is that DeepMind are focusing 00:00:16.260 |
on the agentic ability of these models, which in my opinion is really the future of these, 00:00:24.560 |
it's not even the future, it's the now of these models, and I think it's also what we'll be 00:00:28.840 |
using them for in the foreseeable future as well. 00:00:31.880 |
So really focusing on that agentic ability I think is really good. 00:00:40.580 |
A lot of LLMs are good at generating text, but if they can't generate text reliably in 00:00:46.560 |
a structured manner, you can't really integrate the models with code, right? 00:00:53.360 |
And that really limits you in what you're actually able to build with them. 00:00:56.680 |
So I do like this focus from DeepMind on the agentic component, which they really push; 00:01:05.640 |
in the announcement here, it's one of the main things they highlight. 00:01:12.900 |
Let's have a look at an example. Okay, obviously this is Google DeepMind, so they have support for 00:01:19.040 |
just integrating all of this with Google Search, which is kind of interesting. 00:01:22.680 |
So we have this example, AurelioLabs Cookbook, you go into Gen AI, Google AI, Gemini 2, there's 00:01:29.760 |
another example in here, and I'm going to add a few more soon as well. 00:01:33.120 |
And you can either run this in Colab if you prefer, or you can run it locally, it's up to you. 00:01:37.680 |
So you can go and open that in Colab, that is the easiest thing to do for sure. 00:01:42.620 |
So for running it locally, there are setup instructions here; I've set it up to use 00:01:48.280 |
uv, which is a Python package manager that I think is pretty good. 00:01:54.760 |
So the first thing we're going to need to do is get a Google AI Studio account. 00:01:59.760 |
So that is pretty straightforward, it's not hard. 00:02:03.220 |
So you go into, what is it, aistudio.google.com, you'd have to create an account, and then 00:02:12.080 |
find your API key, I think under the settings, API plan information. 00:02:15.720 |
And actually for this example, okay, for using Gemini, just generating stuff, that's all 00:02:21.800 |
you need. So you could, I think, go to billing here, and you can get your API key. 00:02:25.760 |
However, we're going to be using the Google search API. 00:02:29.760 |
And for that, you actually need to add your billing information. 00:02:32.440 |
So I think if you haven't already done that, you should see it somewhere around here, it 00:02:36.040 |
will say something like add billing information. 00:02:39.680 |
I don't think running through this example you're actually going to need to pay anything, 00:02:44.840 |
because you have a certain amount of free credits, I think every month, although 00:02:50.800 |
I'm not 100% sure. Actually, you can see here, so yeah, free of charge, you have all of this. 00:02:57.480 |
I don't know, hmm, potentially for the Google search component, yeah, you might 00:03:08.480 |
have to pay. I haven't seen the bill yet, so yeah, anyway. 00:03:12.320 |
So you would actually have to pay a small amount for this. 00:03:16.680 |
Pretty sure we can create like a free version. 00:03:24.040 |
And then I need to go, where do I need to go, API credentials, I think it was. 00:03:31.840 |
So when you create your account with Google AI Studio, they will create this project in 00:03:37.680 |
Google Cloud for you. So this is why I have Gemini API here; I didn't actually create this project, they just did it. 00:03:42.720 |
And then you go into here, so you create your credentials, you create an API key, that's it. 00:03:46.280 |
So yeah, you need to copy that API key, and then we're going to use it in the notebook. 00:03:50.440 |
Okay, so once we're in the notebook, we would run this cell. 00:03:55.240 |
Because I'm running locally, I would be using this, so actually it would be this. 00:04:00.240 |
Then when I run the cell, we have this getpass; it's basically going to have a little text 00:04:05.760 |
box pop up, and just ask you to enter your Google API key, your AI Studio API key. 00:04:12.480 |
And it's going to use that to initialize your client. 00:04:15.840 |
You can also just enter your API key directly in like a string if you prefer, it's up to you. 00:04:24.000 |
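A rough sketch of that setup cell, assuming the google-genai package is installed:

```python
from getpass import getpass

from google import genai

# Prompt for the AI Studio API key rather than hard-coding it
google_api_key = getpass("Enter your Google AI Studio API key: ")

# Initialize the client with that key
client = genai.Client(api_key=google_api_key)
```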
So one thing with Gemini is that it just generally generates everything in Markdown. 00:04:30.120 |
So we're going to use IPython display, import Markdown, and we're just going to display the responses with that. 00:04:37.240 |
So the model we're using is, of course, the first of the Gemini 2 models, it's Flash, 00:04:43.840 |
which, at least going from the previous versions of Gemini, is like the faster, lighter model. 00:04:55.120 |
I haven't used Gemini models before this, to be honest, but I believe that is true. 00:05:00.980 |
And then "exp" here just means experimental, right? 00:05:03.680 |
And then we can see, okay, we have this like nice little Markdown output. 00:05:08.360 |
It looks, you know, I'm sure it can probably tell us about itself relatively well. 00:05:16.440 |
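A minimal sketch of that generation cell (the prompt here is my own stand-in):

```python
from IPython.display import Markdown, display

# "exp" in the model name marks the release as experimental
response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="Tell me about the Gemini family of models.",
)

# Gemini tends to respond in Markdown, so render it as Markdown
display(Markdown(response.text))
```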
And you can see here actually, so we have Ultra, Pro, and Nano; it doesn't mention Flash. 00:05:21.640 |
So thanks Gemini for that lack of information. 00:05:44.880 |
So that just tells us it's efficient, with slightly reduced accuracy, although honestly it does 00:05:50.800 |
do pretty well, but I am looking forward to seeing the better ones. 00:05:56.320 |
But yeah, it's generally a faster model, and obviously with faster models, you have that trade-off. 00:06:08.020 |
So Google search, they have it kind of baked into the library here. 00:06:14.980 |
One thing, sorry, just one thing to say here with Google's AI libraries, generative 00:06:20.940 |
AI, gen AI, whatever: they have various different versions of libraries that are all supposed to 00:06:31.400 |
do the same thing. So the one that you should be using is this Google GenAI. 00:06:36.300 |
There's also Google Generative AI and whatever else; I don't know what Google is doing there. 00:06:43.740 |
It feels like they just got various teams to build the SDKs for them and just didn't 00:06:50.500 |
coordinate. So the one that seems to work is this Google GenAI (the google-genai package). 00:06:54.060 |
I need to make sure I run this, run this again. 00:07:02.920 |
If you look into this, interestingly, it's basically just empty. 00:07:05.980 |
So I believe that this is just an object type. 00:07:11.240 |
And then when it is being passed through the SDK over to Gemini on the other end, it's 00:07:15.860 |
seeing that you're using the Google search tool, or you're wanting to use it. 00:07:24.580 |
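In the google-genai SDK, that looks roughly like this; note the tool object takes no arguments:

```python
from google.genai import types

# The tool carries no configuration of its own; passing it simply signals
# to the API that Google Search grounding should be enabled server-side
google_search_tool = types.Tool(google_search=types.GoogleSearch())
```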
We come down to here, we have this generate content config object. 00:07:28.060 |
This is important for just generating stuff with Gemini in general. 00:07:35.340 |
So system instruction here is literally your system prompt. 00:07:39.040 |
So I'm just saying the typical "you are a helpful assistant, provide up-to-date information". 00:07:46.220 |
So I'm not really specifying anything about using the tools, but it works; it uses them all the same. 00:07:56.280 |
Of course, you could pass in more tools if you have them. 00:08:00.100 |
We'll have a look at using Gemini in a more agentic fashion later. 00:08:04.540 |
We are using it kind of agentically here, but I mean more agentically with your own custom tools. 00:08:13.160 |
We can set the temperature and the candidate count. 00:08:16.320 |
The candidate count is basically how many responses it's going to return to you. 00:08:24.080 |
And I think you can't actually set it to anything other than one for now. 00:08:28.280 |
But I just wanted to outline those other parameters that you can have. 00:08:31.360 |
There are also the other typical sort of LLM generation parameters, and I think a few more as well. 00:08:36.520 |
So yeah, there's also a frequency penalty, which I was using in another example. 00:08:41.760 |
So yeah, there are quite a few things that you can mess around with there. 00:08:47.940 |
So once you've done that, we can pass everything into our config, right? 00:08:52.520 |
So I'm going to say tell me the latest news in AI, okay? 00:08:56.800 |
And it's going to go ahead and use, obviously, well, it should hopefully go ahead and use 00:09:00.440 |
the Google search tool, and we'll have a look at what it responds with, and also what we 00:09:07.600 |
can actually do with all of the other stuff in there as well. 00:09:12.800 |
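Putting the config and the query together, roughly (the exact system prompt wording is my paraphrase):

```python
from google.genai import types

config = types.GenerateContentConfig(
    # the system prompt
    system_instruction=(
        "You are a helpful assistant that provides up-to-date information."
    ),
    tools=[google_search_tool],  # defined in the earlier sketch
    temperature=0.0,
    candidate_count=1,  # currently the only value the API accepts
)

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="Tell me the latest news in AI",
    config=config,
)
display(Markdown(response.text))
```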
So, OpenAI have introduced these models, which I don't think is actually a new thing. 00:09:31.680 |
I actually don't know what Veo is, and I know, obviously, Imagen is generative AI for images. 00:09:46.280 |
So Veo and Imagen, relatively new stuff, I think. 00:09:53.840 |
But what I really care about here is, okay, what else can we get from our search there? 00:10:04.120 |
But what can we actually get from our search? 00:10:08.040 |
And this is basically, okay, so I mentioned they can have multiple candidates for your responses. 00:10:15.680 |
So if I go with, like, index one here, it will just break. 00:10:19.920 |
So we're going with index zero, the one candidate that we generated. 00:10:30.240 |
Essentially the grounding concept here is that your LLM is grounding what it's responding with. 00:10:39.580 |
So we're grounding it with information from the Google search. 00:10:43.200 |
And then within that grounding metadata, you have a ton of stuff. 00:10:48.600 |
So this is, like, okay, where has it pulled information from? 00:10:51.040 |
You can see that we have, you know, a few websites with the links. 00:10:55.400 |
All these links are essentially links through Vertex AI search. 00:11:04.280 |
And you also have the title, which is kind of nice. 00:11:06.080 |
So the title of the website, it can just make it a little bit cleaner. 00:11:09.240 |
Although that being said, it's not, like, a nice title. 00:11:19.080 |
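Pulling those sources out looks roughly like this:

```python
# The grounding metadata lives on the (single) candidate
metadata = response.candidates[0].grounding_metadata

# Each grounding chunk is one source: a redirect URI plus the page title
for chunk in metadata.grounding_chunks:
    print(chunk.web.title, "->", chunk.web.uri)
```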
The retrieved context is just always None at the moment. 00:11:28.960 |
So these grounding supports are telling you, okay, from your segments, which are defined 00:11:33.240 |
by the character count here, where did that information come from? 00:11:39.440 |
So the one thing that at least, you know, Gemini is relatively good at 00:11:43.840 |
is actually saying, okay, this has come from here and this has come from here. 00:11:47.800 |
So when it's generating some information, you can actually map that back to an actual source. 00:11:56.080 |
And then you also have the text that's contained within those segments. 00:12:00.480 |
It's saying, okay, this generated text came from here. 00:12:04.520 |
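A quick sketch of what each grounding support contains, reusing the metadata object from above:

```python
# Each grounding support ties a segment of the generated text back to
# one or more grounding chunks, with a confidence score per chunk
for support in metadata.grounding_supports:
    segment = support.segment
    print(segment.start_index, segment.end_index)  # character range in the text
    print(segment.text)                            # the generated text in that range
    print(support.grounding_chunk_indices)         # which sources it came from
    print(support.confidence_scores)               # one score per source
```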
That, you know, that's useful, particularly when you're giving this information to users. 00:12:11.480 |
It can be hard for them to trust it if they don't see where that information is coming from. 00:12:19.720 |
And then you can also see how this -- how this search was done. 00:12:25.140 |
So we can see that Gemini did a search for recent AI developments and the latest news 00:12:32.280 |
So those are the search queries that Gemini provided to the Google search tool. 00:12:46.080 |
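Those queries are exposed directly on the metadata:

```python
# The search queries Gemini issued on our behalf
print(metadata.web_search_queries)
# e.g. ['latest AI news', 'recent AI developments']
```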
So we can actually click through on these, and it will go through to where it got the information from. 00:12:57.240 |
It's going to the blog and the latest news -- latest AI news, September 2024. 00:13:12.360 |
Maybe that's where the information is coming from. 00:13:14.080 |
Scoring 83% on the International Mathematics Olympiad qualifying exam with estimated IQ 00:13:32.560 |
So that's where the information is coming from. 00:13:34.640 |
So it is good to actually see, okay, it is pulling in this information for sure. 00:13:46.560 |
But you know what I really like in these interfaces when you have a chat? 00:13:54.040 |
When it tells you, okay, the information is coming from a specific source; like you have the 00:13:58.840 |
text and it has a little one in square brackets, with a link to the source there. 00:14:05.380 |
So we're going to replicate that sort of interface here using these grounding support objects. 00:14:12.260 |
So what we need to do is basically look at, okay, the start index, and insert the citation links there. 00:14:22.780 |
Now this could get quite messy because we've got all these values and metadata and so on. 00:14:28.740 |
So what I'm doing here is just keeping things a bit cleaner. 00:14:33.460 |
We're going to use Pydantic; we're going to use a BaseModel. 00:14:36.960 |
And we're going to use that to define a citation object. 00:14:39.900 |
That citation object is basically going to represent, for each of those grounding supports 00:14:45.140 |
we had, a single citation within our text. 00:14:52.420 |
There can be multiple citations for every source. 00:14:56.220 |
That's why there are many more segments, and citations, basically. 00:15:01.460 |
Or, yeah, there are many more of these segments than there are of the actual sources themselves. 00:15:10.700 |
So in any case, each citation will have a title and a link, which is going to map to the source. 00:15:22.260 |
And then each one of them also has their own individual score, start index, end index, and text. 00:15:34.100 |
Our end index and start index are basically the character indices within our text. 00:15:40.180 |
And then we also have the confidence scores here. 00:15:42.060 |
I'm not actually doing anything with these. 00:15:46.380 |
If it's not particularly confident on something, you could say, okay, maybe we shouldn't include that. 00:15:52.740 |
And you know the start and end index, so you could actually pull that segment out, 00:15:57.460 |
if you want to be, like, super precise or just careful with what you're telling people. 00:16:06.940 |
But anyway, here I'm not doing that. 00:16:14.980 |
Now, one of the two methods that are useful here is just get_link. 00:16:25.940 |
So it's just, like, in italics, and it's giving you a little marker: square brackets 00:16:29.680 |
with a one, or a two or a three, with a link. 00:16:35.540 |
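A minimal sketch of that citation model; the exact field names here are my own rather than the notebook's:

```python
from pydantic import BaseModel

class Citation(BaseModel):
    index: int        # the citation number shown in the text, e.g. 1
    title: str        # source page title
    link: str         # source URI
    score: float      # confidence score for this support
    start_index: int  # character range of the cited segment
    end_index: int
    text: str         # the generated text within that segment

    def get_link(self) -> str:
        # An italicised Markdown marker like *[1](https://...)*
        return f"*[[{self.index}]({self.link})]*"
```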
So you can imagine if we're inserting these citations and links and stuff into our 00:16:41.980 |
markdown, the size of the overall markdown increases. 00:16:47.280 |
So then if you're iterating through and adding them one by one -- I mean, you could just 00:16:52.780 |
go in the other direction, but I didn't do that. 00:16:56.460 |
Well, anyway, yeah, you could probably simplify that. 00:17:02.380 |
So what I'm doing is I'm going from the start of the text, I'm going through, and obviously 00:17:07.740 |
every time you add a citation, you're increasing the number of characters. 00:17:12.420 |
So you're actually modifying where this needs to land, like, the next start index. 00:17:22.020 |
Probably you could go in the other direction and you wouldn't need to do that, but I didn't. 00:17:39.220 |
I'm just creating a list of citation objects based on those grounding chunk things. 00:17:53.640 |
Then we are just sorting those citations by their start index, right? 00:17:59.440 |
So this is -- so that we're going through one by one from the start until the end. 00:18:05.120 |
Again, as I kind of alluded to just now, you could go in the other direction and then you 00:18:09.200 |
wouldn't need to modify the character count, but it's fine. 00:18:19.820 |
We can use those methods I defined, so like the get_link, to just very easily create a citation marker. 00:18:27.360 |
So this text you see here is just a markdown link when it gets rendered, and it looks fine. 00:18:34.640 |
So I'm going to run this, and we're going to just take a look. 00:18:39.120 |
So what we're doing here, we're going through our citations. 00:18:42.480 |
We are taking the final response, and this is a bit where I overcomplicated it, I think. 00:18:49.240 |
So the final response, you get the start, or you get the chunk of your final response 00:18:54.080 |
before where you want to insert your citation. 00:18:59.640 |
Then you add the rest of the response after that. 00:19:04.520 |
You could do it in the other direction if you go in reverse with the citations, and 00:19:09.400 |
you wouldn't need to do this bit here, which is adding an offset to your next citation. 00:19:19.400 |
So basically every time we insert stuff, we're adding this offset so that we put the next citation in the right place. 00:19:25.720 |
But as I said, we don't actually need to do that if you do it differently. 00:19:32.240 |
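Roughly, the build, sort, and insert logic looks like this, under the same assumptions as the Citation sketch above (with each marker spliced in at the end index of its segment):

```python
# Build one Citation per (support, source) pair from the metadata above
citations: list[Citation] = []
for support in metadata.grounding_supports:
    for j, chunk_idx in enumerate(support.grounding_chunk_indices):
        chunk = metadata.grounding_chunks[chunk_idx]
        citations.append(Citation(
            index=chunk_idx + 1,
            title=chunk.web.title,
            link=chunk.web.uri,
            score=support.confidence_scores[j],
            start_index=support.segment.start_index or 0,  # can be None on the first segment
            end_index=support.segment.end_index,
            text=support.segment.text,
        ))

# Sort by position so we insert from the start of the text onwards
citations.sort(key=lambda c: c.start_index)

final_response = response.text
offset = 0  # grows as inserted markers push later indices to the right
for citation in citations:
    position = citation.end_index + offset
    marker = citation.get_link()
    # Splice the marker in at the end of the cited segment
    final_response = final_response[:position] + marker + final_response[position:]
    offset += len(marker)
```

As noted, iterating in reverse from the last citation would avoid the offset bookkeeping entirely.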
So you can see here that we now have these inserted citations, and you can click through on them. 00:19:40.040 |
It's going to take us to this AI News page, and we must have got something from here. 00:19:47.160 |
Or we click on this one, it takes you through, and you have all of your links there. 00:19:52.640 |
Of course, in an actual application, you could make all this a lot more dynamic and nice, 00:19:57.880 |
but you can see it didn't take us long to do that. 00:20:00.400 |
It was pretty straightforward, and we're getting nicely grounded responses with our citations. 00:20:06.140 |
Now finally, okay, we have all of this, but one thing that I do kind 00:20:11.320 |
of like having, which we don't have yet, is a list of citations at the end. 00:20:15.060 |
So we're just going to go ahead and add that, pretty straightforward, again, all in Markdown. 00:20:20.140 |
So now we have all the stuff we had before, all of our links. 00:20:25.400 |
Then I just added these citations at the end, right? 00:20:29.560 |
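Appending that list is a few lines of Markdown; the exact formatting here is my own:

```python
# Deduplicate sources by citation number and append them after the body
sources = {c.index: c for c in citations}
lines = [f"{i}. [{c.title}]({c.link})" for i, c in sorted(sources.items())]
final_response += "\n\n**Sources:**\n\n" + "\n".join(lines)

display(Markdown(final_response))
```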
So pretty straightforward, nice quick introduction to using Gemini 2 with the Google search tool, 00:20:39.080 |
which of course they've integrated quite easily, given it's Google. 00:20:45.320 |
And it just gives you these nicely grounded results. 00:20:51.400 |
So why use citations? Well, it's important for various reasons, like the typical RAG argument. 00:20:56.360 |
So without augmenting your LLM, your LLM will only know information up to the date that it was trained. 00:21:05.040 |
The world of the LLM is just what it was trained on, right? 00:21:13.760 |
But if you want your LLM to give you accurate answers about like recent events, you need 00:21:18.560 |
it to have access to some external information, which is what we do with RAG, right? 00:21:25.120 |
We're creating our database, like a custom database, throwing that somewhere, and then retrieving from it. 00:21:31.040 |
With this web search, we're doing, you know, a similar thing, but we're just using 00:21:34.960 |
Google search directly and giving our LLM access to it, right? 00:21:40.400 |
And also, when you're asking about specific things, you have this ability, like we did 00:21:47.360 |
here with these links, you have the ability to provide the references to where that information 00:21:54.200 |
came from, which I think is pretty important because LLMs can just make stuff up, and they often do. 00:22:05.100 |
So just having those links there helps: if the LLM said something and you're like, "I'm not entirely sure about that," you can go and check the source. 00:22:13.240 |
And also for us, like developing these things, we have this confidence score, right? 00:22:17.340 |
That confidence score, I think, could be pretty useful. 00:22:19.560 |
I haven't tried using it, but I think that could also be pretty useful in just flagging things. 00:22:24.480 |
Either flagging things to the user where you're like, "Ah, I'm not too sure this is actually 00:22:27.680 |
true," so maybe you might want to go check that link, or just saying if it's below a 00:22:33.600 |
certain threshold, maybe we just don't return it to the user at all. 00:22:38.660 |
So that is potentially pretty useful, I think, though I haven't thought too much about it. 00:22:46.720 |
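For example, a hypothetical threshold-based filter could look like this:

```python
# Hypothetical cut-off; you would tune (or drop) this for your application
CONFIDENCE_THRESHOLD = 0.7

confident = [c for c in citations if c.score >= CONFIDENCE_THRESHOLD]
uncertain = [c for c in citations if c.score < CONFIDENCE_THRESHOLD]
# 'uncertain' citations could be flagged to the user or withheld entirely
```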
So thank you very much for watching, and I will see you again in the next one.