Gemini 2 Agent + Google Search and Citations
Chapters
0:00 Gemini with Google Search
1:36 Gemini Web Search in Python
4:20 Using Gemini 2 Flash
6:07 Google GenAI Libraries
6:59 Using Google Search with Gemini
9:51 Grounding Gemini Responses
13:46 Inserting Citations for Gemini
20:30 Why use Citations
00:00:00.000 |
Okay, so Google released the new Gemini 2 model, or rather 2 Flash, the experimental model. 00:00:10.040 |
And one thing that is super interesting about these models is that DeepMind are focusing 00:00:16.260 |
on the agentic ability of these models, which in my opinion is really the future of these, 00:00:24.560 |
it's not even the future, it's the now of these models, and I think it's also what we'll be 00:00:28.840 |
using them for in the foreseeable future as well. 00:00:31.880 |
So really focusing on that agentic ability I think is really good. 00:00:40.580 |
A lot of LLMs are good at generating text, but if they can't generate text reliably in 00:00:46.560 |
a structured manner, you can't really integrate the models with code, right? 00:00:53.360 |
And that really limits you in what you're actually able to build with them. 00:00:56.680 |
So I do like this focus from DeepMind on the agentic component, which they really push; 00:01:05.640 |
in the announcement here, it's one of the main things they highlight. 00:01:12.900 |
Let's have a look at an example. Okay, obviously this is Google DeepMind, so they have support for 00:01:19.040 |
just integrating all of this with Google Search, which is kind of interesting. 00:01:22.680 |
So we have this example, AurelioLabs Cookbook, you go into Gen AI, Google AI, Gemini 2, there's 00:01:29.760 |
another example in here, and I'm going to add a few more soon as well. 00:01:33.120 |
And you can either run this in Colab if you prefer, or you can run it locally, it's up to you. 00:01:37.680 |
So you can go and open that in Colab, that is the easiest thing to do for sure. 00:01:42.620 |
So for running it locally, there are setup instructions here; I've set it up to use 00:01:48.280 |
uv, which is a Python package manager that I think is pretty good. 00:01:54.760 |
So the first thing we're going to need to do is get a Google AI Studio account. 00:01:59.760 |
So that is pretty straightforward, it's not hard. 00:02:03.220 |
So you go into, what is it, aistudio.google.com, you'd have to create an account, and then 00:02:12.080 |
find your API key, I think under the settings, API plan information. 00:02:15.720 |
And actually for this example, okay, for using Gemini, just generating stuff, that's all 00:02:21.800 |
you need. So you could, I think, go to billing here, and you can get your API key. 00:02:25.760 |
However, we're going to be using the Google search API. 00:02:29.760 |
And for that, you actually need to add your billing information. 00:02:32.440 |
So I think if you haven't already done that, you should see it somewhere around here, it 00:02:36.040 |
will say something like add billing information. 00:02:39.680 |
I don't think running through this example you're actually going to need to pay anything, 00:02:44.840 |
because you have a certain amount of free credits, I think every month, although 00:02:50.800 |
I'm not 100% sure. Actually, you can see here, so yeah, free of charge, you have all of this. 00:02:57.480 |
I don't know, hmm, potentially for the Google search component, yeah, you might 00:03:08.480 |
have to pay. I haven't seen the bill yet, so yeah, anyway. 00:03:12.320 |
So you would actually have to pay a small amount for this. 00:03:16.680 |
Pretty sure we can create like a free version. 00:03:24.040 |
And then I need to go, where do I need to go, API credentials, I think it was. 00:03:31.840 |
So when you create your account with Google AI Studio, they will create this project in 00:03:37.680 |
Google Cloud for you. So this is why I have Gemini API here; I didn't actually create this project, they just did it. 00:03:42.720 |
And then you go into here, so you create your credentials, you create an API key, that's it. 00:03:46.280 |
So yeah, you need to copy that API key, and then we're going to use it in the notebook. 00:03:50.440 |
Okay, so once we're in the notebook, we would run this cell. 00:03:55.240 |
Because I'm running locally, I would be using this, so actually it would be this. 00:04:00.240 |
Then when I run the cell, we have this getpass; it's basically going to have a little text 00:04:05.760 |
box pop up, and just ask you to enter your Google API key, your AI Studio API key. 00:04:12.480 |
And it's going to use that to initialize your client. 00:04:15.840 |
You can also just enter your API key directly in like a string if you prefer, it's up to you. 00:04:24.000 |
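A rough sketch of that setup cell, assuming the google-genai package is installed:

```python
from getpass import getpass

from google import genai

# Prompt for the AI Studio API key rather than hard-coding it
google_api_key = getpass("Enter your Google AI Studio API key: ")

# Initialize the client with that key
client = genai.Client(api_key=google_api_key)
```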
So one thing with Gemini is that it just generally generates everything in Markdown. 00:04:30.120 |
So we're going to use IPython display, import Markdown, and we're just going to display the responses with that. 00:04:37.240 |
So the model we're using is, of course, the first of the Gemini 2 models, it's Flash, 00:04:43.840 |
which, at least going from the previous versions of Gemini, is like the faster, lighter model. 00:04:55.120 |
I haven't used Gemini models before this, to be honest, but I believe that is true. 00:05:00.980 |
And then "exp" here just means experimental, right? 00:05:03.680 |
And then we can see, okay, we have this like nice little Markdown output. 00:05:08.360 |
It looks, you know, I'm sure it can probably tell us about itself relatively well. 00:05:16.440 |
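A minimal sketch of that generation cell (the prompt here is my own stand-in):

```python
from IPython.display import Markdown, display

# "exp" in the model name marks the release as experimental
response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="Tell me about the Gemini family of models.",
)

# Gemini tends to respond in Markdown, so render it as Markdown
display(Markdown(response.text))
```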
And you can see here actually, so we have Ultra, Pro, and Nano; it doesn't mention Flash. 00:05:21.640 |
So thanks Gemini for that lack of information. 00:05:44.880 |
So that just tells us it's efficient, with slightly reduced accuracy, although honestly it does 00:05:50.800 |
do pretty well, but I am looking forward to seeing the better ones. 00:05:56.320 |
But yeah, it's generally a faster model, and obviously with faster models, you have that trade-off. 00:06:08.020 |
So Google search, they have it kind of baked into the library here. 00:06:14.980 |
One thing, sorry, just one thing to say here with Google's AI libraries, generative 00:06:20.940 |
AI, gen AI, whatever: they have various different versions of libraries that are all supposed to 00:06:31.400 |
do the same thing. So the one that you should be using is this Google GenAI. 00:06:36.300 |
There's also Google Generative AI and whatever else; I don't know what Google is doing there. 00:06:43.740 |
It feels like they just got various teams to build the SDKs for them and just didn't 00:06:50.500 |
coordinate. So the one that seems to work is this Google GenAI (the google-genai package). 00:06:54.060 |
I need to make sure I run this, run this again. 00:07:02.920 |
If you look into this, interestingly, it's basically just empty. 00:07:05.980 |
So I believe that this is just an object type. 00:07:11.240 |
And then when it is being passed through the SDK over to Gemini on the other end, it's 00:07:15.860 |
seeing that you're using the Google search tool, or you're wanting to use it. 00:07:24.580 |
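In the google-genai SDK, that looks roughly like this; note the tool object takes no arguments:

```python
from google.genai import types

# The tool carries no configuration of its own; passing it simply signals
# to the API that Google Search grounding should be enabled server-side
google_search_tool = types.Tool(google_search=types.GoogleSearch())
```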
We come down to here, we have this generate content config object. 00:07:28.060 |
This is important for just generating stuff with Gemini in general. 00:07:35.340 |
So system instruction here is literally your system prompt. 00:07:39.040 |
So I'm just saying the typical "you are a helpful assistant, provide up-to-date information". 00:07:46.220 |
So I'm not really specifying anything about using the tools, but it works; it uses them all the same. 00:07:56.280 |
Of course, you could pass in more tools if you have them. 00:08:00.100 |
We'll have a look at using Gemini in a more agentic fashion later. 00:08:04.540 |
We are using it kind of agentically here, but I mean more agentically with your own custom tools. 00:08:13.160 |
We can set the temperature and the candidate count. 00:08:16.320 |
The candidate count is basically how many responses it's going to return to you. 00:08:24.080 |
And I think you can't actually set it to anything other than one for now. 00:08:28.280 |
But I just wanted to outline those other parameters that you can have. 00:08:31.360 |
There are also the other typical sort of LLM generation parameters, and I think a few more as well. 00:08:36.520 |
So yeah, there's also a frequency penalty, which I was using in another example. 00:08:41.760 |
So yeah, there are quite a few things that you can mess around with there. 00:08:47.940 |
So once you've done that, we can pass everything into our config, right? 00:08:52.520 |
So I'm going to say tell me the latest news in AI, okay? 00:08:56.800 |
And it's going to go ahead and use, obviously, well, it should hopefully go ahead and use 00:09:00.440 |
the Google search tool, and we'll have a look at what it responds with, and also what we 00:09:07.600 |
can actually do with all of the other stuff in there as well. 00:09:12.800 |
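Putting the config and the query together, roughly (the exact system prompt wording is my paraphrase):

```python
from google.genai import types

config = types.GenerateContentConfig(
    # the system prompt
    system_instruction=(
        "You are a helpful assistant that provides up-to-date information."
    ),
    tools=[google_search_tool],  # defined in the earlier sketch
    temperature=0.0,
    candidate_count=1,  # currently the only value the API accepts
)

response = client.models.generate_content(
    model="gemini-2.0-flash-exp",
    contents="Tell me the latest news in AI",
    config=config,
)
display(Markdown(response.text))
```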
So, OpenAI have introduced these models, which I don't think is actually a new thing. 00:09:31.680 |
I actually don't know what Veo is, and I know, obviously, Imagen is generative AI for images. 00:09:46.280 |
So Veo and Imagen, relatively new stuff, I think. 00:09:53.840 |
But what I really care about here is, okay, what else can we get from our search there? 00:10:04.120 |
But what can we actually get from our search? 00:10:08.040 |
And this is basically, okay, so I mentioned they can have multiple candidates for your responses. 00:10:15.680 |
So if I go with, like, index one here, it will just break. 00:10:19.920 |
So we're going with index zero, the one candidate that we generated. 00:10:30.240 |
Essentially the grounding concept here is that your LLM is grounding what it's responding with. 00:10:39.580 |
So we're grounding it with information from the Google search. 00:10:43.200 |
And then within that grounding metadata, you have a ton of stuff. 00:10:48.600 |
So this is, like, okay, where has it pulled information from? 00:10:51.040 |
You can see that we have, you know, a few websites with the links. 00:10:55.400 |
All these links are essentially links through Vertex AI search. 00:11:04.280 |
And you also have the title, which is kind of nice. 00:11:06.080 |
So the title of the website, it can just make it a little bit cleaner. 00:11:09.240 |
Although that being said, it's not, like, a nice title. 00:11:19.080 |
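Pulling those sources out looks roughly like this:

```python
# The grounding metadata lives on the (single) candidate
metadata = response.candidates[0].grounding_metadata

# Each grounding chunk is one source: a redirect URI plus the page title
for chunk in metadata.grounding_chunks:
    print(chunk.web.title, "->", chunk.web.uri)
```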
The retrieved context is just always None at the moment. 00:11:28.960 |
So these grounding supports are telling you, okay, from your segments, which are defined 00:11:33.240 |
by the character count here, where did that information come from? 00:11:39.440 |
So the one thing that at least, you know, Gemini is relatively good at 00:11:43.840 |
is actually saying, okay, this has come from here and this has come from here. 00:11:47.800 |
So when it's generating some information, you can actually map that back to an actual source. 00:11:56.080 |
And then you also have the text that's contained within those segments. 00:12:00.480 |
It's saying, okay, this generated text came from here. 00:12:04.520 |
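A quick sketch of what each grounding support contains, reusing the metadata object from above:

```python
# Each grounding support ties a segment of the generated text back to
# one or more grounding chunks, with a confidence score per chunk
for support in metadata.grounding_supports:
    segment = support.segment
    print(segment.start_index, segment.end_index)  # character range in the text
    print(segment.text)                            # the generated text in that range
    print(support.grounding_chunk_indices)         # which sources it came from
    print(support.confidence_scores)               # one score per source
```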
That, you know, that's useful, particularly when you're giving this information to users. 00:12:11.480 |
It can be hard for them to trust it if they don't see where that information is coming from. 00:12:19.720 |
And then you can also see how this -- how this search was done. 00:12:25.140 |
So we can see that Gemini did a search for recent AI developments and the latest news 00:12:32.280 |
So those are the search queries that Gemini provided to the Google search tool. 00:12:46.080 |
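Those queries are exposed directly on the metadata:

```python
# The search queries Gemini issued on our behalf
print(metadata.web_search_queries)
# e.g. ['latest AI news', 'recent AI developments']
```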
So we can actually click through on these, and it will go through to where it got the information from. 00:12:57.240 |
It's going to the blog and the latest news -- latest AI news, September 2024. 00:13:12.360 |
Maybe that's where the information is coming from. 00:13:14.080 |
Scoring 83% on the International Mathematics Olympiad qualifying exam with estimated IQ 00:13:32.560 |
So that's where the information is coming from. 00:13:34.640 |
So it is good to actually see, okay, it is pulling in this information for sure. 00:13:46.560 |
But you know what I really like in these interfaces when you have a chat? 00:13:54.040 |
When it tells you, okay, the information is coming from a specific source; like you have the 00:13:58.840 |
text and it has a little one in square brackets, with a link to the source there. 00:14:05.380 |
So we're going to replicate that sort of interface here using these grounding support objects. 00:14:12.260 |
So what we need to do is basically look at, okay, the start index, and insert the citation links there. 00:14:22.780 |
Now this could get quite messy because we've got all these values and metadata and so on. 00:14:28.740 |
So what I'm doing here is just keeping things a bit cleaner. 00:14:33.460 |
We're going to use Pydantic; we're going to use a BaseModel. 00:14:36.960 |
And we're going to use that to define a citation object. 00:14:39.900 |
That citation object is basically going to represent, for each of those grounding supports 00:14:45.140 |
we had, a single citation within our text. 00:14:52.420 |
There can be multiple citations for every source. 00:14:56.220 |
That's why there are many more segments, and citations, basically. 00:15:01.460 |
Or, yeah, there are many more of these segments than there are of the actual sources themselves. 00:15:10.700 |
So in any case, each citation will have a title and a link, which is going to map to the source. 00:15:22.260 |
And then each one of them also has their own individual score, start index, end index, and text. 00:15:34.100 |
Our end index and start index are basically the character indices within our text. 00:15:40.180 |
And then we also have the confidence scores here. 00:15:42.060 |
I'm not actually doing anything with these. 00:15:46.380 |
If it's not particularly confident on something, you could say, okay, maybe we shouldn't include that. 00:15:52.740 |
And you know the start and end index, so you could actually pull that segment out, 00:15:57.460 |
if you want to be, like, super precise or just careful with what you're telling people. 00:16:06.940 |
But anyway, here I'm not doing that. 00:16:14.980 |
Now, one of the two methods that are useful here is just get_link. 00:16:25.940 |
So it's just, like, in italics, and it's giving you a little marker: square brackets 00:16:29.680 |
with a one, or a two or a three, with a link. 00:16:35.540 |
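A minimal sketch of that citation model; the exact field names here are my own rather than the notebook's:

```python
from pydantic import BaseModel

class Citation(BaseModel):
    index: int        # the citation number shown in the text, e.g. 1
    title: str        # source page title
    link: str         # source URI
    score: float      # confidence score for this support
    start_index: int  # character range of the cited segment
    end_index: int
    text: str         # the generated text within that segment

    def get_link(self) -> str:
        # An italicised Markdown marker like *[1](https://...)*
        return f"*[[{self.index}]({self.link})]*"
```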
So you can imagine if we're inserting these citations and links and stuff into our 00:16:41.980 |
markdown, the size of the overall markdown increases. 00:16:47.280 |
So then if you're iterating through and adding them one by one -- I mean, you could just 00:16:52.780 |
go in the other direction, but I didn't do that. 00:16:56.460 |
Well, anyway, yeah, you could probably simplify that. 00:17:02.380 |
So what I'm doing is I'm going from the start of the text, I'm going through, and obviously 00:17:07.740 |
every time you add a citation, you're increasing the number of characters. 00:17:12.420 |
So you're actually modifying where this needs to land, like, the next start index. 00:17:22.020 |
Probably you could go in the other direction and you wouldn't need to do that, but I didn't. 00:17:39.220 |
I'm just creating a list of citation objects based on those grounding chunk things. 00:17:53.640 |
Then we are just sorting those citations by their start index, right? 00:17:59.440 |
So this is -- so that we're going through one by one from the start until the end. 00:18:05.120 |
Again, as I kind of alluded to just now, you could go in the other direction and then you 00:18:09.200 |
wouldn't need to modify the character count, but it's fine. 00:18:19.820 |
We can use those methods I defined, so like the get_link, to just very easily create a citation marker. 00:18:27.360 |
So this text you see here is just a markdown link when it gets rendered, and it looks fine. 00:18:34.640 |
So I'm going to run this, and we're going to just take a look. 00:18:39.120 |
So what we're doing here, we're going through our citations. 00:18:42.480 |
We are taking the final response, and this is a bit where I overcomplicated it, I think. 00:18:49.240 |
So the final response, you get the start, or you get the chunk of your final response 00:18:54.080 |
before where you want to insert your citation. 00:18:59.640 |
Then you add the rest of the response after that. 00:19:04.520 |
You could do it in the other direction if you go in reverse with the citations, and 00:19:09.400 |
you wouldn't need to do this bit here, which is adding an offset to your next citation. 00:19:19.400 |
So basically every time we insert stuff, we're adding this offset so that we put the next citation in the right place. 00:19:25.720 |
But as I said, we don't actually need to do that if you do it differently. 00:19:32.240 |
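Roughly, the build, sort, and insert logic looks like this, under the same assumptions as the Citation sketch above (with each marker spliced in at the end index of its segment):

```python
# Build one Citation per (support, source) pair from the metadata above
citations: list[Citation] = []
for support in metadata.grounding_supports:
    for j, chunk_idx in enumerate(support.grounding_chunk_indices):
        chunk = metadata.grounding_chunks[chunk_idx]
        citations.append(Citation(
            index=chunk_idx + 1,
            title=chunk.web.title,
            link=chunk.web.uri,
            score=support.confidence_scores[j],
            start_index=support.segment.start_index or 0,  # can be None on the first segment
            end_index=support.segment.end_index,
            text=support.segment.text,
        ))

# Sort by position so we insert from the start of the text onwards
citations.sort(key=lambda c: c.start_index)

final_response = response.text
offset = 0  # grows as inserted markers push later indices to the right
for citation in citations:
    position = citation.end_index + offset
    marker = citation.get_link()
    # Splice the marker in at the end of the cited segment
    final_response = final_response[:position] + marker + final_response[position:]
    offset += len(marker)
```

As noted, iterating in reverse from the last citation would avoid the offset bookkeeping entirely.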
So you can see here that we now have these inserted citations, and you can click through on them. 00:19:40.040 |
It's going to take us to this AI News page, and we must have got something from here. 00:19:47.160 |
Or we click on this one, it takes you through, and you have all of your links there. 00:19:52.640 |
Of course, in an actual application, you could make all this a lot more dynamic and nice, 00:19:57.880 |
but you can see it didn't take us long to do that. 00:20:00.400 |
It was pretty straightforward, and we're getting nicely grounded responses with our citations. 00:20:06.140 |
Now finally, okay, we have all of this, but one thing that I do kind 00:20:11.320 |
of like having, which we don't have yet, is a list of citations at the end. 00:20:15.060 |
So we're just going to go ahead and add that, pretty straightforward, again, all in Markdown. 00:20:20.140 |
So now we have all the stuff we had before, all of our links. 00:20:25.400 |
Then I just added these citations at the end, right? 00:20:29.560 |
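Appending that list is a few lines of Markdown; the exact formatting here is my own:

```python
# Deduplicate sources by citation number and append them after the body
sources = {c.index: c for c in citations}
lines = [f"{i}. [{c.title}]({c.link})" for i, c in sorted(sources.items())]
final_response += "\n\n**Sources:**\n\n" + "\n".join(lines)

display(Markdown(final_response))
```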
So pretty straightforward, nice quick introduction to using Gemini 2 with the Google search tool, 00:20:39.080 |
which of course they've integrated quite easily, given it's Google. 00:20:45.320 |
And it just gives you these nicely grounded results. 00:20:51.400 |
So why use citations? Well, it's important for various reasons, like the typical RAG argument. 00:20:56.360 |
So without augmenting your LLM, your LLM will only know information up to the date that it was trained. 00:21:05.040 |
The world of the LLM is just what it was trained on, right? 00:21:13.760 |
But if you want your LLM to give you accurate answers about like recent events, you need 00:21:18.560 |
it to have access to some external information, which is what we do with RAG, right? 00:21:25.120 |
We're creating our database, like a custom database, throwing that somewhere, and then retrieving from it. 00:21:31.040 |
With this web search, we're doing, you know, a similar thing, but we're just using 00:21:34.960 |
Google search directly and giving our LLM access to it, right? 00:21:40.400 |
And also, when you're asking about specific things, you have this ability, like we did 00:21:47.360 |
here with these links, you have the ability to provide the references to where that information 00:21:54.200 |
came from, which I think is pretty important because LLMs can just make stuff up, and they often do. 00:22:05.100 |
So just having those links there helps: if the LLM said something and you're like, "I'm not entirely sure about that," you can go and check the source. 00:22:13.240 |
And also for us, like developing these things, we have this confidence score, right? 00:22:17.340 |
That confidence score, I think, could be pretty useful. 00:22:19.560 |
I haven't tried using it, but I think that could also be pretty useful in just flagging things. 00:22:24.480 |
Either flagging things to the user where you're like, "Ah, I'm not too sure this is actually 00:22:27.680 |
true," so maybe you might want to go check that link, or just saying if it's below a 00:22:33.600 |
certain threshold, maybe we just don't return it to the user at all. 00:22:38.660 |
So that is potentially pretty useful, I think, though I haven't thought too much about it. 00:22:46.720 |
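For example, a hypothetical threshold-based filter could look like this:

```python
# Hypothetical cut-off; you would tune (or drop) this for your application
CONFIDENCE_THRESHOLD = 0.7

confident = [c for c in citations if c.score >= CONFIDENCE_THRESHOLD]
uncertain = [c for c in citations if c.score < CONFIDENCE_THRESHOLD]
# 'uncertain' citations could be flagged to the user or withheld entirely
```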
So thank you very much for watching, and I will see you again in the next one.