Intro to GraphRAG — Zach Blumenfeld

So as you come in, we have servers set up with everything you'll need if you want to follow along. You should have gotten a post-it note; if you don't, just raise your hand and my colleague Alex over here will come find you and provide you with one. Basically, if you have a number that's 160 or below, you go to the first link here (there's a QR code on top as well), and if you have a number that's 201 or above, you go to the second link or QR code. From there I'll give you some directions: you're going to clone a repo and then move an environment file over really quick. Also, for everything in this deck I created a workshop-graphrag-intro Slack channel, so if you're part of the AIE Slack group you can go there, grab this deck, and get the links, or however you'd like to do that. We'll get started here in just a couple of minutes.

All right, so, sorry: your number will give you your username and your password. It's basically "attendee", all lowercase, then your number, and that'll be both. There's also the other link that you should try to open, the browser preview, but I'll walk you through that in a second. I'll give it just another minute here for everyone to file in and get situated. Again, the username and password are going to be "attendee", all lowercase, and then the number that you have; the username and the password are the same.

The servers and notebooks will be down after the session, but you have the GitHub link you can go back to, so the code is there for you to use; it's just that the environment won't be available afterward.
All righty, so I'm going to go ahead and get started. I'll leave this screen up for a second, so if you want to grab the QR codes, now would be the time to do it; obviously you can also go to the Slack channel and pick up this deck.

So we're going to do an intro to GraphRAG workshop today. I was debating what to actually put in this workshop, since everything's changing so quickly, and some of my colleagues convinced me not to make it too complicated, so this is going to be very much an introductory-level course. If you want to look at more advanced GraphRAG techniques, or integrations with things like MCP, we have that at our booth, and we have some other events tomorrow that go over some of that; I'll have links to all of it as we go through. Basically, we're going to get everything set up, which hopefully should only take a few more minutes, and then we have three modules. We'll go over some graph basics: we'll be using Neo4j today, which is a graph database, and we'll cover how to query it and how to construct your logic to retrieve data. We'll also have another module on unstructured data and how to do entity extraction.
All right, by the way, how many people have used Neo4j, like have written Cypher queries? Okay, some folks in here. And how many people have used LangChain before? Okay, a fair number of you; that's good to know. Then in our third module we'll use LangGraph: we'll build a very simple agent that uses some retrieval tools, and you'll get to see how some of that works. We'll wrap up after that with some resources. So make sure to ask questions straight away; raise your hand. I'll stop intermittently, but we only have 80 minutes, so if you do have a question, go ahead and raise it and we'll get it answered, because we're going to be moving through the material a little bit quickly.

As I said before, we have two Jupyter servers set up, so you don't need to pip install anything; you can just connect to the notebooks. Attendees, as I already explained, you should have a number; if you don't, go ahead and raise your hand. The username and password are just going to be "attendee" followed by your number. If your number is 160 or less, you go to the first link; 201 or larger, you go to the second link. Also, if you can, go ahead and open browser.neo4j.io/preview. I'll show you in a little bit; you're going to log into that as well, and it will let you visualize the graph a little bit better as we start putting data into it. Yes, all right, Alex, we can get a number over here.
All righty, once you're inside your environment, what I want you to do is run these two commands. You're going to open a terminal window in Jupyter (you can do that by pressing the little plus sign) and then run git clone. The command should actually be in the README: it says git clone and gives you the link to the repository you need to clone. Once you've done that, copy the workshop environment file over into the genai-workshop talent folder. That environment file, ws.env, has the connection information for a database that's already set up, and it also has an OpenAI key inside it that we'll be able to use for the workshop.

Just to show you what this looks like: if I go over here and look at my terminal (I've already done this; let me make this a little bigger), you just run git clone. If you go to the README that was in the main folder, it has the GitHub URL, so you basically run git clone with that, and after that you copy the workshop file into the genai-workshop talent directory. You'll get that talent folder as a subdirectory in here, and you just have to copy the file in; it has the resources you'll need to log into the browser and to connect through your notebooks.

The other link inside the deck is the browser.neo4j.io/preview link. Basically, that gives you a way to visualize the graph. So if I go ahead and disconnect and then connect to an instance, you should get a screen that looks like this. That workshop file will have the credentials, and it should actually be the same thing for you: "attendee", your number, and the same thing for the password. Go ahead and make sure you do that, because then you'll be able to visualize the graph a little bit better.
So for the Jupyter environment, if you got your number, it's "attendee", all lowercase, and then your number, for both the username and the password. Sorry, to repeat: it's "attendee", all lowercase, and then the number that you received, for both the username and the password. And if you want to come back to this, and you're connected through Slack, the channel is workshop-graphrag-intro; you can go there and pick up the slides as we move on, so that way you have a constant reference back to them.
So while everyone gets set up, I'll talk a little bit about what GraphRAG is in general, to motivate what we're doing here. This is an architecture that represents what some of our customers do; it's a very common, generalized architecture for GraphRAG users. The idea is that you have your agent over there, your AI models, and your UI, all the normal things you might think of if you're putting together a knowledge assistant, but then there's this knowledge graph in the middle. You can ingest both unstructured and structured data into that knowledge graph: unstructured being things like documents and PDFs, and structured being tables like CSVs or data from a relational database, or what have you.

So there's a big question of: why the heck do we need this knowledge graph in the middle? We have agents, we can have tools, and we can go pick stuff from data sources directly. The idea is that if you have a use case and you roughly know the types of questions you want to answer with your agents, then by taking your data and decomposing it into even a very simple knowledge graph to start, you're able to expose a lot of the domain logic you'd want to apply through the model of your data. As we'll see when we build a skills graph, we'll make relationships about people knowing skills, and by making that schema available to the agent, along with tools, you get a lot more control over how data is retrieved, you can retrieve more accurately, and you can explain the retrieval logic better.

We see this as especially important as we move more and more into this agentic world, because it's not a one-shot vector search anymore. When questions or prompts get handed to an agentic workflow, they get broken down in various ways, and when you have a knowledge graph, it lets you offer retrieval logic to complement that in a much simpler and, in my opinion, better manner.

Today we're going to be looking at a skills-and-employee graph. The use case is that you're building a knowledge assistant to help with things like searching for talent, aligning and analyzing skills within an organization, and doing things like staffing, team formation, substitutions, and things of that nature.
So I'm going to present a little bit about what we'll go through in these modules first. I'll do some stuff inside the deck, which hopefully won't take too long; I just want to talk to you about Cypher and some of the things you'll be seeing before we dive in. We're going to talk about creating a graph. We'll start with some structured data here just to keep things simple, and I'll introduce unstructured data a little bit later. We'll cover some basic Cypher queries and some algorithms, and then get into vector search and semantics.

So, a knowledge graph: generally it's defined as a set of design patterns to organize and access interrelated data. At Neo4j we model the data inside the database as what's called a property graph, which consists of three primary elements. First are nodes; these are like your nouns, your people, places, and things. Next are relationships; these are how things are related together, hence the name, and they're often verbs: person KNOWS person, person LIVES_WITH person, person DRIVES or OWNS a car. Both nodes and relationships can have properties, which are just attributes: they can be strings, numbers, arrays of things, and vectors as well. We've been able to store vectors inside Neo4j for a long time, and you can do search over them. The query language we're going to use to access the database is called Cypher.
I know a lot of you raised your hands at the beginning, so you already have some familiarity with this, but Cypher kind of looks like ASCII art. It has a SQL-ish feel to it, but you get to write pattern statements: if you see MATCH (person)-[:KNOWS]->(skill), you're connecting a Person node to a Skill node through that KNOWS relationship, so it reads very literally in the way it's written. Nodes have what are called labels, which are roughly the equivalent of a table type in a SQL database; basically, what type of entity it is. And, as I said before, nodes have properties, so you can identify one by a property like name, and you can bind variables like p and s, which refer to the actual entities as you build up your query.

This is not going to be a course on writing Cypher, because we could make an 80-minute course just on that, but we'll be walking through these queries. If you haven't seen Cypher before, don't expect to be a super expert in the query language when we're done; just know that this is roughly how it works, and as you go through, hopefully you'll get a better understanding and a feel for how these queries work and for the types of data they return.
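To make the shape of these statements concrete, here's a hedged sketch of the kind of pattern just described. The Person and Skill labels and the KNOWS relationship match the workshop's data model, but the exact query text and parameter name are illustrative, not the notebook's code:

```python
# A Cypher pattern reads almost like a diagram: (p:Person) and (s:Skill)
# are nodes bound to variables p and s, and -[:KNOWS]-> is the
# relationship connecting them.
cypher = """
MATCH (p:Person)-[:KNOWS]->(s:Skill)
WHERE s.name = $skill_name
RETURN p.name AS person
"""

# With the official neo4j Python driver (credentials from ws.env),
# running it would look roughly like:
#   from neo4j import GraphDatabase
#   driver = GraphDatabase.driver(uri, auth=(user, password))
#   records, _, _ = driver.execute_query(cypher, skill_name="Python")
```

The `$skill_name` token is a query parameter, which keeps values out of the query text and lets the database cache the query plan.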
And I'm sure everyone here is pretty familiar with vector search at this point. Yeah, I have a feeling this audience would be, so I won't spend too long on it. I think we all know what embeddings are: basically a type of data compression. You can apply them to all sorts of things, text, audio, even graphs. Often it's just a vector of numbers, and you can use it to find similar things within that domain space: find texts that are similar semantically, not just lexically, actually based on the kinds of things they're talking about. Within Neo4j you have search indices, including vector indices: there are range indices, uniqueness constraints, text search, full-text search with Lucene, and approximate nearest-neighbor vector search as well, which we'll be leveraging as we go through, in combination with the Cypher queries we were just looking at, to do graph traversals.
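For reference, creating and querying a vector index in recent Neo4j versions looks roughly like this. The index name, the embedding property, and the dimension count are assumptions for illustration, not the workshop's actual configuration:

```python
# Create an approximate-nearest-neighbor index over a node property
# that stores an embedding vector.
create_index = """
CREATE VECTOR INDEX skill_embeddings IF NOT EXISTS
FOR (s:Skill) ON (s.embedding)
OPTIONS {indexConfig: {
    `vector.dimensions`: 1536,
    `vector.similarity_function`: 'cosine'
}}
"""

# Query the index for the 5 nodes nearest to a query embedding.
ann_query = """
CALL db.index.vector.queryNodes('skill_embeddings', 5, $query_vector)
YIELD node, score
RETURN node.name AS skill, score
"""
```

Because the result of `queryNodes` is a set of nodes, you can keep matching outward from them in the same query, which is what combining vector search with graph traversal means in practice.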
The next thing to know about is that in addition to being able to query the database, we also have analytics: graph analytics powered on the database that let you do different types of data enrichment and more graph-global analytics. That means finding which nodes are most central according to different algorithms, doing community detection (how do you cluster the graph?), finding paths between nodes, and doing different types of embeddings. We have a lot of those algorithms, and we'll touch on them very briefly today in the first module, just to show that once you have a knowledge graph, you can start enriching that data. In our case we'll use community detection to summarize skills inside our graph, and then we'll be able to pass that on to an agent to explain some parts of our graph for our use case.

All righty, so with that in mind, we'll go ahead and jump into the first notebook. Are there any questions before we dive in? Is anyone still... okay, yes, over here.
Yes, do you have... let me just go back here. So, this is available in the Slack channel too. If you don't have a number, my colleague Alex over there can grab one for you. We're in the workshop-graphrag-intro Slack channel, and you can go there to grab the deck and all the links. Basically, if your number is 160 or below, you go to the first Jupyter server; if it's 201 or above, you go to the second one. Use "attendee", all lowercase, and then your number as both your username and your password. You'll do that for the Jupyter notebook and also for the Neo4j browser, if you want to follow along with visualizing the graph as we go through.
Any other... yes? [Audience] I know this is an introduction to GraphRAG, but when you're building these graphs, I see you have a small graph here; how do you prioritize whether you should build big graphs, ones that really scale, or make smaller graphs?

So your question is about data modeling, and how you prioritize making one graph versus multiple graphs. It's a good question. In general, for a lot of what we're seeing with agents, I find it's helpful to have a smaller data model if possible, especially if you're doing different types of dynamic query generation, so keep that in mind. But things are getting better: we can pull back the graph schema and offer it to agents, and we're noticing that as language models keep iterating, they're getting better and better at interpreting it. Whenever you want to do traversals in a low-latency way between two data points, those things really should go in the same graph; then it's a question of what you make a label versus a property in that scenario. We'll go through some of it, and if you want to talk after, come by our booth and we can have a more use-case-focused conversation. Anything else? All righty, so I'm going to go ahead and dive into the notebook.
All right, so you're just going to come down here and start. Remember, we're in the talent subfolder; there are two workshops in here, and the one we'll be doing is called talent. If you're in the other one, there's also some interesting stuff in there, but you won't be able to follow along.

All righty, it looks like I'm running now. Basically, what I'm going to do first is get my environment file and load it. If you don't have the environment file here, it's in the root directory; just go ahead and move it into this subdirectory. Then the first thing we do is load our skills dataset, which is a table, and if we look at that table, we have basically three fields: an email field, a name field, and then a list of skills for each person.
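To picture the shape of those rows, here's a hedged sketch: the emails and skills are made up, and the chunking helper stands in for the "organize the data for loading" step the notebook does with a data frame (the chunk size is arbitrary):

```python
# Each row of the skills table: an email, a name, and a list of skills.
rows = [
    {"email": "lucy@example.com", "name": "Lucy",
     "skills": ["Python", "Scrum", "Tableau"]},
    {"email": "sam@example.com", "name": "Sam",
     "skills": ["Python", "Flask"]},
]

def chunks(records, size=1000):
    """Yield fixed-size batches so each load query handles one chunk."""
    for i in range(0, len(records), size):
        yield records[i:i + size]

# With size=1, two rows become two batches of one row each.
batches = list(chunks(rows, size=1))
```

Batching like this keeps each transaction small, which matters once the real dataset has many thousands of rows.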
As I said before, we'll go into a little bit of detail later around how you might extract this from documents like résumés, but for now, because we're interested in this skills-mapping, team-formation, and staffing kind of use case, we're starting with this very simple dataset to get us going. There are a couple of steps here that just organize the data to make it easy to load, and then we start to create our graph.

A lot of this is what we'd call basic Neo4j data loading. We create chunks out of our data frame, and you should check to make sure there's nothing in your database. I do have stuff in my database because I was just running this before, but that's on me; yours should say zero. The first thing we do is set a constraint. Inside Neo4j, whenever you create nodes, if you have what's called a node key constraint or a uniqueness constraint, you're basically saying, in this case, that the email has to be unique and non-null for all your people. That makes it very fast to match on people and do merge operations. A lot of times people will say "Neo4j is really slow," and that's often because of simple mistakes like not setting a constraint, which forces the database to do very expensive scans every time you look up a user, rather than hitting a unique index. You do the same for skill, because our data model is going to be person and skill, so we have two types of nodes, and once we've run that, we'll have two constraints inside the database, one for Skill and one for Person.
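Those two constraints can be written roughly like this. The constraint names are illustrative, and note that node key constraints need Enterprise or Aura; `IS UNIQUE` is the closest Community-edition equivalent (without the non-null guarantee):

```python
# Node key constraint: email must be unique and non-null for Person.
person_constraint = """
CREATE CONSTRAINT person_email IF NOT EXISTS
FOR (p:Person) REQUIRE p.email IS NODE KEY
"""

# Same idea for Skill, keyed on its name.
skill_constraint = """
CREATE CONSTRAINT skill_name IF NOT EXISTS
FOR (s:Skill) REQUIRE s.name IS NODE KEY
"""
```

Each constraint is backed by an index, which is what makes the MERGE operations in the next step fast.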
After that, we go ahead and start loading our nodes and relationships. The way this query works (I guess I won't run it, even though it wouldn't actually change anything in my database) is that we loop through chunks of our data frame and say: merge a person on email, set their name, and then, for that person's list of skills, merge each skill on its name and merge a relationship saying the person KNOWS that skill. So it creates this person-KNOWS-skill pattern in the database.

Once you've run that, if you have the browser window open that we were going over before, you can copy one of these queries over; maybe I'll take this one first. This will show you what's inside the database: if I just match people, I get my people back. And I may have lost... oh cool, I still have my internet connection. All right, so I can see that I have my people with their names and their email addresses. You can do the same thing for matching skills, and you can also look for relationships. This gets into the pattern matching that we were talking about before with Cypher; this is a very simple version of matching a path. I'm saying p, which is a path, equals a node connected through KNOWS to another node, with LIMIT 25, and that returns a graph where I get to see all these different relationships. It looks like my internet connection is still somewhat slow, but I get it back: I see my people, I get the KNOWS relationships (in this case this person knows API design, Tableau, and Flask), and different skills pop up here inside the graph.
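The loading query described above follows this general shape; it's a hedged reconstruction rather than the notebook's exact code, where `$rows` would be one chunk of the data frame:

```python
# For each row: merge the person on email, set their name, then merge
# each skill and the KNOWS relationship connecting them.
load_query = """
UNWIND $rows AS row
MERGE (p:Person {email: row.email})
SET p.name = row.name
WITH p, row
UNWIND row.skills AS skill_name
MERGE (s:Skill {name: skill_name})
MERGE (p)-[:KNOWS]->(s)
"""
```

Because of the constraints set earlier, each MERGE is an index lookup rather than a scan, and re-running the load is idempotent: MERGE only creates what doesn't already exist.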
And you can also run these through our driver here to look at the data, pull back the different people that are in there, and find out what skills they know.

[Audience question about the KNOWS relationship.] KNOWS is a relationship type we are making up; it's our domain model. I can actually call the schema here, and mine will show more than yours if you run the same command, because it has some other stuff from later in the course, but basically you have person KNOWS skill. That's our data model. [Audience: so "person HAS skill" would be another way to put it?] Exactly, exactly. And it's actually funny, because this is becoming even more important now that we're using LLMs to design queries: the language that you use is sort of an annotation for the model, so that starts to become very interesting.
All righty, so there are some Cypher queries here that I'll run through really quick, and depending on time I may need to speed things up through this notebook, because I want to make sure we actually get to the agent at the end. If you go into the deck, there's this link, browser.neo4j.io/preview, and I think it's just your username and your password; but you can also look inside your workshop environment file, and it will have that information: your URI, your username, and your password.
All righty, so like I said, we'll go through some of these. For example, we can count in Cypher: we can match person-KNOWS-skill, return the skill name, and count the distinct people for each skill. You can think of it as: I've got all my skills, and I'm counting the distinct people that know each one; it's very simple. When we get that back, we see what our most popular skills are.
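That counting query can be sketched like this (a hedged reconstruction consistent with the description, not necessarily the notebook's exact text):

```python
# For each skill, count the distinct people who know it, most
# popular first.
popularity_query = """
MATCH (p:Person)-[:KNOWS]->(s:Skill)
RETURN s.name AS skill, count(DISTINCT p) AS people
ORDER BY people DESC
"""
```

In Cypher, aggregation is implicit: any non-aggregated expression in the RETURN clause (here `s.name`) becomes the grouping key, so there's no separate GROUP BY.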
Going down the list, they're all pretty tech-focused. We can also ask different types of multi-hop questions, which is very interesting. For example, I'll take this query and copy it over to my browser, because it's interesting to see these visually. What we're asking here is: take this person named Lucy, and tell me which people are similar to Lucy in the sense of knowing the same skills. If I run that, I get Lucy, all of her skills, and all the other people who know those skills.

And you can build on that iteratively. If I go back here, I can say: now, for all of those people, I also want the skills they know. I basically add "and what skills do these other people know" to the end of the query, and I get a very large graph back. The idea is that once we have this logic extracted from whatever our original data source is, we can control, at a much finer-grained level, how we define what a similar person or a similar skill is, because we have the ability to traverse the graph and apply concrete logic. It's basically like having your information in symbolic form, versus just a sub-symbolic vector. So you'll get a lot back, because now we're looking at people and all the other skills they know, and I can go in here and find the most central skills among these people. For example, Scrum is very central among this group, because a lot of people know that skill. So I'm learning about this local community that knows skills similar to Lucy's.
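The two traversals just described can be sketched like this; the parameter and variable names are illustrative, and the second query simply adds one more hop to the first:

```python
# People similar to a given person, via shared skills: one hop out
# to a skill and back to another person.
similar_people_query = """
MATCH (p1:Person {name: $name})-[:KNOWS]->(s:Skill)<-[:KNOWS]-(p2:Person)
WHERE p1 <> p2
RETURN DISTINCT p2.name AS person
"""

# Extend the pattern by one hop: what other skills do those people know?
second_hop_query = """
MATCH (p1:Person {name: $name})-[:KNOWS]->(s1:Skill)
          <-[:KNOWS]-(p2:Person)-[:KNOWS]->(s2:Skill)
WHERE p1 <> p2
RETURN DISTINCT s2.name AS skill
"""
```

The `WHERE p1 <> p2` guard keeps Lucy out of her own similarity list, and DISTINCT collapses the many paths that reach the same person or skill.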
And in here there are just some examples of running that same logic inside the notebook. Yes?

[Audience question about viewing the graph.] So to get to the graph, you go to that Neo4j browser link. You'll see table and graph views, and sometimes, if you're not returning nodes, it will only return a table. In that case it should return the path; if you don't see a graph, it's usually because you're returning just a name or something, in which case it will just show you a list of names.

[Audience question about DISTINCT.] And I don't have a DISTINCT here... oh yes, I did. I don't know if it's completely necessary for this one, actually; yeah, I don't think it is. There are times when you do very complicated queries (we'll see a couple of other examples where we do multi-hop paths) where there's a chance you'll get basically two paths that are the same, in which case having the DISTINCT there just allows you to deduplicate them.
[Audience question about how well LLMs generate Cypher.] A lot of it depends on the complexity of your schema. Basically, we see that for simpler aggregation queries, or when you have a lot of prompt engineering around very specific types of path queries, smaller models can do well. We do often recommend that you have your own expert tools if there's a really complicated type of traversal you want to do: you can write your own Python functions, or you can have your own MCP server that just exposes a set of functions for your more complicated traversals. We also see that you can restrict the options for the LLM: instead of having it write a complete query, you can say, here are three general patterns, have it write just that part of the pattern, and slot it into another query. You can do things like that to help it. We've also released fine-tuned models, I think back in April; they're fine-tuned from Gemma and they're on Hugging Face, so you can try using those as well; they can do a little bit better. We're not going to use them here, though, because we're using OpenAI models for this workshop. All righty, so a lot of this section, as I was just going over, is basically just running these queries.
Returning DISTINCT names matters here because you might actually get to the same person multiple times; if you're just returning the name of a skill, it's important to use DISTINCT in that case, so we get all the distinct skills that showed up in the graph we were just looking at.
Another thing that might be important for our use case is finding similar people. This uses the query we were just going over: we used Lucy before, but now we can parameterize the name, match through the KNOWS-skill pattern, and then from that skill match out to another person, counting the number of shared skills between people. When we do that we can see the number of shared skills between different individuals, and we do it again here for, I think, a different set of people; this one just counts the most skills shared between any two people. So that's another way to measure similarity: beyond semantic similarity, we measure exactly which skills our model says two people share. 00:33:57.180 |
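The shared-skills traversal described above can be sketched in plain Python against a tiny in-memory stand-in (the names and skills here are made up for illustration):

```python
# Toy stand-in for the (:Person)-[:KNOWS]->(:Skill)<-[:KNOWS]-(:Person) pattern.
people = {
    "Lucy": {"Python", "SQL", "AWS"},
    "Omar": {"Python", "SQL", "Tableau"},
    "Dana": {"Swift"},
}

def shared_skill_counts(name):
    """Count skills shared between `name` and every other person,
    mirroring the parameterized two-hop Cypher traversal."""
    mine = people[name]
    return {
        other: len(mine & skills)  # sets dedupe, like DISTINCT in Cypher
        for other, skills in people.items()
        if other != name and mine & skills
    }

print(shared_skill_counts("Lucy"))  # {'Omar': 2}
```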
As we go through this, one thing that can speed up our queries, and this is optional, is: if we know we're going to look for similar skill sets a lot, we can create a SIMILAR_SKILL_SET relationship inside the graph. We match two different people and MERGE a SIMILAR_SKILL_SET relationship based on their skill-overlap count; we have the data frame for that locally, pulled back when we were looking at those shared skills. That creates this relationship between people, and if I look at it over in my browser, some of these have an overlap of one, others greater; I think in here they go up to three. All this is doing is saying, for two people, what is their overlap, so with that SIMILAR_SKILL_SET relationship we don't have to run the full traversal over and over again if we don't want to. 00:35:18.140 |
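Materializing the overlap as a relationship might look roughly like this; the label, property names, and row shape are assumptions for illustration, and the driver call is left as a comment since it needs a live database:

```python
def similar_skill_set_query():
    """Hypothetical sketch of writing the precomputed overlap back to the
    graph; label and property names are assumptions, not the notebook's."""
    return (
        "UNWIND $rows AS row "
        "MATCH (a:Person {email: row.email1}), (b:Person {email: row.email2}) "
        "MERGE (a)-[r:SIMILAR_SKILL_SET]->(b) "
        "SET r.overlap = row.overlap"
    )

# Rows would come from the data frame of shared-skill counts:
rows = [{"email1": "a@x.com", "email2": "b@x.com", "overlap": 2}]
# driver.execute_query(similar_skill_set_query(), rows=rows)  # needs live Neo4j
print("MERGE" in similar_skill_set_query())  # True
```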
The next thing I wanted to show you... yes? [Audience] But if you do this, it's static: if you want to update the overlap, you need to run this query again? You would need to run that query again, yeah. It depends on how often your data gets updated. There's nothing actually wrong with doing the multi-hop query over and over; the graph database is designed to handle that. So if you had a graph that was constantly getting updated, you might not want to materialize this relationship at all. 00:35:49.580 |
The next thing I wanted to show you is how graph analytics works inside Neo4j, and how to use it to enrich the graph. This creates what we call a GDS, or Graph Data Science, client. What we're doing here is creating something called a projection, and then running an algorithm called Leiden for graph community detection. How many people here have heard of Leiden as an algorithm? Okay, just a few. How many have heard of Louvain? A couple. Basically, this algorithm breaks the graph down into a hierarchy: it starts by splitting the graph into a few big communities, then into smaller ones. It's trying to optimize a metric called modularity, which says: I want clusters in my graph where connections within a cluster are dense and connections across clusters are sparse, so I'm creating these modules. I'm using that SIMILAR_SKILL_SET relationship here, which is another important reason to create it: if you run analytics on your graph, it helps those analytics run a little better, because person connects directly to person with a similar skill set. By running Leiden we get a set of communities where the people within each community know similar skills. This is all simulated data, 00:37:36.940 |
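Modularity, the quantity Leiden and Louvain optimize, can be computed by hand on a toy graph; this is an illustrative sketch of the metric itself, not the GDS implementation:

```python
def modularity(edges, community):
    """Newman modularity Q for an undirected graph.
    edges: list of (u, v) pairs; community: dict node -> community id."""
    m = len(edges)
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # Q = sum over communities of (e_c / m - (d_c / 2m)^2),
    # where e_c = edges inside community c, d_c = total degree in c.
    q = 0.0
    for c in set(community.values()):
        e_c = sum(1 for u, v in edges if community[u] == c and community[v] == c)
        d_c = sum(d for n, d in degree.items() if community[n] == c)
        q += e_c / m - (d_c / (2 * m)) ** 2
    return q

# Two tight pairs split into two communities score well:
edges = [(0, 1), (2, 3)]
print(modularity(edges, {0: "a", 1: "a", 2: "b", 3: "b"}))  # 0.5
```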
but if I go down, and I'll skip over some of this, we do some checks around how good the communities are; I encourage you to run all of it so that the agent works well at the end. What I wanted to show you is this graphic here. We wrote the community-id property back to the graph, so you get these different community ids, and the heat map shows which communities have the most skills in a given area. Since this data is randomly generated, a lot of these patterns will look a little funky if you really dig into them, but the idea is that with more relevant, realistic data this can actually show you: your data engineers are here, your front-end folks are over here, your ML people are over there. You can start to see that within the graph and really break it down. But do go ahead and run everything here, through the g.drop() at the end, so that you have that property. 00:38:45.420 |
Another way we can break down different groups inside the graph is to look at... go ahead. [Audience] Sorry, one question: when do you customize your graph, for example with the community-detection algorithm you're running, and when do you just let the agent handle it? Are there heuristics for when it's better to invest time improving the graph? I think it depends on your use case. If you're very interested in questions like "I want to understand the skill communities inside my company," and that question comes up frequently, then graph analytics can be very beneficial: you can do employee segmentation, understand performance within different groups, and so on. We often see it used for customer segmentation and recommendation systems too. At the same time, if you just want to look for matches between people with similar skills, maybe you don't need community detection, because that's just a pairing exercise. So I'd say you use it whenever you want to do some sort of clustering analysis and persist it, with visibility and confidence in how it was done, rather than leaving it up to a model that's making up how that works. [Audience] So basically you look at what the users are doing and then see whether you need to build it? Yeah, exactly. 00:40:38.300 |
The heat map is showing how often different skills show up within each community. [Audience] Aren't the communities based on what skills they have? Yes. [Audience] So the first community, for example, looks like it's either Tableau or Swift? Right; the point is to understand the skill breakdown within each community. Again, this is generated data, so it's a bit random, but you can imagine that in a non-random scenario you'd see groupings emerge: a lot of product managers versus front-end developers versus DevOps folks, and so on. 00:41:28.540 |
[Audience] Two connected questions: do you have any best practices for data modeling so that an agent understands the data model, or just general GraphRAG best practices for creating data models? A lot of the agent stuff is evolving very quickly as LLMs keep changing and getting better. We've long had guides on migrating data from relational systems to graph and how to think about that; in a graph, nodes are nouns and relationships are verbs, and you connect those together. For agents, it's really nice when the data model reflects natural language: "person KNOWS skill" is a very natural-language statement that translates directly to a data model, and as I said before, simpler data models seem to work better for dynamic query generation. Beyond that, I know "it depends" is a cop-out, but it really does depend on the type of retrievers you have, the size of your data, and the cardinality of different categories of things in your data. For instance, you generally want to avoid having hundreds or thousands of node labels, because that's just a lot; in that case you make them properties instead. So there's a lot to consider; I don't know if that answers your question. We'll see at the end of module three, which we might not have time to get to, but it's in the code and I'll show it quickly: there are functions we provide to pull back the node labels and relationship types from the graph schema, so you can create a JSON representation of what the schema looks like and combine it with specific prompts, so the model follows it. Another thing that helps even more, if you have a graph data model that isn't going to change much over time, is annotating that schema: for specific properties, node labels, or relationship types you can say "this thing does this; when you ingest data, put it here; when you pull data, you can go down these paths." And putting the actual query patterns, like "person KNOWS skill," into the schema helps a lot too, because the model can read that and understand how to do the traversal better. 00:44:11.580 |
All right, anything else? Cool. We're getting close on time, so I'm going to go pretty quickly through the rest of this, but hopefully it'll be understandable. Another way we can think about skills, and the relationships between skills, is how semantically similar they are. So we can make embeddings of our skills: there's another CSV file you read into this notebook that has skills, descriptions, and an embedding. Which field do you think we embedded, the skill name or the description, and why? Right: when you have really short names, like R, which is technically a programming language (a lot of people don't love it; I love R), or AWS, the names carry very little signal, so embedding a description of each skill produces a much more informative embedding. That's the whole idea: we give each skill a description and embed the description. These are all text-embedding-ada embeddings, so they're 1536-dimensional. 00:45:36.060 |
We're going to create a vector property and set the description as well; after that, I think we create our vector index down here, which we call the skills embedding. Once that's set up, you'll see the skills-embedding index show up, and we're now able to do vector search on skills inside the graph. If I have "Python" as a skill, I can use this Cypher command to search the skills embedding and pull back the 10 most relevant skills. You'll see it brings some skills back: Ruby and Java; we got pandas, at least that's good; Django; PyTorch. Some of these are better than others, but the point is we can apply these vectors and pull information back with vector search. Another interesting thing: if I'm looking for something that isn't in the database, say "API coding," and I search that as a term, I'm using the OpenAI client to embed it, or actually it looks like I'm using LangChain up here, and then searching the database to pull back relevant skills above a certain similarity threshold. For that "API coding" example I get back API Design and JavaScript. 00:47:13.580 |
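The threshold-filtered similarity search described above can be sketched in plain Python; the toy 3-d vectors and the 0.8 threshold are assumptions, whereas in the notebook the embeddings come from OpenAI and the search runs through Neo4j's vector index:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy 3-d "embeddings" standing in for 1536-d ada vectors.
skill_vectors = {
    "API Design": [0.9, 0.1, 0.0],
    "JavaScript": [0.8, 0.2, 0.1],
    "Tableau":    [0.0, 0.1, 0.9],
}

def search(query_vec, threshold=0.8):
    """Return skills above the similarity threshold, best first."""
    scored = {s: cosine(query_vec, v) for s, v in skill_vectors.items()}
    return sorted(
        (s for s, score in scored.items() if score >= threshold),
        key=lambda s: -scored[s],
    )

print(search([1.0, 0.0, 0.0]))  # ['API Design', 'JavaScript']
```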
What I can do in this case is say: I have this ability to do semantic similarity in the database, so I can write a SIMILAR_SEMANTIC relationship and attach a score to it. There are some advantages to doing this, but a big one is visualization, and also clustering. If I take this command, which traverses the SIMILAR_SEMANTIC relationship, go into my graph, return all the skills, and zoom in, I start to see interesting groupings. For example, my cloud skills, Azure, AWS, cloud architecture, are all in one place; Flask and Django are connected; my data-analytics group has Tableau, Power BI, data visualization; and then there's a big grouping over here, with the JVM languages like Java, Scala, and Kotlin, my Python stuff with pandas, and, if I go up in that connected group, Java and all the front-end frameworks. So don't underestimate the power of being able to visualize similarities. It's very important: I can create communities from these, and I can use them for customized scoring in my retrieval queries, which we'll see. The other really cool thing is that if, for some reason, I don't think Java should be connected to Python, I can control that: I can remove that relationship, and every time I compute similarity relationships I retain that control and can filter. So those are some important things to keep in mind about what you can control. 00:49:28.620 |
[Audience] When you do semantic similarity, is it only for visualization, or is there anything else? We'll see in a bit; I'll answer this as I go down. I can pull back this SIMILAR_SEMANTIC relationship, and what I can start to do, which is actually pretty cool, is create customized scorings that balance semantic similarity against hard skill matches, and weight that however I want. So you can use it in your retrieval patterns: you can use both. There are a lot of workflows, because you can compose things together into multiple steps; you can pull similar skills and then look for people, and it just depends on how you break down those functions for the agents. 00:50:43.420 |
Sometimes, if you know there's a very specific pattern you want to follow, like this one here, it looks like a really big query, somewhat intimidating, but it's actually not that complicated: all it's doing is weighting semantic similarity against a hard overlap of similar skill sets. That might be a case where coupling the logic together makes sense, when there's a very specific similarity metric you want. [Audience] But what if you're coming from a query, say you have an assistant and a much larger ontology, and it's "tell me about people with certain skills"? How do you know these are even the entities you're looking at? What's the first step when you go from query to retrieval, breaking it into entities so you know this is the query you should be running? Yeah; why don't we revisit that when we get to the third module. The second module should be really quick, and then you can see the functions in the third module, which might help me answer that question a little better; I think I know where you're going, but seeing it will help. So, as I said before, this does a balance between semantic similarity and a hard overlap of skills, and you can use that to weight how you want to find similar people inside the graph: you can balance vector-search similarity against the similarity that comes from hard matches. Another cool thing about a graph database specifically: 00:52:27.180 |
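The blended scoring idea can be sketched as a simple weighted combination; the alpha weight, the overlap normalization, and the function shape are illustrative assumptions, not the notebook's exact query logic:

```python
def blended_similarity(semantic_score, shared_skills, max_shared, alpha=0.5):
    """Blend a 0-1 semantic-similarity score with a hard skill-overlap
    ratio; alpha=1 means purely semantic, alpha=0 purely hard overlap."""
    overlap_ratio = shared_skills / max_shared if max_shared else 0.0
    return alpha * semantic_score + (1 - alpha) * overlap_ratio

# Semantically close but only 1 of 4 skills shared, at two weightings:
print(blended_similarity(0.9, 1, 4))
print(blended_similarity(0.9, 1, 4, alpha=0.8))
```

In the workshop query the same trade-off is expressed in Cypher, but the knob is identical: how much you trust the embedding versus the explicit KNOWS edges.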
I'll take this query here just so you can see what it looks like. Oh, go ahead. I mean, you could go either way. A lot of this, to be honest, comes down to cost considerations: how expensive is it in Neo4j versus inside Postgres, and that varies a lot with the type of infrastructure you have. Having everything in one place means you don't have to sync your data, and query latency is, at least in theory, lower because you're querying a single database. But if you already have data in Postgres, or a specialized vector database, you don't necessarily have to migrate it to Neo4j to make this work. So a lot of it is performance, but really cost per performance: what does each deployment cost? 00:53:35.020 |
So this query here is looking for similarity between two people, and you see I have this `*0..2` notation: with a graph you're allowed to do what are called variable-length queries. I'm saying: go out on SIMILAR_SEMANTIC, but you're allowed to go anywhere from zero to two hops across these skill sets before finding a connection between John and Matthew, and then I UNION that against the plain "person KNOWS same skill" pattern. When we get the results back, you see Matthew knows React, John knows HTML, and those are similar because both have a semantic similarity to JavaScript. Same thing here: this semantic similarity is only one hop, which is where the variable hop comes in. You can control how far out to go on either of these paths to pull back similarities between people; this is just an advantage of a graph database. 00:54:47.260 |
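What a variable-length pattern like `-[:SIMILAR_SEMANTIC*0..2]-` does can be approximated by a bounded breadth-first search; the toy skill graph below is a stand-in:

```python
from collections import deque

# Toy undirected SIMILAR_SEMANTIC edges between skills.
edges = {("react", "javascript"), ("javascript", "html")}
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)

def within_hops(start, max_hops):
    """All skills reachable from `start` in 0..max_hops hops,
    mirroring a `*0..max_hops` variable-length Cypher pattern."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue
        for nxt in adjacency.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen

print(sorted(within_hops("react", 2)))  # ['html', 'javascript', 'react']
```

Zero hops returns just the start node, which is why `*0..2` can match a person's own skill as well as semantically nearby ones.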
Then I think I want to finish off this notebook. I would take a break, except we only have 23 more minutes, so what do you say, should we just power through the last 20? Yeah, let's do that. So we've looked at some of the advantages of using the graph, and semantic similarity inside the graph, and now we'll talk a little about our second module. I won't go over to the slides; for this crowd I can probably hop right into the notebook: what if we just have resumes, not a CSV file? This will be a simple example showing how to take data from text and turn it into useful data for the graph. Again, if you're running this live, you connect to the same workshop file you had before and test the connection; this time you should count 154 nodes. 00:55:51.020 |
If you come by our booth we can show you much more exciting examples than the two text blobs we have here, but here we have two different bios. If you've already done some entity extraction you're probably familiar with this workflow: we define our domain model in terms of Pydantic classes. Here I define my Person with a name, an email, and a list of skills. If you wanted relationship properties, in a more complicated model KNOWS might be its own class with, say, a proficiency property, in which case Person would hold a list of KnowsSkill objects, each wrapping a Skill; but this is a very simple example, so Skill just has a name property, people have a list of skills, and we create a person list. Once the Pydantic classes are defined, we create a system message as the prompt for our model, in this case I used 4.1 here, give it the documents to ingest, and it spits out JSON with those two people: we had two documents, each corresponding to one person, and we got their emails and skills. 00:57:32.060 |
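The domain model can be sketched with stdlib dataclasses as a stand-in for the notebook's Pydantic classes (with Pydantic you would subclass `BaseModel` and get validation and JSON parsing for free); the field names and sample values are assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class Skill:
    name: str

@dataclass
class Person:
    name: str
    email: str
    skills: list[Skill] = field(default_factory=list)

# Shape of the JSON the extraction model returns (made-up values):
extracted = [
    {"name": "Lucy", "email": "lucy@example.com",
     "skills": [{"name": "Python"}, {"name": "SQL"}]},
]
people = [
    Person(d["name"], d["email"], [Skill(s["name"]) for s in d["skills"]])
    for d in extracted
]
print(people[0].skills[0].name)  # Python
```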
With all their different names and such. Once we have that, it's pretty trivial to load, and very similar to what we did last time. If we go down to our graph creation here, you'll see this isn't exactly the query we had before, but it's very similar: we ingest one person at a time, MERGE on the email address, which we have indexed, SET the name, and then for each skill in the person's list we MERGE the skill by name and the KNOWS relationship connecting them. 00:58:11.020 |
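The per-person MERGE described here could look roughly like the following; the labels and property names are assumptions, and the actual driver call is left as a comment because it needs a live database:

```python
# Hypothetical sketch of the per-person ingestion query; in the notebook
# this runs through the Neo4j Python driver.
INGEST_PERSON = """
MERGE (p:Person {email: $email})
SET p.name = $name
WITH p
UNWIND $skills AS skill_name
MERGE (s:Skill {name: skill_name})
MERGE (p)-[:KNOWS]->(s)
"""

def person_params(person):
    """Turn one extracted person dict into query parameters."""
    return {
        "email": person["email"],
        "name": person["name"],
        "skills": [s["name"] for s in person["skills"]],
    }

params = person_params(
    {"name": "Lucy", "email": "lucy@example.com", "skills": [{"name": "Python"}]}
)
# driver.execute_query(INGEST_PERSON, **params)  # requires a live Neo4j
print(params["skills"])  # ['Python']
```

MERGE rather than CREATE is what makes the load idempotent: re-running it on the same bios won't duplicate people or skills.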
Then of course I can go back to the graph (these are Neo4j employees that I loaded), look for one of them, and I should get them back with the different skills they picked up, now in the database. So, very simple. We also have our own GraphRAG Python package, which is very good for this, and our Knowledge Graph Builder, which is kind of a reference UI; if you look at the code we used to implement it, there are more examples around things like document chunking with overlap, and of course multi-threading with async, so you're not just running a for loop over a bunch of bios. We have all of that, so stop by the booth and we can give you more. But like I said, especially for this crowd, since you're already familiar with this, it's a very short module. And yes, we would have to do that; 00:59:34.780 |
ideally I would have done this in an order where I did this first, and then went through the clustering and materializing new relationships with weights. We're putting new relationships in here; these don't have weights on them; but if we wanted to reformulate our communities... [Audience] Thank you. So the community detection kind of follows what you're doing: you're adding extra links between the nodes that have weights on them, whether it's for semantic distance between skills or for "you are x hops away from somebody else" under some other distance computation that you've materialized? Yeah, and we would ideally run that in a recurring way as we upload data. In this case I did create, I think, some new skills in addition to the people, so for SIMILAR_SKILL_SET we would redo that and get a couple more relationships; and for the semantic one, if we created new skills we would create new SIMILAR_SEMANTIC relationships between them. Yes. Okay, thank you. 01:00:57.900 |
Before we move on, is everyone ready? All right; that was a very quick module, so I'll go over and cover some other topics around this very quickly. What we saw was an example of what I'd call entity extraction, or named-entity recognition: we took a document and literally broke out people, places, things, and the relationships among them from within that document. There are other things we can do. For example, with certain types of documents, from a catalog, or in this case RFPs, we can break things out by the actual document structure. I'm only walking through this so you understand there are different types of extraction we can use to create graphs. If you know the anatomy of a document, like an RFP designed with different sections, an intro, objective, proposal, and subsections within those, we can create a graph out of that structure too. I'd call this more like document extraction: we take the document's structural metadata and model it as a graph. The advantage of doing things this way is that as you embed these different pieces and put them into a knowledge graph, you can run searches on entities that come from different chunks and traverse up and down the document hierarchies to find things, which is very helpful when your documents have repeated structure: since an entity sometimes connects across the structures of those documents, you can incorporate that into your graph retrieval queries. It also gives you a way to do community summaries; we saw Leiden before, but if your documents give you a natural hierarchy, that's another way of summarizing information across those documents. [Audience] Why do you have entities, documents, and chunks in the same ontology, as opposed to extracting entities into a separate ontology, separate from documents and chunks? Why combine the two? Well, when they're combined you can just do traversals between them. 01:03:38.780 |
[Audience] So what's an example of a traversal you'd want to do between entities and chunks? Legal contracts are a good example: you might want to search for particular legal clauses, but something like the expiry date might be somewhere else in the document. [Audience] Just making sure I understand what you mean by traversing the document: in a legal document, say a data-protection clause across multiple vendors, comparing the language, is that a use case? Yeah, exactly; and there might be a perpetuity piece, or dates and other details, and you can traverse over that document to find them in addition to the entities. 01:04:40.620 |
anything else all right all right so i just wanted to introduce that as as another example of how to do 01:04:48.860 |
things for the third module because we're already at 10:07 so let me just go ahead and jump into the 01:04:56.860 |
thing so you'll get to see it i'll go over to module three and this is going to be very simple 01:05:05.100 |
we probably asked this in the beginning, i think we already asked how many people have experience 01:05:09.180 |
building agents, this is going to be very simple, it's going to be a LangGraph agent 01:05:14.460 |
that we're going to make here basically what we're going to do is again a similar setup with our 01:05:21.100 |
environments file we're going to connect to neo4j test our connection um there's going to be four tools 01:05:29.820 |
that we want to build we want to be able to retrieve the skills of a person we want to be able to retrieve 01:05:35.820 |
similar skills to other skills similar people like if we wanted to find out who's another good person to 01:05:41.820 |
work on a thing and then retrieve people based on a set of skills and in this example we're 01:05:50.380 |
basically going to do a lot of tools first so at the end of this notebook there's going to be that 01:05:56.460 |
text-to-Cypher stuff where you get the schema back but here what we're going to go over first is actually 01:06:02.220 |
going to be putting these different tools together and we do that with graph patterns and it's the same graph patterns that we've been going over 01:06:09.420 |
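That first pattern, matching a person to the skills they know, might look roughly like this in the notebook. The node labels (`Person`, `Skill`) and the `KNOWS` relationship are assumptions based on the schema described in the talk, not the workshop's exact code:

```python
# Hypothetical sketch of the simplest graph pattern: a person matched to
# the skills they know. Labels and relationship names are assumptions.
SKILLS_OF_PERSON_QUERY = """
MATCH (p:Person {name: $name})-[:KNOWS]->(s:Skill)
RETURN s.name AS skill
"""

def retrieve_skills_of_person(session, name):
    """Run the pattern through a (duck-typed) neo4j driver session."""
    result = session.run(SKILLS_OF_PERSON_QUERY, name=name)
    return [record["skill"] for record in result]
```

In the real notebook `session` would come from the official `neo4j` Python driver; here it is only duck-typed so the shape of the tool is visible.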
so for example here, if we just want to find the skills that someone knows, it's very simple, 01:06:15.500 |
just person matching to their skills, and as you go down this notebook basically what you're seeing is 01:06:22.780 |
all the different patterns so the second is retrieving people with similar skills and here we're actually 01:06:28.540 |
going to use the vector index and that semantic similarity relationship so we're basically 01:06:35.980 |
going to pull, actually this one is searching for people with skills, i apologize, so in this one 01:06:42.220 |
you're going to look for skills so for example if a user puts in different skills those might 01:06:50.860 |
not match the skills we have inside of the database exactly word for word so you're going to use vector search to pull out 01:06:57.260 |
the specific skills and what's semantically similar to those skills and we can do some scoring 01:07:03.100 |
thresholds in here to pull back exactly what we want and then that will go ahead and return some 01:07:09.260 |
skills so if we had for example continuous delivery, cloud native, and security, 01:07:14.940 |
these would be the types of skills that we pull back from that 01:07:20.780 |
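The score-threshold idea can be sketched without a database: score each candidate skill against the query embedding and keep only those above a cutoff. The toy vectors below are made up for illustration; in the notebook this lookup runs against Neo4j's vector index instead:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

# Toy embeddings standing in for the vector index (values invented).
skill_vectors = {
    "Continuous Delivery": [0.9, 0.1, 0.0],
    "Cloud Native": [0.8, 0.2, 0.1],
    "Security": [0.1, 0.9, 0.2],
    "Gardening": [0.0, 0.1, 0.9],
}

def similar_skills(query_vec, threshold=0.8):
    """Skills whose similarity to the query clears the threshold, best first."""
    scored = {s: cosine(query_vec, v) for s, v in skill_vectors.items()}
    return sorted((s for s, sc in scored.items() if sc >= threshold),
                  key=lambda s: -scored[s])
```

The threshold plays the same role as the scoring cutoff mentioned above: it is what keeps a user's loosely worded skill from matching everything in the database.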
for person similarity there's a few different ways that we can do that and we've talked about 01:07:26.300 |
that a lot towards the beginning we can do it by community so we can look for people that know different 01:07:34.060 |
skills we can get all of their names and then we can look for that Leiden community that we created 01:07:41.740 |
we can look for all the skills that those people know and basically what we're doing at that point is we're 01:07:45.900 |
looking for people inside of the same skills community but the other way that you can do that 01:07:53.260 |
is you can look for similar skill sets using the similar-skill-set relationship, so the hard-coded 01:08:02.380 |
relationship that we've made from before which basically looks at hey what's the actual 01:08:07.740 |
skill overlap if you just looked at who knows what inside of the graph and that will bring back 01:08:13.820 |
some answers here so we were looking at John Garcia and we're saying hey find similar people 01:08:20.620 |
and then we can get like a score count of overlap to the different people here and then we can 01:08:27.100 |
start adding in that semantic similarity so this is where we get this big query right but what this query is 01:08:33.420 |
actually doing is it's sort of balancing between the similar skill set and the semantically similar skill 01:08:41.260 |
set so it's taking both those scores and adding them together and then from there we get 01:08:47.500 |
a floating-point score and a little bit of a different answer that's not just based on hard skill 01:08:53.260 |
connections but also skills that are kind of close together and we can weight those independently as well 01:08:59.660 |
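Stripped of the Cypher, the balancing act reduces to a weighted sum of the two scores. The weights below are illustrative defaults, not the notebook's actual values:

```python
# Sketch of the blended person-similarity score: a hard skill-overlap
# count combined with a semantic-similarity score, each independently
# weighted. Weights here are made up for illustration.
def blended_similarity(skills_a, skills_b, semantic_score,
                       overlap_weight=1.0, semantic_weight=1.0):
    overlap = len(set(skills_a) & set(skills_b))  # shared skills
    return overlap_weight * overlap + semantic_weight * semantic_score
```

Raising `semantic_weight` relative to `overlap_weight` shifts the ranking from people with literally the same skills toward people whose skills are merely close together in embedding space.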
and we can also recommend people given a set of skills so if we have a set of skills here 01:09:04.220 |
we can just do a vector search on those skills and then, actually this one, 01:09:22.220 |
yeah here it is, so basically the query was broken out into two parts just because this 01:09:30.220 |
is kind of a big thing to look at but the idea with this is we can basically do a 01:09:35.180 |
vector search on skills, get semantically similar skills, and then find people who know those skills so 01:09:41.980 |
very similar to some of the last ones and then we can get a skill count for all of those groups and get 01:09:46.860 |
people back when we actually define the functions for our agent we're going to create here a skills 01:09:56.380 |
object which basically is just going to help us with some of our function arguments and returns 01:10:01.900 |
but basically first tool, retrieve skills of person, very simple query, and then we'll have down here 01:10:11.260 |
for tool two, when we say find similar skills, what we're going to look at here is again 01:10:18.060 |
that query where we're going to do that semantic similarity between skills so we're going to do a 01:10:22.700 |
vector search to find skills and then we're going to go out one hop on semantic similarity um and then we're 01:10:28.380 |
basically going to collect everything and return it and then for the third one we're going to do that 01:10:34.940 |
weighting because this tool three is going to be for person similarity we're going to do that weighting 01:10:39.900 |
between the similar semantic and the similar skill set with that larger query and then i know i'm going 01:10:47.420 |
through this kind of quickly for the fourth one where we say find person based on skills here again our 01:10:53.580 |
entry point is going to be a vector search on skills um going out to match those semantically similar 01:10:59.580 |
skills but then kind of at the end of that uh we'll add on a traversal that will attach the person to 01:11:06.220 |
knowing those skills count who knows the most and then effectively return that so those are the four 01:11:11.980 |
tools that we're going to end up using for this agent when we set up the agent here if you're familiar 01:11:17.740 |
with how LangGraph works basically we get our llm, we test that it's alive, we define our list of 01:11:27.260 |
tools, and if we didn't want to do this in an agentic way we can just bind our tools to our llm 01:11:33.420 |
and we can invoke our llm with tools but what we're going to do instead, this is just showing 01:11:42.060 |
invoking the different tools, is we are going to go and run it with an agent so we're 01:11:49.100 |
going to use create_react_agent which comes from LangGraph, it's one of their pre-built agents that 01:11:55.900 |
uses the ReAct, i don't know if you'd call it a framework but sort of that methodology, to 01:12:02.860 |
build an agent and effectively once we do that we give it the llm, we give it the four tools that we 01:12:11.820 |
had and then um we can see that here we're just testing we're saying hi and we're just making 01:12:19.100 |
sure we get some response back there's a utility function here just to make it easier running in 01:12:24.780 |
the notebook which will basically just do some of the streaming 01:12:31.100 |
methodology so i can just say hey what skills does Christoph have and then if i run that, and i don't 01:12:37.660 |
know if i need to rerun my agent here, looks like not everything's running, so you'll 01:12:45.820 |
see it says, when i ran that, and actually i ran it for the wrong question here, what skills does Christoph 01:12:53.900 |
have, person named Christoph, and then it will bring back his skills so you see there it will choose to use 01:13:00.460 |
retrieve skills of person and similarly if i went down and i said you know what skills are similar 01:13:07.980 |
to Power BI and data visualization it'll go through and choose the appropriate 01:13:14.780 |
tool for the job so in this case find similar skills, and it'll pull those back 01:13:24.460 |
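The tool choice itself is made by the LLM inside `create_react_agent`, driven by each tool's name and docstring. The crude keyword router below only illustrates the question-to-tool mapping being described; it is a stand-in, not the real selection mechanism, and the tool names are taken from the four tools listed earlier:

```python
# Stand-in for the agent's tool selection: map a question to one of the
# four tools. In the workshop an LLM makes this choice; keyword matching
# here is purely illustrative.
def choose_tool(question):
    q = question.lower()
    if "similar" in q and ("person" in q or "people" in q):
        return "person_similarity"
    if "similar" in q:
        return "find_similar_skills"
    if "who" in q:
        return "find_person_based_on_skills"
    return "retrieve_skills_of_person"
```

The ordering matters: "similar" plus a person reference routes to person similarity before the plain skill-similarity check can fire, mirroring how the agent distinguishes the two similarity tools.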
and you'll see going down right if i said well what person has similar skills to you know this other 01:13:30.780 |
person here um then it will know oh i need person similarity so we'll go ahead and use that specific 01:13:36.700 |
tool so in this case what we're doing is we're providing a bunch of tools that are presumably expert 01:13:41.900 |
tools that we can give to the model and then it will know that okay i have to go ahead and pull those 01:13:47.580 |
those specific tools to be able to provide a response and then there's a little app down here 01:13:53.900 |
as well if you wanted to run the chatbot so it's a little Gradio app here 01:14:01.500 |
but basically if i ran that i can go ahead and come in here and then i can have a little conversation with 01:14:09.980 |
it so this is very small but what skills are similar you know i can go ahead and ask it in here 01:14:17.260 |
and then provided everything's working it'll go ahead and choose the appropriate tool 01:14:23.580 |
and then i can say well who knows you know maybe i'll just say those skills and it should 01:14:32.860 |
go ahead and pull the appropriate tool to be able to find out who knows all these different skills right 01:14:41.820 |
and if i go back to my uh to my example here i should see the query logic that it used so first 01:14:48.940 |
you know it said find similar skills to what i just mentioned because i asked about power bi 01:14:54.380 |
and then after that i asked about people who know skills so it said find persons based on similar skills 01:15:01.340 |
and likewise i can say you know who is similar to those people in the graph 01:15:10.140 |
and it will likewise go through and it should you know understand that it needs to use the find other 01:15:16.700 |
similar persons tool to be able to do that 01:15:23.340 |
so you'll see if i was to keep going down i should get calls here to find persons with similar skills, 01:15:32.860 |
we have it here, yeah, person similarity, so it just called the person similarity tool for each person 01:15:38.540 |
and i know we only have a few minutes left, if you wanted to run this further basically 01:15:46.300 |
i have a text-to-Cypher example so this is where 01:15:50.460 |
i'll have an example of passing it the annotated schema so this is getting to kind of what you 01:15:55.020 |
were asking about right where i've provided these descriptions so it's sort of like annotations for 01:16:02.700 |
the schema as well and then i can go ahead and give that to an aggregation query function, 01:16:09.100 |
there's also an llm inside of here that will create the Cypher, but you can see in here i asked 01:16:15.260 |
some questions like describe communities and it was able to understand that it needed to grab 01:16:20.380 |
the match person knows skill pattern and then it needed to grab the Leiden community so it knew from the 01:16:26.460 |
schema right that it needed to generate this Cypher and there's a couple more examples of that in the notebook 01:16:32.460 |
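One way the annotated schema might be spliced into the generation step is as plain text inside the prompt. The schema text, the `leiden_community` property name, and the prompt wording below are all invented for illustration; the notebook's actual helper wraps an LLM call that returns the generated Cypher:

```python
# Illustrative annotated schema: pattern plus per-property descriptions
# (names are hypothetical, modeled on the workshop's person/skill graph).
ANNOTATED_SCHEMA = """
(:Person)-[:KNOWS]->(:Skill)
Person.name: string, the person's full name
Person.leiden_community: integer, community id from Leiden clustering
Skill.name: string, canonical skill name
"""

def text_to_cypher_prompt(question, schema=ANNOTATED_SCHEMA):
    """Build the prompt a text-to-Cypher LLM call might receive."""
    return (
        "You are a Cypher expert. Given this annotated graph schema:\n"
        f"{schema}\n"
        "Write a single Cypher query answering the question.\n"
        f"Question: {question}"
    )
```

The annotations are what let the model connect a vague question like "describe communities" to the right property, which a bare label-and-relationship dump would not convey.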
are there any questions, i know i just went over a lot, are there any questions from that that are worth 01:16:40.540 |
answering now while we have just a couple minutes left how long will the Jupyter server be up if 01:16:52.620 |
you want to play with this the Jupyter server is going to go down very quickly but if you look at the deck 01:16:57.740 |
at the end of the deck i have a link to the code and the data is all in GitHub too 01:17:04.540 |
so basically if you go here that's the GitHub repository so you can play with it 01:17:10.300 |
what's that, the deck is in this? do you have access to the Slack channel? so the deck is in the 01:17:19.020 |
Slack channel and i'll go ahead and jump to that in a second but there's the GitHub repository, you can 01:17:27.500 |
use the Aura console, we have a free trial that you can use, you can just set up a cloud database and you can 01:17:33.660 |
load the data into there also before you guys leave there's a meetup happening tomorrow 01:17:39.820 |
it's tonight, oh sorry, tonight at five, and there's a link there for more information on that 01:17:50.300 |
and then we also have another workshop at one o'clock where today was very simple like we're going 01:17:58.620 |
to go over more graph analytics type of stuff in that workshop so like the community stuff that i 01:18:03.500 |
was doing we're going to dive more into depth on that in that workshop other than that come 01:18:10.860 |
by our booth if you have more questions, we're going to be, wherever, right, there it is, i don't think we have a 01:18:16.460 |
big expo hall but if you want to see Neo4j MCP servers, ADK examples, more knowledge graph construction, 01:18:23.100 |
it's all a great place to come to ask all those types of questions