AI Agents as Neuro-Symbolic Systems?
Chapters
0:00 AI Agents
2:04 ReAct Agents
7:28 Redefining Agents
12:48 Origins of Connectionism
17:23 Neuro-symbolic AI
21:09 Agents without LLMs
25:21 Broader Definition of Agents
Okay, so I wanted to put together an overview video of what I'm currently working on, which is restructuring the way I think about agents and the way I teach and talk about them. This isn't going to be a fully edited, structured video; I just want to show you a little of what I'm thinking about and explain where I'm coming from.

All in all, this is part of a broader thing I'm working on, which is actually why I haven't posted on YouTube for quite a while now, almost two months, which I think is the longest gap I've had in forever. Partly it's because I'm working on this, but there are other reasons too: I had my first son about a month ago, so I've been pretty busy there, and I've been working on a lot of things over at Aurelio as well. What I want to go through here is an introduction-to-AI-agents article I'm working on. The article itself is done, but I do want to put together a more structured video and some course materials around it. There's already a code example for it, which looks at ReAct agents. ReAct is, I would say, the foundational structure for what agents look like today, and when I say agents here I mean LLM-based agents. It's still the most popular pattern; nowadays it's framed more as tool or tool-calling agents, but they're very similar, so I'll talk a little about them.

The first thing I want to cover, very quickly, is the ReAct agent, because that's what most of us are familiar with.
I'll come down to here. As a reminder, ReAct is basically this: we have some input text, and rather than just asking our LLM to answer directly, we allow it to go through multiple reasoning steps, and as part of those steps the agent can also call tools, so it can fetch external information or do something else. That's what I'm visualizing here. The question, which comes from the ReAct paper (linked in the article), is: "Aside from the Apple Remote, what other device can control the program the Apple Remote was originally designed to interact with?" To be honest, most LLMs can probably answer this directly now, particularly given that the example is from the ReAct paper, which is around two years old, but that's beside the point; it's just an example.

So in this example we provide a set of tools the LLM can use, and we also prompt the LLM to go through these multiple steps of reasoning and action. That's where the name ReAct comes from: "Re" from reasoning and "Act" from action. It then works through the steps. It has access to a search tool, and an answer tool at the bottom here, which isn't really a tool but kind of is at the same time. Given the prompt, it knows it has to structure its output in this ReAct format. So it starts by saying it needs to search "Apple Remote" and find the program it was originally designed to interact with, and then it structures an action based on that. It knows it has a search tool and that the input to the search tool is a query string, so the query is "Apple Remote", and that function runs using some logic that we've written.
The observation that comes back is: "The Apple Remote is designed to control the Front Row media center." Remember the question: aside from the Apple Remote, what other device can control the program the Apple Remote was originally designed to interact with? Now we know what that original program was, Front Row, and the LLM has that information. So it moves on to the next step and reasons: "I know the Apple Remote controls the Front Row program, but what other device controls Front Row?" Based on that, its next reasoning step is to search "Front Row" and find other devices that control it. (Thinking in RAG terms, it could also search something like "device to control Front Row", and a more recent LLM would probably do that, but this is just the example from the paper.) So it goes back to the search tool with the query "Front Row". I've shortened the result for the sake of brevity; in the paper the search returns a lot more information, but the important part is: "Front Row is controlled by an Apple Remote or keyboard function keys." That gets fed back into the LLM, which now has everything we've covered here plus the original query, and it decides it has all the information it needs to answer. So in the final step it doesn't use the search tool; it uses the answer tool instead, whose parameter is the final answer, and the output, "keyboard function keys", gets returned to the user.

That is the ReAct agent. This structure of reasoning, building a query for a tool, getting a response, potentially going through another iteration of reasoning and action, and eventually providing an answer is the commonly accepted definition of what an agent is; it's what most people are using at the moment.
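To make that loop concrete, here is a rough sketch of the ReAct pattern in Python. To be clear, this is not code from the paper or from any particular library: the `llm` function is a placeholder that just replays the scripted reasoning steps from the example above, the `search` results are canned, and the "Thought / Action / Action Input" format is simply one common way of prompting and parsing this kind of loop.

```python
import re

def search(query: str) -> str:
    # Hypothetical stand-in for a real search tool; canned results for the demo.
    results = {
        "Apple Remote": "The Apple Remote is designed to control the Front Row media center.",
        "Front Row": "Front Row is controlled by an Apple Remote or keyboard function keys.",
    }
    return results.get(query, "No results.")

SCRIPTED_STEPS = iter([
    "Thought: I need to search Apple Remote and find the program it was designed for.\n"
    "Action: search\nAction Input: Apple Remote",
    "Thought: Apple Remote controls Front Row; I need other devices that control Front Row.\n"
    "Action: search\nAction Input: Front Row",
    "Thought: I have everything I need to answer.\n"
    "Action: answer\nAction Input: keyboard function keys",
])

def llm(prompt: str) -> str:
    # Placeholder for a real chat-completion call; here it just replays the
    # scripted reasoning steps from the worked example above.
    return next(SCRIPTED_STEPS)

def react_loop(question: str, max_steps: int = 5) -> str:
    prompt = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(prompt)                      # reasoning plus proposed action
        prompt += step + "\n"
        action = re.search(r"Action: (\w+)", step)
        action_input = re.search(r"Action Input: (.+)", step)
        if action is None or action_input is None:
            break
        if action.group(1) == "answer":
            return action_input.group(1)        # final answer for the user
        if action.group(1) == "search":
            observation = search(action_input.group(1))
            prompt += f"Observation: {observation}\n"   # fed back to the LLM
    return "No answer found."

print(react_loop("Aside from the Apple Remote, what other device can control "
                 "the program the Apple Remote was originally designed to interact with?"))
```

In a real system the `llm` call would go to an actual chat model and `search` would hit a real index or API, but the control flow, reason, act, observe, repeat, stays the same.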
I think that's fine, but it's also very limiting, because in production I would never just deploy something like this on its own, whether it's ReAct, OpenAI tool calling, or whatever else. In my opinion an agent is much broader than this, and in the wider literature an agent is not just this either. So I went back through a few papers to try to figure out what a good definition of an agent actually is, one that matches the way I understand agents and the way I've been building them, which honestly is more as agentic workflows; to me, workflow or agent is kind of the same thing, an agentic workflow is an agent. The paper that I think had the nicest definition, one that ties back to the original AI research and philosophy, or at least something very close to the original, was the MRKL paper. It's another LLM agent paper, and I think it came just before the ReAct paper; it's very similar, a bit less structured than ReAct, but highly relevant. The way they described their system was as a neuro-symbolic architecture, and I really like that definition.
I like it because a neuro-symbolic architecture is two things: you have the neural part and you have the symbolic part. (I actually have another article just getting started on this, but it's mostly notes at the moment.) Let's start with the symbolic part. Symbolic AI is the more traditional AI, roughly the 1940s through the 1960s and into the 70s; it was the dominant approach for a long time. The full-on symbolists believed that true AGI would be achieved through written rules, ontologies, and other logical structures, basically a lot of hand-written knowledge, things like formal grammars and logical systems. A classic example is syllogistic logic from Aristotle.
In a syllogism you have a major premise, a minor premise, and a conclusion that follows from them (it's been a long time since I've done this, so forgive me if I'm not perfectly accurate). The idea is that you say something like "all dogs have four legs", which is maybe not strictly true, but let's not be too pedantic; that's your major premise. Then the minor premise: "my friend Jacks is a dog." The conclusion is then: "my friend Jacks has four legs." That's a logical framework developed by Aristotle, and the symbolic AI people would work through exercises like this, trying to build up a logical methodology that would let you construct some deeper, AGI-type system that could reason its way through everything.
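Just to illustrate what "hand-written rules" means in practice, here is that syllogism written out as a few lines of Python. This is obviously nothing like a real symbolic AI system, which would use ontologies, logic programming, and so on; the point is only that every piece of knowledge here is explicitly authored by a person rather than learned from data.

```python
# Hand-written knowledge: nothing here is learned.
facts = {"Jacks": "dog"}          # minor premise: my friend Jacks is a dog

def has_four_legs(name: str) -> bool:
    # Major premise written as a rule: all dogs have four legs.
    return facts.get(name) == "dog"

# The conclusion follows mechanically from the written premises.
print(has_four_legs("Jacks"))     # True
```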
That was one side of AI back then: the traditional approach, also called good old-fashioned AI, written as GOFAI (I don't remember exactly who coined the term or when). The other camp were the connectionists, which is what we called them back then; today this is essentially neural AI. Connectionism emerged back in 1943 with a paper describing a simplified neural circuit (McCulloch and Pitts), but where connectionist AI really got going was with Frank Rosenblatt, who introduced the idea of the perceptron. An adapted version of the perceptron he described is what we use in neural networks today. It's obviously a big deal now; back then these models were far less useful, but a lot of people really believed in them, and at least so far the connectionists have turned out to be the more correct camp, I would say.
The connectionist approach focused on building AI systems loosely based on the mechanisms of our brains. "Perceptron" was arguably a slightly odd name; today we talk about neurons within a neural network, and a lot of the terminology clearly comes from the idea of a brain. If you put a diagram of a perceptron next to a diagram of an actual neuron, you can see where it comes from. On the biological side you have many inputs feeding into the neuron, the signal is integrated and carried along the axon, and then you have many output branches. Those output branches are many rather than one, and to be fair they can have different degrees of activation, whereas a single artificial neuron produces just one output, so in that sense they differ. But we stack many of these units across many layers, and at that point you do have many axon-like connections, each coming from a single unit's output. Either way, there's clearly a lot of similarity, and this unit is one of the fundamental building blocks of neural AI.
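For reference, Rosenblatt's perceptron really is just a weighted sum of inputs pushed through a hard threshold, something like the minimal sketch below. The specific weights are just numbers I've picked so that the unit computes a logical AND; modern networks replace the hard threshold with smooth activations like ReLU or sigmoid and stack many of these units into layers.

```python
def perceptron(inputs: list[float], weights: list[float], bias: float) -> int:
    # Weighted sum of the inputs, plus a bias term.
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Hard threshold: fire (1) or don't (0).
    return 1 if activation > 0 else 0

# Example: a perceptron computing AND over two binary inputs.
print(perceptron([1, 1], [0.6, 0.6], bias=-1.0))  # 1
print(perceptron([1, 0], [0.6, 0.6], bias=-1.0))  # 0
```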
For neural AI to work you need a lot of compute, parallel processing, and so on, and because that wasn't available it didn't really take off at first. There were a few AI winters, periods where people lost interest in AI in general and in neural or connectionist AI in particular. That carried on until around 2011-2012, when ImageNet and the AlexNet model kicked off interest in neural, connectionist AI again. From that point it was all about neural networks, everyone saying "wow, neural networks are amazing", and we still think that way: transformers and LLMs are, at their core, a type of neural network, just much bigger and more complicated. With loads of data and compute finally available, that wave took off and led to where we are now.
So that's the neural part, and we've already covered the symbolic part. What do we have in a neuro-symbolic system? We have both: we're mixing the old, traditional AI with neural AI. Well, kind of; to some degree they're already almost mixed together inside neural networks, because the way neural networks work, they almost learn symbols: they learn representations of different concepts, which is essentially what a symbol is in symbolic AI, they just aren't hand-written. A neural network, in some sense, learns what "strawberry" is or what "dog" is. That's a bit beside the point, though; we can say neural networks are perhaps sub-symbolic, but for now let's just treat them as purely neural. So neural networks make up the neural part of this, which for our purposes basically means LLMs. Then we have the symbolic part, which, as I mentioned before, is the hand-written stuff: code. If you write some code that can be run or triggered by an LLM or some other type of neural network, you have a neuro-symbolic architecture; you have a mix of both.
That is exactly what the MRKL system was. When they developed it they were using, I think, an early GPT-3-era model, which by today's standards was not that great, and I believe they built at least part of the system on top of another model whose name I honestly don't remember, but it doesn't really matter. The point is that they built this agentic system by mixing neural networks with runnable code. And some of the problems they talk about, lack of up-to-date knowledge, proprietary knowledge, and so on, are exactly the kinds of things we now try to solve with RAG in many cases, which I think is quite interesting.
So my definition of agents goes along those lines: neural plus symbolic. I like it for two reasons. First, the definition is anchored in roughly the past eighty to a hundred years of AI, so there are very solid foundations behind "neuro-symbolic". Second, when I'm building these systems, LLMs are great, but I don't just use LLMs; a lot of the time there's a very good reason to bring in other neural-network-based models. By broadening the neural part to neural networks in general, you don't restrict yourself to LLMs alone. LLMs are amazing and of course I use them a lot, but not exclusively. A good illustration is semantic router. I don't know if you've used it, and it's no big deal if you haven't, but the idea behind it is easy to explain with an example.
Semantic router uses embedding models, which are neural-network based, and what you do is provide some example inputs for each route. In this figure there's a "politics" route, which is really more of a guardrail, actual protection basically, but that's just one example. Then there's a second route, which is the better example here: "what is the llama 2 model?" (it would be Llama 3 now, I wrote this a while ago), "tell me about Meta's new LLM", "what are the differences between Falcon and Llama?" All of those are obviously things I'd want to trigger a search. With the embedding model I can identify that: anything that lands in that little region of embedding space is probably the user asking us to do a search. Then there are many things we could do, but one option is simply to take the user's query and send it straight to a RAG pipeline. We don't even ask an LLM to rephrase it or to decide whether to use the RAG pipeline; we just use the RAG pipeline directly. That's much faster than going through an LLM, and I'd say probably much more controllable too. However, LLMs provide a lot of flexibility, so that's not usually what I do.
Instead, what I usually do is keep an LLM in the loop. We take the query we got from the user and modify it a little. It's a slightly lazy approach, but it works well and it leaves the flexibility down to the LLM, which I like. So we keep the original query from the user, whatever that was, and then append something extra. A "system note" is something I've used fairly often: something like "System note: use the rag tool". So we've modified the query that gets sent to the LLM, and we're essentially heavily suggesting to the LLM what it should do, and that works very well in practice. In a system like this, the agent is not just the LLM; it's also the embedding model doing the routing.
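Here's a rough sketch of that whole pattern, the route matching plus the system note. I'm deliberately not showing the actual semantic-router API here; `embed` is a placeholder for whatever embedding model you use, the route utterances are the ones from the example above, and the 0.75 threshold is an arbitrary number picked for illustration.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder for a real embedding model (OpenAI, Cohere, open source, ...).
    raise NotImplementedError

routes = {
    # Each route is just a list of example utterances; a guardrail route such
    # as "politics" would be added in exactly the same way.
    "rag": [
        "what is the llama 2 model?",
        "tell me about meta's new llm",
        "what are the differences between falcon and llama?",
    ],
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def route(query: str, threshold: float = 0.75) -> str | None:
    # Match the query against each route's example utterances in embedding space.
    q = embed(query)
    scores = {
        name: max(cosine(q, embed(u)) for u in utterances)
        for name, utterances in routes.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None

def prepare_prompt(query: str) -> str:
    # Rather than forcing the tool call, nudge the LLM with a system note and
    # leave the final decision to it.
    if route(query) == "rag":
        return f"{query}\n\n(System note: use the rag tool)"
    return query
```

You could swap the `prepare_prompt` step for sending the matched query straight into a RAG pipeline, which is the faster, more controllable option I mentioned, but appending the note keeps the final decision with the LLM.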
And that's especially true when you consider the version without an LLM at all: to me, even that system is pretty agentic; it seems like an agent. It becomes even more so when you add the LLM and its decision-making. So I prefer to think of agents as this type of system, or rather, not just this one type, but more flexibly in general, because if you think of agents purely as an LLM that can call tools, you're massively limiting yourself; you're boxing yourself into one particular thing that an agent might be, and I think that's a mistake.
Even take a simple example with multiple tool sets. Say we make a decision, and it's our LLM making it, that's fine, but from there it can go down two different paths: tool A or tool B. Maybe tool A is for reading about the news, whereas tool B is for when someone asks a maths question, so it's a calculator, or perhaps it searches a maths site for an explanation, while tool A goes off to a news site. Two very different use cases. And what you may well find is that the follow-on tools for those two paths, if there are any follow-on tools at all, should be different. You've already identified that the intents are two very different things, so why would the follow-on tools be the same? There's no reason for them to be. In that case you may still have an LLM in the middle, and maybe sometimes you won't, but either way you then follow a slightly different path depending on the branch. If you think of an agent as just an LLM plus some tool calls in a loop, this is a fairly simple pattern, and yet you can't express it.
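As a sketch of what I mean, something like the below. The tool names and the rule used for the first decision are made up; the point is only that once the first decision has been made, each path can expose a completely different set of follow-on tools, which a single flat list of tools in a loop doesn't capture.

```python
# Hypothetical tool sets: after the first decision, each path exposes a
# different set of follow-on tools to the LLM (or to no LLM at all).
NEWS_TOOLS = ["news_search", "summarize_article"]
MATH_TOOLS = ["calculator", "math_reference_search"]

def first_decision(query: str) -> str:
    # This could be an LLM call or an embedding-based router as above;
    # here it's just a crude placeholder rule.
    return "tool_b" if any(ch.isdigit() for ch in query) else "tool_a"

def run_agent(query: str) -> str:
    path = first_decision(query)
    tools = NEWS_TOOLS if path == "tool_a" else MATH_TOOLS
    # From here you might loop with an LLM restricted to `tools`,
    # or follow a fixed pipeline with no LLM at all.
    return f"running path {path} with follow-on tools {tools}"

print(run_agent("what happened in the news today?"))
print(run_agent("what is 345 * 12?"))
```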
So that's what I'm thinking about with agents, and that's how I would approach them, which is slightly different from what I think the standard narrative is for most people on what an agent is. That narrative is valid, but it isn't all that an agent is. I'm going to leave it there; there's a ton more I could talk through, but I'll restrict it to this one thing for now. I will cover this with more structure fairly soon, and hopefully ramble a little less, but at least with this you should get an idea of where I'm coming from and the, hopefully somewhat sensible, logic behind what I'm thinking. Anyway, that's it. Thank you for watching, I will definitely try to release something else very soon, but for now I'll leave it there. Thank you very much for watching, and I will see you in the next one. Bye.