AI Agents as Neuro-Symbolic Systems?
Chapters
0:00 AI Agents
2:04 ReAct Agents
7:28 Redefining Agents
12:48 Origins of Connectionism
17:23 Neuro-symbolic AI
21:09 Agents without LLMs
25:21 Broader Definition of Agents
Okay, so I wanted to put together an overview video of what I'm currently working on, which is restructuring the way I think about agents and the way I teach and talk about them. This isn't going to be a fully edited, structured video; I just want to show you a little of what I'm thinking about and explain where I'm coming from.

All in all, this is part of a broader thing I'm working on, which is actually why I haven't posted on YouTube for quite a while now, almost two months, which I think is the longest gap I've had in forever. Partly it's because I'm working on this, but there are other reasons too: I had my first son about a month ago, so I've been pretty busy there, and I've been working on a lot of things over at Aurelio as well. What I want to go through here is an introduction-to-AI-agents article I'm working on. The article itself is done, but I do want to put together a more structured video and some course materials around it. There's already a code example for it, which looks at ReAct agents. ReAct is, I would say, the foundational structure for what agents look like today, and when I say agents here I mean LLM-based agents. It's still the most popular pattern; nowadays it's framed more as tool or tool-calling agents, but they're very similar, so I'll talk a little about them.

The first thing I want to cover, very quickly, is the ReAct agent, because that's what most of us are familiar with.
I'll come down to here. As a reminder, ReAct is basically this: we have some input text, and rather than just asking our LLM to answer directly, we allow it to go through multiple reasoning steps, and as part of those steps the agent can also call tools, so it can fetch external information or do something else. That's what I'm visualizing here. The question, which comes from the ReAct paper (linked in the article), is: "Aside from the Apple Remote, what other device can control the program the Apple Remote was originally designed to interact with?" To be honest, most LLMs can probably answer this directly now, particularly given that the example is from the ReAct paper, which is around two years old, but that's beside the point; it's just an example.

So in this example we provide a set of tools the LLM can use, and we also prompt the LLM to go through these multiple steps of reasoning and action. That's where the name ReAct comes from: "Re" from reasoning and "Act" from action. It then works through the steps. It has access to a search tool, and an answer tool at the bottom here, which isn't really a tool but kind of is at the same time. Given the prompt, it knows it has to structure its output in this ReAct format. So it starts by saying it needs to search "Apple Remote" and find the program it was originally designed to interact with, and then it structures an action based on that. It knows it has a search tool and that the input to the search tool is a query string, so the query is "Apple Remote", and that function runs using some logic that we've written.
The observation that comes back is: "The Apple Remote is designed to control the Front Row media center." Remember the question: aside from the Apple Remote, what other device can control the program the Apple Remote was originally designed to interact with? Now we know what that original program was, Front Row, and the LLM has that information. So it moves on to the next step and reasons: "I know the Apple Remote controls the Front Row program, but what other device controls Front Row?" Based on that, its next reasoning step is to search "Front Row" and find other devices that control it. (Thinking in RAG terms, it could also search something like "device to control Front Row", and a more recent LLM would probably do that, but this is just the example from the paper.) So it goes back to the search tool with the query "Front Row". I've shortened the result for the sake of brevity; in the paper the search returns a lot more information, but the important part is: "Front Row is controlled by an Apple Remote or keyboard function keys." That gets fed back into the LLM, which now has everything we've covered here plus the original query, and it decides it has all the information it needs to answer. So in the final step it doesn't use the search tool; it uses the answer tool instead, whose parameter is the final answer, and the output, "keyboard function keys", gets returned to the user.

That is the ReAct agent. This structure of reasoning, building a query for a tool, getting a response, potentially going through another iteration of reasoning and action, and eventually providing an answer is the commonly accepted definition of what an agent is; it's what most people are using at the moment.
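To make that loop concrete, here is a rough sketch of the ReAct pattern in Python. To be clear, this is not code from the paper or from any particular library: the `llm` function is a placeholder that just replays the scripted reasoning steps from the example above, the `search` results are canned, and the "Thought / Action / Action Input" format is simply one common way of prompting and parsing this kind of loop.

```python
import re

def search(query: str) -> str:
    # Hypothetical stand-in for a real search tool; canned results for the demo.
    results = {
        "Apple Remote": "The Apple Remote is designed to control the Front Row media center.",
        "Front Row": "Front Row is controlled by an Apple Remote or keyboard function keys.",
    }
    return results.get(query, "No results.")

SCRIPTED_STEPS = iter([
    "Thought: I need to search Apple Remote and find the program it was designed for.\n"
    "Action: search\nAction Input: Apple Remote",
    "Thought: Apple Remote controls Front Row; I need other devices that control Front Row.\n"
    "Action: search\nAction Input: Front Row",
    "Thought: I have everything I need to answer.\n"
    "Action: answer\nAction Input: keyboard function keys",
])

def llm(prompt: str) -> str:
    # Placeholder for a real chat-completion call; here it just replays the
    # scripted reasoning steps from the worked example above.
    return next(SCRIPTED_STEPS)

def react_loop(question: str, max_steps: int = 5) -> str:
    prompt = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(prompt)                      # reasoning plus proposed action
        prompt += step + "\n"
        action = re.search(r"Action: (\w+)", step)
        action_input = re.search(r"Action Input: (.+)", step)
        if action is None or action_input is None:
            break
        if action.group(1) == "answer":
            return action_input.group(1)        # final answer for the user
        if action.group(1) == "search":
            observation = search(action_input.group(1))
            prompt += f"Observation: {observation}\n"   # fed back to the LLM
    return "No answer found."

print(react_loop("Aside from the Apple Remote, what other device can control "
                 "the program the Apple Remote was originally designed to interact with?"))
```

In a real system the `llm` call would go to an actual chat model and `search` would hit a real index or API, but the control flow, reason, act, observe, repeat, stays the same.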
I think that's fine, but it's also very limiting, because in production I would never just deploy something like this on its own, whether it's ReAct, OpenAI tool calling, or whatever else. In my opinion an agent is much broader than this, and in the wider literature an agent is not just this either. So I went back through a few papers to try to figure out what a good definition of an agent actually is, one that matches the way I understand agents and the way I've been building them, which honestly is more as agentic workflows; to me, workflow or agent is kind of the same thing, an agentic workflow is an agent. The paper that I think had the nicest definition, one that ties back to the original AI research and philosophy, or at least something very close to the original, was the MRKL paper. It's another LLM agent paper, and I think it came just before the ReAct paper; it's very similar, a bit less structured than ReAct, but highly relevant. The way they described their system was as a neuro-symbolic architecture, and I really like that definition.
I like it because a neuro-symbolic architecture is two things: you have the neural part and you have the symbolic part. (I actually have another article just getting started on this, but it's mostly notes at the moment.) Let's start with the symbolic part. Symbolic AI is the more traditional AI, roughly the 1940s through the 1960s and into the 70s; it was the dominant approach for a long time. The full-on symbolists believed that true AGI would be achieved through written rules, ontologies, and other logical structures, basically a lot of hand-written knowledge, things like formal grammars and logical systems. A classic example is syllogistic logic from Aristotle.
In a syllogism you have a major premise, a minor premise, and a conclusion that follows from them (it's been a long time since I've done this, so forgive me if I'm not perfectly accurate). The idea is that you say something like "all dogs have four legs", which is maybe not strictly true, but let's not be too pedantic; that's your major premise. Then the minor premise: "my friend Jacks is a dog." The conclusion is then: "my friend Jacks has four legs." That's a logical framework developed by Aristotle, and the symbolic AI people would work through exercises like this, trying to build up a logical methodology that would let you construct some deeper, AGI-type system that could reason its way through everything.
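Just to illustrate what "hand-written rules" means in practice, here is that syllogism written out as a few lines of Python. This is obviously nothing like a real symbolic AI system, which would use ontologies, logic programming, and so on; the point is only that every piece of knowledge here is explicitly authored by a person rather than learned from data.

```python
# Hand-written knowledge: nothing here is learned.
facts = {"Jacks": "dog"}          # minor premise: my friend Jacks is a dog

def has_four_legs(name: str) -> bool:
    # Major premise written as a rule: all dogs have four legs.
    return facts.get(name) == "dog"

# The conclusion follows mechanically from the written premises.
print(has_four_legs("Jacks"))     # True
```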
That was one side of AI back then: the traditional approach, also called good old-fashioned AI, written as GOFAI (I don't remember exactly who coined the term or when). The other camp were the connectionists, which is what we called them back then; today this is essentially neural AI. Connectionism emerged back in 1943 with a paper describing a simplified neural circuit (McCulloch and Pitts), but where connectionist AI really got going was with Frank Rosenblatt, who introduced the idea of the perceptron. An adapted version of the perceptron he described is what we use in neural networks today. It's obviously a big deal now; back then these models were far less useful, but a lot of people really believed in them, and at least so far the connectionists have turned out to be the more correct camp, I would say.
The connectionist approach focused on building AI systems loosely based on the mechanisms of our brains. "Perceptron" was arguably a slightly odd name; today we talk about neurons within a neural network, and a lot of the terminology clearly comes from the idea of a brain. If you put a diagram of a perceptron next to a diagram of an actual neuron, you can see where it comes from. On the biological side you have many inputs feeding into the neuron, the signal is integrated and carried along the axon, and then you have many output branches. Those output branches are many rather than one, and to be fair they can have different degrees of activation, whereas a single artificial neuron produces just one output, so in that sense they differ. But we stack many of these units across many layers, and at that point you do have many axon-like connections, each coming from a single unit's output. Either way, there's clearly a lot of similarity, and this unit is one of the fundamental building blocks of neural AI.
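For reference, Rosenblatt's perceptron really is just a weighted sum of inputs pushed through a hard threshold, something like the minimal sketch below. The specific weights are just numbers I've picked so that the unit computes a logical AND; modern networks replace the hard threshold with smooth activations like ReLU or sigmoid and stack many of these units into layers.

```python
def perceptron(inputs: list[float], weights: list[float], bias: float) -> int:
    # Weighted sum of the inputs, plus a bias term.
    activation = sum(w * x for w, x in zip(weights, inputs)) + bias
    # Hard threshold: fire (1) or don't (0).
    return 1 if activation > 0 else 0

# Example: a perceptron computing AND over two binary inputs.
print(perceptron([1, 1], [0.6, 0.6], bias=-1.0))  # 1
print(perceptron([1, 0], [0.6, 0.6], bias=-1.0))  # 0
```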
For neural AI to work you need a lot of compute, parallel processing, and so on, and because that wasn't available it didn't really take off at first. There were a few AI winters, periods where people lost interest in AI in general and in neural or connectionist AI in particular. That carried on until around 2011-2012, when ImageNet and the AlexNet model kicked off interest in neural, connectionist AI again. From that point it was all about neural networks, everyone saying "wow, neural networks are amazing", and we still think that way: transformers and LLMs are, at their core, a type of neural network, just much bigger and more complicated. With loads of data and compute finally available, that wave took off and led to where we are now.
So that's the neural part, and we've already covered the symbolic part. What do we have in a neuro-symbolic system? We have both: we're mixing the old, traditional AI with neural AI. Well, kind of; to some degree they're already almost mixed together inside neural networks, because the way neural networks work, they almost learn symbols: they learn representations of different concepts, which is essentially what a symbol is in symbolic AI, they just aren't hand-written. A neural network, in some sense, learns what "strawberry" is or what "dog" is. That's a bit beside the point, though; we can say neural networks are perhaps sub-symbolic, but for now let's just treat them as purely neural. So neural networks make up the neural part of this, which for our purposes basically means LLMs. Then we have the symbolic part, which, as I mentioned before, is the hand-written stuff: code. If you write some code that can be run or triggered by an LLM or some other type of neural network, you have a neuro-symbolic architecture; you have a mix of both.
That is exactly what the MRKL system was. When they developed it they were using, I think, an early GPT-3-era model, which by today's standards was not that great, and I believe they built at least part of the system on top of another model whose name I honestly don't remember, but it doesn't really matter. The point is that they built this agentic system by mixing neural networks with runnable code. And some of the problems they talk about, lack of up-to-date knowledge, proprietary knowledge, and so on, are exactly the kinds of things we now try to solve with RAG in many cases, which I think is quite interesting.
So my definition of agents goes along those lines: neural plus symbolic. I like it for two reasons. First, the definition is anchored in roughly the past eighty to a hundred years of AI, so there are very solid foundations behind "neuro-symbolic". Second, when I'm building these systems, LLMs are great, but I don't just use LLMs; a lot of the time there's a very good reason to bring in other neural-network-based models. By broadening the neural part to neural networks in general, you don't restrict yourself to LLMs alone. LLMs are amazing and of course I use them a lot, but not exclusively. A good illustration is semantic router. I don't know if you've used it, and it's no big deal if you haven't, but the idea behind it is easy to explain with an example.
Semantic router uses embedding models, which are neural-network based, and what you do is provide some example inputs for each route. In this figure there's a "politics" route, which is really more of a guardrail, actual protection basically, but that's just one example. Then there's a second route, which is the better example here: "what is the llama 2 model?" (it would be Llama 3 now, I wrote this a while ago), "tell me about Meta's new LLM", "what are the differences between Falcon and Llama?" All of those are obviously things I'd want to trigger a search. With the embedding model I can identify that: anything that lands in that little region of embedding space is probably the user asking us to do a search. Then there are many things we could do, but one option is simply to take the user's query and send it straight to a RAG pipeline. We don't even ask an LLM to rephrase it or to decide whether to use the RAG pipeline; we just use the RAG pipeline directly. That's much faster than going through an LLM, and I'd say probably much more controllable too. However, LLMs provide a lot of flexibility, so that's not usually what I do.
Instead, what I usually do is keep an LLM in the loop. We take the query we got from the user and modify it a little. It's a slightly lazy approach, but it works well and it leaves the flexibility down to the LLM, which I like. So we keep the original query from the user, whatever that was, and then append something extra. A "system note" is something I've used fairly often: something like "System note: use the rag tool". So we've modified the query that gets sent to the LLM, and we're essentially heavily suggesting to the LLM what it should do, and that works very well in practice. In a system like this, the agent is not just the LLM; it's also the embedding model doing the routing.
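Here's a rough sketch of that whole pattern, the route matching plus the system note. I'm deliberately not showing the actual semantic-router API here; `embed` is a placeholder for whatever embedding model you use, the route utterances are the ones from the example above, and the 0.75 threshold is an arbitrary number picked for illustration.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Placeholder for a real embedding model (OpenAI, Cohere, open source, ...).
    raise NotImplementedError

routes = {
    # Each route is just a list of example utterances; a guardrail route such
    # as "politics" would be added in exactly the same way.
    "rag": [
        "what is the llama 2 model?",
        "tell me about meta's new llm",
        "what are the differences between falcon and llama?",
    ],
}

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def route(query: str, threshold: float = 0.75) -> str | None:
    # Match the query against each route's example utterances in embedding space.
    q = embed(query)
    scores = {
        name: max(cosine(q, embed(u)) for u in utterances)
        for name, utterances in routes.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else None

def prepare_prompt(query: str) -> str:
    # Rather than forcing the tool call, nudge the LLM with a system note and
    # leave the final decision to it.
    if route(query) == "rag":
        return f"{query}\n\n(System note: use the rag tool)"
    return query
```

You could swap the `prepare_prompt` step for sending the matched query straight into a RAG pipeline, which is the faster, more controllable option I mentioned, but appending the note keeps the final decision with the LLM.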
And that's especially true when you consider the version without an LLM at all: to me, even that system is pretty agentic; it seems like an agent. It becomes even more so when you add the LLM and its decision-making. So I prefer to think of agents as this type of system, or rather, not just this one type, but more flexibly in general, because if you think of agents purely as an LLM that can call tools, you're massively limiting yourself; you're boxing yourself into one particular thing that an agent might be, and I think that's a mistake.
Even take a simple example with multiple tool sets. Say we make a decision, and it's our LLM making it, that's fine, but from there it can go down two different paths: tool A or tool B. Maybe tool A is for reading about the news, whereas tool B is for when someone asks a maths question, so it's a calculator, or perhaps it searches a maths site for an explanation, while tool A goes off to a news site. Two very different use cases. And what you may well find is that the follow-on tools for those two paths, if there are any follow-on tools at all, should be different. You've already identified that the intents are two very different things, so why would the follow-on tools be the same? There's no reason for them to be. In that case you may still have an LLM in the middle, and maybe sometimes you won't, but either way you then follow a slightly different path depending on the branch. If you think of an agent as just an LLM plus some tool calls in a loop, this is a fairly simple pattern, and yet you can't express it.
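As a sketch of what I mean, something like the below. The tool names and the rule used for the first decision are made up; the point is only that once the first decision has been made, each path can expose a completely different set of follow-on tools, which a single flat list of tools in a loop doesn't capture.

```python
# Hypothetical tool sets: after the first decision, each path exposes a
# different set of follow-on tools to the LLM (or to no LLM at all).
NEWS_TOOLS = ["news_search", "summarize_article"]
MATH_TOOLS = ["calculator", "math_reference_search"]

def first_decision(query: str) -> str:
    # This could be an LLM call or an embedding-based router as above;
    # here it's just a crude placeholder rule.
    return "tool_b" if any(ch.isdigit() for ch in query) else "tool_a"

def run_agent(query: str) -> str:
    path = first_decision(query)
    tools = NEWS_TOOLS if path == "tool_a" else MATH_TOOLS
    # From here you might loop with an LLM restricted to `tools`,
    # or follow a fixed pipeline with no LLM at all.
    return f"running path {path} with follow-on tools {tools}"

print(run_agent("what happened in the news today?"))
print(run_agent("what is 345 * 12?"))
```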
So that's what I'm thinking about with agents, and that's how I would approach them, which is slightly different from what I think the standard narrative is for most people on what an agent is. That narrative is valid, but it isn't all that an agent is. I'm going to leave it there; there's a ton more I could talk through, but I'll restrict it to this one thing for now. I will cover this with more structure fairly soon, and hopefully ramble a little less, but at least with this you should get an idea of where I'm coming from and the, hopefully somewhat sensible, logic behind what I'm thinking. Anyway, that's it. Thank you for watching, I will definitely try to release something else very soon, but for now I'll leave it there. Thank you very much for watching, and I will see you in the next one. Bye.