Okay, so I wanted to put together an overview video of what I'm currently working on, which is restructuring the way I think about agents and the way I teach and talk about agents. This isn't going to be a fully edited, structured video; I just want to show you a little of what I'm thinking about and explain where I'm coming from.
All in all, this is part of a broader thing I'm working on, which is actually why I haven't posted on YouTube for quite a while now. I think it's almost two months, which is the longest gap in forever. Partly that's because I'm working on this, but there are other reasons too.

I had my first son about a month ago, so I've been pretty busy there, and I've been working on a lot of things over at Aurelio as well. But I wanted to go through this introduction-to-AI-agents article that I'm working on. It is done, but I do want to put together a more structured video and some course materials on it.
There's already a code example for this, which takes a look at ReAct agents. ReAct is, I would say, the foundational structure for what agents look like today; when I say agents I mean LLM-based agents. It's probably still the most popular type of agent, although now it's more like tool-calling agents, but they're very similar. So the first thing I want to cover quickly is the ReAct agent, because it's what we're most familiar with.

As a reminder, ReAct is basically this: we have some input text, and rather than asking our LLM to answer directly, we allow it to go through multiple reasoning steps, and as part of those steps the agent can also call tools, so it can fetch external information or do something else. That's what I'm visualizing here. The question, which comes from the ReAct paper (linked below), is: "Aside from the Apple Remote, what other device can control the program the Apple Remote was originally designed to interact with?" To be honest, most LLMs can probably answer this directly now, particularly since the example is from the ReAct paper, which is about two years old, but it works as an example. We give the LLM a set of tools, and we prompt it to go through interleaved steps of reasoning and action; that's where the name comes from, "Re" for reasoning and "Act" for action. Here it has access to a search tool, plus an answer "tool" at the bottom, which isn't really a tool but kind of acts like one, and it knows from the prompt that it has to structure its output in the ReAct format.

So it starts: "I need to search Apple Remote and find the program it is useful for." Based on that it structures an action: it knows it has a search tool whose input is a query string, so it calls search with "Apple Remote". That function runs using some logic we've developed, and the observation comes back: the Apple Remote is designed to control the Front Row media center. Now we know what that original program was, Front Row, and the LLM knows it too. So it moves to the next reasoning step: "I know the Apple Remote controls the Front Row program, but what other device controls Front Row? I need to search Front Row and find other devices that control it." It goes back to the search tool with the query "Front Row". (Thinking in RAG terms, a more modern LLM would probably search something like "devices that control Front Row", but this is just the example.) I've shortened the observation for brevity; in the paper it returns a lot more, but the important part is: Front Row is controlled by an Apple Remote or keyboard function keys. That gets fed back into the LLM, which now knows everything we've covered plus the original query, and it decides it has all the information it needs. So in the final step, instead of the search tool it uses the answer tool, whose output parameter is "keyboard function keys", and that gets returned to the user.

So that's the ReAct agent: reasoning, building a query for a tool, getting a response, potentially iterating through another round of reasoning and action, and eventually providing an answer. That loop is the commonly accepted definition of what an agent is at the moment, and it's fine, but I think it's very limiting. In production I would never deploy just this, whether it's ReAct or OpenAI tool calling or whatever else. In my opinion an agent is much broader than this, and in the wider literature an agent is not just this either. So I went back through a few papers to figure out a good definition of an agent, one that matches the way I understand agents and the way I've been building what are, honestly, more like agentic workflows, though to me workflow or agent is kind of the same thing: an agentic workflow is an agent.
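To make that reason-act-observe loop concrete, here's a minimal sketch in Python. Everything here is a stand-in: the `llm` function just replays the paper's example instead of calling a real model, and `search` is a two-entry toy knowledge base rather than a real search tool. The point is only the shape of the loop, not the implementation.

```python
# Minimal sketch of a ReAct-style loop. `llm` is a scripted stand-in for a
# real model call; `search` is a toy knowledge base standing in for a tool.

def search(query: str) -> str:
    kb = {
        "Apple Remote": "The Apple Remote is designed to control the Front Row media center.",
        "Front Row": "Front Row is controlled by an Apple Remote or keyboard function keys.",
    }
    return kb.get(query, "No results found.")

def llm(scratchpad: str) -> str:
    # A real agent would send the growing scratchpad to an LLM here and get
    # back the next "Thought + Action" step; this just scripts the example.
    if "Front Row is controlled" in scratchpad:
        return "answer: keyboard function keys"
    if "designed to control the Front Row" in scratchpad:
        return "search: Front Row"
    return "search: Apple Remote"

def react_agent(question: str, max_steps: int = 5) -> str:
    scratchpad = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(scratchpad)                 # reasoning -> chosen action
        action, _, arg = step.partition(": ")
        if action == "answer":                 # the "answer tool" ends the loop
            return arg
        observation = search(arg)              # run the tool, capture observation
        scratchpad += f"Action: search[{arg}]\nObservation: {observation}\n"
    return "No answer found."

print(react_agent("Aside from the Apple Remote, what other device can "
                  "control the program it was originally designed for?"))
# -> keyboard function keys
```

The key design point is that the scratchpad accumulates every action and observation, so each new "reasoning" step is conditioned on everything seen so far.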
So anyway, I went back, and the paper that I think had the nicest definition, one that ties back to really original AI research and close to the original AI philosophy, was the MRKL paper. It's another LLM agent paper; I think it came just before the ReAct paper, and it's very similar, just a bit less structured than ReAct, but it's super relevant. The way they described their system was as a neuro-symbolic architecture. I really like this definition because it's two things: you have the neural part and you have the symbolic part. (I've actually started another article on this, but it's mostly notes at the moment.)

Let's start with the symbolic part. Symbolic AI is the more traditional AI, roughly the 1940s through the 1970s; it was the dominant approach back then. The full-on symbolists believed that true AGI would be achieved through handwritten rules, ontologies, and other logical functions: basically a load of handcrafted, almost philosophical grammars. An example is syllogistic logic from Aristotle. You have a major premise, then a minor premise, and a conclusion based on them (forgive me, I haven't done this for a long time, so I may not be super accurate). So if you say something like "all dogs have four legs", which is maybe not actually true, but let's not be too pedantic, that's your major premise. Then the minor premise: "my friend Jacks is a dog." The conclusion: "my friend Jacks has four legs." That's a logical framework developed by Aristotle, and the symbolic AI people would work through exercises like this, trying to build up a logical methodology that would let you construct some deeper AGI-type system that could just figure everything out. This traditional approach is also called "good old-fashioned AI", written GOFAI; I don't remember who coined that or when.

That was one camp. The other camp were the connectionists, as they were called back then; now it's basically the neural side of AI. Connectionism emerged around 1943, with a paper that described a neural circuit, but where neural or connectionist AI really got started was with Rosenblatt, who introduced the idea of the perceptron. An adapted version of the perceptron he described is what we use in neural networks today. It's a big deal now; back then perceptrons were less useful, but a lot of people really believed in them, and at least so far they've turned out to be the more correct camp, I would say. The connectionist approach focuses on building AI systems loosely based on the mechanisms of our brains. "Perceptron" was kind of a silly name; now we say things like "neurons" within a "neural network", and you can tell all these names come from the idea of a brain. If you put the perceptron diagram next to a diagram of an actual biological neuron, you can see a lot of similarity: on the left you have all the inputs to the neuron, they go through some kind of calculation, which in the biological case happens along the axon, and then you have outputs. A real neuron has many outputs, its axon terminals, each with different degrees of activation, whereas a single artificial neuron has just one output; but once you stack many artificial neurons into layers, you get many outputs too, so the analogy holds reasonably well. Anyway, that's one of the fundamental building blocks of neural AI. For neural AI to work you need a lot of compute, parallel processing, all that sort of stuff, and because it wasn't available, neural AI didn't really take off at first; there were a few "AI winters" where people lost interest in AI in general, and particularly in neural or connectionist AI. That carried on until around 2011-2012, when ImageNet and the AlexNet model kicked off renewed interest in neural, connectionist AI.
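To make the perceptron idea concrete, here's a minimal sketch: a weighted sum of inputs plus a bias, passed through a step activation, loosely analogous to dendrites feeding an axon that either fires or doesn't. The AND weights below are hand-picked purely for illustration; Rosenblatt's actual contribution was a rule for *learning* such weights from data.

```python
# Rosenblatt-style perceptron: weighted sum of inputs + bias, then a
# step activation (the neuron "fires" or it doesn't).

def perceptron(inputs, weights, bias):
    activation = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if activation > 0 else 0  # step function

# Hand-picked weights implementing logical AND (illustration only).
and_weights, and_bias = [1.0, 1.0], -1.5
for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, "->", perceptron([a, b], and_weights, and_bias))
```

Stacking many of these units into layers, and swapping the hard step for a differentiable activation so the weights can be trained, is essentially the jump from the 1950s perceptron to the neural networks inside modern LLMs.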
And at that point everyone was like, wow, neural networks are amazing, and we still think that. Transformers and LLMs, at their core, are a type of neural network, just bigger and more complicated. It took off because we finally had loads of data and compute, and that led to where we are now. So that's the neural part, and that's the symbolic part.

So what do we have here? We're mixing the old, traditional AI with neural AI. Well, kind of; to some degree they're almost mixed together already, because neural networks, in the way they work, almost learn symbols: they learn logical representations of different concepts, which is what the symbol part of "symbolic" is about, except they're learned rather than handwritten. A neural network kind of learns what "strawberry" is or what "dog" is. But that's beside the point; maybe neural networks are sub-symbolic, but for now let's just treat them as the neural side, that's fine. So neural networks make up the neural part, which for us basically means LLMs. Then we have the symbolic part, which, as I mentioned, is the handwritten stuff, i.e. code. If you write some code that can be run or triggered by an LLM, or by some other type of neural network, you have a neuro-symbolic architecture: a mix of both. That's what MRKL is. When they developed the MRKL system, I think they were using GPT-3, or an early model of that generation, which was not that great, and I think they built at least part of it on top of another model whose name I honestly can't remember for the life of me; it doesn't matter. They basically built this agentic system by mixing neural networks with runnable code. Interestingly, some of the problems they talk about are things we now often try to solve with RAG: lack of up-to-date knowledge, proprietary knowledge, all these sorts of things.

So my definition of agents goes along those lines: neural plus symbolic. I like it for two reasons. One, the definition is anchored in roughly the past 80 to 100 years of AI, so there are very solid foundations behind "neuro-symbolic". Two, when I'm building these systems, LLMs are great, but I don't just use LLMs; a lot of the time there's very good reason to bring in other neural-network-based models. By broadening the neural part to "neural network" rather than "LLM", you don't restrict yourself. Of course I use LLMs a lot, but not only LLMs. Take semantic router as the idea behind this (no big deal if you haven't used it). Semantic router uses embedding models, which are neural-network-based, and what you do is provide some example inputs for each route. One route here is a politics route, which is really more of a guardrail, actual protection, but that's just one example. The ask_llm route is a better example: "what is the llama 2 model" (llama 3 now, I wrote this a while ago), "tell me about metas new llm", "what are the differences between falcon and llama". All of those are obviously queries where I'd want to trigger a search. I can identify that with the embedding model: anything that lands in that little region of embedding space is probably the user asking us to do a search. Then one thing we can do is send the user's query straight across to a RAG pipeline. Don't even ask an LLM to rephrase it or decide to use the RAG pipeline; just use the RAG pipeline directly. That's way faster than going through an LLM, and I'd say much more controllable. However, LLMs provide a lot of flexibility, so that's not usually what I do. Instead, there's still an LLM in the picture: we take the query we got from the user and modify it a little. It's a kind of lazy approach, but it works well and leaves the flexibility down to the LLM, which I like. So you take the original query from the user and append something extra; a "system note" is something I've used fairly often, for example: "system note: use the rag tool".
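As a rough sketch of that routing idea: the `embed` function below is a toy bag-of-words stand-in for a real embedding model (semantic router would use an actual embedding model and a tuned similarity threshold), and the route names and utterances mirror the examples above. The end of the snippet shows the "system note" trick: rather than calling the RAG pipeline directly, we append a suggestion and leave the final decision to the LLM.

```python
# Embedding-based routing sketch. `embed` is a toy bag-of-words stand-in
# for a real embedding model; routes are defined by example utterances.
import math
from collections import Counter

def embed(text):
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

routes = {
    "ask_llm": [
        "what is the llama 2 model",
        "tell me about metas new llm",
        "what are the differences between falcon and llama",
    ],
    "politics": ["what do you think of the president"],
}

def route(query, threshold=0.3):
    q = embed(query)
    best, score = None, 0.0
    for name, examples in routes.items():
        s = max(cosine(q, embed(e)) for e in examples)
        if s > score:
            best, score = name, s
    return best if score >= threshold else None

query = "what is the new llama model"
if route(query) == "ask_llm":
    # Append a system note rather than forcing the RAG pipeline directly;
    # the LLM still makes the final call, but with a heavy suggestion.
    query = f"{query} (system note: use the rag tool)"
print(query)
```

Note that nothing in the routing step is an LLM: it's an embedding model plus plain code, which is exactly the neural-plus-symbolic mix described above.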
So I've modified the query that gets sent to the LLM, and through it you're heavily suggesting to the LLM what it should do, and that actually works very well. In this sort of system, the agent is not just the LLM; it's also the embedding model. Even if you don't include an LLM at all, this system is pretty agentic to me; that seems like an agent, and even more so once you add the LLM and its decision-making. So I prefer to think of agents as this type of system, or at least to think about them more flexibly, because if you think of agents only as an LLM that can call tools, you're massively limiting yourself; you're boxing yourself into one thing an agent might be, and I think that's a mistake.

Even take the example of multiple tool sets. Say our LLM makes a decision, no problem, but it can go down two different paths: tool A or tool B. Maybe tool A is for reading about the news, whereas tool B is for when someone asks a math question, so it's a calculator, or maybe it searches a math website for an explanation, versus going to a news website. Two different use cases, and what you might find is that the follow-on tools, if there are any (maybe there aren't, who knows), would be different for each path. You've already identified that the intent is two very different things, so why would the follow-on tools be similar? There's no reason for them to be. So you may still have an LLM in the middle, sometimes you won't, but you would then follow a slightly different path depending on the branch. And if you're thinking of agents as just "an LLM plus some tool calls in a loop", even this fairly simple setup is something you can't express.

So that's what I'm thinking about with agents, and that's how I would approach them, which is slightly different from the standard narrative of what an agent is for most people. That narrative is valid, but it's not all an agent is. I'm going to leave it there; there's a ton more I could talk through, but I'll restrict it to this one thing for now. I will cover this with more structure fairly soon, and hopefully ramble a little less, but at least with this you should get an idea of where I'm coming from and the hopefully somewhat sensible logic behind what I'm thinking. Anyway, that's it. Thank you for watching; I will definitely try to release something else very soon, but for now I'll leave it there. Thank you very much for watching, and I will see you in the next one. Bye!