Okay, cool. So let's see here. Let me find -- actually, the paper isn't pulled up, so let me load Zotero again. This is actually the code, so let me get rid of that. Okay. All right. Awesome. So the paper I'm going to go over is this GenePT paper.
So the concept is basically to use GPT embeddings. I haven't tried it yet, but presumably you could do this with any embedding model. Conceptually -- there's a good diagram here -- they start from a database mapping gene to description: you give it a gene, you get back a description of that gene. You put that description into the embedding model -- they used Ada-002, though you could use others, and in fact they did try a few -- and you get back a vector.
First of all, you can do clustering and other things from that alone. But there's also a technology called single-cell sequencing. Normally what happens is you have a sample from a person, or whatever animal or plant. You take that sample and, conceptually speaking, you stick it in a blender and extract all the RNA -- or DNA, depending on what you're doing -- and that gives you the expression level of the genes for that whole sample, right?
There are different types of cells in the sample. If it's DNA, then all the genes should be the same across cells, unless you're looking at certain epigenetic factors like chromatin and other things. And I should say I don't know a lot about this; I'm just giving my very high-level understanding here.
Or RNA -- so, just a brief summary: genes are sort of like a program to make RNA. There's a whole machinery in the cell that takes snippets of the DNA and transcribes them into RNA, which is then translated into proteins. Proteins are long sequences of amino acids that spontaneously fold up into various shapes, and those shapes perform functions in the cell: building structures, binding to other molecules for signaling, or binding to the DNA itself to amplify or suppress the expression of a gene.
So there's this super nonlinear, highly complex web of things interacting with each other within the cell. These proteins can also end up on the surface of the cell, so when other cells or molecules bump against it, they signal to those other things.
Okay, so that's RJ's very, very quick genetics 101. Anyway, you have these cells, and there's a technology that has become common in the last couple of years where you can take each individual cell and look at the expression level of genes in that individual cell. You can imagine separating out the cells, sticking each cell individually in a blender, and counting the instances of each gene's RNA in that individual cell.
So that's what we're looking at here. First of all, does anyone want to ask a question? That was a pretty spontaneous, rough explanation, but if anyone wants to understand it better, I'd be happy to do my best. Okay. Okay, good. My quick feedback is that figure A is a lot more intuitive than B.
B looks like gene attention. Yeah, okay, so let me go over this diagram then. You have a cell -- actually a set of N cells -- and they've gone through this sequencing process, so you have an expression level for each gene for each cell. That's what this matrix is.
Then what you do is take the embeddings for each gene -- from GPT or wherever, in this case GPT -- and weight them by the expression level in that particular cell. So each row is a gene's embedding; I weight each row by that gene's expression in the cell and sum them up. Now, for each cell, I have a single embedding that corresponds to that cell: the sum of all the per-gene embeddings, weighted by expression level.
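The weighted-sum construction described above can be sketched in a few lines of NumPy. This is a toy illustration with made-up shapes, not the paper's code; the real setup uses on the order of 30,000 genes and 1536-dimensional Ada-002 vectors.

```python
import numpy as np

# Toy dimensions (assumptions for illustration only).
n_cells, n_genes, dim = 4, 6, 8
rng = np.random.default_rng(0)

gene_embeddings = rng.normal(size=(n_genes, dim))  # one row per gene, from the text embedding model
expression = rng.random(size=(n_cells, n_genes))   # expression matrix: cells x genes

# Normalize each cell's expression so its weights sum to 1, then take the
# weighted sum of the gene embeddings: one embedding per cell.
weights = expression / expression.sum(axis=1, keepdims=True)
cell_embeddings = weights @ gene_embeddings        # shape: (n_cells, dim)
print(cell_embeddings.shape)                       # (4, 8)
```

Each row of `cell_embeddings` is the single per-cell vector described above: the sum of all gene embeddings, each weighted by that gene's expression in the cell.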
Does that make sense? That's the core concept here, so I want to make sure it came across clearly. Yeah. Okay. Then there's a second mechanism they also tried: C is another, even simpler way of doing this. They take the thousand highest-expressed genes in a cell and put their names into a string. It starts with something like "cell one" -- in reality it's a slightly longer prompt -- and it essentially says, here are the thousand most expressed genes in this cell. Then they stick each of those strings into the embedding model, and out come n embeddings, one per cell. That's the mechanism. It's really, really simple in the sense that the high-level mechanism is very straightforward. Of course, there's the really complex transformer underneath doing all the textual embedding, but that's something many people in the paper club should understand pretty well.
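The sentence variant can be sketched similarly: rank genes by expression, keep the top k names, and join them into one string for the embedding model. The gene names and prompt wording below are illustrative, not the paper's exact prompt.

```python
# Build the "cell sentence" for one cell: the top-k genes by expression, in order.
def cell_sentence(gene_names, expression_row, k=3):
    ranked = sorted(zip(gene_names, expression_row), key=lambda t: t[1], reverse=True)
    top_genes = [name for name, _ in ranked[:k]]
    return "A cell with genes ranked by expression: " + " ".join(top_genes)

genes = ["TP53", "BRCA1", "EGFR", "MYC", "GAPDH"]
expr = [0.1, 2.5, 0.0, 1.7, 3.9]
print(cell_sentence(genes, expr))  # A cell with genes ranked by expression: GAPDH BRCA1 MYC
```

In the paper, each such string (with k = 1000) is sent to the embedding model, giving one embedding per cell.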
Okay, good. So then, skipping over the related work: without going into too much detail -- I think I've been pretty clear about what they do -- they have these roughly 30,000 genes and do a bunch of deduplication. A big part of getting this to work is just reconciling names: different papers use different names for genes, and there's a whole bunch of databases that all use slightly different conventions. There's this Ensembl database, right here, whose goal is to take all the different names for a gene and give it one ID, and they use that to unify everything. Okay, so then they have some different applications. They have functionality class prediction -- what is the function of this gene -- gene-gene interaction, other unsupervised analyses, and looking at different cell states. They have datasets that are labeled or partly labeled for all of these, so they can use the embeddings and test their accuracy on all these different tasks.
Okay, so then they have the results here. You have this confusion matrix, and you can see it gets a lot of stuff right. This protein-coding class -- which I think is one of the main things you actually care about for RNA expression -- has a lot of errors: it thinks a lot of things are protein coding when they're not. And this processed-transcript class -- I don't know exactly what "processed transcript" means -- gets labeled wrong a lot too.
But you can see that it seems to beat the competition pretty well. Geneformer is a model that's a little bit older, and scGPT has gotten a lot of attention recently. Geneformer is a BERT-style model and scGPT is a GPT-style model, and GenePT beats both of them. And I've actually used their code -- I'll show you, though we'll probably run out of time.
I've gone through the notebooks the authors provide, which were pretty good. They're a little old, so some things were broken, but I more or less fixed them, and this was similar to the result I got. So these results do seem pretty valid.
Maybe some of these other tasks are harder -- protein-protein interaction, for instance, where the ROC curve is not so great. But in any event, they're winning. To me this is the headline: it's pretty incredible that you have these specialized models designed specifically to encode the relationships in genetic information, and instead we just take text that describes the gene -- that briefly summarizes our knowledge about it -- turn it into an embedding, and it does dramatically better than these specialized models.
I mean, if you think about it, it's not super surprising, because language has so much nuanced information in it that a good model might capture all that nuance better. But in any event, it's pretty impressive how decisively they beat them.
And this part here -- I didn't go through it in detail -- is basically them trying to show that they're actually capturing the underlying biology: you get these clusters of genes that make sense. I think someone has a question.
Yeah. Okay. Sorry to interrupt -- one question. I don't know if you had a chance to look at this, but how exactly are they going from the vector representation of a gene to a protein-protein interaction prediction? How do you go from that to predicting an interaction? So, let me -- that's a good segue; I think it's this one. Let's see, protein. Did you ask about protein-protein or gene-gene? Honestly, either, whichever is simpler. Yeah, so this one is the gene-gene interaction.
So let's see, where's the meat of this. Right -- what they do is create an embedding of the pair: they take the two gene embeddings and sum them. Then they fit, in this case, logistic regression on that. I actually also tried LightGBM; it doesn't make a huge difference -- it slightly beats logistic regression and is a lot faster, but it doesn't really matter. So they sum the two gene embeddings, and here the label is just zero or one for whether the pair interacts. Yeah, that's right. Okay, does that make sense? Yeah, so you're training logistic regression against the sum of the two gene embeddings.
Yeah, to predict whether they interact or not. That's right. Yes. Is that clear? Yeah, I think that answers my question, thank you. Okay, awesome. That actually brings up a good point about how useful embeddings are.
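The gene-gene interaction setup just described can be sketched as follows. This is my reading of the notebook, not the authors' exact code, and the data here is random, just to show the shapes: each candidate pair is represented by the element-wise sum of the two gene embeddings, and a plain classifier is fit on labeled pairs.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
dim, n_pairs = 16, 200
emb_a = rng.normal(size=(n_pairs, dim))   # embedding of the first gene in each pair
emb_b = rng.normal(size=(n_pairs, dim))   # embedding of the second gene in each pair

X = emb_a + emb_b                         # pair feature = sum of the two gene embeddings
y = rng.integers(0, 2, size=n_pairs)      # 0/1 label: does this pair interact?

clf = LogisticRegression(max_iter=1000).fit(X, y)
probs = clf.predict_proba(X)[:, 1]        # predicted interaction probability per pair
print(probs.shape)                        # (200,)
```

Swapping `LogisticRegression` for a gradient-boosted model like LightGBM, as mentioned above, only changes the classifier line.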
I had the same question in my mind. It's kind of obvious how to cluster genes and so on, but I wouldn't have thought to do what they did to predict gene-gene interaction. So it seems like there are lots of uses for these embeddings. There's been a lot of prior work using much less capable embeddings for these kinds of analyses -- I think this is pretty standard fare in bioinformatics -- and then you throw a much better embedding model at it and get much better results.
Yeah, sorry -- I think what's really interesting here is that normally when people have tried to do this in the past, they take a ground-up approach: they either use the sequence itself, or some gene or protein expression data, and try to predict these interactions from that. So again, it's very surprising that just using text -- standard descriptions, essentially text from the internet -- gets you similar or, as you're saying, even better results.
I think one thing that might be happening, I guess, is that a lot of those descriptions themselves contain information about which other genes interact with the gene. Right. So that could be a confounding factor. But still -- I don't know what the size of their dataset was, but even across tens of thousands of examples, it holds up.
Yeah, it's a really cool approach, I think. And I think you're exactly right: there is information about the interactions in those descriptions. I've looked at some of them, and they'll say things like "this is commonly expressed in cancer" or whatever. So there's a lot of semantic content in those descriptions. In retrospect, after they've tried it, it's not surprising to me that it works; it's just pretty amazing how well it works.
And the other thing this brings up: what else can we stick in the description? I think they've left a lot on the table here. What if we start putting in more complex descriptions that talk more explicitly about gene interactions? If you go to the, what do you call it, the human genome browser and other resources, there's all this rich information about every gene in the genome that we know about.
So what other information can you add to these descriptions to create a really nice embedding that serves multiple purposes? So yeah, let's see. They did try some other models -- they mention LLaMA-7B.
I don't know what they did for that -- it's not an embedding model, obviously, unless they used some LLaMA-7B-derived embedding. Presumably they're just using some pooled hidden state as the embedding, but since it hasn't been trained with a contrastive text objective, it's not clear to me that's actually a great comparison.
But maybe something from MTEB -- the massive text embedding benchmark leaderboard -- would do better, so I was going to try that and see how it did. But interestingly, while we're here, I actually did try something.
If you look here at this aorta dataset, I also tried text-embedding-3-small, which is supposed to be a replacement for Ada and is dramatically cheaper. But look at the results.
Presumably it was a distillation or something, and they lost -- scientific literature, gene-related text, I don't know what exactly -- but this model does way worse. I found that quite interesting, so I definitely plan to spend some time running the same metrics across a whole bunch of embedding models, because I don't think the paper did a great job of exploring different embedding models.
Not to criticize the paper -- it's an awesome paper -- but that part came up short. So then, this line here is what I'm actually reproducing right now. They have this scGPT model, and then the weighted and sentence versions of their approach, and you can see that mostly either scGPT or the weighted model wins.
Occasionally the sentence version wins -- for aorta, actually, which everything does kind of poorly on for phenotype -- but not by a ton. Okay, let me see if there's anything else interesting. There's maybe one other table I wanted to discuss briefly.
It's near the bottom here, I think. Yeah, this table here. Another thing they tried was an ensemble of the three: scGPT plus the weighted and sentence models. The ensemble doesn't always win -- I guess it wins three out of six, so that's fine -- but even when it loses, it tends to be close to the max, which is not surprising for an ensemble model.
So it does seem like it might be worth doing an ensemble of different models, and the point they make here is that scGPT seems to be giving you something slightly different -- the results are complementary.
So that's basically all I covered. They have some cool visualizations -- UMAPs -- but that's all I really looked at. Maybe I'll stop here. I can go over my notebooks too, but they're pretty straightforward. Does anyone want to comment or ask a question? We're at about the half hour, so there's plenty of time to talk, or we can go over something else entirely.
Well, just one question on this: apart from annotating, can we generate? I mean, it's an embedding, so no. But maybe. I've been talking about this with my buddy that I mentioned at the beginning -- I think there's a lot of opportunity here.
The great thing about biomedical research is that they often publish the data, especially if it comes from an academic lab: you have a paper, and you have a really nice, unique dataset. And I know there are models that do this to some extent, where you train on both the gene- or sequence-level data and the text, so that maybe you get a model where you can put in a cell and get a description back, or something like that.
Or the reverse: you put in a description and get out gene-level expression information, or a sequence that models what you think it should be. Those are things I think we hope to work up to.
And I know there's already a lot of work being put in -- the Eric Schmidt thing; I suspect those folks are probably doing a lot of that. So our focus is to build up these basic capabilities and then eventually get to generation: either gene-level, for training kinds of things, or sequence-level -- maybe even designing genes for specific purposes, according to whatever the scientific literature says you should do.
So yeah, definitely not from this model, I don't think, but I think it's possible. Yeah. And the other really interesting thing -- this will be the next thing we try, so I should mention it -- is that there's probably an order of magnitude more data (we'd have to check) on what's called bulk data, where, like I said at the beginning, you have a sample -- a reasonably large piece of tissue with lots and lots of different types of cells -- and you get the gene-level expression data on that whole sample all mixed together.
So there's actually an opportunity to apply exactly the same technique to that and use it for classification of that sample: what was the change in gene expression when I gave this drug, or for diagnosis -- does this sample exhibit a disease, because the RNA is being expressed to counteract the disease, or whatever. So there's a really cool opportunity there, which is probably what we're going to start working on later this week or next week. And maybe in the, what do you call it, the AI in Action session --
-- if we have some cool results, I'll try to do an AI in Action on it. Awesome. Cool. All right. It seems like I've kind of run out of steam. I'm happy to talk more if people have questions, but I want to give other people an opportunity to go over stuff too.
Yeah, well, I guess we'll open the floor to any other papers people want to explore. I also had in mind clearing up the voting list we had on Slido -- obviously some stuff gets stale, and we might want to refresh it and make a new Slido.
Yeah. Yeah. So what do people want to cover, or what have people read recently? Open up the floor. Did ModernBERT come with an actual paper, or was it just a blog post? Yes, it did -- ModernBERT is very full-fledged. I was literally just adding ModernBERT to the AI engineering reading list right before this.
I would love to just get hands-on with it, honestly. Yeah, it seems like a good paper. What do you like about it? Oh, Sebastian says: cooperation among LLM agents. What is this? It's a paper proposing a benchmark for evaluating an organization of LLM agents.
I also want to hear about ModernBERT, but I don't know if anybody's ready to present that. I don't think we're ready. We don't have to do it today; we can do it next week. But yeah, I'd be down for a ModernBERT slash LLM-agent-cooperation session next week.
I wasn't too interested in going over the cultural evolution of cooperation myself -- I just figured that if we don't have anything else to cover, I'll run through it quickly. Sure. Yeah, I have nothing for today; I was going to end early. I'm still trying to understand the donor's game that's in the paper, but if you already have it in your head, then by all means.
Yeah, I can go through it. This is related to a paper I've been talking about recently on multi-agent steganography, where agents hide messages -- it's a kind of game, but more safety-oriented. You got it. Yeah. Okay. Thanks. Nice. Okay, I'll share my screen.
I also have a Jupyter notebook, but I'm not going to open that up; instead I'll open the GitHub repo for it -- my own GitHub repo, actually; I don't think they published one. Okay. So, in general, to set the stage for this paper:
They're trying to explore how agents behave in the hypothetical scenario where you're building an organization that consists of multiple agents interacting with each other, and you want these agents to pursue the collective interest of all the agents put together, instead of a single agent just looking out for itself and trying to maximize rewards for itself.
And they posit that the donor's game is a good representation of the collective benefit of an organization. The donor's game is a game from game theory. Let me just draw it out.
You have three players: player one, player two, player three. The idea is that all three players benefit from donations between the agents -- one donates to three, three donates to two, back and forth -- and in the limit, after enough donations, each one ends up with more money than if none of them had donated in the first place. Yeah. And a key principle is that it has to be a positive-sum game, meaning there's an increase in the total pool of wealth and resources in the collective -- in the organization.
So what they do in the paper is: whenever one donates to three, say one donates a quantity of two, it gets multiplied by two when three receives it. So three ends up receiving four, even though one only lost two. That's the representation of positive sum.
And if you just continue cycling this over and over -- assuming everybody keeps donating and nobody hoards and just keeps the money -- then, assuming everybody starts with ten: one donates two, so one goes from ten to eight and three goes from ten to fourteen. Three then donates two to the second agent, dropping to twelve, and two goes from ten to fourteen. Finally, two donates two back to one, reciprocating the original donation, dropping to twelve, and one, who was at eight, receives four and ends at twelve.
So that's the idea behind the donor's game: if everybody donates in this system, every single player ends up with a greater sum than if they hadn't donated at all. That's what they're trying to illustrate with agents, and they explore which models perform best in this game.
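The arithmetic above can be checked with a tiny simulation of the rules as described: donating costs the donor the donation amount, and the recipient receives it multiplied by two.

```python
# One full cycle of the donor's game: each player donates to the next player
# in the ring. Donor pays `donation`; recipient receives `donation * multiplier`.
def donors_game_round(wealth, donation=2, multiplier=2):
    n = len(wealth)
    wealth = wealth[:]  # don't mutate the caller's list
    for i in range(n):
        recipient = (i + 1) % n
        wealth[i] -= donation
        wealth[recipient] += donation * multiplier
    return wealth

print(donors_game_round([10, 10, 10]))  # [12, 12, 12]
```

With three players starting at ten, everyone ends the cycle at twelve -- exactly the positive-sum outcome walked through above.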
And here's the design -- but just to not get sidetracked, here are the results across models. Claude 3.5 Sonnet performs the best, and something interesting is that GPT-4o didn't perform very well. As far as I gathered from the paper, GPT-4o was too careful with its donations -- too skeptical of donating when its turn came, so it just wouldn't donate. Claude 3.5 Sonnet understood the premise of the game and was more liberal with its donations. And, again, something I didn't point out: if, for example, player three just decides to keep all the money, the game completely falls apart.
Player one, in the worst case, just ends up losing money, and two won't get anything -- or maybe two donates to one, and so the ultimate loser is two. So that's what you have to take into account. Two things: one, it's positive-sum; two, you can actually see the reputation of the players. You can see, for example, whether three donated to two or decided not to, and based on that information you decide whether to carry on donating. So that's the idea of the paper. I didn't prepare to present this, but I wanted to get the crowd's take on whether you think this is a good benchmark to judge models on for organization-wide agent clusters.
Well, any thoughts on that? Any thoughts on the whole thing? A couple of questions, basically: one, the donor's game as a benchmark for agent organizations -- and what was two? Oh yeah, agent organizations in general. Yeah, I like it. I want to let other people respond before I jump in, but anyone have takes on this?
It's a nice paper -- I can't believe I hadn't heard of it. Oh, yeah. I would like to see some more models, though. This whole PvP, three-player thing has been tried in crypto. I've seen it play out, and more often than not it ends up as a game of chicken: whoever pulls out first and keeps the money wins, essentially. But it's super interesting to me, and it would be cool to see how some of the open-source models play out, because as people bring more agents online, or as agents become useful, this would be an important thing: figuring out how to incentivize them all to play nice.
Yeah, you mentioned the game of chicken, and they did kind of address that -- they tried to prevent the agents from gaming the system. Somewhere in the paper they explain that they don't disclose when the round is going to end, because if you knew when the round would end, you'd just wait until then and keep everything.
So that's an interesting policy they added. Oh, and here are some of the results -- I forgot to cover this -- from me running this locally on GPT-4: total money per round. This is basically the overarching metric they use to evaluate the effectiveness of the collaboration: the more the organization donated, the more the virtuous cycle would take over. And you can kind of see a trend across generations. Oh, and another thing -- the idea is you start with a set of agents.
And I'm not sure if I'm sharing -- yeah, we can see. Okay. They started with 12 agents, and what happens is you run a donor's game iteration, and toward the end of it, the top six -- the top half -- become the parents for the next generation. So you pass the original strategies of the top six performers on to the next generation, and so on. That's what the different Gs represent in the legend. You can see some upward trend in accumulated money. Then I also plotted average donation percentage.
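The selection step just described can be sketched like this. It's a hedged sketch of my understanding, not the paper's code: rank agents by final wealth, keep the top half, and use their strategy texts to seed the next generation's strategy prompt.

```python
# Rank agents by final wealth and keep the top fraction as "parents".
# Each agent is a (name, final_wealth, strategy_text) tuple; the strategy
# texts of the survivors would be fed into the next generation's prompt.
def select_parents(agents, keep_fraction=0.5):
    ranked = sorted(agents, key=lambda a: a[1], reverse=True)
    n_keep = max(1, int(len(ranked) * keep_fraction))
    return ranked[:n_keep]

agents = [("a1", 14, "always donate"), ("a2", 8, "hoard"),
          ("a3", 12, "tit for tat"), ("a4", 10, "random")]
parents = select_parents(agents)
print([name for name, _, _ in parents])  # ['a1', 'a3']
```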
Was this an evolutionary algorithm? Meaning, the next generation is some sort of combination of the strategies of the previous generation? Yes, exactly. That's what it is. I see. And you can actually see the prompts -- I love the way they illustrated the prompts; it was very clear.
So where's the evolution prompt? Okay, so there's a system prompt; that one is passed to every agent and just describes the game. Then the donation prompt: this determines how you should donate, and it also includes the reputation trace of who you're donating to. That was also something interesting, how they did the donation trace.
And then the strategy prompt. Yeah, okay, this is basically what you're looking at per generation: if it's the first generation, the agent just generates some strategy; if it's any subsequent generation, it inherits the top strategies of the previous game.
So, as described here. Okay, so it's basically using the LLM to merge, or sort of combine, strategies from the previous generation. That's really cool. Nice. Yeah. And I'm thinking of expanding this to be a more direct lineage, where the top donator gets three children, the second top donator gets two children, and the third gets one child; they just replicate.
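That proposed lineage variant could look something like this. This is a hypothetical sketch of the idea just suggested, not anything from the paper; the 3/2/1 split in `offspring_counts` is the suggestion above (and conveniently sums to six, the number of surviving slots):

```python
def next_generation(ranked_strategies, offspring_counts=(3, 2, 1)):
    """Direct-lineage variant: the i-th best strategy is replicated
    offspring_counts[i] times, so better donators leave more children."""
    children = []
    for strategy, count in zip(ranked_strategies, offspring_counts):
        children.extend([strategy] * count)
    return children
```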
What is it called? Natural selection. So, I don't really understand: I was just poking around online, and I don't understand the difference between the donation game and the prisoner's dilemma, except that I read that the donation game is a version that is positive-sum, which is what you said as well.
So why does that matter? What's the distinction here between the prisoner's dilemma and the donation game? Remind me what the prisoner's dilemma is about. Oh, okay. It's like: if you stay silent and the other guy sells you out, you go to jail for five years. If you each sell each other out, you each get three years. If you sell the other person out and they stay silent, you go free. And if neither of you talks, you each only get one year. So it's probably been the most studied game.
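The payoffs just described can be written down as a small payoff table (jail years, so lower is better). This is the textbook prisoner's dilemma with the numbers from the discussion, not anything specific to the paper:

```python
# Jail years for (my_move, their_move); "D" = sell the other out, "C" = stay silent.
PD_YEARS = {
    ("C", "C"): (1, 1),  # neither talks: one year each
    ("C", "D"): (5, 0),  # I stay silent, he sells me out: I serve five
    ("D", "C"): (0, 5),  # I sell him out and walk free
    ("D", "D"): (3, 3),  # mutual betrayal: three years each
}
```

Whatever the other player does, defecting ("D") gives you fewer years, yet mutual defection (3, 3) is worse for both than mutual silence (1, 1); that tension is the dilemma.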
And there's the iterated prisoner's dilemma, where if you play over and over again for an infinite amount of time, there are strategies with which you can induce cooperation from your opponent. So yeah, it's a super well-studied game.
And this is, I think, a related game, but since you haven't encountered it before, you might have trouble distinguishing the two. I think they both represent a similar dynamic, where you want what's beneficial for the sum of all parties involved, rather than the reward for a single one. Actually, you could potentially do the same thing for the prisoner's dilemma; I just don't know how the information would transfer between rounds of the prisoners' game. It sounds to me like the setup is the same, except that it's a positive-sum game, and that changes the dynamics, but I don't understand how.
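For contrast with the prisoner's dilemma payoffs, a single donation step in the donor's game might be sketched like this. The `multiplier` value of 2 is an assumption for illustration, not a number I'm taking from the paper; the point is just that any multiplier greater than 1 makes the game positive-sum:

```python
def donate(donor_wallet, recipient_wallet, amount, multiplier=2.0):
    """One donation step: the donor pays `amount`, the recipient
    receives `multiplier * amount`.  With multiplier > 1 the total
    pot grows with every donation (positive-sum)."""
    amount = min(amount, donor_wallet)  # can't give more than you have
    return donor_wallet - amount, recipient_wallet + multiplier * amount
```

So a donation of 5 out of a wallet of 10 leaves the donor with 5 but hands the recipient 10, growing the combined pot; in the one-shot prisoner's dilemma, by contrast, the payoff table is fixed and nothing accumulates.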
Okay, yeah. Anyway, maybe I'll look into that and put it in the chat later. Yeah. Okay, I'm just going to go through the rest. I don't think this graphic was actually very interesting. Oh, what happened here is that in the third generation, one of the LLM agents, and bear in mind what this run was: four generations, six rounds per generation, and four players. The reason you see a big spike here is that one of the players decided to donate more than all of its money.
I basically told them to give a percentage between zero and one, and one of them just decided to output 1.3, so it donated everything, which led to a massive spike in the overall game pool. LLMs are terrible at numbers. Yeah. It would be nice if you could do constrained decoding within the boundaries of the valid range; that would have been better, but it is what it is. And then there's maximum wallet amount per generation, and this is basically the lucky guy who received the entire wallet from somebody else.
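Short of true constrained decoding, a defensive fix is to clamp whatever the model outputs into the valid range before using it. A sketch, where the function name and the donate-nothing fallback are my own choices rather than anything from the notebook:

```python
def parse_donation_fraction(raw):
    """Defensive parsing of the model's donation fraction: clamp to
    [0, 1] so a stray '1.3' can't donate more than the whole wallet."""
    try:
        frac = float(raw)
    except ValueError:
        return 0.0  # unparseable output: donate nothing
    return max(0.0, min(1.0, frac))
```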
Yeah, I'm probably going to do another run of this that is actually in line with what they did in the paper, which was something like 12 rounds per generation for six generations, or something like that. Yeah. I'm curious, have you shared this with the authors, given that they didn't release any code for the paper?
I did. They liked my Twitter post, and that's all I got. Yeah. Yeah, this is fire. The fact that you put it in code, it's really cool to see it in code. Yeah, it's pretty easy to run too; it's just a Jupyter notebook, and it's basically completely self-contained since it's just a simulated environment. So you can go ahead and try it yourself; it's just this notebook on GitHub or whatever. That's all I have. Yeah, there's the tweet as well; please retweet it or something. The what? Oh, the tweet. Yeah, I'll retweet it. Wow. Oh, okay. The description was AI-generated, and it just wrote like three sentences.
That was super nice. You know, it's rare to have a reimplementation of a paper; that's always hard. Awesome. Great timing. Yeah, no comments. Well, that's very smart. Yeah, I was about to say the classic: you should try it on o1, right? Because obviously this is a test of intelligence, right?
Yeah, I'll take funding for that, though. Yeah, it's going to get expensive. But yeah, for sure. Just a thought: it's not like the current setup does no reasoning at all; I can show that in the code. It does think through whether or not it should donate, for example.
This is right at the beginning. Yeah, so here are its thoughts; this is basically a chain of thought. Yeah, maybe running this on something like QwQ or DeepSeek would be less expensive than o1 but still give interesting results. I'd have to try and run it. Yeah, I was thinking of using open source, but it actually wasn't too expensive with 4o mini.
Nice. I have no clue what the implications are of 4o being super stingy, but that's crazy to see. What? I have no clue what that means, what it implies that 4o was the stingiest of the models, right, like about OpenAI's training of the models. Yeah, I have no idea.
Did you use 4o or 4o mini? I used 4o mini. Oh, I see. So maybe there's a model class thing there too, right? Because I think the comparison to 4o would be Haiku. Yeah, yeah. Oh, that is interesting, right? Because, wait, is that true? 4o is equivalent to Haiku?
I mean, I don't know if it's equivalent, but I think they're a similar class. They're in the same cost class, let's put it that way. So you're saying it's not a fair comparison? No, no. You said you used 4o mini. Yeah. That is noticeably different from 4o. I haven't done that analysis; I wouldn't be able to tell you. Yeah. Oh, okay. So the fact that the pot of money is growing over time, I'm not sure if my graphic is a good comparison to this, if that's what you're thinking. No, no. We were just talking about model choice.
You said you used mini, so. Yeah. Okay, well, very cool. That was actually very productive. Really well done. Thank you. I guess we'll do ModernBERT next week. Yes, that would be sick. Yeah, ModernBERT would be awesome. Awesome. Okay, well, happy new year to everyone. We have continued our unbroken streak of paper clubs.
It's very hard to keep up, but we're through the tough part. Oh yeah, and by the way, you posted the X post in there; I see it. All right. Awesome. Cool. See you guys. Happy New Year. Happy New Year.