All right, hello everybody. Today we're going to be talking about the Berkeley Function Calling Leaderboard, or BFCL. I'm Sam Julien, if you don't know me; I'm in the Discord, I'm around. In my day job, I lead developer relations for Writer, which is an enterprise AI company, and I also write at samjulien.com and have a newsletter and things like that.
So BFCL is the Berkeley Function Calling Leaderboard that we've come to know and love recently. And it basically ranks models according to how well they do function calling, also known as tool calling, depending on who you are. And they evaluate on different things and post these scores and it's great.
And so the main thing is the leaderboard, but behind the leaderboard there are also these three releases, three blog articles that they've come out with. I'm going to open the chat to the side here so I can see it. When I was preparing for this, I didn't realize how quickly all of these came out; this is all super recent.
The first blog article was in March, second version was in August, and third version was in September, which was a little mind blowing to me. They've done a lot in a very short amount of time. And then I listed out the folks who are on the team, some of them you'll notice from other projects.
I think Shishir has been involved in a lot of different projects. They also have a Discord server if you're super interested in chatting with these folks, the Gorilla LLM Discord. So first up, we're just going to walk through the three blog articles and go over what's in each of them.
We'll spend a little more time on the third one, since it's the most recent and also the biggest evolution so far. So the first one, which came out, like I said, in March, was really the first of its kind: a function calling leaderboard.
So the purpose of it is to evaluate LLMs' ability to call functions and provide a comprehensive benchmark for function calling capabilities. Oh, and I see that I need to make Vibhu co-host, so let me see if I can do that really quick, if I can look at the participants and make Vibhu a co-host.
There we go. Okay, awesome, done and done. Okay. So V1 was basically just the starting point, and a pretty solid starting point. They had a pretty diverse dataset: 2,000 question-function-answer pairs across multiple languages and domains, including simple, multiple, parallel, and parallel-multiple function calling scenarios. And so the first version of the leaderboard looked like this.
Not terribly different from what we see today, but there are some new things that you'll see as it evolves. The initial dataset composition is really interesting, because it's very code oriented. A lot of this initial set of function calling was very focused on specific language tasks across Python and other languages.
And then the main innovation of this first version of the leaderboard, I think, was that it introduced this abstract syntax tree (AST) evaluation, where it takes the generated function calls and evaluates them. And they have this nice flow chart in here; it's pretty interesting how they created this, where it evaluates the functions and the parameters, sees if they match what was expected, and then runs them through these different scenarios.
They go into more detail in the actual blog article about how they parse through the functions with abstract syntax tree navigation and match the parameters and that kind of thing, so if you want to dig deeper into that, you can go through there. So that was sort of the first version.
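As a rough illustration of the kind of check that AST evaluation does (a minimal sketch, not BFCL's actual implementation; the function spec and the model output here are made up), you can parse a generated call with Python's ast module and compare the function name and parameters against the expected ones:

```python
import ast

def check_call(generated: str, expected_name: str, expected_params: dict) -> bool:
    """Parse a model-generated function call string and compare the function name
    and keyword arguments against the expected ground truth (a minimal sketch)."""
    call = ast.parse(generated, mode="eval").body
    if not isinstance(call, ast.Call) or not isinstance(call.func, ast.Name):
        return False
    if call.func.id != expected_name:
        return False
    kwargs = {kw.arg: ast.literal_eval(kw.value) for kw in call.keywords}
    # Every expected parameter must be present with a matching value.
    return all(kwargs.get(k) == v for k, v in expected_params.items())

# Hypothetical example: does the model's output match the expected call?
print(check_call(
    'get_weather(city="Berkeley", unit="celsius")',
    expected_name="get_weather",
    expected_params={"city": "Berkeley", "unit": "celsius"},
))  # True
```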
Like I said, that was only six or seven or eight months ago, so not that long ago. Then with the second version, they were really focused on improving the dataset. They came out with this second version, titled Live Dataset, where they were really focused on getting more real-world scenarios, with live user-contributed data, more rare use cases like lots and lots of nested parameters, and fixing some of the issues from the first version, like data contamination and bias and things like that.
So they spent a lot of time on that; most of the second article is just about all of the different issues they worked on with data pre-processing, filtering, quality improvement, that kind of thing. I'll share these slides in the Discord as well, so you don't have to worry about taking notes.
And they ended up with a dataset of 2,251 question-function-answer pairs so that they could address some of these issues. They did a lot of deduplication using ROUGE scores, did a bunch of filtering, standardized the function documents to better fit their format, and enhanced the user prompts.
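As a sketch of what ROUGE-based deduplication can look like (my own minimal version, not their pipeline; the 0.7 threshold is an arbitrary assumption), you compute ROUGE-L from the longest common subsequence of two questions and drop near-duplicates:

```python
def lcs_len(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            dp[i + 1][j + 1] = dp[i][j] + 1 if x == y else max(dp[i][j + 1], dp[i + 1][j])
    return dp[len(a)][len(b)]

def rouge_l(ref: str, hyp: str) -> float:
    """ROUGE-L F1 between two strings (simple whitespace tokenization)."""
    r, h = ref.lower().split(), hyp.lower().split()
    lcs = lcs_len(r, h)
    if lcs == 0:
        return 0.0
    prec, rec = lcs / len(h), lcs / len(r)
    return 2 * prec * rec / (prec + rec)

def dedupe(questions: list[str], threshold: float = 0.7) -> list[str]:
    """Keep a question only if it isn't too similar to any already-kept one."""
    kept: list[str] = []
    for q in questions:
        if all(rouge_l(k, q) < threshold for k in kept):
            kept.append(q)
    return kept

print(dedupe([
    "Find me flights from New York to Tokyo tomorrow",
    "Find flights from New York to Tokyo for tomorrow",   # near-duplicate, dropped
    "What's the weather in Berkeley right now?",
]))
```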
So if you look at the way they display the dataset composition in the second version, you can see that it's a lot less focused on specific coding languages and more about what they were trying to accomplish with it: relevance detection, irrelevance detection, and maintaining some of the abstract syntax tree categories and parallel multiple function calling and things like that.
So the underlying methodology of how they did the benchmarking didn't really change in the second version so much as the data process. In that second version they have another nice visual; I really like how many visuals they include in these blog articles.
They have this one for the flow of all of the live data they were getting: they took the user queries, did data pre-processing on them, and then did some filtering and standardization and things like that to end up with this dataset.
So that's the second version. And then the third version was a pretty big leap forward, into multi-turn and multi-step function calling. Previously it was single turn or some parallel function calling, but this is really digging into multi-turn and multi-step function calling, which I'll show a diagram of in a second.
The point is really to get to where we're testing for agentic behavior, mimicking the agentic behaviors we're all trying to get to by using function calling. So one of the main innovations of this, which we'll talk about in a second, is what they call state-based evaluation.
They really updated their evaluation from the abstract syntax tree to something called state-based evaluation, which we'll get to in just a second. In the third version they define the terms single-turn, multi-step, and multi-turn. That's basically the difference between being able to say something like "find me flights from New York to Tokyo tomorrow" versus "I want to plan a trip" or "I want to find multiple flights", with multi-turn being about maintaining a context and asking follow-up questions.
And that's where having to evaluate state comes into play, because the model has to keep track of these different responses as they roll in: "I want to book this one", "okay, book me the 10 a.m. version", "okay, what's the confirmation ID", that kind of thing.
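To make those terms concrete, here's a rough sketch of what a multi-turn, multi-step interaction looks like. This is my own illustration in a loosely OpenAI-style message format, not BFCL's actual schema, and the flights and IDs are made up: each user turn can require several steps (function calls), and the second turn only makes sense if the model carries the context from the first.

```python
# A hypothetical multi-turn, multi-step flight-booking conversation (all values made up).
# Turn 1 is multi-step: one user request needs a tool call plus a summary.
# Turn 2 is multi-turn: "the 10 a.m. one" only resolves against turn 1's context.
conversation = [
    {"role": "user", "content": "Find me flights from New York to Tokyo tomorrow."},
    {"role": "assistant", "tool_call": {"name": "search_flights",
        "arguments": {"origin": "JFK", "destination": "NRT", "date": "tomorrow"}}},
    {"role": "tool", "name": "search_flights",
        "content": '[{"flight_id": "NH9", "departs": "10:00"}, {"flight_id": "JL5", "departs": "18:30"}]'},
    {"role": "assistant", "content": "I found two flights: NH9 at 10:00 and JL5 at 18:30."},

    {"role": "user", "content": "Book me the 10 a.m. one. What's the confirmation ID?"},
    {"role": "assistant", "tool_call": {"name": "book_flight",
        "arguments": {"flight_id": "NH9"}}},   # must resolve "the 10 a.m. one" from turn 1
    {"role": "tool", "name": "book_flight", "content": '{"confirmation_id": "ABC123"}'},
    {"role": "assistant", "content": "Booked. Your confirmation ID is ABC123."},
]
```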
So it's really interesting how they set this up. They also made some pretty interesting improvements to the dataset composition; they started categorizing the data into things like multi-turn and augmented. I see a question in the chat: go full screen or slideshow? I was leaving it like this so that we could bounce back and forth to the actual blog articles.
But for now I could, oh, actually then I'd have to stop sharing and re-share, so I'm going to leave this for now, unless it's really hard for people to see. I can also hide the navigation and zoom in a little bit, probably. Okay. Oh, that's going to be too much.
Let's go back to fit. I'll also share these slides; let me just dump the link in here as well, so if you want to look at it close up, you can. Okay, so they had to change their approach to actually curating the data, which was really interesting, because they wanted to be able to capture multi-turn, long-context multi-turn, and that kind of thing.
And so they did this really interesting thing where they basically created their own APIs for the sake of this testing, and then did this mapping to do graph edge construction and generate tasks through that. I'm jumping ahead a little bit, but what was cool is they built out this whole API code base where they had vehicle control, stock trading, travel booking, and then a file system.
And then also a couple of different cross-functional APIs, for messaging, simulating Twitter, creating tickets, doing math, and that kind of thing. They built out this system where they would build these APIs, generate a graph from them, and then use that to derive a bunch of example queries and function lists and so on.
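Just to give a feel for the shape of those mock APIs, here's a simplified sketch of my own (the class and method names are made up, not their actual code): each domain is basically a small stateful Python class whose methods are the functions the model is allowed to call, and whose internal attributes are the state the evaluator will later inspect.

```python
class MockFileSystem:
    """A tiny stateful mock API in the spirit of BFCL v3's file-system domain.
    Class name, methods, and behavior are illustrative guesses, not the benchmark's real code."""

    def __init__(self):
        self.files: dict[str, str] = {}   # filename -> contents
        self.cwd: str = "/"

    def create_file(self, name: str) -> str:
        self.files[name] = ""
        return f"Created {name}"

    def write_file(self, name: str, data: str) -> str:
        if name not in self.files:
            raise ValueError(f"{name} does not exist")
        self.files[name] += data
        return f"Wrote {len(data)} characters to {name}"

    def cd(self, path: str) -> str:
        self.cwd = path
        return f"Now in {path}"

    def snapshot(self) -> dict:
        """The internal state a state-based evaluator would compare against ground truth."""
        return {"files": dict(self.files), "cwd": self.cwd}
```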
And then they generate the dataset that way, which I thought was pretty ingenious. So you can see that in the third version, the dataset composition looks really different than it did in the first couple, and that's because they're really honing in on what they're trying to test for.
In the article they go deeper into what all of these categories mean and what they do, but you can see that they're doing things like having a specific set of questions for testing long context, or testing missing parameters, that kind of thing.
And so, yeah, like I said, they turned that into this nice system with an API that they created that they could use to evaluate. Then they have this interesting validation process where they check the questions and use human ground truth as a way to verify them.
They go into a little bit more on that validation side: they actually have humans label what the correct answer would be for these different scenarios, and that way they can check how closely the model aligns with the human response. And that gets us into state-based evaluation.
They use this as the primary metric to assess the performance of the models. And this is really interesting, because they basically compare the final state after all the function calls are executed to see whether the system's internal changes after each step align with what you're expecting.
That's better for reflecting real-world performance, because in multi-turn interactions you're trying to update a system state, whether it's a vehicle or a file system or something like that. You want to be able to check whether you've successfully deleted a file, for example.
So it's not as straightforward as just answering a math problem or calling an API. They compare the attributes of the system state after every turn against the expected state. For example, if you had a series of function calls asking the LLM to create a file, write data to the file, and close it, it would check after each turn whether the file exists, whether the correct data was written, and whether the file was properly closed. If those attributes are present and correct, the evaluation succeeds.
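Reusing the hypothetical MockFileSystem sketch from above (again my own illustration, not the benchmark's code), the state-based check boils down to executing the model's calls for a turn against the backend instance and comparing its state snapshot to the expected, human-labeled state:

```python
def evaluate_turn(api, model_calls, expected_state: dict) -> bool:
    """Execute one turn's worth of model function calls against the mock backend,
    then compare the backend's state snapshot to the expected (labeled) state."""
    for name, kwargs in model_calls:
        getattr(api, name)(**kwargs)              # e.g. api.create_file(name="log.txt")
    return api.snapshot() == expected_state

api = MockFileSystem()                            # the hypothetical class sketched earlier
turn_calls = [("create_file", {"name": "log.txt"}),
              ("write_file", {"name": "log.txt", "data": "hello"})]
expected = {"files": {"log.txt": "hello"}, "cwd": "/"}
print(evaluate_turn(api, turn_calls, expected))   # True only if the end state matches
```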
Then they also have a section at the end on results and error analysis. I thought this was a super interesting section on what the models struggled with, and they gave a few examples. The first one was failure to perform implicit actions.
They had this example of asking the model to fill a fuel tank, and the model assumed that the tank was already full or nearly full and didn't understand that it actually needed to do the filling. So that was pretty interesting. The second one they gave was failure to understand the current state before performing the action.
So if, for example, the present working directory is already Alex, and you tell it to go into the directory named Alex, it didn't know that it could just check whether it was already in the present working directory named Alex. It would just create it anyway.
Pretty interesting. And then the last example they give is that LLMs incur unnecessary planning and thinking, which I thought was a really funny way of putting it. They give the example where you're already authenticated with the Twitter API, and you ask the model to do something with Twitter, and it goes ahead and tries to authenticate you again.
It didn't know that you were already authenticated with Twitter, so it unnecessarily added a step. And it's interesting because all three of these tie into knowing the current state, or the current context the question is asked in, and that turns out to be more challenging than you would think for these models.
And that's all I had for the actual overview. I figured if anybody wanted to chime in and talk about any of these things, we could. I also posted the dataset on Hugging Face, which is pretty interesting. It doesn't quite work;
there's some sort of error where you can't see all of it, but you can see some of it. A lot of them are the math problems, but there are also some travel questions and other planning questions and things like that, which are kind of just interesting inspiration for function calling in general.
But yeah, that's all I had for the actual overview of everything. Since nobody's jumping in here: I found what you mentioned about using the syntax tree to generate, and I don't know if I understood this correctly, but generating test cases that are basically a sequence of API calls.
And then from that, generating a scenario that sort of implements that, so that you have something with an absolute correct answer and a scenario you can have the function calling LLM follow through on. Did I understand that correctly? I think so.
Yeah, with the caveat that I'm still wrapping my head around it too, but I think that's basically what they did: they had these different categories they wanted to cover, and then they used the source code and everything to create the graph and generate the tasks from it.
But did you grok how they did the graph edge construction? Yeah, let's see, they talk about that a little more down here. So there's this section where they explain that each function represents a node, and then they manually map out directed edges, meaning a function's output is an input to downstream functions.
They've got this example of placing an order: getting the available stock is necessary in order to place the order for the stock. Whenever they need a dataset entry, they sample a node on the graph and randomly traverse the graph to generate an execution path. I see.
An execution path they're able to extrapolate from. Right, okay. So they basically go: manually create the graph for the API, sample a node in the graph, traverse the graph according to the edges, and from that you get a list of function calls, and probably parameters, and then they generate a scenario that matches it.
Is that kind of what you mean? Yeah, I think so. And I forgot to mention this part, which is really interesting: they use this dataset from Persona Hub to take a few examples of different personas and then generate the different queries based on those personas. So, like, stock trading by an elderly hermit, for example.
Yeah, I saw that. But then once they had that, oh yeah, they've got a triplet of question, function list, and initial config, and then they had humans actually look at it and verify everything and validate that these are good questions that actually make sense.
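Here's roughly how I picture that generation step, as a hedged sketch with made-up function names rather than their actual code: hand-defined edges between API functions, a random walk to get an execution path, and then a persona plus that path gets turned into a natural-language question that humans validate afterwards.

```python
import random

# Hand-drawn edges: an edge A -> B means A's output feeds an input of B.
# Function names are illustrative, loosely following the stock-trading example.
edges = {
    "get_available_stocks": ["get_stock_price"],
    "get_stock_price": ["place_order"],
    "place_order": ["get_order_status"],
    "get_order_status": [],
}

def sample_execution_path(start: str, max_len: int = 4) -> list[str]:
    """Randomly walk the API graph from a sampled node to build an execution path."""
    path = [start]
    while len(path) < max_len and edges[path[-1]]:
        path.append(random.choice(edges[path[-1]]))
    return path

random.seed(0)
persona = "an elderly hermit who trades stocks"       # e.g. drawn from Persona Hub
path = sample_execution_path(random.choice(list(edges)))
print(f"Persona: {persona}; required calls: {path}")
# A question is then written (and human-validated) so that answering it correctly
# requires exactly this sequence of calls, e.g. "Check which stocks are available,
# look up the current price, buy 5 shares, and tell me the order status."
```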
Right, okay, yeah. I agree, that is a pretty cool way to generate a dataset like this, where you get a good covering set and generate the language around it, and then I guess it's small enough that you can label the whole thing. Or did they label the whole thing?
I think so, because there were only about 2,200 questions. Yeah, so it seemed like they labeled everything. Okay. And they did a series of validations for each one, and also built unit tests and error-handling tests and things like that around it.
All right, got it. Because, I mean, at the end of the day, the whole idea here, if I understand correctly, is that because the state is being tracked, you can know whether the series of function calls the model made ended up in the correct state or not.
And so you get a hard yes/no on that. Yeah, I think that's exactly what they were designing around with the state-based evaluation. Yeah, that's pretty cool. There was also a whole section I didn't really dig into, where they go through a lengthy discussion of response-based evaluation and why they don't use the ReAct technique and things like that.
And other things that are just interesting around the limitations of evaluating multi-turn scenarios. Yeah, cool. Well, thank you. That's all I got. These aren't super long, so it's not a crazy amount of stuff to read through, but it's definitely really interesting and helps you understand things a little bit better.
The only thing I didn't see a lot of information on was how they derived this hallucination measurement. I don't know if anybody else read through it and caught that, but it was the only thing where I didn't see much detail on how they did it.
I believe, and this is not from the paper, just my guess, that it's because they have a dataset called API Zoo, which is basically an open-source repository of correct documentation for a large number of APIs.
So presumably that could be a ground truth to benchmark against, and then you measure how far off you are, probably semantically or state-wise. I don't know. Oh, interesting. That's interesting. I don't suppose anybody tried to run this locally? Because they link out to the code.
You can actually run it locally. I did not try to do it, but just so people know, you actually can; you can clone this and try it yourself, although you need like a thousand API keys, like any other leaderboard thing. I've not tried the leaderboard specifically, but there are a few things in that repo that are quite useful: OpenFunctions, the Gorilla CLI, et cetera.
Check them out. Oh, cool. Sam, is there any description of how they provide the state to the model, like what API queries it has already executed or what the original request is? How do they go about that? Yeah, I think there was something in this section; they don't go into a ton of detail, but let's see.
Ah, so yeah, it says: as the model interacts with the API, the backend tracks how the state evolves. Each function call alters the state, such as creating files, and at every turn they compare the instance's current state, from executing the model's function calls, with the expected ground truth. So they don't go into any technical detail.
I bet it's in the code somewhere, but it sounds like it's done in the API itself. I'd be curious to know more detail about how that actually happens and how they read it out. It seems like that would need to be specific to the API, right?
Because each API is going to have its own set of state variables that it needs to track, right? Like a file system, or, yeah, Twitter posts or whatever. But it seems like their mock APIs probably implement some sort of state. That's my guess. I think swyx said they're going to have somebody from BFCL on the podcast soon;
that would probably be a good question for them, and I'd be super curious about it. So do they have an API that orchestrates the requests to other APIs, or is this just referring to any API that you hit? I think, yeah, they must have had one collective API that was a collection of the, I think it was four single-domain APIs and four cross-functional APIs.
So they must have had something orchestrating those different APIs. Yeah. I mean, if you're not familiar with Gorilla itself, that's what it does: it's a model that specifically generates API calls and nothing else. Wait a minute, but we're talking about a model, and then we're talking about an API that orchestrates requests to other APIs.
So you can make an API call that calls this model? That's where I'm confused. Are you saying it's a model generating API calls that it was trained on? I showed up to this meeting late, so I might have missed this, but I'm not really sure; I can't picture this.
Yeah, that's a good question. I'm not sure how they did that, whether they had each API separate and isolated so that the model would just call that API directly. That would be my guess, that they kept them separate, because then they would know whether the model was actually calling the right API and not just a gateway.
I don't know, that's a good question. So, Yikes, you're having somebody from the Gorilla team come to speak? I think I saw swyx mentioning something about that; I think he's going to have somebody come on the podcast, was what he said.
I don't remember who; yeah, there's something in Discord about it. I am not organizing anything at the moment, but if you have questions, they tend to be super responsive in Discord, so just go jump into the Gorilla LLM server and pick some brains.
Yeah, it looks like a pretty active server. It would be so much fun if he hosted the podcast with us as an audience; I would love that. I could dig it, that could be fun. Yeah, I would have loved to have spoken to the person behind structured generation, and OpenAI, and a lot of other people.
It'd be interesting to at least have a drop-off box for questions to ask, or something. Yeah. Or something we could do is an AI in Action where we actually, you know, deploy this. I don't know, Sam, would you be open to doing that?
I'm not in charge here, I don't know. When you say deploy this, what do you mean, like deploy it? No, no, not deploy, just get it running, just to clear up any doubts. The leaderboard? Do you mean Gorilla?
I'm talking about this system that they're describing here. Oh, okay. So yeah, that's the leaderboard. I mean, if that's something you're interested in trying to make happen, I presume we'd have an open slot on AI in Action at some point.
And that sounds interesting, just trying to get the thing working. I'm not committing to anything. Not committing, I mean, that's fair. That is an interesting idea: we could do an AI in Action that's "let's try and get the thing working" as opposed to "here is how the working thing that I use works."
Interesting. Or "I already have it working." No, trying to get it working in the span of one hour is not feasible, for me at least. Yeah, I mean, cool idea; I'm just throwing it out there.
If somebody wants to do that, I'm going to be in the audience, for sure. Yeah, I've had Gorilla come up very often, partially from Yikes and other people. I'm a huge Gorilla shill and kind of a Shishir fan, you know. The Gorilla CLI is really helpful,
because I was using the GitHub Copilot CLI for quite some time, but the Gorilla CLI is open and I don't have to worry about GitHub connections; I can just grab it and it does the same thing, basically. What does the Gorilla CLI do? I'm not...
You just put in a natural language query for the CLI command that you want to run, and it'll give you the CLI command to do it. If you've used Copilot CLI, it's the same thing, basically. I see, okay.
I haven't used that; that's a cool idea. I mean, I've wanted to implement something like that myself, but it sounds like Gorilla already has it; they already tied their model up with a little front end. That's cool. Yeah. The other interesting thing, to go over the cool stuff in the repo: there's a thing called Gorilla OpenFunctions where you can more or less staple Gorilla onto whatever model you're playing with.
So if it doesn't have function calling, you can just implement OpenFunctions and it'll give it function calling, basically. Oh, cool. And then you end up with no files in your file system. Yeah, well, it does hallucinate from time to time. So yeah, I almost wonder if that'd be another AI in Action or Paper Club, just the Gorilla stuff in general.
There's a paper for Gorilla and everything, but it was going to be too much to try to cram into this. That's really interesting. Yeah, I do feel like I would like to see a demo of that whole CLI tool chain.
I know I've seen a bunch of these things in use, but I haven't gotten started with them myself, and I wanted to and never had the time to get started and figure out what's actually useful. So yeah.
It would be a cool AI in Action. It would. Cool. Anything else? Any other discussion topics? So I actually have had an idea in my head for a while and wanted to share it with people, if there are no other topics; it's relevant to this one. The idea is basically to build a leaderboard of the quality of predictions of an LLM, or LLMs. So the idea would be to have questions that are about some future event, like who will be elected president.
You have a list of them, and then you ask LLMs to predict, based on some sort of canonical data that you also bring in for the LLM to use, plus any other data it wants; if it's an agent system, then it can go and search the web and do whatever it wants and bring in whatever context it wants, and then it makes a prediction on a given day about that event.
And then that series of questions sort of evolves every day, and the LLMs are basically placing bets on them, and over time you track the quality of prediction of those LLMs or agent systems. The reason I think this is interesting is that judging an AI's or a person's prediction quality is actually one of the only ways you can assess the quality of information on the internet.
Right. So if you have a person or an AI that can accurately predict the future, and do it better in a sort of stack-ranked way, because I can always very accurately predict that the sun will come up tomorrow,
it has to be a relative ranking. And if you have an AI or a person that can predict better than, you know, the crowd, then that person likely has a good model of the world.
Right. And that good model of the world represents somebody you probably want to listen to more than the crowd. So I feel like this would be a really cool start to developing a way to judge both people and models.
You can imagine crawling the news sites, looking at what predictions people are making, and then giving everybody track records for their predictions, in addition to the LLMs. So I feel like this would be a really cool way to get a project started.
I wonder if people have thoughts about this and if anyone would be interested in working together on it. Deafening silence. Uh, I was busy trying to get my Gorilla CLI working, so I was distracted. What is the thing that you want help on?
Okay, summary: a leaderboard of the quality of predictions that AIs and agents and people make. So, like, news items or certain current events, and then you predict, and that becomes a bet, kind of sort of, and I can explain a betting mechanism that is fair. And then, like a general prediction market?
Well, it is like a prediction market, but it probably looks a little different, because I don't think it's particularly useful to have it maintain accounts; that's a separate ability that I think isn't necessary for the tracking.
So, something like a prediction market, but for a limited set of things that you track over time, and whichever LLM and/or person does the best kind of filters to the top. That sounds interesting, and probably related to things that I'm interested in.
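For what it's worth, here's a minimal sketch of how the scoring for something like that could work; it's entirely hypothetical, not an existing project or API. Each forecaster assigns a probability to each event, and once events resolve you rank forecasters by average Brier score (lower is better), which keeps the ranking relative rather than rewarding safe predictions like "the sun will come up tomorrow."

```python
from collections import defaultdict

# forecasts[event_id][forecaster] = probability assigned to "yes" (all values made up)
forecasts = {
    "candidate_x_wins": {"gpt-agent": 0.70, "claude-agent": 0.55, "crowd": 0.60},
    "rate_cut_in_q4":   {"gpt-agent": 0.20, "claude-agent": 0.35, "crowd": 0.30},
}
outcomes = {"candidate_x_wins": 1, "rate_cut_in_q4": 0}  # resolved events: 1 = yes, 0 = no

def leaderboard(forecasts, outcomes):
    """Average Brier score per forecaster over resolved events (lower is better)."""
    totals, counts = defaultdict(float), defaultdict(int)
    for event, preds in forecasts.items():
        if event not in outcomes:
            continue  # skip events that haven't resolved yet
        for who, p in preds.items():
            totals[who] += (p - outcomes[event]) ** 2
            counts[who] += 1
    return sorted((totals[w] / counts[w], w) for w in totals)

for score, who in leaderboard(forecasts, outcomes):
    print(f"{who}: {score:.3f}")
```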
So if you start cranking on it or throw up a repo, shoot me a link or whatever; I'll probably take a look and play around with it. Yeah. If you're not familiar with it, and I don't know if they have an SDK or something so you can rig up your own, but you can check out Polymarket.
It's one of the biggest. Yeah. So, I mean, I don't want to place real bets, and I'm not sure that... well, maybe. I'm thinking just throw up a Polymarket instance on testnet or something and use Monopoly money or whatever.
Yeah, Monopoly money, okay. Wait, so Polymarket... well, okay, maybe I'm not familiar; I thought I was familiar with Polymarket, but maybe I'm thinking of a different one. Is that the one from Nate's Discord thread about this? Yeah, sure. Yeah. Jump in there if you're interested.
Yeah, back on Paper Club: do we have any last questions for Sam on Gorilla? I think some stuff popped up in the chat, but it looks like you answered it in chat too. Basically that, and then for next week's paper, if anyone wants to volunteer. Okay, has anyone, have you guys seen Molmo?
It's the open-weight, open-data, state-of-the-art multimodal model from AI2, I think. I don't know how good it is overall, but the model is pretty good, and it's a pretty decent technical report. I'm hoping the paper is good, because I'm down to cover it. They go a lot into the data and how they get it.
I think they pre-train it from CLIP, but it's a pretty good recent multimodal foundation model. I can cover that unless someone else wants to volunteer. That sounds super interesting. Yeah, I was going to say I can be a backup if I pick a Monte Carlo paper and everything works out.
But I would actually be pretty interested to get a download on Molmo too. Okay, I'll do Molmo next week. Here's a link, I'll throw it in Paper Club, but thanks for sharing, Sam. Real quick, I did want to check in.
Is there anybody that's new to Paper Club that has not presented, or is maybe on board with presenting, maybe not? If so, raise your hand, I guess. And, you know, if you need a little bit of help or support trying to get something together, I want to try to facilitate new people that haven't done it before,
because I know you and I have done it a bunch of times. But yeah, feel free to ping me in Discord or whatever if you have questions, or if you want to do it, let one of us know and we would happily hand over the reins.
Yeah, I'll make an announcement that this is the paper; if anyone else wants to sub in, we can always take it. Yeah, cool. Thanks, Sam, for covering; any last thoughts on this? I know some questions have popped up and you're answering them. No, yeah, this has been great.
Thanks, everybody, for listening and joining. Awesome, thanks guys. Take care. Bye everyone. Thanks. Bye.