Back to Index

Unlocking Agent Creation: Agentic Architecture Lessons


Transcript

Ben Kuss: Hi, I'm Ben Kuss, and I'm here to talk about the lessons that we learned at Box building agentic architectures. If you don't know Box, we're a B2B company. Many people know us from our content sharing, but we think of ourselves as an unstructured data platform. We typically deal with large enterprises, so big companies across Fortune 500.

We have over 115,000 companies, tens of millions of users, and our customers have given us and trusted us with over an exabyte of their data. And in many of these companies, we are actually the first AI that they started to deploy across their company, partially because many large enterprises are scared of AI, and we were lucky enough to already have been trusted.

And so when we do AI, we're always thinking for enterprise grade. Ben Kuss: Now, when we went to do AI on content, we typically would think about it in these different ways, where we had kind of standard rag stuff, doing Q&A across a bunch of documents, searching, doing deep research across a bunch of corpus of data.

Ben Kuss: And then data extraction is also a feature that we have. So we do extracting structured information from unstructured data. Ben Kuss: In addition to things like AI-powered workflows, like being able to do loan origination, like insurance of some regeneration, and this kind of things. Ben Kuss: But today to talk about our journey, I'm going to talk about the middle one here for data extraction and talk about how, since we've been integrating our AI into our products since 2023, how we've kind of evolved to be more agentic.

Ben Kuss: And I picked this one partially because I think of this list, this is the least agentic looking type of functionality. Ben Kuss: There's no chatting. There's no chatbot associated with it. Ben Kuss: And so this was an interesting lesson that we learned here. Ben Kuss: So if you don't know much about metadata extraction, the idea behind it is that many companies have an awful lot of unstructured data, probably 90% of data in the world is not in a database.

It's in this unstructured form. Ben Kuss: And there's a lot of very useful data in it. And so companies always want to get data out of their unstructured data. Ben Kuss: So this is what we call metadata extraction or data extraction. And it's a common request for many companies.

Ben Kuss: And many of, there's actually a whole industry here that you've probably never heard of called IDP. Ben Kuss: And it's really oriented around machine learning based systems where you would like train a model and get a bunch of data scientists to do this kind of extraction. But it really didn't work that well historically.

Many companies just, they would only automate things at an extremely high scale and it just wasn't very commonly utilized. And also it would break all the time, very brittle, because if you change the format of anything, it would just kind of stop working. Ben Kuss: So when Generative AI came out, this was like a gift for anybody who deals with unstructured data.

Because you could actually just use the power of AI to be able to pull out structured data. Ben Kuss: So for us, we started with this architecture. Ben Kuss: Really straightforward. Ben Kuss: Take your document. Ben Kuss: Take the fields that you're looking for. Ben Kuss: Do some preprocessing.

Ben Kuss: And then some OCR. Ben Kuss: And then be able to give it to the large language model. Ben Kuss: You say, give me these fields. Ben Kuss: And it pops it out. Ben Kuss: You get the extracted data. Ben Kuss: This is amazing. Ben Kuss: When we did this, we immediately deployed it to 10 million pages, the first customer, everything was working.

Ben Kuss: And we got to the point where we were saying, like, this is, can do any document now. Ben Kuss: This is amazing. Ben Kuss: And so it was just really built around the basics of AI on content. Ben Kuss: And so this was great, you know, it was kind of like, yeah, generative AI solved.

We did it. Ben Kuss: You know, high fives. Ben Kuss: But then we started to hit the problems. Ben Kuss: When we started to tell our customers, just give us any data and we'll be able to extract the things you want, like they did. And so they were like, oh, I've never been able to automate this thing before.

This 300-page document that was well beyond the context windows at the time. Ben Kuss: And we were like, okay, no problem, we'll pre-process more when we built the concept of like an enterprise rag where we were able to get the data out and so, okay, solve that. But then they were like, okay, turns out OCR doesn't work that well in certain cases when people cross things out or when you have to deal with different languages.

Ben Kuss: So we had to start to solve that. Ben Kuss: Then we had this challenge where some people were like, okay, I want not just 20 pieces of data from this document, but like 200 or 500 different pieces. Ben Kuss: And that just kind of like overwhelmed the attention of the model to be able to pull all those things out for you, especially on complex documents.

Ben Kuss: And then people in this world are used to things like confidence. Ben Kuss: They're like, well, how do I know it's right? What's your confidence score? Ben Kuss: And of course, generative AI doesn't have confidence scores like old ML models do. Ben Kuss: So we had to like start to do things like, oh, we'll run an LM as a judge and it'll tell you after it's done if it thinks it was accurate or not.

Ben Kuss: And I was like, okay, sure. Ben Kuss: But like it told me it was wrong. Ben Kuss: So why are you telling me if it says it's wrong? Ben Kuss: So we ended up with all these challenges. Ben Kuss: And this was like our moment of like the trough of disillusionment of AI, generative AI because the thing that was working so well, that was so awesome, that was so elegant, just didn't work.

And so for us, like a natural engineering response to this is like, okay, we'll pre-process more or we will solve each of those little problems. But then we were thinking about it more and we watched Andrew Ng's deep learning class with Harrison. And then we realized that if we applied an agentic approach to this, then maybe you can get a much better outcome.

Ben Kuss: And some people at the time were like, that's kind of crazy. This is not an agent, this is just a function, you know, get the data out of this document, it's not that hard. And so then we ended up re-architecting from scratch with an agentic approach. So rather than just do the pre-process, pull out the data post-process, we did a steps multi-agent architecture where we separated out the problems that we had into a series of sub-agents whose job was to solve these kind of problems and solve them intelligently.

When it came across some of these files and somebody said, I have 500 fields, our previous heuristic-based approach is like, oh, just chop them up into different field groups. It stopped working when you have client files, client customers of your contract, and then customer addresses. Those kind of need to go together.

Otherwise, weird things happen with the large language model. And so those kind of things, it would learn to group together. And then being able to do things like when it went to go extract the data, rather than us pre-deciding what it should do, it agentically would figure out, I'm going to call this to get these parts of the data.

Maybe it's going to look at the picture of the pages in addition to just the OCR. And then we incorporated a quality feedback loop, not just to give you confidence, but then also to give feedback so that the AI could try different techniques. It looks like field number three is wrong, so all right, well, let me try again.

Maybe I'll use different models to vote and to do other techniques. And this approach really solved a lot of our problems, not just because it solved the issues at that moment, but because it became easy for us to update. And this really sort of is the key of what we learned here, which is that when you're thinking of building these intelligent-powered solutions, if you do an agentic-based approach, it's a much cleaner abstraction.

And you start to, from an engineering perspective, especially if you're dealing with large-scale systems, you start to separate out, rather than it be like, okay, we need a large-scale conversion, an OCR system to process all these things. You start to think of it like, no, I've got one document, and I've got to get through these fields, and I'm just going to think of it the way that you would do it as a person or as a team of people.

And this really helped the abstraction for us to then go in and be able to improve it. And this is maybe the biggest benefit was, it was very easy to evolve. We got to the point where we were saying, oh, for this kind of least document and for this other kind of other document, it's going to make sometimes a specialized agent who had his own specialized routine to do these things.

And the ability for us to quickly evolve, rather than say, oh, I know, we'll build a new distributed large-scale system, but instead say, let's just add a new supervisor to the graph to double-check the results when you're done. This let us quickly evolve. And then so that when a customer came to us and said, it's not working very well on this new crazy type of document I'm giving it, we could say, ah, give us a little bit of time to build you a slightly updated agent to do these things.

And the last piece here is, and I didn't quite fully realize this at the time, was that by making your engineers think about AI and agentic workflows and think about the kind of lessons that you learn when you're building these things, they then start to think about customers. So many of our customers are actually building their own Landgraf-powered or other system-powered agents.

And then so they'll call us as a tool. And so then our engineering teams will now be like, oh, I have some ideas on how it might be easier for us to make the tools that call us to do these kind of data extraction steps or anything else easier.

And so this is one of the key lessons was, as many people are on this quest to build an AI-first engineering organization, these kind of like actually building this way helps quite a bit. So if I went back in time to talk to myself before, or if you asked me for advice on anybody who's got an existing system, and you said, I'm going to build some intelligent features, what advice would I give, my number one piece of advice, and I think I'm the last speaker here, so maybe this is a piece of advice that can hopefully summarize part of this conference is, if you think you have, if you start to go down the path of building something, build agentic, build it early.

And with that, thanks, everyone. Thank you very much. Thank you very much. Thank you very much. Thank you very much. Thank you very much. Thank you very much. Thank you very much. Thank you very much. I appreciate it.