Unlocking Agent Creation: Agentic Architecture Lessons

00:00:00.000 | Ben Kuss: Hi, I'm Ben Kuss, and I'm here to talk about the lessons that we learned at

00:00:16.000 | Box building agentic architectures.

00:00:20.000 | If you don't know Box, we're a B2B company.

00:00:23.660 | Many people know us from our content sharing, but we think of ourselves as an unstructured

00:00:27.480 | data platform.

00:00:29.380 | We typically deal with large enterprises, so big companies across Fortune 500.

00:00:36.160 | We have over 115,000 companies, tens of millions of users, and our customers have given us and

00:00:41.620 | trusted us with over an exabyte of their data.

00:00:43.620 | And in many of these companies, we are actually the first AI that they started to deploy across

00:00:48.240 | their company, partially because many large enterprises are scared of AI, and we were lucky

00:00:52.860 | enough to already have been trusted.

00:00:54.220 | And so when we do AI, we're always thinking for enterprise grade.

00:00:58.760 | Ben Kuss: Now, when we went to do AI on content, we typically would think about it in these

00:01:04.540 | different ways, where we had kind of standard rag stuff, doing Q&A across a bunch of documents,

00:01:11.000 | searching, doing deep research across a bunch of corpus of data.

00:01:14.540 | Ben Kuss: And then data extraction is also a feature that we have.

00:01:18.320 | So we do extracting structured information from unstructured data.

00:01:21.320 | Ben Kuss: In addition to things like AI-powered workflows, like being able to do loan origination,

00:01:26.780 | like insurance of some regeneration, and this kind of things.

00:01:31.320 | Ben Kuss: But today to talk about our journey, I'm going to talk about the middle one here for data

00:01:37.100 | extraction and talk about how, since we've been integrating our AI into our products since

00:01:41.880 | 2023, how we've kind of evolved to be more agentic.

00:01:45.880 | Ben Kuss: And I picked this one partially because I think of this list, this is the least agentic

00:01:49.920 | looking type of functionality.

00:01:52.380 | Ben Kuss: There's no chatting.

00:01:53.880 | There's no chatbot associated with it.

00:01:54.880 | Ben Kuss: And so this was an interesting lesson that we learned here.

00:01:58.380 | Ben Kuss: So if you don't know much about metadata extraction, the idea behind it is that many

00:02:04.220 | companies have an awful lot of unstructured data, probably 90% of data in the world is

00:02:08.560 | not in a database.

00:02:09.560 | It's in this unstructured form.

00:02:10.560 | Ben Kuss: And there's a lot of very useful data in it.

00:02:12.700 | And so companies always want to get data out of their unstructured data.

00:02:16.740 | Ben Kuss: So this is what we call metadata extraction or data extraction.

00:02:20.880 | And it's a common request for many companies.

00:02:23.740 | Ben Kuss: And many of, there's actually a whole industry here that you've probably never heard

00:02:27.920 | of called IDP.

00:02:28.920 | Ben Kuss: And it's really oriented around machine learning based systems where you would like

00:02:33.660 | train a model and get a bunch of data scientists to do this kind of extraction.

00:02:37.740 | But it really didn't work that well historically.

00:02:40.920 | Many companies just, they would only automate things at an extremely high scale and it just

00:02:44.980 | wasn't very commonly utilized.

00:02:47.400 | And also it would break all the time, very brittle, because if you change the format of anything,

00:02:50.920 | it would just kind of stop working.

00:02:54.100 | Ben Kuss: So when Generative AI came out, this was like a gift for anybody who deals with unstructured

00:03:00.100 | data.

00:03:01.100 | Because you could actually just use the power of AI to be able to pull out structured data.

00:03:06.100 | Ben Kuss: So for us, we started with this architecture.

00:03:09.280 | Ben Kuss: Really straightforward.

00:03:10.280 | Ben Kuss: Take your document.

00:03:12.280 | Ben Kuss: Take the fields that you're looking for.

00:03:14.280 | Ben Kuss: Do some preprocessing.

00:03:16.280 | Ben Kuss: And then some OCR.

00:03:18.280 | Ben Kuss: And then be able to give it to the large language model.

00:03:20.280 | Ben Kuss: You say, give me these fields.

00:03:22.280 | Ben Kuss: And it pops it out.

00:03:23.280 | Ben Kuss: You get the extracted data.

00:03:25.280 | Ben Kuss: This is amazing.

00:03:26.280 | Ben Kuss: When we did this, we immediately deployed it to 10 million pages, the first customer, everything

00:03:32.460 | was working.

00:03:33.460 | Ben Kuss: And we got to the point where we were saying, like, this is, can do any document

00:03:37.460 | now.

00:03:38.460 | Ben Kuss: This is amazing.

00:03:39.460 | Ben Kuss: And so it was just really built around the basics of AI on content.

00:03:45.340 | Ben Kuss: And so this was great, you know, it was kind of like, yeah, generative AI solved.

00:03:50.340 | We did it.

00:03:51.340 | Ben Kuss: You know, high fives.

00:03:53.120 | Ben Kuss: But then we started to hit the problems.

00:03:55.720 | Ben Kuss: When we started to tell our customers, just give us any data and we'll be able to extract

00:03:58.820 | the things you want, like they did.

00:04:01.140 | And so they were like, oh, I've never been able to automate this thing before.

00:04:04.200 | This 300-page document that was well beyond the context windows at the time.

00:04:09.140 | Ben Kuss: And we were like, okay, no problem, we'll pre-process more when we built the concept

00:04:13.980 | of like an enterprise rag where we were able to get the data out and so, okay, solve that.

00:04:20.240 | But then they were like, okay, turns out OCR doesn't work that well in certain cases when

00:04:24.320 | people cross things out or when you have to deal with different languages.

00:04:27.660 | Ben Kuss: So we had to start to solve that.

00:04:30.560 | Ben Kuss: Then we had this challenge where some people were like, okay, I want not just 20 pieces

00:04:34.820 | of data from this document, but like 200 or 500 different pieces.

00:04:38.820 | Ben Kuss: And that just kind of like overwhelmed the attention of the model to be able to pull

00:04:41.580 | all those things out for you, especially on complex documents.

00:04:43.660 | Ben Kuss: And then people in this world are used to things like confidence.

00:04:47.660 | Ben Kuss: They're like, well, how do I know it's right?

00:04:49.540 | What's your confidence score?

00:04:50.540 | Ben Kuss: And of course, generative AI doesn't have confidence scores like old ML models do.

00:04:54.000 | Ben Kuss: So we had to like start to do things like, oh, we'll run an LM as a judge and it'll

00:04:57.840 | tell you after it's done if it thinks it was accurate or not.

00:05:00.840 | Ben Kuss: And I was like, okay, sure.

00:05:01.840 | Ben Kuss: But like it told me it was wrong.

00:05:03.340 | Ben Kuss: So why are you telling me if it says it's wrong?

00:05:05.720 | Ben Kuss: So we ended up with all these challenges.

00:05:07.720 | Ben Kuss: And this was like our moment of like the trough of disillusionment of AI, generative

00:05:12.840 | AI because the thing that was working so well, that was so awesome, that was so elegant,

00:05:16.600 | just didn't work.

00:05:18.340 | And so for us, like a natural engineering response to this is like, okay, we'll pre-process more

00:05:25.020 | or we will solve each of those little problems.

00:05:29.200 | But then we were thinking about it more and we watched Andrew Ng's deep learning class with

00:05:35.360 | Harrison.

00:05:36.520 | And then we realized that if we applied an agentic approach to this, then maybe you can get a much

00:05:41.620 | better outcome.

00:05:42.620 | Ben Kuss: And some people at the time were like, that's kind of crazy.

00:05:45.880 | This is not an agent, this is just a function, you know, get the data out of this document,

00:05:50.320 | it's not that hard.

00:05:51.940 | And so then we ended up re-architecting from scratch with an agentic approach.

00:05:56.980 | So rather than just do the pre-process, pull out the data post-process, we did a steps multi-agent

00:06:04.040 | architecture where we separated out the problems that we had into a series of sub-agents whose

00:06:08.160 | job was to solve these kind of problems and solve them intelligently.

00:06:12.420 | When it came across some of these files and somebody said, I have 500 fields, our previous

00:06:17.800 | heuristic-based approach is like, oh, just chop them up into different field groups.

00:06:21.120 | It stopped working when you have client files, client customers of your contract, and then

00:06:27.680 | customer addresses.

00:06:28.680 | Those kind of need to go together.

00:06:29.680 | Otherwise, weird things happen with the large language model.

00:06:31.680 | And so those kind of things, it would learn to group together.

00:06:33.940 | And then being able to do things like when it went to go extract the data, rather than

00:06:37.860 | us pre-deciding what it should do, it agentically would figure out, I'm going to call this to

00:06:43.020 | get these parts of the data.

00:06:44.700 | Maybe it's going to look at the picture of the pages in addition to just the OCR.

00:06:49.940 | And then we incorporated a quality feedback loop, not just to give you confidence, but then

00:06:54.080 | also to give feedback so that the AI could try different techniques.

00:06:57.680 | It looks like field number three is wrong, so all right, well, let me try again.

00:07:02.100 | Maybe I'll use different models to vote and to do other techniques.

00:07:05.740 | And this approach really solved a lot of our problems, not just because it solved the issues

00:07:11.100 | at that moment, but because it became easy for us to update.

00:07:15.840 | And this really sort of is the key of what we learned here, which is that when you're thinking

00:07:22.280 | of building these intelligent-powered solutions, if you do an agentic-based approach, it's

00:07:29.060 | a much cleaner abstraction.

00:07:31.200 | And you start to, from an engineering perspective, especially if you're dealing with large-scale

00:07:34.900 | systems, you start to separate out, rather than it be like, okay, we need a large-scale

00:07:38.460 | conversion, an OCR system to process all these things.

00:07:42.000 | You start to think of it like, no, I've got one document, and I've got to get through these

00:07:45.940 | fields, and I'm just going to think of it the way that you would do it as a person or as

00:07:49.360 | a team of people.

00:07:50.360 | And this really helped the abstraction for us to then go in and be able to improve it.

00:07:54.840 | And this is maybe the biggest benefit was, it was very easy to evolve.

00:07:58.840 | We got to the point where we were saying, oh, for this kind of least document and for this

00:08:02.560 | other kind of other document, it's going to make sometimes a specialized agent who had

00:08:05.640 | his own specialized routine to do these things.

00:08:08.260 | And the ability for us to quickly evolve, rather than say, oh, I know, we'll build a new distributed

00:08:13.260 | large-scale system, but instead say, let's just add a new supervisor to the graph to double-check

00:08:18.800 | the results when you're done.

00:08:20.380 | This let us quickly evolve.

00:08:22.180 | And then so that when a customer came to us and said, it's not working very well on this

00:08:25.680 | new crazy type of document I'm giving it, we could say, ah, give us a little bit of time

00:08:28.700 | to build you a slightly updated agent to do these things.

00:08:32.900 | And the last piece here is, and I didn't quite fully realize this at the time, was that by making

00:08:39.620 | your engineers think about AI and agentic workflows and think about the kind of lessons that you

00:08:45.120 | learn when you're building these things, they then start to think about customers.

00:08:49.500 | So many of our customers are actually building their own Landgraf-powered or other system-powered

00:08:53.800 | agents.

00:08:54.880 | And then so they'll call us as a tool.

00:08:56.680 | And so then our engineering teams will now be like, oh, I have some ideas on how it might

00:09:00.360 | be easier for us to make the tools that call us to do these kind of data extraction steps

00:09:05.000 | or anything else easier.

00:09:06.960 | And so this is one of the key lessons was, as many people are on this quest to build an

00:09:11.340 | AI-first engineering organization, these kind of like actually building this way helps quite

00:09:15.500 | a bit.

00:09:16.800 | So if I went back in time to talk to myself before, or if you asked me for advice on anybody

00:09:22.560 | who's got an existing system, and you said, I'm going to build some intelligent features,

00:09:26.840 | what advice would I give, my number one piece of advice, and I think I'm the last speaker

00:09:30.900 | here, so maybe this is a piece of advice that can hopefully summarize part of this conference

00:09:35.280 | is, if you think you have, if you start to go down the path of building something, build

00:09:40.140 | agentic, build it early.

00:09:41.720 | And with that, thanks, everyone.

00:09:43.520 | Thank you very much.

00:09:44.520 | Thank you very much.

00:09:45.520 | Thank you very much.

00:09:46.520 | Thank you very much.

00:09:47.520 | Thank you very much.

00:09:48.520 | Thank you very much.

00:09:49.520 | I appreciate it.