
The Billable Hour is Dead; Long Live the Billable Hour — Kevin Madura + Mo Bhasin, Alix Partners


Chapters

0:00 Introduction to Alix Partners and the AI Shift
1:05 How AI is Reshaping Knowledge Work
2:19 The Future of Professional Services Models with AI
3:36 AI's Impact on the Three Phases of Engagements
5:07 Scaling Data Analysis Beyond Human Limitations
6:36 The Paradox of AI Investment and Productivity
7:22 Use Case 1: Categorization with Structured Outputs
10:34 Use Case 2: Retrieval-Augmented Generation (RAG)
12:46 Use Case 3: Structured Data Extraction from Unstructured Data
15:54 Key Requirements for Scaling GenAI Initiatives
16:48 Final Thoughts: The Future of LLMs in the Enterprise

Whisper Transcript

00:00:00.000 | I'm Mo. I'm director of AI products at Alix Partners. Prior to this, I was a co-founder of an
00:00:21.440 | anomaly detection startup, and prior to that, I was a data scientist at Google. Together, we co-lead
00:00:27.800 | the development of an internal gen AI platform. We've been working on it for the last two years.
00:00:33.160 | We have 20 engineers. We've scaled it to 50 deployments and hundreds of users, and we're
00:00:37.800 | excited to tell you everything we've learned on that journey.
00:00:40.040 | Kevin Madura: Great. I'm Kevin Madura. I help companies, courts, and regulators understand
00:00:44.660 | new technologies like AI and LLMs. As Mo mentioned, both of us work at a company called Alix Partners.
00:00:50.160 | It's a global management consulting firm. I realize lots of you in this room might be rolling
00:00:54.440 | your eyes at that, rightfully so, but I like to think our firm does a little bit more than
00:01:01.520 | deliver PowerPoints. We actually roll up our sleeves and solve problems, whether that's
00:01:01.520 | coding or actually getting into the weeds of things. We're here to talk to you today about
00:01:06.000 | really three different things. One is how we see AI reshaping knowledge work as we see it today,
00:01:11.120 | so a lot of how it's impacting professional services, advisory services, that sort of thing.
00:01:15.680 | We'll bring three real-life use cases that we'll walk through in terms of how we've actually deployed
00:01:20.320 | it realistically, concretely within the way that we work in our business, and then wrap up with what
00:01:25.120 | doesn't work and where we see things going. So some of you here might recognize this chart from an
00:01:30.560 | organization called METR, which evaluates the ability of LLMs to complete a certain set of tasks,
00:01:37.840 | and it very specifically measures the length of task that LLMs can complete, at least with 50%
00:01:43.120 | success rate. And so the takeoff rate is pretty significant here. Now, we think that's mostly because
00:01:50.960 | it's a verifiable domain, and as we all know, model capabilities are a little bit jagged, so they
00:01:55.520 | perform very, very well in software development, maybe not so well in non-verifiable or more messy
00:02:02.400 | domains like knowledge work. So we think it's a rough proxy for the coming disruption for professional
00:02:08.400 | services and knowledge work more broadly. Do we think the takeoff will be as steep as software
00:02:13.280 | engineering? Probably not, just because of the messiness of the real world, if you will.
00:02:17.600 | And for those of you not familiar, there's typically two main models for professional services. One is the
00:02:24.480 | junior-led model. This is where you have very senior individuals and more junior individuals provide that
00:02:30.320 | leverage. So it's a lot of directing, okay, do this, and you throw 50 people at a problem, and they kind of
00:02:35.200 | figure it out and probably waste some time in doing so. There's also the senior-led model, which is more senior
00:02:40.880 | folks who have 15, 20 years of experience. They are much more involved in the day-to-day. They're actually
00:02:46.160 | doing the work, delivering the work. This is the Alix Partners model, where it's a little bit less
00:02:50.400 | leverage, but we can deliver results a lot faster and more impactfully because it's the senior-led
00:02:57.360 | folks. We think the future is probably somewhat of a hybrid, but because of model capabilities
00:03:03.600 | and how quickly they're advancing, it really empowers those more experienced folks,
00:03:08.880 | people who have been in a particular domain or industry for 15 or 20 years. If you've listened to
00:03:14.560 | Dwarkesh Patel and his podcast, fantastic podcast, he has this concept of an AI-first firm, where you can
00:03:20.480 | basically take the knowledge and start to replicate that out. So you can have 50 copies of the CEO, as an
00:03:24.960 | example. We think the future is something like that, where you're basically replicating the
00:03:30.080 | knowledge and experience of more senior individuals, and you scale out that leverage
00:03:35.280 | below using AI to do so. And so the way we think about typical engagements, it roughly falls into
00:03:43.360 | these three different buckets, not always, but just for demonstration purposes. There's a lot of upfront
00:03:48.320 | work initially, whether it's an M&A transaction, a corporate investigation, some type of due diligence.
00:03:54.880 | Oftentimes you're left with a bunch of PDFs, databases, Excels, whatever it might be.
00:04:00.080 | There's just a lot of upfront work to just understand what you've got, right? Just ingest the data,
00:04:06.240 | normalize it, categorize things, put it into a framework that you can then use to do what you do
00:04:12.320 | best, whatever that might be. If you're a private equity expert or an investigator, whatever it is,
00:04:18.240 | you typically have some type of playbook, and that's phase two, the black part of the chart, which is the
00:04:23.360 | analysis, the hypothesis generation. You're basically getting all that data into a format that then you
00:04:28.400 | can take and use and derive some type of insights from. And all of that, of course, is in support of
00:04:35.760 | the last piece, which is really what clients actually care about, which is you solving their business
00:04:40.960 | problem. That's the recommendation, the deliverable, the output, whatever that might be. That's the reason that
00:04:46.400 | they've hired you in the first place. We're seeing AI today just significantly compressing at minimum
00:04:53.280 | that first part. So if it was 50%, maybe it's 10 to 20% today in terms of what's required from a human
00:05:00.880 | perspective just to get up to speed about understanding the contents of a data room or whatever it might be.
00:05:06.240 | And it's not only that, because to date you're largely limited by the throughput of human beings.
00:05:13.680 | Think of doc review as an example. Box is a perfect precursor to this talk, because that's exactly
00:05:19.520 | what they do. If you have 5,000 contracts and it takes 30 minutes to review each and every one,
00:05:26.800 | think of how many people that would take. You want to extract some type of information from them.
00:05:32.560 | You're inherently limited by either time or cost.
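To make that arithmetic concrete, a quick back-of-envelope sketch (the 5,000-contract and 30-minute figures are from the talk; the 8-hour working day is our assumption):

```python
# Manual review of 5,000 contracts at 30 minutes each.
contracts = 5_000
minutes_each = 30                               # figure quoted in the talk
total_hours = contracts * minutes_each / 60     # 2,500 hours of review
person_days = total_hours / 8                   # assuming 8-hour working days
print(f"{total_hours:,.0f} hours ~ {person_days:,.0f} person-days")
```

That is roughly 2,500 hours, or about 312 person-days, before anyone derives a single insight.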
00:05:37.680 | And so inevitably there's some type of prioritization that occurs. You're only focusing on kind of the
00:05:42.640 | top 20% or whatever it might be, the most valuable pieces of the data. With AI, that's completely changed,
00:05:49.680 | right? You can now look at 100% of the corpus of data, whatever that might be. And you can start to
00:05:55.200 | derive insights. You can apply your same methodology, your analysis, your insights to all of the data now,
00:06:01.280 | because you're able to extract that information from across 100% of the data set. So now you can
00:06:06.160 | look at 100% of the vendor contracts, 100% of the customer base. You can start to derive those insights,
00:06:12.560 | to identify savings opportunities, free up time to do more interviews, whatever it might be. You're freed
00:06:18.000 | up to do much more high-value work. And the value is that because it's run across 100% of the data,
00:06:24.560 | instead of just the first 20 or so percent, the output is just that much better. So to bring to
00:06:28.960 | life a little bit, I'll turn it over to Mo to talk through some real-life examples.
00:06:32.080 | Thanks, Kevin.
00:06:33.520 | So to motivate the use cases that we have, I want to start with the paradox that we face.
00:06:40.720 | Everyone's investing in AI. 89% of CEOs said that they're planning to implement agentic AI,
00:06:47.680 | according to Deloitte. But we find ourselves in this paradox where National Bureau of Economic
00:06:53.680 | Research says that there's been no significant impact on earnings or recorded hours.
00:06:57.280 | BCG says that three quarters of companies struggle to achieve and scale value with their
00:07:03.040 | Gen AI initiatives. And then finally, S&P Global said that almost half the companies were abandoning
00:07:10.640 | their AI initiatives this year. So how is it that everyone's spending, but no one's seeing the value?
00:07:15.760 | We think there's a difference between employee productivity and enterprise productivity. And so we
00:07:20.880 | want to talk about the use cases that we found that helped drive enterprise productivity.
00:07:25.200 | So the first example I want to start with is categorization. Maybe trying to put a square peg
00:07:33.280 | in a round hole. How does this show up for us? Think of IT support tickets: your laptop
00:07:40.880 | keeps restarting and that needs to be triaged to the hardware department. You need to categorize those
00:07:45.360 | support tickets accordingly. Something closer to home is we analyze companies a lot. And so we want to look
00:07:51.840 | at accounts payable or spend data across companies. And we need to say, what is United Airlines? All right,
00:07:57.280 | it goes under travel. How was this done before? Does anyone remember word clouds? You'd have to build a
00:08:05.280 | machine learning model. You'd have to stem your data, remove stop words, build a classifier, support vector
00:08:13.200 | machines, naive Bayes. It's a lot of work. Enter the new way: structured outputs. So with structured outputs,
00:08:21.840 | you can get the answer a lot more easily. This is unsupervised learning. This is literally what that
00:08:26.240 | would look like. Say you have a list of companies, like JD Factors, and you have to categorize them into a taxonomy.
00:08:31.760 | Here, the taxonomy would be the North American Industry Classification System, the NAICS codes.
00:08:37.200 | Each code has a description. And in this case, it would be other cash management, for instance.
00:08:42.160 | Typically, JD Factors is probably not part of the foundation model's knowledge. So how do we
00:08:49.280 | ensure that the classification works well? Enter tool calls. You can run a web query to append information
00:08:55.760 | to each of these companies and then categorize enormous volumes.
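As a rough sketch of what that can look like in code, here's a minimal version using the OpenAI Python SDK's structured-outputs parse helper with a Pydantic schema. The model name, field names, and two-row taxonomy are illustrative assumptions, not the firm's actual pipeline:

```python
from openai import OpenAI
from pydantic import BaseModel, Field

client = OpenAI()

# Illustrative slice of a taxonomy such as the NAICS codes mentioned above.
TAXONOMY = """\
522320  Financial Transactions Processing and Clearing
481111  Scheduled Passenger Air Transportation
"""

class VendorCategory(BaseModel):
    vendor_name: str
    naics_code: str = Field(description="Best-matching code from the taxonomy")
    naics_title: str
    rationale: str = Field(description="One-sentence justification")

def categorize(vendor: str, web_context: str = "") -> VendorCategory:
    # web_context would come from a web-search tool call for vendors the
    # model doesn't know (e.g., JD Factors), as described in the talk.
    completion = client.beta.chat.completions.parse(
        model="gpt-4o",  # assumed model
        messages=[
            {"role": "system",
             "content": f"Categorize vendors into this taxonomy:\n{TAXONOMY}"},
            {"role": "user", "content": f"Vendor: {vendor}\nContext: {web_context}"},
        ],
        response_format=VendorCategory,  # structured outputs enforce the schema
    )
    return completion.choices[0].message.parsed

print(categorize("United Airlines"))  # expect the air-transportation code
```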
00:09:00.720 | So this is what we've been doing, and we've had huge wins from it. What this has
00:09:08.080 | done is democratize access to text classification for us. I want to talk about the learnings that we've
00:09:15.360 | had from deploying this surgically at our company. Enormous wins in speed and accuracy. Those accuracy gains have
00:09:23.200 | not come cheaply. This might be unsupervised learning, but it's not unchecked. We've had to have the right
00:09:29.600 | relationships with the business partners who've worked hand in hand with us to ensure that we get to the
00:09:33.200 | accuracy that we wanted. What this does is convert skeptics into champions. We don't become snake oil
00:09:39.680 | salesmen pushing and peddling AI. It becomes a pull from the firm that's asking us, "Hey, can you use this,
00:09:45.280 | or can you apply gen AI for us in these other initiatives?" Which is really powerful. It's important
00:09:50.720 | to have business context. That gets embedded for us in those taxonomies which are being used for
00:09:55.040 | classification. Everyone's talking about agents. Well, you need to get the individual steps
00:10:01.600 | right. And what this does is build each individual step to a high level of robustness and
00:10:05.840 | accuracy that we can daisy chain into the agentic workflows that we want. And finally, you know,
00:10:11.280 | a callout is that these results are stochastic and not necessarily deterministic. That comes with some
00:10:16.640 | risks. Kevin will talk more about those. Punchline here: we've been able to achieve 95% accuracy
00:10:23.440 | categorizing 10,000 vendors, doing in minutes what would have taken days, at an order of magnitude less
00:10:30.320 | cost. All right. Next use case. This wouldn't be an AI conference if we didn't talk about RAG.
00:10:37.760 | So how do we see RAG at our firm? You get dumped with a bunch of data. Here's 80 gigs of
00:10:43.920 | internal documents. What did ACME release in 2020? Let's say you've got a court filing that you have to
00:10:49.920 | submit on Monday and it's Friday. You know, you might get asked the question, "What are ACME's escalation
00:10:54.400 | procedures for reporting safety violations?" How did we do this in the past? You'd have an index, a literal
00:11:00.160 | index: an Excel file recording which documents have been received, which haven't been
00:11:04.640 | received, and where they are. Or, we hope not, but maybe you'd use search, SharePoint search or
00:11:11.600 | something like that, that probably wouldn't find what you're looking for. Well, what do we do now?
00:11:16.320 | We have an enterprise-scale RAG app. It has to handle hundreds of gigabytes of data, PowerPoints,
00:11:22.480 | documents, Excel, CSVs, all sorts of formats, and huge volumes. What can you append to that? You can
00:11:29.120 | append tool calls to third-party proprietary databases. Let me talk about that for a second.
00:11:33.440 | What are the trade-offs that we've had? Sorry, I'm going really fast, short on time.
00:11:38.080 | The wins and the losses. RAG is invaluable at consulting companies because you get
00:11:45.520 | dropped onto a project really quickly and you have to get up to speed fast.
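As a rough sketch of the core retrieval loop behind an app like this, assuming the OpenAI embeddings and chat APIs; the chunking, model names, and in-memory store are illustrative, and a deployment at this scale would use a real vector database:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

# In production: hundreds of gigabytes of PowerPoints, Excels, and PDFs,
# chunked upstream; here, two toy chunks.
chunks = [
    "ACME escalation procedure: safety violations are reported to the site lead...",
    "ACME press release, 2020: ...",
]
chunk_vecs = embed(chunks)  # embed once and store

def answer(question: str, k: int = 2) -> str:
    q = embed([question])[0]
    # Cosine similarity between the question and every chunk; keep the top k.
    sims = chunk_vecs @ q / (np.linalg.norm(chunk_vecs, axis=1) * np.linalg.norm(q))
    context = "\n\n".join(chunks[i] for i in np.argsort(sims)[::-1][:k])
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumed model
        messages=[{"role": "user",
                   "content": f"Answer using only this context:\n\n{context}\n\nQ: {question}"}],
    )
    return resp.choices[0].message.content

print(answer("What are ACME's escalation procedures for reporting safety violations?"))
```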
00:11:50.240 | But I want to call out the teaching LLM APIs part. Typically, certain data sources would be siloed
00:11:58.560 | behind organizations that had licenses that would have to pull information from a web UI. That would
00:12:03.600 | then be emailed to a certain team, and then that team would analyze the Excel. Well, what we did was we
00:12:08.880 | took the API spec, embedded it, and taught the LLM how to call the API. We have democratized access to
00:12:14.080 | information that would otherwise have taken days for people to get, really condensing the time, as Kevin said
00:12:18.240 | before, to the high-value work on some of these projects. The last thing to call out about RAG is that
00:12:24.000 | it serves as a substrate onto which you can tack a number of Gen AI features, which has proven really
00:12:28.880 | valuable for us at our firm. A couple of call-outs: people have high expectations of what they
00:12:34.160 | want to receive from a prompt box. If you say reason across all documents, that's just not how RAG works.
00:12:39.680 | So we have to build those solutions step by step, and it's a long journey that we have to go on,
00:12:43.680 | and we're excited to be on it. With that, over to Kevin and the third use case.
00:12:47.360 | Yeah. Thanks. So it's a good thing Box went before us, because they covered a lot of the advantages
00:12:53.200 | of the ability, fundamentally, to take unstructured data and create structure from that. It is an
00:12:59.280 | unbelievably powerful concept. It's very simple on its face, but it is incredibly powerful in an
00:13:04.240 | enterprise context, because you can take something like this credit agreement, it's 50 or so pages long,
00:13:09.920 | in terms of a PDF, and you can very quickly extract information that's useful, like contract parties,
00:13:15.200 | maturity date, senior lenders, whoever that might be. And so you see folks like Jason Liu:
00:13:21.360 | Pydantic is all you need. It is still true, it is still all you need. And fundamentally, what this looks
00:13:26.560 | like, and Box went through a lot of it, but it's combining a document with a schema, with an LLM, with
00:13:32.880 | some validation and scaffolding around it to make sure that you're pulling out the values that you need. And the
00:13:38.960 | business value really is in the schema: what you're actually extracting and why you're
00:13:43.440 | extracting that information. It's the flexibility that is really powerful here, because you can start to
00:13:48.960 | reapply it across different types of engagements. Investigations might be looking at something entirely
00:13:54.160 | different than an M&A transaction. This fundamental capability can span across all those.
00:13:59.120 | And the power is there at the bottom, where you can do this type of thing repeatedly across
00:14:04.000 | thousands, tens of thousands, hundreds of thousands of documents,
00:14:08.320 | where a human review might take days or weeks. Using an LLM, you can get it down to minutes.
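As a minimal sketch of that document-plus-schema-plus-LLM combination, again using the OpenAI SDK's structured-outputs parse helper with Pydantic; the field names mirror the credit-agreement example above, while the model name and helper are assumptions:

```python
from openai import OpenAI
from pydantic import BaseModel, Field

client = OpenAI()

# The business value lives in this schema: what you extract and why.
class CreditAgreement(BaseModel):
    contract_parties: list[str]
    maturity_date: str = Field(description="ISO date, e.g. 2027-06-30")
    senior_lenders: list[str]
    interest_rate: str = Field(description="e.g. 'LIBOR plus 1% per annum'")

def extract(document_text: str) -> CreditAgreement:
    completion = client.beta.chat.completions.parse(
        model="gpt-4o",  # assumed model
        messages=[
            {"role": "system",
             "content": "Extract the requested fields from this credit agreement."},
            {"role": "user", "content": document_text},
        ],
        response_format=CreditAgreement,  # validated, schema-shaped output
    )
    return completion.choices[0].message.parsed

# The same schema reapplies unchanged across an entire corpus:
# results = [extract(doc) for doc in corpus]
```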
00:14:13.280 | It's incredibly powerful. In terms of user trust, we're not only using external tools like Box and
00:14:21.280 | others, but we've also rolled our own internally. And so in terms of just exposing some of the model
00:14:28.400 | internals to users to have somewhat of an off-ramp for them to understand where the model is more or
00:14:34.000 | less confident, we use the logprobs returned from the OpenAI API, and we align them with the output
00:14:40.800 | schema from structured outputs. So we ignore all the JSON syntax, we ignore the field names themselves, we just
00:14:46.720 | home in on the values themselves. So in this case, the green box above the interest rate of LIBOR
00:14:52.560 | plus one percent per annum, that's the field that we want. We basically take the geometric mean of the
00:14:58.480 | logprobs associated with those tokens in particular, and use that as a rough proxy of the model's confidence
00:15:04.720 | in producing that output. So the green and yellow boxes you saw at the beginning are a
00:15:12.160 | direct reflection of the confidence level. It's a relatively intuitive way for users to get
00:15:17.440 | an understanding of the model's confidence, again, for human review to the extent that's needed.
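A simplified sketch of that confidence calculation: request logprobs alongside the completion, find the tokens that spell out one extracted value, and take the geometric mean of their probabilities. The substring alignment here is deliberately naive; the production version described above aligns against the structured-output schema properly, and the model name is an assumption:

```python
import math
from openai import OpenAI

client = OpenAI()

completion = client.chat.completions.create(
    model="gpt-4o",                                  # assumed model
    messages=[{"role": "user", "content": "..."}],   # extraction prompt elided
    logprobs=True,                                   # return per-token logprobs
)
tokens = completion.choices[0].logprobs.content      # objects with .token, .logprob

def value_confidence(value: str, tokens) -> float:
    """Geometric mean of token probabilities for the tokens spelling `value`."""
    full = "".join(t.token for t in tokens)
    start = full.find(value)
    if start < 0:
        return 0.0
    end, pos, lps = start + len(value), 0, []
    for t in tokens:
        # Keep only tokens overlapping the value's character span, which
        # skips the JSON syntax and field-name tokens around it.
        if pos < end and pos + len(t.token) > start:
            lps.append(t.logprob)
        pos += len(t.token)
    # exp of the mean logprob == geometric mean of the token probabilities
    return math.exp(sum(lps) / len(lps))

print(value_confidence("LIBOR plus 1% per annum", tokens))
```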
00:15:21.680 | I won't go through all these, but fundamentally, like I said, it is magic when it works, and it works at
00:15:27.200 | scale. It is a total unlock, particularly for non-technical folks who are not up to speed with the
00:15:31.600 | capabilities of LLMs. To be able to do this is a light-bulb moment for them, and it really is a game
00:15:37.920 | changer. Now, that being said, there's a lot of work to be done in terms of validation. You saw all the
00:15:43.520 | work that Box and others have done in terms of getting it to a level of rigor that users can trust,
00:15:48.960 | and so that's really a key tenet for all this. And so finally, I'll turn it to Mo for the must-haves.
00:15:55.680 | So just a couple quick call-outs. I know this is a tech conference, but to get a lot of this to work at the
00:16:01.840 | enterprise level requires people skills and working closely with the organization. There are a couple things I want to
00:16:06.080 | call out that have been really important for us to scale our Gen AI initiatives at our firm.
00:16:09.520 | The first one is demos. We prototype in Streamlit, but we build in React.
00:16:17.120 | And so we have a constant cadence, once a month, where we show the latest and greatest of what we're
00:16:21.200 | building. This inspires the firm with what we're able to build and keeps it investing in our initiatives.
00:16:26.240 | And the second thing is, you know, there's always the next shiny thing: agents, MCP,
00:16:33.120 | the latest model. NPS is our metric, ROI is our metric, and that is hard-earned, one bug fix at a
00:16:40.560 | time. I'll skip the other one. You know, partnerships are really important. It's a shared journey.
00:16:45.680 | And I think we're out of time, but I'll leave you with this. Once LLM-powered
00:16:51.200 | Excel actually works, we will be at AGI. So I'm looking forward to that next talk. Thank you. Thank you.