The Billable Hour is Dead; Long Live the Billable Hour — Kevin Madura + Mo Bhasin, AlixPartners

Chapters
0:00 Introduction to AlixPartners and the AI Shift
1:05 How AI is Reshaping Knowledge Work
2:19 The Future of Professional Services Models with AI
3:36 AI's Impact on the Three Phases of Engagements
5:07 Scaling Data Analysis Beyond Human Limitations
6:36 The Paradox of AI Investment and Productivity
7:22 Use Case 1: Categorization with Structured Outputs
10:34 Use Case 2: Retrieval-Augmented Generation (RAG)
12:46 Use Case 3: Structured Data Extraction from Unstructured Data
15:54 Key Requirements for Scaling GenAI Initiatives
16:48 Final Thoughts: The Future of LLMs in the Enterprise
Mo Bhasin: I'm Mo. I'm director of AI products at AlixPartners. Prior to this, I was a co-founder of an anomaly detection startup, and prior to that, I was a data scientist at Google. Together, we co-lead the development of an internal gen AI platform. We've been working on it for the last two years, we have 20 engineers, and we've scaled it to 50 deployments and hundreds of users. We're excited to tell you everything we've learned on that journey.
Kevin Madura: Great. I'm Kevin Madura. I help companies, courts, and regulators understand new technologies like AI and LLMs. As Mo mentioned, both of us work at a company called AlixPartners. It's a global management consulting firm. I realize lots of you in this room might be rolling your eyes at that, rightfully so, but I like to think our firm does a little bit more than deliver PowerPoints. We actually roll up our sleeves and solve problems, whether that's coding or actually getting into the weeds of things. We're here to talk to you today about three things: how we see AI reshaping knowledge work today, so a lot of how it's impacting professional services and advisory services; three real-life use cases that show how we've actually deployed it, concretely, within the way we work in our business; and finally what doesn't work and where we see things going.

Some of you might recognize this chart from an organization called METR, which evaluates the ability of LLMs to complete a certain set of tasks. It very specifically measures the length of task that LLMs can complete with at least a 50% success rate. The takeoff rate is pretty significant here. Now, we think that's mostly because software is a verifiable domain, and as we all know, model capabilities are a little bit jagged: they perform very, very well in software development, maybe not so well in non-verifiable or messier domains like knowledge work. So we think it's a rough proxy for the coming disruption of professional services and knowledge work more broadly. Do we think the takeoff will be as steep as in software engineering? Probably not, just because of the messiness of the real world, if you will.
For those of you not familiar, there are typically two main models for professional services. One is the junior-led model, where very senior individuals direct and more junior individuals provide the leverage. So it's a lot of directing, "okay, do this," and you throw 50 people at a problem, and they kind of figure it out and probably waste some time doing so. There's also the senior-led model, built around folks who have 15 or 20 years of experience. They're much more involved in the day-to-day; they're actually doing and delivering the work. This is the AlixPartners model, where there's a little less leverage, but we can deliver results faster and more impactfully because it's senior-led. We think the future is probably somewhat of a hybrid, but because of model capabilities and how quickly they're advancing, AI really provides leverage to those more experienced folks, the people who have been in a particular domain or industry for 15 or 20 years. If you've listened to Dwarkesh Patel's podcast (fantastic podcast), he has this concept of an AI-first firm, where you can basically take someone's knowledge and start to replicate it out, so you can have 50 copies of the CEO, as an example. We think the future is something like that: you're replicating the knowledge and experience of the more senior individuals, and you scale out the leverage below them using AI.

The way we think about typical engagements, they roughly fall into three buckets, not always, but for demonstration purposes. First, there's a lot of upfront work, whether it's an M&A transaction, a corporate investigation, or some type of due diligence. Oftentimes you're left with a bunch of PDFs, databases, Excels, whatever it might be, and there's a lot of upfront work just to understand what you've got, right? Ingest the data, normalize it, categorize things, and put it into a framework that you can then use to do what you do best, whether you're a private equity expert or an investigator. You typically have some type of playbook, and that's phase two, the black part of the chart: the analysis, the hypothesis generation. You're getting all that data into a format you can take and use and derive some type of insights from. And all of that, of course, is in support of the last piece, which is what clients actually care about: you solving their business problem. That's the recommendation, the deliverable, the output, whatever it might be. That's the reason they hired you in the first place. We're seeing AI today significantly compressing, at minimum, that first part. If it was 50% of the effort, maybe it's 10 to 20% today in terms of what's required from a human just to get up to speed on the contents of a data room or whatever it might be.
And it's not only that, because to date you've been largely limited by the throughput of human beings. Think of doc review as an example. Box is a perfect precursor to this talk, because that's exactly what they do. If you have 5,000 contracts and you want to extract some type of information from them, think of how many people it takes at 30 minutes per contract: that's 2,500 hours of review, more than a person-year of work. You're inherently limited by either time or cost, so inevitably some type of prioritization occurs: you focus on only the top 20% or so, the most valuable pieces of the data. With AI, that's completely changed, right? You can now look at 100% of the corpus, whatever it might be, and start to derive insights. You can apply the same methodology, the same analysis, to all of the data now, because you're able to extract that information across 100% of the data set. So now you can look at 100% of the vendor contracts, 100% of the customer base. You can derive those insights, identify savings opportunities, and free up time to do more interviews, to do much more high-value work. And because it runs across 100% of the data instead of just the first 20 or so percent, the output is just that much better. So to bring that to life a little, I'll turn it over to Mo to talk through some real-life examples.

Mo Bhasin: Thanks, Kevin.
To motivate the use cases, I want to start with the paradox that we face. Everyone's investing in AI: 89% of CEOs say they're planning to implement agentic AI, according to Deloitte. But we find ourselves in a paradox, where the National Bureau of Economic Research says there's been no significant impact on earnings or recorded hours, BCG says three quarters of companies struggle to achieve and scale value with their gen AI initiatives, and finally S&P Global says almost half of companies abandoned their AI initiatives this year. So how is it that everyone's spending, but no one's seeing the value? We think there's a difference between employee productivity and enterprise productivity, and we want to talk about the use cases we've found that drive enterprise productivity.
The first example I want to start with is categorization: basically, trying to put a square peg in a round hole. How does this show up for us? Think of IT support tickets. Your laptop keeps restarting, and that needs to be triaged to the hardware department, so you need to categorize those support tickets accordingly. Something closer to home: we analyze companies a lot, so we look at accounts payable or spend data across companies, and we need to say, what is United Airlines? It's an airline, so it goes under travel. How was this done before? Does anyone remember word clouds? You'd have to build a machine learning model: stem your data, remove stop words, build a classifier, support vector machines, naive Bayes. It's a lot of work. Enter the new way: structured outputs. With structured outputs, you can get the answer a lot more easily. This is unsupervised learning, and this is literally what it looks like. Say you have a list of companies, like JD Factors, and you have to categorize them into a taxonomy. Here, the taxonomy would be the North American Industry Classification System, the NAICS codes. Each code has a description, and in this case it would be "other cash management," for instance. Now, JD Factors is probably not part of the foundation model's knowledge, so how do we ensure the classification works well? Enter tool calls. You can run a web query to append information to each of these companies, and then categorize enormous volumes.
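To make that concrete, here's a minimal sketch of the pattern, not our production pipeline: the OpenAI Python SDK's structured-outputs parse helper plus a Pydantic schema. The taxonomy slice, field names, and model name are illustrative assumptions, and the web-enrichment tool call is omitted.

```python
# Minimal sketch of taxonomy categorization via structured outputs.
# Illustrative only: the taxonomy slice and model name are placeholders.
from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

TAXONOMY = """\
481111  Scheduled Passenger Air Transportation
522220  Sales Financing
522390  Other Activities Related to Credit Intermediation
"""

class VendorCategory(BaseModel):
    vendor: str
    code: str       # one of the taxonomy codes above
    title: str      # the taxonomy title for that code
    rationale: str  # one line of justification, handy for human review

def categorize(vendor: str) -> VendorCategory:
    # Structured outputs constrain the reply to the VendorCategory schema.
    resp = client.beta.chat.completions.parse(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Assign the vendor exactly one code from this "
                        f"taxonomy:\n{TAXONOMY}"},
            {"role": "user", "content": vendor},
        ],
        response_format=VendorCategory,
    )
    return resp.choices[0].message.parsed

print(categorize("United Airlines"))  # expect the air-transportation code
```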
So this is what we've been doing, and we've had huge wins from it. It has democratized access to text classification for us. I want to talk about the learnings we've had from deploying this surgically at our company. We've seen enormous wins in speed and accuracy, but those accuracy gains have not come cheaply. This might be unsupervised learning, but it's not unchecked: we've had to build the right relationships with the business partners, who've worked hand in hand with us to get to the accuracy we wanted. What this does is convert skeptics into champions. We don't become snake oil salesmen pushing and peddling AI; it becomes a pull from the firm, which asks us, "Hey, can you apply gen AI for us in these other initiatives?" Which is really powerful. It's also important to have business context, and for us that gets embedded in the taxonomies used for classification. Everyone's talking about agents; well, you need to get the individual steps right. This builds each individual step to a high level of robustness and accuracy, so we can daisy-chain them into the agentic workflows we want. And finally, a callout: these results are stochastic, not necessarily deterministic. That comes with some risks, and Kevin will talk more about those. The punchline: we've been able to achieve 95% accuracy categorizing 10,000 vendors, doing in minutes what would have taken days, at an order of magnitude less cost.
All right, next use case. This wouldn't be an AI conference if we didn't talk about RAG. How does RAG show up at our firm? You get dumped with a bunch of data: here's 80 gigs of internal documents, what did ACME release in 2020? Or say you've got a court filing you have to submit on Monday, and it's Friday, and you get asked, "What are ACME's escalation procedures for reporting safety violations?" How did we do this in the past? You'd have an index, a literal index: someone keeping an Excel file of which documents have been received, which haven't, and where they are. Or, hopefully not, you'd use search, SharePoint search or something like that, which probably wouldn't find what you're looking for. What do we do now? We have an enterprise-scale RAG app. It has to handle hundreds of gigabytes of data in all sorts of formats, PowerPoints, documents, Excels, CSVs, and at huge volumes.
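To show the core mechanic, here's a toy sketch of retrieve-then-answer. The actual app adds document parsing, chunking, permissions, and citations; the model and embedding names and the sample chunks are assumptions.

```python
# Toy retrieve-then-answer core of a RAG app. Real systems add document
# parsing, chunking, reranking, and citations; names here are assumptions.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small",
                                    input=texts)
    return np.array([d.embedding for d in resp.data])

chunks = [  # stand-ins for parsed slices of those 80 GB of documents
    "ACME safety manual: report violations to the site lead within 24 hours.",
    "ACME 2020 press release: launched the RR-9 product line.",
]
index = embed(chunks)  # swap in a vector database at enterprise scale

def answer(question: str, k: int = 1) -> str:
    q = embed([question])[0]
    # Dot product ranks by cosine, since these embeddings are unit-normalized.
    top = np.argsort(index @ q)[::-1][:k]
    context = "\n".join(chunks[i] for i in top)
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": f"Answer using only this context:\n{context}"
                              f"\n\nQuestion: {question}"}],
    )
    return resp.choices[0].message.content

print(answer("What are ACME's escalation procedures for safety violations?"))
```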
What can you append to that? Tool calls to third-party proprietary databases. Let me talk about that for a second. What are the trade-offs we've seen? Sorry, I'm going really fast, short on time. The wins and the losses: RAG is invaluable at consulting companies, because you get dumped on a project really quickly and you have to get up to speed, so it ends up being really valuable. But I want to call out the "teaching LLMs to call APIs" part. Typically, certain data sources would be siloed behind teams that held the licenses: they would pull information from a web UI, email it to a certain team, and that team would analyze the Excel. What we did was take the API spec, embed it, and teach the LLM how to call the API. We have democratized access to information that would otherwise have taken days for people to get, really condensing the time, as Kevin said before, on the high-value work in some of these projects.
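Here's a hedged sketch of what exposing a licensed data source as a tool can look like. The tool name, fields, and model are hypothetical, and the real version derives the tool schema from the embedded API spec rather than hand-writing it.

```python
# Sketch: expose a licensed data source as a tool derived from its API
# spec, so the model can request the call. Tool name/fields hypothetical.
import json
from openai import OpenAI

client = OpenAI()

tools = [{
    "type": "function",
    "function": {
        "name": "get_filings",  # hypothetical wrapper over the vendor API
        "description": "Fetch a company's filings for a given year.",
        "parameters": {
            "type": "object",
            "properties": {
                "company": {"type": "string"},
                "year": {"type": "integer"},
            },
            "required": ["company", "year"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What did ACME release in 2020?"}],
    tools=tools,
)
call = resp.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))
# -> get_filings {'company': 'ACME', 'year': 2020}; your code then hits
#    the vendor API and feeds the result back in a "tool" role message.
```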
The last thing to call out about RAG is that it serves as a substrate onto which you can tack a number of gen AI features, which has proven really valuable at our firm. A few callouts: people have high expectations of what they'll get from a prompt box. If you ask it to "reason across all documents," that's just not how RAG works. So we have to build those solutions step by step, and it's a long journey that we have to go on, and we're excited to be on it. With that, over to Kevin and the third use case.
Kevin Madura: Yeah, thanks. It's a good thing Box went before us, because they covered a lot of the advantages of the ability, fundamentally, to take unstructured data and create structure from it. It's an unbelievably powerful concept. It's very simple on its face, but it's incredibly powerful in an enterprise context, because you can take something like this credit agreement, a PDF 50 or so pages long, and very quickly extract information that's useful: contract parties, maturity date, senior lenders, whoever that might be. And so you see folks like Jason Liu saying "Pydantic is all you need." It is still true; it is still all you need. Fundamentally, what this looks like, and Box went through a lot of it, is combining a document with a schema and an LLM, with some validation and scaffolding around it to make sure you're pulling out the values you need. The business value really is in the schema: what you're extracting and why you're extracting that information. The flexibility is what's really powerful here, because you can reapply it across different types of engagements. An investigation might be looking at something entirely different than an M&A transaction, but this fundamental capability spans all of them. And the real power is at the bottom, where you can do this repeatedly across thousands, tens of thousands, hundreds of thousands of documents. A human review might take days or weeks; using an LLM, you can get it down to minutes.
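A minimal sketch of that document-plus-schema combination, in the "Pydantic is all you need" spirit. The field set comes from the examples above but is illustrative, not the exact schema we use.

```python
# Sketch of schema-driven extraction from a credit agreement.
# Field set and model name are illustrative assumptions.
from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

class CreditAgreement(BaseModel):
    contract_parties: list[str]
    senior_lenders: list[str]
    maturity_date: str   # e.g. "2027-06-30"
    interest_rate: str   # e.g. "LIBOR plus 1% per annum"

def extract(document_text: str) -> CreditAgreement:
    resp = client.beta.chat.completions.parse(
        model="gpt-4o-mini",
        messages=[
            {"role": "system",
             "content": "Extract these fields from the credit agreement."},
            {"role": "user", "content": document_text},
        ],
        response_format=CreditAgreement,
    )
    return resp.choices[0].message.parsed

# Run the same schema across 100,000 documents and the unstructured
# corpus becomes a table you can sort, filter, and analyze.
```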
It's incredibly powerful. In terms of user trust, we not only use external tools like Box, we've also rolled our own internally. To expose some of the model internals to users, to give them an off-ramp for understanding where the model is more or less confident, we take the logprobs returned by the OpenAI API and align them with the output schema from structured outputs. We ignore all the JSON syntax, we ignore the field names themselves, and we hone in on just the values. So in this case, the green box above, the interest rate of LIBOR plus one percent per annum, that's the field we want. We take the geometric mean of the logprobs associated with those particular tokens and use that as a rough proxy for the model's confidence in producing that output. The green and yellow boxes you saw at the beginning are a direct reflection of that confidence level. It's a relatively intuitive way for users to get a sense of the model's confidence, again, for human review to the extent that's needed.
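In code, the heuristic looks roughly like this. It's a simplified sketch: aligning tokens against JSON output is fiddlier in practice, and the extraction prompt is elided.

```python
# Simplified sketch of the confidence heuristic: request logprobs, find
# the token span that produced one extracted value, and take the
# geometric mean of those token probabilities, i.e. exp(mean(logprob)).
import math
from openai import OpenAI

client = OpenAI()

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "..."}],  # extraction prompt elided
    logprobs=True,
)
token_logprobs = resp.choices[0].logprobs.content  # tokens with .logprob

def value_confidence(value: str, tokens) -> float:
    """Geometric-mean probability of the tokens spanning `value`."""
    full = "".join(t.token for t in tokens)
    start = full.find(value)  # naive alignment; JSON escaping and
    if start == -1:           # whitespace need more care in practice
        return 0.0
    end, pos, lps = start + len(value), 0, []
    for t in tokens:
        tok_start, pos = pos, pos + len(t.token)
        if tok_start < end and pos > start:  # token overlaps the value span
            lps.append(t.logprob)
    return math.exp(sum(lps) / len(lps))

# e.g. value_confidence("LIBOR plus 1% per annum", token_logprobs) -> (0, 1]
```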
I won't go through all of these, but fundamentally, like I said, it is magic when it works, and it works at scale. It's a total unlock, particularly for non-technical folks who aren't up to speed on the capabilities of LLMs. Seeing this work is a light bulb moment for them, and it really is a game changer. That being said, there's a lot of work to be done in terms of validation. You saw all the work that Box and others have done to get it to a level of rigor that users can trust, and that's really a key tenet of all this. And finally, I'll turn it to Mo for the must-haves.
Mo Bhasin: Just a couple of quick callouts. I know this is a tech conference, but getting a lot of this to work at the enterprise requires people skills and working closely with the organization. There are a couple of things that have been really important for us in scaling our gen AI initiatives at our firm. The first is demos. We prototype in Streamlit, but we build in React, and we have a constant cadence, once a month, of showing the latest and greatest of what we're building. That inspires the firm about what we're able to build and keeps it investing in our initiatives. The second thing is, you know, there's always the next shiny thing: agents, MCP, the latest model. NPS is our metric; ROI is our metric. And that is hard-earned, one bug fix at a time. I'll skip the other one; partnerships are really important, it's a shared journey. And I think we're out of time, but I'll leave you with this: once Excel-powered LLMs actually work, we will be at AGI. So I'm looking forward to that next talk. Thank you.