Hi everyone. Thanks for that welcome. As you just heard, my name is James Lowe. I'm Head of AI Engineering at the Incubator for AI. We're a small team of experts in the UK government. We were created by 10 Downing Street to deliver public good using AI, and we do that via experimentation and product building.
The UK government delivers for its citizens. It spends over a trillion pounds a year delivering for nearly 70 million people, so there's a lot to play for. At the Incubator for AI, we deliver a wide range of products, all the way from frontline services up to the Prime Minister's meetings.
This remit is very wide, so we've had to get quite good at deciding what we should build. And that is what I'm here to talk to you about today. I'm going to start with a post from Andrew Ng. He says, "Writing software, especially prototypes, is becoming cheaper." This is not just because of AI coding agents and assistants, but also because AI features in products make the previously impossible possible.
He says, "This will lead to increased demand for people who can decide what to build." AI product management has a bright future. In this talk, I'm going to build on this post and I'm going to make the case for the AI product manager. I'm going to argue that AI expertise is really important for this role.
I'm going to deliver three hard-won lessons from the Incubator for AI. So whether you're a product manager, an AI engineer, or a founder, I hope these lessons will teach you a lot and help you build great AI products.
Before I talk about AI product management, I'm going to quickly recap product management. This is an extremely rich field, so I'm only going to skim the surface here. Product management can be thought of as the intersection of three important areas. We have the business. Is your product viable?
For example, is it going to be profitable? We have the technology. Is your product feasible? For example, do we have the right skills on the team? And we have users. Most importantly of all, is your product desirable? What problem are you solving for your users? A product manager sits at the intersection of these three areas and has to balance them all to find the right path forward for the product.
Then AI comes along and makes the whole process a bit more complicated and a bit messier. It intersects with each of these areas in slightly different ways. For the business, for example: is your business happy with the fact that AI products need more experimentation and carry a higher chance of failure?
For technology: how do you evaluate and monitor the performance of your AI? And for users: how should you handle the probabilistic nature of AI? In particular, will it work for your users? What guardrails do you need? And how do you keep a human in the loop? And sitting right in the middle of all of this, AI products face a big question: is what you're doing even possible?
An AI product manager has to resolve all of these different areas to find the right path forward. A lot of the existing product manager skillset is still very important, but now there is increased emphasis on things like data and AI proficiency. AI product managers need to understand the importance of data, the necessity of evaluation, and how to deal with the probabilistic nature of AI.
What that essentially means is that if you're a product person in this room, if you're a product manager, it's important to upskill in AI. But it also means that if you're an AI engineer or someone more technical, yours is actually a good background from which to move into the product management space as well.
And just to be clear, when I talk about the product manager space, I actually think of this as more of a mindset than a specific role you need on your team. What's really important is that you have someone on your team who is grappling with these four areas in order to find the path forward.
As Bret Taylor said on a recent episode of the Latent Space podcast, there is a lot of power in combining product and engineering into as few people as possible. Few great things have been created by committee. And that's exactly the point we're stressing here. So I hope you feel excited by the prospect of adopting that AI product manager mindset.
And the question now is what lessons can you learn from the incubator for AI? The first lesson is going to come from our project called Consult, and it's going to be all about evaluating AI early. Every time the government wants to undertake a really big policy change, they need to and want to get input from the public.
And in fact, they have a legal duty to do so. They do this by consultations, which are essentially large surveys with free text responses. They run hundreds of these a year, and some of these attract hundreds of thousands of responses. Analyzing these responses can take months and cost millions of pounds.
This is a prototypical use case for AI. But when we started this project 18 months ago, we weren't sure exactly what path to take. You see, there was already precedent for using natural language processing techniques such as BERTopic to analyze consultations. And we were under a large amount of pressure to start delivering.
So we made the mistake of going straight into product building mode. We built a product around those existing techniques. But once we started testing with real users, we found that the results were inaccurate and inconsistent. They not only didn't meet user needs, but wouldn't have passed the very high legal threshold we needed to meet.
So we went back to the drawing board and instead prioritized the AI capability first. We got data from real users and generated synthetic data to create evals, which we optimized against. Then we started testing the outputs with real users as well. We developed that into a package we call Themefinder, which has now been open-sourced so that other people can benefit from it.
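To make that concrete, here's a minimal sketch of the kind of eval loop described here. It is not the actual Themefinder code: the cases and the trivial keyword "model" are hypothetical stand-ins, and in practice assign_theme would call the LLM-based pipeline under test.

```python
# A minimal sketch of an eval loop for theme assignment, not the
# actual Themefinder code. assign_theme stands in for the real
# LLM-based pipeline; the cases are hypothetical examples.

def assign_theme(response_text: str) -> str:
    """Stand-in for the AI pipeline under test."""
    text = response_text.lower()
    if "cycle" in text or "bike" in text:
        return "active travel"
    if "bin" in text or "waste" in text:
        return "waste services"
    return "other"

# Each case pairs a free-text consultation response (real and
# anonymised, or synthetic) with the theme a human analyst assigned.
eval_cases = [
    {"response": "Bin collections should stay weekly.", "human_theme": "waste services"},
    {"response": "Please add more cycle lanes.", "human_theme": "active travel"},
]

def agreement(cases) -> float:
    """Fraction of responses where the pipeline matches the human label."""
    hits = sum(assign_theme(c["response"]) == c["human_theme"] for c in cases)
    return hits / len(cases)

print(f"Agreement with human analysts: {agreement(eval_cases):.0%}")
```

An eval like this gives you a number to optimize against before you commit to building a product around the capability.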
What we found was that the output of this package was not only comparable to what humans were doing, but a thousand times faster and 400 times cheaper. Most importantly of all, by prioritizing the AI capability, we found the key points in the pipeline where human input, a human in the loop, was really valuable.
That meant the product we then went on to build was actually different from the one we originally envisioned. It shows that starting with the AI capability and getting that right means you not only don't waste time building something that's not possible, but also don't waste time building the wrong product.
We've now taken this and been evaluating it on live consultations, and that leads us very nicely to our first lesson, which is: resolve AI uncertainties early on with evaluations and tests with real users. With those live consultations, we've been creating evaluations which we've actually published, and the first of those even made its way onto the BBC front page.
I'm going to take us on to another product now for our second lesson. That product is our AI transcription tool called Minute, and the lesson is all about going wide with features. There are many use cases in the government where secure AI transcription and summarization could be transformational. There are many places where frontline staff, for example, are spending time away from the job they want to do on administration, filling in paperwork and forms.
There are also very good existing off-the-shelf solutions, such as the AWS and Azure transcription services. So for this product, the question was more about how you create a streamlined, frictionless experience that gives users this capability. When we were exploring this space, we thought there were lots of ways we could help by developing AI features that would get the user to that experience.
But there were a lot of different ways you could do this, and a lot was quite uncertain. We also knew that AI could help us build those features really quickly, with AI coding assistants and tools. So what we ended up doing was going extremely wide, trying quite a lot of features with different groups of users, and seeing what worked and what didn't.
The important thing is that after that point, we then stripped back and focused on what actually worked. One of the benefits of using AI coding assistants to build those features is that you don't have a sentimental attachment to them, so it's much easier to strip them out again afterwards.
I'm going to illustrate that point by showing an example of what the tool looked like when it had lots of features, and then when we streamlined it down. Here, the user has already recorded their meeting and has been taken to this page to help them generate a summary of it.
You see at the top there's the ability to choose from lots of different templates, because we were experimenting with different users. Some of our users seemed to want the output to follow an agenda from the meeting, so we gave them the option of inputting that agenda information.
At the bottom we had two different AI features: an AI edit button, so they could use free text to edit the output of the meeting, and an AI chat, so they could ask questions of the meeting. And this doesn't touch on some of the AI happening behind the scenes, such as automatically predicting the speaker names and adding citations back to the original transcript.
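As a rough illustration of what a feature like that AI edit button involves, here's a sketch of how its prompt might be assembled. The names here are hypothetical, and call_llm stands in for whichever model API the tool actually uses.

```python
# A rough sketch of an "AI edit" feature: apply a free-text
# instruction to generated minutes. Hypothetical names throughout;
# call_llm stands in for the real model API.

def call_llm(prompt: str) -> str:
    """Stand-in for the real model call; echoes so the sketch runs."""
    return prompt

def ai_edit(minutes: str, instruction: str) -> str:
    """Apply a free-text edit instruction to the generated minutes."""
    prompt = (
        "You are editing meeting minutes. Apply the instruction below "
        "and change nothing else.\n\n"
        f"Minutes:\n{minutes}\n\n"
        f"Instruction: {instruction}\n\n"
        "Return the full edited minutes."
    )
    return call_llm(prompt)

print(ai_edit("Actions: none recorded.", "Add an action for Sam to share the slides."))
```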
It's no surprise that when we tested this with a lot of our users, they found it a little bit overwhelming and a little bit complicated. In fact, many of them weren't even using these features. And because we were testing with different groups, we found a specific group where there was quite a lot of value in pursuing things further: the probation services use case.
So what we did next was focus in on that use case and streamline the app down, and what we ended up with is Justice Transcribe, which we built in collaboration with Justice AI, an AI team in the Ministry of Justice. As you can see, it's a lot simpler, because we're focusing on one set of users.
We didn't need the template-picking option. These users didn't need the agenda option, so we could strip it out entirely. With the AI edit and AI chat features, we found an overwhelming amount of pressure to merge them into one feature, so we've taken them out and are experimenting heavily so that there isn't that same confusion.
We've been getting extremely positive feedback from users on this, and we're currently taking part in an evaluation where we're being compared to other tools in this space to work out which ones are the most impactful. But I hope this illustrates the point and the lesson we're making here, which is: experiment hard and go wide with lots of features.
Lean into the current uncertainty about what makes a good AI feature, but then cut back and streamline the app down. I thought you might be interested in another use case we were exploring, which was for prime ministerial meetings. This was actually from a recent meeting, the first ever prime ministerial meeting where AI was used to transcribe and summarize, and it was done using our tool.
For the final lesson, I'm going to tell you a little bit about Redbox, and the lesson is all about being ready to pivot. For those of you that don't know, all of our government ministers carry around a big red box, which is full of paperwork, submissions, and important decisions they have to make.
Their private offices do a lot of work to summarize, collate, and collect all that information to put it in the red box. Again, this is a prototypical use case for AI, so it was no surprise that the idea to digitize the red box was the winning idea at a hackathon run by one of our sister teams, Evidence House.
This became the first incarnation of Redbox: digitizing the ministerial red box. We took this winning idea from the hackathon and built it into a full product. However, what we found when we actually tested it with real users is that the feature they were most after, the one they wanted above all else, to the point of not really caring about anything else, was simply the ability to securely chat with a large language model.
You see, this was over a year ago, when the ability for enterprises to chat with large language models was definitely a bit rarer, particularly in the civil service. People were familiar with the value they could get from things like ChatGPT, but they couldn't put their work information into it.
This led to the second incarnation of Redbox: to be the easiest and cheapest way for civil servants to securely chat with a large language model. This also gave us an opportunity. In a lot of our other tools, we were experimenting with ways of making government-specific data more accessible and easier to navigate.
For example, we had a product called Parlex, which was all about making parliamentary and legislative data more available. But we were developing these as independent products with their own user interfaces. The opportunity we saw was to use Redbox as that interface: why not bring these tools and products into the chat interface we were already creating, and that lots of people already had access to?
That's why the third incarnation was to be the client for accessing the Incubator for AI's tools and data. It's worth saying how we validated that the second incarnation was a useful use case: we launched it within the Cabinet Office, and within just a matter of weeks we had thousands of users.
So that's why we knew it would be a useful front end for some of our other tools as well. Then two important things happened. The first is that the commercial landscape changed: Microsoft announced that Copilot Chat, their enterprise equivalent of ChatGPT, was going to be free for enterprise Microsoft users.
And a lot of the government is an enterprise Microsoft user. The second is that Anthropic's Model Context Protocol exploded onto the scene and provided a standardized way of bringing tools and data to models. This meant we had to pivot again. It no longer made sense for us to bank on Redbox being the main way civil servants would access secure chat with large language models.
And it no longer made sense for that to be the only way for people to access our tools and data. So instead, we've been investing hard in using the Model Context Protocol to bring our tools and data to any client, whether it's Redbox, Copilot Chat, or enterprise versions of other tools like Claude or ChatGPT.
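To give a flavour of what that looks like, here's a minimal sketch of exposing a tool over the Model Context Protocol using the official `mcp` Python SDK. The server name and the tool itself are hypothetical, standing in for something like Parlex.

```python
# A minimal sketch of an MCP server using the official `mcp` Python
# SDK (pip install mcp). The server name and tool are hypothetical
# stand-ins for something like Parlex.

from mcp.server.fastmcp import FastMCP

server = FastMCP("iai-tools")

@server.tool()
def search_legislation(query: str) -> str:
    """Hypothetical tool: search parliamentary and legislative data."""
    # A real server would query the underlying data store here.
    return f"Results for: {query}"

if __name__ == "__main__":
    # Runs over stdio by default; any MCP-capable client (Redbox,
    # a Copilot-style chat, Claude) can connect and call the tool.
    server.run()
```

Because the protocol is standardized, the same server works unchanged across clients, which is exactly what made it attractive as the way to distribute our tools and data.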
So it's worth stressing that throughout that time, Redbox has been valuable and is still valuable. It's still the main way a lot of people in the Cabinet Office access secure chat with a large language model. But I hope this lesson shows that things are moving really quickly, and it's really important to evolve and change with them.
Otherwise, you get stuck on the wrong path. That's why our third lesson is: you'll have to pivot harder and faster than ever before. So let's recap the four lessons we've covered today. And yep, there were four. Lesson zero: the importance of AI product managers, a vital role which requires AI expertise.
Lesson one: evaluate AI early. Resolve AI uncertainties early on with evaluations and tests with users. Lesson two: go wide with features. Experiment hard with new features on real users, then cut back. And lesson three: be ready to pivot. You'll have to pivot harder and faster than ever before. Now, some of you are probably sat there thinking, how much of this is really new?
And that's a fair question to ask. There's a lot of wisdom within existing product management that feels very familiar to the stuff we're covering here. For example, the principle of resolving your biggest uncertainties first has been around for a long time. As is putting your users first, listening to them and testing features with them.
However, I hope these lessons have emphasized that AI really does make things different. For lesson one, resolving AI's uncertainties really is an important thing you have to do, and it's a bit more challenging given the extra need for experimentation and evaluation. For lesson two, going wide with features, AI really does change the landscape.
It makes it easier to move faster with features and to have less attachment to them, so you should be doing exactly that: testing those features and scaling back. On top of that, AI features are new, and there's more uncertainty around exactly what makes a good AI feature. And finally, the AI landscape is changing extremely quickly, which is why pivoting is more necessary now than ever before.
So I hope those lessons have been useful, and I hope you feel ready to step up into that AI product manager mindset that your product needs. Thank you so much for listening, and please do check us out. We're currently hiring. Thank you.