back to indexHiring & Building an AI Engineering Team: Dr. Bryan Bischof

00:00:00.040 |
Hex is a data science platform company. What you should think about is sort of like a really 00:00:18.600 |
powerful Jupyter notebook. I've been building AI capabilities at Hex for about a year and a half. 00:00:23.280 |
I've been in data science and machine learning for about 12 years. So let's get one thing out 00:00:28.520 |
of the way. Why might you care about my opinion? Well, let's take a look at this tweet of mine. 00:00:34.980 |
This is when I was hiring my first AI engineer. I'll draw your attention to the date. I'll 00:00:40.220 |
also draw your attention to the date on this very famous blog post. So my tweet was one 00:00:47.440 |
of the inspirations for the rise of the AI engineer. That does not mean that I'm fully aligned with 00:00:52.840 |
that blog post, but it does mean that I have some opinions. What I expect, many of you are 00:01:01.300 |
expecting, is for me to start talking about what AI engineering is. And so to kind of try 00:01:09.300 |
try to help with this, I made a little simple chatbot. I call it AI leader GPT. And I figured, 00:01:15.760 |
let's have it answer your questions. I'm pretty lazy. I have a lot of stuff on my plate. I figured, 00:01:19.760 |
just hand it over to AI. That's kind of our goal anyway. So I asked my little chatbot that I want to 00:01:25.980 |
ask some questions. I asked it, what does AI engineer mean? Unfortunately, this is one of those moments where, 00:01:32.840 |
clearly, the intelligence isn't there yet. Despite a lot of improvement on MMLU, we yet can't define 00:01:40.020 |
this term. So unfortunately, I'll have to do this myself. Okay, so let's start with the question, 00:01:46.440 |
what? Building an AI product requires a team. But what does this role really look like? I'm going to 00:01:54.560 |
tell you what my very first job posting looked like. And I'm going to talk you through it to try to give 00:01:59.340 |
you some sense of what this actually should be. I was looking for a senior engineer. That senior 00:02:04.500 |
engineer could come from SWE or MLE. And we wanted them to rapidly expand our capabilities for greenfield 00:02:12.840 |
applications. That should sound very normal, should sound very expected. Unfortunately, while we respect 00:02:22.740 |
ML researchers, we are explicitly not looking for this. This was in the job posting. For those of you 00:02:30.080 |
that are putting up job postings, I highly recommend that if this is not what you're looking for, that you 00:02:35.820 |
tell them. It is both a waste of your time and their time for them to apply hoping to go to ICLR based on the 00:02:46.020 |
work they're going to work they're going to do at your company. Once again, this is not for lack of a deep 00:02:51.180 |
interest in their work. This is just not the stage we're at. I would love to hear from you if you have 00:02:59.740 |
experience getting ML or AI capabilities into production and serving real users. If you have a lot of that 00:03:07.680 |
enthusiasm for applications of AI to business problems. And if you have a core understanding of the 00:03:15.660 |
architectural things, maybe you've read one of the books on MLOps. Maybe you've read previous discussions 00:03:22.080 |
about MLOps. Maybe you've worked in some infra adjacent things as a back-end engineer. That all sounds 00:03:27.840 |
wonderful. And here we start to get a little controversial. You should be comfortable working in both Python 00:03:35.760 |
and TypeScript. It's okay if you are only strong in one, but you should be open to both. Our application 00:03:46.440 |
is built in TypeScript. I need people who are going to be willing to get into the details that get into 00:03:51.660 |
the nitty-gritty there. I don't need you to come in as an expert on React, but I do need you to be able 00:03:57.420 |
to interface with those people in a really productive way. So I went back to my little GPT and I said, I'd 00:04:07.800 |
really like to understand the when, the why, the who, and the how of all of this process. And so let's go 00:04:14.580 |
through those things. First up, when? Well, it actually kind of depends. It depends where you 00:04:22.080 |
are in your journey. If you're early in your journey, you need SWE skills, you need data profiles, and you 00:04:28.200 |
need product competency. If you're middle stage, you need SWE some more, probably some more infra. You need 00:04:35.040 |
data profiles, but you also need some design. This last one is the number one thing that I see teams 00:04:40.800 |
under-investing in. If you're later stage, you definitely need infra. You need all the above, 00:04:47.940 |
but a little bit scaled. And now you need to start actually thinking about machine learning engineers. 00:04:52.420 |
This is when you want to start. Yeah? Yeah, totally. So data scientists, data analysts, people that have a 00:05:01.200 |
lot of experience looking at distributions of data and saying, wait a minute, that's strange. Looking at 00:05:08.340 |
user output and saying, hmm, this is actually quite different than we were expecting. Looking at sort of 00:05:14.180 |
like product analytics and saying, you know what? This retention is pretty poor. One thing that I would 00:05:23.500 |
sort of ask everybody in this room to do for yourself right now is how good should retention be on an AI 00:05:29.000 |
product? If you don't know that about your product, both what the comparables are for other products, 00:05:35.500 |
you're missing an opportunity. You're missing an opportunity to hire people who have been doing this 00:05:39.600 |
for a long time. They will up-level your team. So in the later stages, now you need to start talking 00:05:46.760 |
about MLEs. Maybe because you need to fine-tune a model. Maybe because you need to fine-tune an embedding 00:05:52.020 |
you should start early. However, it is extremely dangerous to accidentally fall into the trap of the 00:06:07.660 |
mythical man month. This is true in all software's, all product development. It is somehow magnified in AI. I see more 00:06:16.020 |
people overdoing this in AI than I do in other domains. So if the AI demo only takes one week and the AI product 00:06:27.280 |
clearly takes only four weeks if you throw 20 engineers at it. I don't have 20 engineers, but even if I did, I certainly wouldn't be putting them on the same AI product. 00:06:38.540 |
There is a very, very, very important reminder that nine women cannot give birth in a single month. 00:06:49.880 |
Because all AI products are early by definition, the mythical man month is especially true for early products. You really, really, really need to be careful here. When I speak to my peers, this is the number one mistake that I'm hearing. 00:07:07.540 |
The hiring schedule should reflect the development schedule. So you should start with an early product that you get in front of users. Then you should start by building evals. Then you should get user feedback. And then you should iterate. 00:07:23.100 |
If you want more details about this, you can either consult this paper that we released, what we learned last year of building LLMs, or you can come to our talk later today to get more information on this. 00:07:33.360 |
This development schedule, this is not just my opinion. It's not just six of our opinions. This is tried and tested, speaking to a lot of other people. This is what works. This order of operations. 00:07:45.140 |
So if this order of operations works, then your hiring had better be well aligned to it. 00:07:49.780 |
Data needs to come much earlier than in traditional product engineering efforts. You asked the question about, like, what do data profiles look like? 00:07:59.040 |
One of the things that is very, very true about AI, partly because of this development schedule, but also because of the type of products that we're trying to build, you really need to be looking at your data. 00:08:11.100 |
And all of us can look at data, but some people are literally professionals at just looking at the data. 00:08:17.460 |
That intuition takes a long time to build and will level up your team. 00:08:22.080 |
So, unfortunately, why is a little bit of a harder question existentially. 00:08:30.140 |
So we'll try to scope it down a teeny bit to just be why hire for these teams. 00:08:38.120 |
The hiring theses for this initial team is going to look like the following. 00:08:46.520 |
And the reason I wanted to kind of give you these theses is because ultimately your leadership is probably asking you, what's your hiring thesis? 00:08:58.440 |
If you can pull them over and satisfy these theses, then you don't need to make a hire. 00:09:03.980 |
So for the full stack engineer, the hiring thesis is they're going to integrate your system with an LLM provider. 00:09:17.420 |
The hiring thesis for the data scientist is evaluation, quality, and user data. 00:09:25.100 |
I've already talked about this a couple times in this talk already. 00:09:32.160 |
I'm not super specific here that this needs to be a product manager, a program manager, a product developer. 00:09:41.620 |
And the reason is because they need to be talking to users. 00:09:45.100 |
They need to be understanding what the jobs to be done are. 00:09:48.120 |
If you personally don't know what the jobs to be done are for your application, that's a hole. 00:09:59.560 |
A lot of times we think of designers as coming later in the process. 00:10:03.280 |
But right now, none of us know what the shape should be. 00:10:08.680 |
Think about early technology and how different it looks for users from what it eventually becomes after you've been doing it for five years. 00:10:26.680 |
And how many applications are we asking people to pay to use that don't look much better than this React? 00:10:36.720 |
The reality is your AI application probably looks like shit. 00:10:44.340 |
I just say that in a way of there's a lot of opportunities. 00:10:51.280 |
So, what I challenge you to do is bring in a professional. 00:10:54.700 |
And finally, when you need an MLE, it's because you need to push your capabilities beyond what is the commodity intelligence. 00:11:03.900 |
That delta is what the MLEs are going to bring. 00:11:12.260 |
In my experience, the attributes that are strongly co-varying with a lot of impact are data intuition. 00:11:24.700 |
But there's a big difference between I made a semantic embedding of all of my documents and I made a semantic embedding of all my documents. 00:11:33.700 |
And when I looked at the mutual distances, they fall into a very ridge-like, jagged structure. 00:11:51.740 |
We are still trying to figure out what the actual utility of most of this is. 00:11:57.100 |
I am incredibly skeptical that we are already at the boundary of the value for these things. 00:12:03.840 |
If we believe that there's a lot more juice to squeeze, then we must also accept that we don't know what the right products are right now. 00:12:12.220 |
I would hope that for most of us, what we're building right now, what we're laser-focused on, what we're telling our investors is the breakout thing. 00:12:21.400 |
We look back in two years like, okay, so it was very naive. 00:12:24.840 |
But I hope that that's the case for all of us. 00:12:27.900 |
I don't want to be building the same thing in two years, and I hope you don't either. 00:12:33.060 |
This is always the case for engineering teams, that urgency has a really high value. 00:12:39.220 |
But when everything changes under your feet every three months, it's even more true. 00:12:44.060 |
A little ADHD can be useful, too, speaking from personal experience. 00:12:52.300 |
If you are giving leak code interviews for your AI engineering hiring, you are doing yourself a major disservice. 00:12:59.680 |
I cannot think I've done a lot of leak code interviews. 00:13:10.560 |
I cannot imagine a leak code experience I've ever had that gives signal on what is actually useful for building this shit. 00:13:23.660 |
Make data intuition part of your hiring loop, and so, too, for product intuition. 00:13:32.480 |
That take-home, ultimately, is a data-cleaning exercise. 00:13:55.400 |
You were able to look at the data and make some conclusions. 00:13:58.180 |
That sure sounds a whole hell of a lot like what I need them to do on the job. 00:14:03.120 |
My coding challenge, most of the people in this room would think is, like, too easy. 00:14:10.120 |
But I promise you, I get a whole hell of a lot of signal out of it. 00:14:16.740 |
Invest in data intuition and product intuition. 00:14:20.420 |
One thing that Hex does that I think is really amazing is we have a product design interview. 00:14:34.660 |
Look for people who are paying attention, but not necessarily just riding the wave. 00:14:43.520 |
I understand that a lot of people are really enthused right now, and they really want to get involved. 00:14:49.280 |
What I'm really looking for, though, is people that are going a little bit deeper. 00:14:54.560 |
They're playing with other AI products, and they're forming opinions about what is good and what is bad. 00:15:00.980 |
I recently had the privilege of hiring an AI engineer who had written a blog post about, like, AI design patterns. 00:15:08.480 |
There was a certain extent to which, just from that blog post alone, I could have predicted that she was going to get hired. 00:15:15.500 |
That's not to say that, like, I'm hiring based on blog posts, but the amount of awareness codified in that single blog post about design patterns, design thinking, what AI should feel like. 00:15:33.300 |
She also happens to be technically very competent. 00:15:40.220 |
So, those are my main guidance for hiring these teams, not just the AI engineer profile itself, but more generally, how to build these teams. 00:15:53.360 |
But I was curious if I could get my chat bot to give us any sort of, like, alpha. 00:15:58.920 |
And so, we'll go ahead and ask this question live and see what it says. 00:16:12.060 |
Unfortunately, it's not pleased with me taking all of its good ideas and delivering them as if they're my own. 00:16:29.980 |
How many of you have worked with experts before in ML and AI teams? 00:16:35.340 |
How many of you, for whom that you did, worked with that in, like, data labeling? 00:16:44.900 |
So, this is the key thing that can take what you are building and make it go much more smoothly. 00:16:57.400 |
Work directly with the people that understand what you want the AI to do. 00:17:01.940 |
If you're building a customer support bot and you don't have customer support bot and you don't have customer support people using that every day, you're insane. 00:17:21.380 |
I talk to our data scientists every single week without exception. 00:17:35.740 |
I don't care how smart you are as an engineering leader. 00:17:38.980 |
I don't care how smart you are as a machine learning engineer. 00:17:41.780 |
The only product that you could possibly be building that doesn't require you to work with other experts is if you're building an AI bot for generating fucking ML and AI to products. 00:17:54.520 |
That is the only one because then you are still the expert. 00:17:57.820 |
This is by far and away the most important thing that you should be thinking about beyond hiring. 00:18:20.360 |
Could you maybe at like a high level give an example for data intuition and the product design sort of prompts that you're doing? 00:18:30.020 |
Because I find I think the coding one's a little bit more deterministic and easy. 00:18:34.200 |
With those I'm not really sure where to start but it sounds like an excellent way to conduct the interview. 00:18:38.580 |
Yeah, so this is specifically for getting data intuition signaled during the interview process. 00:18:43.200 |
Yeah, so for me it is actually part of the interview process like I'm giving them a large set of data and I'm asking them to like form some opinions about some of the data contained there in. 00:18:54.920 |
So, roughly a clustering problem but the secret is there's like no actual good clusters, there's no actual like objective way to cluster that data. 00:19:04.100 |
So, what I'm hoping that they're going to do is be able to pull out some sort of like latent meaning in that data. 00:19:11.100 |
So, I do this as a take home because one, I think people, I mean we all know that people program substantially worse during an interview process. 00:19:20.100 |
And also like how often is your manager staring over your shoulder when you're doing data analysis? 00:19:29.100 |
I give them seven days to complete a take home challenge, they get to look at the data, they get to write up a little report, and then I do a live interview with them where I give them feedback on their proposal. 00:19:40.100 |
One, I get to see how well did they do and I get to really like talk to them about it. 00:19:45.100 |
Two, if I misunderstood something about their approach, them talking through that notebook with me is a really good opportunity for me to say, oh, actually I misunderstood what you did. 00:19:56.100 |
Three, I'm going to give them a lot of feedback. 00:20:02.100 |
One, they're going to learn what it's like to give feedback from me. 00:20:09.100 |
And they'll know by the end of the interview if getting feedback from me sucks. 00:20:12.100 |
That's really important for them to make a decision about working for me. 00:20:16.100 |
And then on the flip side, I get to learn what they're like to interact with when I'm giving feedback. 00:20:20.100 |
If they did a really great job, a lot of my feedback will like, this is really cool. 00:20:26.100 |
One of the things that your responsibility is as a leader is to always have feedback, period. 00:20:33.100 |
And so it's your responsibility during the interview process to show them what that experience is going to feel like. 00:20:39.100 |
That's a big part of the matching problem for hiring. 00:20:42.100 |
And then finally, this interview is an opportunity for us to talk about sort of like how do they think through what is the minimum deliverable on some given task. 00:20:56.100 |
If they turn something in that's clearly 15 hours worth of work, that's a red flag. 00:21:02.100 |
If they turn something in that's like really well scoped for four hours, that's a green flag. 00:21:08.100 |
And frankly, if they turn in something that's like overly simplified, I have the opportunity to say, hey, I think this is maybe like a little bit like under what I was expecting. 00:21:19.100 |
And sometimes, and frankly, I've hired one of these people, the feedback was, I really didn't see much value in going any deeper until we reviewed this. 00:21:33.100 |
Like, so that is why I think the style of interviewing is so important. 00:21:38.100 |
And getting that like data intuition out of it is the core goal. 00:21:44.100 |
But this format allows me to sort of like tag on a lot of extra signal. 00:21:48.100 |
That one's a little bit harder for me to get into, but basically we ask you to like design a physical product and they meet with our design team and it's incredible. 00:22:00.100 |
I did it myself and it was my first time ever doing a product design interview. 00:22:03.100 |
And I was like, I want to work for this company. 00:22:21.100 |
I tend to think that security responsibility doesn't lie within the team building the AI capabilities. 00:22:27.100 |
I think that they should be security savvy, but I think ultimately like most organizations should have security professionals that are able to help you make great decisions. 00:22:38.100 |
I do believe a lot and my security friends are going to like roast me for this. 00:22:43.100 |
But like I do believe a lot that like strong engineers, strong software engineers should be constantly thinking about sort of like the adversarial nature of humans interacting with software and be thinking about like where they're bringing up risks. 00:22:57.100 |
But I tend to think a lot about security is lying outside the team. 00:23:02.100 |
You raised your eyebrow in reaction to my comment. 00:23:05.100 |
So I'd like to ask, tell me why you think I'm wrong. 00:23:19.100 |
So where we've had the most success is bringing security people to be part of our internal product teams at the beginning. 00:23:28.100 |
Because we face a federal highly regulated, we're going to get a billion questions about this. 00:23:33.100 |
And we don't have to retrain everybody outside in a separate security team. 00:23:40.100 |
We've seen that pattern work to grab a smart and engaged security person to be part of that effort rather than trying to hold on for a veteran. 00:23:52.100 |
I think that's really like insightful and really like meaningful in sectors that are a little different than mine. 00:24:02.100 |
And like even in my domain where we have a lot of like sensitive customer data, we sign BAs with every like provider that we work with. 00:24:10.100 |
Like I do, I mean bluntly, like I take on a lot of that responsibility personally as the team lead. 00:24:17.100 |
But I can absolutely see value in what you're talking about. 00:24:25.100 |
I would suggest that it's part of the product team though. 00:24:36.100 |
You're talking about creating new teams and hiring teams. 00:24:39.100 |
And now I see another very important aspect is that upskilling or reskilling existing teams. 00:24:47.100 |
What would be your advice as well because in my, in my, I mean, it's, you're not always like creating new teams. 00:24:55.100 |
You know, always doing with like human depth, you know, like existing teams and software engineer. 00:25:00.100 |
And, you know, most of them can be reluctant or can be, you know, this kind of routine software engineer being there for years and years. 00:25:08.100 |
What would be your two cents about like re-screening, upskilling and making all these mayonnaise work with the new team, you know? 00:25:18.100 |
So I'm going to repeat back to the question, make sure I totally understand. 00:25:21.100 |
So your point is I focus a lot on zero to one for teams, but when you want to take an existing product team and you want to like add AI capabilities and make sure that they're set up for success. 00:25:36.100 |
So, um, really, really good point and really important. 00:25:38.100 |
Um, I think, I believe that the AI capabilities at any given company should have at least one team who's responsible for building the infrastructure to make that easy. 00:25:50.100 |
I don't, I know Netflix is like very divergent and thinking from what I'm about to say. 00:25:57.100 |
But what I do want to add is like in most companies, every individual product team should not be going up and setting up the relationship with open AI separately. 00:26:08.100 |
They should not be going and figuring out how to like build a prompt, like, uh, infrastructure in your software. 00:26:15.100 |
They should not be understanding what the evaluation system is going to look like. 00:26:19.100 |
I really think all of that should be coalesced for every given company. 00:26:24.100 |
There should be one team responsible for that. 00:26:26.100 |
And then what I believe works very well is to have other teams like you're talking about product teams then say, we're going to treat you like a different infra team. 00:26:38.100 |
And then we're going to have very similar to the profile that we talked about one person who's really responsible on that product team for interfacing with the platform team. 00:26:48.100 |
I worked at Citrix for a long time where the data scientist had access to a data platform team. 00:26:53.100 |
And the data platform teams charter was do whatever they possibly can so that data scientists can move as fast as possible. 00:27:01.100 |
I really believe in that model and I've seen it personally as a consumer be incredibly powerful. 00:27:07.100 |
We were trying to like build new models, deploy them. 00:27:14.100 |
And so I really believe that like having a centralized like platform team for AI at your company has so much leverage. 00:27:22.100 |
So that's a little bit of my like opinion here. 00:27:25.100 |
The one caveat is again, I know Netflix disagrees with this persona and every individual product team at Netflix has the like open like opportunity. 00:27:34.100 |
If they just want to go do it from scratch, go ahead, have fun. 00:27:37.100 |
But there's a little bit of like reason why that makes sense for Netflix and not for most of us. 00:27:41.100 |
We're at time, but this is going to be our last question. 00:27:48.100 |
What I was hoping to get your opinion on is when you're hiring or maybe not reskilling your team. 00:27:53.100 |
You mentioned a lot of great attributes that you should look at like data literacy, your urgency and your general enthusiasm for I guess generative AI products in general. 00:28:02.100 |
So I was wondering when you're doing hiring, what do you think are some of the attributes that are non-negotiable and what do you think are some attributes that are trainable? 00:28:09.100 |
The sense that if you hire a software engineer who has a lot of enthusiasm and a lot of skills that could benefit but doesn't really have the data literacy, do you think that's okay? 00:28:18.100 |
Or if you hire a data scientist who might not have the urgency but the background? 00:28:21.100 |
I think all three of the attributes I mentioned, there needs to be some kernel there that I can develop. 00:28:30.100 |
But there's one latent feature that I didn't mention that I've never personally successfully trained and is a really big, powerful feature in my model. 00:28:40.100 |
Actually, I think that's a fantastic way to end.