Hex is a data science platform company. Think of it as a really powerful Jupyter notebook. I've been building AI capabilities at Hex for about a year and a half, and I've been in data science and machine learning for about 12 years. So let's get one thing out of the way.
Why might you care about my opinion? Well, let's take a look at this tweet of mine. This is when I was hiring my first AI engineer. I'll draw your attention to the date. I'll also draw your attention to the date on this very famous blog post. So my tweet was one of the inspirations for the rise of the AI engineer.
That does not mean that I'm fully aligned with that blog post, but it does mean that I have some opinions. What many of you are probably expecting is for me to start talking about what AI engineering is. And so, to try to help with this, I made a simple little chatbot.
I call it AI Leader GPT. And I figured, let's have it answer your questions. I'm pretty lazy, I have a lot of stuff on my plate, so I figured: just hand it over to AI. That's kind of our goal anyway. So I went to my little chatbot to ask some questions.
I asked it: what does AI engineer mean? Unfortunately, this is one of those moments where, clearly, the intelligence isn't there yet. Despite a lot of improvement on MMLU, we still can't define this term. So unfortunately, I'll have to do this myself. Okay, let's start with the question: what?
Building an AI product requires a team. But what does this role really look like? I'm going to tell you what my very first job posting looked like. And I'm going to talk you through it to try to give you some sense of what this actually should be. I was looking for a senior engineer.
That senior engineer could come from SWE or MLE, and we wanted them to rapidly expand our capabilities for greenfield applications. That should sound very normal, very expected. Unfortunately, while we respect ML researchers, we are explicitly not looking for one. This was in the job posting. For those of you putting up job postings, I highly recommend that if this is not what you're looking for, you tell them.
It is a waste of both your time and theirs for them to apply hoping to go to ICLR based on the work they're going to do at your company. Once again, this is not for lack of a deep interest in their work. It's just not the stage we're at.
I would love to hear from you if you have experience getting ML or AI capabilities into production and serving real users. If you have a lot of that enthusiasm for applications of AI to business problems. And if you have a core understanding of the architectural things, maybe you've read one of the books on MLOps.
Maybe you've read previous discussions about MLOps. Maybe you've worked in some infra adjacent things as a back-end engineer. That all sounds wonderful. And here we start to get a little controversial. You should be comfortable working in both Python and TypeScript. It's okay if you are only strong in one, but you should be open to both.
Our application is built in TypeScript. I need people who are going to be willing to get into the details, to get into the nitty-gritty there. I don't need you to come in as an expert on React, but I do need you to be able to interface with those people in a really productive way.
So I went back to my little GPT and I said, I'd really like to understand the when, the why, the who, and the how of all of this process. And so let's go through those things. First up, when? Well, it actually kind of depends. It depends where you are in your journey.
If you're early in your journey, you need SWE skills, you need data profiles, and you need product competency. If you're middle stage, you need more SWE and probably some more infra. You still need data profiles, but you also need some design. That last one is the number one thing that I see teams under-investing in.
If you're later stage, you definitely need infra. You need all of the above, but a bit scaled up. And now you need to start actually thinking about machine learning engineers; this is when you want to start. Someone's asking what I mean by data profiles. Yeah, totally. So: data scientists, data analysts, people who have a lot of experience looking at distributions of data and saying, wait a minute, that's strange.
Looking at user output and saying, hmm, this is actually quite different from what we were expecting. Looking at product analytics and saying, you know what, this retention is pretty poor. One thing I would ask everybody in this room to do for yourself right now: ask how good retention should be on an AI product.
If you don't know that about your product, and what the comparables are for other products, you're missing an opportunity. You're missing an opportunity to hire people who have been doing this for a long time. They will up-level your team. So in the later stages, now you need to start talking about MLEs.
Maybe because you need to fine-tune a model. Maybe because you need to fine-tune an embedding; that you could start early. However, it is extremely dangerous to accidentally fall into the trap of the mythical man-month. This is true in all software, in all product development. It is somehow magnified in AI.
I see more people overdoing this in AI than I do in other domains. So if the AI demo only takes one week, then the AI product clearly takes only four weeks if you throw 20 engineers at it, right? I don't have 20 engineers, but even if I did, I certainly wouldn't be putting them all on the same AI product.
There is a very, very important reminder here: nine women cannot give birth to a baby in a single month. And because all AI products are early by definition, the mythical man-month problem applies to essentially all of them. You really, really need to be careful here. When I speak to my peers, this is the number one mistake that I'm hearing.
The hiring schedule should reflect the development schedule. You should start with an early product that you get in front of users. Then you should build evals. Then you should get user feedback. And then you should iterate. If you want more details about this, you can either consult the report we released, What We Learned from a Year of Building with LLMs, or you can come to our talk later today.
This development schedule is not just my opinion, and not just the six of our opinions. It's tried and tested, from speaking to a lot of other people. This order of operations is what works. And if this order of operations works, then your hiring had better be well aligned to it.
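Since "build evals" is the step people most often skip, here's a minimal sketch of what a first eval harness can look like. To be clear, this is illustrative, not our actual harness: `generate_answer` stands in for whatever your product calls, and the substring check is a deliberately crude grader you'd replace with real rubric or model-graded evals.

```python
# Minimal sketch of a first eval harness -- illustrative, not Hex's code.
from dataclasses import dataclass

@dataclass
class EvalCase:
    prompt: str
    must_contain: str  # crude stand-in for a real grading rubric

def generate_answer(prompt: str) -> str:
    # Placeholder: call your LLM-backed feature here.
    raise NotImplementedError

def run_evals(cases: list[EvalCase]) -> float:
    """Return the pass rate over a fixed set of cases."""
    passed = sum(
        case.must_contain.lower() in generate_answer(case.prompt).lower()
        for case in cases
    )
    return passed / len(cases)
```

The point is not the grader; the point is that something runs on every change, so user feedback and iteration have a baseline to move.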
Data needs to come much earlier than in traditional product engineering efforts. You asked the question about what data profiles look like. One thing that is very true about AI, partly because of this development schedule but also because of the type of products we're trying to build, is that you really need to be looking at your data.
And all of us can look at data, but some people are literally professionals at looking at data. That intuition takes a long time to build, and it will level up your team. Now, "why" is unfortunately a harder question existentially, so we'll scope it down a teeny bit to just: why hire for these teams?
Why not just use your existing resources? The hiring theses for this initial team are going to look like the following. And the reason I wanted to give you these theses is that ultimately your leadership is probably asking you: what's your hiring thesis? We have this person on this other team.
Why can't we just pull them over? If you can pull them over and satisfy these theses, then you don't need to make a hire. If you can't, then you do. So for the full-stack engineer, the hiring thesis is that they're going to integrate your system with an LLM provider and build a minimum of infrastructure. Not very thrilling, but key.
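For concreteness, here's roughly what that first integration looks like at its simplest. This is a sketch assuming the official OpenAI Python client; the model name and system prompt are placeholders, not what we run at Hex.

```python
# Sketch of a minimal LLM provider integration, assuming the official
# OpenAI Python client. Model and prompts are placeholders.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def answer(question: str) -> str:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": question},
        ],
    )
    return response.choices[0].message.content
```

The interesting work is everything around this call: retries, logging, prompt management, getting outputs into the rest of the stack. That's the minimum infrastructure.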
The hiring thesis for the data scientist is evaluation, quality, and user data: continuously improving your AI product. I've already talked about this a couple of times in this talk; it is extremely important. Next, a product person. I'm not super specific here that this needs to be a product manager, a program manager, or a product developer.
But you need someone whose spike is product. And the reason is because they need to be talking to users. They need to be understanding what the jobs to be done are. If you personally don't know what the jobs to be done are for your application, that's a hole. That's a hole in your team.
A designer. A lot of times we think of designers as coming later in the process. But right now, none of us know what the shape should be. Think about early technology and how different it looks for users from what it eventually becomes after you've been doing it for five years.
All AI products right now are clownish. You want to see a great example of clownish? That's a great example of clownish. And how many applications are we asking people to pay for that don't look much better than this React app? By the way, Claude wrote this React. The reality is your AI application probably looks like shit.
I don't say that in a mean way. I just say it in the sense that there are a lot of opportunities. I actually think ChatGPT looks like shit. So what I challenge you to do is bring in a professional. And finally, when you need an MLE, it's because you need to push your capabilities beyond the commodity intelligence.
That delta is what the MLEs are going to bring. Okay. So, who? In my experience, the attributes that strongly co-vary with impact are these. First, data intuition. We've already spoken about it a little bit, but there's a big difference between "I made a semantic embedding of all of my documents" and "I made a semantic embedding of all of my documents, and when I looked at the mutual distances, they fall into a very ridge-like, jagged structure." The former: okay, you did it. The latter tells you your retrieval is going to suck.
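To make that concrete, here's a minimal sketch of the kind of check I mean, assuming you already have your document embeddings stacked in a numpy array. Nothing here is tied to any particular embedding model.

```python
# Sketch: eyeball the distribution of pairwise cosine distances
# between document embeddings. Assumes `embeddings` is an
# (n_docs, dim) numpy array computed with your embedding model.
import numpy as np

def inspect_mutual_distances(embeddings: np.ndarray) -> None:
    # Normalize rows so dot products become cosine similarities.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    cosine_dist = 1.0 - normed @ normed.T
    # Keep only the off-diagonal (mutual) distances.
    dists = cosine_dist[np.triu_indices_from(cosine_dist, k=1)]
    for q in (5, 25, 50, 75, 95):
        print(f"p{q:02d}: {np.percentile(dists, q):.3f}")
```

If almost all the mass piles up in a narrow band, nearest-neighbor retrieval has very little to separate relevant documents from the rest, and that's exactly the kind of thing a person with data intuition notices before you ship.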
Second, product-mindedness. We are still trying to figure out what the actual utility of most of this is. I am incredibly skeptical that we are already at the boundary of the value of these things.
If we believe that there's a lot more juice to squeeze, then we must also accept that we don't know what the right products are right now. I would hope that for most of us, the thing we're building right now, the thing we're laser-focused on, the thing we're telling our investors is the breakout thing, is something we look back on in two years and say: okay, that was very naive. I hope that's the case for all of us; I don't want to be building the same thing in two years, and I hope you don't either. Third, urgency. It's always the case for engineering teams that urgency has really high value.
But when everything changes under your feet every three months, it's even more true. A little ADHD can be useful too, speaking from personal experience. How? If you are giving LeetCode interviews for your AI engineering hiring, you are doing yourself a major disservice. I've done a lot of LeetCode interviews.
I'm personally very good at them; it's just a stupid thing about me, so I promise you this is not me coping. I cannot think of a LeetCode exercise I've ever done that gives signal on what is actually useful for building this shit. So stop it. Make data intuition part of your hiring loop, and so too product intuition.
My hiring loop includes a take-home. That take-home, ultimately, is a data-cleaning exercise. I've had candidates really surprised. They're like, okay, this seemed really easy. Did I, like, misunderstand the problem? And I'm like, no. You did a lovely job. Thank you. You didn't overcomplicate things. You extracted the meaning from the data.
You were able to look at the data and make some conclusions. That sure sounds a whole hell of a lot like what I need them to do on the job. Most of the people in this room would think my coding challenge is too easy. But I promise you, I get a whole hell of a lot of signal out of it.
Invest in your coding challenge. Invest in data intuition and product intuition. One thing that Hex does that I think is really amazing is we have a product design interview. Not my idea, but damn do I love it. The one that says stupid LeetCode? Hell yeah. Look for people who are paying attention, but not necessarily just riding the wave.
I understand it's very exciting. I understand that a lot of people are really enthused right now, and they really want to get involved. That's great. What I'm really looking for, though, is people that are going a little bit deeper. They're playing with other AI products, and they're forming opinions about what is good and what is bad.
I recently had the privilege of hiring an AI engineer who had written a blog post about AI design patterns. To a certain extent, just from that blog post alone, I could have predicted that she was going to get hired. That's not to say I'm hiring based on blog posts, but think about the amount of awareness codified in that single post: design patterns, design thinking, what AI should feel like.
That's a lot of attentiveness. I need that on my team. She also happens to be technically very competent. So, that's my main guidance for hiring these teams: not just the AI engineer profile itself, but more generally how to build these teams. But I was curious whether I could get my chatbot to give us any sort of alpha.
And so, we'll go ahead and ask this question live and see what it says. Oh, it has an opinion. I think it wants to speak to you directly. So, AI Leader GPT has some messages for you. Unfortunately, it's not pleased with me taking all of its good ideas and delivering them as if they're my own.
So, this is AI Leader GPT's key alpha. This is your bonus information for the day. It wants you to work with experts. How many of you have worked with experts before on ML and AI teams? And of those of you who have, how many worked with them in, like, a data-labeling, human-in-the-loop style? So, this is the key thing that can take what you are building and make it go much more smoothly. Work directly with the people who understand what you want the AI to do. If you're building a customer support bot and you don't have customer support people using it every day, you're insane.
I'm building a data science co-pilot. I am a data scientist. I talk to our data scientists every single week, without exception. We ask them to use every single thing. This is so important. They are the secret to success here. I don't care how smart you are as an engineering leader.
I don't care how smart you are as a machine learning engineer. The only product you could possibly be building that doesn't require you to work with other experts is an AI bot for generating fucking ML and AI products, because then you are still the expert.
This is by far and away the most important thing that you should be thinking about beyond hiring. Thanks. Fantastic talk, thank you so much, I learned a ton. Could you maybe, at a high level, give an example of the data intuition and product design sorts of prompts that you're using?
Because I find the coding one a little more deterministic and easy; with those two I'm not really sure where to start, but it sounds like an excellent way to conduct the interview. Yeah, so this is specifically about getting data-intuition signal during the interview process. For me, it is actually part of the interview process: I'm giving them a large set of data, and I'm asking them to form some opinions about the data contained therein.
So, it's roughly a clustering problem, but the secret is there are no actually good clusters; there's no objective way to cluster that data. What I'm hoping they're going to do is pull out some sort of latent meaning from that data. I do this as a take-home because, one, we all know that people program substantially worse during an interview.
And also, how often is your manager staring over your shoulder while you're doing data analysis? Not that often. So I don't see a lot of value in that. I give them seven days to complete the take-home challenge: they get to look at the data, they get to write up a little report, and then I do a live interview with them where I give them feedback on their proposal.
So, this gives me a couple of things. One, I get to see how well they did, and I get to really talk to them about it. Two, if I misunderstood something about their approach, them talking through that notebook with me is a really good opportunity for me to say: oh, actually, I misunderstood what you did.
Three, I'm going to give them a lot of feedback. This has two important effects. One, they're going to learn what it's like to get feedback from me. Am I an asshole? I guess we'll find out. And they'll know by the end of the interview if getting feedback from me sucks.
That's really important for them when deciding whether to work for me. And then, on the flip side, I get to learn what they're like to interact with when I'm giving feedback. If they did a really great job, a lot of my feedback will be: this is really cool.
Where could we go next? One of your responsibilities as a leader is to always have feedback, period. And so it's your responsibility during the interview process to show them what that experience is going to feel like. That's a big part of the matching problem in hiring.
And then finally, this interview is an opportunity for us to talk about how they think through what the minimum deliverable is on a given task. We give them four hours of work over seven days. If they turn something in that's clearly 15 hours' worth of work, that's a red flag.
If they turn something in that's really well scoped for four hours, that's a green flag. And frankly, if they turn in something that's overly simplified, I have the opportunity to say: hey, I think this is maybe a little bit under what I was expecting. What was your logic?
And sometimes, and frankly I've hired one of these people, the answer is: I really didn't see much value in going any deeper until we reviewed this. Talk about a green flag. That's a fuck yeah, get in here. So that is why I think this style of interviewing is so important.
Getting that data-intuition signal out of it is the core goal, but this format allows me to tag on a lot of extra signal. As for the product design interview, that one's a little harder for me to get into, but basically we ask candidates to design a physical product, and they meet with our design team. It's incredible.
I did it myself, and it was my first time ever doing a product design interview. I was like: I want to work for this company. This is so heads-up and so clever; this is a great company. I didn't hear a single plug for security. Oh, yeah.
Where does that live in this story arc? Totally valid. I tend to think that security responsibility doesn't lie within the team building the AI capabilities. They should be security savvy, but ultimately most organizations should have security professionals who are able to help you make great decisions.
And my security friends are going to roast me for this, but I do believe a lot that strong software engineers should be constantly thinking about the adversarial nature of humans interacting with software, and should be raising risks where they see them.
But I tend to think a lot of security lies outside the team. You raised your eyebrow in reaction to my comment, so I'd like to ask: tell me why you think I'm wrong. I've heard of it; it's blowing up lately. So, where we've had the most success is bringing security people in to be part of our internal product teams at the beginning.
Because we face a federal, highly regulated market, we're going to get a billion questions about this, and we don't have to retrain everybody out in a separate security team. So that's why I raised my eyebrow. We've seen that pattern work: grab a smart and engaged security person to be part of that effort rather than trying to hold out for a veteran.
I love that. I think that's really insightful and really meaningful in sectors that are a little different from mine. I think that makes 100% sense. And even in my domain, where we have a lot of sensitive customer data, we sign BAAs with every provider that we work with.
I mean, bluntly, I take on a lot of that responsibility personally as the team lead. But I can absolutely see the value in what you're talking about; I think that is completely right. I would suggest that it's part of the product team, though. You guys have security as a product.
Yeah. Hi. You're talking about creating new teams and hiring for them, but I see another very important aspect: upskilling or reskilling existing teams. What would be your advice there? Because you're not always creating new teams.
Often you're dealing with existing teams and software engineers, and most of them can be reluctant; you know, the kind of routine software engineer who's been there for years and years. What would be your two cents about reskilling and upskilling, and making all of this work with an existing team?
Yeah, I think this is really important. So I'm going to repeat the question back to make sure I totally understand. Your point is: I've focused a lot on zero-to-one teams, but what about when you want to take an existing product team, add AI capabilities, and make sure they're set up for success?
Is that correct? Perfect. Cool. So, a really good point and really important. I believe that any company building AI capabilities should have at least one team responsible for building the infrastructure to make that easy. I know Netflix diverges a lot in its thinking from what I'm about to say.
So I'll caveat with that. But what I do want to say is that in most companies, every individual product team should not be separately setting up a relationship with OpenAI. They should not each be figuring out how to build prompt infrastructure in your software.
They should not each be working out what the evaluation system is going to look like. I really think all of that should be coalesced at any given company: there should be one team responsible for it. And then, what I believe works very well is to have the other teams, the product teams you're talking about, say: we're going to treat you like any other infra team.
We're going to ask certain things of you. And then, very similar to the profile we talked about, you have one person on that product team who's really responsible for interfacing with the platform team. I worked at Stitch Fix for a long time, where the data scientists had access to a data platform team.
And the data platform team's charter was: do whatever you possibly can so that data scientists can move as fast as possible. I really believe in that model, and I've personally seen it, as a consumer, be incredibly powerful. We were trying to build new models and deploy them, and this is not so different.
And so I really believe that having a centralized AI platform team at your company has so much leverage. That's a little bit of my opinion here. The one caveat, again: I know Netflix disagrees with this position, and every individual product team at Netflix has the open opportunity.
If they just want to go do it from scratch: go ahead, have fun. But there's a reason why that makes sense for Netflix and not for most of us. We're at time, but this is going to be our last question. Let's go. Hey, Bryan. Thanks for the great talk.
Thanks. What I was hoping to get your opinion on is hiring, or maybe reskilling your team. You mentioned a lot of great attributes to look for, like data literacy, urgency, and general enthusiasm for generative AI products. So when you're hiring, what do you think are the attributes that are non-negotiable, and which do you think are trainable?
In the sense that, if you hire a software engineer who has a lot of enthusiasm and a lot of skills that could benefit you but doesn't really have the data literacy, do you think that's okay? Or if you hire a data scientist who might not have the urgency, but has the background?
I think for all three of the attributes I mentioned, there needs to be some kernel there that I can develop; zero on any of them scares me. But there's one latent feature that I didn't mention, one that I've never personally succeeded in training, and it's a really big, powerful feature in my model.
And that's curiosity. Actually, I think that's a fantastic way to end. Give it up for Bryan. Thank you.