How To Hire AI Engineers (ft. James Brady and Adam Wiggins of Elicit)
Chapters
0:00 Intros
5:25 Defining the Hiring Process
8:42 Defensive AI Engineering as a chaotic medium
10:26 Tech Choices for Defensive AI Engineering
14:04 How do you Interview for Defensive AI Engineering
19:25 Does Model Shadowing Work?
22:29 Is it too early to standardize Tech stacks?
32:02 Capabilities: Offensive AI Engineering
37:24 AI Engineering Required Knowledge
40:13 ML First Mindset
45:13 AI Engineers and Creativity
47:51 Inside of Me There Are Two Wolves
49:58 Sourcing AI Engineers
58:45 Parting Thoughts
00:00:00.000 |
Okay, so welcome to the Latent Space Podcast. This is another remote episode that we're recording. 00:00:05.600 |
Actually, this is the first one that we're doing around a guest post. And I'm very honored to have 00:00:12.400 |
two of the authors of the post with me, James and Adam from Elicit. Welcome, James. Welcome, Adam. 00:00:19.680 |
Hey. Okay, so I think I will do this kind of in order. I think James, you're sort of the primary 00:00:27.600 |
author. So James, you are head of engineering at Elicit. You also were VP Eng at Teespring and 00:00:34.480 |
Spring as well. And you also, you know, you have a long history in sort of engineering. How did you, 00:00:40.480 |
you know, find your way into something like Elicit where, you know, you are basically a 00:00:46.960 |
traditional sort of VP Eng, VP technology type person moving into more of an AI role? 00:00:53.200 |
Yeah, that's right. It definitely was something of a sideways move, if not a left turn. So the 00:01:00.000 |
story there was I'd been doing, as you said, VP technology, CTO type stuff for around about 15 00:01:06.160 |
years or so. And noticed that there was this crazy explosion of capability and interesting 00:01:11.280 |
stuff happening within AI and ML and language models, that kind of thing. I guess this was 00:01:17.760 |
in 2019 or so, and decided that I needed to get involved. You know, this is a kind of generational 00:01:23.120 |
shift. Spent maybe a year or so trying to get up to speed on the state of the art, reading papers, 00:01:27.360 |
reading books, practicing things, that kind of stuff. Was going to found a startup actually 00:01:31.760 |
in the space of interpretability and transparency. And through that met Andreas, who has obviously 00:01:38.320 |
been on the podcast before, asked him to be an advisor for my startup. And he countered with, 00:01:44.240 |
"Maybe you'd like to come and run the engineering team at Elicit," which it turns out was a much 00:01:48.240 |
better idea. And yeah, I kind of quickly changed in that direction. So I think some of the stuff 00:01:52.800 |
that we're going to be talking about today is how actually a lot of the work when you're building 00:01:58.480 |
applications with AI and ML looks and smells and feels much more like conventional software 00:02:04.400 |
engineering with a few key differences, rather than really deep ML stuff. And I think that's 00:02:08.400 |
one of the reasons why I was able to transfer the skills over from one place to the other. 00:02:12.480 |
Yeah, I definitely agree with that. I do often say that I think AI engineering is about 90% 00:02:19.680 |
software engineering with the 10% of really strong, really differentiated AI engineering. 00:02:25.600 |
And obviously that number might change over time. I want to also welcome Adam onto my podcast, 00:02:32.880 |
because you welcomed me onto your podcast two years ago. And I'm really, really glad for that. 00:02:38.400 |
That was a fun episode. You famously founded Heroku. You just wrapped up a few years working 00:02:44.960 |
on Muse. And now you describe yourself as a journalist, internal journalist working on 00:02:49.920 |
Elicit. Yeah, well, I'm kind of a little bit in a wandering phase here and trying to 00:02:55.200 |
take this time in between ventures to see what's out there in the world. And 00:03:02.000 |
some of my wandering took me to the Elicit team and found that they were some of the folks who 00:03:07.600 |
were doing the most interesting, really deep work in terms of taking the capabilities of language 00:03:13.920 |
models and applying them to what I feel like are really important problems. So in this case, 00:03:18.080 |
science and literature search and that sort of thing. It fits into my general interest in tools 00:03:23.920 |
and productivity software. I think of it as a tool for thought in many ways, but a tool for science, 00:03:28.400 |
obviously, if we can accelerate that discovery of new medicines and things like that, that's just 00:03:32.400 |
so powerful. But to me, it's kind of also an opportunity to learn at the feet of some real 00:03:37.840 |
masters in this space, people who have been working on it since before it was cool, if you 00:03:41.920 |
want to put it that way. So for me, the last couple of months have been this crash course. 00:03:45.680 |
And why I sometimes describe myself as an internal journalist is I'm helping to write some posts, 00:03:51.440 |
including supporting James in this article here we're doing for Latent Space, where I'm just 00:03:56.800 |
bringing my writing skill and that sort of thing to bear on their very deep domain expertise around 00:04:03.440 |
language models and applying them to the real world and kind of surface that in a way that's 00:04:08.480 |
accessible, legible, that sort of thing. And so the great benefit to me is I get to learn this stuff 00:04:15.920 |
in a way that I don't think I would or I haven't, just kind of tinkering with my own side projects. 00:04:22.480 |
Yeah, totally. I forgot to mention that you also run Ink and Switch, which is 00:04:26.640 |
one of the leading research labs, in my mind, of the tools for thought productivity space, 00:04:33.120 |
whatever people mentioned there, or maybe future programming even, a little bit of that as well. 00:04:38.800 |
I think you guys definitely started the local first wave. I think there was just the first 00:04:43.040 |
conference that you guys held. I don't know if you were personally involved. 00:04:45.840 |
Yeah, I was one of the co-organizers, along with a few other folks, 00:04:50.160 |
called Local First Conf here in Berlin. Huge success from my point of view. Local First, 00:04:54.240 |
obviously, a whole other topic we can talk about on another day. I think there actually is a lot 00:04:58.880 |
more, what would you call it, handshake emoji between language models and the local first 00:05:06.240 |
data model. And that was part of the topic of the conference here. But yeah, topic for another day. 00:05:12.560 |
Not necessarily. I mean, if I can grab your thoughts at the end on local first and AI, 00:05:19.200 |
we can talk about that. I featured, I selected as one of my keynotes, Justine Tunney, 00:05:24.560 |
from Llamafile, working at Llamafile in Mozilla, because I think there's a lot of people interested 00:05:30.080 |
in that stuff. But we can focus on the headline topic, just to not bury the lead, which is we're 00:05:37.120 |
talking about how to hire AI engineers. This is something that I've been looking for a credible 00:05:42.320 |
source on for months. People keep asking me for my opinions. I don't feel qualified to give an 00:05:47.920 |
opinion, given that I only have so much engineering experience. And it's not like I've defined a 00:05:55.440 |
hiring process that I'm super happy with, even though I've worked with a number of AI engineers. 00:05:58.960 |
I'll just leave it open to you, James. How was your process of defining your hiring roles? 00:06:05.440 |
Yeah. So I think the first thing to say is that we've effectively been hiring for this kind of a 00:06:11.280 |
role since before you coined the term and tried to kind of build this understanding of what it was, 00:06:18.160 |
which is not a bad thing. It was a concept that was coming to the fore and effectively needed a 00:06:24.240 |
name, which is what you did. So the reason I mentioned that is I think it was something that we 00:06:30.480 |
kind of backed into, if you will. We didn't sit down and come up with a brand new role from 00:06:36.240 |
scratch. This is a completely novel set of responsibilities and skills that this person 00:06:41.120 |
would need. However, it is a kind of particular blend of different skills and attitudes and 00:06:49.600 |
curiosities, interests, which I think makes sense to kind of bundle together. So in the post, the 00:06:55.920 |
three things that we say are most important for a highly effective AI engineer are, first of all, 00:07:00.880 |
conventional software engineering skills, which is kind of a given, but definitely worth mentioning. 00:07:06.560 |
The second thing is a curiosity and enthusiasm for machine learning and maybe in particular 00:07:12.240 |
language models. That's certainly true in our case. And then the third thing is to do with 00:07:16.800 |
basically a fault-first mindset, being able to build systems that can handle things going wrong 00:07:24.160 |
in some sense. And yeah, I think the kind of middle point, the curiosity about ML and language 00:07:31.200 |
models is probably fairly self-evident. They're going to be working with and prompting and dealing 00:07:36.800 |
with the responses from these models. So that's clearly relevant. The last point though, maybe 00:07:41.200 |
takes the most explaining to do with this fault-first mindset and the ability to build resilient 00:07:47.120 |
systems. The reason that is so important is because compared to normal APIs where normal, think of 00:07:54.560 |
something like a Stripe API or a search API or something like this, conventional search API, 00:08:01.520 |
the latency when you're working with language models is wild. Like you can get 10X variation. 00:08:08.320 |
I mean, I was looking at the stats before actually, before the podcast, we do often normally, in fact, 00:08:13.680 |
see a 10X variation in the P90 latency over the course of half an hour, an hour, when we're 00:08:20.400 |
prompting these models, which is way higher than if you're working with a more kind of conventional, 00:08:24.800 |
conventionally backed API. And the responses that you get, the actual content of the responses 00:08:30.240 |
are naturally unpredictable as well. They come back with different formats. Maybe you're expecting 00:08:36.000 |
JSON. It's not quite JSON. You have to handle this stuff. And also the semantics of the messages are 00:08:42.160 |
unpredictable too, which is a good thing. Like this is one of the things that you're looking for 00:08:46.000 |
from these language models, but it all adds up to needing to build a resilient, reliable, 00:08:52.080 |
solid feeling system on top of this fundamentally, well, certainly currently fundamentally shaky 00:08:59.600 |
foundation. The models do not behave in the way that you would like them to and the ability to 00:09:04.960 |
structure the code around them such that it does give the user this warm, reassuring, snappy, 00:09:10.640 |
solid feeling is really what we're driving for there. 00:09:14.800 |
Yeah, I think, sorry, go ahead. Go ahead. You can try, man. 00:09:19.120 |
What really struck me as we dug in on the content for this article was that third point there. The 00:09:38.320 |
language models as this kind of chaotic medium, this dragon, this wild horse you're riding and 00:09:45.040 |
trying to guide in the direction that is going to be useful and reliable to users. Because I think 00:09:49.200 |
so much of software engineering is about making things not only high performance and snappy, 00:09:54.560 |
but really just making it stable, reliable, predictable, which is literally the opposite 00:09:58.880 |
of what you get from the language models. And yet, yeah, the output is so useful. And indeed, 00:10:04.720 |
some of their creativity, if you want to call it that, which is precisely their value. And so you 00:10:11.760 |
need to work with this medium. And I guess the nuance or the thing that came out of Elicit's 00:10:11.760 |
experience that I thought was so interesting is quite a lot of working with that is things that 00:10:21.520 |
come from distributed systems engineering. But you have really the AI engineers kind of as 00:10:27.920 |
sort of as we're defining them or labeling them on the Elicit team is people who are really 00:10:32.240 |
application developers. You're building things for end users. You're thinking about, okay, 00:10:35.520 |
I need to populate this interface with some response to user input that's useful to the 00:10:40.320 |
tasks they're trying to do. But you have this thing, this medium that you're working with, 00:10:45.040 |
that in some ways you need to apply some of this chaos engineering, distributed systems 00:10:50.320 |
engineering, which typically those people with those engineering skills are not kind of the 00:10:54.960 |
application level developers with the product mindset or whatever. They're more deep in the guts 00:10:58.640 |
of a system. And so those skills and knowledge do exist throughout the engineering discipline, 00:11:06.240 |
but sort of putting them together into one person, that feels like sort of a unique thing. And 00:11:11.760 |
working with the folks on the Elicit team who have those skills, I'm quite struck by that unique blend. 00:11:17.280 |
I haven't really seen that before in my 30-year career in technology. 00:11:22.640 |
Yeah, that's fascinating. I like the reference to chaos engineering. I have some appreciation. I 00:11:29.680 |
think when you had me on your podcast, I was still working at Temporal, and that was like a nice 00:11:34.720 |
framework. If you live within Temporal's boundaries, you can pretend that all those 00:11:39.760 |
faults don't exist, and you can code in a sort of very fault-tolerant way. 00:11:44.960 |
What is you guys' solutions around this, actually? I think you're emphasizing having the mindset, 00:11:52.480 |
but maybe naming some technologies would help. Not saying that you have to adopt these technologies, 00:11:59.760 |
but they're just quick vectors into what you're talking about when you're talking about distributed 00:12:04.880 |
systems. That's such a big, chunky word. Are we talking Kubernetes? I suspect we're not. 00:12:11.200 |
We're talking something else now. Yeah, that's right. It's more the 00:12:15.680 |
application level rather than at the infrastructure level, at least the way that it works for us. 00:12:21.280 |
So there's nothing kind of radically novel here. It is more a careful application of existing 00:12:28.800 |
concepts. So the kinds of tools that we reach for to handle these kind of slightly chaotic 00:12:33.600 |
objects that Adam was just talking about are retries, and fallbacks, and timeouts, and careful 00:12:39.440 |
error handling. Yeah, the standard stuff, really. There's also a great degree of dependence. We rely 00:12:46.560 |
heavily on parallelization because these language models are not innately very snappy, and 00:12:52.640 |
there's just a lot of I/O going back and forth. All these things I'm talking about, when I was 00:12:58.720 |
in my earlier stages of a career, these are the things that are the difficult parts that 00:13:04.560 |
more senior software engineers will be better at. It is careful error handling, and concurrency, 00:13:09.600 |
and fallbacks, and distributed systems, and eventual consistency, and all this kind of stuff. 00:13:15.840 |
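To make that concrete, here is a minimal sketch of the kind of wrapper James is describing: a latency-bounded call with retries and a fallback to a second provider. This is not Elicit's code; call_primary_model and call_fallback_model are hypothetical stand-ins for whichever provider SDK you actually use, and the fallback takes its own prompt because, as discussed later in the episode, a second provider usually needs one.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical provider calls -- stand-ins for whichever SDK you actually use.
def call_primary_model(prompt: str) -> str:
    raise NotImplementedError

def call_fallback_model(prompt: str) -> str:
    raise NotImplementedError

def call_with_timeout(fn, prompt: str, timeout_s: float) -> str:
    # Bound each attempt so wild tail latencies can't stall the request forever.
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(fn, prompt).result(timeout=timeout_s)
    finally:
        # Don't block on a hung call; abandon the worker thread rather than join it.
        pool.shutdown(wait=False, cancel_futures=True)

def resilient_completion(prompt: str, fallback_prompt: str,
                         attempts: int = 3, timeout_s: float = 30.0) -> str:
    for attempt in range(attempts):
        try:
            return call_with_timeout(call_primary_model, prompt, timeout_s)
        except Exception:
            # Broad catch for the sketch; back off with jitter before retrying.
            time.sleep(2 ** attempt + random.random())
    # Last resort: a second provider, with a prompt tuned for that model.
    return call_with_timeout(call_fallback_model, fallback_prompt, timeout_s)
```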
And as Adam was saying, the kind of person that is deep in the guts of some kind of distributed 00:13:20.880 |
systems, a really high-scale back-end kind of a problem, would probably naturally have these kinds 00:13:25.360 |
of skills. But you'll find them on day one if you're building an ML-powered app, even if it's 00:13:31.280 |
not got massive scale. I think one thing that I would mention that we do do-- yeah, maybe two 00:13:39.200 |
related things, actually. The first is we're big fans of strong typing. We share the types all the 00:13:45.280 |
way from the back-end Python code all the way to the front-end in TypeScript, and find that is-- 00:13:51.280 |
I mean, we're probably doing this anyway, but it really helps one reason around the shapes of the 00:13:55.360 |
data, which are going to be going back and forth, and that's really important when you can't rely 00:13:59.360 |
upon-- you're going to have to coerce the data that you get back from the ML if you want it 00:14:05.040 |
to be structured, basically speaking. And the second thing which is related is we use checked 00:14:10.800 |
exceptions inside our Python code base, which means that we can use the type system to make sure 00:14:15.760 |
we are handling, properly handling, all of the various things that could be going wrong, all the 00:14:20.720 |
different exceptions that could be getting raised. Checked exceptions are not really particularly 00:14:25.200 |
popular, actually. There's not many people that are big fans of them. For our particular use case, 00:14:30.320 |
to really make sure that we've not just forgotten to handle this particular type of error, we have 00:14:36.000 |
found them useful to force us to think about all the different edge cases that could come up. 00:14:40.800 |
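James doesn't spell out how the checked exceptions are implemented, and Python has no native support for them, so treat the following as one plausible approximation rather than Elicit's actual approach: failure modes are declared as types in the return value, and a type checker such as mypy or pyright can then insist that every case is handled.

```python
from dataclasses import dataclass
from typing import Union

try:
    from typing import assert_never  # Python 3.11+
except ImportError:
    from typing_extensions import assert_never

# The failure modes are spelled out as types instead of raised ad hoc.
@dataclass
class RateLimited:
    retry_after_s: float

@dataclass
class MalformedOutput:
    raw_text: str

@dataclass
class Completion:
    text: str

ModelResult = Union[Completion, RateLimited, MalformedOutput]

def summarize(abstract: str) -> ModelResult:
    ...  # call the model and return one of the declared cases

def render(result: ModelResult) -> str:
    # If a new failure case is added to ModelResult and not handled here,
    # the assert_never branch becomes a type error instead of a runtime surprise.
    if isinstance(result, Completion):
        return result.text
    if isinstance(result, RateLimited):
        return f"Rate limited, retry in {result.retry_after_s}s"
    if isinstance(result, MalformedOutput):
        return "The model returned something we couldn't parse"
    assert_never(result)
```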
That's fascinating. Just a quick note of technology. How do you share types from 00:14:46.720 |
Python to TypeScript? Do you use GraphQL? Do you use something else? 00:14:50.800 |
We don't use GraphQL. So we've got the types defined in Python, that's the source of truth, 00:14:56.560 |
and we go from the open API spec, and there's a tool that we can use to generate types dynamically, 00:15:04.160 |
like TypeScript types from those open API definitions. 00:15:07.280 |
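He doesn't name the specific tools, but a common version of that pipeline, assuming a FastAPI/Pydantic backend and the openapi-typescript generator on the front end, looks roughly like this; the endpoint and model names are invented for illustration.

```python
# Backend: Pydantic models are the source of truth; FastAPI publishes them
# in the generated OpenAPI spec at /openapi.json.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Citation(BaseModel):
    claim: str
    paper_id: str
    confidence: float

@app.get("/papers/{paper_id}/citations", response_model=list[Citation])
def get_citations(paper_id: str) -> list[Citation]:
    return []  # placeholder implementation

# Frontend build step (not Python): generate TypeScript types from the spec, e.g.
#   npx openapi-typescript http://localhost:8000/openapi.json -o src/api-types.ts
```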
Okay, cool. Sorry for diving into that rabbit hole a little bit. I always like to spell out 00:15:12.560 |
technologies for people to dig their teeth into. One thing I'll mention quickly is that a lot of 00:15:18.720 |
the stuff that you mentioned is typically not part of the normal interview loop. It's actually 00:15:22.240 |
really hard to interview for, because this is the stuff that you polish out as you go into production. 00:15:30.160 |
Coding interviews are typically about the happy path. How do we do that? How do we look for a 00:15:37.520 |
defensive, fault-first mindset? Because you can write defensive code all day long, 00:15:40.960 |
and not add functionality to your application. Yeah, it's a great question, and I think that's 00:15:47.120 |
exactly true. Normally, the interview is about the happy path, and then there's maybe a box 00:15:51.760 |
checking exercise at the end where the candidate says, "Of course, in reality, I would handle 00:15:55.440 |
the edge cases," or something like this. That, unfortunately, isn't quite good enough when the 00:16:01.680 |
happy path is very, very narrow, and there's lots of weirdness on either side. Basically speaking, 00:16:09.440 |
it's just a case of foregrounding those kind of concerns through the interview process. 00:16:14.240 |
There's no magic to it. We talk about this in the post that we're going to be putting up on 00:16:20.320 |
LatentSpace, but there's two main technical exercises that we do through our interview 00:16:26.480 |
process for this role. The first is more coding-focused, and the second is more system 00:16:30.240 |
design-y, whiteboarding a potential solution. Without giving too much away, in the coding 00:16:35.360 |
exercise, you do need to think about edge cases. You do need to think about errors. 00:16:43.440 |
How best to put this? Yeah, the exercise consists of adding features and fixing bugs inside the 00:16:51.280 |
code base. In both of those two cases, it does demand, because of the way that we set the 00:16:56.560 |
application up and the interview up, it does demand that you think about something other than 00:17:00.800 |
the happy path. But your thinking is the right prompt of how do we get the candidate thinking 00:17:05.920 |
outside of the normal sweet spot, smoothly paved path. In terms of the system design interview, 00:17:14.160 |
that's a little easier to prompt this fault-first mindset, because it's very easy in that situation 00:17:20.640 |
just to say, let's imagine that this node dies. How does the app still work? Let's imagine that 00:17:26.400 |
this network is going super slow. Let's imagine that, I don't know, you run out of capacity in 00:17:32.880 |
this database that you've sketched out here. How do you handle that sort of stuff? So, in both cases, 00:17:39.200 |
they're not firmly anchored to and built specifically around language models and 00:17:44.640 |
ways language models can go wrong, but we do exercise the same muscles of thinking defensively 00:17:50.560 |
and foregrounding the edge cases, basically. Yeah, any comment there? 00:17:54.960 |
Yeah, I guess I wanted to mention too, James, earlier there, you mentioned retries, 00:18:01.600 |
and this is something that I think I've seen some interesting debates internally about things 00:18:06.160 |
regarding, first of all, retries can be costly, right? In general, this medium, in addition to 00:18:11.840 |
having this incredibly high variance in response time and being non-deterministic, 00:18:17.040 |
is actually quite expensive. And so, in many cases, doing a retry when you get a fail 00:18:21.280 |
does make sense, but actually that has an impact on cost. And so, there is some sense to which, 00:18:26.720 |
at least I've seen the AI engineers on our team worry about that. They worry about, okay, how do 00:18:32.160 |
we give the best user experience, but balance that against what the infrastructure is going to cost 00:18:37.920 |
our company, which I think is, again, an interesting mix of, yeah, again, it's a little bit the 00:18:43.360 |
distributed system mindset, but it's also a product perspective and you're thinking about 00:18:47.760 |
the end user experience, but also the bottom line for the business. You're bringing together a lot 00:18:52.400 |
of qualities there. And there's also the fallback case, which is kind of a related or adjacent one. 00:18:58.000 |
I think there was also a discussion on that internally where, I think it maybe was search, 00:19:01.840 |
there was something recently where one of the frontline search providers was having some, 00:19:06.720 |
yeah, slowness and outages, and essentially then we had a fallback, but essentially that gave people 00:19:11.920 |
for a while, especially new users that come in that don't know the difference, they're getting 00:19:16.320 |
worse results for their search. And so, then you have this debate about, okay, there's sort of what 00:19:21.920 |
is correct to do from an engineering perspective, but then there's also what actually is the best 00:19:28.320 |
result for the user. Is giving them a kind of a worse answer to their search result 00:19:32.160 |
better, or is it better to kind of give them an error and be like, yeah, sorry, it's not working 00:19:36.480 |
right at the moment, try later. Both are obviously non-optimal, but this is the kind of thing I think 00:19:42.640 |
that you run into or the kind of thing we need to grapple with a lot more than you would other kinds 00:19:48.400 |
of mediums. Yeah, that's a really good example. I think it brings to the fore the two different 00:19:55.920 |
things that you could be optimizing for of uptime and response at all costs on one end of the 00:20:01.520 |
spectrum, and then effectively fragility, but kind of, if you get a response, it's the best response 00:20:06.880 |
we can come up with at the other end of the spectrum. And where you want to land there kind 00:20:10.160 |
of depends on, well, it certainly depends on the app, obviously depends on the user. I think it 00:20:13.760 |
depends on the feature within the app as well. So in the search case that you mentioned there, 00:20:20.320 |
in retrospect, we probably didn't want to have the fallback. And we've actually just recently 00:20:24.480 |
on Monday changed that to show an error message rather than giving people a kind of degraded 00:20:30.400 |
experience. In other situations, we could use, for example, a large language model from provider B 00:20:38.880 |
rather than provider A, and get something which is within a few percentage points performance. 00:20:45.120 |
And that's just a really different situation. Yeah, like any interesting question, the answer is 00:20:50.000 |
it depends. I do hear a lot of people suggesting, let's call this model shadowing as a defensive 00:21:00.560 |
technique, which is if OpenAI happens to be down, which happens more often than people think, 00:21:06.400 |
then you fall back to Anthropic or something. How realistic is that? Don't you have to develop 00:21:11.840 |
completely different prompts for different models, and won't the performance of your 00:21:16.800 |
application suffer for whatever reason? It maybe calls differently, or it's not maintained in the 00:21:22.080 |
same way. I think that people raise this idea of fallbacks to models, but I don't see it practiced 00:21:32.880 |
very much. Yeah, it is. You definitely need to have a different prompt if you want to stay within 00:21:39.840 |
a few percentage points degradation, like I said before. And that certainly comes at a cost of 00:21:47.120 |
fallbacks and backups and things like this. It's really easy for them to go stale and kind of flake 00:21:53.280 |
out on you because they're off the beaten track. And in our particular case inside of Elicit, 00:22:02.080 |
we do have fallbacks for a number of crucial functions where it's going to be very obvious 00:22:08.880 |
if something has gone wrong, but we don't have fallbacks in all cases. It really depends on a 00:22:15.920 |
task-to-task basis throughout the app, so I can't give you a single simple rule of thumb for, 00:22:21.440 |
in this case, do this, and in the other, do that. But yeah, it's a little bit easier now that 00:22:28.480 |
the APIs between the Anthropic models and OpenAI are more similar than they used to be, 00:22:33.440 |
so we don't have two totally separate code paths with different protocols, like wire protocols, 00:22:38.240 |
so to speak, which makes things easier. But you're right, you do need to have different prompts if 00:22:42.480 |
you want to have similar performance across the providers. I'll also note, just observing again 00:22:47.040 |
as a relative newcomer here, I was surprised, impressed, I'm not sure what the word is for it, 00:22:52.640 |
at the blend of different backends that the team is using, and so there's many, 00:22:57.840 |
the product presents as kind of one single interface, but there's actually several dozen 00:23:04.240 |
kind of main paths. There's like, for example, the search versus a data extraction of a certain type 00:23:09.520 |
versus chat with papers versus, and each one of these, you know, the team has worked very hard 00:23:14.640 |
to pick the right model for the job and craft the prompt there, but also is constantly testing new 00:23:21.120 |
ones. So a new one comes out from either from the big providers, or in some cases, our own models 00:23:28.160 |
that are, you know, running on essentially our own infrastructure, and sometimes that's more about 00:23:33.360 |
cost or performance, but the point is kind of switching very fluidly between them, and very 00:23:39.520 |
quickly, because this field is moving so fast, and there's new ones to choose from all the time, 00:23:43.920 |
is like part of the day-to-day, I would say, so it isn't more of a like, there's a main one, 00:23:48.720 |
it's been kind of the same for a year, there's a fallback, but it's got cobwebs on it, it's more 00:23:53.280 |
like which model and which prompt is changing weekly, and so I think it's quite reasonable to 00:24:00.320 |
have a fallback that you can expect might work. I'm curious, because you guys have had experience 00:24:07.840 |
working at both, you know, Elicit, which is a smaller operation, and larger companies, 00:24:12.240 |
a lot of companies are looking at this with a certain amount of trepidation as, you know, 00:24:17.600 |
it's very chaotic. When you have one engineering team that knows everyone else's names, and like, 00:24:25.040 |
you know, they meet constantly in Slack and know what's going on, it's easier to sync on 00:24:30.080 |
technology choices. When you have 100 teams, all shipping AI products, and all making their own 00:24:34.400 |
independent tech choices, it can be very hard to control. One solution I'm hearing from the 00:24:39.920 |
Salesforces of the world, and Walmarts of the world, is that they are creating their own AI 00:24:44.160 |
gateway, right? Internal AI gateway. This is the one model hub that controls all the things, 00:24:48.640 |
and has all standards. Is that a feasible thing? Is that something that you would want? Is that 00:24:54.080 |
something you have and you're working towards? What are your thoughts on this stuff? Like, 00:24:58.160 |
centralization of control, or like an AI platform internally? 00:25:02.000 |
Yeah, I think certainly for larger organizations, and organizations that are doing things which 00:25:10.320 |
maybe are running into HIPAA compliance, or other regulatory constraints like that, 00:25:18.160 |
it could make a lot of sense. Yeah. I think for the TLDR for something like Elicit is, 00:25:24.480 |
we are small enough, as you indicated, and need to have full control over all the levers available, 00:25:32.400 |
and switch between different models, and different prompts, and whatnot. As Adam was just saying, 00:25:36.720 |
that kind of thing wouldn't work for us. But yeah, I've spoken with and 00:25:40.560 |
advised a couple of companies that are trying to sell into that kind of a space, or at a larger 00:25:48.000 |
stage, and it does seem to make a lot of sense for them. So, for example, if you're trying to sell 00:25:53.600 |
to a large enterprise, and they cannot have any data leaving the EU, then you need to be really 00:26:00.080 |
careful about someone just accidentally putting in the sort of US-East-1 GPT-4 endpoints, or something 00:26:08.080 |
like this. If you're... Do you want to think of a more specific example there? Yeah. I think the... 00:26:15.920 |
I'd be interested in understanding better what the specific problem is that they're looking 00:26:22.960 |
to solve with that, whether it is to do with data security, or centralization of billing, 00:26:29.360 |
or if they have a kind of suite of prompts, or something like this, that people can choose from, 00:26:34.160 |
so they don't need to reinvent the wheel again and again. I wouldn't be able to say without 00:26:39.040 |
understanding the problems and their proposed solutions, you know, which kind of situations 00:26:42.560 |
that'd be better or worse fit for. But yeah, for Elicit, where really the secret sauce, 00:26:50.080 |
if there is a secret sauce, is which models we're using, how we're using them, how we're combining 00:26:54.000 |
them, how we're thinking about the user problem, how we're thinking about all these pieces coming 00:26:57.440 |
together. You really need to have all of the affordances available to you to be able to 00:27:03.200 |
experiment with things and iterate rapidly. And generally speaking, whenever you put these kind 00:27:08.800 |
of layers of abstraction, and control, and generalization in there, that gets in the way. 00:27:13.840 |
So for us, it would not work. Do you feel like there's always a tendency to want to reach for 00:27:19.520 |
standardization and abstractions pretty early in a new technology cycle? There's something 00:27:24.720 |
comforting there, or you feel like you can see them, or whatever. I feel like there's some of 00:27:28.000 |
that discussion around LangChain right now. But yeah, this is not only so early, but also moving 00:28:33.760 |
so fast. I think it's tough to ask for that. That's not the space we're in. But yeah, the 00:27:42.160 |
larger an organization, the more that's your default is to want to reach for that. It's a 00:27:47.360 |
sort of comfort. Yeah, that's interesting. I find it interesting that you would say that 00:27:53.200 |
being a founder of Heroku, where you were one of the first platforms as a service that 00:27:58.960 |
more or less standardized what that early development experience should have looked like. 00:28:03.840 |
And I think basically people are feeling the differences between calling various model lab 00:28:08.800 |
APIs and having an actual AI platform where all their development needs are thought of for them. 00:28:19.200 |
I define this in my AI engineer post as well. The model labs just see their job ending at 00:28:24.720 |
serving models, and that's about it. But actually, the responsibility of the AI engineer has to fill 00:28:29.440 |
in a lot of the gaps beyond that. Yeah, that's true. I think a huge part of the exercise with 00:28:36.960 |
Heroku, which was largely inspired by Rails, which itself was one of the first frameworks 00:28:42.240 |
to standardize the CRUD app with the SQL database, and people have been building apps like that for 00:28:47.600 |
many, many years. I had built many apps. I had made my own kind of templates based on that. I 00:28:51.600 |
think others had done it. And Rails came along at the right moment, where we had been doing it long 00:28:56.240 |
enough that you see the patterns, and then you can say, look, let's extract those into a framework 00:29:01.840 |
that's going to make it not only easier to build for the experts, but for people who are relatively 00:29:06.400 |
new, the best practices are encoded into that framework, in model-view-controller, to take one 00:29:13.120 |
example. But then, yeah, once you see that, and once you experience the power of a framework, 00:29:17.840 |
and again, it's so comforting, and you develop faster, and it's easier to onboard new people 00:29:23.840 |
to it because you have these standards and this consistency, then folks want that for something 00:29:30.560 |
new that's evolving. Now, here I'm thinking maybe if you fast forward a little to, for example, 00:29:34.080 |
when React came on the scene a decade ago or whatever, and then, okay, we need to do state 00:29:39.120 |
management, what's that? And then there's a new library every six months. Okay, this is the one, 00:29:43.680 |
this is the gold standard. And then six months later, that's deprecated. Because, of course, 00:29:48.560 |
it's evolving. You need to figure it out. The tacit knowledge and the experience of putting 00:29:53.040 |
it in practice and seeing what those real needs are, are critical. And so it is really about 00:29:59.360 |
finding the right time to say, yes, we can generalize, we can make standards and abstractions, 00:30:06.320 |
whether it's for a company, whether it's for an open source library, for a whole class of apps, 00:30:11.920 |
and it's much more of a judgment call, or just a sense of taste or experience, to be 00:30:21.280 |
able to say, yeah, we're at the right point, we can standardize this. But it's at least my very, 00:30:27.680 |
again, and I'm so new to that, this world compared to you both, but my sense is, yeah, still the 00:30:33.360 |
Wild West, that's what makes it so exciting and feels kind of too early for too much in the way of 00:30:40.400 |
standardized abstractions. Not that it's not interesting to try, but you can't necessarily 00:30:45.440 |
get there in the same way Rails did until you've got that decade of experience of whatever building 00:30:50.000 |
different classes of apps in that, with that technology. Yeah, it's interesting to think 00:30:56.720 |
about what is going to stay more static and what is expected to change over the coming five years, 00:31:02.640 |
let's say, which seems like, when I think about it through an ML lens, is an incredibly long time. 00:31:07.360 |
And if you just said five years, it doesn't seem that long. I think that kind of talks to part of 00:31:11.520 |
the problem here is that things that are moving are moving incredibly quickly. I would expect, 00:31:16.480 |
this is my hot take rather than some kind of official carefully thought out position, but 00:31:20.320 |
my hot take would be something like, you'll be able to get to good quality apps without doing 00:31:29.360 |
really careful prompt engineering. I don't think that prompt engineering is going to be 00:31:33.200 |
a kind of durable differential skill that people will hold. I do think that the way that you set 00:31:41.120 |
up the ML problem to kind of ask the right questions, if you see what I mean, rather than 00:31:45.440 |
the specific phrasing of exactly how you're doing chain of thought or few shot or something in the 00:31:50.880 |
prompt, I think the way that you set it up is probably going to remain to be trickier for 00:31:57.280 |
longer. And I think some of the operational challenges that we've been talking about of 00:32:00.960 |
wild variations in latency and handling the... I mean, one way to think about these models is 00:32:09.120 |
the first lesson that you learn when you're an engineer, software engineer, is that you need to 00:32:13.040 |
sanitize user input, right? I think it was the top OWASP security threat for a while. You have to 00:32:18.800 |
sanitize and validate user input. And we got used to that. And it kind of feels like this is the 00:32:24.800 |
shell around the app and then everything else inside you're kind of in control of, and you can 00:32:29.200 |
grasp and you can debug, et cetera. And what we've effectively done is through some kind of weird 00:32:35.280 |
rear guard action, we now got these slightly chaotic things. I think of them more as complex 00:32:40.080 |
adaptive systems, which are related, but a bit different, definitely have some of the same 00:32:44.080 |
dynamics. We've injected these into the foundations of the app. And you kind of now need to think 00:32:51.120 |
with this defensive mindset downwards as well as upwards, if you see what I mean. 00:32:56.400 |
So I think it will take a while for us to truly wrap our heads around that. Also, these kinds of 00:33:04.160 |
problems, you have to handle things being unreliable and slow sometimes and whatever else, 00:33:09.920 |
even if it doesn't happen very often, there isn't some kind of industry-wide accepted way of 00:33:15.360 |
handling that at massive scale. There are definitely patterns and anti-patterns and 00:33:20.560 |
tools and whatnot, but it's not like this is a solved problem. So I would expect that 00:33:25.040 |
it's not going to go down easily as a solvable problem at the ML scale either. 00:33:30.240 |
Yeah, excellent. I would describe in the terminology of the stuff that I've written 00:33:37.120 |
in the past, I described this inversion of architecture as sort of LLM at the core versus 00:33:41.760 |
LLM or code at the core. We're very used to code at the core. Actually, we can scale that very well. 00:33:47.520 |
When we build LLM core apps, we have to realize that the central part of our app that's 00:33:52.320 |
orchestrating things is actually prone to prompt injections and non-determinism and all that good 00:33:59.280 |
stuff. I did want to move the conversation a little bit from the sort of defensive side of 00:34:04.240 |
things to the more offensive or the fun side of things, capabilities side of things, because that 00:34:10.000 |
is the other part of the job description that we kind of skimmed over. So I'll repeat what you said 00:34:15.680 |
earlier. You want people to have a genuine curiosity and enthusiasm for the capabilities 00:34:20.000 |
of language models. We're recording this the day after Anthropic just dropped Claude 3.5. 00:34:26.720 |
I was wondering, maybe this is a good exercise, is how do people have curiosity and enthusiasm 00:34:33.440 |
for the capabilities of language models when, for example, the research paper for Claude 3.5 is four 00:34:38.320 |
pages? There's not much. Yeah. Well, maybe that's not a bad thing, actually, in this particular 00:34:48.960 |
case. So yeah, if you really want to know exactly how the sausage was made, that hasn't been possible 00:34:54.320 |
for a few years now, in fact, for these new models. But from our perspective, when we're 00:35:00.560 |
building Elicit, what we primarily care about is what can these models do? How do they perform 00:35:05.360 |
on the tasks that we already have set up and the evaluations we have in mind? And then on a 00:35:09.760 |
slightly more expansive note, what kinds of new capabilities do they seem to have? Can we elicit, 00:35:17.600 |
no pun intended, from the models? For example, well, there's very obvious ones like multimodality. 00:35:23.760 |
There wasn't that, and then there was that. Or it could be something a bit more subtle, 00:35:28.800 |
like it seems to be getting better at reasoning, or it seems to be getting better at metacognition, 00:35:34.560 |
or it seems to be getting better at marking its own work and giving calibrated confidence 00:35:39.920 |
estimates, things like this. Yeah, there's plenty to be excited about there. It's just that, 00:35:45.600 |
yeah, there's rightly or wrongly been this shift over the last few years to not give all the 00:35:52.720 |
details. No, but from application development perspective, every time there's a new model 00:35:57.360 |
released, there's a flow of activity in our Slack, and we try to figure out what it can do, 00:36:00.560 |
what it can't do, run our evaluation frameworks. And yeah, it's always an exciting, happy day. 00:36:05.680 |
Yeah, from my perspective, what I'm seeing from the folks on the team is, first of all, just 00:36:13.200 |
awareness of the new stuff that's coming out. So that's an enthusiasm for the space and following 00:36:20.080 |
along. And then being able to very quickly, partially that's having Slack to do this, 00:36:24.880 |
but be able to quickly map that to, okay, what does this do for our specific case? 00:36:30.880 |
And the simple version of that is let's run the evaluation framework, which Elicit has quite a 00:36:38.160 |
comprehensive one. I'm actually working on an article on that right now, which I'm very excited 00:36:42.880 |
about, because it's a very interesting world of things. But basically you can just try the new 00:36:49.440 |
model in the evaluations framework, run it. It has a whole slew of benchmarks, which includes not just 00:36:55.440 |
accuracy and confidence, but also things like performance, cost and so on. And all of these 00:37:00.400 |
things may trade off against each other. Maybe it's actually, it's very slightly worse, but it's 00:37:05.840 |
way faster and way cheaper. So actually this might be a net win, for example, or it's way more 00:37:12.880 |
accurate, but it's slower and comes at a higher cost. And so now you need to think about those 00:37:18.560 |
trade-offs. And so to me, coming back to the qualities of an AI engineer, especially when 00:37:23.200 |
you're trying to hire for them, it is very much an application developer in the sense of a product 00:37:29.280 |
mindset of what are our users or our customers trying to do? What problem do they need solved? 00:37:35.360 |
Or what does our product solve for them? And how does the capabilities of a particular model 00:37:41.120 |
potentially solve that better for them than what exists today? And by the way, what exists today 00:37:46.880 |
is becoming an increasingly gigantic cornucopia of things, right? And so you say, okay, this new 00:37:52.800 |
model has these capabilities, therefore the simple version of that is plug it into our existing 00:37:57.600 |
evaluations and just look at that and see if it seems like it's better for a straight out swap out. 00:38:02.720 |
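As a rough illustration of that "run it through the evaluations and weigh the trade-offs" loop (Elicit's actual framework isn't shown here), a minimal harness might score a candidate model on accuracy, latency, and cost over the same example set used for the incumbent:

```python
import time
from dataclasses import dataclass
from statistics import mean
from typing import Callable

@dataclass
class EvalExample:
    prompt: str
    expected: str

@dataclass
class EvalReport:
    accuracy: float
    mean_latency_s: float
    total_cost_usd: float

def evaluate(call_model: Callable[[str], str],
             cost_per_call_usd: float,
             examples: list[EvalExample]) -> EvalReport:
    correct, latencies = 0, []
    for ex in examples:
        start = time.monotonic()
        answer = call_model(ex.prompt)
        latencies.append(time.monotonic() - start)
        # Toy scoring rule; a real harness would use task-specific graders.
        correct += int(ex.expected.lower() in answer.lower())
    return EvalReport(
        accuracy=correct / len(examples),
        mean_latency_s=mean(latencies),
        total_cost_usd=cost_per_call_usd * len(examples),
    )

# Usage: run the same example set against the incumbent and the new model,
# then decide whether slightly lower accuracy is worth much lower latency/cost.
```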
But when you talk about, for example, you have multimodal capability and then you say, okay, 00:38:07.120 |
wait a minute, actually maybe there's a new feature or a whole new way we could be using it, 00:38:11.760 |
not just a simple model swap out, but actually a different thing we could do that we couldn't do 00:38:16.080 |
before that would have been too slow or too inaccurate or something like that, that now 00:38:21.520 |
we do have the capability to do. So I think of that as being a kind of core skill. I don't even 00:38:27.040 |
know if I want to call it a skill. Maybe it's even like an attitude or a perspective, which is a 00:38:31.360 |
desire to both be excited about the new technology, the new models and things as they come along, 00:38:36.400 |
but also holding in the mind, what does our product do? Who is our user? And how can we 00:38:43.200 |
connect the capabilities of this technology to how we're helping people in whatever it is our 00:38:48.640 |
product does? Yeah. I'm just looking at one of our internal Slack channels where we talk about 00:38:54.240 |
things like new model releases and that kind of thing. And it is notable looking through these, 00:38:59.920 |
the kind of things that people are excited about and not, I don't know, the context, the context 00:39:04.880 |
window is much larger or it's look at how many parameters it has or something like this. It's 00:39:09.760 |
always framed in terms of maybe this could be applied to that kind of part of Elicit, 00:39:13.280 |
or maybe this would open up this new possibility for Elicit. And as Adam was saying, yeah, I don't 00:39:18.080 |
think it's really a novel or separate skill. It's the kind of attitude I would like all 00:39:24.240 |
engineers to have at a company our stage actually, and maybe more generally even, which is not just 00:39:32.160 |
kind of getting nerd sniped by some kind of technology number, fancy metric or something, 00:39:38.000 |
but how is this actually going to be applicable to the thing which matters in the end? How is 00:39:42.720 |
this going to help users? How is this going to help move things forward strategically? That kind 00:39:46.000 |
of thing. Yeah, applying what you know, I think is the key here. Getting hands on as well. I would 00:39:53.120 |
recommend a few resources for people listening along. The first is Elicit's ML reading list, 00:39:58.800 |
which I found so delightful after talking with Andreas about it. It looks like that's part of 00:40:04.800 |
your onboarding. We've actually set up an asynchronous paper club inside of my Discord 00:40:09.120 |
for people following on that reading list. I love that you separate things out into tier one and 00:40:12.880 |
two and three, and that gives people a factored cognition way of looking into the corpus, right? 00:40:20.320 |
Yes, the corpus of things to know is growing and the water is slowly rising as far as what a bar 00:40:26.320 |
for a competent AI engineer is, but I think having some structured thought as to what are the big 00:40:32.320 |
ones that everyone must know, I think is key. It's something I haven't really defined for people, 00:40:38.000 |
and I'm glad that Elicit actually has something out there that people can refer to. 00:40:41.520 |
I wouldn't necessarily make it required for the job interview maybe, but it'd be interesting to 00:40:49.760 |
see what would be a red flag if some AI engineer would not know. I don't know where we would stoop 00:40:57.840 |
to call something required knowledge, or you're not part of the cool kids club, but there increasingly 00:41:04.640 |
is something like that, right? Not knowing what context is is a black mark in my opinion, right? 00:41:08.960 |
Yeah, I think it does connect back to what we were saying before of this genuine curiosity 00:41:15.200 |
about ML. Well, maybe it's actually that combined with something else which is really important, 00:41:19.440 |
which is a self-starting bias towards action kind of a mindset, which again- 00:41:24.160 |
Exactly, yeah. Everyone needs that, so if you put those two together, or if I'm truly curious about 00:41:30.160 |
this and I'm going to figure out how to make things happen, then you end up with people reading 00:41:36.400 |
reading lists, reading papers, doing side projects, this kind of thing. So it isn't something that we 00:41:42.240 |
explicitly include. We don't have an ML-focused interview for the AI engineer role at all, 00:41:47.200 |
actually. It doesn't really seem helpful. The skills which we are checking for, as I mentioned 00:41:54.400 |
before, this fault-first mindset and conventional software engineering kind of thing, it's point one 00:42:02.160 |
and point three on the list that we talked about. In terms of checking for ML curiosity and how 00:42:08.400 |
familiar they are with these concepts, that's more through talking interviews and culture fit types of 00:42:14.080 |
things. We want them to have a take on what Elicit is doing, certainly as they progress through 00:42:19.280 |
the interview process. They don't need to be completely up-to-date on everything we've ever 00:42:23.360 |
done on day zero, although that's always nice when it happens. But for them to really engage 00:42:28.880 |
with it, ask interesting questions, and be kind of brought into our view on how we want ML to 00:42:35.840 |
proceed, I think that is really important and that would reveal that they have this kind of interest, 00:42:41.440 |
this ML curiosity. There's a second aspect to that. I don't know if now's the right time to 00:42:46.160 |
talk about it, which is I do think that an ML-first approach to building software is something of a 00:42:52.960 |
different mindset. I could describe that a bit now if that seems good, but up to you. 00:42:58.560 |
So yeah, I think when I joined Elicit, this was the biggest adjustment that I had to make 00:43:03.680 |
personally. So as I said before, I'd been effectively building conventional software 00:43:07.760 |
stuff for 15 years or so, something like this, well for longer actually, but professionally for 00:43:11.840 |
like 15 years, and had a lot of pattern matching built into my brain and kind of muscle memory for 00:43:19.440 |
if you see this kind of a problem, then you do that kind of a thing. And I had to unlearn quite 00:43:23.600 |
a lot of that when joining Elicit because we truly are ML-first and try to use ML to the fullest. 00:43:30.400 |
And some of the things that that means is this relinquishing of control almost. At some point, 00:43:37.280 |
you are calling into this fairly opaque black box thing and hoping it does the right thing, 00:43:43.120 |
and dealing with the stuff that it sends back to you. And that's just very different if you're 00:43:46.960 |
interacting with, again, APIs and databases, that kind of a thing. You can't just keep on debugging. 00:43:52.720 |
At some point, you hit this obscure wall. And I think the second part to this is, 00:43:58.800 |
the pattern I was used to is that the external parts of the app are where most of the messiness 00:44:05.920 |
is, not necessarily in terms of code, but in terms of degrees of freedom almost. If the user 00:44:12.400 |
can and will do anything at any point, and they'll put all sorts of wonky stuff inside of text inputs, 00:44:17.920 |
and they'll click buttons you didn't expect them to click, and all this kind of thing. 00:44:21.040 |
But then by the time you're down into your SQL queries, for example, as long as you've done your 00:44:25.760 |
input validation, things are pretty well defined. And that, as we said before, is not really the 00:44:30.720 |
case. When you're working with language models, there is this kind of intrinsic uncertainty when 00:44:36.400 |
you get down to the kernel, down to the core. Even beyond that, all that stuff is somewhat 00:44:41.840 |
defensive, and these are things to be wary of to some degree. The flip side of that, the really 00:44:47.200 |
kind of positive part of taking an ML-first mindset when you're building applications, 00:44:51.520 |
is that once you get comfortable taking your hands off the wheel at a certain point, and 00:44:56.560 |
relinquishing control, letting go, really kind of unexpected, powerful things can happen if you 00:45:03.200 |
lean on the capabilities of the model without trying to overly constrain and slice and dice 00:45:09.280 |
problems to the point where you're not really wringing out the most capability from the model 00:45:14.240 |
that you might. So, I was trying to think of examples of this earlier, and one that came 00:45:20.640 |
to mind was we were working really early, just after I joined Elicit, we were working on something 00:45:27.360 |
where we wanted to generate text and include citations embedded within it. So, it'd have a 00:45:31.760 |
claim, and then, you know, square brackets, one, in superscript, something like this. 00:45:36.320 |
And every fiber in my being was screaming that we should have some way of kind of forcing this 00:45:42.640 |
to happen, or structured output, such that we could guarantee that this citation was always 00:45:47.520 |
going to be present later on, you know, that the kind of the indication of a footnote would actually 00:45:52.800 |
match up with the footnote itself, and kind of went into this symbolic, "I need full control" 00:45:59.440 |
kind of mindset. And it was notable that Andreas, who's our CEO, again, has been on the podcast, 00:46:06.240 |
was the opposite. He was just kind of, "Give it a couple of examples, and it'll probably be fine, 00:46:10.720 |
and then we can kind of figure out with a regular expression at the end." It really did not sit well 00:46:15.440 |
with me, to be honest. I was like, "But it could say anything. It could literally say anything." 00:46:19.680 |
And I don't know about just using a regex to sort of handle this. This is an important feature of 00:46:23.840 |
the app. But, you know, that's my first kind of starkest introduction to this ML-first mindset, 00:46:31.600 |
I suppose, which Andreas has been cultivating for much longer than me, much longer than most. 00:46:37.200 |
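For concreteness, the "give it a couple of examples and tidy up with a regular expression" approach that Andreas favoured might look something like the sketch below. The prompt wording and the [1]-style marker format are invented for illustration, not taken from Elicit.

```python
import re

# Few-shot prompt showing the model the inline citation format we want.
PROMPT_TEMPLATE = """Summarize the findings and cite sources inline as [1], [2], ...

Example:
Findings: Drug X reduced symptoms in two trials.
Summary: Drug X showed symptom reduction [1][2].

Findings: {findings}
Summary:"""

CITATION_PATTERN = re.compile(r"\[(\d+)\]")

def extract_citations(generated_text: str, num_sources: int) -> list[int]:
    """Pull the cited source indices out of free-form model output,
    dropping anything that doesn't map to a real source."""
    cited = {int(m) for m in CITATION_PATTERN.findall(generated_text)}
    return sorted(i for i in cited if 1 <= i <= num_sources)

# e.g. extract_citations("Drug X helped [1][3], though [9] is spurious.", 3) -> [1, 3]
```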
Yeah, there might be some surprises of stuff you get back from the model, but you can also... 00:46:43.360 |
it's about finding the sweet spot, I suppose, where you don't want to give a completely open-ended 00:46:50.400 |
prompt to the model and expect it to do exactly the right thing. You can ask it too much, 00:46:56.320 |
and it gets confused, and starts repeating itself, or goes around in loops, or just goes off in a 00:47:00.240 |
random direction, or something like this. But you can also over-constrain the model and not really 00:47:05.520 |
make the most of the capabilities. And I think that is a mindset adjustment that most people who 00:47:10.400 |
are coming into AI engineering afresh would need to make of giving up control and expecting that 00:47:18.240 |
there's going to be a little bit of extra pain and defensive stuff on the tail end. But the 00:47:23.280 |
benefits that you get as a result are really striking. That was a brilliant start. The ML-first 00:47:29.760 |
mindset, I think, is something that I struggle with as well, because the errors, when they do 00:47:33.120 |
happen, are bad. They will hallucinate, and your systems will not catch it sometimes if you don't 00:47:41.760 |
have a large enough sample set. I'll leave it open to you, Adam. What else do you think about 00:47:48.640 |
when you think about curiosity and exploring capabilities? Are there reliable ways to get 00:47:58.240 |
people to push themselves on capabilities? Because I think a lot of times we have this 00:48:02.720 |
implicit over-confidence, maybe, of we think we know what it is, what a thing is, when actually 00:48:07.280 |
we don't. And we need to keep a more open mind. And I think you do a particularly good job of 00:48:11.760 |
always having an open mind. And I want to get that out of more engineers that I talk to, 00:48:16.880 |
but I struggle sometimes. And I can scratch that question if nothing comes to mind. 00:48:21.840 |
Yeah. I suppose being an engineer is, at its heart, this sort of contradiction of, 00:48:28.640 |
on one hand, systematic, almost very literal, wanting to control exactly what James described, 00:48:37.040 |
understand everything, model it in your mind, precision, systematizing. But fundamentally, 00:48:47.200 |
it is a creative endeavor. At least I got into creating with computers because I saw them as a 00:48:52.160 |
canvas for creativity, for making great things, and for making a medium for making things that are 00:48:57.120 |
so multidimensional that it goes beyond any medium humanity's ever had for creating things. 00:49:05.200 |
So I think or hope that a lot of engineers are drawn to it partially because you need both of 00:49:11.760 |
those. You need that systematic, controlling side, and then the creative, open-ended, 00:49:17.840 |
almost like artistic side. And I think it is exactly the same here. In fact, if anything, 00:49:22.960 |
I feel like there's a theme running through everything James has said here, which is, 00:49:26.800 |
in many ways, what we're looking for in an AI engineer is not really all that fundamentally 00:49:31.840 |
different from other, call it conventional engineering or other types of engineering, 00:49:38.160 |
but working with this strange new medium that has these different qualities. But in the end, 00:49:42.560 |
a lot of the things are an amalgamation of past engineering skills. And I think that mix of 00:49:49.200 |
curiosity, artistic, open-ended, what can we do with this, with a desire to systematize, control, 00:49:56.080 |
make reliable, make repeatable, is the mix you need. And trying to find that balance, 00:50:02.720 |
I think, is probably where it's at. Fundamentally, I think people who are getting into this field, 00:50:08.000 |
to work on this, are doing so because they're excited by the promise and the potential of the technology. 00:50:14.000 |
So to not have that kind of creative, open-ended, curiosity side would be surprising. Why do it 00:50:23.040 |
otherwise? So I think that blend is always what you're looking for broadly. But here, 00:50:30.320 |
now we're just scoping it to this new world of language models. 00:50:33.520 |
00:50:40.160 |
I think the fault-first mindset and the ML curiosity attitude could be somewhat in tension, 00:50:50.240 |
right? Because, for example, the stereotypical version of someone that is great at building 00:50:56.240 |
fault-tolerant systems has probably been doing it for a decade or two. They've been principal 00:51:00.640 |
engineer at some massive scale technology company. And that kind of a person might be less 00:51:06.560 |
able to turn on a dime and relinquish control and be creative and take on this different mindset. 00:51:14.880 |
Whereas someone who's very early in their career is much more able to do that kind of 00:51:19.360 |
exploration and follow their curiosity kind of a thing. And they might be a little bit less 00:51:25.040 |
practiced in how to serve terabytes of traffic every day, obviously. 00:51:29.520 |
Yeah, the stereotype that comes to mind for me with those two you just described is the 00:51:34.960 |
principal engineer, fault-tolerance, handle unpredictable, is kind of grumpy and always 00:51:42.080 |
skeptical of anything new and it's probably not going to work and that sort of thing. Whereas 00:51:47.360 |
that fresh-faced early in their career, maybe more application-focused, and it's always thinking 00:51:52.800 |
about the happy path and the optimistic and, "Oh, don't worry about the edge case. That probably 00:51:56.640 |
won't happen." I don't write code with bugs, I don't know, whatever, like this. But really need 00:52:03.040 |
both together, I think. Both of those attitudes or personalities, if that's even the right way 00:52:08.080 |
to put it, together in one is, I think, what's-- Yeah, and I think people can come from either 00:52:12.880 |
end of the spectrum, to be clear. Not all grizzled principal engineers are the way that I'm described, 00:52:21.520 |
thankfully. Some probably are. And not all junior engineers are allergic to writing careful software 00:52:28.960 |
or unable and unexcited to pick that up. Yeah, it could be someone that's in the middle of the 00:52:34.640 |
career and naturally has a bit of both, could be someone at either end and just wants to round out 00:52:39.680 |
their skill set and lean into the thing that they're a bit weaker on. Any of the above would 00:52:44.400 |
work well for us. Okay, lovely. We've covered a fair amount of like-- Actually, I think we've 00:52:51.680 |
accidentally defined AI engineering along the way as well, because you kind of have to do that 00:52:55.520 |
in order to hire and interview for people. The last piece I wanted to offer to our audience is 00:53:01.760 |
sourcing. A very underappreciated part, because people just tend to rely on recruiters and assume 00:53:08.960 |
that the candidates fall from the sky. But I think the two of you have had plenty of experience with 00:53:14.320 |
really good sourcing, and I just want to leave some time open for what does AI engineer sourcing 00:53:19.440 |
look like? Is it being very loud on Twitter? Well, I mean, that definitely helps. I am really 00:53:25.440 |
quiet on Twitter, unfortunately, but a lot of my teammates are much more effective on that front, 00:53:29.280 |
which is deeply appreciated. I think in terms of-- Maybe I'll focus a little bit more on 00:53:35.920 |
active/outbound, if you will, rather than the kind of marketing/branding type of work that 00:53:43.840 |
Adam's been really effective with us on. The kinds of things that I'm looking for are certainly side 00:53:48.880 |
projects. It's really easy still. We're early enough on in this process that people can still 00:53:54.400 |
do interesting work pretty much at the cutting edge, not in terms of training whole models, of course, 00:53:59.600 |
but in terms of doing AI engineering. You can very much build interesting apps that have interesting 00:54:04.880 |
ideas and work well just using a basic OpenAI API key. People sharing that kind of stuff on 00:54:14.480 |
Twitter is always really interesting, or in Discord or Slacks, things like this. In terms of 00:54:19.920 |
the kind of caricature of the grizzled principal engineer kind of a person, it's notable. I've 00:54:27.360 |
spoken with a bunch of people coming from that kind of perspective. They're fairly easy to find. 00:54:32.160 |
They tend to be on LinkedIn. They tend to be really obvious on LinkedIn because they're maybe 00:54:37.680 |
a bit more senior. They've got a ton of connections. They're probably expected to post thought 00:54:43.280 |
leadership kinds of things on LinkedIn. Everyone's favorite. Some of those people are interested in 00:54:49.280 |
picking up new skills and jumping into ML and large language models. Sometimes it's obvious 00:54:54.480 |
from a profile. Sometimes you just need to reach out and introduce yourself and say, "Hey, this is 00:54:58.800 |
what we're doing. We think we could use your skills." A bunch of them will bite your hand off, 00:55:04.000 |
actually, because it is such an interesting area. That's how we've found success at sourcing on the 00:55:11.040 |
kind of more experienced end of the spectrum. I think on the less experienced end of the spectrum, 00:55:15.920 |
having lots of hooks in the ocean seems to be a good strategy if I think about what's worked for 00:55:21.840 |
us. It tends to be much harder to find those people because they have less of an online presence in 00:55:27.520 |
terms of active outbound. Things like blog posts, things like hot takes on Twitter, things like 00:55:35.600 |
challenges that we might have, those are the kind of vectors through which you can find these keen, 00:55:43.200 |
full of energy, less experienced people and bring them towards you. 00:55:47.760 |
Adam, do you have anything? You're pretty good on Twitter compared to me, at least. What's your 00:55:54.720 |
take on, yeah, the kind of more like bring stuff out there and have people come towards you for 00:55:59.840 |
this kind of a role? Yeah, I do typically think of sourcing as being the one-two punch of one, 00:56:05.520 |
raise the beacon. Let the world know that you are working on interesting problems and you're 00:56:11.840 |
expanding your team and maybe there's a place for someone like them on that team. That could come in 00:56:17.360 |
a variety of forms, whether it's going to a job fair and having a booth. Obviously, it's job 00:56:22.800 |
descriptions posted to your site. It's obviously things like, in some cases, yeah, blog posts about 00:56:29.680 |
stuff you're working on, releasing open source, anything that goes out into the world and people 00:56:33.920 |
find out about what you're doing, not at the very surface level of here's what the product is and, 00:56:39.440 |
I don't know, we have a couple of job descriptions on the site, but a layer deeper of, like, here's 00:56:43.520 |
what it actually looks like to work on the sort of things we're working on. 00:56:48.960 |
So, I think that's one piece of it and then the other piece of it, as you said, is the outbound. 00:56:53.440 |
I think the beacon alone is not enough, especially when you're small. I think it changes a lot when you're a 00:56:58.400 |
bigger company with a strong brand, or if the product you're working on is more in a technical 00:57:03.360 |
space and so, therefore, maybe among your customers there are the sorts of people 00:57:08.240 |
that you might like to have work for you. I don't know, if you're GitHub, then probably 00:57:12.960 |
the people you want to hire are all among your user base, which is a nice combination, but for 00:57:17.680 |
most products, that's not going to be the case. So then, now the outbound is a big piece of it and 00:57:21.840 |
part of that is, as you said, getting out into the world, whether it's going to meetups, whether it's 00:57:25.680 |
going to conferences, whether it's being on Twitter and just genuinely being out there and part of the 00:57:31.120 |
field and having conversations with people and seeing people who are doing interesting things 00:57:34.640 |
and making connections with them, hopefully not in a transactional way where you're always just, 00:57:40.000 |
you know, sniffing around for who's available to hire, but you just generally, if you like this 00:57:44.160 |
work and you want to be part of the field and you want to follow along with people who are doing 00:57:48.400 |
interesting things and then, by the way, you will discover when they post, "Oh, I'm wrapping up my 00:57:53.200 |
job here and thinking about the next thing," and that's a good time to ping them and be like, "Oh, 00:57:58.640 |
cool. Actually, we have maybe some things that you might be interested in here on the team," 00:58:03.840 |
and that kind of outbound. But I think it also pairs well. It's not just that you need both, 00:58:09.680 |
it's that they reinforce each other. So, if someone has seen, for example, the open source 00:58:14.560 |
project you've released and they're like, "Oh, that's cool," and they briefly look at your 00:58:18.000 |
company and then you follow each other on Twitter or whatever and then they post, "Hey, I'm thinking 00:58:22.320 |
about my next thing," and you write them and they already have some context of like, "Oh, I like 00:58:26.640 |
that project you did and I liked, you know, I kind of have some ambient awareness of what you're 00:58:31.200 |
doing. Yeah, let's have a conversation. This isn't totally cold." So, I think those two together are 00:58:36.640 |
important. The other footnote I would put, again, on the specifics: that's, I think, general sourcing advice 00:58:41.360 |
for any kind of role, but for AI engineering specifically, at this stage you're not always 00:58:45.680 |
looking for professional experience with language 00:58:49.280 |
models. It's just too early. So, it's totally fine if someone's professional experience is 00:58:53.600 |
with the conventional engineering skills, as long as the interest, the curiosity, that sort of 00:59:01.120 |
thing comes through in side projects, hackathons, blog posts, whatever it is. Yeah, absolutely. I 00:59:07.600 |
often tell people, a lot of people are asking me for San Francisco AI engineers because 00:59:11.840 |
there's this sort of wave or reaction against the remote mindset, which I know that you guys 00:59:17.520 |
probably differ in opinion on, but a lot of people are trying to, you know, go back to office. 00:59:21.280 |
And so, my only option for people is just find them at the hackathons. Like, you know, the most 00:59:27.040 |
self-driven, motivated people who can work on things quickly and ship fast are already in 00:59:31.680 |
hackathons, and just go through the list of winners. And then, self-interestedly, you know, 00:59:37.120 |
if, for example, someone's hosting an AI conference from June 25th to June 27th in San Francisco, 00:59:43.120 |
you might want to show up there and see who might be available. And that is true. Like, 00:59:50.960 |
you know, it's not something I want to advertise to the employers or the people who come, but a lot 00:59:55.360 |
of people change jobs at conferences. This is a known thing. Yeah, of course. But I think it's 01:00:01.040 |
the same as engaging on Twitter, engaging in open source, attending conferences. 100%, this is a 01:00:05.840 |
great way both to find new opportunities if you're a job seeker, find people for your team, if you're 01:00:11.280 |
a hiring manager, but if you come at it too network-y and transactional, that's just gross 01:00:16.560 |
for everyone. Hopefully, we're all people that got into this work largely because we love it, 01:00:21.920 |
and it's nice to connect with other people that have the same, you know, skills and struggle with 01:00:26.480 |
the same problems in their work, and you make genuine connections, and you learn from each other, 01:00:30.880 |
and by the way, from that can come, well, not quite a side effect, but an effect nonetheless, which 01:00:37.760 |
is pairing together people who are looking for opportunities with people who have interesting 01:00:42.000 |
problems to work on. Yeah, totally. Yeah, most important part of employer branding, you know, 01:00:46.880 |
have a great mission, have great teammates, you know, if you can show that off in whatever way 01:00:52.400 |
you can, you'll be starting off on the right foot. On that note, we have been really successful with 01:00:58.480 |
hiring a number of people from targeted job boards, maybe that's the right way of saying it, 01:01:06.400 |
so not some kind of generic Indeed.com or something, not to trash them, but something 01:01:11.680 |
that's a bit more tied to your mission, tied to what you're doing, something which is really 01:01:15.680 |
relevant, something which is going to cut down the search space for what you're looking at, 01:01:19.120 |
what the candidate's looking at, so we're definitely affiliated with the safety, 01:01:25.600 |
effective altruism kind of movement. We've gone to a few EA Globals and have hired people 01:01:33.040 |
effectively through the 80,000 Hours list as well, so, you know, that's not the only reason why people 01:01:38.320 |
would want to join Elicit, but as an example: if you're interested in AI safety or, you know, 01:01:43.760 |
whatever your take is on this stuff, then there's probably something, there's a substack, there's a 01:01:47.600 |
podcast, there's a mailing list, there's a job board, there's something which lets you zoom 01:01:52.320 |
in on the kind of particular take that you agree with. You brought this up, so I have to ask, 01:01:59.680 |
what is the state of EA post-SBF? I don't know if I'm the 01:02:04.000 |
spokesman for that. Yeah, I mean, look, it's still going on, there's definitely a period of reflection 01:02:13.120 |
and licking of wounds and thinking how did this happen. There's been a few conversations with 01:02:18.080 |
people really senior in EA talking about how it was a super difficult time from a personal 01:02:24.880 |
perspective and what is this even all about, and I don't know if this is a good thing that I've done 01:02:29.360 |
and, you know, quite a sobering moment for everyone, I think. But yeah, you know, it's 01:02:34.960 |
definitely still going, the EA Forum is active, we have people from Elicit going to EA Global. 01:02:39.920 |
Yeah, if anything, from a personal perspective, I hope that it helps us spot blowhards and 01:02:45.680 |
charlatans more easily and avoid whatever the kind of massive circumstances were that got us into the 01:02:53.520 |
situation with SBF and the kind of unfortunate fallout from that. If it makes us a bit more 01:02:59.920 |
able to spot that happening, then all for the better. 01:03:05.120 |
Excellent. Cool, I will leave it there. Any last comments about just hiring in general? 01:03:11.280 |
Advice to other technology leaders in AI? You know, one thing I'm trying to do for 01:03:17.200 |
my conference as well is to create a forum for technology leaders to share thoughts, 01:03:22.480 |
right? Like what's an interesting trend? What's an interesting open problem? 01:03:25.440 |
What should people contact you on if they're working on something interesting? 01:03:30.080 |
Yeah, a couple of thoughts here. So firstly, when I think back to how I was when I was in my 01:03:38.320 |
early 20s, when I was at college or university, the maturity and capabilities and 01:03:45.360 |
just kind of general put-togetherness of people at that age now is strikingly different to where 01:03:51.200 |
I was then. And I think this is not because I was especially immature or something when 01:03:58.400 |
I was young. I think I hear the same thing echoed in other people about my age. So the 01:04:04.640 |
takeaway from that is finding a way of presenting yourself to, identifying, and bringing really 01:04:11.760 |
high-capability young people into your organization. I mean, it's always been true, but I 01:04:16.320 |
think it's even more true now. They're kind of more professional, 01:04:24.960 |
more capable, more committed, more driven, have more of a sense of what they're all about than 01:04:30.160 |
certainly I did 20 years ago. So that's the first thing. I think the second thing is in 01:04:35.360 |
terms of the interview process. This is somewhat of a general take, but it definitely applies to 01:04:40.080 |
engineering roles broadly, and I think even more so to AI engineer roles. I really have a strong dislike and distaste 01:04:46.960 |
for interview questions, which are arbitrary and kind of strip away all the context from what it 01:04:53.360 |
really is to do the work. We try to make the interview process at Elicit a simulation 01:04:58.640 |
of working together. The only people that we go into an interview process with are pretty obviously 01:05:05.040 |
extraordinary, really, really capable. They must have done something for them to have moved into 01:05:10.880 |
the proper interview process. It is a check on technical capability in the ways that we've 01:05:16.560 |
described, but it's at least as much them sizing us up. Is this something which is worth my time? 01:05:21.840 |
Is it something that I'm going to really be able to dedicate myself to? So be able to show them 01:05:26.080 |
this is really what it's like working at Elicit. These are the people you're going to work with. 01:05:29.680 |
These are the kinds of tasks that you're going to be doing. This is the sort of environment that we 01:05:33.200 |
work in. These are the tools we use. All that kind of stuff is really, really important from 01:05:36.880 |
a candidate experience perspective, but it also gives us a ton more signal about, you know, 01:05:42.720 |
what is it actually like to work with this person? Not just can they do really well on some kind of 01:05:46.480 |
LeetCode style problem. I think the reason that it bears particularly on the AI engineer role 01:05:51.920 |
is because it is something of an emerging category, if you will. So there isn't a very kind 01:05:59.280 |
of well-established 'do these things' checklist; nobody's written the book yet. Maybe this is the beginning of us 01:06:05.040 |
writing the book on how to get hired as an AI engineer, but that book doesn't exist at the 01:06:09.280 |
moment. Yeah, you know, it's an empirical job as much as any other kind of software engineering. 01:06:17.120 |
It's less about having kind of book learning and more about being able to apply that in a 01:06:20.880 |
real world situation. So let's make the interview as close to a real world situation as possible. 01:06:24.720 |
Adam, any last thoughts? I think you're muted. 01:06:27.680 |
I think it'd be hard to follow that to add on to what James said. 01:06:30.320 |
I do co-sign a lot of that. Yeah, I think this is a really great overview of just the sort of 01:06:38.240 |
state of hiring AI engineers and, honestly, of what AI engineering even is. 01:06:42.960 |
When I was thinking about this as an industrial movement, it was very much around 01:06:49.680 |
the labor market, actually, and these economic forces that give rise to a role like this, 01:06:56.560 |
both on the incentives of the model labs, as well as the demand and supply of engineers and the 01:07:01.760 |
interest level of companies and the engineers working on these problems. So I definitely see 01:07:08.640 |
you guys as pioneers. Thank you so much for putting together this piece, which is something I've been 01:07:13.680 |
seeking for a long time. You even shared your job description, your reading list and your interview 01:07:19.920 |
loop. So if anyone's looking to hire AI engineers, I expect this to be the definitive piece and 01:07:25.680 |
definitive podcast covering it. So thank you so much for taking the time to cover it with me. 01:07:30.640 |
It was fun. Thanks. Yeah, thanks a lot. Really enjoyed the conversation. And I appreciate you 01:07:34.480 |
naming something which we all had in our heads, but couldn't put a label on. 01:07:38.000 |
It was going to be named anyway. So, actually, I never personally say that I coined the 01:07:44.000 |
term because I'm sure someone else used the term before me. All I did was write a popular piece on 01:07:49.200 |
it. All right. So I'm happy to help because I know that it contributed to job creation at a bunch of 01:07:56.720 |
companies I respect, and to how people find each other, which is my whole goal here.