How To Hire AI Engineers (ft. James Brady and Adam Wiggins of Elicit)
Chapters
0:00 Intros
5:25 Defining the Hiring Process
8:42 Defensive AI Engineering as a chaotic medium
10:26 Tech Choices for Defensive AI Engineering
14:04 How do you Interview for Defensive AI Engineering
19:25 Does Model Shadowing Work?
22:29 Is it too early to standardize Tech stacks?
32:02 Capabilities: Offensive AI Engineering
37:24 AI Engineering Required Knowledge
40:13 ML First Mindset
45:13 AI Engineers and Creativity
47:51 Inside of Me There Are Two Wolves
49:58 Sourcing AI Engineers
58:45 Parting Thoughts
00:00:00.000 |
Okay, so welcome to the Latent Space Podcast. This is another remote episode that we're recording. 00:00:05.600 |
Actually, this is the first one that we're doing around a guest post. And I'm very honored to have 00:00:12.400 |
two of the authors of the post with me, James and Adam from Elicit. Welcome, James. Welcome, Adam. 00:00:19.680 |
Hey. Okay, so I think I will do this kind of in order. I think James, you're sort of the primary 00:00:27.600 |
author. So James, you are head of engineering at Elicit. You also were VP Eng at Teespring and 00:00:34.480 |
Spring as well. And you also, you know, you have a long history in sort of engineering. How did you, 00:00:40.480 |
you know, find your way into something like Elicit where, you know, you are basically a 00:00:46.960 |
traditional sort of VP Eng, VP technology type person moving into more of an AI role? 00:00:53.200 |
Yeah, that's right. It definitely was something of a sideways move, if not a left turn. So the 00:01:00.000 |
story there was I'd been doing, as you said, VP technology, CTO type stuff for around about 15 00:01:06.160 |
years or so. And noticed that there was this crazy explosion of capability and interesting 00:01:11.280 |
stuff happening within AI and ML and language models, that kind of thing. I guess this was 00:01:17.760 |
in 2019 or so, and decided that I needed to get involved. You know, this is a kind of generational 00:01:23.120 |
shift. Spent maybe a year or so trying to get up to speed on the state of the art, reading papers, 00:01:27.360 |
reading books, practicing things, that kind of stuff. Was going to found a startup actually 00:01:31.760 |
in the space of interpretability and transparency. And through that met Andreas, who has obviously 00:01:38.320 |
been on the podcast before, asked him to be an advisor for my startup. And he countered with, 00:01:44.240 |
"Maybe you'd like to come and run the engineering team at Elicit," which it turns out was a much 00:01:48.240 |
better idea. And yeah, I kind of quickly changed in that direction. So I think some of the stuff 00:01:52.800 |
that we're going to be talking about today is how actually a lot of the work when you're building 00:01:58.480 |
applications with AI and ML looks and smells and feels much more like conventional software 00:02:04.400 |
engineering with a few key differences, rather than really deep ML stuff. And I think that's 00:02:08.400 |
one of the reasons why I was able to transfer the skills over from one place to the other. 00:02:12.480 |
Yeah, I definitely agree with that. I do often say that I think AI engineering is about 90% 00:02:19.680 |
software engineering with the 10% of really strong, really differentiated AI engineering. 00:02:25.600 |
And obviously that number might change over time. I want to also welcome Adam onto my podcast, 00:02:32.880 |
because you welcomed me onto your podcast two years ago. And I'm really, really glad for that. 00:02:38.400 |
That was a fun episode. You famously founded Heroku. You just wrapped up a few years working 00:02:44.960 |
on Muse. And now you describe yourself as a journalist, internal journalist working on 00:02:49.920 |
Elicit. Yeah, well, I'm kind of a little bit in a wandering phase here and trying to 00:02:55.200 |
take this time in between ventures to see what's out there in the world. And 00:03:02.000 |
some of my wandering took me to the Elicit team and found that they were some of the folks who 00:03:07.600 |
were doing the most interesting, really deep work in terms of taking the capabilities of language 00:03:13.920 |
models and applying them to what I feel like are really important problems. So in this case, 00:03:18.080 |
science and literature search and that sort of thing. It fits into my general interest in tools 00:03:23.920 |
and productivity software. I think of it as a tool for thought in many ways, but a tool for science, 00:03:28.400 |
obviously, if we can accelerate that discovery of new medicines and things like that, that's just 00:03:32.400 |
so powerful. But to me, it's kind of also an opportunity to learn at the feet of some real 00:03:37.840 |
masters in this space, people who have been working on it since before it was cool, if you 00:03:41.920 |
want to put it that way. So for me, the last couple of months have been this crash course. 00:03:45.680 |
And why I sometimes describe myself as an internal journalist is I'm helping to write some posts, 00:03:51.440 |
including supporting James in this article here we're doing for Latent Space, where I'm just 00:03:56.800 |
bringing my writing skill and that sort of thing to bear on their very deep domain expertise around 00:04:03.440 |
language models and applying them to the real world and kind of surface that in a way that's 00:04:08.480 |
accessible, legible, that sort of thing. And so the great benefit to me is I get to learn this stuff 00:04:15.920 |
in a way that I don't think I would or I haven't, just kind of tinkering with my own side projects. 00:04:22.480 |
Yeah, totally. I forgot to mention that you also run Ink and Switch, which is 00:04:26.640 |
one of the leading research labs, in my mind, of the tools for thought productivity space, 00:04:33.120 |
whatever people mentioned there, or maybe future programming even, a little bit of that as well. 00:04:38.800 |
I think you guys definitely started the local first wave. I think there was just the first 00:04:43.040 |
conference that you guys held. I don't know if you were personally involved. 00:04:45.840 |
Yeah, I was one of the co-organizers, along with a few other folks, 00:04:50.160 |
called Local First Conf here in Berlin. Huge success from my point of view. Local First, 00:04:54.240 |
obviously, a whole other topic we can talk about on another day. I think there actually is a lot 00:04:58.880 |
more, what would you call it, handshake emoji between language models and the local first 00:05:06.240 |
data model. And that was part of the topic of the conference here. But yeah, topic for another day. 00:05:12.560 |
Not necessarily. I mean, if I can grab your thoughts at the end on local first and AI, 00:05:19.200 |
we can talk about that. I featured, I selected as one of my keynotes, Justine Tunney, 00:05:24.560 |
from Llamafile, working at Llamafile in Mozilla, because I think there's a lot of people interested 00:05:30.080 |
in that stuff. But we can focus on the headline topic, just to not bury the lead, which is we're 00:05:37.120 |
talking about how to hire AI engineers. This is something that I've been looking for a credible 00:05:42.320 |
source on for months. People keep asking me for my opinions. I don't feel qualified to give an 00:05:47.920 |
opinion, given that I only have so much engineering experience. And it's not like I've defined a 00:05:55.440 |
hiring process that I'm super happy with, even though I've worked with a number of AI engineers. 00:05:58.960 |
I'll just leave it open to you, James. How was your process of defining your hiring roles? 00:06:05.440 |
Yeah. So I think the first thing to say is that we've effectively been hiring for this kind of a 00:06:11.280 |
role since before you coined the term and tried to kind of build this understanding of what it was, 00:06:18.160 |
which is not a bad thing. It was a concept that was coming to the fore and effectively needed a 00:06:24.240 |
name, which is what you did. So the reason I mentioned that is I think it was something that we 00:06:30.480 |
kind of backed into, if you will. We didn't sit down and come up with a brand new role from 00:06:36.240 |
scratch. This is a completely novel set of responsibilities and skills that this person 00:06:41.120 |
would need. However, it is a kind of particular blend of different skills and attitudes and 00:06:49.600 |
curiosities, interests, which I think makes sense to kind of bundle together. So in the post, the 00:06:55.920 |
three things that we say are most important for a highly effective AI engineer are, first of all, 00:07:00.880 |
conventional software engineering skills, which is kind of a given, but definitely worth mentioning. 00:07:06.560 |
The second thing is a curiosity and enthusiasm for machine learning and maybe in particular 00:07:12.240 |
language models. That's certainly true in our case. And then the third thing is to do with 00:07:16.800 |
basically a fault-first mindset, being able to build systems that can handle things going wrong 00:07:24.160 |
in some sense. And yeah, I think the kind of middle point, the curiosity about ML and language 00:07:31.200 |
models is probably fairly self-evident. They're going to be working with and prompting and dealing 00:07:36.800 |
with the responses from these models. So that's clearly relevant. The last point though, maybe 00:07:41.200 |
takes the most explaining to do with this fault-first mindset and the ability to build resilient 00:07:47.120 |
systems. The reason that is so important is because compared to normal APIs where normal, think of 00:07:54.560 |
something like a Stripe API or a search API or something like this, conventional search API, 00:08:01.520 |
the latency when you're working with language models is wild. Like you can get 10X variation. 00:08:08.320 |
I mean, I was looking at the stats before actually, before the podcast, we do often normally, in fact, 00:08:13.680 |
see a 10X variation in the P90 latency over the course of half an hour, an hour, when we're 00:08:20.400 |
prompting these models, which is way higher than if you're working with a more kind of conventional, 00:08:24.800 |
conventionally backed API. And the responses that you get, the actual content of the responses 00:08:30.240 |
are naturally unpredictable as well. They come back with different formats. Maybe you're expecting 00:08:36.000 |
JSON. It's not quite JSON. You have to handle this stuff. And also the semantics of the messages are 00:08:42.160 |
unpredictable too, which is a good thing. Like this is one of the things that you're looking for 00:08:46.000 |
from these language models, but it all adds up to needing to build a resilient, reliable, 00:08:52.080 |
solid feeling system on top of this fundamentally, well, certainly currently fundamentally shaky 00:08:59.600 |
foundation. The models do not behave in the way that you would like them to and the ability to 00:09:04.960 |
structure the code around them such that it does give the user this warm, reassuring, snappy, 00:09:10.640 |
solid feeling is really what we're driving for there. 00:09:14.800 |
Yeah, I think, sorry, go ahead. Go ahead. You can try, man. 00:09:19.120 |
What really struck me as we dug in on the content for this article was that third point there. The 00:09:38.320 |
language models as this kind of chaotic medium, this dragon, this wild horse you're riding and 00:09:45.040 |
trying to guide in the direction that is going to be useful and reliable to users. Because I think 00:09:49.200 |
so much of software engineering is about making things not only high performance and snappy, 00:09:54.560 |
but really just making it stable, reliable, predictable, which is literally the opposite 00:09:58.880 |
of what you get from the language models. And yet, yeah, the output is so useful. And indeed, 00:10:04.720 |
some of their creativity, if you want to call it that, which is precisely their value. And so you 00:10:11.760 |
need to work with this medium. And I guess the nuance or the thing that came out of Elicit's 00:10:11.760 |
experience that I thought was so interesting is quite a lot of working with that is things that 00:10:21.520 |
come from distributed systems engineering. But you have really the AI engineers kind of as 00:10:27.920 |
sort of as we're defining them or labeling them on the Elicit team is people who are really 00:10:32.240 |
application developers. You're building things for end users. You're thinking about, okay, 00:10:35.520 |
I need to populate this interface with some response to user input that's useful to the 00:10:40.320 |
tasks they're trying to do. But you have this thing, this medium that you're working with, 00:10:45.040 |
that in some ways you need to apply some of this chaos engineering, distributed systems 00:10:50.320 |
engineering, which typically those people with those engineering skills are not kind of the 00:10:54.960 |
application level developers with the product mindset or whatever. They're more deep in the guts 00:10:58.640 |
of a system. And so those skills and knowledge do exist throughout the engineering discipline, 00:11:06.240 |
but sort of putting them together into one person, that feels like sort of a unique thing. And 00:11:11.760 |
working with the folks on the Elicit team who have those skills, I'm quite struck by that unique blend. 00:11:17.280 |
I haven't really seen that before in my 30-year career in technology. 00:11:22.640 |
Yeah, that's fascinating. I like the reference to chaos engineering. I have some appreciation. I 00:11:29.680 |
think when you had me on your podcast, I was still working at Temporal, and that was like a nice 00:11:34.720 |
framework. If you live within Temporal's boundaries, you can pretend that all those 00:11:39.760 |
faults don't exist, and you can code in a sort of very fault-tolerant way. 00:11:44.960 |
What is you guys' solutions around this, actually? I think you're emphasizing having the mindset, 00:11:52.480 |
but maybe naming some technologies would help. Not saying that you have to adopt these technologies, 00:11:59.760 |
but they're just quick vectors into what you're talking about when you're talking about distributed 00:12:04.880 |
systems. That's such a big, chunky word. Are we talking Kubernetes? I suspect we're not. 00:12:11.200 |
We're talking something else now. Yeah, that's right. It's more the 00:12:15.680 |
application level rather than at the infrastructure level, at least the way that it works for us. 00:12:21.280 |
So there's nothing kind of radically novel here. It is more a careful application of existing 00:12:28.800 |
concepts. So the kinds of tools that we reach for to handle these kind of slightly chaotic 00:12:33.600 |
objects that Adam was just talking about are retries, and fallbacks, and timeouts, and careful 00:12:39.440 |
error handling. Yeah, the standard stuff, really. There's also a great degree of dependence. We rely 00:12:46.560 |
heavily on parallelization because these language models are not innately very snappy, and 00:12:52.640 |
there's just a lot of I/O going back and forth. All these things I'm talking about, when I was 00:12:58.720 |
in my earlier stages of a career, these are the things that are the difficult parts that 00:13:04.560 |
more senior software engineers will be better at. It is careful error handling, and concurrency, 00:13:09.600 |
and fallbacks, and distributed systems, and eventual consistency, and all this kind of stuff. 00:13:15.840 |
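To make that concrete, here is a minimal sketch of the kind of wrapper James is describing: a latency-bounded call with retries and a fallback to a second provider. This is not Elicit's code; call_primary_model and call_fallback_model are hypothetical stand-ins for whichever provider SDK you actually use, and the fallback takes its own prompt because, as discussed later in the episode, a second provider usually needs one.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor

# Hypothetical provider calls -- stand-ins for whichever SDK you actually use.
def call_primary_model(prompt: str) -> str:
    raise NotImplementedError

def call_fallback_model(prompt: str) -> str:
    raise NotImplementedError

def call_with_timeout(fn, prompt: str, timeout_s: float) -> str:
    # Bound each attempt so wild tail latencies can't stall the request forever.
    pool = ThreadPoolExecutor(max_workers=1)
    try:
        return pool.submit(fn, prompt).result(timeout=timeout_s)
    finally:
        # Don't block on a hung call; abandon the worker thread rather than join it.
        pool.shutdown(wait=False, cancel_futures=True)

def resilient_completion(prompt: str, fallback_prompt: str,
                         attempts: int = 3, timeout_s: float = 30.0) -> str:
    for attempt in range(attempts):
        try:
            return call_with_timeout(call_primary_model, prompt, timeout_s)
        except Exception:
            # Broad catch for the sketch; back off with jitter before retrying.
            time.sleep(2 ** attempt + random.random())
    # Last resort: a second provider, with a prompt tuned for that model.
    return call_with_timeout(call_fallback_model, fallback_prompt, timeout_s)
```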
And as Adam was saying, the kind of person that is deep in the guts of some kind of distributed 00:13:20.880 |
systems, a really high-scale back-end kind of a problem, would probably naturally have these kinds 00:13:25.360 |
of skills. But you'll find them on day one if you're building an ML-powered app, even if it's 00:13:31.280 |
not got massive scale. I think one thing that I would mention that we do do-- yeah, maybe two 00:13:39.200 |
related things, actually. The first is we're big fans of strong typing. We share the types all the 00:13:45.280 |
way from the back-end Python code all the way to the front-end in TypeScript, and find that is-- 00:13:51.280 |
I mean, we're probably doing this anyway, but it really helps one reason around the shapes of the 00:13:55.360 |
data, which are going to be going back and forth, and that's really important when you can't rely 00:13:59.360 |
upon-- you're going to have to coerce the data that you get back from the ML if you want it 00:14:05.040 |
to be structured, basically speaking. And the second thing which is related is we use checked 00:14:10.800 |
exceptions inside our Python code base, which means that we can use the type system to make sure 00:14:15.760 |
we are handling, properly handling, all of the various things that could be going wrong, all the 00:14:20.720 |
different exceptions that could be getting raised. Checked exceptions are not really particularly 00:14:25.200 |
popular, actually. There's not many people that are big fans of them. For our particular use case, 00:14:30.320 |
to really make sure that we've not just forgotten to handle this particular type of error, we have 00:14:36.000 |
found them useful to force us to think about all the different edge cases that could come up. 00:14:40.800 |
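James doesn't spell out how the checked exceptions are implemented, and Python has no native support for them, so treat the following as one plausible approximation rather than Elicit's actual approach: failure modes are declared as types in the return value, and a type checker such as mypy or pyright can then insist that every case is handled.

```python
from dataclasses import dataclass
from typing import Union

try:
    from typing import assert_never  # Python 3.11+
except ImportError:
    from typing_extensions import assert_never

# The failure modes are spelled out as types instead of raised ad hoc.
@dataclass
class RateLimited:
    retry_after_s: float

@dataclass
class MalformedOutput:
    raw_text: str

@dataclass
class Completion:
    text: str

ModelResult = Union[Completion, RateLimited, MalformedOutput]

def summarize(abstract: str) -> ModelResult:
    ...  # call the model and return one of the declared cases

def render(result: ModelResult) -> str:
    # If a new failure case is added to ModelResult and not handled here,
    # the assert_never branch becomes a type error instead of a runtime surprise.
    if isinstance(result, Completion):
        return result.text
    if isinstance(result, RateLimited):
        return f"Rate limited, retry in {result.retry_after_s}s"
    if isinstance(result, MalformedOutput):
        return "The model returned something we couldn't parse"
    assert_never(result)
```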
That's fascinating. Just a quick note of technology. How do you share types from 00:14:46.720 |
Python to TypeScript? Do you use GraphQL? Do you use something else? 00:14:50.800 |
We don't use GraphQL. So we've got the types defined in Python, that's the source of truth, 00:14:56.560 |
and we go from the open API spec, and there's a tool that we can use to generate types dynamically, 00:15:04.160 |
like TypeScript types from those open API definitions. 00:15:07.280 |
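He doesn't name the specific tools, but a common version of that pipeline, assuming a FastAPI/Pydantic backend and the openapi-typescript generator on the front end, looks roughly like this; the endpoint and model names are invented for illustration.

```python
# Backend: Pydantic models are the source of truth; FastAPI publishes them
# in the generated OpenAPI spec at /openapi.json.
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()

class Citation(BaseModel):
    claim: str
    paper_id: str
    confidence: float

@app.get("/papers/{paper_id}/citations", response_model=list[Citation])
def get_citations(paper_id: str) -> list[Citation]:
    return []  # placeholder implementation

# Frontend build step (not Python): generate TypeScript types from the spec, e.g.
#   npx openapi-typescript http://localhost:8000/openapi.json -o src/api-types.ts
```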
Okay, cool. Sorry for diving into that rabbit hole a little bit. I always like to spell out 00:15:12.560 |
technologies for people to dig their teeth into. One thing I'll mention quickly is that a lot of 00:15:18.720 |
the stuff that you mentioned is typically not part of the normal interview loop. It's actually 00:15:22.240 |
really hard to interview for, because this is the stuff that you polish out as you go into production. 00:15:30.160 |
Coding interviews are typically about the happy path. How do we do that? How do we look for a 00:15:37.520 |
defensive, fault-first mindset? Because you can write defensive code all day long, 00:15:40.960 |
and not add functionality to your application. Yeah, it's a great question, and I think that's 00:15:47.120 |
exactly true. Normally, the interview is about the happy path, and then there's maybe a box 00:15:51.760 |
checking exercise at the end where the candidate says, "Of course, in reality, I would handle 00:15:55.440 |
the edge cases," or something like this. That, unfortunately, isn't quite good enough when the 00:16:01.680 |
happy path is very, very narrow, and there's lots of weirdness on either side. Basically speaking, 00:16:09.440 |
it's just a case of foregrounding those kind of concerns through the interview process. 00:16:14.240 |
There's no magic to it. We talk about this in the post that we're going to be putting up on 00:16:20.320 |
LatentSpace, but there's two main technical exercises that we do through our interview 00:16:26.480 |
process for this role. The first is more coding-focused, and the second is more system 00:16:30.240 |
design-y, whiteboarding a potential solution. Without giving too much away, in the coding 00:16:35.360 |
exercise, you do need to think about edge cases. You do need to think about errors. 00:16:43.440 |
How best to put this? Yeah, the exercise consists of adding features and fixing bugs inside the 00:16:51.280 |
code base. In both of those two cases, it does demand, because of the way that we set the 00:16:56.560 |
application up and the interview up, it does demand that you think about something other than 00:17:00.800 |
the happy path. But your thinking is the right prompt of how do we get the candidate thinking 00:17:05.920 |
outside of the normal sweet spot, smoothly paved path. In terms of the system design interview, 00:17:14.160 |
that's a little easier to prompt this fault-first mindset, because it's very easy in that situation 00:17:20.640 |
just to say, let's imagine that this node dies. How does the app still work? Let's imagine that 00:17:26.400 |
this network is going super slow. Let's imagine that, I don't know, you run out of capacity in 00:17:32.880 |
this database that you've sketched out here. How do you handle that sort of stuff? So, in both cases, 00:17:39.200 |
they're not firmly anchored to and built specifically around language models and 00:17:44.640 |
ways language models can go wrong, but we do exercise the same muscles of thinking defensively 00:17:50.560 |
and foregrounding the edge cases, basically. Yeah, any comment there? 00:17:54.960 |
Yeah, I guess I wanted to mention too, James, earlier there, you mentioned retries, 00:18:01.600 |
and this is something that I think I've seen some interesting debates internally about things 00:18:06.160 |
regarding, first of all, retries can be costly, right? In general, this medium, in addition to 00:18:11.840 |
having this incredibly high variance in response time and being non-deterministic, 00:18:17.040 |
is actually quite expensive. And so, in many cases, doing a retry when you get a fail 00:18:21.280 |
does make sense, but actually that has an impact on cost. And so, there is some sense to which, 00:18:26.720 |
at least I've seen the AI engineers on our team worry about that. They worry about, okay, how do 00:18:32.160 |
we give the best user experience, but balance that against what the infrastructure is going to cost 00:18:37.920 |
our company, which I think is, again, an interesting mix of, yeah, again, it's a little bit the 00:18:43.360 |
distributed system mindset, but it's also a product perspective and you're thinking about 00:18:47.760 |
the end user experience, but also the bottom line for the business. You're bringing together a lot 00:18:52.400 |
of qualities there. And there's also the fallback case, which is kind of a related or adjacent one. 00:18:58.000 |
I think there was also a discussion on that internally where, I think it maybe was search, 00:19:01.840 |
there was something recently where one of the frontline search providers was having some, 00:19:06.720 |
yeah, slowness and outages, and essentially then we had a fallback, but essentially that gave people 00:19:11.920 |
for a while, especially new users that come in that don't know the difference, they're getting 00:19:16.320 |
worse results for their search. And so, then you have this debate about, okay, there's sort of what 00:19:21.920 |
is correct to do from an engineering perspective, but then there's also what actually is the best 00:19:28.320 |
result for the user. Is giving them a kind of a worse answer to their search result 00:19:32.160 |
better, or is it better to kind of give them an error and be like, yeah, sorry, it's not working 00:19:36.480 |
right at the moment, try later. Both are obviously non-optimal, but this is the kind of thing I think 00:19:42.640 |
that you run into or the kind of thing we need to grapple with a lot more than you would other kinds 00:19:48.400 |
of mediums. Yeah, that's a really good example. I think it brings to the fore the two different 00:19:55.920 |
things that you could be optimizing for of uptime and response at all costs on one end of the 00:20:01.520 |
spectrum, and then effectively fragility, but kind of, if you get a response, it's the best response 00:20:06.880 |
we can come up with at the other end of the spectrum. And where you want to land there kind 00:20:10.160 |
of depends on, well, it certainly depends on the app, obviously depends on the user. I think it 00:20:13.760 |
depends on the feature within the app as well. So in the search case that you mentioned there, 00:20:20.320 |
in retrospect, we probably didn't want to have the fallback. And we've actually just recently 00:20:24.480 |
on Monday changed that to show an error message rather than giving people a kind of degraded 00:20:30.400 |
experience. In other situations, we could use, for example, a large language model from provider B 00:20:38.880 |
rather than provider A, and get something which is within a few percentage points performance. 00:20:45.120 |
And that's just a really different situation. Yeah, like any interesting question, the answer is 00:20:50.000 |
it depends. I do hear a lot of people suggesting, let's call this model shadowing as a defensive 00:21:00.560 |
technique, which is if OpenAI happens to be down, which happens more often than people think, 00:21:06.400 |
then you fall back to Anthropic or something. How realistic is that? Don't you have to develop 00:21:11.840 |
completely different prompts for different models, and won't the performance of your 00:21:16.800 |
application suffer for whatever reason? It maybe calls differently, or it's not maintained in the 00:21:22.080 |
same way. I think that people raise this idea of fallbacks to models, but I don't see it practiced 00:21:32.880 |
very much. Yeah, it is. You definitely need to have a different prompt if you want to stay within 00:21:39.840 |
a few percentage points degradation, like I said before. And that certainly comes at a cost of 00:21:47.120 |
fallbacks and backups and things like this. It's really easy for them to go stale and kind of flake 00:21:53.280 |
out on you because they're off the beaten track. And in our particular case inside of Elicit, 00:22:02.080 |
we do have fallbacks for a number of crucial functions where it's going to be very obvious 00:22:08.880 |
if something has gone wrong, but we don't have fallbacks in all cases. It really depends on a 00:22:15.920 |
task-to-task basis throughout the app, so I can't give you a single simple rule of thumb for, 00:22:21.440 |
in this case, do this, and in the other, do that. But yeah, it's a little bit easier now that 00:22:28.480 |
the APIs between the Anthropic models and OpenAI are more similar than they used to be, 00:22:33.440 |
so we don't have two totally separate code paths with different protocols, like wire protocols, 00:22:38.240 |
so to speak, which makes things easier. But you're right, you do need to have different prompts if 00:22:42.480 |
you want to have similar performance across the providers. I'll also note, just observing again 00:22:47.040 |
as a relative newcomer here, I was surprised, impressed, I'm not sure what the word is for it, 00:22:52.640 |
at the blend of different backends that the team is using, and so there's many, 00:22:57.840 |
the product presents as kind of one single interface, but there's actually several dozen 00:23:04.240 |
kind of main paths. There's like, for example, the search versus a data extraction of a certain type 00:23:09.520 |
versus chat with papers versus, and each one of these, you know, the team has worked very hard 00:23:14.640 |
to pick the right model for the job and craft the prompt there, but also is constantly testing new 00:23:21.120 |
ones. So a new one comes out from either from the big providers, or in some cases, our own models 00:23:28.160 |
that are, you know, running on essentially our own infrastructure, and sometimes that's more about 00:23:33.360 |
cost or performance, but the point is kind of switching very fluidly between them, and very 00:23:39.520 |
quickly, because this field is moving so fast, and there's new ones to choose from all the time, 00:23:43.920 |
is like part of the day-to-day, I would say, so it isn't more of a like, there's a main one, 00:23:48.720 |
it's been kind of the same for a year, there's a fallback, but it's got cobwebs on it, it's more 00:23:53.280 |
like which model and which prompt is changing weekly, and so I think it's quite reasonable to 00:24:00.320 |
have a fallback that you can expect might work. I'm curious, because you guys have had experience 00:24:07.840 |
working at both, you know, Elicit, which is a smaller operation, and larger companies, 00:24:12.240 |
a lot of companies are looking at this with a certain amount of trepidation as, you know, 00:24:17.600 |
it's very chaotic. When you have one engineering team that knows everyone else's names, and like, 00:24:25.040 |
you know, they meet constantly in Slack and know what's going on, it's easier to sync on 00:24:30.080 |
technology choices. When you have 100 teams, all shipping AI products, and all making their own 00:24:34.400 |
independent tech choices, it can be very hard to control. One solution I'm hearing from the 00:24:39.920 |
Salesforces of the world, and Walmarts of the world, is that they are creating their own AI 00:24:44.160 |
gateway, right? Internal AI gateway. This is the one model hub that controls all the things, 00:24:48.640 |
and has all standards. Is that a feasible thing? Is that something that you would want? Is that 00:24:54.080 |
something you have and you're working towards? What are your thoughts on this stuff? Like, 00:24:58.160 |
centralization of control, or like an AI platform internally? 00:25:02.000 |
Yeah, I think certainly for larger organizations, and organizations that are doing things which 00:25:10.320 |
maybe are running into HIPAA compliance, or other regulatory constraints like that, 00:25:18.160 |
it could make a lot of sense. Yeah. I think for the TLDR for something like Elicit is, 00:25:24.480 |
we are small enough, as you indicated, and need to have full control over all the levers available, 00:25:32.400 |
and switch between different models, and different prompts, and whatnot. As Adam was just saying, 00:25:36.720 |
that kind of thing wouldn't work for us. But yeah, I've spoken with and 00:25:40.560 |
advised a couple of companies that are trying to sell into that kind of a space, or at a larger 00:25:48.000 |
stage, and it does seem to make a lot of sense for them. So, for example, if you're trying to sell 00:25:53.600 |
to a large enterprise, and they cannot have any data leaving the EU, then you need to be really 00:26:00.080 |
careful about someone just accidentally putting in the sort of US-East-1 GPT-4 endpoints, or something 00:26:08.080 |
like this. If you're... Do you want to think of a more specific example there? Yeah. I think the... 00:26:15.920 |
I'd be interested in understanding better what the specific problem is that they're looking 00:26:22.960 |
to solve with that, whether it is to do with data security, or centralization of billing, 00:26:29.360 |
or if they have a kind of suite of prompts, or something like this, that people can choose from, 00:26:34.160 |
so they don't need to reinvent the wheel again and again. I wouldn't be able to say without 00:26:39.040 |
understanding the problems and their proposed solutions, you know, which kind of situations 00:26:42.560 |
that'd be better or worse fit for. But yeah, for Elicit, where really the secret sauce, 00:26:50.080 |
if there is a secret sauce, is which models we're using, how we're using them, how we're combining 00:26:54.000 |
them, how we're thinking about the user problem, how we're thinking about all these pieces coming 00:26:57.440 |
together. You really need to have all of the affordances available to you to be able to 00:27:03.200 |
experiment with things and iterate rapidly. And generally speaking, whenever you put these kind 00:27:08.800 |
of layers of abstraction, and control, and generalization in there, that gets in the way. 00:27:13.840 |
So for us, it would not work. Do you feel like there's always a tendency to want to reach for 00:27:19.520 |
standardization and abstractions pretty early in a new technology cycle? There's something 00:27:24.720 |
comforting there, or you feel like you can see them, or whatever. I feel like there's some of 00:27:28.000 |
that discussion around LangChain right now. But yeah, this is not only so early, but also moving 00:28:33.760 |
so fast. I think it's tough to ask for that. That's not the space we're in. But yeah, the 00:27:42.160 |
larger an organization, the more that's your default is to want to reach for that. It's a 00:27:47.360 |
sort of comfort. Yeah, that's interesting. I find it interesting that you would say that 00:27:53.200 |
being a founder of Heroku, where you were one of the first platforms as a service that 00:27:58.960 |
more or less standardized what that early development experience should have looked like. 00:28:03.840 |
And I think basically people are feeling the differences between calling various model lab 00:28:08.800 |
APIs and having an actual AI platform where all their development needs are thought of for them. 00:28:19.200 |
I define this in my AI engineer post as well. The model labs just see their job ending at 00:28:24.720 |
serving models, and that's about it. But actually, the responsibility of the AI engineer has to fill 00:28:29.440 |
in a lot of the gaps beyond that. Yeah, that's true. I think a huge part of the exercise with 00:28:36.960 |
Heroku, which was largely inspired by Rails, which itself was one of the first frameworks 00:28:42.240 |
to standardize the CRUD app with the SQL database, and people have been building apps like that for 00:28:47.600 |
many, many years. I had built many apps. I had made my own kind of templates based on that. I 00:28:51.600 |
think others had done it. And Rails came along at the right moment, where we had been doing it long 00:28:56.240 |
enough that you see the patterns, and then you can say, look, let's extract those into a framework 00:29:01.840 |
that's going to make it not only easier to build for the experts, but for people who are relatively 00:29:06.400 |
new, the best practices are encoded into that framework, in model-view-controller, to take one 00:29:13.120 |
example. But then, yeah, once you see that, and once you experience the power of a framework, 00:29:17.840 |
and again, it's so comforting, and you develop faster, and it's easier to onboard new people 00:29:23.840 |
to it because you have these standards and this consistency, then folks want that for something 00:29:30.560 |
new that's evolving. Now, here I'm thinking maybe if you fast forward a little to, for example, 00:29:34.080 |
when React came on the scene a decade ago or whatever, and then, okay, we need to do state 00:29:39.120 |
management, what's that? And then there's a new library every six months. Okay, this is the one, 00:29:43.680 |
this is the gold standard. And then six months later, that's deprecated. Because, of course, 00:29:48.560 |
it's evolving. You need to figure it out. The tacit knowledge and the experience of putting 00:29:53.040 |
it in practice and seeing what those real needs are, are critical. And so it is really about 00:29:59.360 |
finding the right time to say, yes, we can generalize, we can make standards and abstractions, 00:30:06.320 |
whether it's for a company, whether it's for an open source library, for a whole class of apps, 00:30:11.920 |
and it's much more of a judgment call, or just a sense of taste or experience, to be 00:30:21.280 |
able to say, yeah, we're at the right point, we can standardize this. But it's at least my very, 00:30:27.680 |
again, and I'm so new to that, this world compared to you both, but my sense is, yeah, still the 00:30:33.360 |
Wild West, that's what makes it so exciting and feels kind of too early for too much in the way of 00:30:40.400 |
standardized abstractions. Not that it's not interesting to try, but you can't necessarily 00:30:45.440 |
get there in the same way Rails did until you've got that decade of experience of whatever building 00:30:50.000 |
different classes of apps in that, with that technology. Yeah, it's interesting to think 00:30:56.720 |
about what is going to stay more static and what is expected to change over the coming five years, 00:31:02.640 |
let's say, which seems like, when I think about it through an ML lens, is an incredibly long time. 00:31:07.360 |
And if you just said five years, it doesn't seem that long. I think that kind of talks to part of 00:31:11.520 |
the problem here is that things that are moving are moving incredibly quickly. I would expect, 00:31:16.480 |
this is my hot take rather than some kind of official carefully thought out position, but 00:31:20.320 |
my hot take would be something like, you'll be able to get to good quality apps without doing 00:31:29.360 |
really careful prompt engineering. I don't think that prompt engineering is going to be 00:31:33.200 |
a kind of durable differential skill that people will hold. I do think that the way that you set 00:31:41.120 |
up the ML problem to kind of ask the right questions, if you see what I mean, rather than 00:31:45.440 |
the specific phrasing of exactly how you're doing chain of thought or few shot or something in the 00:31:50.880 |
prompt, I think the way that you set it up is probably going to remain to be trickier for 00:31:57.280 |
longer. And I think some of the operational challenges that we've been talking about of 00:32:00.960 |
wild variations in latency and handling the... I mean, one way to think about these models is 00:32:09.120 |
the first lesson that you learn when you're an engineer, software engineer, is that you need to 00:32:13.040 |
sanitize user input, right? I think it was the top OWASP security threat for a while. You have to 00:32:18.800 |
sanitize and validate user input. And we got used to that. And it kind of feels like this is the 00:32:24.800 |
shell around the app and then everything else inside you're kind of in control of, and you can 00:32:29.200 |
grasp and you can debug, et cetera. And what we've effectively done is through some kind of weird 00:32:35.280 |
rear guard action, we now got these slightly chaotic things. I think of them more as complex 00:32:40.080 |
adaptive systems, which are related, but a bit different, definitely have some of the same 00:32:44.080 |
dynamics. We've injected these into the foundations of the app. And you kind of now need to think 00:32:51.120 |
with this defensive mindset downwards as well as upwards, if you see what I mean. 00:32:56.400 |
So I think it will take a while for us to truly wrap our heads around that. Also, these kinds of 00:33:04.160 |
problems, you have to handle things being unreliable and slow sometimes and whatever else, 00:33:09.920 |
even if it doesn't happen very often, there isn't some kind of industry-wide accepted way of 00:33:15.360 |
handling that at massive scale. There are definitely patterns and anti-patterns and 00:33:20.560 |
tools and whatnot, but it's not like this is a solved problem. So I would expect that 00:33:25.040 |
it's not going to go down easily as a solvable problem at the ML scale either. 00:33:30.240 |
Yeah, excellent. I would describe in the terminology of the stuff that I've written 00:33:37.120 |
in the past, I described this inversion of architecture as sort of LLM at the core versus 00:33:41.760 |
LLM or code at the core. We're very used to code at the core. Actually, we can scale that very well. 00:33:47.520 |
When we build LLM core apps, we have to realize that the central part of our app that's 00:33:52.320 |
orchestrating things is actually prone to prompt injections and non-determinism and all that good 00:33:59.280 |
stuff. I did want to move the conversation a little bit from the sort of defensive side of 00:34:04.240 |
things to the more offensive or the fun side of things, capabilities side of things, because that 00:34:10.000 |
is the other part of the job description that we kind of skimmed over. So I'll repeat what you said 00:34:15.680 |
earlier. You want people to have a genuine curiosity and enthusiasm for the capabilities 00:34:20.000 |
of language models. We're recording this the day after Anthropic just dropped Claude 3.5. 00:34:26.720 |
I was wondering, maybe this is a good exercise, is how do people have curiosity and enthusiasm 00:34:33.440 |
for the capabilities of language models when, for example, the research paper for Claude 3.5 is four 00:34:38.320 |
pages? There's not much. Yeah. Well, maybe that's not a bad thing, actually, in this particular 00:34:48.960 |
case. So yeah, if you really want to know exactly how the sausage was made, that hasn't been possible 00:34:54.320 |
for a few years now, in fact, for these new models. But from our perspective, when we're 00:35:00.560 |
building Elicit, what we primarily care about is what can these models do? How do they perform 00:35:05.360 |
on the tasks that we already have set up and the evaluations we have in mind? And then on a 00:35:09.760 |
slightly more expansive note, what kinds of new capabilities do they seem to have? Can we elicit, 00:35:17.600 |
no pun intended, from the models? For example, well, there's very obvious ones like multimodality. 00:35:23.760 |
There wasn't that, and then there was that. Or it could be something a bit more subtle, 00:35:28.800 |
like it seems to be getting better at reasoning, or it seems to be getting better at metacognition, 00:35:34.560 |
or it seems to be getting better at marking its own work and giving calibrated confidence 00:35:39.920 |
estimates, things like this. Yeah, there's plenty to be excited about there. It's just that, 00:35:45.600 |
yeah, there's rightly or wrongly been this shift over the last few years to not give all the 00:35:52.720 |
details. No, but from application development perspective, every time there's a new model 00:35:57.360 |
released, there's a flow of activity in our Slack, and we try to figure out what it can do, 00:36:00.560 |
what it can't do, run our evaluation frameworks. And yeah, it's always an exciting, happy day. 00:36:05.680 |
Yeah, from my perspective, what I'm seeing from the folks on the team is, first of all, just 00:36:13.200 |
awareness of the new stuff that's coming out. So that's an enthusiasm for the space and following 00:36:20.080 |
along. And then being able to very quickly, partially that's having Slack to do this, 00:36:24.880 |
but be able to quickly map that to, okay, what does this do for our specific case? 00:36:30.880 |
And the simple version of that is let's run the evaluation framework, which Elicit has quite a 00:36:38.160 |
comprehensive one. I'm actually working on an article on that right now, which I'm very excited 00:36:42.880 |
about, because it's a very interesting world of things. But basically you can just try the new 00:36:49.440 |
model in the evaluations framework, run it. It has a whole slew of benchmarks, which includes not just 00:36:55.440 |
accuracy and confidence, but also things like performance, cost and so on. And all of these 00:37:00.400 |
things may trade off against each other. Maybe it's actually, it's very slightly worse, but it's 00:37:05.840 |
way faster and way cheaper. So actually this might be a net win, for example, or it's way more 00:37:12.880 |
accurate, but it's slower and comes at a higher cost. And so now you need to think about those 00:37:18.560 |
trade-offs. And so to me, coming back to the qualities of an AI engineer, especially when 00:37:23.200 |
you're trying to hire for them, it is very much an application developer in the sense of a product 00:37:29.280 |
mindset of what are our users or our customers trying to do? What problem do they need solved? 00:37:35.360 |
Or what does our product solve for them? And how does the capabilities of a particular model 00:37:41.120 |
potentially solve that better for them than what exists today? And by the way, what exists today 00:37:46.880 |
is becoming an increasingly gigantic cornucopia of things, right? And so you say, okay, this new 00:37:52.800 |
model has these capabilities, therefore the simple version of that is plug it into our existing 00:37:57.600 |
evaluations and just look at that and see if it seems like it's better for a straight out swap out. 00:38:02.720 |
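As a rough illustration of that "run it through the evaluations and weigh the trade-offs" loop (Elicit's actual framework isn't shown here), a minimal harness might score a candidate model on accuracy, latency, and cost over the same example set used for the incumbent:

```python
import time
from dataclasses import dataclass
from statistics import mean
from typing import Callable

@dataclass
class EvalExample:
    prompt: str
    expected: str

@dataclass
class EvalReport:
    accuracy: float
    mean_latency_s: float
    total_cost_usd: float

def evaluate(call_model: Callable[[str], str],
             cost_per_call_usd: float,
             examples: list[EvalExample]) -> EvalReport:
    correct, latencies = 0, []
    for ex in examples:
        start = time.monotonic()
        answer = call_model(ex.prompt)
        latencies.append(time.monotonic() - start)
        # Toy scoring rule; a real harness would use task-specific graders.
        correct += int(ex.expected.lower() in answer.lower())
    return EvalReport(
        accuracy=correct / len(examples),
        mean_latency_s=mean(latencies),
        total_cost_usd=cost_per_call_usd * len(examples),
    )

# Usage: run the same example set against the incumbent and the new model,
# then decide whether slightly lower accuracy is worth much lower latency/cost.
```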
But when you talk about, for example, you have multimodal capability and then you say, okay, 00:38:07.120 |
wait a minute, actually maybe there's a new feature or a whole new way we could be using it, 00:38:11.760 |
not just a simple model swap out, but actually a different thing we could do that we couldn't do 00:38:16.080 |
before that would have been too slow or too inaccurate or something like that, that now 00:38:21.520 |
we do have the capability to do. So I think of that as being a kind of core skill. I don't even 00:38:27.040 |
know if I want to call it a skill. Maybe it's even like an attitude or a perspective, which is a 00:38:31.360 |
desire to both be excited about the new technology, the new models and things as they come along, 00:38:36.400 |
but also holding in the mind, what does our product do? Who is our user? And how can we 00:38:43.200 |
connect the capabilities of this technology to how we're helping people in whatever it is our 00:38:48.640 |
product does? Yeah. I'm just looking at one of our internal Slack channels where we talk about 00:38:54.240 |
things like new model releases and that kind of thing. And it is notable looking through these, 00:38:59.920 |
the kind of things that people are excited about and not, I don't know, the context, the context 00:39:04.880 |
window is much larger or it's look at how many parameters it has or something like this. It's 00:39:09.760 |
always framed in terms of maybe this could be applied to that kind of part of Elicit, 00:39:13.280 |
or maybe this would open up this new possibility for Elicit. And as Adam was saying, yeah, I don't 00:39:18.080 |
think it's really a novel or separate skill. It's the kind of attitude I would like all 00:39:24.240 |
engineers to have at a company our stage actually, and maybe more generally even, which is not just 00:39:32.160 |
kind of getting nerd sniped by some kind of technology number, fancy metric or something, 00:39:38.000 |
but how is this actually going to be applicable to the thing which matters in the end? How is 00:39:42.720 |
this going to help users? How is this going to help move things forward strategically? That kind 00:39:46.000 |
of thing. Yeah, applying what you know, I think is the key here. Getting hands on as well. I would 00:39:53.120 |
recommend a few resources for people listening along. The first is Elicit's ML reading list, 00:39:58.800 |
which I found so delightful after talking with Andreas about it. It looks like that's part of 00:40:04.800 |
your onboarding. We've actually set up an asynchronous paper club inside of my Discord 00:40:09.120 |
for people following on that reading list. I love that you separate things out into tier one and 00:40:12.880 |
two and three, and that gives people a factored cognition way of looking into the corpus, right? 00:40:20.320 |
Yes, the corpus of things to know is growing and the water is slowly rising as far as what a bar 00:40:26.320 |
for a competent AI engineer is, but I think having some structured thought as to what are the big 00:40:32.320 |
ones that everyone must know, I think is key. It's something I haven't really defined for people, 00:40:38.000 |
and I'm glad that Elicit actually has something out there that people can refer to. 00:40:41.520 |
I wouldn't necessarily make it required for the job interview maybe, but it'd be interesting to 00:40:49.760 |
see what would be a red flag if some AI engineer would not know. I don't know where we would stoop 00:40:57.840 |
to call something required knowledge, or you're not part of the cool kids club, but there increasingly 00:41:04.640 |
is something like that, right? Not knowing what context is is a black mark in my opinion, right? 00:41:08.960 |
Yeah, I think it does connect back to what we were saying before of this genuine curiosity 00:41:15.200 |
about ML. Well, maybe it's actually that combined with something else which is really important, 00:41:19.440 |
which is a self-starting bias towards action kind of a mindset, which again- 00:41:24.160 |
Exactly, yeah. Everyone needs that, so if you put those two together, or if I'm truly curious about 00:41:30.160 |
this and I'm going to figure out how to make things happen, then you end up with people reading 00:41:36.400 |
reading lists, reading papers, doing side projects, this kind of thing. So it isn't something that we 00:41:42.240 |
explicitly include. We don't have an ML-focused interview for the AI engineer role at all, 00:41:47.200 |
actually. It doesn't really seem helpful. The skills which we are checking for, as I mentioned 00:41:54.400 |
before, this fault-first mindset and conventional software engineering kind of thing, it's point one 00:42:02.160 |
and point three on the list that we talked about. In terms of checking for ML curiosity and how 00:42:08.400 |
familiar they are with these concepts, that's more through talking interviews and culture fit types of 00:42:14.080 |
things. We want them to have a take on what Elicit is doing, certainly as they progress through 00:42:19.280 |
the interview process. They don't need to be completely up-to-date on everything we've ever 00:42:23.360 |
done on day zero, although that's always nice when it happens. But for them to really engage 00:42:28.880 |
with it, ask interesting questions, and be kind of brought into our view on how we want ML to 00:42:35.840 |
proceed, I think that is really important and that would reveal that they have this kind of interest, 00:42:41.440 |
this ML curiosity. There's a second aspect to that. I don't know if now's the right time to 00:42:46.160 |
talk about it, which is I do think that an ML-first approach to building software is something of a 00:42:52.960 |
different mindset. I could describe that a bit now if that seems good, but up to you. 00:42:58.560 |
So yeah, I think when I joined Elicit, this was the biggest adjustment that I had to make 00:43:03.680 |
personally. So as I said before, I'd been effectively building conventional software 00:43:07.760 |
stuff for 15 years or so, something like this, well for longer actually, but professionally for 00:43:11.840 |
like 15 years, and had a lot of pattern matching built into my brain and kind of muscle memory for 00:43:19.440 |
if you see this kind of a problem, then you do that kind of a thing. And I had to unlearn quite 00:43:23.600 |
a lot of that when joining Elicit because we truly are ML-first and try to use ML to the fullest. 00:43:30.400 |
And some of the things that that means is this relinquishing of control almost. At some point, 00:43:37.280 |
you are calling into this fairly opaque black box thing and hoping it does the right thing, 00:43:43.120 |
and dealing with the stuff that it sends back to you. And that's just very different if you're 00:43:46.960 |
interacting with, again, APIs and databases, that kind of a thing. You can't just keep on debugging. 00:43:52.720 |
At some point, you hit this obscure wall. And I think the second part to this is, 00:43:58.800 |
the pattern I was used to is that the external parts of the app are where most of the messiness 00:44:05.920 |
is, not necessarily in terms of code, but in terms of degrees of freedom almost. If the user 00:44:12.400 |
can and will do anything at any point, and they'll put all sorts of wonky stuff inside of text inputs, 00:44:17.920 |
and they'll click buttons you didn't expect them to click, and all this kind of thing. 00:44:21.040 |
But then by the time you're down into your SQL queries, for example, as long as you've done your 00:44:25.760 |
input validation, things are pretty well defined. And that, as we said before, is not really the 00:44:30.720 |
case. When you're working with language models, there is this kind of intrinsic uncertainty when 00:44:36.400 |
you get down to the kernel, down to the core. Even beyond that, all that stuff is somewhat 00:44:41.840 |
defensive, and these are things to be wary of to some degree. The flip side of that, the really 00:44:47.200 |
kind of positive part of taking an ML-first mindset when you're building applications, 00:44:51.520 |
is that once you get comfortable taking your hands off the wheel at a certain point, and 00:44:56.560 |
relinquishing control, letting go, really kind of unexpected, powerful things can happen if you 00:45:03.200 |
lean on the capabilities of the model without trying to overly constrain and slice and dice 00:45:09.280 |
problems to the point where you're not really wringing out the most capability from the model 00:45:14.240 |
that you might. So, I was trying to think of examples of this earlier, and one that came 00:45:20.640 |
to mind was we were working really early, just after I joined Elicit, we were working on something 00:45:27.360 |
where we wanted to generate text and include citations embedded within it. So, it'd have a 00:45:31.760 |
claim, and then, you know, square brackets, one, in superscript, something like this. 00:45:36.320 |
And every fiber in my being was screaming that we should have some way of kind of forcing this 00:45:42.640 |
to happen, or structured output, such that we could guarantee that this citation was always 00:45:47.520 |
going to be present later on, you know, that the kind of the indication of a footnote would actually 00:45:52.800 |
match up with the footnote itself, and kind of went into this symbolic, "I need full control" 00:45:59.440 |
kind of mindset. And it was notable that Andreas, who's our CEO, again, has been on the podcast, 00:46:06.240 |
was the opposite. He was just kind of, "Give it a couple of examples, and it'll probably be fine, 00:46:10.720 |
and then we can kind of figure out with a regular expression at the end." It really did not sit well 00:46:15.440 |
with me, to be honest. I was like, "But it could say anything. It could literally say anything." 00:46:19.680 |
And I don't know about just using a regex to sort of handle this. This is an important feature of 00:46:23.840 |
the app. But, you know, that's my first kind of starkest introduction to this ML-first mindset, 00:46:31.600 |
I suppose, which Andreas has been cultivating for much longer than me, much longer than most. 00:46:37.200 |
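For concreteness, the "give it a couple of examples and tidy up with a regular expression" approach that Andreas favoured might look something like the sketch below. The prompt wording and the [1]-style marker format are invented for illustration, not taken from Elicit.

```python
import re

# Few-shot prompt showing the model the inline citation format we want.
PROMPT_TEMPLATE = """Summarize the findings and cite sources inline as [1], [2], ...

Example:
Findings: Drug X reduced symptoms in two trials.
Summary: Drug X showed symptom reduction [1][2].

Findings: {findings}
Summary:"""

CITATION_PATTERN = re.compile(r"\[(\d+)\]")

def extract_citations(generated_text: str, num_sources: int) -> list[int]:
    """Pull the cited source indices out of free-form model output,
    dropping anything that doesn't map to a real source."""
    cited = {int(m) for m in CITATION_PATTERN.findall(generated_text)}
    return sorted(i for i in cited if 1 <= i <= num_sources)

# e.g. extract_citations("Drug X helped [1][3], though [9] is spurious.", 3) -> [1, 3]
```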
Yeah, there might be some surprises of stuff you get back from the model, but you can also... 00:46:43.360 |
it's about finding the sweet spot, I suppose, where you don't want to give a completely open-ended 00:46:50.400 |
prompt to the model and expect it to do exactly the right thing. You can ask it too much, 00:46:56.320 |
and it gets confused, and starts repeating itself, or goes around in loops, or just goes off in a 00:47:00.240 |
random direction, or something like this. But you can also over-constrain the model and not really 00:47:05.520 |
make the most of the capabilities. And I think that is a mindset adjustment that most people who 00:47:10.400 |
are coming into AI engineering afresh would need to make of giving up control and expecting that 00:47:18.240 |
there's going to be a little bit of extra pain and defensive stuff on the tail end. But the 00:47:23.280 |
benefits that you get as a result are really striking. That was a brilliant start. The ML-first 00:47:29.760 |
mindset, I think, is something that I struggle with as well, because the errors, when they do 00:47:33.120 |
happen, are bad. They will hallucinate, and your systems will not catch it sometimes if you don't 00:47:41.760 |
have a large enough sample set. I'll leave it open to you, Adam. What else do you think about 00:47:48.640 |
when you think about curiosity and exploring capabilities? Are there reliable ways to get 00:47:58.240 |
people to push themselves on capabilities? Because I think a lot of times we have this 00:48:02.720 |
implicit over-confidence, maybe, of we think we know what it is, what a thing is, when actually 00:48:07.280 |
we don't. And we need to keep a more open mind. And I think you do a particularly good job of 00:48:11.760 |
always having an open mind. And I want to get that out of more engineers that I talk to, 00:48:16.880 |
but I struggle sometimes. And I can scratch that question if nothing comes to mind. 00:48:21.840 |
Yeah. I suppose being an engineer is, at its heart, this sort of contradiction of, 00:48:28.640 |
on one hand, systematic, almost very literal, wanting to control exactly what James described, 00:48:37.040 |
understand everything, model it in your mind, precision, systematizing. But fundamentally, 00:48:47.200 |
it is a creative endeavor. At least I got into creating with computers because I saw them as a 00:48:52.160 |
canvas for creativity, for making great things, and for making a medium for making things that are 00:48:57.120 |
so multidimensional that it goes beyond any medium humanity's ever had for creating things. 00:49:05.200 |
So I think or hope that a lot of engineers are drawn to it partially because you need both of 00:49:11.760 |
those. You need that systematic, controlling side, and then the creative, open-ended, 00:49:17.840 |
almost like artistic side. And I think it is exactly the same here. In fact, if anything, 00:49:22.960 |
I feel like there's a theme running through everything James has said here, which is, 00:49:26.800 |
in many ways, what we're looking for in an AI engineer is not really all that fundamentally 00:49:31.840 |
different from other, call it conventional engineering or other types of engineering, 00:49:38.160 |
but working with this strange new medium that has these different qualities. But in the end, 00:49:42.560 |
a lot of the things are an amalgamation of past engineering skills. And I think that mix of 00:49:49.200 |
curiosity, artistic, open-ended, what can we do with this, with a desire to systematize, control, 00:49:56.080 |
make reliable, make repeatable, is the mix you need. And trying to find that balance, 00:50:02.720 |
I think, is probably where it's at. Fundamentally, I think people who are getting into this field, 00:50:08.000 |
to work on this, are doing so because they're excited by the promise and the potential of the technology. 00:50:14.000 |
So to not have that kind of creative, open-ended, curiosity side would be surprising. Why do it 00:50:23.040 |
otherwise? So I think that blend is always what you're looking for broadly. But here, 00:50:30.320 |
now we're just scoping it to this new world of language models. 00:50:33.520 |
00:50:40.160 |
I think the fault-first mindset and the ML curiosity attitude could be somewhat in tension, 00:50:50.240 |
right? Because, for example, the stereotypical version of someone that is great at building 00:50:56.240 |
fault-tolerant systems has probably been doing it for a decade or two. They've been principal 00:51:00.640 |
engineer at some massive scale technology company. And that kind of a person might be less 00:51:06.560 |
able to turn on a dime and relinquish control and be creative and take on this different mindset. 00:51:14.880 |
Whereas someone who's very early in their career is much more able to do that kind of 00:51:19.360 |
exploration and follow their curiosity kind of a thing. And they might be a little bit less 00:51:25.040 |
practiced in how to serve terabytes of traffic every day, obviously. 00:51:29.520 |
Yeah, the stereotype that comes to mind for me with those two you just described is the 00:51:34.960 |
principal engineer, fault-tolerance, handle unpredictable, is kind of grumpy and always 00:51:42.080 |
skeptical of anything new and it's probably not going to work and that sort of thing. Whereas 00:51:47.360 |
that fresh-faced early in their career, maybe more application-focused, and it's always thinking 00:51:52.800 |
about the happy path and the optimistic and, "Oh, don't worry about the edge case. That probably 00:51:56.640 |
won't happen." I don't write code with bugs, I don't know, whatever, like this. But really need 00:52:03.040 |
both together, I think. Both of those attitudes or personalities, if that's even the right way 00:52:08.080 |
to put it, together in one is, I think, what's-- Yeah, and I think people can come from either 00:52:12.880 |
end of the spectrum, to be clear. Not all grizzled principal engineers are the way that I'm described, 00:52:21.520 |
thankfully. Some probably are. And not all junior engineers are allergic to writing careful software 00:52:28.960 |
or unable and unexcited to pick that up. Yeah, it could be someone that's in the middle of the 00:52:34.640 |
career and naturally has a bit of both, could be someone at either end and just wants to round out 00:52:39.680 |
their skill set and lean into the thing that they're a bit weaker on. Any of the above would 00:52:44.400 |
work well for us. Okay, lovely. We've covered a fair amount of like-- Actually, I think we've 00:52:51.680 |
accidentally defined AI engineering along the way as well, because you kind of have to do that 00:52:55.520 |
in order to hire and interview for people. The last piece I wanted to offer to our audience is 00:53:01.760 |
sourcing. A very underappreciated part, because people just tend to rely on recruiters and assume 00:53:08.960 |
that the candidates fall from the sky. But I think the two of you have had plenty of experience with 00:53:14.320 |
really good sourcing, and I just want to leave some time open for what does AI engineer sourcing 00:53:19.440 |
look like? Is it being very loud on Twitter? Well, I mean, that definitely helps. I am really 00:53:25.440 |
quiet on Twitter, unfortunately, but a lot of my teammates are much more effective on that front, 00:53:29.280 |
which is deeply appreciated. I think in terms of-- Maybe I'll focus a little bit more on 00:53:35.920 |
active/outbound, if you will, rather than the kind of marketing/branding type of work that 00:53:43.840 |
Adam's been really effective with us on. The kinds of things that I'm looking for are certainly side 00:53:48.880 |
projects. It's really easy still. We're early enough on in this process that people can still 00:53:54.400 |
do interesting work pretty much at the cutting edge, not in terms of training whole models, of course, 00:53:59.600 |
but in terms of doing AI engineering. You can very much build interesting apps that have interesting 00:54:04.880 |
ideas and work well just using a basic OpenAI API key. People sharing that kind of stuff on 00:54:14.480 |
Twitter is always really interesting, or in Discord or Slacks, things like this. In terms of 00:54:19.920 |
the kind of caricature of the grizzled principal engineer kind of a person, it's notable. I've 00:54:27.360 |
spoken with a bunch of people coming from that kind of perspective. They're fairly easy to find. 00:54:32.160 |
They tend to be on LinkedIn. They tend to be really obvious on LinkedIn because they're maybe 00:54:37.680 |
a bit more senior. They've got a ton of connections. They're probably expected to post thought 00:54:43.280 |
leadership kinds of things on LinkedIn. Everyone's favorite. Some of those people are interested in 00:54:49.280 |
picking up new skills and jumping into ML and large language models. Sometimes it's obvious 00:54:54.480 |
from a profile. Sometimes you just need to reach out and introduce yourself and say, "Hey, this is 00:54:58.800 |
what we're doing. We think we could use your skills." A bunch of them will bite your hand off, 00:55:04.000 |
actually, because it is such an interesting area. That's how we've found success at sourcing on the 00:55:11.040 |
kind of more experienced end of the spectrum. I think on the less experienced end of the spectrum, 00:55:15.920 |
having lots of hooks in the ocean seems to be a good strategy if I think about what's worked for 00:55:21.840 |
us. It tends to be much harder to find those people because they have less of an online presence in 00:55:27.520 |
terms of active outbound. Things like blog posts, things like hot takes on Twitter, things like 00:55:35.600 |
challenges that we might have, those are the kind of vectors through which you can find these keen, 00:55:43.200 |
full of energy, less experienced people and bring them towards you. 00:55:47.760 |
Adam, do you have anything? You're pretty good on Twitter compared to me, at least. What's your 00:55:54.720 |
take on, yeah, the kind of more like bring stuff out there and have people come towards you for 00:55:59.840 |
this kind of a role? Yeah, I do typically think of sourcing as being the one-two punch of one, 00:56:05.520 |
raise the beacon. Let the world know that you are working on interesting problems and you're 00:56:11.840 |
expanding your team and maybe there's a place for someone like them on that team. That could come in 00:56:17.360 |
a variety of forms, whether it's going to a job fair and having a booth. Obviously, it's job 00:56:22.800 |
descriptions posted to your site. It's obviously things like, in some cases, yeah, blog posts about 00:56:29.680 |
stuff you're working on, releasing open source, anything that goes out into the world and people 00:56:33.920 |
find out about what you're doing, not at the very surface level of here's what the product is and, 00:56:39.440 |
I don't know, we have a couple of job descriptions on the site, but a layer deeper of, like, here's 00:56:43.520 |
what it actually looks like to work on the sort of things we're working on. 00:56:48.960 |
So, I think that's one piece of it and then the other piece of it, as you said, is the outbound. 00:56:53.440 |
I think the beacon alone is not enough, especially when you're small. I think it changes a lot when you're a 00:56:58.400 |
bigger company with a strong brand, or if the product you're working on is more in a technical 00:57:03.360 |
space and so, therefore, maybe among your customers there are the sorts of people 00:57:08.240 |
that you might like to have work for you. I don't know, if you're GitHub, then probably 00:57:12.960 |
the people you want to hire are all among your user base, which is a nice combination, but for 00:57:17.680 |
most products, that's not going to be the case. So then, now the outbound is a big piece of it and 00:57:21.840 |
part of that is, as you said, getting out into the world, whether it's going to meetups, whether it's 00:57:25.680 |
going to conferences, whether it's being on Twitter and just genuinely being out there and part of the 00:57:31.120 |
field and having conversations with people and seeing people who are doing interesting things 00:57:34.640 |
and making connections with them, hopefully not in a transactional way where you're always just, 00:57:40.000 |
you know, sniffing around for who's available to hire, but you just generally, if you like this 00:57:44.160 |
work and you want to be part of the field and you want to follow along with people who are doing 00:57:48.400 |
interesting things and then, by the way, you will discover when they post, "Oh, I'm wrapping up my 00:57:53.200 |
job here and thinking about the next thing," and that's a good time to ping them and be like, "Oh, 00:57:58.640 |
cool. Actually, we have maybe some things that you might be interested in here on the team," 00:58:03.840 |
and that kind of outbound. But I think it also pairs well. It's not just that you need both, 00:58:09.680 |
it's that they reinforce each other. So, if someone has seen, for example, the open source 00:58:14.560 |
project you've released and they're like, "Oh, that's cool," and they briefly look at your 00:58:18.000 |
company and then you follow each other on Twitter or whatever and then they post, "Hey, I'm thinking 00:58:22.320 |
about my next thing," and you write them and they already have some context of like, "Oh, I like 00:58:26.640 |
that project you did and I liked, you know, I kind of have some ambient awareness of what you're 00:58:31.200 |
doing. Yeah, let's have a conversation. This isn't totally cold." So, I think those two together are 00:58:36.640 |
important. The other footnote I would put, again, on the specifics: that's, I think, general sourcing advice 00:58:41.360 |
for any kind of role, but for AI engineering specifically, at this stage you're not always 00:58:45.680 |
looking for professional experience with language 00:58:49.280 |
models. It's just too early. So, it's totally fine if someone's professional experience is 00:58:53.600 |
with the conventional engineering skills, as long as the interest, the curiosity, that sort of 00:59:01.120 |
thing comes through in side projects, hackathons, blog posts, whatever it is. Yeah, absolutely. I 00:59:07.600 |
often tell people, a lot of people are asking me for San Francisco AI engineers because 00:59:11.840 |
there's this sort of wave or reaction against the remote mindset, which I know that you guys 00:59:17.520 |
probably differ in opinion on, but a lot of people are trying to, you know, go back to office. 00:59:21.280 |
And so, my only option for people is just find them at the hackathons. Like, you know, the most 00:59:27.040 |
self-driven, motivated people who can work on things quickly and ship fast are already in 00:59:31.680 |
hackathons, and just go through the list of winners. And then, self-interestedly, you know, 00:59:37.120 |
if, for example, someone's hosting an AI conference from June 25th to June 27th in San Francisco, 00:59:43.120 |
you might want to show up there and see who might be available. And that is true. Like, 00:59:50.960 |
you know, it's not something I want to advertise to the employers or the people who come, but a lot 00:59:55.360 |
of people change jobs at conferences. This is a known thing. Yeah, of course. But I think it's 01:00:01.040 |
the same as engaging on Twitter, engaging in open source, attending conferences. 100%, this is a 01:00:05.840 |
great way both to find new opportunities if you're a job seeker, find people for your team, if you're 01:00:11.280 |
a hiring manager, but if you come at it too network-y and transactional, that's just gross 01:00:16.560 |
for everyone. Hopefully, we're all people that got into this work largely because we love it, 01:00:21.920 |
and it's nice to connect with other people that have the same, you know, skills and struggle with 01:00:26.480 |
the same problems in their work, and you make genuine connections, and you learn from each other, 01:00:30.880 |
and by the way, from that can come, well, not quite a side effect, but an effect nonetheless, which 01:00:37.760 |
is pairing together people who are looking for opportunities with people who have interesting 01:00:42.000 |
problems to work on. Yeah, totally. Yeah, most important part of employer branding, you know, 01:00:46.880 |
have a great mission, have great teammates, you know, if you can show that off in whatever way 01:00:52.400 |
you can, you'll be starting off on the right foot. On that note, we have been really successful with 01:00:58.480 |
hiring a number of people from targeted job boards, maybe that's the right way of saying it, 01:01:06.400 |
so not some kind of generic Indeed.com or something, not to trash them, but something 01:01:11.680 |
that's a bit more tied to your mission, tied to what you're doing, something which is really 01:01:15.680 |
relevant, something which is going to cut down the search space for what you're looking at, 01:01:19.120 |
what the candidate's looking at, so we're definitely affiliated with the safety, 01:01:25.600 |
effective altruism kind of movement. We've gone to a few EA Globals and have hired people 01:01:33.040 |
effectively through the 80,000 Hours list as well, so, you know, that's not the only reason why people 01:01:38.320 |
would want to join Elicit, but as an example: if you're interested in AI safety or, you know, 01:01:43.760 |
whatever your take is on this stuff, then there's probably something, there's a substack, there's a 01:01:47.600 |
podcast, there's a mailing list, there's a job board, there's something which lets you zoom 01:01:52.320 |
in on the kind of particular take that you agree with. You brought this up, so I have to ask, 01:01:59.680 |
what is the state of EA post-SBF? I don't know if I'm the 01:02:04.000 |
spokesman for that. Yeah, I mean, look, it's still going on, there's definitely a period of reflection 01:02:13.120 |
and licking of wounds and thinking how did this happen. There's been a few conversations with 01:02:18.080 |
people really senior in EA talking about how it was a super difficult time from a personal 01:02:24.880 |
perspective and what is this even all about, and I don't know if this is a good thing that I've done 01:02:29.360 |
and, you know, quite a sobering moment for everyone, I think. But yeah, you know, it's 01:02:34.960 |
definitely still going, the EA Forum is active, we have people from Elicit going to EA Global. 01:02:39.920 |
Yeah, if anything, from a personal perspective, I hope that it helps us spot blowhards and 01:02:45.680 |
charlatans more easily and avoid whatever the kind of massive circumstances were that got us into the 01:02:53.520 |
situation with SBF and the kind of unfortunate fallout from that. If it makes us a bit more 01:02:59.920 |
able to spot that happening, then all for the better. 01:03:05.120 |
Excellent. Cool, I will leave it there. Any last comments about just hiring in general? 01:03:11.280 |
Advice to other technology leaders in AI? You know, one thing I'm trying to do for 01:03:17.200 |
my conference as well is to create a forum for technology leaders to share thoughts, 01:03:22.480 |
right? Like what's an interesting trend? What's an interesting open problem? 01:03:25.440 |
What should people contact you on if they're working on something interesting? 01:03:30.080 |
Yeah, a couple of thoughts here. So firstly, when I think back to how I was when I was in my 01:03:38.320 |
early 20s, when I was at college or university, the maturity and capabilities and 01:03:45.360 |
just kind of general put-togetherness of people at that age now is strikingly different to where 01:03:51.200 |
I was then. And I think this is not because I was especially immature or something when 01:03:58.400 |
I was young. I think I hear the same thing echoed in other people about my age. So the 01:04:04.640 |
takeaway from that is finding a way of presenting yourself to, identifying, and bringing really 01:04:11.760 |
high-capability young people into your organization. I mean, it's always been true, but I 01:04:16.320 |
think it's even more true now. They're kind of more professional, 01:04:24.960 |
more capable, more committed, more driven, have more of a sense of what they're all about than 01:04:30.160 |
certainly I did 20 years ago. So that's the first thing. I think the second thing is in 01:04:35.360 |
terms of the interview process. This is somewhat of a general take, but it definitely applies to 01:04:40.080 |
engineering roles broadly, and I think even more so to AI engineer roles. I really have a strong dislike and distaste 01:04:46.960 |
for interview questions, which are arbitrary and kind of strip away all the context from what it 01:04:53.360 |
really is to do the work. We try to make the interview process at Elicit a simulation 01:04:58.640 |
of working together. The only people that we go into an interview process with are pretty obviously 01:05:05.040 |
extraordinary, really, really capable. They must have done something for them to have moved into 01:05:10.880 |
the proper interview process. It is a check on technical capability in the ways that we've 01:05:16.560 |
described, but it's at least as much them sizing us up. Is this something which is worth my time? 01:05:21.840 |
Is it something that I'm going to really be able to dedicate myself to? So be able to show them 01:05:26.080 |
this is really what it's like working at Elicit. These are the people you're going to work with. 01:05:29.680 |
These are the kinds of tasks that you're going to be doing. This is the sort of environment that we 01:05:33.200 |
work in. These are the tools we use. All that kind of stuff is really, really important from 01:05:36.880 |
a candidate experience perspective, but it also gives us a ton more signal about, you know, 01:05:42.720 |
what is it actually like to work with this person? Not just can they do really well on some kind of 01:05:46.480 |
LeetCode style problem. I think the reason that it bears particularly on the AI engineer role 01:05:51.920 |
is because it is something of an emerging category, if you will. So there isn't a very kind 01:05:59.280 |
of well-established 'do these things' checklist; nobody's written the book yet. Maybe this is the beginning of us 01:06:05.040 |
writing the book on how to get hired as an AI engineer, but that book doesn't exist at the 01:06:09.280 |
moment. Yeah, you know, it's an empirical job as much as any other kind of software engineering. 01:06:17.120 |
It's less about having kind of book learning and more about being able to apply that in a 01:06:20.880 |
real world situation. So let's make the interview as close to a real world situation as possible. 01:06:24.720 |
Adam, any last thoughts? I think you're muted. 01:06:27.680 |
I think it'd be hard to follow that to add on to what James said. 01:06:30.320 |
I do co-sign a lot of that. Yeah, I think this is a really great overview of just the sort of 01:06:38.240 |
state of hiring AI engineers and, honestly, of what AI engineering even is. 01:06:42.960 |
When I was thinking about this as an industrial movement, it was very much around 01:06:49.680 |
the labor market, actually, and these economic forces that give rise to a role like this, 01:06:56.560 |
both on the incentives of the model labs, as well as the demand and supply of engineers and the 01:07:01.760 |
interest level of companies and the engineers working on these problems. So I definitely see 01:07:08.640 |
you guys as pioneers. Thank you so much for putting together this piece, which is something I've been 01:07:13.680 |
seeking for a long time. You even shared your job description, your reading list and your interview 01:07:19.920 |
loop. So if anyone's looking to hire AI engineers, I expect this to be the definitive piece and 01:07:25.680 |
definitive podcast covering it. So thank you so much for taking the time to cover it with me. 01:07:30.640 |
It was fun. Thanks. Yeah, thanks a lot. Really enjoyed the conversation. And I appreciate you 01:07:34.480 |
naming something which we all had in our heads, but couldn't put a label on. 01:07:38.000 |
It was going to be named anyway. So, actually, I never personally say that I coined the 01:07:44.000 |
term because I'm sure someone else used the term before me. All I did was write a popular piece on 01:07:49.200 |
it. All right. So I'm happy to help because I know that it contributed to job creation at a bunch of 01:07:56.720 |
companies I respect, and to how people find each other, which is my whole goal here.