Truly Serverless Infra for AI Engineers - with Erik Bernhardsson of Modal
Chapters
0:00 Introductions
2:08 Erik’s OSS work at Spotify: Annoy and Luigi
6:51 Starting Modal
8:36 Vision for a “postmodern data stack”
11:56 Solving container cold start problems
14:13 Designing Modal’s Python SDK
16:50 Self-Provisioning Runtime
21:50 Truly Serverless Infrastructure
23:11 Beyond model inference
24:30 Tricks to maximize GPU utilization
29:28 Differences in AI and data science workloads
31:10 Modal vs Replicate vs Modular and lessons from Heroku’s “graduation problem”
38:03 Creating Erik’s clone “ErikBot”
42:09 Enabling massive parallelism across thousands of GPUs
44:23 The Modal Sandbox for agents
48:56 Thoughts on the AI Inference War
54:53 Erik’s best tweets
57:52 Why buying hardware is a waste of money
60:20 Erik’s competitive programming background
65:23 Why does Sweden have the best Counter Strike players?
66:18 Never owning a car or TV
66:47 Advice for infrastructure startups
00:00:00.000 |
- Hey everyone, welcome to the Latent Space Podcast. 00:00:22.660 |
And I think you were just making a San Francisco trip 00:00:31.320 |
- Yeah, that's right, we're based in New York. 00:00:34.920 |
to, you know, capital of AI and make a presence. 00:00:48.760 |
Obviously there's a lot more stuff going on here 00:01:03.240 |
So in that sense, like New York is kind of nice. 00:01:06.480 |
it's like five minutes away from my apartment. 00:01:12.420 |
So I'll do a brief bio and then we'll talk about 00:01:14.720 |
anything else that people should know about you. 00:01:17.140 |
So you actually, I was surprised to find this out. 00:01:31.560 |
So I actually studied physics, but I grew up coding 00:01:35.280 |
And then like as I was like thinking about graduating, 00:01:39.480 |
I got in touch with an obscure music streaming startup 00:01:42.480 |
called Spotify, which was then like 30 people. 00:01:47.000 |
why don't I just come and like write a master's thesis 00:01:48.920 |
with you and like I'll do some cool collaborative filtering. 00:01:52.720 |
I sort of, but no one knew anything back then. 00:01:57.960 |
building a prototype of a music recommendation system 00:02:06.920 |
And then, so that was the start of your data career. 00:02:20.640 |
It was a long stint and Spotify was a wild place early on. 00:02:23.680 |
And I mean, the data space is also a wild place. 00:02:36.920 |
And like I was hired to kind of figure out data stuff 00:02:40.240 |
and I started hacking on a recommendation system 00:02:43.560 |
and then got sidetracked in a bunch of other stuff. 00:02:46.240 |
I fixed a bunch of reporting things and set up A/B testing. 00:02:50.520 |
and later got back to music recommendation system. 00:02:52.400 |
And a lot of the infrastructure didn't really exist. 00:02:54.440 |
Like there was like Hadoop back then, which is kind of bad 00:02:57.040 |
and I don't miss it, but spent a lot of time with that. 00:03:00.600 |
As a part of that, I ended up building a workflow engine 00:03:04.720 |
like somewhat like widely ended up being used 00:03:08.600 |
Sort of like, you know, kind of like Airflow, 00:03:11.280 |
I think it did some things better, some things worse. 00:03:22.080 |
like vector database stuff ended up happening. 00:03:30.680 |
I didn't know it was gonna take like nine years 00:03:32.240 |
and then there's gonna suddenly be like 20 startups 00:03:44.040 |
But yeah, that was, yeah, it was a fun seven years 00:03:53.320 |
But like, has anything architecturally changed 00:04:00.760 |
Like I'm actually not following it like super closely. 00:04:05.200 |
some of the best algorithms are still the same 00:04:06.640 |
as like hierarchical navigable small world, whatever. 00:04:11.760 |
I think now there's like product quantization. 00:04:14.560 |
that I haven't really followed super closely. 00:04:17.440 |
I mean, obviously like back then it was like, 00:04:40.880 |
Now, of course, like databases are much better 00:04:43.520 |
in the sense like to support like inserts and updates 00:05:13.000 |
It was just kind of like very like level-headed, 00:05:17.080 |
Like never made any like obvious mistakes or, 00:05:20.240 |
that maybe like in hindsight were like a little, 00:05:25.240 |
But overall, I mean, I think he was a great CEO, 00:05:37.440 |
So then you spent six years as CTO of Better. 00:05:42.380 |
and then you scaled up to like 300 engineers. 00:05:44.720 |
- I joined as a CTO when there was like no tech team 00:05:47.640 |
and yeah, that was a wild chapter in my life. 00:05:55.440 |
but yeah, they kind of collapsed and, you know. 00:05:58.880 |
- Yeah, yeah, it was like a bunch of stories. 00:06:00.560 |
Yeah, I mean, the company like grew from like 10 people 00:06:03.120 |
when I joined to 10,000, now it's back to 1,000. 00:06:05.440 |
But yeah, they actually went public a few months ago, 00:06:10.300 |
So yeah, very kind of interesting six years of my life. 00:06:18.640 |
- Like learning a lot of that, like recruiting. 00:06:20.040 |
I spent all my time recruiting and stuff like that. 00:06:24.560 |
like now in a way, like when I'm building my own startup, 00:06:38.160 |
I took a little bit of a time off during the pandemic. 00:06:45.160 |
And then yeah, Modal took form in my head, took shape. 00:06:50.120 |
and maybe we can sort of trade off questions. 00:06:51.920 |
So the quick history is, started Modal in 2021, 00:06:58.960 |
Last year you just announced your Series A with Redpoint. 00:07:05.120 |
- And so like most people I think were expecting you 00:07:15.000 |
so I come from like, you know, Snowflake, BigQuery, 00:07:18.200 |
you know, Fivetran, Airbyte, that kind of stuff. 00:07:37.120 |
No, yeah, it's like, I started Modal because, 00:07:41.480 |
you know, like in a way, like I work with data 00:07:44.920 |
like every different part of the stack, right? 00:07:46.880 |
Like I thought everything was like business analytics 00:07:51.280 |
like building, you know, training neural networks 00:07:57.120 |
and one of the observations I had when I started Modal 00:08:00.520 |
I just wanted to make, build better tools for data teams. 00:08:03.280 |
And like very, like that's sort of an abstract thing, 00:08:05.440 |
but like I find that the data stack is, you know, 00:08:08.440 |
full of like point solutions that don't integrate well 00:08:11.120 |
and still, when you look at like data teams today, 00:08:14.000 |
you know, like every startup ends up building 00:08:15.800 |
their own internal Kubernetes wrapper or whatever. 00:08:18.480 |
And, you know, all the different data engineers 00:08:20.360 |
and machine learning engineers end up kind of struggling 00:08:29.800 |
Like, 'cause you kind of want to like throw out everything 00:08:41.400 |
like more focused on like the human side of like, 00:08:44.400 |
And like, what is the technology tools that they need? 00:08:46.600 |
And like, you know, drew out a lot of charts of like, 00:08:54.480 |
'cause it kind of sits in like a nice sort of, you know, 00:08:56.680 |
it's like a hub in the graph of like data products. 00:09:00.440 |
But it was kind of hard to like kind of do that in a vacuum 00:09:05.320 |
And I got very interested in like the layers below 00:09:08.480 |
at some point and like, at the end of the day, 00:09:11.120 |
like most people have code that has to run somewhere. 00:09:18.960 |
And in particular, like the thing I always like thought 00:09:21.960 |
I think the best way to measure developer productivity 00:09:31.840 |
And at the innermost loop, it's like running some, 00:09:35.680 |
And like, as soon as you start working with the cloud, 00:09:39.520 |
'cause you have to build a fucking Docker container 00:09:41.000 |
and push it to the cloud and like run it, you know? 00:09:44.640 |
It was like, I just wanna solve that problem. 00:09:54.680 |
And in particular, I was quite focused on data teams 00:09:57.200 |
'cause I think they had a couple of unique needs 00:09:59.440 |
that weren't well served by the infrastructure at that time 00:10:06.160 |
for backend teams, but not so well for data teams. 00:10:09.480 |
And very quickly, I got sucked into like a very deep, 00:10:12.880 |
- Not well for data teams because of burstiness. 00:10:23.920 |
Another thing tends to be like hardware requirements. 00:10:28.320 |
Like you go, you know, data engineers go to like, 00:10:33.200 |
and they're like, "Can we add GPUs to the Kubernetes?" 00:10:35.040 |
They're like, "No, like that's, you know, complex." 00:10:39.720 |
And then like, I mean, I also like data code, 00:10:47.080 |
Like you end up having like a lot of like custom, 00:10:49.160 |
like containers and like environment conflicts. 00:10:51.680 |
And like, so it ends up having a lot of like annoying, 00:10:54.000 |
like it's very hard to set up like a unified container 00:11:00.600 |
because like there's always like packages that break. 00:11:02.440 |
And so I think there's a lot of different reasons, 00:11:05.000 |
why, you know, the technology wasn't well-suited for backend. 00:11:09.840 |
And I think the attitude at that time was often like, 00:11:12.440 |
you know, like you had friction between the data team 00:11:18.240 |
Like, you know, why don't you just like, you know, 00:11:20.800 |
But like, I actually felt like data teams at that point, 00:11:24.440 |
like there's so much, so many people working with data 00:11:28.120 |
like deserve their own tools and their own tool chains. 00:11:32.800 |
So that's sort of like very abstract philosophical reason 00:11:36.600 |
And then I got sucked into this like rabbit hole 00:11:44.840 |
- Yeah, tell people, I think the first time I met you, 00:11:47.280 |
I think you told me some numbers, but I don't remember. 00:11:54.360 |
- Yeah, I mean, like in particular, it was like, 00:11:55.920 |
like how do you, like in order to have that loop, right? 00:12:06.160 |
and maybe like spin up like a hundred containers, 00:12:11.720 |
like from like a developer productivity point of view, 00:12:16.000 |
I wanna take code, I wanna stick it in a container, 00:12:17.600 |
I wanna execute in the cloud and like, you know, 00:12:20.800 |
And when you look at like how Docker works for instance, 00:12:23.440 |
like Docker, you have this like fairly convoluted, 00:12:26.720 |
like very resource inefficient way they, you know, 00:12:30.560 |
you build a container, you upload the whole container 00:12:37.200 |
So like, so I started kind of like, you know, 00:12:39.960 |
going a layer deeper, like Docker is actually like, 00:12:41.760 |
you know, there's like a couple of different primitives, 00:12:46.520 |
And I was like, what if I just take the container runner, 00:12:48.840 |
like runc, and I point it to like my own root file system. 00:12:58.840 |
And that was like the sort of very crude version of Modal. 00:13:00.520 |
It's like, now I can actually start containers very quickly 00:13:03.760 |
because it turns out like when you start a Docker container, 00:13:09.400 |
And like 99% of that is never gonna be consumed. 00:13:13.440 |
like time zone information for like Uzbekistan 00:13:33.840 |
And that actually enabled us to like get to this point 00:13:36.320 |
of like you write code and then you can launch it 00:13:42.320 |
And, you know, there's been many optimizations since then, 00:13:47.160 |
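The idea described in this stretch, starting a container against a root file system that fetches files only when they are actually read, can be sketched in a few lines. This is a toy illustration and not Modal's implementation: `LazyRootFS`, `remote_store`, and the file paths are all hypothetical, and a plain dict stands in for remote image storage.

```python
class LazyRootFS:
    """Toy root file system that fetches a file only on first read."""

    def __init__(self, remote_store):
        self.remote_store = remote_store  # stands in for remote image storage
        self.cache = {}                   # files fetched so far
        self.fetch_count = 0              # number of simulated network fetches

    def read(self, path):
        if path not in self.cache:
            # Simulated network fetch: happens once per file, on demand.
            self.cache[path] = self.remote_store[path]
            self.fetch_count += 1
        return self.cache[path]


# A hypothetical image with many files, most of which a program never touches.
image = {f"/usr/share/zoneinfo/zone{i}": b"tz" for i in range(1000)}
image["/usr/bin/python3"] = b"interpreter"
image["/app/main.py"] = b"print('hi')"

fs = LazyRootFS(image)
fs.read("/usr/bin/python3")
fs.read("/app/main.py")
fs.read("/app/main.py")  # served from the cache, no second fetch

print(f"fetched {fs.fetch_count} of {len(image)} files")
```

The point of the sketch is only the shape of the trade: startup cost scales with the files a program reads, not with the size of the image.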
- Can we talk about the developer experience as well? 00:13:58.760 |
but then you also have a way to define a full container. 00:14:02.000 |
What were kind of the design decisions that went into it? 00:14:06.400 |
And then maybe how much complexity did you then add on 00:14:13.200 |
I almost feel like it's like almost like two products 00:14:25.680 |
like I think your blog was like the self-provisioning runtime 00:14:27.760 |
was like, to me, always like to sort of like the, 00:14:31.920 |
It's like, so I didn't think about like, I want to-- 00:14:39.480 |
- Yeah, well, I don't know, like convergence of minds. 00:14:41.400 |
Like we're thinking, I guess we're like both thinking. 00:14:43.520 |
Maybe you put, I think, better words than like, 00:14:46.280 |
maybe it's something I was like thinking about 00:14:48.440 |
- Yeah, and I can tell you how I was thinking about it 00:14:51.760 |
Like, and like, to me, like what I always wanted to build 00:14:54.000 |
was like, I don't know, like I don't know if you use 00:14:56.120 |
like Pulumi, like Pulumi is like nice, like in the sense, 00:15:03.460 |
Like finally, I can like, you know, put a for loop 00:15:07.680 |
And I think like Modal sort of goes one step further 00:15:10.040 |
in the sense that like, what if you also put the app code 00:15:12.760 |
inside the infrastructure code and like glue it all together 00:15:16.680 |
that defines everything and it's all programmable. 00:15:20.480 |
Like Modal has like zero config, there's no config. 00:15:29.560 |
I often find that so much of like my time was spent 00:15:42.920 |
like different containers, just like a function call. 00:15:46.920 |
in this container and this other function runs 00:15:52.000 |
Then, you know, I can build these applications 00:15:54.200 |
that may span a lot of different environments. 00:16:00.160 |
You just like have this beautiful kind of nice, 00:16:11.600 |
By the way, we keep changing syntax quite a lot 00:16:13.160 |
'cause I think it's still somewhat exploratory, 00:16:20.120 |
- Yeah, and along the way, with this expressiveness, 00:16:31.160 |
on the function decorator, you're like GPU equals, 00:16:38.280 |
And then you get that GPU and like, you know, 00:16:40.840 |
Like you don't have to, you know, go through hoops 00:16:43.200 |
to, you know, start an EC2 instance or whatever. 00:16:50.280 |
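A rough sketch of the pattern being described, where infrastructure requirements sit on a decorator next to the application code. This is plain Python written for illustration, not Modal's SDK: the `function` decorator, the `REGISTRY` dict, the `gpu` and `image` parameter names, and the values `"A100"` and `"torch-image"` are all assumptions.

```python
import functools

REGISTRY = {}  # function name -> resources requested for it (hypothetical)


def function(gpu=None, image="default"):
    """Hypothetical decorator that records what a function needs to run."""
    def decorator(fn):
        REGISTRY[fn.__name__] = {"gpu": gpu, "image": image}

        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # A real platform would ship fn to a container with the requested
            # resources; this sketch simply runs it locally.
            return fn(*args, **kwargs)

        return wrapper
    return decorator


@function(gpu="A100", image="torch-image")
def embed(text):
    return len(text)  # placeholder for real GPU work


@function()
def preprocess(text):
    return text.strip().lower()


# Calling across "environments" looks like an ordinary function call.
result = embed(preprocess("  Hello Modal  "))
print(result, REGISTRY)
```

Because the requirements are ordinary Python values, they can be computed, looped over, or shared between functions, which is the "no config" property discussed above.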
Self-Provisioning Runtimes was I was working at AWS 00:16:53.680 |
and we had AWS CDK, which is kind of like, you know, 00:17:10.680 |
So then you're writing code to define your infrastructure, 00:17:13.720 |
then you're writing code to define your application. 00:17:21.440 |
but like, was it like SAM or Chalice or one of those, 00:17:37.880 |
you know, like historically in order for me to like, 00:17:43.960 |
with developer experience, like, and that's been. 00:18:03.200 |
that you guys had it on your landing page at some point. 00:18:10.840 |
But I definitely got sent a few pitch decks with me, 00:18:16.320 |
This is my first time like kind of putting a name 00:18:21.600 |
for people to just communicate what they're trying to do. 00:18:23.680 |
- Yeah, no, I think it's a beautiful concept, yeah. 00:18:28.440 |
What became more clear in your explanation today 00:18:32.200 |
is that actually you're not that tied to Python. 00:18:34.800 |
- No, I mean, I think that all the like lower level stuff 00:18:40.960 |
and, you know, serving container data and stuff. 00:18:46.640 |
Like one of the benefits of data teams is obviously like, 00:18:51.840 |
I think, you know, if we had focused on other workloads, 00:18:56.080 |
like we've been kind of like half thinking about like CI 00:19:02.080 |
'cause like you also, then you have to be like, 00:19:12.760 |
- I mean, like definitely like in the future, 00:19:16.840 |
JavaScript for sure is the obvious next language, 00:19:28.600 |
I actually am a person who like kind of liked the idea 00:19:39.360 |
But all I saw out of the academic sort of PLT type people 00:19:47.400 |
like one of the core reasons for self-provisioning runtimes 00:20:03.440 |
on the order of like automatic memory management. 00:20:06.280 |
You know, you could sort of make that analogy 00:20:08.160 |
that yeah, you like, maybe you lose some level of control, 00:20:23.480 |
like, you know, it's come in like small increments 00:20:25.520 |
of like, you know, dynamic, like dynamic typing, 00:20:29.880 |
It's not suddenly like for a lot of use cases, 00:20:32.680 |
or better compiler technology or like, you know, 00:20:35.600 |
or new ways to, you know, the cloud or like, you know, 00:20:43.360 |
it's a steadily, you know, it's like, you know, 00:20:48.080 |
like probably 10 X more productive every decade 00:20:54.640 |
we're talking about 10 X or is there a 10,000 X, 00:20:57.400 |
like, you know, improvement in developer productivity. 00:21:04.560 |
Maybe it wasn't even possible in the eighties. 00:21:08.080 |
I think it's going to keep going for the next few decades. 00:21:11.160 |
- Another big thing in the infra 2.0 wishlist 00:21:25.840 |
has always been people really wanted it to be stateful, 00:21:50.600 |
going in the direction of like doing more stateful things 00:21:52.800 |
and working with data and like high IO use cases. 00:21:59.200 |
serendipitous thing that happened like halfway, 00:22:02.920 |
you know, building Modal was like Gen AI started exploding. 00:22:19.480 |
and then it sends back like a tiny piece of information. 00:22:22.480 |
And that turns out to be something like, you know, 00:22:41.400 |
like model inference, like it's like clearly a good fit. 00:22:49.000 |
you know, the initial sort of like killer app 00:23:02.760 |
- Yeah, and this was before you started offering 00:23:12.080 |
to be a very general purpose compute platform, 00:23:14.080 |
like something where you could run everything. 00:23:15.200 |
And I used to call Modal like a better Kubernetes 00:23:22.000 |
yeah, that's like, you know, a year and a half in, 00:23:25.760 |
And like, we were like, well, maybe we should look 00:23:27.640 |
at like some use case, trying to think of use case. 00:23:29.400 |
And that was around the same time stable diffusion came out. 00:23:32.440 |
And yeah, like, I mean, like the beauty of Modal 00:23:35.200 |
is like you can run almost anything on Modal, right? 00:23:37.840 |
Like model inference turned out to be like the place 00:23:39.560 |
where we found initially, well, like clearly this has 00:23:41.880 |
like 10X more ergonomic, like better ergonomics 00:23:53.200 |
What about, you know, end-to-end lifecycle deployment? 00:23:56.120 |
What about, you know, I don't know, real-time streaming? 00:24:04.680 |
I think there's so many things, like kind of going back 00:24:07.320 |
to what I said about like redefining data stack, 00:24:09.200 |
like starting with the foundation of compute, 00:24:12.760 |
like one of the exciting things about Modal is like, 00:24:14.720 |
we've sort of, you know, we've been working on that 00:24:20.560 |
like with just like a better compute primitive 00:24:23.760 |
and also go up the stack and like do all this other stuff 00:24:27.200 |
- Yeah, how do you think about, or rather like, 00:24:30.400 |
I would love to learn more about the underlying 00:24:32.520 |
infrastructure and like how you make that happen 00:24:40.440 |
like you exactly know what you're gonna load in memory one 00:24:43.240 |
and it's kind of like a set amount of compute 00:24:45.040 |
versus inference, just like data is like very bursty. 00:24:56.000 |
You know, like what are like some fun technical challenges 00:24:58.460 |
you solve to make sure you get max utilization 00:25:01.520 |
What we hear from people is like, we have GPUs, 00:25:09.340 |
- What's some of the fun stuff you're working on 00:25:16.480 |
like from a cost perspective, like utilization perspective, 00:25:18.840 |
we've seen, you know, like very, very good numbers. 00:25:21.520 |
And in particular, like it's our ability to start containers 00:25:31.600 |
which means like we can always adjust the sort of capacity, 00:25:33.980 |
the number of GPUs running to the exact, you know, 00:25:38.240 |
And so in many cases, like that actually leads 00:25:42.440 |
we obviously run our things on like the public cloud, 00:25:47.320 |
But in many cases, like users who do inference 00:25:53.880 |
even though we charge a slightly higher price per GPU hour, 00:25:58.480 |
a lot of users like moving their large scale inference 00:26:00.600 |
use cases to Modal, they end up saving a lot of money. 00:26:08.040 |
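The claim here is arithmetic: a higher price per GPU hour can still be cheaper if utilization is much higher. A minimal sketch, with every number invented purely for illustration (the conversation gives no actual prices or utilization figures):

```python
def cost_per_useful_gpu_hour(price_per_hour, utilization):
    """Price paid per hour of GPU time that actually does work."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return price_per_hour / utilization


# Hypothetical: fixed fleet at a lower price but idle most of the time,
# versus autoscaled capacity at a higher price that tracks demand closely.
fixed = cost_per_useful_gpu_hour(price_per_hour=2.00, utilization=0.30)
autoscaled = cost_per_useful_gpu_hour(price_per_hour=3.00, utilization=0.90)

print(f"fixed fleet: ${fixed:.2f} per useful GPU hour")
print(f"autoscaled:  ${autoscaled:.2f} per useful GPU hour")
```

Under these made-up inputs the autoscaled option costs about half as much per useful hour despite the 50% higher list price.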
if you have to constantly adjust the number of machines, 00:26:10.680 |
if you have to start containers, stop containers, 00:26:13.280 |
And that, you know, and starting containers quickly 00:26:17.160 |
I mentioned we had to build our own file system for this. 00:26:21.560 |
We also, you know, built our own container scheduler 00:26:28.160 |
CPU memory checkpointing, so we can take running containers 00:26:31.280 |
and snapshot the entire CPU, like including registers 00:26:34.280 |
and everything, and restore it from that point, 00:26:37.000 |
which means we can restore it from like an initialized state. 00:26:45.120 |
So I think on the inference stuff, on the inference side, 00:26:52.320 |
you can push the frontier of latency versus utilization 00:26:58.240 |
which either ends up being a latency advantage 00:27:02.080 |
On training, it's probably arguably like less 00:27:09.760 |
like, you know, train as much as you can on each machine. 00:27:12.960 |
For that area, like we've seen like, you know, 00:27:17.320 |
But there are always like some interesting use case, 00:27:22.520 |
and they basically like one of the patterns they have 00:27:30.080 |
Like we can start up 100 containers very quickly, 00:27:32.640 |
run a fine tuning training job on each one of them 00:27:35.360 |
that only runs for, I don't know, 10, 20 minutes. 00:27:37.840 |
And then, you know, you can do hyper parameter tuning 00:27:47.320 |
that's a use case we don't support super well, 00:27:50.920 |
you need to have like InfiniBand and all these things. 00:27:52.920 |
And those are things we haven't supported yet, 00:28:03.800 |
There's other cloud providers that do custom kernels 00:28:07.680 |
or are you just given that you're not just an AI 00:28:12.440 |
- Yeah, I mean, I think like we wanna support 00:28:14.120 |
like a generic, like general workloads in a sense 00:28:16.200 |
that like we want users to give us a container essentially, 00:28:18.320 |
or code, and then we wanna run that. 00:28:20.960 |
So I think, you know, we benefit from those things 00:28:27.320 |
we can tell our users, you know, to use those things. 00:28:32.120 |
into users' containers and like do those things automatically 00:28:37.680 |
we wanna be able to take like arbitrary code and execute it. 00:28:41.720 |
we can tell our users to like use those things. 00:29:14.320 |
and like signups and talks and all that stuff. 00:29:18.160 |
are the ones that actually appealed to engineers. 00:29:20.280 |
And the top usage, the top tool used by far was Modal. 00:29:26.280 |
- Yeah, I mean, it might be also like a terminology question 00:29:29.920 |
Like I've, you know, maybe I'm just like old and jaded, 00:29:32.080 |
but like I've seen so many like different titles. 00:29:36.600 |
I was a data scientist and I was a machine learning engineer 00:29:39.240 |
and then, you know, there was like analytics engineers 00:29:41.280 |
and then it was like an AI engineer, you know? 00:29:43.080 |
So like, to me, it's like, I just like, in my head, 00:29:47.200 |
- Just data, like, or like engineer, you know? 00:29:49.320 |
Like, I don't really, so that's why I've been like, 00:29:52.800 |
But like, of course, like, you know, AI is like, you know, 00:29:55.760 |
like such a massive fraction of our like workloads. 00:29:58.720 |
- It's a different Venn diagram of things you do, right? 00:30:16.440 |
like, we have a lot of users that are like doing stuff 00:30:25.720 |
fire up like a hundred or a thousand containers 00:30:27.960 |
running Chromium and just like render a bunch of webpages 00:30:35.720 |
Like, you know, we have a bunch of users doing that 00:30:37.840 |
or like, you know, in terms of in the realm of biotech, 00:30:44.960 |
to run like large, like mixed integer programming problems, 00:30:47.400 |
like, you know, using Gurobi or things like that. 00:30:49.440 |
So video processing is another thing that keeps coming up. 00:30:52.300 |
Like, you know, let's say you have like petabytes of video 00:31:05.440 |
like Modal is kind of general purpose in that sense. 00:31:11.160 |
and then we'll move on to the other use cases 00:31:13.160 |
of sort of for AI that you want to highlight. 00:31:16.160 |
The other big player in my mind is Replicate. 00:31:22.840 |
They're much more, I guess, custom built for that purpose, 00:31:32.840 |
or are you just competing head-on? 00:31:34.880 |
- I think there's like a tiny sliver of the Venn diagram 00:31:39.520 |
and then like 99% of the area we're not competitive. 00:31:47.240 |
I think that's where like really they found good fit. 00:31:48.880 |
It's like, you know, people who built some cool web app 00:31:52.940 |
and they just, you know, an off the shelf model 00:31:56.520 |
That's like use Replicate, that's great, right? 00:31:59.400 |
Like, I think where we shine is like custom models 00:32:05.400 |
We need to care about utilization, care about costs. 00:32:12.680 |
And, you know, and that's where we're competitive, right? 00:32:14.560 |
Like, you know, and you look at some of our use cases, 00:32:17.960 |
Like they're running like large scale like AI. 00:32:26.880 |
like custom code and custom weights, you know, 00:32:31.100 |
You know, those are the types of use cases that we like, 00:32:40.320 |
Like I think they focus on a very different part 00:32:51.680 |
- No, no, well, no, but yes, the name is very similar. 00:32:54.880 |
I think there's something that might be insightful there 00:33:00.120 |
But no, they have Mojo, the sort of Python SDK. 00:33:03.480 |
And then they have the Modular Inference Engine, 00:33:09.160 |
I don't know if anyone's made the comparison to you before, 00:33:12.520 |
but I see you evolving a little bit in parallel there. 00:33:18.120 |
Like it's not a company I'm like super like familiar, 00:33:21.640 |
but like, I guess they're similar in the sense 00:33:26.920 |
- Yes, they also wanna build very general purpose. 00:33:31.280 |
as like, if you wanna do off the shelf stuff, 00:34:05.680 |
So anyway, I would just make that comparison, 00:34:09.840 |
but it's an interesting way to see the cloud market develop 00:34:17.720 |
and I think your vision is like something slightly different 00:34:21.480 |
and I'd like to see the different takes on it. 00:34:25.440 |
like I've written a bit about it in my blog too, 00:34:27.160 |
it's like I think of us as like a second layer 00:34:30.600 |
I think Snowflake is like kind of a good analogy. 00:34:35.920 |
But they actually run on the like major clouds, right? 00:34:38.520 |
And I mean, like you can like analyze this very deeply, 00:34:41.240 |
but like one of the things I always thought about 00:34:42.400 |
is like why did Snowflake already like win over Redshift? 00:35:01.760 |
a layer up from, you know, the traditional like public clouds 00:35:04.760 |
and in that layer, that's also where I would put Modal. 00:35:08.320 |
It's like, you know, we're building a cloud provider. 00:35:09.880 |
Like we're, you know, we're like a multi-tenant environment 00:35:14.200 |
but also building on top of the public cloud. 00:35:15.720 |
So I think there's a lot of room in that space. 00:35:17.480 |
I think it's very sort of interesting direction. 00:35:30.120 |
- Yeah, I mean, I think those are all like great. 00:35:32.560 |
Like, I think the problem that they all faced 00:35:39.040 |
like also like Heroku, there's like a counterfactual future 00:35:52.560 |
that you couldn't really justify running in Heroku. 00:35:54.880 |
They would just go and like move it to, you know, 00:36:01.520 |
Like, what does that graduation risk look like for Modal? 00:36:13.640 |
is you have to appeal to the entire spectrum, right? 00:36:17.920 |
like you have to capture the enterprise market. 00:36:24.320 |
I don't know, like Datadog or Mongo or something like that, 00:36:26.040 |
where like they both captured like the hobbyists, 00:36:38.400 |
in my opinion, like Heroku's struggle was like, 00:37:03.040 |
needs like thousands of GPUs at very, you know, 00:37:05.600 |
like just because we can drive utilization so much better, 00:37:09.520 |
like we, there's actually like a cost advantage 00:37:13.080 |
But yeah, I mean, it's certainly like, you know, 00:37:15.680 |
and then like the fact that VCs like love, you know, 00:37:28.640 |
So in training, I think there's certainly like 00:37:30.000 |
better economics of like buying big clusters. 00:37:32.320 |
But I mean, my hope is it's gonna change, right? 00:37:36.520 |
Like I think, you know, we're still pretty early 00:37:38.320 |
in the cycle of like building AI infrastructure. 00:37:41.560 |
And, you know, I think a lot of these companies 00:37:52.200 |
But like everyone else, like to some extent, you know, 00:37:54.880 |
I think they're better off like buying platforms. 00:37:57.280 |
And, you know, someone's gonna have to build those platforms. 00:38:08.360 |
You already said that Ramp is like fine tuning 00:38:11.080 |
a hundred models at once simultaneously on Modal. 00:38:14.400 |
Closer to home, my favorite example is ErikBot. 00:38:20.840 |
- Yeah, I mean, it was a prototype thing we built for fun, 00:38:25.200 |
Like we basically built this thing that you can, 00:38:30.960 |
and, you know, fine tunes a model based on a person. 00:38:34.640 |
And so you can like, you know, clone yourself 00:38:41.840 |
Like there's a Modal app that does everything, right? 00:38:54.120 |
and like, you know, a few hundred lines of code. 00:38:55.360 |
So I think it's sort of a good kind of use case for Modal, 00:39:07.000 |
- It definitely captured the like, the language. 00:39:10.160 |
Yeah, I mean, I don't know, like the content. 00:39:22.880 |
it's like, yeah, this seems really smart, you know? 00:39:25.280 |
But then you actually like look a little bit deeper. 00:39:30.280 |
And that's like kind of what I felt like, you know, 00:39:33.840 |
Like it like says like things like the grammar is correct. 00:39:36.520 |
Like some of the sentences make a lot of sense, 00:39:42.920 |
I mean, it's like, I got that feeling also with ChatGPT 00:39:48.560 |
Yeah, I built this thing called smol podcaster 00:39:50.120 |
to automate a lot of our back office work, so to speak. 00:40:02.720 |
but it's like, it's not even the same ballpark 00:40:07.600 |
And it's hard to see how it's gonna get there. 00:40:18.120 |
I don't know if you've read like AI generated books, 00:40:20.560 |
like they just like kind of seem funny, right? 00:40:26.440 |
Like looks correct, but then it's like very weird 00:40:36.200 |
If you go to modal.com, there's a button in the footer. 00:40:41.000 |
And then sometimes, I really like picking ErikBot, 00:40:52.560 |
just broadening out from like the single use case 00:40:54.840 |
of fine tuning, like what are you seeing people do 00:41:01.960 |
- Yeah, I mean, I think language models is interesting 00:41:08.200 |
and that's, you know, they're just dominating a space 00:41:16.040 |
but like it's just not like a core focus for us. 00:41:19.280 |
it's sort of a question if like there's economics 00:41:21.640 |
But like, so we tend to focus on more like the areas 00:41:25.560 |
Like fine tuning, like another use case we have 00:41:38.040 |
and like parallelize embeddings in 15 minutes 00:41:52.760 |
or things like that we have, you want more control. 00:41:56.640 |
Like those are the things like we see a lot of users 00:42:12.680 |
because I think people don't understand how parallel. 00:42:15.440 |
So like, I think your classic hello world with Modal 00:42:18.160 |
is like some kind of Fibonacci function, right? 00:42:26.760 |
at least like a hundred GPUs, like in a few seconds. 00:42:28.880 |
And, you know, if you give it like a couple of minutes, 00:42:39.000 |
many thousands of GPUs at certain points when we need it, 00:42:42.840 |
or some customers had very large compute needs. 00:42:45.680 |
And I mean, that's super useful for a number of things. 00:42:51.000 |
so one of my early interactions with Modal as well 00:42:56.880 |
The reason I chose Modal was a number of things. 00:43:07.760 |
you could have that sort of local development experience 00:43:12.320 |
but then it would seamlessly translate to a cloud service 00:43:17.760 |
And then it could fan out with concurrency controls. 00:43:21.920 |
the number of times I hit the GPT-3 API at the time 00:43:26.640 |
was gonna be subject to the rate limit from there. 00:43:38.440 |
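[Editor's sketch] The fan-out with concurrency controls described here can be illustrated in plain Python. This is a minimal sketch under stated assumptions, not Modal's SDK: `call_api`, `fan_out`, and `run_fan_out` are hypothetical names, and the sleep stands in for a rate-limited API request. A semaphore caps how many calls are in flight at once.

```python
import asyncio


async def call_api(item: int, limiter: asyncio.Semaphore, stats: dict) -> int:
    # Hypothetical stand-in for one rate-limited API request.
    async with limiter:  # at most max_concurrency coroutines pass this point at once
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(0.01)  # simulate network latency
        stats["in_flight"] -= 1
        return item * 2


async def fan_out(items, max_concurrency: int):
    limiter = asyncio.Semaphore(max_concurrency)
    stats = {"in_flight": 0, "peak": 0}
    # gather preserves input order in its result list
    results = await asyncio.gather(*(call_api(i, limiter, stats) for i in items))
    return results, stats["peak"]


def run_fan_out(items, max_concurrency: int):
    # Synchronous entry point for scripts and tests.
    return asyncio.run(fan_out(items, max_concurrency))
```

Calling `run_fan_out(range(20), 5)` returns the doubled inputs in order, along with the peak number of simultaneous calls, which the semaphore keeps at or below 5.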
- Yeah, there's a lot of control there, yeah. 00:43:40.120 |
So like, I just wanted to highlight that to people as like, 00:43:41.840 |
yeah, this is a pretty good use case for like, 00:43:43.960 |
you know, just like writing this kind of LLM application code 00:43:48.440 |
inside of this environment that just understands 00:43:55.720 |
You don't actually have an exposed queue system, 00:44:04.880 |
- So the last part of Modal I wanted to touch on, 00:44:10.960 |
was the sandbox that was introduced last year. 00:44:15.320 |
And this is something that I think was inspired 00:44:18.960 |
You can tell me the longer history behind that. 00:44:21.080 |
- Yeah, like we originally built it for the use case. 00:44:28.160 |
and then they wanted, they came to us and asked us, 00:44:33.040 |
And yeah, we spent a lot of time on like container security. 00:44:40.360 |
So we built a product where you can basically 00:44:44.360 |
and monitor its output, or get it back in a safe way. 00:44:49.000 |
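[Editor's sketch] The idea of running someone else's code and getting its output back can be sketched with the Python standard library. This is an assumption-laden illustration, not the Modal Sandbox API: `run_untrusted` is a hypothetical helper, and a child process on its own is not a security boundary. Real isolation needs the container-level work discussed in the conversation.

```python
import subprocess
import sys


def run_untrusted(code: str, timeout_s: float = 5.0) -> dict:
    """Run a code string in a separate Python process and capture its output.

    NOTE: this only separates processes and enforces a time limit.
    It does not restrict filesystem or network access.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores user site and env vars
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising on timeout
        return {"ok": False, "stdout": "", "stderr": "timed out", "returncode": None}
    return {
        "ok": proc.returncode == 0,
        "stdout": proc.stdout,
        "stderr": proc.stderr,
        "returncode": proc.returncode,
    }
```

The caller gets a plain dictionary back, so it can inspect stdout, stderr, and the exit code without ever importing or executing the submitted code in its own process.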
I mean, over time, it's evolved into more of like, 00:44:59.720 |
where I think the core container infrastructure we offer 00:45:09.800 |
like we're talking to a couple of like other companies 00:45:13.840 |
that want to run, you know, through their packages, 00:45:21.440 |
So that's actually the direction like Sandbox is going. 00:45:23.400 |
It's like turning into more like a platform for platforms 00:45:25.680 |
is kind of what I've been thinking about it as. 00:45:26.920 |
- Oh boy, platform, that's the old Kubernetes line. 00:45:31.880 |
like having that ability to like programmatically, 00:45:36.280 |
you know, create containers and execute them, 00:45:40.520 |
And I think it opens up a lot of interesting capabilities 00:45:43.360 |
that are sort of separate from the like core Python SDK 00:45:54.840 |
And people are starting to build like kind of crazy things. 00:45:57.720 |
And then, you know, we double down on some of those things 00:46:13.120 |
- Can you be more specific about what you're doubling down on 00:46:13.120 |
- Yeah, I mean, we're working with like some companies 00:46:20.080 |
that, I mean, without getting into specifics, 00:46:24.440 |
like that need the ability to take their user's code 00:46:35.360 |
like they just want to use Modal as a backend, right? 00:46:35.360 |
Like they may already provide like Kubernetes as a backend, 00:46:41.360 |
and now they want to add Modal as a backend, right? 00:46:41.360 |
to programmatically define jobs on behalf of their users 00:46:49.240 |
And so I don't know, that's kind of abstract, 00:47:05.240 |
called it sort of functions as a service as a service. 00:47:21.400 |
compute provider like yourself should provide. 00:47:30.240 |
They'd rather build on top of you than compete with you. 00:47:32.920 |
Like the more interesting thing for me is like, 00:47:50.760 |
And I think there's some really interesting people, 00:48:08.000 |
- Obviously there's like safety considerations. 00:48:10.080 |
Maybe you have an API to like restrict access to the web. 00:48:10.080 |
the network restrictions, I think, make a lot of sense. 00:48:33.040 |
like I think there's a lot of interesting use cases 00:48:37.080 |
can like decide I want to install these packages 00:48:40.120 |
And like, obviously, for a lot of those use cases, 00:48:50.320 |
is like it lets you do that in a relatively safe way. 00:48:54.280 |
Do you have any thoughts on the inference wars? 00:48:59.200 |
So a lot of providers are just rushing to the bottom 00:49:08.720 |
There's like the physics of it just don't work out 00:49:20.680 |
versus using lower prices as kind of like a wedge 00:49:32.400 |
- I mean, we focus more on like custom models 00:49:35.200 |
And I think in that space, there's like less competition. 00:49:38.200 |
And I think we can, you know, have a pricing markup, right? 00:49:41.920 |
Like, you know, people will always compare our prices 00:49:44.280 |
to like, you know, the GPU power they can get elsewhere. 00:49:56.120 |
Like the switching costs of LLMs is zero, right? 00:49:58.600 |
Like if all you're doing is like straight up, 00:50:08.560 |
and, you know, some other provider comes along 00:50:12.200 |
So I don't know, to me, that reminds me a lot of like, 00:50:15.040 |
all this like 15 minute delivery wars or like, you know, 00:50:18.040 |
like Uber versus Lyft or DraftKings versus FanDuel, 00:50:18.040 |
or like, maybe that's not, but like, you know, 00:50:29.040 |
like, I think I thought a lot about like fiber optics boom 00:50:32.440 |
of like 98, 99, like the other day, or like, you know, 00:50:35.640 |
and also like the over-investment in GPU today. 00:50:50.400 |
Like, someone's like reaping the value of this. 00:50:54.160 |
And that's, I think, an amazing flip side is that, 00:50:56.720 |
you know, we should be very grateful, you know, 00:50:58.480 |
the fact that like VCs wanna subsidize these things, 00:51:01.480 |
which is, you know, like you go back to the fiber optics, 00:51:03.640 |
like there's the extreme like over-investment 00:51:10.960 |
But consumers, you know, got tremendous benefits 00:51:14.800 |
of all the fiber optics cables that were laid, 00:51:14.800 |
you know, throughout the country in the decades after. 00:51:20.880 |
I feel something similar about like GPUs today, 00:51:23.680 |
and also like specifically looking more narrowly 00:51:25.560 |
at like LLM inference market, like that's great. 00:51:25.560 |
Like, you know, I'm very happy that, you know, 00:51:32.680 |
Modal is like not necessarily like participating 00:51:36.120 |
Like, I think, you know, it's gonna shake out 00:51:39.040 |
and then they're gonna raise prices or whatever. 00:51:45.560 |
like we're not hyper focused on like serving, 00:51:49.320 |
like here's an endpoint to an open source model. 00:51:49.320 |
We think the value in Modal comes from all these, 00:52:03.640 |
like outside of LLMs, like we focus a lot more 00:52:08.480 |
'cause that's where there's a lot more proprietary models, 00:52:16.040 |
there's a lot of value in software differentiation. 00:52:20.320 |
developer productivity, that's where I think, 00:52:22.360 |
you know, you can have more of a competitive moat. 00:52:22.360 |
and then, you know, the VC money dries up in a year 00:52:49.400 |
but you also cannot really kill your margins 00:52:49.400 |
So I don't know what that's gonna look like, but. 00:52:58.800 |
'cause like GPU prices have to drop eventually, right? 00:53:04.560 |
I still think like prices may not go up that much, 00:53:17.120 |
Like some companies are gonna have to make money 00:53:19.520 |
Otherwise, like they're not gonna provide the service, 00:53:25.160 |
and one or two or three providers make money. 00:53:27.760 |
- Yeah, what else is maybe underrated in Modal, 00:53:27.760 |
something that people don't talk enough about 00:53:35.120 |
or yeah, that we didn't cover in the discussion? 00:53:45.360 |
Working on a lot of like, trying to figure out like, 00:53:49.320 |
like kind of thinking more about the roadmap, 00:53:50.720 |
but like one of the things I'm very excited about 00:53:56.680 |
And so like we're building some like crude stuff right now 00:53:59.680 |
where like you can like create like direct TCP tunnels 00:54:01.800 |
to containers and that lets you like pipe data. 00:54:03.840 |
And like, you know, we haven't really explored this 00:54:06.920 |
but like there's a lot of interesting applications. 00:54:08.560 |
Like you can actually do like kind of real-time video stuff 00:54:11.160 |
in Modal now because you can like create a tunnel to, 00:54:11.160 |
You can create a raw TCP socket to a container, 00:54:17.560 |
feed it video and then like, you know, get the video back. 00:54:20.440 |
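[Editor's sketch] The raw TCP round trip described here, sending frames to a container and reading processed frames back, can be sketched with standard sockets on localhost. Everything below is an illustrative assumption: `serve_once` is a stand-in for the container side, frames are fixed-size byte strings, and the "processing" just inverts each byte. It does not use Modal's tunnel feature.

```python
import socket
import threading


def recv_exact(conn: socket.socket, n: int) -> bytes:
    # TCP is a byte stream, so loop until a whole frame has arrived.
    buf = bytearray()
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("socket closed before a full frame arrived")
        buf.extend(chunk)
    return bytes(buf)


def serve_once(server_sock: socket.socket, frame_size: int) -> None:
    # Stand-in for the container: read a frame, send back a transformed frame.
    conn, _ = server_sock.accept()
    with conn:
        while True:
            try:
                frame = recv_exact(conn, frame_size)
            except ConnectionError:
                break  # client hung up
            conn.sendall(bytes(255 - b for b in frame))  # invert each byte


def round_trip(frames, frame_size: int):
    server_sock = socket.create_server(("127.0.0.1", 0))  # port 0 picks a free port
    port = server_sock.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(server_sock, frame_size), daemon=True)
    worker.start()
    processed = []
    with socket.create_connection(("127.0.0.1", port)) as client:
        for frame in frames:
            client.sendall(frame)
            processed.append(recv_exact(client, frame_size))
    worker.join(timeout=2)
    server_sock.close()
    return processed
```

Fixed-size framing keeps the sketch short. A real video pipeline would more likely use length-prefixed or codec-level framing.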
And I think like, it's still like a little bit like, 00:54:23.240 |
you know, not fully ergonomically like figured out, 00:54:25.560 |
but I think there's a lot of like super cool stuff. 00:54:34.880 |
I think also like, you know, working with large datasets 00:54:37.000 |
or kind of taking the ability to map and fan out 00:54:44.280 |
Like I think there's a lot of like really cool stuff 00:54:46.360 |
you can do, but this is like, maybe like, you know, 00:54:50.320 |
- Yeah, we can just broaden out from Modal a little bit, 00:54:50.320 |
but you still have a lot of, you have a lot of great tweets. 00:54:57.060 |
So it's very easy to just kind of go through them. 00:55:10.880 |
but like, I think they're great value for money. 00:55:22.360 |
Like compared, you know, I mean, we love AWS and GCP too. 00:55:22.360 |
Like, you know, if you told me like three years ago 00:55:41.920 |
- Yeah, great, great machines, good prices, you know. 00:55:50.280 |
- In Europe, people often talk about Hetzner. 00:55:55.840 |
like we've focused on the main clouds, right? 00:56:01.240 |
I think, I mean, there's definitely a long tail of like, 00:56:09.720 |
And like over time, I think we'll look at those too. 00:56:11.720 |
Like, you know, wherever we can get the right, 00:56:19.720 |
Like, I wouldn't want to try to build like a cloud provider. 00:56:22.520 |
You know, it's just, you just have to be like 00:56:24.360 |
incredibly focused on like, you know, efficiency 00:56:30.680 |
- Yeah, and you can ramp up on any of these clouds 00:56:39.760 |
what Modal does is like programmatic, you know, 00:56:44.400 |
So that's like, what's nice about the clouds is, 00:56:47.440 |
you know, they have relatively like mature APIs 00:56:47.440 |
That makes it easier to work with the big clouds. 00:56:58.360 |
like I think, you know, I also expect the smaller clouds 00:57:00.360 |
to like embrace those things in the long run. 00:57:02.440 |
But also think, you know, we can also probably integrate 00:57:05.360 |
with some of the clouds, like even without that. 00:57:11.640 |
Just like script something that launches instances, 00:57:15.000 |
- Yeah, I think a lot of people are always curious 00:57:18.560 |
about whether or not you will buy your own hardware someday. 00:57:21.880 |
I think you're pretty firm in that it's not your interest. 00:57:25.680 |
But like your story and your growth does remind me 00:57:29.840 |
a little bit of Cloudflare, which obviously, you know, 00:57:39.560 |
- They bootstrapped a lot with like agreements 00:58:09.800 |
like do you really want to tie up that much money 00:58:18.160 |
I favor a more capital efficient way of like, 00:58:23.920 |
we want the sort of margin structure to be sort of like 00:58:27.080 |
100% correlated revenue and COGS in the sense that like, 00:58:27.080 |
we immediately incur a cost of like whatever, 00:58:43.360 |
kind of a nice way, you can scale very efficiently. 00:58:51.920 |
Like over time, we've actually started adding 00:58:54.120 |
pretty significant amount of reservations too. 00:58:56.240 |
So I don't know, like reservation is always like 00:59:00.480 |
Like, I don't know, like, do we really want to be, 00:59:02.400 |
you know, thinking about switches and cooling 00:59:08.800 |
- Yeah, like, is that the thing I want to think about? 00:59:10.960 |
Like, I don't know, like I like to make developers happy, 00:59:14.680 |
like, but I don't think it's gonna happen anytime soon. 00:59:22.200 |
but it's interesting to have the devil's advocate 00:59:27.000 |
The main thing you have to do is be confident 00:59:35.040 |
- And so the moment you have a CTO that tells you, 00:59:38.920 |
"No, I think I can make these things last seven years," 00:59:42.600 |
- Yeah, yeah, but you know, are you deluding yourself then? 00:59:50.480 |
Like they had all this like accounting scandal 00:59:55.280 |
like was, they like started assuming their garbage trucks 01:00:03.720 |
You know, the stock went to like, you know, up like, 01:00:09.280 |
and like you can't really depreciate them over 10 years. 01:00:13.360 |
they had to restate all the earnings and leaves. 01:00:23.800 |
which is the International Olympiad in Informatics. 01:00:31.120 |
and like going to change competitive programming? 01:00:33.800 |
Like, do you think people still love the craft? 01:00:49.480 |
- I mean, maybe, but like, I don't know, like, 01:01:06.440 |
there's like probably thousand times more developers today 01:01:11.120 |
and every year there's more and more developers. 01:01:32.000 |
and also being while at the same time being more productive. 01:01:34.560 |
Like, I never understood this, like, you know, 01:01:49.480 |
And that's, I think, the story of software in the world 01:01:52.920 |
So, I mean, I don't know how this like relates 01:02:19.240 |
And, you know, it never loses its grip on them. 01:02:30.280 |
kind of battle off with like other smart kids 01:02:52.720 |
Like, but, although we actually had an intern 01:02:56.000 |
Gold Medal is like the top 20, 30 people roughly. 01:03:12.200 |
- Yeah, I mean, I think humans are the root cause 01:03:17.160 |
Like, you know, bad code is because it's bad human 01:03:21.400 |
So like, I think, you know, like talent density 01:03:23.640 |
is very important and like keeping the bar high 01:03:33.520 |
but we actually end up having a lot of like hard, 01:03:37.960 |
Like, you know, I talked about like the cloud, 01:03:48.280 |
like constantly optimizing how we allocate cloud resources. 01:03:51.440 |
There's a lot of like interesting and like complex, 01:03:55.280 |
how do you do all the bin packing of all the containers? 01:04:05.720 |
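[Editor's sketch] For readers unfamiliar with bin packing, here is a minimal first-fit-decreasing heuristic. It is a textbook approximation offered as an illustration, not a description of Modal's scheduler. The function name, the single resource dimension, and the example numbers are all assumptions. Real container placement juggles several resources (CPU, memory, GPU) at once.

```python
def first_fit_decreasing(demands, capacity):
    """Pack resource demands onto as few equal-capacity machines as the heuristic finds.

    Returns a list of machines, each a list of the demands placed on it.
    """
    machines = []  # each entry: {"free": remaining capacity, "items": placed demands}
    for demand in sorted(demands, reverse=True):  # place the largest demands first
        if demand > capacity:
            raise ValueError(f"demand {demand} exceeds machine capacity {capacity}")
        for machine in machines:
            if machine["free"] >= demand:  # first machine with enough room wins
                machine["items"].append(demand)
                machine["free"] -= demand
                break
        else:
            # No existing machine fits, so start a new one.
            machines.append({"free": capacity - demand, "items": [demand]})
    return [machine["items"] for machine in machines]
```

With demands `[4, 8, 1, 4, 2, 1]` and capacity 10, the heuristic fills two machines exactly: `[[8, 2], [4, 4, 1, 1]]`.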
- Yeah, and they don't necessarily have to know 01:04:08.880 |
They just need to be very good at algorithms. 01:04:11.920 |
- No, but my feeling is like people who are like 01:04:16.000 |
they can also pick up like other stuff like elsewhere. 01:04:22.160 |
- Yeah, oh yeah, I'm just, I'm interested in that 01:04:24.640 |
just because, you know, like there's competitive 01:04:31.000 |
like competitive speed memorization or whatever. 01:04:34.520 |
And like, you don't really see those transfer. 01:04:39.920 |
that competitive programming is so specialized, 01:04:57.520 |
but just because like it sort of filters for the, 01:04:59.120 |
you know, people who are like willing to go very deep 01:05:06.600 |
a lot of good developers are like talented musicians. 01:05:14.880 |
Like you have to like just hyper focus on something 01:05:21.760 |
- Sweden also had a lot of very good Counter-Strike players. 01:05:24.560 |
I don't know, why does Sweden have fiber optics 01:05:37.760 |
I remember getting online and people in the Nordics 01:05:53.000 |
a bunch of tax rebates for like buying computers. 01:05:55.080 |
And I think there was similar like investments 01:05:58.920 |
I mean, like, and I think like I always think about, 01:06:01.320 |
I still can't use my phone in the subway in New York. 01:06:04.680 |
and that was something I could use in Sweden in '95. 01:06:07.640 |
You know, we're talking like 40 years almost, right? 01:06:11.800 |
And I don't know, like I think certain infrastructure, 01:06:14.000 |
you know, Sweden was just better at, I don't know. 01:06:33.520 |
I mean, like we have, like me and my wife has a car, 01:06:39.840 |
- I mean, it's in her name 'cause I don't have a driver's license. 01:06:39.840 |
like the last thing I had on this list was, you know, 01:06:53.280 |
your advice to people thinking about running some 01:06:55.320 |
sort of run code in the cloud startup is only do it 01:06:58.120 |
if you're genuinely excited about spending five years 01:07:02.680 |
So basically like, it sounds like you're summing up 01:07:07.400 |
I mean, like, that's like, like one thing I struggle with, 01:07:10.400 |
like I talked to a lot of people starting companies 01:07:12.840 |
in the data space or like AI space or whatever, 01:07:15.440 |
and they could have sort of come at it at like, 01:07:17.440 |
as, you know, from like an application developer 01:07:19.320 |
point of view, and they're like, I'm gonna make this better. 01:07:21.320 |
But like, guess how you have to make it better? 01:07:25.360 |
And so, and so one of my frustrations has been like, 01:07:34.960 |
And I think, you know, every startup is a wrapper 01:07:37.120 |
to some extent, but like, you need to be like a fat wrapper. 01:07:39.200 |
You need to like go deep and like build some stuff. 01:07:41.240 |
And that's like, you know, if you build a tech company, 01:07:43.280 |
you're gonna wanna have, you're gonna have to spend, 01:07:49.320 |
building the infrastructure you need in order to like, 01:07:51.480 |
make your product truly stand out and be competitive. 01:07:54.360 |
And so, you know, I think that goes for everything. 01:07:56.280 |
I mean, like you're starting a whatever, you know, 01:07:59.000 |
online retailer of, I don't know, bathroom sinks, 01:08:05.440 |
you have to be willing to spend 10 years of your life 01:08:07.840 |
thinking about, you know, whatever, bathroom sinks. 01:08:17.820 |
It's pretty exciting to watch, and it's just the beginning. 01:08:22.080 |
And everyone should sign up and try out Modal, modal.com.