
Truly Serverless Infra for AI Engineers - with Erik Bernhardsson of Modal


Chapters

0:00 Introductions
2:08 Erik’s OSS work at Spotify: Annoy and Luigi
6:51 Starting Modal
8:36 Vision for a “postmodern data stack”
11:56 Solving container cold start problems
14:13 Designing Modal’s Python SDK
16:50 Self-Provisioning Runtime
21:50 Truly Serverless Infrastructure
23:11 Beyond model inference
24:30 Tricks to maximize GPU utilization
29:28 Differences in AI and data science workloads
31:10 Modal vs Replicate vs Modular and lessons from Heroku’s “graduation problem”
38:03 Creating Erik’s clone “ErikBot”
42:09 Enabling massive parallelism across thousands of GPUs
44:23 The Modal Sandbox for agents
48:56 Thoughts on the AI Inference War
54:53 Erik’s best tweets
57:52 Why buying hardware is a waste of money
60:20 Erik’s competitive programming background
65:23 Why does Sweden have the best Counter-Strike players?
66:18 Never owning a car or TV
66:47 Advice for infrastructure startups

Whisper Transcript

00:00:00.000 | - Hey everyone, welcome to the Latent Space Podcast.
00:00:02.840 | This is Alessio, partner and CTO-in-residence
00:00:04.640 | at Decibel Partners.
00:00:06.280 | And I'm joined by my co-host, Swyx,
00:00:08.120 | founder of Smol AI.
00:00:09.480 | - Hey, and today we have in the studio
00:00:11.440 | Erik Bernhardsson from Modal, welcome.
00:00:13.440 | - Hi, it's awesome being here.
00:00:15.520 | - Yeah, awesome seeing you in person.
00:00:17.280 | I've seen you online for a number of years
00:00:20.320 | as you were building Modal.
00:00:22.660 | And I think you were just making a San Francisco trip
00:00:25.860 | just to see people here, right?
00:00:27.600 | Like I've been to like two Modal events
00:00:30.320 | in San Francisco here.
00:00:31.320 | - Yeah, that's right, we're based in New York.
00:00:32.720 | So I figured sometimes I have to come out
00:00:34.920 | to, you know, capital of AI and make a presence.
00:00:38.040 | - What do you think is the pros and cons
00:00:40.240 | of building in New York?
00:00:41.760 | - I mean, I never built anything elsewhere.
00:00:43.640 | Like I lived in New York last 12 years.
00:00:45.720 | I love the city.
00:00:48.760 | Obviously there's a lot more stuff going on here
00:00:50.240 | and there's a lot more customers
00:00:51.120 | and that's why I'm out here.
00:00:52.440 | I do feel like for me where I am in life,
00:00:54.280 | like I'm a very boring person.
00:00:55.800 | Like I kind of work hard and then I go home
00:00:57.680 | and hang out with my kids.
00:00:58.720 | Like I don't have time to go to like events
00:01:01.520 | and meetups and stuff anyway.
00:01:03.240 | So in that sense, like New York is kind of nice.
00:01:04.840 | Like I walk to work every morning,
00:01:06.480 | it's like five minutes away from my apartment.
00:01:07.880 | It's like very time efficient in that sense.
00:01:09.800 | - Yeah, yeah, it's also a good life.
00:01:12.420 | So I'll do a brief bio and then we'll talk about
00:01:14.720 | anything else that people should know about you.
00:01:17.140 | So you actually, I was surprised to find this out.
00:01:20.680 | You're from Sweden.
00:01:22.720 | You went to college in KTH.
00:01:24.760 | - Yep, yep, Stockholm.
00:01:26.000 | - And your master's was in implementing
00:01:27.760 | a scalable music recommender system.
00:01:29.560 | - Yeah. - I had no idea.
00:01:30.680 | - Yeah, yeah, yeah, yeah.
00:01:31.560 | So I actually studied physics, but I grew up coding
00:01:33.680 | and I did a lot of programming competition.
00:01:35.280 | And then like as I was like thinking about graduating,
00:01:39.480 | I got in touch with an obscure music streaming startup
00:01:42.480 | called Spotify, which was then like 30 people.
00:01:45.660 | And for some reason I convinced them like,
00:01:47.000 | why don't I just come and like write a master's thesis
00:01:48.920 | with you and like I'll do some cool collaborative filtering.
00:01:50.720 | Despite not knowing anything about
00:01:51.880 | collaborative filtering really.
00:01:52.720 | I sort of, but no one knew anything back then.
00:01:54.720 | So I spent six months at Spotify basically
00:01:57.960 | building a prototype of a music recommendation system
00:02:00.720 | and then turned that into a master's thesis.
00:02:02.640 | - Yeah.
00:02:03.480 | - And then later when I graduated,
00:02:04.360 | I joined Spotify full-time.
00:02:05.920 | - Yeah, yeah.
00:02:06.920 | And then, so that was the start of your data career.
00:02:10.320 | You also wrote a couple of popular sort of
00:02:12.240 | open source tools while you were there.
00:02:16.040 | And then you joined, is that correct or?
00:02:18.560 | - No, that's right.
00:02:19.400 | I mean, I was at Spotify for seven years.
00:02:20.640 | It was a long stint and Spotify was a wild place early on.
00:02:23.680 | And I mean, the data space is also a wild place.
00:02:25.760 | I mean, it was like Hadoop cluster
00:02:27.320 | in the like foosball room on the floor.
00:02:30.320 | And so like, it was a lot of crude,
00:02:33.800 | like very basic infrastructure
00:02:35.300 | and I didn't know anything about it.
00:02:36.920 | And like I was hired to kind of figure out data stuff
00:02:40.240 | and I started hacking on a recommendation system
00:02:43.560 | and then got sidetracked in a bunch of other stuff.
00:02:46.240 | I fixed a bunch of reporting things and set up A/B testing.
00:02:49.240 | I started doing like business analytics
00:02:50.520 | and later got back to music recommendation system.
00:02:52.400 | And a lot of the infrastructure didn't really exist.
00:02:54.440 | Like there was like Hadoop back then, which is kind of bad
00:02:57.040 | and I don't miss it, but spent a lot of time with that.
00:03:00.600 | As a part of that, I ended up building a workflow engine
00:03:03.160 | called Luigi, which is like briefly,
00:03:04.720 | like somewhat like widely ended up being used
00:03:07.520 | by a bunch of companies.
00:03:08.600 | Sort of like, you know, kind of like Airflow,
00:03:10.080 | but like before Airflow.
00:03:11.280 | I think it did some things better, some things worse.
00:03:14.680 | I also built a vector database called Annoy,
00:03:16.240 | which is like for a while,
00:03:17.240 | it was actually quite widely used in 2012.
00:03:20.360 | So it was like way before like all this
00:03:22.080 | like vector database stuff ended up happening.
00:03:24.920 | And funny enough, I was actually obsessed
00:03:26.560 | with like vectors back then.
00:03:27.640 | Like I was like, this is gonna be huge.
00:03:28.880 | Like just give it like a few years.
00:03:30.680 | I didn't know it was gonna take like nine years
00:03:32.240 | and then there's gonna suddenly be like 20 startups
00:03:34.080 | doing vector databases in one year.
00:03:36.640 | So it did happen in that sense, I was right.
00:03:39.080 | I'm glad I didn't start a startup
00:03:40.200 | in the vector database space.
00:03:41.760 | I would have started way too early.
00:03:44.040 | But yeah, that was, yeah, it was a fun seven years
00:03:46.880 | at Spotify.
00:03:47.720 | It was a great culture, a great company.
00:03:49.080 | - Yeah, just to take a quick tangent
00:03:50.800 | on this vector database thing,
00:03:51.920 | 'cause we probably won't revisit it.
00:03:53.320 | But like, has anything architecturally changed
00:03:55.440 | in the last nine years?
00:03:57.000 | Or. (laughs)
00:03:59.920 | - I mean, sort of.
00:04:00.760 | Like I'm actually not following it like super closely.
00:04:03.320 | I think, you know, they're like,
00:04:05.200 | some of the best algorithms are still the same
00:04:06.640 | as like hierarchical navigable small world, whatever.
00:04:09.600 | - Exactly, yeah, HNSW.
00:04:11.760 | I think now there's like product quantization.
00:04:13.720 | There's like some other stuff
00:04:14.560 | that I haven't really followed super closely.
00:04:17.440 | I mean, obviously like back then it was like,
00:04:18.800 | you know, Annoy is like very simple.
00:04:20.080 | It's like a C++ library with Python bindings
00:04:22.560 | and you could mmap big files into memory
00:04:25.240 | and like then do some lookups.
00:04:26.320 | And I used like this kind of recursive
00:04:29.920 | like hyperspace splitting strategy,
00:04:32.080 | which is not that good,
00:04:33.400 | but it sort of was good enough at that time.
00:04:36.200 | But I think a lot of like HNSW is still like
00:04:38.840 | what people generally use.
00:04:40.880 | Now, of course, like databases are much better
00:04:43.520 | in the sense like to support like inserts and updates
00:04:45.600 | and stuff like that.
00:04:46.440 | I know I never supported that.
00:04:48.000 | Yeah, it's sort of exciting to finally see
00:04:49.680 | like vector databases becoming a thing.
00:04:51.560 | - Yeah, yeah.
00:04:52.960 | And then maybe one takeaway
00:04:54.120 | on most interesting lesson from Daniel Ek.
00:04:56.960 | - I mean, I think Daniel Ek, you know,
00:05:00.400 | he started Spotify very young.
00:05:01.760 | Like he was like 25, something like that.
00:05:04.560 | I don't know if it was like a good lesson,
00:05:05.400 | but like he, in a way, like,
00:05:07.080 | I think he was very good leader.
00:05:08.800 | Like there was never anything like,
00:05:09.840 | and no scandals or like no,
00:05:11.800 | he wasn't very eccentric at all.
00:05:13.000 | It was just kind of like very like level-headed,
00:05:15.400 | like just like ran the company very well.
00:05:17.080 | Like never made any like obvious mistakes or,
00:05:19.240 | I think it was like a few bets
00:05:20.240 | that maybe like in hindsight were like a little,
00:05:22.240 | you know, like took us, you know,
00:05:23.800 | too far in one direction or another.
00:05:25.240 | But overall, I mean, I think he was a great CEO,
00:05:27.480 | like definitely, you know, up there,
00:05:29.200 | like generational CEO,
00:05:30.640 | at least for like Swedish startups.
00:05:32.120 | - Yeah, yeah, for sure.
00:05:33.800 | Okay, we should probably move to,
00:05:35.760 | make our way to Modal.
00:05:37.440 | So then you spent six years as CTO of Better.
00:05:40.720 | - Yeah.
00:05:41.560 | - You were an early engineer
00:05:42.380 | and then you scaled up to like 300 engineers.
00:05:44.720 | - I joined as a CTO when there was like no tech team
00:05:47.640 | and yeah, that was a wild chapter in my life.
00:05:49.640 | Like the company did very well for a while
00:05:52.400 | and then like during the pandemic.
00:05:53.760 | - Less well.
00:05:54.600 | - Yeah, it was kind of a weird story,
00:05:55.440 | but yeah, they kind of collapsed and, you know.
00:05:57.000 | - Laid off people poorly.
00:05:58.880 | - Yeah, yeah, it was like a bunch of stories.
00:06:00.560 | Yeah, I mean, the company like grew from like 10 people
00:06:03.120 | when I joined to 10,000, now it's back to 1,000.
00:06:05.440 | But yeah, they actually went public a few months ago,
00:06:06.960 | kind of crazy.
00:06:07.800 | They're still around, like, you know,
00:06:08.620 | they're still, you know, doing stuff.
00:06:10.300 | So yeah, very kind of interesting six years of my life.
00:06:14.280 | For non-technical reasons, mostly like.
00:06:16.320 | But yeah, like I managed like 300, 400.
00:06:17.520 | - Management, scaling.
00:06:18.640 | - Like learning a lot of that, like recruiting.
00:06:20.040 | I spent all my time recruiting and stuff like that.
00:06:21.720 | And so managing at scale, it's like nice,
00:06:24.560 | like now in a way, like when I'm building my own startup,
00:06:26.480 | like that's actually something
00:06:27.320 | I like don't feel nervous about at all.
00:06:29.080 | Like I've managed that scale,
00:06:30.040 | like I feel like I can do it again.
00:06:32.200 | It's like very different things
00:06:33.120 | that I'm nervous about as a startup founder.
00:06:34.960 | But yeah, I started Modal three years ago
00:06:36.480 | after sort of, after leaving Better.
00:06:38.160 | I took a little bit of a time off during the pandemic.
00:06:41.680 | But yeah, pretty quickly I was like,
00:06:43.160 | I gotta build something.
00:06:44.040 | I just wanna, you know.
00:06:45.160 | And then yeah, Modal took form in my head, took shape.
00:06:48.880 | - And as far as I understand,
00:06:50.120 | and maybe we can sort of trade off questions.
00:06:51.920 | So the quick history is, started Modal in 2021,
00:06:54.600 | got your seed with Sarah from Amplify in 2022.
00:06:58.960 | Last year you just announced your Series A with Redpoint.
00:07:01.240 | - That's right.
00:07:02.080 | - And that brings us up to mostly today.
00:07:04.000 | - Yeah.
00:07:05.120 | - And so like most people I think were expecting you
00:07:09.280 | to build for the data space.
00:07:11.360 | - But it is the data space.
00:07:12.480 | - It is the data space.
00:07:14.160 | When I think of data space,
00:07:15.000 | so I come from like, you know, Snowflake, BigQuery,
00:07:18.200 | you know, Fivetran, Nearby, that kind of stuff.
00:07:20.160 | - Yeah.
00:07:21.000 | - And so, you know, what Modal became
00:07:24.560 | is more general purpose than that.
00:07:26.240 | - Yeah, yeah.
00:07:27.760 | I don't know, it was like fun.
00:07:28.680 | I actually ran into like Edo Liberty,
00:07:30.080 | the CEO of Pinecone like a few weeks ago.
00:07:31.680 | And he was like, I was so afraid
00:07:33.200 | you were building a vector database.
00:07:34.840 | (laughing)
00:07:37.120 | No, yeah, it's like, I started Modal because,
00:07:41.480 | you know, like in a way, like I work with data
00:07:43.480 | like throughout most of my career,
00:07:44.920 | like every different part of the stack, right?
00:07:46.880 | Like I thought everything was like business analytics
00:07:49.360 | to like deep learning, you know,
00:07:51.280 | like building, you know, training neural networks
00:07:53.440 | to scale, like everything in between, right?
00:07:55.520 | And so one of the thoughts like,
00:07:57.120 | and one of the observations I had when I started Modal
00:07:59.360 | or like why I started was like,
00:08:00.520 | I just wanted to make, build better tools for data teams.
00:08:03.280 | And like very, like that's sort of an abstract thing,
00:08:05.440 | but like I find that the data stack is, you know,
00:08:08.440 | full of like point solutions that don't integrate well
00:08:11.120 | and still, when you look at like data teams today,
00:08:14.000 | you know, like every startup ends up building
00:08:15.800 | their own internal Kubernetes wrapper or whatever.
00:08:18.480 | And, you know, all the different data engineers
00:08:20.360 | and machine learning engineers end up kind of struggling
00:08:22.000 | with the same things.
00:08:23.520 | So I started thinking about like how,
00:08:25.560 | how do I build a new data stack,
00:08:28.160 | which is kind of a megalomaniac project.
00:08:29.800 | Like, 'cause you kind of want to like throw out everything
00:08:32.080 | and start over.
00:08:32.920 | - It's almost a modern data stack.
00:08:34.080 | (laughing)
00:08:34.920 | - Yeah, like a postmodern data stack.
00:08:37.480 | And so I started thinking about that
00:08:39.640 | and a lot of it came from like,
00:08:41.400 | like more focused on like the human side of like,
00:08:43.000 | how do I make data things more productive?
00:08:44.400 | And like, what is the technology tools that they need?
00:08:46.600 | And like, you know, drew out a lot of charts of like,
00:08:49.800 | how the data stack looks, you know,
00:08:51.080 | what are the different components.
00:08:52.360 | And it shows actually very interesting,
00:08:53.520 | like workflow scheduling,
00:08:54.480 | 'cause it kind of sits in like a nice sort of, you know,
00:08:56.680 | it's like a hub in the graph of like data products.
00:09:00.440 | But it was kind of hard to like kind of do that in a vacuum
00:09:03.520 | and also to monetize it to some extent.
00:09:05.320 | And I got very interested in like the layers below
00:09:08.480 | at some point and like, at the end of the day,
00:09:11.120 | like most people have code to have to run somewhere.
00:09:14.080 | And I started thinking about like,
00:09:15.760 | okay, well, how do you make that nice?
00:09:17.960 | Like, how do you make that?
00:09:18.960 | And in particular, like the thing I always like thought
00:09:20.600 | about like developer productivity is like,
00:09:21.960 | I think the best way to measure developer productivity
00:09:24.520 | is like in terms of the feedback loops.
00:09:25.680 | Like how quickly when you iterate,
00:09:28.960 | like when you write code,
00:09:30.480 | like how quickly can you get feedback?
00:09:31.840 | And at the innermost loop, it's like running some,
00:09:33.880 | like writing code and then running it.
00:09:35.680 | And like, as soon as you start working with the cloud,
00:09:37.680 | like it's like, takes minutes suddenly
00:09:39.520 | 'cause you have to build a fucking Docker container
00:09:41.000 | and push it to the cloud and like run it, you know?
00:09:42.800 | So that was like the initial focus for me.
00:09:44.640 | It was like, I just wanna solve that problem.
00:09:46.280 | Like I wanna, you know, build something
00:09:48.720 | that lets you run things in the cloud
00:09:49.800 | and like retain this sort of, you know,
00:09:51.240 | the joy of productivity
00:09:53.040 | as when you're running things locally.
00:09:54.680 | And in particular, I was quite focused on data teams
00:09:57.200 | 'cause I think they had a couple of unique needs
00:09:59.440 | that wasn't well served by the infrastructure at that time
00:10:01.960 | or like still isn't.
00:10:03.400 | Like in particular, like Kubernetes,
00:10:04.560 | I feel like it's like kind of worked okay
00:10:06.160 | for backend teams, but not so well for data teams.
00:10:09.480 | And very quickly, I got sucked into like a very deep,
00:10:11.560 | like rabbit hole of like--
00:10:12.880 | - Not well for data teams because of burstiness.
00:10:15.040 | - Burstiness is one thing, yeah, for sure.
00:10:16.760 | So like burstiness is like one thing, right?
00:10:18.280 | Like when you, like, you know,
00:10:19.920 | like you often have this like fan out,
00:10:21.280 | you wanna like apply some function
00:10:22.600 | over very large data sets.
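That fan-out pattern, applying one function over a large dataset with a burst of parallel workers, is what bursty data workloads look like in miniature. A local sketch using a thread pool, where threads stand in for the hundreds of containers a serverless platform would spin up (`fan_out` is an illustrative name, not a real API):

```python
from concurrent.futures import ThreadPoolExecutor

def fan_out(fn, items, workers=8):
    """Apply fn to every item in parallel and return results in input order --
    the burst-then-idle shape of data-team workloads, scaled down to threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fn, items))

squares = fan_out(lambda x: x * x, range(10))
```

The hard part a platform solves is that the pool here is fixed-size and local; the serverless version has to provision, start, and tear down the workers themselves fast enough that the burst is still worth it.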
00:10:23.920 | Another thing tends to be like hardware requirements.
00:10:25.720 | Like you need like GPUs.
00:10:26.760 | And like, I've seen this in many companies.
00:10:28.320 | Like you go, you know, data engineers go to like,
00:10:31.080 | or data scientists go to a platform team
00:10:33.200 | and they're like, "Can we add GPUs to the Kubernetes?"
00:10:35.040 | They're like, "No, like that's, you know, complex."
00:10:37.240 | And we're not gonna...
00:10:38.080 | Or like, so like just getting GPU access.
00:10:39.720 | And then like, I mean, I also like data code,
00:10:41.880 | like frankly, or like machine learning code,
00:10:43.840 | like tends to be like super annoying
00:10:46.080 | in terms of like environments.
00:10:47.080 | Like you end up having like a lot of like custom,
00:10:49.160 | like containers and like environment conflicts.
00:10:51.680 | And like, so it ends up having a lot of like annoying,
00:10:54.000 | like it's very hard to set up like a unified container
00:10:58.240 | that like can serve like a data scientist
00:11:00.600 | because like there's always like packages that break.
00:11:02.440 | And so I think there's a lot of different reasons,
00:11:05.000 | why, you know, the technology wasn't well-suited for data teams.
00:11:09.840 | And I think the attitude at that time was often like,
00:11:12.440 | you know, like you had friction between the data team
00:11:14.760 | and the platform team.
00:11:15.600 | Like, well, it works for the backend stuff.
00:11:17.360 | Like, why can't you use it?
00:11:18.240 | Like, you know, why don't you just like, you know,
00:11:19.960 | make it work?
00:11:20.800 | But like, I actually felt like data teams at that point,
00:11:23.160 | you know, or at this point now,
00:11:24.440 | like there's so much, so many people working with data
00:11:27.080 | and like they, to some extent,
00:11:28.120 | like deserve their own tools and their own tool chains.
00:11:30.160 | And like optimizing for that
00:11:31.600 | is not something people have done.
00:11:32.800 | So that's sort of like very abstract philosophical reason
00:11:35.760 | why I started Modal.
00:11:36.600 | And then I got sucked into this like rabbit hole
00:11:38.320 | of like container cold start and, you know,
00:11:40.640 | like whatever Linux, page cache, you know,
00:11:43.280 | file system optimizations.
00:11:44.840 | - Yeah, tell people, I think the first time I met you,
00:11:47.280 | I think you told me some numbers, but I don't remember.
00:11:49.360 | Like, what are the main achievements
00:11:50.560 | that you were unhappy with the status quo
00:11:52.320 | and then you built your own container stack?
00:11:54.360 | - Yeah, I mean, like in particular, it was like,
00:11:55.920 | like how do you, like in order to have that loop, right?
00:11:58.600 | Like you wanna be able to start,
00:12:01.080 | like take code on your laptop, whatever,
00:12:03.120 | and like run in the cloud very quickly
00:12:04.800 | and like running in custom containers
00:12:06.160 | and maybe like spin up like a hundred containers,
00:12:08.240 | a thousand, you know, things like that.
00:12:09.520 | And so container cold start was the initial,
00:12:11.720 | like from like a developer productivity point of view,
00:12:13.440 | it was like really what I was focusing on is
00:12:16.000 | I wanna take code, I wanna stick it in container,
00:12:17.600 | I wanna execute in the cloud and like, you know,
00:12:19.040 | make it feel like fast.
00:12:20.800 | And when you look at like how Docker works for instance,
00:12:23.440 | like Docker, you have this like fairly convoluted,
00:12:26.720 | like very resource inefficient way they, you know,
00:12:30.560 | you build a container, you upload the whole container
00:12:32.640 | and then you download it and you run it.
00:12:34.640 | And Kubernetes is also like not very fast
00:12:36.360 | at like starting containers.
00:12:37.200 | So like, so I started kind of like, you know,
00:12:39.960 | going a layer deeper, like Docker is actually like,
00:12:41.760 | you know, there's like a couple of different primitives,
00:12:43.120 | but like a lower-level primitive is runc,
00:12:44.960 | which is like a container runner.
00:12:46.520 | And I was like, what if I just take the container runner,
00:12:48.840 | like run C and I point it to like my own root file system.
00:12:52.520 | And then I built like my own file system
00:12:54.280 | that like virtual file system
00:12:55.600 | that exposes files over network instead.
00:12:58.840 | And that was like the sort of very crude version of Modal.
00:13:00.520 | It's like, now I can actually start containers very quickly
00:13:03.760 | because it turns out like when you start a Docker container,
00:13:05.960 | like first of all, like most Docker images
00:13:08.360 | are like several gigabytes.
00:13:09.400 | And like 99% of that is never gonna be consumed.
00:13:12.000 | Like there's a bunch of like, you know,
00:13:13.440 | like time zone information for like Uzbekistan
00:13:15.800 | or whatever, like no one's gonna read it.
00:13:17.560 | And then there's a very high overlap
00:13:19.240 | between the files that are gonna be read.
00:13:20.640 | There's gonna be like LibTorch or whatever,
00:13:22.000 | like it's gonna be read.
00:13:22.840 | So you can also cache it very well.
00:13:24.120 | So that was like the first sort of stuff
00:13:26.520 | we started working on was like,
00:13:27.520 | let's build this like container file system
00:13:30.320 | and, you know, couple with like, you know,
00:13:31.960 | just using runc directly.
00:13:33.840 | And that actually enabled us to like get to this point
00:13:36.320 | of like you write code and then you can launch it
00:13:38.320 | in the cloud within like a second or two,
00:13:40.360 | like something like that.
00:13:42.320 | And, you know, there's been many optimizations since then,
00:13:44.960 | but that was sort of starting point.
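The mechanism Erik describes, serving container files lazily over the network with caching, keyed so that files shared between images are deduplicated, can be sketched in miniature. This is a toy model of the idea, not Modal's implementation; `LazyFileSystem` and its dict-backed "remote" store are stand-ins for a real network blob store and FUSE-style filesystem:

```python
import hashlib

class LazyFileSystem:
    """Sketch of a lazy container filesystem: file contents are fetched from
    a remote store only on first read, then cached by content hash, so a
    container can start before its multi-gigabyte image has been downloaded
    -- and the 99% of files that are never read are never transferred."""

    def __init__(self, manifest, remote):
        self.manifest = manifest  # path -> content hash
        self.remote = remote      # content hash -> bytes (the "network")
        self.cache = {}           # content hash -> bytes, shared across images
        self.fetches = 0          # how many times we actually hit the remote

    def read(self, path):
        digest = self.manifest[path]
        if digest not in self.cache:  # only touch the network on a cache miss
            self.fetches += 1
            self.cache[digest] = self.remote[digest]
        return self.cache[digest]

# Two paths (say, in two different images) that share one big dependency
# deduplicate by content hash: the second read is free.
blob = b"pretend this is libtorch"
h = hashlib.sha256(blob).hexdigest()
fs = LazyFileSystem({"/app/libtorch.so": h, "/other/libtorch.so": h}, {h: blob})
fs.read("/app/libtorch.so")
fs.read("/other/libtorch.so")  # served from cache, no second fetch
```

Content addressing is what makes the cache pay off across customers and images: LibTorch has the same hash everywhere, so it is fetched once.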
00:13:47.160 | - Can we talk about the developer experience as well?
00:13:50.480 | I think one of the magic things about Modo
00:13:52.560 | is at the very basic layers,
00:13:54.640 | like a Python function decorator,
00:13:56.560 | it's just like stub and whatnot,
00:13:58.760 | but then you also have a way to define a full container.
00:14:02.000 | What were kind of the design decisions that went into it?
00:14:04.080 | Where did you start?
00:14:05.040 | How easy did you want it to be?
00:14:06.400 | And then maybe how much complexity did you then add on
00:14:09.360 | to make sure that every use case fit?
00:14:11.640 | - Yeah, like, I mean, Modal,
00:14:13.200 | I almost feel like it's like almost like two products
00:14:15.400 | kind of glued together.
00:14:16.240 | I mean, like there's like the low level,
00:14:17.280 | like container runtime, like file system,
00:14:18.920 | all that stuff like in Rust.
00:14:19.920 | And then there's like the Python SDK, right?
00:14:22.240 | Like how do you express applications?
00:14:23.840 | And I think, I mean, Swyx,
00:14:25.680 | like I think your blog was like the self-provisioning runtime
00:14:27.760 | was like, to me, always like to sort of like the,
00:14:30.720 | for me, like an eye-opening thing.
00:14:31.920 | It's like, so I didn't think about like, I want to--
00:14:34.040 | - You wrote your post four months before me.
00:14:36.200 | - Yeah?
00:14:37.040 | - The software 2.0, Infra 2.0.
00:14:39.480 | - Yeah, well, I don't know, like convergence of minds.
00:14:41.400 | Like we're thinking, I guess we're like both thinking.
00:14:43.520 | Maybe you put, I think, better words than like,
00:14:46.280 | maybe it's something I was like thinking about
00:14:47.600 | for a long time.
00:14:48.440 | - Yeah, and I can tell you how I was thinking about it
00:14:50.080 | on my end, but I want to hear yours.
00:14:50.920 | - Yeah, yeah, I would love it.
00:14:51.760 | Like, and like, to me, like what I always wanted to build
00:14:54.000 | was like, I don't know, like I don't know if you use
00:14:56.120 | like Pulumi, like Pulumi is like nice, like in the sense,
00:14:58.040 | like it's like Pulumi is like,
00:14:59.280 | you describe infrastructure in code, right?
00:15:01.560 | And to me, that was like so nice.
00:15:03.460 | Like finally, I can like, you know, put a for loop
00:15:05.760 | that creates S3 buckets or whatever.
00:15:07.680 | And I think like Modal sort of goes one step further
00:15:10.040 | in the sense that like, what if you also put the app code
00:15:12.760 | inside the infrastructure code and like glue it all together
00:15:15.200 | and then like you only have one single place
00:15:16.680 | that defines everything and it's all programmable.
00:15:19.120 | You don't have any config files.
00:15:20.480 | Like Modal has like zero config, there's no config.
00:15:23.200 | It's all code.
00:15:24.480 | And so that was like the goal that I wanted,
00:15:26.200 | like part of that.
00:15:27.720 | And then the other part was like,
00:15:29.560 | I often find that so much of like my time was spent
00:15:32.080 | on like the plumbing between containers.
00:15:35.240 | And so my thing was like, well,
00:15:36.840 | if I just build this like Python SDK,
00:15:39.640 | then and make it possible to like bridge
00:15:42.920 | like different containers, just like a function call.
00:15:44.720 | Like, and I can say, oh, this function runs
00:15:46.920 | in this container and this other function runs
00:15:48.640 | in this container and I can just call it
00:15:50.480 | just like a normal function.
00:15:52.000 | Then, you know, I can build this applications
00:15:54.200 | that may span a lot of different environments.
00:15:56.360 | Maybe the fan out start other containers,
00:15:58.560 | but it's all just like inside Python.
00:16:00.160 | You just like have this beautiful kind of nice,
00:16:02.120 | like DSL almost for like, you know,
00:16:04.200 | how to control infrastructure in the cloud.
00:16:06.440 | So that was sort of like how we ended up
00:16:08.200 | with the Python SDK as it is,
00:16:10.320 | which is still evolving all the time.
00:16:11.600 | By the way, we keep changing syntax quite a lot
00:16:13.160 | 'cause I think it's still somewhat exploratory,
00:16:15.880 | but we're starting to converge on something
00:16:18.200 | that feels like reasonably good now.
00:16:20.120 | - Yeah, and along the way, with this expressiveness,
00:16:25.040 | you enabled the ability to, for example,
00:16:27.720 | attach a GPU to a function.
00:16:29.280 | - Totally, yeah.
00:16:30.120 | It's like, you just like say, you know,
00:16:31.160 | on the function decorator, you're like GPU equals,
00:16:33.240 | you know, A100 and then, or like GPU equals,
00:16:35.760 | you know, A10 or T4 or something like that.
00:16:38.280 | And then you get that GPU and like, you know,
00:16:39.680 | you just run the code and it runs.
00:16:40.840 | Like you don't have to, you know, go through hoops
00:16:43.200 | to, you know, start a EC2 instance or whatever.
00:16:46.520 | - Yeah. - So it's all code.
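The design Erik describes, a decorator that attaches infrastructure requirements (container image, GPU type) to an ordinary Python function so one file defines both app and infra, can be sketched like this. This is a toy in the spirit of Modal's SDK, not its real API; the `function` decorator and `REGISTRY` are illustrative names:

```python
import functools

REGISTRY = {}  # function name -> infra spec; what a scheduler would consume

def function(image="python:3.11", gpu=None):
    """Toy self-provisioning decorator: the infrastructure a function needs
    is declared in code, on the function itself, with zero config files."""
    def wrap(fn):
        REGISTRY[fn.__name__] = {"image": image, "gpu": gpu}
        @functools.wraps(fn)
        def call(*args, **kwargs):
            # A real platform would ship fn to a container matching the spec
            # and run it remotely; here we just run it locally.
            return fn(*args, **kwargs)
        return call
    return wrap

@function(image="pytorch/pytorch", gpu="A100")
def embed(text):
    return [len(text)]  # placeholder for real GPU work

result = embed("hello")  # called like a normal function
```

The point of the pattern is that `REGISTRY` gives the platform everything it needs to provision containers, while the call site stays a plain function call, which is what lets calls span containers transparently.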
00:16:47.680 | - Yeah, so on my end, the reason I wrote
00:16:50.280 | Self-Provisioning Runtimes was I was working at AWS
00:16:53.680 | and we had AWS CDK, which is kind of like, you know,
00:16:57.040 | the Amazon basics blew me.
00:16:58.160 | - Yeah, totally.
00:16:59.000 | - And then like, but you know, it creates,
00:17:02.960 | it compiles to CloudFormation.
00:17:04.560 | - Yeah. - And then on the other side,
00:17:05.600 | you have to like get all the config stuff
00:17:07.400 | and then put it into your application code
00:17:08.880 | and make sure that they line up.
00:17:10.680 | So then you're writing code to define your infrastructure,
00:17:13.720 | then you're writing code to define your application.
00:17:15.520 | And I was just like, this is like obvious
00:17:17.560 | that it's gonna convert, right?
00:17:18.480 | - Yeah, totally.
00:17:19.760 | But isn't there, it might be wrong,
00:17:21.440 | but like, was it like Sam or Chalice or one of those,
00:17:23.880 | like, isn't that like an AWS thing
00:17:25.400 | that where actually they kind of did that?
00:17:27.480 | I feel like there's like one problem.
00:17:28.320 | - SAM, yeah, yeah, yeah, yeah.
00:17:30.080 | Still very clunky.
00:17:32.400 | - Okay.
00:17:33.240 | - It's not as elegant as Modal.
00:17:34.800 | - I love AWS for like the stuff it's built,
00:17:37.880 | you know, like historically in order for me to like,
00:17:39.800 | you know, like what it enables me to build.
00:17:42.480 | But like AWS has always like struggled
00:17:43.960 | with developer experience, like, and that's been.
00:17:46.600 | I mean, they have to not break things.
00:17:49.400 | - Yeah, yeah, and totally.
00:17:50.440 | And they have to, you know, build products
00:17:52.000 | for a very wide range of use cases.
00:17:54.400 | And I think that's hard.
00:17:55.240 | - Yeah, yeah, so it's easier to design for.
00:17:57.440 | Yeah, so anyway, I was pretty convinced
00:17:59.920 | that this would happen.
00:18:00.880 | I wrote that thing.
00:18:01.960 | And then, you know, imagine my surprise
00:18:03.200 | that you guys had it on your landing page at some point.
00:18:05.680 | I think Akshat was just like,
00:18:07.200 | I just throw that in there.
00:18:08.480 | - Did you trademark it?
00:18:09.640 | - No, I didn't.
00:18:10.840 | But I definitely got sent a few pitch decks with me,
00:18:13.000 | with like my post on there.
00:18:14.560 | And it was like really interesting.
00:18:16.320 | This is my first time like kind of putting a name
00:18:18.080 | to a phenomenon.
00:18:18.920 | And I think that's a useful skill
00:18:21.600 | for people to just communicate what they're trying to do.
00:18:23.680 | - Yeah, no, I think it's a beautiful concept, yeah.
00:18:25.920 | - Yeah, yeah.
00:18:26.880 | But I mean, obviously you implemented it.
00:18:28.440 | What became more clear in your explanation today
00:18:32.200 | is that actually you're not that tied to Python.
00:18:34.800 | - No, I mean, I think that all the like lower level stuff
00:18:37.720 | is, you know, just running containers
00:18:39.560 | and like scheduling things
00:18:40.960 | and, you know, serving container data and stuff.
00:18:43.480 | So, I mean, I think Python is a great place.
00:18:46.640 | Like one of the benefits of data teams is obviously like,
00:18:48.360 | they're all like using Python, right?
00:18:50.240 | And so that made it a lot easier.
00:18:51.840 | I think, you know, if we had focused on other workloads,
00:18:54.960 | like, you know, for various things,
00:18:56.080 | like we've been kind of like half thinking about like CI
00:18:59.640 | or like things like that.
00:19:00.480 | But like, in a way that's like harder
00:19:02.080 | 'cause like you also, then you have to be like,
00:19:03.840 | you know, multiple SDKs.
00:19:06.560 | Whereas, you know, focusing on data teams,
00:19:08.040 | you can only, you know,
00:19:09.120 | Python like covers like 95% of all teams.
00:19:11.920 | So that made it a lot easier.
00:19:12.760 | - I mean, like definitely like in the future,
00:19:14.360 | we're gonna have other language support,
00:19:15.480 | like supporting other languages.
00:19:16.840 | JavaScript for sure is the obvious next language,
00:19:20.320 | but, you know, who knows?
00:19:21.160 | Like, you know, Rust, Go, R,
00:19:23.640 | like whatever, PHP, Haskell, I don't know.
00:19:26.360 | - Yeah, and, you know, I think for me,
00:19:28.600 | I actually am a person who like kind of liked the idea
00:19:34.520 | of programming language advancements
00:19:36.680 | being improvements in developer experience.
00:19:39.360 | But all I saw out of the academic sort of PLT type people
00:19:43.440 | is just type level improvements.
00:19:45.440 | And I always think like, for me,
00:19:47.400 | like one of the core reasons for self-provisioning runtimes
00:19:51.560 | and then why I like Modal is like,
00:19:51.560 | this is actually a productivity increase.
00:19:53.600 | - Totally.
00:19:54.440 | - Like it's a language level thing.
00:19:55.440 | You know, you managed to stick it
00:19:56.680 | on top of an existing language,
00:19:57.880 | but it is your own language.
00:19:59.240 | - Yeah.
00:20:00.080 | - DSL on top of Python.
00:20:00.920 | - Yeah.
00:20:02.160 | - It's a language level increase
00:20:03.440 | on the order of like automatic memory management.
00:20:06.280 | You know, you could sort of make that analogy
00:20:08.160 | that yeah, you like, maybe you lose some level of control,
00:20:12.240 | but most of the time you're okay
00:20:13.640 | with whatever Modal gives you.
00:20:15.240 | And like, that's fine.
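The "DSL on top of Python" idea above can be sketched in plain Python: a decorator that records infrastructure requirements right next to the function that needs them. This is a toy illustration of the self-provisioning-runtime pattern, not Modal's actual SDK; the decorator name and parameters are made up.

```python
# Toy sketch of a self-provisioning runtime: infra requirements live
# beside application code instead of in a separate config file.
import functools

REGISTRY = {}  # what a real runtime would hand to its scheduler

def function(gpu=None, memory_mb=256, timeout_s=60):
    """Declare resource requirements inline with the code that needs them."""
    def decorator(fn):
        REGISTRY[fn.__name__] = {
            "gpu": gpu, "memory_mb": memory_mb, "timeout_s": timeout_s
        }
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            # A real runtime would ship fn to a container matching the
            # declared requirements; locally we just call it.
            return fn(*args, **kwargs)
        return wrapper
    return decorator

@function(gpu="A100", memory_mb=16384)
def embed(text):
    return len(text)  # stand-in for real model inference

print(embed("hello"))            # 5
print(REGISTRY["embed"]["gpu"])  # A100
```

The point is that the scheduler reads the same source file as the application, so the two can never drift apart the way a separate YAML config can.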
00:20:16.240 | - Yeah, yeah.
00:20:17.400 | I mean, that's how I look at it too.
00:20:18.840 | Like, I think, you know,
00:20:20.200 | you look at developer productivity
00:20:21.440 | over the last number of decades,
00:20:23.480 | like, you know, it's come in like small increments
00:20:25.520 | of like, you know, dynamic typing,
00:20:28.920 | like, it's like one thing at a time.
00:20:29.880 | It's like, suddenly for a lot of use cases
00:20:31.320 | you don't need to care about type systems,
00:20:32.680 | or better compiler technology, or like, you know,
00:20:35.600 | new ways to, you know, use the cloud, or like, you know,
00:20:37.800 | relational databases.
00:20:38.880 | And, you know, I think, you know,
00:20:40.120 | you look at like that, you know, history,
00:20:43.360 | it's a steadily, you know, it's like, you know,
00:20:46.320 | you look at the developers have been getting
00:20:48.080 | like probably 10 X more productive every decade
00:20:50.520 | for the last four decades or something.
00:20:52.760 | That was kind of crazy.
00:20:53.600 | Like on an exponential scale,
00:20:54.640 | we're talking about a 10,000X,
00:20:57.400 | like, you know, improvement in developer productivity.
00:20:59.040 | What we can build today, you know,
00:21:00.760 | is arguably like, you know,
00:21:01.840 | a fraction of the cost of what it, you know,
00:21:03.200 | took to build it in the eighties.
00:21:04.560 | Maybe it wasn't even possible in the eighties.
00:21:05.920 | So to me, like, that's like so fascinating.
00:21:08.080 | I think it's going to keep going for the next few decades.
00:21:10.320 | - Yeah, yeah.
00:21:11.160 | - Another big thing in the infra 2.0 wishlist
00:21:14.840 | was truly serverless infrastructure.
00:21:17.520 | The other, on your landing page,
00:21:19.240 | you called them native cloud functions,
00:21:22.040 | something like that.
00:21:23.680 | I think the issue I've seen with serverless
00:21:25.840 | has always been people really wanted it to be stateful,
00:21:29.640 | even though stateless was much easier to do.
00:21:32.000 | And I think now with AI,
00:21:33.720 | most model inference is like stateless,
00:21:36.320 | you know, outside of the context.
00:21:37.560 | So that's kind of made it a lot easier
00:21:39.720 | to just put a model,
00:21:41.760 | like an AI model, on Modal to run.
00:21:45.040 | How do you think about how that changes
00:21:46.840 | how people think about infrastructure too?
00:21:48.760 | - Yeah, I mean, I think Modal is definitely
00:21:50.600 | going in the direction of like doing more stateful things
00:21:52.800 | and working with data and like high IO use cases.
00:21:55.760 | I do think one like massive, like,
00:21:59.200 | serendipitous thing that happened like halfway,
00:22:01.200 | you know, a year and a half into like the,
00:22:02.920 | you know, building Modal, was like Gen AI started exploding.
00:22:05.520 | And like the IO pattern of Gen AI is like,
00:22:08.880 | fits the serverless model like so well,
00:22:11.360 | because like, it's like, you know,
00:22:13.080 | you send this tiny piece of information,
00:22:14.480 | like a prompt, right?
00:22:15.440 | Or something like that.
00:22:16.520 | And then like, you have this GPU
00:22:17.680 | that does like trillions of flops,
00:22:19.480 | and then it sends back like a tiny piece of information.
00:22:21.640 | Right?
00:22:22.480 | And that turns out to be something like, you know,
00:22:23.600 | if you can get serverless working with GPU,
00:22:25.600 | that just like works really well, right?
00:22:27.440 | So I think from that point of view,
00:22:28.640 | like serverless always, to me,
00:22:30.160 | felt like a little bit of like a solution
00:22:31.640 | when looking for a problem.
00:22:32.920 | I don't know, I don't actually like,
00:22:34.520 | don't think like backend is like
00:22:36.400 | the problem that needs serverless.
00:22:37.640 | Or like not as much, but I look at data,
00:22:40.120 | and in particular like things like Gen AI,
00:22:41.400 | like model inference, like it's like clearly a good fit.
00:22:44.000 | So I think that, you know,
00:22:46.560 | to a large extent explains like why we saw,
00:22:49.000 | you know, the initial sort of like killer app
00:22:51.600 | for Modal being model inference,
00:22:53.160 | which actually wasn't like necessarily
00:22:54.360 | what we're focused on.
00:22:55.480 | But that's where we've seen like by far
00:22:58.320 | the most usage and growth.
00:22:59.880 | - And was that Stable Diffusion?
00:23:01.600 | - Stable Diffusion in particular, yeah.
00:23:02.760 | - Yeah, and this was before you started offering
00:23:05.840 | like fine tuning of language models.
00:23:07.080 | It was mostly Stable Diffusion.
00:23:09.800 | - Yeah, yeah.
00:23:10.640 | I mean, like Modal, like I always built it
00:23:12.080 | to be a very general purpose compute platform,
00:23:14.080 | like something where you could run everything.
00:23:15.200 | And I used to call Modal like a better Kubernetes
00:23:17.440 | for data teams for a long time.
00:23:19.600 | And what we realized was like,
00:23:22.000 | yeah, that's like, you know, a year and a half in,
00:23:23.520 | like we barely had any users or any revenue.
00:23:25.760 | And like, we were like, well, maybe we should look
00:23:27.640 | at like some use case, trying to think of use case.
00:23:29.400 | And that was around the same time Stable Diffusion came out.
00:23:32.440 | And yeah, like, I mean, like the beauty of Modal
00:23:35.200 | is like you can run almost anything on Modal, right?
00:23:37.840 | Like model inference turned out to be like the place
00:23:39.560 | where we found initially, well, like clearly this has
00:23:41.880 | like 10X more ergonomic, like better ergonomics
00:23:44.120 | than anything else.
00:23:45.480 | But we're also like, you know,
00:23:46.800 | going back to my original vision,
00:23:48.040 | like we're thinking a lot about, you know,
00:23:49.800 | now, okay, now we do inference really well.
00:23:51.520 | Like what about training?
00:23:52.360 | What about fine tuning?
00:23:53.200 | What about, you know, end-to-end lifecycle deployment?
00:23:54.840 | What about data pre-processing?
00:23:56.120 | What about, you know, I don't know, real-time streaming?
00:23:58.520 | What about, you know, large data munging?
00:24:02.400 | Like there's just data observability.
00:24:04.680 | I think there's so many things, like kind of going back
00:24:07.320 | to what I said about like redefining data stack,
00:24:09.200 | like starting with the foundation of compute,
00:24:12.760 | like one of the exciting things about Modal is like,
00:24:14.720 | we've sort of, you know, we've been working on that
00:24:16.760 | for three years and it's maturing.
00:24:18.120 | But like, this is so many things you can do,
00:24:20.560 | like with just like a better compute primitive
00:24:23.760 | and also go up to stack and like do all this other stuff
00:24:25.920 | on top of it.
00:24:27.200 | - Yeah, how do you think about, or rather like,
00:24:30.400 | I would love to learn more about the underlying
00:24:32.520 | infrastructure and like how you make that happen
00:24:34.980 | because with fine tuning and training,
00:24:37.580 | it's static memory,
00:24:40.440 | like you know exactly what you're gonna load in memory,
00:24:43.240 | and it's kind of like a set amount of compute,
00:24:45.040 | versus inference, where the data is like very bursty.
00:24:48.480 | How do you make batches work
00:24:51.520 | with a serverless developer experience?
00:24:56.000 | You know, like what are like some fun technical challenges
00:24:58.460 | you solved to make sure you get max utilization
00:25:00.680 | on these GPUs?
00:25:01.520 | What we hear from people is like, we have GPUs,
00:25:03.760 | but we can really only get like, you know,
00:25:05.640 | 30, 40, 50% maybe utilization.
00:25:08.500 | - Yeah.
00:25:09.340 | - What's some of the fun stuff you're working on
00:25:12.080 | to get a higher number there?
00:25:13.420 | - Yeah, I think on the inference side,
00:25:14.760 | like that's where we like, you know,
00:25:16.480 | like from a cost perspective, like utilization perspective,
00:25:18.840 | we've seen, you know, like very, very good numbers.
00:25:21.520 | And in particular, like it's our ability to start containers
00:25:23.640 | and stop containers very quickly.
00:25:25.480 | And that means that we can, you know,
00:25:28.280 | we can auto scale extremely fast
00:25:29.920 | and scale down very quickly,
00:25:31.600 | which means like we can always adjust the sort of capacity,
00:25:33.980 | the number of GPUs running to the exact, you know,
00:25:36.520 | the traffic volume.
00:25:38.240 | And so in many cases, like that actually leads
00:25:41.320 | to a sort of interesting thing where like,
00:25:42.440 | we obviously run our things on like the public cloud,
00:25:44.280 | like AWS, GCP, we're also on Oracle.
00:25:47.320 | But in many cases, like users who do inference
00:25:50.680 | on those platforms or those clouds,
00:25:53.880 | even though we charge a slightly higher price per GPU hour,
00:25:58.480 | a lot of users like moving their large scale inference
00:26:00.600 | use cases to Modal, they end up saving a lot of money.
00:26:02.680 | 'Cause we only charge for like the time
00:26:04.320 | the GPU is actually running.
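The pay-only-while-running point can be made concrete with some back-of-the-envelope arithmetic: a serverless GPU billed at a higher hourly rate can still come out cheaper than an always-on instance when traffic is bursty. The rates below are made-up illustrations, not Modal's actual pricing.

```python
# Illustrative cost comparison: always-on reserved GPU vs serverless
# billing that only counts hours the GPU is actually busy.
HOURS_PER_MONTH = 730

reserved_rate = 3.00    # $/hr, billed 24/7 whether busy or idle (made-up)
serverless_rate = 4.50  # $/hr, billed only while requests run (made-up)

def monthly_cost(busy_hours):
    reserved = reserved_rate * HOURS_PER_MONTH  # pays for idle time too
    serverless = serverless_rate * busy_hours   # pays only for busy time
    return reserved, serverless

# At ~30% utilization (219 of 730 hours busy), serverless wins
# despite the 50% higher hourly rate:
reserved, serverless = monthly_cost(219)
print(round(reserved), round(serverless))  # 2190 986
```

The break-even utilization here is reserved_rate / serverless_rate, about 67%; below that, the bursty workload is cheaper serverless.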
00:26:05.780 | And that's a hard problem, right?
00:26:07.200 | Like if you go, you know,
00:26:08.040 | if you have to constantly adjust the number of machines,
00:26:10.680 | if you have to start containers, stop containers,
00:26:12.080 | like that's a very hard problem.
00:26:13.280 | And that, you know, and starting containers quickly
00:26:15.640 | is a very difficult thing.
00:26:17.160 | I mentioned we had to build our own file system for this.
00:26:21.560 | We also, you know, built our own container scheduler
00:26:24.720 | for that.
00:26:26.240 | We're looking, we've implemented recently
00:26:28.160 | CPU memory checkpointing, so we can take running containers
00:26:31.280 | and snapshot the entire CPU, like including registers
00:26:34.280 | and everything, and restore it from that point,
00:26:37.000 | which means we can restore it from like an initialized state.
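The checkpoint/restore idea can be sketched with a toy analogue: save the state reached after an expensive initialization, then resume new workers from the snapshot instead of redoing the init. Real CPU memory snapshotting works at the process level, registers and all; this only illustrates the save/restore shape.

```python
# Toy analogue of checkpoint/restore: snapshot state after a slow
# initialization, then resume from the snapshot instead of re-initializing.
import pickle

def expensive_init():
    # Stand-in for slow container startup: imports, loading weights, etc.
    return {"weights": list(range(1000)), "step": 0}

state = expensive_init()
snapshot = pickle.dumps(state)      # "checkpoint" taken after init

restored = pickle.loads(snapshot)   # a new "container" skips init entirely
restored["step"] += 1               # and picks up work from there
print(restored["step"], restored["weights"][:3])  # 1 [0, 1, 2]
```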
00:26:42.000 | We're looking at GPU checkpointing next,
00:26:43.920 | it's like a very interesting thing.
00:26:45.120 | So I think on the inference stuff, on the inference side,
00:26:47.680 | like that's where serverless really shines,
00:26:50.840 | because you can drive, you know,
00:26:52.320 | you can push the frontier of latency versus utilization
00:26:56.080 | quite substantially, you know,
00:26:58.240 | which either ends up being a latency advantage
00:26:59.960 | or a cost advantage, or both, right?
00:27:02.080 | On training, it's probably arguably like less
00:27:03.760 | of an advantage doing serverless, frankly,
00:27:06.640 | 'cause, you know, you can just like spin up
00:27:07.960 | a bunch of machines and try to saturate them,
00:27:09.760 | like, you know, train as much as you can on each machine.
00:27:12.960 | For that area, like we've seen like, you know,
00:27:14.880 | arguably like less usage, like for Modal.
00:27:17.320 | But there are always like some interesting use case,
00:27:18.760 | like we do have a couple of customers,
00:27:20.000 | like Ramp, for instance,
00:27:20.840 | like they do fine tuning with Modal,
00:27:22.520 | and they basically like one of the patterns they have
00:27:24.600 | is like very bursty type fine tuning,
00:27:26.200 | where they fine tune 100 models in parallel.
00:27:28.120 | And that's like a separate thing
00:27:29.120 | that Modal does really well, right?
00:27:30.080 | Like we can start up 100 containers very quickly,
00:27:32.640 | run a fine tuning training job on each one of them
00:27:35.360 | that only runs for, I don't know, 10, 20 minutes.
00:27:37.840 | And then, you know, you can do hyperparameter tuning
00:27:40.400 | in that sense, like just pick the best model
00:27:41.880 | and things like that.
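The bursty fan-out pattern described above can be sketched locally, with a thread pool standing in for the hundred containers and a toy scoring function standing in for a short fine-tuning run:

```python
# Fan out many short "training" jobs in parallel, then keep the best one.
# On a platform like Modal each job would run in its own container; here
# a thread pool and a toy objective stand in for both.
from concurrent.futures import ThreadPoolExecutor

def train(learning_rate):
    # Stand-in for a 10-20 minute fine-tuning run; returns a validation
    # score. This toy quadratic peaks at lr = 0.1.
    return 1.0 - (learning_rate - 0.1) ** 2

grid = [i / 100 for i in range(1, 21)]  # 20 candidate learning rates

with ThreadPoolExecutor(max_workers=20) as pool:
    scores = list(pool.map(train, grid))

best_lr = grid[max(range(len(grid)), key=scores.__getitem__)]
print(best_lr)  # 0.1
```

Because each job is short and independent, the fleet can scale from zero to twenty (or a hundred) workers and back to zero, which is exactly the shape serverless billing rewards.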
00:27:42.720 | So there are like interesting training use cases.
00:27:44.360 | Like I think when you get to like training
00:27:45.840 | like very large foundational models,
00:27:47.320 | that's a use case we don't support super well,
00:27:48.880 | 'cause that's very high IO, you know,
00:27:50.920 | you need to have like InfiniBand and all these things.
00:27:52.920 | And those are things we haven't supported yet,
00:27:55.120 | and might take a while to get to that.
00:27:57.080 | So that's like probably like an area
00:27:58.360 | where like we're relatively weak.
00:27:59.720 | - Yeah, have you cared at all
00:28:01.080 | about lower level model optimization?
00:28:03.800 | There's other cloud providers that do custom kernels
00:28:06.580 | to get better performance,
00:28:07.680 | or do you not, given that you're not just an AI
00:28:10.840 | compute company?
00:28:12.440 | - Yeah, I mean, I think like we wanna support
00:28:14.120 | like a generic, like general workloads in a sense
00:28:16.200 | that like we want users to give us a container essentially,
00:28:18.320 | or code, and then we wanna run that.
00:28:20.960 | So I think, you know, we benefit from those things
00:28:25.320 | in the sense that like we, you know,
00:28:27.320 | we can tell our users, you know, to use those things.
00:28:30.480 | But I don't know if we wanna like poke
00:28:32.120 | into users containers and like do those things automatically
00:28:34.480 | that's sort of, I think a little bit tricky
00:28:36.120 | from the outside to do, 'cause, you know,
00:28:37.680 | we wanna be able to take like arbitrary code and execute it.
00:28:40.760 | But certainly like, you know,
00:28:41.720 | we can tell our users to like use those things.
00:28:44.240 | - Yeah, I may have betrayed my own biases
00:28:48.640 | because I don't really think about Modal
00:28:50.640 | as for data teams anymore.
00:28:52.840 | I think you started that way.
00:28:54.400 | I think you're much more for AI engineers.
00:28:56.480 | And, you know, one of my favorite anecdotes,
00:29:01.200 | which I think you know, but I don't know
00:29:03.280 | if you directly experienced it.
00:29:06.280 | I went through the Vercel AI Accelerator,
00:29:07.800 | which you supported.
00:29:08.880 | - Yeah.
00:29:09.720 | - And in the Vercel AI Accelerator,
00:29:12.640 | a bunch of startups gave like free credits
00:29:14.320 | and like signups and talks and all that stuff.
00:29:16.880 | The only ones that stuck are the people,
00:29:18.160 | are the ones that actually appealed to engineers.
00:29:20.280 | And the top usage, the top tool used by far was Modal.
00:29:23.400 | - Hmm, that's awesome.
00:29:24.240 | - For people building with AI apps.
00:29:26.280 | - Yeah, I mean, it might be also like a terminology question
00:29:28.560 | like the AI versus data, right?
00:29:29.920 | Like I've, you know, maybe I'm just like old and jaded,
00:29:32.080 | but like I've seen so many like different titles.
00:29:34.160 | Like for a while it was like, you know,
00:29:36.600 | I was a data scientist and I was a machine learning engineer
00:29:39.240 | and then, you know, there was like analytics engineers
00:29:41.280 | and then it was like an AI engineer, you know?
00:29:43.080 | So like, to me, it's like, I just like, in my head,
00:29:45.520 | that's to me just like-
00:29:46.360 | - Just engineer.
00:29:47.200 | - Just data, like, or like engineer, you know?
00:29:49.320 | Like, I don't really, so that's why I've been like,
00:29:50.960 | you know, just calling it data teams.
00:29:52.800 | But like, of course, like, you know, AI is like, you know,
00:29:55.760 | like such a massive fraction of our like workloads.
00:29:58.720 | - It's a different Venn diagram of things you do, right?
00:30:01.880 | So the stuff that you're talking about
00:30:03.280 | where you need like InfiniBand
00:30:05.120 | for like highly parallel training,
00:30:07.280 | that's not, that's more of the ML engineer
00:30:08.960 | and that's more of the research scientist.
00:30:10.400 | - Yeah, yeah.
00:30:11.240 | - And less of the AI engineer,
00:30:12.520 | which is more sort of trying to put,
00:30:14.480 | work at the application.
00:30:15.320 | - Yeah, I mean, to be fair to it,
00:30:16.440 | like, we have a lot of users that are like doing stuff
00:30:18.840 | that I don't think fits neatly into like AI.
00:30:21.680 | Like, we have a lot of people
00:30:22.520 | using like Modal for web scraping.
00:30:23.840 | Like, it's kind of nice.
00:30:24.680 | Like, you can just like, you know,
00:30:25.720 | fire up like a hundred or a thousand containers
00:30:27.960 | running Chromium and just like render a bunch of webpages
00:30:30.120 | and it takes, you know, whatever.
00:30:31.400 | Or like, you know, protein folding.
00:30:33.880 | Is that, I mean, maybe that's, I don't know.
00:30:35.720 | Like, you know, we have a bunch of users doing that
00:30:37.840 | or like, you know, in terms of in the realm of biotech,
00:30:41.000 | like sequence alignment, like people using,
00:30:43.040 | or like a couple of people using like Modal
00:30:44.960 | to run like large, like mixed integer programming problems,
00:30:47.400 | like, you know, using Gurobi or things like that.
00:30:49.440 | So video processing is another thing that keeps coming up.
00:30:52.300 | Like, you know, let's say you have like petabytes of video
00:30:55.000 | and you want to just like transcode it,
00:30:56.160 | like, you can fire up a lot of containers
00:30:57.920 | and just run FFmpeg or like,
00:31:00.040 | so there are those things too.
00:31:01.120 | Like, I mean, like that being said,
00:31:02.520 | like AI is by far our biggest use case,
00:31:04.020 | but, you know, like again,
00:31:05.440 | like Modal is kind of general purpose in that sense.
00:31:07.520 | - Yeah, well maybe,
00:31:09.200 | so I'll stick to the Stable Diffusion thing
00:31:11.160 | and then we'll move on to the other use cases
00:31:13.160 | of sort of for AI that you want to highlight.
00:31:16.160 | The other big player in my mind is Replicate.
00:31:20.640 | - Yeah.
00:31:21.480 | - In this era.
00:31:22.840 | They're much more, I guess, custom built for that purpose,
00:31:25.840 | whereas you're more general purpose.
00:31:27.600 | How do you position yourself with them?
00:31:31.240 | Are they just for like different audiences
00:31:32.840 | or are you just heads on competitive competing?
00:31:34.880 | - I think there's like a tiny sliver of the Venn diagram
00:31:38.480 | where we're competitive
00:31:39.520 | and then like 99% of the area we're not competitive.
00:31:43.440 | I mean, I think for people who,
00:31:47.240 | if you think of like frontend engineers,
00:31:48.880 | I think that's where like they really found good fit.
00:31:48.880 | It's like, you know, people who built some cool web app
00:31:50.960 | and they want some sort of AI capability
00:31:52.940 | and they just, you know, an off the shelf model
00:31:55.160 | is like perfect for them.
00:31:56.520 | That's like use Replicate, that's great, right?
00:31:59.400 | Like, I think where we shine is like custom models
00:32:02.180 | or custom workflows, you know,
00:32:04.040 | running things at very large scale.
00:32:05.400 | We need to care about utilization, care about costs.
00:32:07.760 | You know, we have much lower prices
00:32:09.520 | 'cause we spent a lot more time
00:32:10.520 | optimizing our infrastructure.
00:32:12.680 | And, you know, and that's where we're competitive, right?
00:32:14.560 | Like, you know, and you look at some of our use cases,
00:32:16.200 | like Suno is a big user.
00:32:17.960 | Like they're running like large scale like AI.
00:32:19.720 | - We're talking with Mikey in a month.
00:32:22.120 | - Yeah, so I mean, they're using Modal
00:32:23.880 | for like production infrastructure.
00:32:24.980 | Like they have their own like custom model,
00:32:26.880 | like custom code and custom weights, you know,
00:32:28.460 | for AI generated music, Suno.ai.
00:32:31.100 | You know, those are the types of use cases that we like,
00:32:33.520 | you know, things that are like very custom
00:32:35.120 | or like it's like, you know,
00:32:36.680 | and those are the things like
00:32:37.640 | it's very hard to run on Replicate, right?
00:32:39.480 | And that's fine.
00:32:40.320 | Like I think they focus on a very different part
00:32:42.240 | of the stack in that sense.
00:32:43.520 | - And then the other company pattern
00:32:46.840 | that I pattern match you to is Modular.
00:32:49.340 | I don't know if you-- - 'Cause of the names?
00:32:51.680 | - No, no, well, no, but yes, the name is very similar.
00:32:54.880 | I think there's something that might be insightful there
00:32:58.680 | from a linguistics point of view.
00:33:00.120 | But no, they have Mojo, the sort of Python SDK.
00:33:03.480 | And then they have the Modular Inference Engine,
00:33:04.960 | which is their cloud stack,
00:33:07.200 | their sort of compute inference stack.
00:33:09.160 | I don't know if anyone's made the comparison to you before,
00:33:12.520 | but I see you evolving a little bit in parallel there.
00:33:16.160 | - No, I mean, maybe, yeah.
00:33:18.120 | Like it's not a company I'm like super like familiar,
00:33:20.320 | like, I mean, I know the basics,
00:33:21.640 | but like, I guess they're similar in the sense
00:33:23.400 | like they wanna like do a lot of, you know,
00:33:24.920 | they have sort of big picture vision.
00:33:26.920 | - Yes, they also wanna build very general purpose.
00:33:28.720 | - Yeah. - And they also are--
00:33:29.800 | - Which I admire. - Marketing themselves
00:33:31.280 | as like, if you wanna do off the shelf stuff,
00:33:33.760 | go somewhere else.
00:33:35.000 | If you wanna do custom stuff,
00:33:36.000 | we're the best place to do it.
00:33:37.040 | - Yeah, yeah.
00:33:38.360 | - There is some overlap there.
00:33:39.640 | There's not overlap in the sense
00:33:40.880 | that you are a closed source platform,
00:33:45.080 | people have to host their code on you.
00:33:47.120 | - That's true.
00:33:48.160 | - Whereas for them, they're very insistent
00:33:51.480 | on not running their own cloud service.
00:33:53.800 | - Yeah. - They're a box software.
00:33:55.400 | - Yeah, yeah. - They're licensed software.
00:33:57.680 | - I'm sure their VCs at some point
00:33:59.080 | must have tried to force them to reconsider.
00:34:00.800 | - No, no, Chris is very, very insistent
00:34:02.960 | and very convincing. (laughs)
00:34:05.680 | So anyway, I would just make that comparison,
00:34:08.440 | let people make the links if they want to,
00:34:09.840 | but it's an interesting way to see the cloud market develop
00:34:13.000 | from my point of view 'cause--
00:34:14.320 | - Yeah. - I came up in this field
00:34:16.680 | thinking cloud is one thing
00:34:17.720 | and I think your vision is like something slightly different
00:34:21.480 | and I'd like to see the different takes on it.
00:34:23.440 | - Yeah, and like one thing I've, you know,
00:34:25.440 | like I've written a bit about it in my blog too,
00:34:27.160 | it's like I think of us as like a second layer
00:34:29.360 | of cloud provider in the sense that like,
00:34:30.600 | I think Snowflake is like kind of a good analogy.
00:34:32.680 | Like Snowflake, you know,
00:34:33.800 | is infrastructure as a service, right?
00:34:35.920 | But they actually run on the like major clouds, right?
00:34:38.520 | And I mean, like you can like analyze this very deeply,
00:34:41.240 | but like one of the things I always thought about
00:34:42.400 | is like why did Snowflake like win over Redshift?
00:34:44.720 | And I think Snowflake, you know, to me,
00:34:47.560 | one, because like, I mean, in the end,
00:34:49.120 | like AWS makes all the money anyway,
00:34:50.560 | like and like Snowflake just had the ability
00:34:53.000 | to like focus on like developer experience
00:34:55.920 | or like, you know, user experience.
00:34:57.440 | And to me, like really proved
00:34:59.480 | that you can build a cloud provider,
00:35:01.760 | a layer up from, you know, the traditional like public clouds
00:35:04.760 | and in that layer, that's also where I would put Modal.
00:35:08.320 | It's like, you know, we're building a cloud provider.
00:35:09.880 | Like we're, you know, we're like a multi-tenant environment
00:35:12.520 | that runs the user code,
00:35:14.200 | but also building on top of the public cloud.
00:35:15.720 | So I think there's a lot of room in that space.
00:35:17.480 | I think it's very sort of interesting direction.
00:35:20.000 | - Yeah, how do you think of that
00:35:22.040 | compared to the traditional past history?
00:35:25.280 | Like, you know, AWS, then you had Heroku,
00:35:27.760 | then you had Render Railway.
00:35:30.120 | - Yeah, I mean, I think those are all like great.
00:35:32.560 | Like, I think the problem that they all faced
00:35:34.720 | was like the graduation problem, right?
00:35:36.920 | Like, you know, Heroku or like, I mean,
00:35:39.040 | like also like Heroku, there's like a counterfactual future
00:35:41.880 | of like what would have happened
00:35:43.160 | if Salesforce didn't buy them, right?
00:35:44.560 | Like, that's a sort of separate thing.
00:35:45.840 | But like, I think what Heroku,
00:35:48.040 | I think always struggled with was like,
00:35:49.680 | eventually companies would get big enough
00:35:52.560 | that you couldn't really justify running in Heroku.
00:35:54.880 | They would just go and like move it to, you know,
00:35:56.800 | whatever AWS or, you know, in particular.
00:35:59.440 | And, you know, that's something
00:36:00.560 | that keeps me up at night too.
00:36:01.520 | Like, what does that graduation risk look like for Modal?
00:36:05.760 | I always think like the only way to do,
00:36:08.000 | to build infrastructure,
00:36:10.080 | to build a successful infrastructure company
00:36:11.480 | in the long run in the cloud today
00:36:13.640 | is you have to appeal to the entire spectrum, right?
00:36:16.480 | Or at least like the enterprise,
00:36:17.920 | like you have to capture the enterprise market.
00:36:19.960 | And, but the truly good companies
00:36:21.720 | capture the whole spectrum, right?
00:36:22.840 | Like I think of companies like,
00:36:24.320 | I don't know, like Datadog or Mongo or something like that,
00:36:26.040 | where like they both captured like the hobbyists,
00:36:28.440 | like, and acquired them,
00:36:32.000 | but also like, you know,
00:36:33.360 | have very large enterprise customers.
00:36:35.280 | So I think that arguably was like where,
00:36:38.400 | in my opinion, like Heroku struggled, was like,
00:36:41.120 | how do you maintain the customers
00:36:42.840 | as they get more and more advanced?
00:36:44.160 | I don't know what the solution is,
00:36:45.320 | but I think there's, you know,
00:36:47.480 | that's something I would have thought about deeply
00:36:48.800 | if I was at Heroku at that time.
00:36:50.760 | - What's the AI graduation problem?
00:36:52.640 | Is it, I need to fine tune the model,
00:36:55.120 | I need better economics,
00:36:56.440 | any insights from customer discussions?
00:36:58.600 | - Yeah, I mean, better economics certainly,
00:37:00.160 | but although like I would say like,
00:37:01.560 | even for people who like, you know,
00:37:03.040 | need like thousands of GPUs at very large scale, you know,
00:37:05.600 | like just because we can drive utilization so much better,
00:37:09.520 | like we, there's actually like a cost advantage
00:37:11.880 | of staying on Modal.
00:37:13.080 | But yeah, I mean, it's certainly like, you know,
00:37:15.680 | and then like the fact that VCs like love, you know,
00:37:17.840 | throwing money at least used to, you know,
00:37:19.880 | at companies who need it to buy GPUs.
00:37:21.720 | I think that didn't help the problem.
00:37:23.560 | Yeah, and in training, I think, you know,
00:37:26.720 | there's less software differentiation.
00:37:28.640 | So in training, I think there's certainly like
00:37:30.000 | better economics of like buying big clusters.
00:37:32.320 | But I mean, my hope it's gonna change, right?
00:37:36.520 | Like I think, you know, we're still pretty early
00:37:38.320 | in the cycle of like building AI infrastructure.
00:37:41.560 | And, you know, I think a lot of these companies,
00:37:44.560 | over the long run, like, you know,
00:37:46.280 | except maybe the super big ones,
00:37:48.240 | like, you know, the Facebooks and Googles,
00:37:51.160 | they're always gonna build their own ones.
00:37:52.200 | But like everyone else, to some extent, you know,
00:37:54.880 | I think they're better off like buying platforms.
00:37:57.280 | And, you know, someone's gonna have to build those platforms.
00:37:59.960 | - Yeah, cool.
00:38:02.120 | Let's move on to language models.
00:38:04.320 | And just specifically that workload,
00:38:06.720 | just to flesh it out a little bit.
00:38:08.360 | You already said that Ramp is like fine tuning
00:38:11.080 | a hundred models at once simultaneously on Modal.
00:38:14.400 | Closer to home, my favorite example is ErikBot.
00:38:19.320 | Maybe you wanna tell that story.
00:38:20.840 | - Yeah, I mean, it was a prototype thing we built for fun,
00:38:24.280 | but it was pretty cool.
00:38:25.200 | Like we basically built this thing that you can,
00:38:27.600 | it like hooks up to Slack.
00:38:28.920 | It like downloads all the Slack history
00:38:30.960 | and, you know, fine tunes a model based on a person.
00:38:33.120 | And then you can chat with that.
00:38:34.640 | And so you can like, you know, clone yourself
00:38:36.320 | and like talk to yourself on Slack.
00:38:37.720 | I mean, it's like nice, like demo.
00:38:39.120 | And it's just like, I think it's like
00:38:40.720 | fully self-contained.
00:38:41.840 | Like there's a Modal app that does everything, right?
00:38:44.080 | Like it downloads Slack, you know,
00:38:45.600 | integrates with the Slack API,
00:38:46.840 | like downloads the stuff, the data,
00:38:48.520 | like just runs the fine tuning.
00:38:50.000 | And then like creates like dynamically
00:38:51.720 | an inference endpoint.
00:38:53.040 | And it's all like self-contained
00:38:54.120 | and like, you know, a few hundred lines of code.
00:38:55.360 | So I think it's sort of a good kind of use case for Modal,
00:38:58.400 | or like it kind of demonstrates
00:38:59.800 | a lot of the capabilities of Modal.
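The pipeline Erik describes can be sketched in miniature. Everything below is invented stand-in code (plain Python, not Modal's actual SDK): the real app calls the Slack API, launches a GPU fine-tuning job, and deploys an inference endpoint.

```python
# Invented stand-in code for the shape of the ErikBot app: pull one
# person's chat history, "fine-tune" on it, and hand back something
# you can chat with.

from dataclasses import dataclass


@dataclass
class TunedModel:
    person: str
    corpus_size: int

    def chat(self, prompt: str) -> str:
        # Stand-in for inference against the fine-tuned weights.
        return f"[{self.person}-bot] reply to: {prompt}"


def download_history(person: str, messages: list[tuple[str, str]]) -> list[str]:
    # The real app calls the Slack API; here we just filter a given log.
    return [text for author, text in messages if author == person]


def fine_tune(person: str, corpus: list[str]) -> TunedModel:
    # Stand-in for the GPU fine-tuning job the app would launch.
    return TunedModel(person=person, corpus_size=len(corpus))


def build_bot(person: str, messages: list[tuple[str, str]]) -> TunedModel:
    # The whole pipeline: download, fine-tune, return a chattable model.
    return fine_tune(person, download_history(person, messages))
```

The point of the real version is that all three stages live in one small app and the inference endpoint is created dynamically at the end.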
00:39:01.440 | - Yeah, on a more personal side,
00:39:03.280 | how close did you feel ErikBot was to you?
00:39:07.000 | - It definitely captured the like, the language.
00:39:10.160 | Yeah, I mean, I don't know, like the content.
00:39:14.120 | I mean, like, it's like,
00:39:15.240 | I always feel this way about like AI
00:39:17.040 | and it's gotten better,
00:39:17.880 | but like you look at like AI output of text,
00:39:20.520 | like, and it's like, when you glance at it,
00:39:22.880 | it's like, yeah, this seems really smart, you know?
00:39:25.280 | But then you actually like look a little bit deeper.
00:39:26.800 | It's like, what does this mean?
00:39:28.240 | What does this person say?
00:39:29.080 | It's like kind of vacuous, right?
00:39:30.280 | And that's like kind of what I felt like, you know,
00:39:32.000 | talking to like my clone version.
00:39:33.840 | Like it like says like things like the grammar is correct.
00:39:36.520 | Like some of the sentences make a lot of sense,
00:39:38.280 | but like, what are you trying to say?
00:39:40.360 | Like, there's no content here.
00:39:42.080 | I don't know.
00:39:42.920 | I mean, it's like, I got that feeling also with ChatGPT
00:39:45.000 | in the like early versions, right?
00:39:46.120 | Now it's like better, but.
00:39:47.720 | - That's funny.
00:39:48.560 | Yeah, I built this thing called Smol Podcaster
00:39:50.120 | to automate a lot of our back office work, so to speak.
00:39:53.440 | And it's great at transcription.
00:39:55.440 | It's great at doing chapters.
00:39:57.440 | And then I was like, okay,
00:39:58.560 | how about you come up with a short summary?
00:40:00.600 | And it's like, it sounds good,
00:40:02.720 | but it's like, it's not even the same ballpark
00:40:05.560 | as like what we end up writing.
00:40:07.600 | And it's hard to see how it's gonna get there.
00:40:10.720 | - Oh, I have ideas.
00:40:12.080 | - I'm certain it's gonna get there,
00:40:15.320 | but like, I agree with you, right?
00:40:17.000 | And like, I have the same thing.
00:40:18.120 | I don't know if you've read like AI generated books,
00:40:20.560 | like they just like kind of seem funny, right?
00:40:22.600 | Like there's off, right?
00:40:23.720 | But like you glance at it and it's like,
00:40:25.160 | oh, it's kind of cool.
00:40:26.440 | Like looks correct, but then it's like very weird
00:40:28.520 | when you actually read them.
00:40:29.920 | - Well, so for what it's worth,
00:40:32.800 | I think anyone can join the Modal Slack.
00:40:34.200 | Is it open to the public?
00:40:35.360 | - Yeah, totally.
00:40:36.200 | If you go to modal.com, there's a button in the footer.
00:40:39.040 | - Yeah, and then you can talk to ErikBot.
00:40:41.000 | And then sometimes, I really like pinging ErikBot,
00:40:43.280 | and then you answer afterwards,
00:40:44.680 | but then you're like.
00:40:45.520 | - Really?
00:40:46.360 | - Yeah, I don't know if that's correct
00:40:47.200 | or like whatever.
00:40:48.240 | - Cool.
00:40:49.480 | - No, so, okay.
00:40:50.480 | Any other broader lessons, you know,
00:40:52.560 | just broadening out from like the single use case
00:40:54.840 | of fine tuning, like what are you seeing people do
00:40:58.040 | with fine tuning or just language models
00:41:00.800 | on modal in general?
00:41:01.960 | - Yeah, I mean, I think language models is interesting
00:41:03.960 | because so many people get started with APIs
00:41:08.200 | and that's, you know, they're just dominating a space
00:41:10.480 | in particular OpenAI, right?
00:41:11.880 | And that's not necessarily like a place
00:41:13.880 | where we aim to compete.
00:41:15.200 | I mean, maybe at some point,
00:41:16.040 | but like it's just not like a core focus for us.
00:41:17.760 | And I think sort of separately,
00:41:19.280 | it's sort of a question if like there's economics
00:41:20.800 | in that long term.
00:41:21.640 | But like, so we tend to focus on more like the areas
00:41:24.120 | like around it, right?
00:41:25.560 | Like fine tuning, like another use case we have
00:41:28.320 | is a bunch of people, Ramp included,
00:41:30.200 | is doing batch embeddings on Modal.
00:41:32.160 | So let's say, you know, you have like a,
00:41:34.440 | actually we're like writing a blog post,
00:41:35.720 | like where we take all of Wikipedia
00:41:38.040 | and like parallelize embeddings in 15 minutes
00:41:41.080 | and produce vectors for each article.
00:41:43.720 | So those types of use cases,
00:41:45.040 | I think Modal is really well suited for.
00:41:47.400 | I think also a lot of like custom inference,
00:41:49.400 | like you have like, you know,
00:41:50.320 | structured output guided generation
00:41:52.760 | or things like that we have, you want more control.
00:41:56.640 | Like those are the things like we see a lot of users
00:41:58.400 | using Modal for.
00:41:59.720 | But for a lot of people it's like, you know,
00:42:01.080 | just go use like GPT-4 and like, you know,
00:42:02.880 | that's like a great starting point
00:42:04.040 | and we're not trying to compete necessarily
00:42:05.480 | like directly with that.
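The batch-embedding pattern Erik mentions can be sketched as follows, with stdlib threads standing in for Modal's GPU fan-out; `embed_batch` is a toy function, not a real embedding model.

```python
# Hedged sketch of the batch-embedding pattern: chunk a corpus into
# batches and fan the batches out in parallel. Stdlib threads stand in
# for Modal's GPU fan-out, and embed_batch is a toy function, not a
# real embedding model.

from concurrent.futures import ThreadPoolExecutor


def embed_batch(batch: list[str]) -> list[list[float]]:
    # Toy "embedding": character count and word count per text.
    return [[float(len(t)), float(t.count(" ") + 1)] for t in batch]


def embed_corpus(texts: list[str], batch_size: int = 2,
                 workers: int = 4) -> list[list[float]]:
    batches = [texts[i:i + batch_size]
               for i in range(0, len(texts), batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(embed_batch, batches)  # order-preserving map
    return [vec for chunk in results for vec in chunk]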
00:42:06.600 | - Yeah, when you say parallelize,
00:42:09.320 | I think you should give people an idea
00:42:10.920 | of the order of magnitude of parallelism
00:42:12.680 | because I think people don't understand how parallel it can get.
00:42:15.440 | So like, I think your classic hello world with Modal
00:42:18.160 | is like some kind of Fibonacci function, right?
00:42:20.960 | - Yeah, we have a bunch of different ones.
00:42:21.800 | - Some recursive function.
00:42:22.640 | - Yeah, yeah, I mean, like, yeah,
00:42:23.880 | I mean, it's like pretty easy in Modal,
00:42:25.040 | like fan out to like, you know,
00:42:26.760 | at least like a hundred GPUs, like in a few seconds.
00:42:28.880 | And, you know, if you give it like a couple of minutes,
00:42:30.800 | like we can, you know, you can fan out
00:42:32.360 | to like thousands of GPUs.
00:42:33.560 | Like we run at relatively large scale
00:42:36.200 | and yeah, we've run, you know,
00:42:39.000 | many thousands of GPUs at certain points when we need it,
00:42:41.680 | you know, big backfills
00:42:42.840 | or some customers had very large compute needs.
00:42:44.840 | - Yeah, yeah.
00:42:45.680 | And I mean, that's super useful for a number of things.
00:42:49.160 | One of the reasons actually I,
00:42:51.000 | so one of my early interactions with Modal as well
00:42:53.200 | was with Smol Developer,
00:42:54.640 | which is my sort of coding agent.
00:42:56.880 | The reason I chose modal was a number of things.
00:42:58.880 | One, I just wanted to try it out.
00:43:00.120 | I just had an excuse to try it.
00:43:01.960 | Akshay offered to onboard me.
00:43:03.600 | - Yeah, good excuse.
00:43:04.600 | - But the most interesting thing was that
00:43:07.760 | you could have that sort of local development experience
00:43:10.880 | like I was running on my laptop,
00:43:12.320 | but then it would seamlessly translate to a cloud service
00:43:15.400 | or like a cloud hosted environment.
00:43:17.760 | And then it could fan out with concurrency controls.
00:43:20.160 | So I could say like, because like, you know,
00:43:21.920 | the number of times I hit the GPT-3 API at the time
00:43:26.640 | was gonna be subject to the rate limit from there.
00:43:29.680 | But I wanted to fan out
00:43:30.720 | without worrying about that kind of stuff.
00:43:32.760 | With Modal, I can just kind of declare
00:43:34.320 | that in my config and that's it.
00:43:36.440 | - Oh, like a concurrency limit?
00:43:37.600 | - Yeah.
00:43:38.440 | - Yeah, there's a lot of control there, yeah.
00:43:39.280 | - Yeah, yeah, yeah.
00:43:40.120 | So like, I just wanted to highlight that to people as like,
00:43:41.840 | yeah, this is a pretty good use case for like,
00:43:43.960 | you know, just like writing this kind of LLM application code
00:43:48.440 | inside of this environment that just understands
00:43:50.960 | fan out and rate limiting natively.
00:43:55.720 | You don't actually have an exposed queue system,
00:43:57.640 | but you have it under the hood, you know,
00:43:59.400 | that kind of stuff.
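The declared-concurrency idea Swyx describes can be sketched like this: fan work out widely, but cap how many calls are in flight at once so a rate-limited API is never exceeded. This is a stdlib semaphore sketch, not Modal's actual concurrency-limit implementation.

```python
# Minimal sketch of a declared concurrency limit: fan work out widely,
# but cap how many calls are in flight at once, the way you would when
# a downstream API is rate-limited. Stdlib only.

import threading
from concurrent.futures import ThreadPoolExecutor


class LimitedFanOut:
    def __init__(self, limit: int):
        self._sem = threading.Semaphore(limit)
        self._lock = threading.Lock()
        self._in_flight = 0
        self.peak = 0  # highest number of simultaneous calls observed

    def _call(self, fn, arg):
        with self._sem:  # blocks once `limit` calls are in flight
            with self._lock:
                self._in_flight += 1
                self.peak = max(self.peak, self._in_flight)
            try:
                return fn(arg)
            finally:
                with self._lock:
                    self._in_flight -= 1

    def map(self, fn, args):
        with ThreadPoolExecutor(max_workers=32) as pool:
            return list(pool.map(lambda a: self._call(fn, a), args))
```

The caller just declares the limit once; the queueing happens under the hood, which is the ergonomic point being made above.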
00:44:00.480 | - It's a self-provisioning runtime.
00:44:02.080 | (laughing)
00:44:04.880 | - So the last part of modal I wanted to touch on,
00:44:08.000 | and obviously feel free,
00:44:08.920 | I know you're working on new features,
00:44:10.960 | was the sandbox that was introduced last year.
00:44:15.320 | And this is something that I think was inspired
00:44:18.120 | by Code Interpreter.
00:44:18.960 | You can tell me the longer history behind that.
00:44:21.080 | - Yeah, like we originally built it for the use case.
00:44:24.120 | Like, there was a bunch of customers
00:44:25.760 | who looked into code generation applications
00:44:28.160 | and then they wanted, they came to us and asked us,
00:44:30.440 | is there a safe way to execute code?
00:44:33.040 | And yeah, we spent a lot of time on like container security.
00:44:35.320 | We used gVisor, for instance,
00:44:36.680 | which is a Google product that provides
00:44:38.280 | pretty strong isolation of code.
00:44:40.360 | So we built a product where you can basically
00:44:42.440 | run arbitrary code inside a container
00:44:44.360 | and monitor its output, or get it back in a safe way.
00:44:49.000 | I mean, over time, it's evolved into more of like,
00:44:52.840 | I think the long-term direction
00:44:54.160 | is actually, I think, more interesting,
00:44:55.360 | which is that I think Modal as a platform
00:44:59.720 | where I think the core container infrastructure we offer
00:45:04.120 | could actually be like, you know,
00:45:05.160 | unbundled from like the client SDK
00:45:08.040 | and offered to like other, you know,
00:45:09.800 | like we're talking to a couple of like other companies
00:45:13.840 | that want to run, you know, through their packages,
00:45:16.600 | like run, execute jobs on Modal,
00:45:19.560 | like kind of programmatically.
00:45:21.440 | So that's actually the direction like Sandbox is going.
00:45:23.400 | It's like turning into more like a platform for platforms
00:45:25.680 | is kind of what I've been thinking about it as.
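The core sandbox loop, run arbitrary code, capture its output, bound its runtime, reduces to something like the sketch below. A bare subprocess is used purely for illustration: it gives none of the isolation gVisor provides in the real product, and only shows the interface shape.

```python
# Rough shape of the sandbox primitive: run an arbitrary program,
# capture its output, and bound its runtime. NOT isolated; a bare
# subprocess is a stand-in for the gVisor-backed container.

import subprocess
import sys


def run_untrusted(code: str, timeout_s: float = 5.0) -> dict:
    try:
        proc = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True, text=True, timeout=timeout_s,
        )
        return {"stdout": proc.stdout, "stderr": proc.stderr,
                "returncode": proc.returncode, "timed_out": False}
    except subprocess.TimeoutExpired:
        # Runaway code is killed once the budget is exhausted.
        return {"stdout": "", "stderr": "", "returncode": None,
                "timed_out": True}
```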
00:45:26.920 | - Oh boy, platform, that's the old Kubernetes line.
00:45:30.040 | - Yeah, yeah, yeah, but it's like, you know,
00:45:31.880 | like having that ability to like programmatically,
00:45:36.280 | you know, create containers and execute them,
00:45:38.640 | I think is really cool.
00:45:40.520 | And I think it opens up a lot of interesting capabilities
00:45:43.360 | that are sort of separate from the like core Python SDK
00:45:47.040 | in Modal.
00:45:47.920 | So I'm really excited about it.
00:45:49.240 | I mean, it's like one of those features
00:45:50.320 | that we kind of released and then, you know,
00:45:52.520 | we kind of look at like
00:45:53.600 | what users actually build with it.
00:45:54.840 | And people are starting to build like kind of crazy things.
00:45:57.720 | And then, you know, we double down on some of those things
00:46:00.000 | 'cause we see like, you know,
00:46:02.000 | potential new product features.
00:46:03.880 | And so Sandbox, I think in that sense,
00:46:05.400 | it's like kind of in that direction,
00:46:07.200 | we found a lot of like interesting use cases
00:46:09.440 | in the direction of like,
00:46:13.120 | it's like a platformized container runner.
00:46:13.120 | - Can you be more specific about what you're double down on
00:46:15.480 | after seeing users in action?
00:46:17.440 | - Yeah, I mean, we're working with like some companies
00:46:20.080 | that, I mean, without getting into specifics,
00:46:24.440 | like that need the ability to take their user's code
00:46:33.640 | and then launch containers on Modal.
00:46:33.640 | And it's not about security necessarily,
00:46:37.520 | like they just want to use Modal as a backend, right?
00:46:37.520 | Like they may already provide like Kubernetes as a backend,
00:46:40.360 | Lambda as a backend,
00:46:43.400 | and now they want to add Modal as a backend, right?
00:46:43.400 | And so, you know, they need a way
00:46:44.960 | to programmatically define jobs on behalf of their users
00:46:48.400 | and execute them.
00:46:49.240 | And so I don't know, that's kind of abstract,
00:46:51.480 | but does that make sense?
00:46:52.400 | - Yeah, I totally get it.
00:46:53.240 | It's sort of one level of recursion
00:46:59.040 | to sort of be the Modal for their customers.
00:46:59.040 | - Exactly, yeah, exactly.
00:47:02.520 | - And Cloudflare has done this, you know,
00:47:03.720 | Kenton Varda from Cloudflare,
00:47:03.720 | who's like the tech lead on this thing,
00:47:05.240 | called it sort of functions as a service as a service.
00:47:07.400 | - Yeah, that's exactly right.
00:47:10.960 | FaaS-SaaS.
00:47:11.800 | - FaaS-SaaS.
00:47:12.640 | - FaaS-SaaS.
00:47:13.920 | - Yeah, I mean, I think that's something
00:47:18.520 | any base layer, second layer,
00:47:21.400 | like cloud provider, compute provider
00:47:24.440 | like yourself should provide.
00:47:24.440 | It is like very, very, you know,
00:47:26.520 | it's a marker of maturity and success
00:47:28.360 | that people just trust you to do that.
00:47:30.240 | They'd rather build on top of you than compete with you.
00:47:32.920 | Like the more interesting thing for me is like,
00:47:35.520 | what does it mean to serve a computer,
00:47:38.480 | like an LLM customer developer,
00:47:41.560 | rather than a human developer, right?
00:47:42.840 | Like, that's what a sandbox is to me.
00:47:44.720 | - Yeah, for sure.
00:47:45.560 | - That you have to sort of redefine modal
00:47:47.240 | to serve a different non-human audience.
00:47:49.760 | - Yeah, yeah, yeah.
00:47:50.760 | And I think there's some really interesting people,
00:47:52.520 | you know, building very cool things.
00:47:53.920 | - Yeah, so I don't have an answer,
00:47:55.680 | but, you know, I imagine things like,
00:47:58.040 | hey, the way you give feedback is different.
00:48:00.640 | Maybe you have to like stream errors,
00:48:03.880 | log errors differently.
00:48:05.880 | I don't really know. (laughs)
00:48:07.120 | - Yeah.
00:48:08.000 | - Obviously there's like safety considerations.
00:48:10.080 | Maybe you have an API to like restrict access to the web.
00:48:13.120 | - Yeah.
00:48:13.960 | - I don't think anyone would use it,
00:48:15.960 | but it's there if you want it.
00:48:17.040 | - Yeah, yeah.
00:48:18.080 | - Any other sort of design considerations?
00:48:21.440 | I have no idea.
00:48:22.720 | - With sandboxes?
00:48:23.560 | - Yeah, open-ended question here.
00:48:26.120 | Yeah, I mean, no, I think, yeah,
00:48:27.920 | the network restrictions, I think, make a lot of sense.
00:48:31.440 | Yeah, I mean, I think, you know, long-term,
00:48:33.040 | like I think there's a lot of interesting use cases
00:48:34.600 | where like the LLM itself
00:48:37.080 | can like decide I want to install these packages
00:48:39.200 | and like run this thing.
00:48:40.120 | And like, obviously, for a lot of those use cases,
00:48:42.160 | like you want to have some sort of control
00:48:43.960 | that it doesn't like install malicious stuff
00:48:45.760 | and steal your secrets and things like that.
00:48:47.360 | But I think that's what's exciting
00:48:49.360 | about the sandbox primitive,
00:48:50.320 | is like it lets you do that in a relatively safe way.
00:48:52.560 | - Yeah, cool.
00:48:54.280 | Do you have any thoughts on the inference wars?
00:48:59.200 | So a lot of providers are just rushing to the bottom
00:49:02.440 | to get the lowest price per million tokens.
00:49:04.800 | Some of them, you know, as Sean ran the math,
00:49:07.560 | they're just losing money.
00:49:08.720 | There's like the physics of it just don't work out
00:49:11.880 | for them to make any money on it.
00:49:13.520 | How do you think about your pricing
00:49:16.640 | and like how much premium you can get
00:49:19.560 | and you can kind of command
00:49:20.680 | versus using lower prices as kind of like a wedge
00:49:24.440 | into getting there,
00:49:25.280 | especially once you have model instrumented?
00:49:28.160 | Yeah, what are the trade-offs
00:49:29.680 | and any thoughts on strategies that work?
00:49:32.400 | - I mean, we focus more on like custom models
00:49:34.160 | and custom code.
00:49:35.200 | And I think in that space, there's like less competition.
00:49:38.200 | And I think we can, you know, have a pricing markup, right?
00:49:41.920 | Like, you know, people will always compare our prices
00:49:44.280 | to like, you know, the GPU power they can get elsewhere.
00:49:46.320 | And so how big can that markup be?
00:49:48.400 | Like it never can be, you know,
00:49:49.400 | we can never charge like 10X more,
00:49:51.000 | but we can certainly charge a premium.
00:49:52.240 | And like, you know, for that reason,
00:49:53.160 | like we can have pretty good margins.
00:49:54.680 | The LLM space is like the opposite.
00:49:56.120 | Like the switching cost of LLMs is zero, right?
00:49:58.600 | Like if all you're doing is like straight up,
00:50:00.400 | like at least like open source, right?
00:50:02.160 | Like if all you're doing is like, you know,
00:50:03.840 | using some, you know, inference endpoint
00:50:06.560 | that serves an open source model
00:50:08.560 | and, you know, some other provider comes along
00:50:10.120 | and like offers a lower price,
00:50:11.360 | you're just gonna switch, right?
00:50:12.200 | So I don't know, to me, that reminds me a lot of like,
00:50:15.040 | all these like 15-minute delivery wars or like, you know,
00:50:18.040 | like Uber versus Lyft or Jaffa Kings versus Fanta,
00:50:20.480 | or like, maybe that's not, but like, you know,
00:50:22.360 | and like maybe going back even further,
00:50:23.760 | like I think a lot about like the sort of,
00:50:25.520 | you know, flip side of this,
00:50:27.040 | the actually positive side of it.
00:50:29.040 | Like, I thought a lot about the fiber optics boom
00:50:32.440 | of like '98, '99, like the other day, you know,
00:50:35.640 | and also like the over-investment in GPUs today.
00:50:37.920 | Like, yeah, like, you know, I don't know.
00:50:40.360 | Like in the end, like I don't think VCs
00:50:42.080 | will have the return they expected,
00:50:44.040 | like, you know, in these things,
00:50:45.880 | but guess who's gonna benefit?
00:50:47.120 | Like, you know, it's the consumers, right?
00:50:50.400 | Like, someone's like reaping the value of this.
00:50:54.160 | And that's, I think, an amazing flip side is that,
00:50:56.720 | you know, we should be very grateful, you know,
00:50:58.480 | the fact that like VCs wanna subsidize these things,
00:51:01.480 | which is, you know, like you go back to the fiber optics,
00:51:03.640 | like there's the extreme like over-investment
00:51:06.000 | in fiber optic networks in like '98, '99,
00:51:08.760 | and no one made money who did that.
00:51:10.960 | But consumers, you know, got tremendous benefits
00:51:14.800 | of all the fiber optic cables that were laid,
00:51:18.360 | you know, throughout the country in the decades after.
00:51:20.880 | I feel something similar about like GPUs today,
00:51:23.680 | and also like specifically looking more narrowly
00:51:25.560 | at like the LLM inference market, like that's great.
00:51:27.920 | Like, you know, I'm very happy that, you know,
00:51:31.040 | there's a price war.
00:51:32.680 | Modal is like not necessarily like participating
00:51:35.280 | in that price war, right?
00:51:36.120 | Like, I think, you know, it's gonna shake out
00:51:37.720 | and then someone's gonna win
00:51:39.040 | and then they're gonna raise prices or whatever.
00:51:40.560 | Like, we'll see how that works out.
00:51:42.200 | But it's not, for that reason,
00:51:44.480 | like we're not like focused,
00:51:45.560 | like we're not hyper focused on like serving,
00:51:48.040 | you know, just like straight up,
00:51:49.320 | like here's an end point to an open source model.
00:51:51.920 | We think the value in Modal comes from all these,
00:51:54.120 | you know, the other use cases,
00:51:56.000 | the more custom stuff like fine tuning
00:51:57.640 | and very complex, you know, guided output,
00:52:01.320 | like type stuff, or like also like in other,
00:52:03.640 | like outside of LLMs, like we focus a lot more
00:52:07.120 | on like image, audio, video stuff,
00:52:08.480 | 'cause that's where there's a lot more proprietary models,
00:52:10.960 | there's a lot more like custom workflows,
00:52:12.520 | and that's where I think, you know,
00:52:14.200 | Modal is more, you know,
00:52:16.040 | there's a lot of value in software differentiation.
00:52:18.520 | I think focusing on developer experience,
00:52:20.320 | developer productivity, that's where I think,
00:52:22.360 | you know, you can have more of a competitive mode.
00:52:25.320 | - Yeah.
00:52:26.160 | I'm curious what the difference is gonna be
00:52:28.520 | now that it's an enterprise.
00:52:29.640 | So like with DoorDash, Uber,
00:52:32.360 | they're gonna charge you more.
00:52:33.320 | And like, as a customer,
00:52:34.360 | like you can decide to not take Uber,
00:52:36.520 | but if you're a company building AI features
00:52:38.600 | in your product using the subsidized prices,
00:52:41.000 | and then, you know, the VC money dries up in a year
00:52:44.080 | and like prices go up, it's like,
00:52:46.120 | you can't really take the features back
00:52:48.360 | without a lot of backlash,
00:52:49.400 | but you also can not really kill your margins
00:52:51.680 | by paying the new price.
00:52:53.680 | So I don't know what that's gonna look like, but.
00:52:55.640 | - But like margins are gonna go up for sure,
00:52:57.280 | but I don't know if prices will go up,
00:52:58.800 | 'cause like GPU prices have to drop eventually, right?
00:53:02.640 | So like, you know, like in the long run,
00:53:04.560 | I still think like prices may not go up that much,
00:53:07.800 | but certainly margins will go up.
00:53:09.000 | Like, I think you said, Swyx,
00:53:10.240 | that margins are negative right now.
00:53:11.480 | Like, you know, obviously-
00:53:12.880 | - For some people.
00:53:13.720 | - That's not sustainable.
00:53:15.680 | So certainly margins will have to go up.
00:53:17.120 | Like some companies are gonna have to make money
00:53:18.680 | in this space.
00:53:19.520 | Otherwise, like they're not gonna provide the service,
00:53:21.360 | but that's the equilibrium too, right?
00:53:22.600 | Like at some point, like, you know,
00:53:23.960 | that it sort of stabilizes
00:53:25.160 | and one or two or three providers make money.
00:53:27.760 | - Yeah, what else is maybe underrated in Modal,
00:53:32.400 | something that people don't talk enough about
00:53:35.120 | or yeah, that we didn't cover in the discussion?
00:53:37.880 | - Yeah, I think what are some other things?
00:53:41.440 | We talked about a lot of stuff.
00:53:42.520 | Like we have the bursty parallelism.
00:53:44.000 | I think that's pretty cool.
00:53:45.360 | We're working on a lot of stuff, like
00:53:49.320 | kind of thinking more about the roadmap,
00:53:50.720 | but like one of the things I'm very excited about
00:53:52.280 | is building primitives for like more like
00:53:55.160 | IO intensive workloads.
00:53:56.680 | And so like we're building some like crude stuff right now
00:53:59.680 | where like you can like create like direct TCP tunnels
00:54:01.800 | to containers and that lets you like pipe data.
00:54:03.840 | And like, you know, we haven't really explored this
00:54:06.080 | as much as we should,
00:54:06.920 | but like there's a lot of interesting applications.
00:54:08.560 | Like you can actually do like kind of real-time video stuff
00:54:11.160 | in Modal now because you can like create a tunnel to,
00:54:14.400 | yeah, exactly.
00:54:15.240 | You can create a raw TCP socket to a container,
00:54:17.560 | feed it video and then like, you know, get the video back.
00:54:20.440 | And I think like, it's still like a little bit like,
00:54:23.240 | you know, not fully ergonomically like figured out,
00:54:25.560 | but I think there's a lot of like super cool stuff.
00:54:28.120 | Like when we start enabling those more like
00:54:30.280 | high IO workloads, I'm super excited about.
00:54:34.880 | I think also like, you know, working with large datasets
00:54:37.000 | or kind of taking the ability to map and fan out
00:54:39.880 | and like building more like higher level,
00:54:41.320 | like functional primitives,
00:54:42.360 | like filters and group-bys and joins.
00:54:44.280 | Like I think there's a lot of like really cool stuff
00:54:46.360 | you can do, but this is like, maybe like, you know,
00:54:49.480 | years out like.
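The raw-TCP-tunnel idea above can be sketched on loopback: a worker listens on a socket, transforms whatever bytes are streamed in, and streams the result back. This is a stand-in for piping video frames to a container, not Modal's actual tunnel API.

```python
# Loopback sketch of the raw-TCP-tunnel idea: a worker listens on a
# socket, transforms the bytes streamed in, and streams the result
# back. Stand-in code only.

import socket
import threading


def serve_once(listener: socket.socket) -> None:
    conn, _ = listener.accept()
    with conn:
        data = b""
        while chunk := conn.recv(4096):
            data += chunk
            if data.endswith(b"\n"):  # newline marks end of the stream
                break
        conn.sendall(data.upper())  # stand-in "processing" step


def start_worker() -> tuple[str, int]:
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))  # OS-assigned port, like a tunnel
    listener.listen(1)
    threading.Thread(target=serve_once, args=(listener,),
                     daemon=True).start()
    return listener.getsockname()


def send_through_tunnel(host: str, port: int, payload: bytes) -> bytes:
    with socket.create_connection((host, port)) as conn:
        conn.sendall(payload + b"\n")
        out = b""
        while chunk := conn.recv(4096):
            out += chunk
        return out
```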
00:54:50.320 | - Yeah, we can just broaden out from modal a little bit,
00:54:55.020 | but you still have a lot of, you have a lot of great tweets.
00:54:57.060 | So it's very easy to just kind of go through them.
00:55:00.160 | Why is Oracle underrated?
00:55:02.800 | - I love Oracle's GPUs.
00:55:05.260 | I mean, like, I don't know why, you know,
00:55:08.760 | what the economics looks like for Oracle,
00:55:10.880 | but like, I think they're great value for money.
00:55:14.520 | Like we run a bunch of stuff in Oracle
00:55:16.240 | and they have bare metal machines,
00:55:18.440 | like two terabytes of RAM.
00:55:19.560 | They're like super fast SSDs and yeah.
00:55:22.360 | Like compared, you know, I mean, we love AWS and GCP too.
00:55:25.240 | We have great relationships with them,
00:55:27.720 | but I think Oracle's surprising.
00:55:29.400 | Like, you know, if you told me like three years ago
00:55:31.340 | that I would be using Oracle cloud,
00:55:32.480 | like what, wait, why?
00:55:34.240 | But now I'm, you know, I'm a happy customer.
00:55:37.040 | - And it's a combination of pricing
00:55:38.800 | and the kinds of SKUs, I guess, they offer.
00:55:41.920 | - Yeah, great, great machines, good prices, you know.
00:55:45.000 | - That's it. - Yeah, yeah.
00:55:46.440 | - That's all I care about.
00:55:47.280 | - Yeah, the sales team is pretty fun too.
00:55:48.520 | Like, I like them.
00:55:50.280 | - In Europe, people often talk about Hetzner.
00:55:52.920 | - Yeah, I don't know, like, so,
00:55:55.840 | like we've focused on the main clouds, right?
00:55:58.040 | Like we've, you know, Oracle, AWS, GCP,
00:55:59.840 | we'll probably add Azure at some point.
00:56:01.240 | I think, I mean, there's definitely a long tail of like,
00:56:04.000 | you know, CoreWeave, Hetzner,
00:56:06.200 | like Lambda, like all these things.
00:56:09.720 | And like over time, I think we'll look at those too.
00:56:11.720 | Like, you know, wherever we can get the right,
00:56:13.400 | you know, GPUs at the right price.
00:56:15.840 | Yeah, I mean, I think it's fascinating.
00:56:17.040 | Like, it's a tough business.
00:56:19.720 | Like, I wouldn't want to try to build like a cloud provider.
00:56:22.520 | You know, it's just, you just have to be like
00:56:24.360 | incredibly focused on like, you know, efficiency
00:56:27.160 | and margins and things like that.
00:56:28.400 | But I mean, I'm glad people are trying.
00:56:30.680 | - Yeah, and you can ramp up on any of these clouds
00:56:33.400 | very quickly, right?
00:56:34.240 | 'Cause it's-- - Yeah, I mean, yeah.
00:56:35.880 | Like, I think so.
00:56:36.760 | Like, we, like a lot of, you know,
00:56:39.760 | what Modal does is like programmatic, you know,
00:56:42.840 | launching and termination of machines.
00:56:44.400 | So that's like, what's nice about the clouds is,
00:56:47.440 | you know, they have relatively like mature APIs
00:56:49.680 | for doing that, as well as like, you know,
00:56:51.680 | support for Terraform for all the networking
00:56:53.520 | and all this stuff.
00:56:54.360 | That makes it easier to work with the big clouds.
00:56:57.040 | But yeah, I mean, some of those things,
00:56:58.360 | like I think, you know, I also expect the smaller clouds
00:57:00.360 | to like embrace those things in the long run.
00:57:02.440 | But also think, you know, we can also probably integrate
00:57:05.360 | with some of the clouds, like even without that.
00:57:08.440 | There's always an HTML API that you can use.
00:57:11.640 | Just like script something that launches instances,
00:57:14.160 | like through the web.
00:57:15.000 | - Yeah, I think a lot of people are always curious
00:57:18.560 | about whether or not you will buy your own hardware someday.
00:57:21.880 | I think you're pretty firm in that it's not your interest.
00:57:25.680 | But like your story and your growth does remind me
00:57:29.840 | a little bit of Cloudflare, which obviously, you know,
00:57:33.080 | invests a lot in its own physical network.
00:57:35.120 | - Yeah, I don't remember like early days.
00:57:37.080 | Like, did they have their own hardware or?
00:57:39.560 | - They bootstrapped a lot with like agreements
00:57:41.680 | through other, you know, providers.
00:57:44.760 | - Yeah, okay, interesting.
00:57:45.640 | - But now it's all their own hardware.
00:57:48.000 | - Yeah.
00:57:48.840 | - So I understand.
00:57:50.080 | - Yeah, I mean, my feeling is that
00:57:52.560 | when you're a venture funded startup,
00:57:54.160 | like buying physical hardware is maybe not
00:57:57.640 | the best use of the money.
00:57:59.800 | - No, I really wanted to put you in a room
00:58:03.920 | with Eiso Kant from Poolside.
00:58:03.920 | - Yeah.
00:58:04.760 | - Totally opposite view.
00:58:05.640 | - Yeah.
00:58:06.480 | - This is great, yeah.
00:58:07.320 | - I mean, I don't, I just think for like
00:58:08.560 | a capital efficiency point of view,
00:58:09.800 | like do you really want to tie up that much money
00:58:11.440 | in like, you know, physical hardware
00:58:12.760 | and think about depreciation and like,
00:58:14.600 | like as much as possible, like I, you know,
00:58:18.160 | I favor a more capital efficient way of like,
00:58:20.520 | we don't want to own the hardware
00:58:21.640 | 'cause then, and ideally we want to,
00:58:23.920 | we want the sort of margin structure to be sort of like
00:58:27.080 | 100% correlated revenue and COGS, in the sense that, like,
00:58:30.080 | you know, when someone comes and pays us,
00:58:32.400 | you know, $1 for compute, like, you know,
00:58:34.440 | we immediately incur a cost of like whatever,
00:58:36.960 | 70 cents, 80 cents, you know,
00:58:38.800 | and there's like complete correlation
00:58:40.360 | between cost and revenue.
00:58:41.760 | 'Cause then you can leverage up in like a,
00:58:43.360 | kind of a nice way, you can scale very efficiently.
00:58:45.480 | You know, like that's not, you know,
00:58:47.680 | turns out like that's hard to do.
00:58:49.360 | Like you can't just only use, like, spot
00:58:51.080 | and on-demand instances.
00:58:51.920 | Like over time, we've actually started adding
00:58:54.120 | pretty significant amount of reservations too.
00:58:56.240 | So I don't know, like reservation is always like
00:58:58.120 | one step towards owning your own hardware.
00:59:00.480 | Like, I don't know, like, do we really want to be,
00:59:02.400 | you know, thinking about switches and cooling
00:59:05.520 | and HVAC and like power supplies.
00:59:07.440 | - Disaster recovery.
00:59:08.800 | - Yeah, like, is that the thing I want to think about?
00:59:10.960 | Like, I don't know, like I like to make developers happy,
00:59:13.200 | but who knows, like maybe one day,
00:59:14.680 | like, but I don't think it's gonna happen anytime soon.
00:59:17.440 | - Yeah, obviously for what it's worth,
00:59:19.600 | obviously I'm a believer in cloud,
00:59:22.200 | but it's interesting to have the devil's advocate
00:59:25.320 | on the other side.
00:59:27.000 | The main thing you have to do is be confident
00:59:28.760 | that you can manage your depreciation
00:59:30.440 | better than the typical assumption,
00:59:32.440 | which is two to three years.
00:59:34.160 | - Yeah, yeah.
00:59:35.040 | - And so the moment you have a CTO that tells you,
00:59:38.920 | "No, I think I can make these things last seven years,"
00:59:41.760 | then it changes the math.
00:59:42.600 | - Yeah, yeah, but you know, are you deluding yourself then?
00:59:46.240 | That's the question, right?
00:59:47.080 | - Yeah, yeah.
00:59:47.920 | - It's like the Waste Management scandal.
00:59:49.640 | Do you know about that?
00:59:50.480 | Like they had all this like accounting scandal
00:59:52.680 | back in the '90s, like this garbage company,
00:59:55.280 | like was, they like started assuming their garbage trucks
00:59:59.520 | had a 10-year depreciation schedule,
01:00:02.040 | booked like a massive profit.
01:00:03.720 | You know, the stock went to like, you know, up like,
01:00:05.600 | you know, and then it turns out actually
01:00:07.240 | all those garbage trucks broke down
01:00:09.280 | and like you can't really depreciate them over 10 years.
01:00:11.560 | And so, so then the whole company, you know,
00:59:13.360 | they had to restate all their earnings.
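The depreciation math discussed above can be sketched with straight-line depreciation. The $300k server price and zero salvage value below are illustrative assumptions, not figures from the episode; the point is just how much the useful-life assumption moves the annual cost:

```python
def annual_depreciation(purchase_price: float, salvage_value: float, useful_life_years: int) -> float:
    """Straight-line depreciation: the same expense is booked each year of useful life."""
    return (purchase_price - salvage_value) / useful_life_years

# Hypothetical $300k GPU server, assumed to be worthless at end of life.
server_cost = 300_000.0

aggressive = annual_depreciation(server_cost, 0.0, 3)  # typical 2-3 year assumption
optimistic = annual_depreciation(server_cost, 0.0, 7)  # the "seven years" CTO claim

print(round(aggressive, 2))  # 100000.0 per year
print(round(optimistic, 2))  # 42857.14 per year
```

Stretching the schedule from three to seven years cuts the booked annual cost by more than half, which is exactly the lever the Waste Management story shows being abused: if the hardware actually dies early, the deferred expense comes back as a restatement.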
01:00:16.480 | - Nice.
01:00:17.320 | Let's go into some personal nuggets.
01:00:21.400 | You received the IOI gold medal,
01:00:23.800 | which is the International Olympiad in Informatics.
01:00:27.120 | - 20 years ago.
01:00:28.280 | - Yeah.
01:00:29.120 | - How have these models
01:00:31.120 | and like going to change competitive programming?
01:00:33.800 | Like, do you think people still love the craft?
01:00:37.080 | I feel like over time, we're kind of like,
01:00:39.280 | programming has kind of lost
01:00:41.360 | maybe a little bit of its luster
01:00:43.240 | in the eyes of a lot of people.
01:00:45.200 | Yeah, I'm curious to see what you think.
01:00:49.480 | - I mean, maybe, but like, I don't know, like,
01:00:53.520 | you know, I've been coding for almost 30,
01:00:55.000 | or more than 30 years.
01:00:56.320 | And like, I feel like, you know,
01:00:58.480 | you look at like programming and, you know,
01:01:00.760 | where it is today versus where it was,
01:01:03.280 | you know, 30, 40, 50 years ago,
01:01:06.440 | there's like probably thousand times more developers today
01:01:10.120 | than, you know, so like,
01:01:11.120 | and every year there's more and more developers.
01:01:13.000 | And at the same time,
01:01:13.920 | developer productivity keeps going up.
01:01:15.760 | And so like, I actually don't expect,
01:01:17.480 | like, and when I look at the real world,
01:01:19.560 | I just think there's so much software
01:01:21.440 | that's still waiting to be built.
01:01:23.360 | Like, I think we can, you know,
01:01:25.080 | 10X the amount of developers
01:01:26.720 | and still, you know, have a lot of people
01:01:29.080 | making a lot of money, you know,
01:01:30.440 | building amazing software,
01:01:32.000 | while at the same time being more productive.
01:01:34.560 | Like, I never understood this, like, you know,
01:01:36.480 | AI is gonna, you know, replace engineers.
01:01:38.840 | That's very rarely how this actually works.
01:01:41.440 | When AI makes engineers more productive,
01:01:44.840 | like the demand actually goes up
01:01:46.440 | because the cost of engineers goes down
01:01:47.960 | because you can build software more cheaply.
01:01:49.480 | And that's, I think, the story of software in the world
01:01:51.640 | over the last few decades.
01:01:52.920 | So, I mean, I don't know how this like relates
01:01:55.120 | to, like, competitive programming is a,
01:01:57.240 | like, I don't know.
01:01:58.400 | Kind of going back to your question,
01:01:59.720 | competitive programming to me was always
01:02:01.080 | kind of a weird kind of, you know, niche,
01:02:03.440 | like kind of, I don't know, I loved it.
01:02:05.000 | It's like puzzle solving.
01:02:07.360 | And like, my experience is like, you know,
01:02:09.640 | half of competitive programmers
01:02:11.680 | are able to translate that to actual, like,
01:02:14.040 | building cool stuff in the world.
01:02:16.000 | Half just like get really in, you know,
01:02:17.880 | sucked into this like puzzle stuff.
01:02:19.240 | And, you know, it never loses its grip on them.
01:02:23.840 | But like, for me, it was an amazing way
01:02:26.080 | to get started with coding and,
01:02:27.640 | or get very deep into coding and, you know,
01:02:30.280 | kind of battle off with like other smart kids
01:02:32.760 | and traveling to different countries
01:02:34.640 | when I was a teenager.
01:02:35.880 | - Yeah.
01:02:37.280 | There was another, oh, sorry.
01:02:38.920 | I was just going to mention, like,
01:02:40.200 | it's not just that he personally
01:02:41.800 | is a competitive programmer.
01:02:43.280 | Like, I think a lot of people at Modal
01:02:45.320 | are competitive programmers.
01:02:46.800 | I think you met Akshat through--
01:02:47.960 | - Akshat, co-founder, is also IOI Gold Medal.
01:02:50.600 | By the way, gold medal doesn't mean you win.
01:02:52.720 | Like, but, although we actually had an intern
01:02:54.800 | that won IOI.
01:02:56.000 | Gold Medal is like the top 20, 30 people roughly.
01:02:58.720 | - Yeah.
01:02:59.840 | And so like, obviously it's very hard
01:03:01.840 | to get hired at Modal,
01:03:02.840 | but like, what is it like to work
01:03:07.280 | with like such a talent density?
01:03:09.000 | Like, you know, how is that contributing
01:03:10.960 | to the culture at Modal?
01:03:12.200 | - Yeah, I mean, I think humans are the root cause
01:03:15.040 | of like everything at a company, right?
01:03:17.160 | Like, you know, bad code is because it's bad human
01:03:20.120 | or like whatever, you know, bad culture.
01:03:21.400 | So like, I think, you know, like talent density
01:03:23.640 | is very important and like keeping the bar high
01:03:25.400 | and like hiring smart people.
01:03:26.680 | And, you know, it's not always the case
01:03:28.400 | that, like, hiring competitive programmers
01:03:29.880 | is the right strategy, right?
01:03:31.040 | If you're building something very different,
01:03:32.320 | like you may not, you know,
01:03:33.520 | but we actually end up having a lot of like hard,
01:03:36.400 | you know, complex challenges.
01:03:37.960 | Like, you know, I talked about like the cloud,
01:03:40.480 | you know, the resource allocation,
01:03:42.960 | like turns out like that actually,
01:03:44.240 | like you can phrase that as a mixed integer
01:03:45.920 | programming problem.
01:03:46.800 | Like we now have that running in production,
01:03:48.280 | like constantly optimizing how we allocate cloud resources.
01:03:51.440 | There's a lot of like interesting and like complex,
01:03:53.360 | like scheduling problems and like,
01:03:55.280 | how do you do all the bin packing of all the containers?
01:03:57.480 | Like, so I, you know, I think for, you know,
01:04:00.600 | for what we're building, you know,
01:04:02.000 | it makes a lot of sense to hire these people
01:04:03.480 | who like those very hard problems.
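The container bin-packing problem mentioned above can be illustrated with a first-fit-decreasing heuristic. This is a stand-in sketch only: Erik describes Modal's production allocator as a mixed-integer program, which this greedy approximation does not implement, and the GPU-memory numbers are made up:

```python
def first_fit_decreasing(container_sizes, host_capacity):
    """Pack containers (by resource demand) onto as few hosts as possible.

    Classic first-fit-decreasing heuristic: sort demands in descending order,
    place each container on the first host with enough free capacity,
    and open a new host when none fits.
    """
    hosts = []  # each host is a list of the container sizes placed on it
    free = []   # remaining capacity of each host, parallel to `hosts`
    for size in sorted(container_sizes, reverse=True):
        for i, cap in enumerate(free):
            if size <= cap:
                hosts[i].append(size)
                free[i] -= size
                break
        else:
            # no existing host fits: provision a new one
            hosts.append([size])
            free.append(host_capacity - size)
    return hosts

# Hypothetical GPU-memory demands (GB) packed onto 80 GB hosts.
packing = first_fit_decreasing([40, 30, 30, 20, 10, 10], host_capacity=80)
print(len(packing))  # 2 hosts for 140 GB of demand
```

First-fit-decreasing is a well-known approximation for bin packing (it uses at most roughly 11/9 of the optimal number of bins); a real scheduler solving this as a MIP can also encode constraints like GPU type, region, and reservation pricing that this sketch ignores.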
01:04:05.720 | - Yeah, and they don't necessarily have to know
01:04:07.760 | the details of the stack.
01:04:08.880 | They just need to be very good at algorithms.
01:04:11.920 | - No, but my feeling is like people who are like
01:04:14.400 | pretty good at competitive programming,
01:04:16.000 | they can also pick up like other stuff like elsewhere.
01:04:19.160 | Not always the case, but you know,
01:04:20.920 | there's definitely a high correlation.
01:04:22.160 | - Yeah, oh yeah, I'm just, I'm interested in that
01:04:24.640 | just because, you know, like there's competitive
01:04:28.400 | mental talents in other areas,
01:04:31.000 | like competitive speed memorization or whatever.
01:04:34.520 | And like, you don't really see those transfer.
01:04:37.320 | And I always assumed in my narrow perception
01:04:39.920 | that competitive programming is so specialized,
01:04:42.560 | it's so obscure, even like so divorced
01:04:45.440 | from real world scenarios
01:04:47.640 | that it doesn't actually transfer that much.
01:04:49.600 | But obviously I think for the problems
01:04:50.600 | that you work on, it does.
01:04:52.280 | - But it's also like, you know, frankly,
01:04:53.840 | it's like translates to some extent,
01:04:55.760 | not because like the problems are the same,
01:04:57.520 | but just because like it sort of filters for the,
01:04:59.120 | you know, people who are like willing to go very deep
01:05:01.760 | and work hard on things, right?
01:05:04.080 | Like, I feel like a similar thing is like
01:05:06.600 | a lot of good developers are like talented musicians.
01:05:09.880 | Like why, like why is this a correlation?
01:05:12.200 | And like, my theory is like, you know,
01:05:13.960 | it's the same sort of skill.
01:05:14.880 | Like you have to like just hyper focus on something
01:05:16.880 | and practice a lot.
01:05:18.120 | Like, and there's something similar
01:05:19.400 | that I think creates like good developers.
01:05:21.760 | - Sweden also had a lot of very good Counter-Strike players.
01:05:24.560 | I don't know, why does Sweden have fiber optics
01:05:27.720 | before all of Europe?
01:05:28.840 | I feel like, I grew up in Italy
01:05:31.000 | and our internet was terrible.
01:05:34.280 | And then I feel like all the Nordics
01:05:36.440 | had like amazing internet.
01:05:37.760 | I remember getting online and people in the Nordics
01:05:40.040 | had like five ping, 10 ping.
01:05:41.960 | - Yeah, we had very good network back then.
01:05:44.360 | - Yeah, do you know why?
01:05:45.720 | - I mean, I'm sure like, you know,
01:05:47.680 | I think the government, you know,
01:05:49.120 | did certain things quite well, right?
01:05:51.240 | Like in the nineties, like there was like
01:05:53.000 | a bunch of tax rebates for like buying computers.
01:05:55.080 | And I think there was similar like investments
01:05:56.760 | in infrastructure that, you know,
01:05:58.920 | I mean, like, and I think like I always think about,
01:06:00.320 | you know, it's like,
01:06:01.320 | I still can't use my phone in the subway in New York.
01:06:03.840 | And that's, you know,
01:06:04.680 | and that was something I could use in Sweden in '95.
01:06:07.640 | You know, we're talking like 40 years almost, right?
01:06:09.880 | Like, why?
01:06:11.800 | And I don't know, like I think certain infrastructure,
01:06:14.000 | you know, Sweden was just better at, I don't know.
01:06:17.080 | - And also, you never owned a TV or a car?
01:06:19.760 | - Never owned a TV or a car.
01:06:20.840 | I never had a driver's license.
01:06:21.960 | - How do you do that in Sweden though?
01:06:23.600 | Like that's cold.
01:06:24.440 | - I grew up in a city.
01:06:25.400 | I mean, like I took the subway everywhere
01:06:28.140 | with bike or whatever.
01:06:29.280 | Yeah, I always lived in cities.
01:06:31.360 | So I don't, you know, I never felt,
01:06:33.520 | I mean, like we have, like me and my wife has a car,
01:06:37.000 | but like I--
01:06:37.840 | - That doesn't count.
01:06:39.840 | - I mean, it's in her name 'cause I don't have a driver's license.
01:06:41.680 | She drives me everywhere, it's nice.
01:06:44.240 | - Nice.
01:06:45.080 | - That's fantastic.
01:06:45.920 | Great.
01:06:47.640 | You know, any, I was gonna ask you,
01:06:49.840 | like the last thing I had on this list was, you know,
01:06:53.280 | your advice to people thinking about running some
01:06:55.320 | sort of run code in the cloud startup is only do it
01:06:58.120 | if you're genuinely excited about spending five years
01:07:00.280 | thinking about load balancing, page faults,
01:07:01.480 | cloud security, and DNS.
01:07:02.680 | So basically like, it sounds like you're summing up
01:07:04.160 | a lot of pain running Modal.
01:07:06.560 | - Yeah.
01:07:07.400 | I mean, like, that's like, like one thing I struggle with,
01:07:10.400 | like I talked to a lot of people starting companies
01:07:12.840 | in the data space or like AI space or whatever,
01:07:15.440 | and they could have sort of come at it at like,
01:07:17.440 | as, you know, from like an application developer
01:07:19.320 | point of view, and they're like, I'm gonna make this better.
01:07:21.320 | But like, guess how you have to make it better?
01:07:23.000 | It's like, you have to go very deep
01:07:24.360 | on the infrastructure layer.
01:07:25.360 | And so, and so one of my frustrations has been like,
01:07:28.280 | so many startups are like, in my opinion,
01:07:29.600 | like Kubernetes wrappers and like, you know,
01:07:31.760 | like, and not very like thick wrappers,
01:07:33.660 | like fairly thin wrappers.
01:07:34.960 | And I think, you know, every startup is a wrapper
01:07:37.120 | to some extent, but like, you need to be like a fat wrapper.
01:07:39.200 | You need to like go deep and like build some stuff.
01:07:41.240 | And that's like, you know, if you build a tech company,
01:07:43.280 | you're gonna wanna have, you're gonna have to spend,
01:07:45.160 | you know, five, 10, 20 years of your life,
01:07:47.280 | like going very deep and like, you know,
01:07:49.320 | building the infrastructure you need in order to like,
01:07:51.480 | make your product truly stand out and be competitive.
01:07:54.360 | And so, you know, I think that goes for everything.
01:07:56.280 | I mean, like you're starting a whatever, you know,
01:07:59.000 | online retailer of, I don't know, bathroom sinks,
01:08:02.760 | you probably have to spend, you know, 10,
01:08:05.440 | you have to be willing to spend 10 years of your life
01:08:07.840 | thinking about, you know, whatever, bathroom sinks.
01:08:10.480 | Like, otherwise it's gonna be hard.
01:08:12.360 | - Yeah, yeah, makes sense.
01:08:14.720 | I think that's good advice for everyone.
01:08:15.800 | And yeah, congrats on all your success.
01:08:17.820 | It's pretty exciting to watch, and it's just the beginning.
01:08:20.520 | - Yeah, yeah, yeah, it's exciting.
01:08:22.080 | And everyone should sign up and try out modal, modal.com.
01:08:24.880 | - Yeah, now it's GA, yay.
01:08:26.000 | - Yeah.
01:08:26.840 | - Used to be behind a wait list.
01:08:28.000 | - Yeah.
01:08:29.480 | - Awesome, Eric.
01:08:30.300 | Thanks so much for coming on.
01:08:31.140 | - Yeah, it's amazing.
01:08:31.980 | Thank you so much.
01:08:32.800 | - Thanks.
01:08:33.640 | (upbeat music)