Truly Serverless Infra for AI Engineers - with Erik Bernhardsson of Modal

- Hey everyone, welcome to the Latent Space Podcast. This is Alessio, partner and CTO-in-Residence at Decibel Partners. And I'm joined by my co-host, swyx, founder of Smol AI. - Hey, and today we have in the studio Erik Bernhardsson from Modal, welcome. - Hi, it's awesome being here.

- Yeah, awesome seeing you in person. I've seen you online for a number of years as you were building on Modal. And I think you were just making a San Francisco trip just to see people here, right? Like I've been to like two Modal events in San Francisco here.

- Yeah, that's right, we're based in New York. So I figured sometimes I have to come out to, you know, the capital of AI and make a presence. - What do you think are the pros and cons of building in New York? - I mean, I never built anything elsewhere.

Like I've lived in New York for the last 12 years. I love the city. Obviously there's a lot more stuff going on here and there's a lot more customers and that's why I'm out here. I do feel like for me where I am in life, like I'm a very boring person.

Like I kind of work hard and then I go home and hang out with my kids. Like I don't have time to go to like events and meetups and stuff anyway. So in that sense, like New York is kind of nice. Like I walk to work every morning, it's like five minutes away from my apartment.

It's like very time efficient in that sense. - Yeah, yeah, it's also a good life. So I'll do a brief bio and then we'll talk about anything else that people should know about you. So you actually, I was surprised to find this out. You're from Sweden. You went to college at KTH.

- Yep, yep, Stockholm. - And your master's was in implementing a scalable music recommender system. - Yeah. - I had no idea. - Yeah, yeah, yeah, yeah. So I actually studied physics, but I grew up coding and I did a lot of programming competitions. And then like as I was like thinking about graduating, I got in touch with an obscure music streaming startup called Spotify, which was then like 30 people.

And for some reason I convinced them like, why don't I just come and like write a master's thesis with you and like I'll do some cool collaborative filtering. Despite not knowing anything about collaborative filtering really. I sort of, but no one knew anything back then. So I spent six months at Spotify basically building a prototype of a music recommendation system and then turned that into a master's thesis.

- Yeah. - And then later when I graduated, I joined Spotify full-time. - Yeah, yeah. And then, so that was the start of your data career. You also wrote a couple of popular open source tools while you were there. And then you joined, is that correct or?

- No, that's right. I mean, I was at Spotify for seven years. It was a long stint and Spotify was a wild place early on. And I mean, the data space was also a wild place. I mean, it was like a Hadoop cluster in the like foosball room on the floor.

And so like, it was a lot of crude, like very basic infrastructure and I didn't know anything about it. And like I was hired to kind of figure out data stuff and I started hacking on a recommendation system and then got sidetracked in a bunch of other stuff. I fixed a bunch of reporting things and set up A/B testing.

I started doing like business analytics and later got back to the music recommendation system. And a lot of the infrastructure didn't really exist. Like there was like Hadoop back then, which is kind of bad and I don't miss it, but I spent a lot of time with that. As a part of that, I ended up building a workflow engine called Luigi, which briefly ended up being somewhat widely used by a bunch of companies.

Sort of like, you know, kind of like Airflow, but like before Airflow. I think it did some things better, some things worse. I also built a vector database called Annoy, which, for a while, was actually quite widely used in 2012. So it was like way before like all this like vector database stuff ended up happening.

And funny enough, I was actually obsessed with like vectors back then. Like I was like, this is gonna be huge. Like just give it like a few years. I didn't know it was gonna take like nine years and then there's gonna suddenly be like 20 startups doing vector databases in one year.

So it did happen in that sense, I was right. I'm glad I didn't start a startup in the vector database space. I would have started way too early. But yeah, that was, yeah, it was a fun seven years at Spotify. It was a great culture, a great company. - Yeah, just to take a quick tangent on this vector database thing, 'cause we probably won't revisit it.

But like, has anything architecturally changed in the last nine years? Or. (laughs) - I mean, sort of. Like I'm actually not following it like super closely. I think, you know, some of the best algorithms are still the same, like hierarchical navigable small world, whatever. Exactly, yeah, HNSW.

I think now there's like product quantization. There's like some other stuff that I haven't really followed super closely. I mean, obviously like back then it was like, you know, Annoy was like very simple. It's like a C++ library with Python bindings and you could mmap big files into memory and then do some lookups.

And I used like this kind of recursive like hyperspace splitting strategy, which is not that good, but it sort of was good enough at that time. But I think a lot of like HNSW is still like what people generally use. Now, of course, like databases are much better in the sense like to support like inserts and updates and stuff like that.
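To make the recursive splitting idea concrete, here is a minimal sketch in plain Python. It is not Annoy's actual code: it builds a single tree (Annoy builds a forest), picks the split as the hyperplane halfway between two randomly sampled points, and all names (`build_tree`, `query`, `leaf_size`) are made up for illustration.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_tree(points, ids, leaf_size, rng):
    """Recursively split `ids` with hyperplanes until the leaves are small."""
    if len(ids) <= leaf_size:
        return {"leaf": ids}
    # Assumption: split halfway between two randomly chosen points.
    a, b = rng.sample(ids, 2)
    normal = [x - y for x, y in zip(points[a], points[b])]
    midpoint = [(x + y) / 2 for x, y in zip(points[a], points[b])]
    offset = dot(normal, midpoint)
    left = [i for i in ids if dot(normal, points[i]) < offset]
    right = [i for i in ids if dot(normal, points[i]) >= offset]
    if not left or not right:  # degenerate split, e.g. duplicate points
        return {"leaf": ids}
    return {
        "normal": normal,
        "offset": offset,
        "left": build_tree(points, left, leaf_size, rng),
        "right": build_tree(points, right, leaf_size, rng),
    }


def query(tree, points, q, k):
    """Walk down to one leaf, then rank only that leaf's points by distance."""
    node = tree
    while "leaf" not in node:
        side = "left" if dot(node["normal"], q) < node["offset"] else "right"
        node = node[side]
    return sorted(node["leaf"], key=lambda i: squared_distance(points[i], q))[:k]


rng = random.Random(0)
points = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(200)]
tree = build_tree(points, list(range(len(points))), leaf_size=10, rng=rng)
nearest = query(tree, points, points[7], k=3)
```

The lookup only touches one leaf, which is why it is fast and also why it is approximate: a true neighbor on the other side of a split is missed unless several trees are searched.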

Annoy never supported that. Yeah, it's sort of exciting to finally see like vector databases becoming a thing. - Yeah, yeah. And then maybe one takeaway or most interesting lesson from Daniel Ek. - I mean, I think Daniel Ek, you know, he started Spotify very young. Like he was like 25, something like that.

I don't know if it was like a good lesson, but like he, in a way, like, I think he was a very good leader. Like there were never any scandals, and he wasn't very eccentric at all. It was just kind of like very like level-headed, like just like ran the company very well.

Like never made any like obvious mistakes or, I think it was like a few bets that maybe like in hindsight were like a little, you know, like took us, you know, too far in one direction or another. But overall, I mean, I think he was a great CEO, like definitely, you know, up there, like generational CEO, at least for like Swedish startups.

- Yeah, yeah, for sure. Okay, we should probably move to, make our way towards Modal. So then you spent six years as CTO of Better. - Yeah. - You were an early engineer and then you scaled up to like 300 engineers. - I joined as a CTO when there was like no tech team and yeah, that was a wild chapter in my life.

Like the company did very well for a while and then like during the pandemic. - Less well. - Yeah, it was kind of a weird story, but yeah, they kind of collapsed and, you know. - Laid off people poorly. - Yeah, yeah, it was like a bunch of stories.

Yeah, I mean, the company like grew from like 10 people when I joined to 10,000, now it's back to 1,000. But yeah, they actually went public a few months ago, kind of crazy. They're still around, like, you know, they're still, you know, doing stuff. So yeah, very kind of interesting six years of my life.

For non-technical reasons, mostly like. But yeah, like I managed like 300, 400. - Management, scaling. - Like learning a lot of that, like recruiting. I spent all my time recruiting and stuff like that. And so managing at scale, it's like nice, like now in a way, like when I'm building my own startup, like that's actually something I like don't feel nervous about at all.

Like I've managed that scale, like I feel like I can do it again. It's like very different things that I'm nervous about as a startup founder. But yeah, I started Modal three years ago after sort of, after leaving Better. I took a little bit of a time off during the pandemic.

But yeah, pretty quickly I was like, I gotta build something. I just wanna, you know. And then yeah, Modal took form in my head, took shape. - And as far as I understand, and maybe we can sort of trade off questions. So the quick history is, started Modal in 2021, got your seed with Sarah from Amplify 2022.

Last year you just announced your Series A with Redpoint. - That's right. - And that brings us up to mostly today. - Yeah. - And so like most people I think were expecting you to build for the data space. - But it is the data space. - It is the data space.

When I think of data space, so I come from like, you know, Snowflake, BigQuery, you know, Fivetran, Airbyte, that kind of stuff. - Yeah. - And so, you know, what Modal became is more general purpose than that. - Yeah, yeah. I don't know, it was like fun. I actually ran into like Edo Liberty, the CEO of Pinecone, like a few weeks ago.

And he was like, I was so afraid you were building a vector database. (laughing) No, yeah, it's like, I started Modal because, you know, like in a way, like I worked with data like throughout most of my career, like every different part of the stack, right? Like I did everything from like business analytics to like deep learning, you know, like building, you know, training neural networks at scale, like everything in between, right?

And so one of the thoughts like, and one of the observations I had when I started Modal or like why I started was like, I just wanted to make, build better tools for data teams. And like very, like that's sort of an abstract thing, but like I find that the data stack is, you know, full of like point solutions that don't integrate well and still, when you look at like data teams today, you know, like every startup ends up building their own internal Kubernetes wrapper or whatever.

And, you know, all the different data engineers and machine learning engineers end up kind of struggling with the same things. So I started thinking about like how, how do I build a new data stack, which is kind of a megalomaniac project. Like, 'cause you kind of want to like throw out everything and start over.

- It's almost a modern data stack. (laughing) - Yeah, like a postmodern data stack. And so I started thinking about that and a lot of it came from like, like more focused on like the human side of like, how do I make data teams more productive? And like, what are the technology tools that they need?

And like, you know, drew out a lot of charts of like, how the data stack looks, you know, what are the different components. And I was actually very interested in like workflow scheduling, 'cause it kind of sits in like a nice sort of, you know, it's like a hub in the graph of like data products.

But it was kind of hard to like kind of do that in a vacuum and also to monetize it to some extent. And I got very interested in like the layers below at some point and like, at the end of the day, like most people have code that has to run somewhere.

And I started thinking about like, okay, well, how do you make that nice? Like, how do you make that? And in particular, like the thing I always like thought about like developer productivity is like, I think the best way to measure developer productivity is like in terms of the feedback loops.

Like how quickly when you iterate, like when you write code, like how quickly can you get feedback? And at the innermost loop, it's like running some, like writing code and then running it. And like, as soon as you start working with the cloud, like it's like, takes minutes suddenly 'cause you have to build a fucking Docker container and push it to the cloud and like run it, you know?

So that was like the initial focus for me. It was like, I just wanna solve that problem. Like I wanna, you know, build something that lets you run things in the cloud and like retain this sort of, you know, the joy of productivity as when you're running things locally. And in particular, I was quite focused on data teams 'cause I think they had a couple of unique needs that weren't well served by the infrastructure at that time or like still aren't.

Like in particular, like Kubernetes, I feel like it's like kind of worked okay for backend teams, but not so well for data teams. And very quickly, I got sucked into like a very deep, like rabbit hole of like-- - Not well for data teams because of burstiness. - Burstiness is one thing, yeah, for sure.

So like burstiness is like one thing, right? Like when you, like, you know, like you often have this like fan out, you wanna like apply some function over very large data sets. Another thing tends to be like hardware requirements. Like you need like GPUs. And like, I've seen this in many companies.

Like you go, you know, data engineers go to like, or data scientists go to a platform team and they're like, "Can we add GPUs to the Kubernetes?" They're like, "No, like that's, you know, complex." And we're not gonna... Or like, so like just getting GPU access. And then like, I mean, I also like data code, like frankly, or like machine learning code, like tends to be like super annoying in terms of like environments.

Like you end up having like a lot of like custom, like containers and like environment conflicts. And like, so it ends up having a lot of like annoying, like it's very hard to set up like a unified container that like can serve like a data scientist because like there's always like packages that break.

And so I think there's a lot of different reasons why, you know, the technology wasn't well-suited for data teams. And I think the attitude at that time was often like, you know, like you had friction between the data team and the platform team. Like, well, it works for the backend stuff.

Like, why can't you use it? Like, you know, why don't you just like, you know, make it work? But like, I actually felt like data teams at that point, you know, or at this point now, like there's so much, so many people working with data and like they, to some extent, like deserve their own tools and their own tool chains.

And like optimizing for that is not something people have done. So that's sort of like very abstract philosophical reason why I started Modal. And then I got sucked into this like rabbit hole of like container cold start and, you know, like whatever Linux, page cache, you know, file system optimizations.

- Yeah, tell people, I think the first time I met you, I think you told me some numbers, but I don't remember. Like, what are the main achievements that you were unhappy with the status quo and then you built your own container stack? - Yeah, I mean, like in particular, it was like, like how do you, like in order to have that loop, right?

Like you wanna be able to start, like take code on your laptop, whatever, and like run in the cloud very quickly and like running in custom containers and maybe like spin up like a hundred containers, a thousand, you know, things like that. And so container cold start was the initial, like from like a developer productivity point of view, it was like really what I was focusing on is I wanna take code, I wanna stick it in container, I wanna execute in the cloud and like, you know, make it feel like fast.

And when you look at like how Docker works for instance, like Docker, you have this like fairly convoluted, like very resource inefficient way they, you know, you build a container, you upload the whole container and then you download it and you run it. And Kubernetes is also like not very fast at like starting containers.

So like, so I started kind of like, you know, going a layer deeper, like Docker is actually like, you know, there's like a couple of different primitives, but like a lower level primitive is runc, which is like a container runner. And I was like, what if I just take the container runner, like runc, and I point it to like my own root file system.

And then I built like my own file system, like a virtual file system that exposes files over the network instead. And that was like the sort of very crude version of Modal. It's like, now I can actually start containers very quickly because it turns out like when you start a Docker container, like first of all, like most Docker images are like several gigabytes.

And like 99% of that is never gonna be consumed. Like there's a bunch of like, you know, like time zone information for like Uzbekistan or whatever, like no one's gonna read it. And then there's a very high overlap between the files that are gonna be read. There's gonna be like LibTorch or whatever, like it's gonna be read.

So you can also cache it very well. So that was like the first sort of stuff we started working on was like, let's build this like container file system and, you know, couple it with like, you know, just using runc directly. And that actually enabled us to like get to this point of like you write code and then you can launch it in the cloud within like a second or two, like something like that.
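A rough way to picture that file system is a manifest that maps paths to content hashes, with file bytes fetched only on first read and cached by hash. The sketch below is a toy model, not Modal's implementation: content addressing with SHA-256 is an assumption, and `BlobStore`, `LazyImage`, and the file paths are hypothetical names.

```python
import hashlib


class BlobStore:
    """Hypothetical stand-in for a network blob store keyed by content hash."""

    def __init__(self):
        self._blobs = {}
        self.fetches = 0  # counts simulated network round trips

    def put(self, data):
        digest = hashlib.sha256(data).hexdigest()
        self._blobs[digest] = data
        return digest

    def get(self, digest):
        self.fetches += 1
        return self._blobs[digest]


class LazyImage:
    """A container image as a path -> hash manifest; bytes load on first read."""

    def __init__(self, files, store, cache):
        self._manifest = {path: store.put(data) for path, data in files.items()}
        self._store = store
        self._cache = cache  # shared across images on the same host

    def read(self, path):
        digest = self._manifest[path]
        if digest not in self._cache:
            self._cache[digest] = self._store.get(digest)
        return self._cache[digest]


store = BlobStore()
host_cache = {}
libtorch = b"pretend this is a large shared library"

image_a = LazyImage(
    {"/lib/libtorch.so": libtorch, "/zoneinfo/Asia/Tashkent": b"tz", "/app/a.py": b"a"},
    store,
    host_cache,
)
image_b = LazyImage({"/lib/libtorch.so": libtorch, "/app/b.py": b"b"}, store, host_cache)

image_a.read("/lib/libtorch.so")
image_a.read("/app/a.py")
fetches_after_a = store.fetches  # the time zone file was never fetched

image_b.read("/lib/libtorch.so")  # same content hash, served from the host cache
image_b.read("/app/b.py")
fetches_after_b = store.fetches
```

Starting the second image costs one fetch instead of two because the shared library is already cached, which is the overlap effect described above.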

And, you know, there's been many optimizations since then, but that was sort of the starting point. - Can we talk about the developer experience as well? I think one of the magic things about Modal is at the very basic layers, like a Python function decorator, it's just like stub and whatnot, but then you also have a way to define a full container.

What were kind of the design decisions that went into it? Where did you start? How easy did you want it to be? And then maybe how much complexity did you then add on to make sure that every use case fit? - Yeah, like, I mean, Modal, I almost feel like it's like almost like two products kind of glued together.

I mean, like there's like the low level, like container runtime, like file system, all that stuff like in Rust. And then there's like the Python SDK, right? Like how do you express applications? And I think, I mean, swyx, like I think your blog post on the self-provisioning runtime was, to me, always like an eye-opening thing.

It's like, so I didn't think about like, I want to-- - You wrote your post four months before me. - Yeah? - The software 2.0, Infra 2.0. - Yeah, well, I don't know, like convergence of minds. Like we're thinking, I guess we're like both thinking. Maybe you put, I think, better words than like, maybe it's something I was like thinking about for a long time.

- Yeah, and I can tell you how I was thinking about it on my end, but I want to hear yours. - Yeah, yeah, I would love it. Like, and like, to me, like what I always wanted to build was like, I don't know, like I don't know if you use like Pulumi, like Pulumi is like nice, like in the sense, like it's like Pulumi is like, you describe infrastructure in code, right?

And to me, that was like so nice. Like finally, I can like, you know, put a for loop that creates S3 buckets or whatever. And I think like Modal sort of goes one step further in the sense that like, what if you also put the app code inside the infrastructure code and like glue it all together and then like you only have one single place that defines everything and it's all programmable.

You don't have any config files. Like Modal has like zero config, there's no config. It's all code. And so that was like the goal that I wanted, like part of that. And then the other part was like, I often find that so much of like my time was spent on like the plumbing between containers.

And so my thing was like, well, if I just build this like Python SDK, then and make it possible to like bridge like different containers, just like a function call. Like, and I can say, oh, this function runs in this container and this other function runs in this container and I can just call it just like a normal function.

Then, you know, I can build this applications that may span a lot of different environments. Maybe the fan out start other containers, but it's all just like inside Python. You just like have this beautiful kind of nice, like DSL almost for like, you know, how to control infrastructure in the cloud.

So that was sort of like how we ended up with the Python SDK as it is, which is still evolving all the time. By the way, we keep changing syntax quite a lot 'cause I think it's still somewhat exploratory, but we're starting to converge on something that feels like reasonably good now.
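The "function runs in this container, call it like a normal function" idea can be sketched with a plain decorator. This is a toy, not the Modal SDK: `ToyApp`, `ContainerSpec`, the `image` and `gpu` parameters, and the image names are all hypothetical, and the wrapper simply runs the function locally where a real system would dispatch it to a container.

```python
import functools
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ContainerSpec:
    """Hypothetical description of where a function should run."""
    image: str
    gpu: Optional[str] = None


class ToyApp:
    def __init__(self):
        self.registry = {}  # function name -> ContainerSpec

    def function(self, image, gpu=None):
        spec = ContainerSpec(image=image, gpu=gpu)

        def decorate(fn):
            self.registry[fn.__name__] = spec

            @functools.wraps(fn)
            def call(*args, **kwargs):
                # A real system would serialize the arguments, start a
                # container matching `spec`, and run `fn` there. This toy
                # just runs it in-process.
                return fn(*args, **kwargs)

            call.spec = spec
            return call

        return decorate


app = ToyApp()


@app.function(image="cpu-image")
def tokenize(text):
    return text.split()


@app.function(image="gpu-image", gpu="A100")
def embed(tokens):
    return [len(token) for token in tokens]  # placeholder for real model code


def pipeline(text):
    # Two different environments, bridged by ordinary function calls.
    return embed(tokenize(text))


result = pipeline("hello modal world")
```

The point of the design is that infrastructure and application code live in one place: the environment is an argument to the decorator, so there is no separate config file to keep in sync.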

- Yeah, and along the way, with this expressiveness, you enabled the ability to, for example, attach a GPU to a function. - Totally, yeah. It's like, you just like say, you know, on the function decorator, you're like GPU equals, you know, A100 and then, or like GPU equals, you know, A10 or T4 or something like that.

And then you get that GPU and like, you know, you just run the code and it runs. Like you don't have to, you know, go through hoops to, you know, start an EC2 instance or whatever. - Yeah. - So it's all code. - Yeah, so on my end, the reason I wrote Self-Provisioning Runtimes was I was working at AWS and we had AWS CDK, which is kind of like, you know, the Amazon Basics Pulumi.

- Yeah, totally. - And then like, but you know, it creates, it compiles to CloudFormation. - Yeah. - And then on the other side, you have to like get all the config stuff and then put it into your application code and make sure that they line up. So then you're writing code to define your infrastructure, then you're writing code to define your application.

And I was just like, this is like obvious that it's gonna converge, right? - Yeah, totally. But isn't there, I might be wrong, but like, was it like SAM or Chalice or one of those, like, isn't that like an AWS thing where actually they kind of did that?

I feel like there's like one product. - SAM, yeah, yeah, yeah, yeah. Still very clunky. - Okay. - It's not as elegant as Modal. - I love AWS for like the stuff it's built, you know, like historically in order for me to like, you know, like what it enables me to build.

But like AWS has always like struggled with developer experience, like, and that's been. I mean, they have to not break things. - Yeah, yeah, and totally. And they have to, you know, build products for a very wide range of use cases. And I think that's hard. - Yeah, yeah, so it's easier to design for.

Yeah, so anyway, I was pretty convinced that this would happen. I wrote that thing. And then, you know, imagine my surprise that you guys had it on your landing page at some point. I think Akshat was just like, I'll just throw that in there. - Did you trademark it?

- No, I didn't. But I definitely got sent a few pitch decks with me, with like my post on there. And it was like really interesting. This is my first time like kind of putting a name to a phenomenon. And I think that's a useful skill for people to just communicate what they're trying to do.

- Yeah, no, I think it's a beautiful concept, yeah. - Yeah, yeah. But I mean, obviously you implemented it. What became more clear in your explanation today is that actually you're not that tied to Python. - No, I mean, I think that all the like lower level stuff is, you know, just running containers and like scheduling things and, you know, serving container data and stuff.

So, I mean, I think Python is a great place. Like one of the benefits of data teams is obviously like, they're all like using Python, right? And so that made it a lot easier. I think, you know, if we had focused on other workloads, like, you know, for various things, like we've been kind of like half thinking about like CI or like things like that.

But like, in a way that's like harder 'cause like then you also have to build, you know, multiple SDKs. Whereas, you know, focusing on data teams, you know, Python like covers like 95% of all teams. So that made it a lot easier. - I mean, like definitely like in the future, we're gonna be supporting other languages.

JavaScript for sure is the obvious next language, but, you know, who knows? Like, you know, Rust, Go, R, like whatever, PHP, Haskell, I don't know. - Yeah, and, you know, I think for me, I actually am a person who like kind of liked the idea of programming language advancements being improvements in developer experience.

But all I saw out of the academic sort of PLT type people is just type level improvements. And I always think like, for me, like one of the core reasons for self-provisioning runtimes and then why I like Modal is like, this is actually a productivity increase. - Totally. - Like it's a language level thing.

You know, you managed to stick it on top of an existing language, but it is your own language. - Yeah. - DSL on top of Python. - Yeah. - It's a language level increase on the order of like automatic memory management. You know, you could sort of make that analogy that yeah, you like, maybe you lose some level of control, but most of the time you're okay with whatever Modal gives you.

And like, that's fine. - Yeah, yeah. I mean, that's how I think about it too. Like, I think, you know, you look at developer productivity over the last number of decades, like, you know, it's come in like small increments of like, you know, dynamic typing, that's like one thing.

Suddenly, for a lot of use cases, you don't need to care about type systems. Or better compiler technology, or like, you know, the cloud, or like, you know, relational databases. And, you know, I think, you know, you look at that history, and developers have been getting like probably 10x more productive every decade for the last four decades or something.

That is kind of crazy. Like on an exponential scale, we're talking about a 10,000x, like, you know, improvement in developer productivity. What we can build today, you know, is arguably like, you know, a fraction of the cost of what it, you know, took to build it in the eighties.

Maybe it wasn't even possible in the eighties. So to me, like, that's like so fascinating. I think it's going to keep going for the next few decades. - Yeah, yeah. - Another big thing in the infra 2.0 wishlist was truly serverless infrastructure. The other, on your landing page, you called them native cloud functions, something like that.

I think the issue I've seen with serverless has always been people really wanted it to be stateful, even though stateless was much easier to do. And I think now with AI, most model inference is like stateless, you know, outside of the context. So that's kind of made it a lot easier to just put a model, like an AI model, on Modal to run.

How do you think about how that changes how people think about infrastructure too? - Yeah, I mean, I think Modal is definitely going in the direction of like doing more stateful things and working with data and like high IO use cases. I do think one like massive, like, serendipitous thing that happened like halfway, you know, a year and a half into like the, you know, building Modal was like Gen AI started exploding.

And like the IO pattern of Gen AI is like, fits the serverless model like so well, because like, it's like, you know, you send this tiny piece of information, like a prompt, right? Or something like that. And then like, you have this GPU that does like trillions of flops, and then it sends back like a tiny piece of information.

Right? And that turns out to be something like, you know, if you can get serverless working with GPUs, that just like works really well, right? So I think from that point of view, like serverless always, to me, felt like a little bit of like a solution looking for a problem.

I don't know, I don't actually like, don't think like backend is like the problem that needs serverless. Or like not as much, but I look at data, and in particular like things like Gen AI, like model inference, like it's like clearly a good fit. So I think that, you know, to a large extent explains like why we saw, you know, the initial sort of like killer app for Modal being model inference, which actually wasn't like necessarily what we were focused on.

But that's where we've seen like by far the most usage and growth. - And was that stable diffusion? - Stable diffusion in particular, yeah. - Yeah, and this was before you started offering like fine tuning of language models. It was mostly stable diffusion. - Yeah, yeah. I mean, like Modal, like I always built it to be a very general purpose compute platform, like something where you could run everything.

And I used to call Modal like a better Kubernetes for data teams for a long time. And what we realized was like, yeah, that's like, you know, a year and a half in, like we barely had any users or any revenue. And like, we were like, well, maybe we should look at like some use case, trying to think of a use case.

And that was around the same time stable diffusion came out. And yeah, like, I mean, like the beauty of Modal is like you can run almost anything on Modal, right? Like model inference turned out to be like the place where we found initially, well, like clearly this has like 10x better ergonomics than anything else.

But we're also like, you know, going back to my original vision, like we're thinking a lot about, you know, now, okay, now we do inference really well. Like what about training? What about fine tuning? What about, you know, end-to-end lifecycle deployment? What about data pre-processing? What about, you know, I don't know, real-time streaming?

What about, you know, large data munging? Like there's just data observability. I think there's so many things, like kind of going back to what I said about like redefining the data stack, like starting with the foundation of compute, like one of the exciting things about Modal is like, we've sort of, you know, we've been working on that for three years and it's maturing.

But like, there are so many things you can do, like with just like a better compute primitive and also go up the stack and like do all this other stuff on top of it. - Yeah, how do you think about, or rather like, I would love to learn more about the underlying infrastructure and like how you make that happen because with fine tuning and training, it's a static memory that you're gonna, like you exactly know what you're gonna load in memory and it's kind of like a set amount of compute versus inference, just like data is like very bursty.

How do you make batches work with a serverless developer experience? You know, like what are like some fun technical challenges you solved to make sure you get max utilization on these GPUs? What we hear from people is like, we have GPUs, but we can really only get like, you know, 30, 40, 50% maybe utilization.

- Yeah. - What's some of the fun stuff you're working on to get a higher number there? - Yeah, I think on the inference side, like that's where we like, you know, like from a cost perspective, like utilization perspective, we've seen, you know, like very, very good numbers. And in particular, like it's our ability to start containers and stop containers very quickly.

And that means that we can, you know, we can auto scale extremely fast and scale down very quickly, which means like we can always adjust the sort of capacity, the number of GPUs running, to the exact, you know, traffic volume. And so in many cases, like that actually leads to a sort of interesting thing where like, we obviously run our things on like the public cloud, like AWS, GCP, and Oracle.

But in many cases, like users who do inference on those platforms or those clouds, even though we charge a slightly higher price per GPU hour, a lot of users who move their large scale inference use cases to Modal end up saving a lot of money. 'Cause we only charge for like the time the GPU is actually running.
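The pricing argument is simple arithmetic: a fleet sized for peak traffic is billed for every hour, while usage-based billing only covers GPU hours actually used, so the higher per-hour price wins whenever utilization is below the ratio of the two prices. A minimal sketch, where the traffic profile and both prices are made-up numbers rather than anything quoted in the conversation:

```python
def fixed_fleet_cost(demand, price_per_gpu_hour):
    """Provision for the peak hour and pay for every GPU, every hour."""
    return max(demand) * len(demand) * price_per_gpu_hour


def usage_based_cost(demand, price_per_gpu_hour):
    """Pay only for GPU hours that actually ran (assumes perfect autoscaling)."""
    return sum(demand) * price_per_gpu_hour


def utilization(demand):
    """Fraction of the peak-sized fleet's GPU hours that did useful work."""
    return sum(demand) / (max(demand) * len(demand))


# Hypothetical day: one GPU is enough for 20 hours, eight are needed for 4.
hourly_demand = [1] * 20 + [8] * 4

RESERVED_PRICE = 2.0   # hypothetical $/GPU-hour for an always-on fleet
ON_DEMAND_PRICE = 3.0  # hypothetical, deliberately higher usage-based price

fixed = fixed_fleet_cost(hourly_demand, RESERVED_PRICE)
usage = usage_based_cost(hourly_demand, ON_DEMAND_PRICE)

# Usage-based billing is cheaper whenever utilization is below this ratio.
break_even = RESERVED_PRICE / ON_DEMAND_PRICE
```

With these numbers the fixed fleet runs at roughly 27% utilization, well under the 67% break-even, so the usage-based bill is less than half of the fixed one despite the 50% higher hourly price.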

And that's a hard problem, right? Like if you go, you know, if you have to constantly adjust the number of machines, if you have to start containers, stop containers, like that's a very hard problem. And that, you know, and starting containers quickly is a very difficult thing. I mentioned we had to build our own file system for this.

We also, you know, built our own container scheduler for that. We're looking, we've implemented recently CPU memory checkpointing, so we can take running containers and snapshot the entire CPU, like including registers and everything, and restore it from that point, which means we can restore it from like an initialized state.

We're looking at GPU checkpointing next, it's like a very interesting thing. So I think on the inference stuff, on the inference side, like that's where serverless really shines, because you can drive, you know, you can push the frontier of latency versus utilization quite substantially, you know, which either ends up being a latency advantage or a cost advantage, or both, right?

On training, it's probably arguably like less of an advantage doing serverless, frankly, 'cause, you know, you can just like spin up a bunch of machines and try to satisfy, like, you know, train as much as you can on each machine. For that area, like we've seen like, you know, arguably like less usage, like for modal.

But there are always like some interesting use cases, like we do have a couple of customers, like Ramp, for instance, like they do fine tuning with Modal, and they basically like one of the patterns they have is like very bursty type fine tuning, where they fine tune 100 models in parallel.

And that's like a separate thing that Modal does really well, right? Like we can start up 100 containers very quickly, run a fine tuning training job on each one of them that only runs for, I don't know, 10, 20 minutes. And then, you know, you can do hyperparameter tuning in that sense, like just pick the best model and things like that.
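The "run 100 short jobs in parallel, then pick the best" pattern can be sketched with the Python standard library. This is not Modal's API: a thread pool stands in for the containers, and `fine_tune` is a hypothetical placeholder that returns a made-up validation loss instead of training anything.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def fine_tune(learning_rate: float) -> float:
    """Hypothetical stand-in for a 10-20 minute fine-tuning job.

    Returns a made-up validation loss that is lowest near 1e-3.
    """
    return (math.log10(learning_rate) + 3.0) ** 2


def sweep(learning_rates, max_workers=100):
    """Run one job per learning rate in parallel and keep the best one."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order, so losses line up with learning_rates.
        losses = list(pool.map(fine_tune, learning_rates))
    best_loss, best_lr = min(zip(losses, learning_rates))
    return best_lr, best_loss


# 100 candidate learning rates, log-spaced between 1e-5 and 1e-1.
learning_rates = [10 ** (-5 + 4 * i / 99) for i in range(100)]
best_lr, best_loss = sweep(learning_rates)
print(f"best learning rate: {best_lr:.2e} (loss {best_loss:.5f})")
```

On a real platform each call would run in its own container with its own GPU; the shape of the code (map over configurations, reduce with `min`) stays the same.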

So there are like interesting training use cases. Like I think when you get to like training like very large foundational models, that's a use case we don't support super well, 'cause that's very high IO, you know, you need to have like InfiniBand and all these things. And those are things we haven't supported yet, and might take a while to get to that.

So that's like probably like an area where like we're relatively weak in. - Yeah, have you cared at all about lower level model optimization? There's other cloud providers that do custom kernels to get better performance, or are you just, given that you're not just an AI compute company? - Yeah, I mean, I think like we wanna support like a generic, like general workloads in a sense that like we want users to give us a container essentially, or code, and then we wanna run that.

So I think, you know, we benefit from those things in the sense that like we, you know, we can tell our users, you know, to use those things. But I don't know if we wanna like poke into users containers and like do those things automatically that's sort of, I think a little bit tricky from the outside to do, 'cause, you know, we wanna be able to take like arbitrary code and execute it.

But certainly like, you know, we can tell our users to like use those things. - Yeah, I may have betrayed my own biases because I don't really think about Modal as for data teams anymore. I think you started that way. I think you're much more for AI engineers. And, you know, one of my favorite anecdotes, which I think you know, but I don't know if you directly experienced it.

I went through the Vercel AI Accelerator, which you supported. - Yeah. - And in the Vercel AI Accelerator, a bunch of startups gave like free credits and like signups and talks and all that stuff. The only ones that stuck are the people, are the ones that actually appealed to engineers.

And the top usage, the top tool used by far was Modal. - Hmm, that's awesome. - For people building with AI apps. - Yeah, I mean, it might be also like a terminology question like the AI versus data, right? Like I've, you know, maybe I'm just like old and jaded, but like I've seen so many like different titles.

Like for a while it was like, you know, I was a data scientist and I was a machine learning engineer and then, you know, there was like analytics engineers and then it was like an AI engineer, you know? So like, to me, it's like, I just like, in my head, that's to me just like- - Just engineer.

- Just data, like, or like engineer, you know? Like, I don't really, so that's why I've been like, you know, just calling it data teams. But like, of course, like, you know, AI is like, you know, like such a massive fraction of our like workloads. - It's a different Venn diagram of things you do, right?

So the stuff that you're talking about where you need like InfiniBand for like highly parallel training, that's not, that's more of the ML engineer and that's more of the research scientist. - Yeah, yeah. - And less of the AI engineer, which is more sort of trying to put, work at the application.

- Yeah, I mean, to be fair to it, like, we have a lot of users that are like doing stuff that I don't think fits neatly into like AI. Like, we have a lot of people using like Modal for web scraping. Like, it's kind of nice. Like, you can just like, you know, fire up like a hundred or a thousand containers running Chromium and just like render a bunch of webpages and it takes, you know, whatever.

Or like, you know, protein folding. Is that, I mean, maybe that's, I don't know. Like, you know, we have a bunch of users doing that or like, you know, in terms of in the realm of biotech, like sequence alignment, like people using, or like a couple of people using like Modal to run like large, like mixed integer programming problems, like, you know, using Gurobi or things like that.

So video processing is another thing that keeps coming up. Like, you know, let's say you have like petabytes of video and you want to just like transcode it, like, you can fire up a lot of containers and just run FFmpeg or like, so there are those things too. Like, I mean, like that being said, like AI is by far our biggest use case, but, you know, like again, like Modal is kind of general purpose in that sense.

- Yeah, well maybe, so I'll stick to the stable diffusion thing and then we'll move on to the other use cases of sort of for AI that you want to highlight. The other big player in my mind is Replicate. - Yeah. - In this era. They're much more, I guess, custom built for that purpose, whereas you're more general purpose.

How do you position yourself with them? Are they just for like different audiences or are you just head-on competing? - I think there's like a tiny sliver of the Venn diagram where we're competitive and then like 99% of the area we're not competitive. I mean, I think for people who, if you think of like front-end engineers, I think that's where like really they found good fit.

It's like, you know, people who built some cool web app and they want some sort of AI capability and they just, you know, an off the shelf model is like perfect for them. That's like use Replicate, that's great, right? Like, I think where we shine is like custom models or custom workflows, you know, running things at very large scale.

You need to care about utilization, care about costs. You know, we have much lower prices 'cause we spent a lot more time optimizing our infrastructure. And, you know, and that's where we're competitive, right? Like, you know, and you look at some of our use cases, like Suno is a big user.

Like they're running like large scale like AI. - We're talking with Mikey in a month. - Yeah, so I mean, they're using Modal for like production infrastructure. Like they have their own like custom model, like custom code and custom weights, you know, for AI generated music, Suno.ai. You know, those are the types of use cases that we like, you know, things that are like very custom or like it's like, you know, and those are the things like it's very hard to run on Replicate, right?

And that's fine. Like I think they focus on a very different part of the stack in that sense. - And then the other company pattern that I pattern match you to is Modular. I don't know if you-- - 'Cause of the names? - No, no, well, no, but yes, the name is very similar.

I think there's something that might be insightful there from a linguistics point of view. But no, they have Mojo, the sort of Python SDK. And then they have the Modular Inference Engine, which is their cloud stack, their sort of compute inference stack. I don't know if anyone's made the comparison to you before, but I see you evolving a little bit in parallel there.

- No, I mean, maybe, yeah. Like it's not a company I'm like super like familiar, like, I mean, I know the basics, but like, I guess they're similar in the sense like they wanna like do a lot of, you know, they have sort of big picture vision. - Yes, they also wanna build very general purpose.

- Yeah. - And they also are-- - Which I admire. - Marketing themselves as like, if you wanna do off the shelf stuff, go somewhere else. If you wanna do custom stuff, we're the best place to do it. - Yeah, yeah. - There is some overlap there. There's not overlap in the sense that you are a closed source platform, people have to host their code on you.

- That's true. - Whereas for them, they're very insistent on not running their own cloud service. - Yeah. - They're boxed software. - Yeah, yeah. - They're licensed software. - I'm sure their VCs at some point are gonna force them to reconsider. - No, no, Chris is very, very insistent and very convincing.

(laughs) So anyway, I would just make that comparison, let people make the links if they want to, but it's an interesting way to see the cloud market develop from my point of view 'cause-- - Yeah. - I came up in this field thinking cloud is one thing and I think your vision is like something slightly different and I'd like to see the different takes on it.

- Yeah, and like one thing I've, you know, like I've written a bit about it in my blog too, it's like I think of us as like a second layer of cloud provider in the sense that like, I think Snowflake is like kind of a good analogy. Like Snowflake, you know, is infrastructure as a service, right?

But they actually run on the like major clouds, right? And I mean, like you can like analyze this very deeply, but like one of the things I always thought about is like why did Snowflake already like win over Redshift? And I think Snowflake, you know, to me, one, because like, I mean, in the end, like AWS makes all the money anyway, like and like Snowflake just had the ability to like focus on like developer experience or like, you know, user experience.

And to me, like really proved that you can build a cloud provider, a layer up from, you know, the traditional like public clouds and in that layer, that's also where I would put Modal. It's like, you know, we're building a cloud provider. Like we're, you know, we're like a multi-tenant environment that runs the user code, but also building on top of the public cloud.

So I think there's a lot of room in that space. I think it's very sort of interesting direction. - Yeah, how do you think of that compared to the traditional past history? Like, you know, AWS, then you had Heroku, then you had Render Railway. - Yeah, I mean, I think those are all like great.

Like, I think the problem that they all faced was like the graduation problem, right? Like, you know, Heroku or like, I mean, like also like Heroku, there's like a counterfactual future of like what would have happened if Salesforce didn't buy them, right? Like, that's a sort of separate thing.

But like, I think what Heroku, I think always struggled with was like, eventually companies would get big enough that you couldn't really justify running in Heroku. They would just go and like move it to, you know, whatever AWS or, you know, in particular. And, you know, that's something that keeps me up at night too.

Like, what does that graduation risk look like for Modal? I always think like the only way to do, to build infrastructure, to build a successful infrastructure company in the long run in the cloud today is you have to appeal to the entire spectrum, right? Or at least like the enterprise, like you have to capture the enterprise market.

And, but the truly good companies capture the whole spectrum, right? Like I think of companies like, I don't know, like Datadog or Mongo or something like that, where like they both captured like the hobbyists, like and acquire them, but also like, you know, have very large enterprise customers. So I think that arguably was like where I, in my opinion, like Heroku struggled was like, how do you maintain the customers as they get more and more advanced?

I don't know what the solution is, but I think there's, you know, that's something I would have thought deeply if I was at Heroku at that time. - What's the AI graduation problem? Is it, I need to fine tune the model, I need better economics, any insights from customer discussions?

- Yeah, I mean, better economics certainly, but although like I would say like, even for people who like, you know, need like thousands of GPUs at very, you know, like just because we can drive utilization so much better, like we, there's actually like a cost advantage of staying on Modal.

But yeah, I mean, it's certainly like, you know, and then like the fact that VCs like love, you know, throwing money at least used to, you know, at companies who need it to buy GPUs. I think that didn't help the problem. Yeah, and in training, I think, you know, there's less software differentiation.

So in training, I think there's certainly like better economics of like buying big clusters. But I mean, my hope is it's gonna change, right? Like I think, you know, we're still pretty early in the cycle of like building AI infrastructure. And, you know, I think a lot of these companies over in the long run, like, you know, except maybe the super big ones, like, you know, Facebook and Google, they're always gonna build their own ones.

But like everyone else, like to some extent, you know, I think they're better off like buying platforms. And, you know, someone's gonna have to build those platforms. - Yeah, cool. Let's move on to language models. And just specifically that workload, just to flesh it out a little bit. You already said that Ramp is like fine tuning a hundred models at once simultaneously on Modal.

Closer to home, my favorite example is EricBot. Maybe you wanna tell that story. - Yeah, I mean, it was a prototype thing we built for fun, but it was pretty cool. Like we basically built this thing that you can, it like hooks up to Slack. It like downloads all the Slack history and, you know, fine tunes a model based on a person.

And then you can chat with that. And so you can like, you know, clone yourself and like talk to yourself on Slack. I mean, it's like nice, like demo. And it's just like, I think like it's like fully contained in Modal. Like there's a Modal app that does everything, right?

Like it downloads Slack, you know, integrates with the Slack API, like downloads the stuff, the data, like just runs the fine tuning. And then like creates like dynamically an inference endpoint. And it's all like self-contained and like, you know, a few hundred lines of code. So I think it's sort of a good kind of use case for more, or like it kind of demonstrates a lot of the capabilities of Modal.

- Yeah, on a more personal side, how close did you feel EricBot was to you? - It definitely captured the like, the language. Yeah, I mean, I don't know, like the content. I mean, like, it's like, I always feel this way about like AI and it's gotten better, but like you look at like AI output of text, like, and it's like, when you glance at it, it's like, yeah, this seems really smart, you know?

But then you actually like look a little bit deeper. It's like, what does this mean? What does this person say? It's like kind of vacuous, right? And that's like kind of what I felt like, you know, talking to like my clone version. Like it like says like things like the grammar is correct.

Like some of the sentences make a lot of sense, but like, what are you trying to say? Like, there's no content here. I don't know. I mean, it's like, I got that feeling also with ChatGPT in the like early versions, right? Now it's like better, but. - That's funny.

Yeah, I built this thing called smol-podcaster to automate a lot of our back office work, so to speak. And it's great at transcripts. It's great at doing chapters. And then I was like, okay, how about you come up with a short summary? And it's like, it sounds good, but it's like, it's not even the same ballpark as like what we end up writing.

And it's hard to see how it's gonna get there. - Oh, I have ideas. - I'm certain it's gonna get there, but like, I agree with you, right? And like, I have the same thing. I don't know if you've read like AI generated books, like they just like kind of seem funny, right?

Like they're off, right? But like you glance at it and it's like, oh, it's kind of cool. Like looks correct, but then it's like very weird when you actually read them. - Well, so for what it's worth, I think anyone can join the Modal Slack. Is it open to the public?

- Yeah, totally. If you go to modal.com, there's a button in the footer. - Yeah, and then you can talk to EricBot. And then sometimes, I really like pinging EricBot, and then you answer afterwards, but then you're like. - Really? - Yeah, I don't know if that's correct or like whatever.

- Cool. - No, so, okay. Any other broader lessons, you know, just broadening out from like the single use case of fine tuning, like what are you seeing people do with fine tuning or just language models on modal in general? - Yeah, I mean, I think language models is interesting because so many people get started with APIs and that's, you know, they're just dominating a space in particular OpenAI, right?

And that's not necessarily like a place where we aim to compete. I mean, maybe at some point, but like it's just not like a core focus for us. And I think sort of separately, it's sort of a question if like there's economics in that long term. But like, so we tend to focus on more like the areas like around it, right?

Like fine tuning, like another use case we have is a bunch of people, Ramp included, are doing batch embeddings on Modal. So let's say, you know, you have like a, actually we're like writing a blog post, like where we take all of Wikipedia and like parallelize embeddings in 15 minutes and produce vectors for each article.

So those types of use cases, I think modal suits really well for. I think also a lot of like custom inference, like you have like, you know, structured output guided generation or things like that we have, you want more control. Like those are the things like we see a lot of users using modal for.

But for a lot of people it's like, you know, just go use like GPT-4 and like, you know, that's like a great starting point and we're not trying to compete necessarily like directly with that. - Yeah, when you say parallelize, I think you should give people an idea of the order of magnitude of parallelism because I think people don't understand how parallel.

So like, I think your classic hello world with modal is like some kind of Fibonacci function, right? - Yeah, we have a bunch of different ones. - Some recursive function. - Yeah, yeah, I mean, like, yeah, I mean, it's like pretty easy in modal, like fan out to like, you know, at least like a hundred GPUs, like in a few seconds.

And, you know, if you give it like a couple of minutes, like we can, you know, you can fan out to like thousands of GPUs. Like we run it relatively large scale and yeah, we've run, you know, many thousands of GPUs at certain points when we need it, you know, big backfills or some customers had very large compute needs.

- Yeah, yeah. And I mean, that's super useful for a number of things. One of the reasons actually I, so one of my early interactions with Modal as well was with smol developer, which is my sort of coding agent. The reason I chose Modal was a number of things.

One, I just wanted to try it out. I just had an excuse to try it. Akshay offered to onboard me. - Yeah, good excuse. - But the most interesting thing was that you could have that sort of local development experience like I was running on my laptop, but then it would seamlessly translate to a cloud service or like a cloud hosted environment.

And then it could fan out with concurrency controls. So I could say like, because like, you know, the number of times I hit the GPT-3 API at the time was gonna be subject to the rate limit from there. But I wanted to fan out without worrying about that kind of stuff.

With Modal, I can just kind of declare that in my config and that's it. - Oh, like a concurrency limit? - Yeah. - Yeah, there's a lot of control there, yeah. - Yeah, yeah, yeah. So like, I just wanted to highlight that to people as like, yeah, this is a pretty good use case for like, you know, just like writing this kind of LLM application code inside of this environment that just understands fan out and rate limiting natively.
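The idea of fanning out many calls while capping how many hit a rate-limited API at once can be sketched in plain Python with `asyncio.Semaphore`. This is only an illustration of the concept, not how Modal implements or exposes its concurrency limit; `call_api` is a hypothetical stand-in for a real request, and the limit of 5 is an arbitrary assumption.

```python
import asyncio


async def call_api(i: int, limit: asyncio.Semaphore, stats: dict) -> int:
    """Hypothetical stand-in for one request to a rate-limited API."""
    async with limit:  # at most `concurrency_limit` tasks get past this line
        stats["in_flight"] += 1
        stats["peak"] = max(stats["peak"], stats["in_flight"])
        await asyncio.sleep(0.01)  # pretend network latency
        stats["in_flight"] -= 1
        return i * 2


async def fan_out(n_tasks: int, concurrency_limit: int):
    """Start n_tasks calls at once, but let only a few run concurrently."""
    limit = asyncio.Semaphore(concurrency_limit)
    stats = {"in_flight": 0, "peak": 0}
    results = await asyncio.gather(
        *(call_api(i, limit, stats) for i in range(n_tasks))
    )
    return results, stats["peak"]


results, peak = asyncio.run(fan_out(n_tasks=50, concurrency_limit=5))
print(f"{len(results)} calls finished, peak concurrency {peak}")
```

The caller writes a simple "do all 50" fan-out; the semaphore plays the role of the hidden queue, holding back the excess calls until a slot frees up.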

You don't actually have an exposed queue system, but you have it under the hood, you know, that kind of stuff. - It's a self-provisioning. (laughing) - So the last part of modal I wanted to touch on, and obviously feel free, I know you're working on new features, was the sandbox that was introduced last year.

And this is something that I think was inspired by Code Interpreter. You can tell me the longer history behind that. - Yeah, like we originally built it for the use case. Like, there was a bunch of customers who looked into code generation applications and then they wanted, they came to us and asked us, is there a safe way to execute code?

And yeah, we spent a lot of time on like container security. We use gVisor, for instance, which is a Google product that provides pretty strong isolation of code. So we built a product where you can basically run arbitrary code inside a container and monitor its output, or get it back in a safe way.
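The "run generated code, watch its output, get the result back" loop can be sketched with the Python standard library. Note the big assumption: a plain subprocess with a timeout provides none of the isolation that a gVisor-backed container does, so this shows only the control flow, not a safe sandbox. The function name `run_generated_code` and the returned dictionary shape are illustrative choices, not part of any real product.

```python
import subprocess
import sys


def run_generated_code(code: str, timeout_s: float = 5.0) -> dict:
    """Run a code string in a separate Python process and capture its output.

    WARNING: this is NOT a security boundary. A real sandbox would run the
    code inside an isolated container; a child process only illustrates the
    submit / monitor / collect flow.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-I", "-c", code],  # -I: isolated mode, ignores user site and env vars
            capture_output=True,
            text=True,
            timeout=timeout_s,  # the child is killed if it runs too long
        )
    except subprocess.TimeoutExpired:
        return {"status": "timeout", "stdout": "", "stderr": ""}
    status = "ok" if proc.returncode == 0 else "error"
    return {"status": status, "stdout": proc.stdout, "stderr": proc.stderr}


ok = run_generated_code("print(1 + 1)")
failed = run_generated_code("raise ValueError('bad input')")
hung = run_generated_code("while True: pass", timeout_s=0.5)
print(ok["status"], failed["status"], hung["status"])
```

The three calls cover the cases a code-generation application has to handle: a clean result, an exception whose traceback can be fed back to the model, and a runaway program that must be cut off.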

I mean, over time, it's evolved into more of like, I think the long-term direction is actually, I think, more interesting, which is that I think modal as a platform where I think the core container infrastructure we offer could actually be like, you know, unbundled from like the client SDK and offered to like other, you know, like we're talking to a couple of like other companies that want to run, you know, through their packages, like run, execute jobs on modal, like kind of programmatically.

So that's actually the direction like Sandbox is going. It's like turning into more like a platform for platforms is kind of what I've been thinking about it as. - Oh boy, platform, that's the old Kubernetes line. - Yeah, yeah, yeah, but it's like, you know, like having that ability to like programmatically, you know, create containers and execute them, I think is really cool.

And I think it opens up a lot of interesting capabilities that are sort of separate from the like core Python SDK in Modal. So I'm really excited to see. I mean, it's like one of those features that we kind of released in like, you know, then we kind of look at like what users actually build with it.

And people are starting to build like kind of crazy things. And then, you know, we double down on some of those things 'cause when we see like, you know, potential new product features. And so Sandbox, I think in that sense, it's like kind of in that direction, we found a lot of like interesting use cases in the direction of like, it's like platformized container runner.

- Can you be more specific about what you're doubling down on after seeing users in action? - Yeah, I mean, we're working with like some companies that, I mean, without getting into specifics, like that need the ability to take their users' code and then launch containers on Modal. And it's not about security necessarily, like they just want to use Modal as a backend, right?

Like they may already provide like Kubernetes as a backend, Lambda as a backend, and now they want to add Modal as a backend, right? And so, you know, they need a way to programmatically define jobs on behalf of their users and execute them. And so I don't know, that's kind of abstract, but does that make sense?

- Yeah, I totally get it. It's sort of one level of recursion to sort of be the Modal for their customers. - Exactly, yeah, exactly. - And Cloudflare has done this, you know, Kenton Varda from Cloudflare, who's like the tech lead on this thing, called it sort of functions as a service as a service.

- Yeah, that's exactly right. FaaSaaS. - FaaSaaS. - FaaSaaS. - Yeah, like, I mean, like that, I think any base layer, second layer, like cloud provider like yourself, compute provider like yourself should provide. It is like very, very, you know, it's a marker of maturity and success that people just trust you to do that.

They'd rather build on top of you than compete with you. Like the more interesting thing for me is like, what does it mean to serve a computer, like a LLM customer developer, rather than a human developer, right? Like, that's what a sandbox is to me. - Yeah, for sure.

- That you have to sort of redefine Modal to serve a different non-human audience. - Yeah, yeah, yeah. And I think there's some really interesting people, you know, building very cool things. - Yeah, so I don't have an answer, but, you know, I imagine things like, hey, the way you give feedback is different.

Maybe you have to like stream errors, log errors differently. I don't really know. (laughs) - Yeah. - Obviously there's like safety considerations. Maybe you have a API to like restrict access to the web. - Yeah. - I don't think anyone would use it, but it's there if you want it.

- Yeah, yeah. - Any other sort of design considerations? I have no idea. - With sandboxes? - Yeah, open-ended question here. - Yeah, I mean, no, I think, yeah, the network restrictions, I think, make a lot of sense. Yeah, I mean, I think, you know, long-term, like I think there's a lot of interesting use cases where like the LLM, in itself, can like decide I want to install these packages and like run this thing.

And like, obviously, for a lot of those use cases, like you want to have some sort of control that it doesn't like install malicious stuff and steal your secrets and things like that. But I think that's what's exciting about the sandbox primitive, is like it lets you do that in a relatively safe way.

- Yeah, cool. Do you have any thoughts on the inference wars? So a lot of providers are just rushing to the bottom to get the lowest price per million tokens. Some of them, you know, Sean ran the math, they're just losing money. Like the physics of it just don't work out for them to make any money on it.

How do you think about your pricing and like how much premium you can get and you can kind of command versus using lower prices as kind of like a wedge into getting there, especially once you have Modal instrumented? Yeah, what are the trade-offs and any thoughts on strategies that work?

- I mean, we focus more on like custom models and custom code. And I think in that space, there's like less competition. And I think we can, you know, have a pricing markup, right? Like, you know, people will always compare our prices to like, you know, the GPU power they can get elsewhere.

And so how big can that markup be? Like it never can be, you know, we can never charge like 10X more, but we can certainly charge a premium. And like, you know, for that reason, like we can have pretty good margins. The LLM space is like the opposite. Like the switching costs of LLMs is zero, right?

Like if all you're doing is like straight up, like at least like open source, right? Like if all you're doing is like, you know, using some, you know, inference endpoint that serves an open source model and, you know, some other provider comes along and like offers a lower price, you're just gonna switch, right?

So I don't know, to me, that reminds me a lot of like, all this like 15 minute delivery wars or like, you know, like Uber versus Lyft or DraftKings versus FanDuel, or like, maybe that's not, but like, you know, and like maybe going back even further, like I think a lot about like the sort of, you know, flip side of this, like the actually positive side of it is like, I thought a lot about like the fiber optics boom of like '98, '99, like the other day, or like, you know, and also like the over-investment in GPUs today.

Like, yeah, like, you know, I don't know. Like in the end, like I don't think VCs will have the return they expected, like, you know, in these things, but guess who's gonna benefit? Like, you know, it's the consumers, right? Like, someone's like reaping the value of this. And that's, I think, an amazing flip side is that, you know, we should be very grateful, you know, the fact that like VCs wanna subsidize these things, which is, you know, like you go back to the fiber optics, like there's the extreme like over-investment in fiber optics networks in like '98, '99, and no one made money who did that.

But consumers, you know, got tremendous benefits of all the fiber optics cables that were laid, you know, throughout the country in the decades after. I feel something similar about like GPUs today, and also like specifically looking more narrowly at like the LLM inference market, like that's great. Like, you know, I'm very happy that, you know, there's a price war.

Modal is like not necessarily like participating in that price war, right? Like, I think, you know, it's gonna shake out and then someone's gonna win and then they're gonna raise prices or whatever. Like, we'll see how that works out. But it's not, for that reason, like we're not like focused, like we're not hyper focused on like serving, you know, just like straight up, like here's an end point to an open source model.

We think the value in Modal comes from all these, you know, the other use cases, the more custom stuff like fine tuning and very complex, you know, guided output, like type stuff, or like also like in other, like outside of LLMs, like we focus a lot more on like image, audio, video stuff, 'cause that's where there's a lot more proprietary models, there's a lot more like custom workflows, and that's where I think, you know, Modal is more, you know, there's a lot of value in software differentiation.

I think focusing on developer experience, developer productivity, that's where I think, you know, you can have more of a competitive moat. - Yeah. I'm curious what the difference is gonna be now that it's an enterprise. So like with DoorDash, Uber, they're gonna charge you more. And like, as a customer, like you can decide to not take Uber, but if you're a company building AI features in your product using the subsidized prices, and then, you know, the VC money dries up in a year and like prices go up, it's like, you can't really take the features back without a lot of backlash, but you also cannot really kill your margins by paying the new price.

So I don't know what that's gonna look like, but. - But like margins are gonna go up for sure, but I don't know if prices will go up, 'cause like GPU prices have to drop eventually, right? So like, you know, like in the long run, I still think like prices may not go up that much, but certainly margins will go up.

Like, I think you said, swyx, that margins are negative right now. Like, you know, obviously- - For some people. - That's not sustainable. So certainly margins will have to go up. Like some companies are gonna have to make money in this space. Otherwise, like they're not gonna provide the service, but that's the equilibrium too, right?

Like at some point, like, you know, it sort of stabilizes and one or two or three providers make money. - Yeah, what else is maybe underrated in Modal, something that people don't talk enough about or, yeah, that we didn't cover in the discussion? - Yeah, I think what are some other things?

We talked about a lot of stuff. Like we have the bursty parallelism. I think that's pretty cool. Working on a lot of like, trying to figure out like, like kind of thinking more about the roadmap, but like one of the things I'm very excited about is building primitives for like more like IO intensive workloads.

And so like we're building some like crude stuff right now where like you can like create like direct TCP tunnels to containers and that lets you like pipe data. And like, you know, we haven't really explored this as much as we should, but like there's a lot of interesting applications.

Like you can actually do like kind of real-time video stuff in Modal now because you can like create a tunnel to, yeah, exactly. You can create a raw TCP socket to a container, feed it video and then like, you know, get the video back. And I think like, it's still like a little bit like, you know, not fully ergonomically figured out, but I think there's a lot of like super cool stuff.
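To make the "raw TCP socket to a container" idea concrete, here is a minimal sketch using only the Python standard library. It is not Modal's tunnel API: the length-prefixed framing, the helper names, and the byte-inverting `process_frame` placeholder are all assumptions for illustration, and the "container" is simulated by a thread listening on localhost.

```python
import socket
import struct
import threading


def recv_exact(conn: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, raising if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = conn.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf.extend(chunk)
    return bytes(buf)


def send_frame(conn: socket.socket, payload: bytes) -> None:
    """Send one frame: a 4-byte big-endian length, then the payload."""
    conn.sendall(struct.pack("!I", len(payload)) + payload)


def recv_frame(conn: socket.socket) -> bytes:
    """Receive one length-prefixed frame."""
    (length,) = struct.unpack("!I", recv_exact(conn, 4))
    return recv_exact(conn, length)


def process_frame(frame: bytes) -> bytes:
    """Hypothetical stand-in for real video processing: invert every byte."""
    return bytes(255 - b for b in frame)


def serve_once(listener: socket.socket) -> None:
    """Plays the role of the container: process frames until an empty one."""
    conn, _ = listener.accept()
    with conn:
        while True:
            frame = recv_frame(conn)
            if not frame:  # an empty frame marks end of stream
                break
            send_frame(conn, process_frame(frame))


def round_trip(frames: list[bytes]) -> list[bytes]:
    """Stream non-empty frames to the local 'container' and collect results."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]
    worker = threading.Thread(target=serve_once, args=(listener,), daemon=True)
    worker.start()
    results = []
    with socket.create_connection(("127.0.0.1", port)) as client:
        for frame in frames:
            send_frame(client, frame)
            results.append(recv_frame(client))
        send_frame(client, b"")  # tell the server we are done
    worker.join(timeout=5)
    listener.close()
    return results
```

In a real deployment the client would connect to whatever host and port the tunnel exposes instead of 127.0.0.1, and a streaming client would likely pipeline frames rather than wait for each reply.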

Like when we start enabling those more like high IO workloads, I'm super excited about that. I think also like, you know, working with large datasets or kind of taking the ability to map and fan out and like building more like higher level, like functional primitives, like filters and group-bys and joins.
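As a rough illustration of what map-and-fan-out plus higher-level functional primitives might look like, here is a small local sketch. These helpers are hypothetical and are not Modal APIs: the fan-out uses a thread pool as a stand-in for remote containers, and `group_by` and `inner_join` are plain in-memory versions.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def fan_out_map(fn, items, max_workers=8):
    """Apply fn to every item in parallel; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fn, items))


def group_by(records, key):
    """Bucket records by key(record), preserving order within each bucket."""
    groups = defaultdict(list)
    for record in records:
        groups[key(record)].append(record)
    return dict(groups)


def inner_join(left, right, key):
    """Hash join: merge each left dict with every right dict sharing its key."""
    index = group_by(right, key)
    return [{**l, **r} for l in left for r in index.get(key(l), [])]


def demo():
    """Tiny pipeline: map, filter, group, then join against a lookup table."""
    clips = [
        {"id": 1, "codec": "h264", "seconds": 30},
        {"id": 2, "codec": "vp9", "seconds": 5},
        {"id": 3, "codec": "h264", "seconds": 90},
    ]
    owners = [{"id": 1, "owner": "a"}, {"id": 3, "owner": "b"}]
    # Map step (fanned out): tag each clip with its length in minutes.
    tagged = fan_out_map(lambda c: {**c, "minutes": c["seconds"] / 60}, clips)
    # Filter step: keep clips of at least half a minute.
    long_clips = [c for c in tagged if c["minutes"] >= 0.5]
    by_codec = group_by(long_clips, key=lambda c: c["codec"])
    joined = inner_join(long_clips, owners, key=lambda r: r["id"])
    return by_codec, joined
```

The interesting engineering is in doing the same thing across many machines, where the grouping and joining steps need data to be shuffled between workers; this sketch only shows the shape of the interface.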

Like I think there's a lot of like really cool stuff you can do, but this is like, maybe like, you know, years out. - Yeah, we can just broaden out from Modal a little bit, but you still have a lot of, you have a lot of great tweets.

So it's very easy to just kind of go through them. Why is Oracle underrated? - I love Oracle's GPUs. I mean, like, I don't know why, you know, what the economics looks like for Oracle, but like, I think they're great value for money. Like we run a bunch of stuff in Oracle and they have bare metal machines, like two terabytes of RAM.

They have like super fast SSDs and yeah. Like compared, you know, I mean, we love AWS and GCP too. We have great relationships with them, but I think Oracle's surprising. Like, you know, if you told me like three years ago that I would be using Oracle cloud, like what, wait, why?

But now I'm, you know, I'm a happy customer. - And it's a combination of pricing and the kinds of SKUs, I guess, they offer. - Yeah, great, great machines, good prices, you know. - That's it. - Yeah, yeah. - That's all I care about. - Yeah, the sales team is pretty fun too.

Like, I like them. - In Europe, people often talk about Hetzner. - Yeah, I don't know, like, so, we've focused on the main clouds, right? Like we've, you know, Oracle, AWS, GCP, we'll probably add Azure at some point. I think, I mean, there's definitely a long tail of like, you know, CoreWeave, Hetzner, like Lambda, like all these things.

And like over time, I think we'll look at those too. Like, you know, wherever we can get the right, you know, GPUs at the right price. Yeah, I mean, I think it's fascinating. Like, it's a tough business. Like, I wouldn't want to try to build like a cloud provider.

You know, it's just, you just have to be like incredibly focused on like, you know, efficiency and margins and things like that. But I mean, I'm glad people are trying. - Yeah, and you can ramp up on any of these clouds very quickly, right? 'Cause it's-- - Yeah, I mean, yeah.

Like, I think so. Like, a lot of, you know, what Modal does is like programmatic, you know, launching and termination of machines. So what's nice about the clouds is, you know, they have relatively mature APIs for doing that, as well as like, you know, support for Terraform for all the networking and all this stuff.

That makes it easier to work with the big clouds. But yeah, I mean, some of those things, like I think, you know, I also expect the smaller clouds to like embrace those things in the long run. But also think, you know, we can also probably integrate with some of the clouds, like even without that.

There's always an HTML API that you can use. Just like script something that launches instances, like through the web. - Yeah, I think a lot of people are always curious about whether or not you will buy your own hardware someday. I think you're pretty firm in that it's not your interest.

But like your story and your growth does remind me a little bit of Cloudflare, which obviously, you know, invests a lot in its own physical network. - Yeah, I don't remember like early days. Like, did they have their own hardware or? - They bootstrapped a lot with like agreements through other, you know, providers.

- Yeah, okay, interesting. - But now it's all their own hardware. - Yeah. - So I understand. - Yeah, I mean, my feeling is that when you're a venture funded startup, like buying physical hardware is maybe not the best use of the money. - No, I really wanted to put you in a room with Eiso Kant from Poolside.

- Yeah. - Totally opposite view. - Yeah. - This is great, yeah. - I mean, I don't, I just think from like a capital efficiency point of view, like do you really want to tie up that much money in like, you know, physical hardware and think about depreciation and like, as much as possible, like I, you know, I favor a more capital efficient way of like, we don't want to own the hardware 'cause then, and ideally we want the sort of margin structure to be sort of like 100% correlated revenue and COGS in the sense that like, you know, when someone comes and pays us, you know, $1 for compute, like, you know, we immediately incur a cost of like whatever, 70 cents, 80 cents, you know, and there's like complete correlation between cost and revenue.

'Cause then you can leverage up in like, kind of a nice way, you can scale very efficiently. You know, turns out like that's hard to do. Like you can't just only use like spot and on-demand instances. Like over time, we've actually started adding a pretty significant amount of reservations too.
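The difference between perfectly correlated costs and reserved capacity can be shown with a little arithmetic. The sketch below uses the 70 cents of cost per dollar of revenue mentioned above; the reservation cost and revenue figures in the usage note are made-up numbers for illustration only.

```python
def on_demand_margin(revenue: float, cost_per_dollar: float = 0.70) -> float:
    """Gross margin when every dollar of revenue incurs a proportional cost."""
    cost = revenue * cost_per_dollar
    return (revenue - cost) / revenue


def reserved_margin(revenue: float, reserved_cost: float) -> float:
    """Gross margin when capacity is paid for up front, whatever the usage."""
    return (revenue - reserved_cost) / revenue
```

With purely on-demand capacity, `on_demand_margin` comes out at about 30% whether revenue is $50 or $100. With a hypothetical $60 reservation, `reserved_margin(100, 60)` is 40%, but `reserved_margin(50, 60)` is -20%: the reservation is cheaper at high utilization and a loss when demand falls, which is why it is a step towards the risk profile of owning hardware.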

So I don't know, like reservation is always like one step towards owning your own hardware. Like, I don't know, like, do we really want to be, you know, thinking about switches and cooling and HVAC and like power supplies. - Disaster recovery. - Yeah, like, is that the thing I want to think about?

Like, I don't know, like I like to make developers happy, but who knows, like maybe one day, like, but I don't think it's gonna happen anytime soon. - Yeah, obviously for what it's worth, obviously I'm a believer in cloud, but it's interesting to have the devil's advocate on the other side.

The main thing you have to do is be confident that you can manage your depreciation better than the typical assumption, which is two to three years. - Yeah, yeah. - And so the moment you have a CTO that tells you, "No, I think I can make these things last seven years," then it changes the math.

- Yeah, yeah, but you know, are you deluding yourself then? That's the question, right? - Yeah, yeah. - It's like the Waste Management scandal. Do you know about that? Like they had this accounting scandal back in the '90s, like this garbage company, they like started assuming their garbage trucks had a 10-year depreciation schedule, booked like a massive profit.

You know, the stock went like, you know, up, and then it turns out actually all those garbage trucks broke down and like you can't really depreciate them over 10 years. And so then the whole company, you know, they had to restate all the earnings.
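To see why the assumed useful life changes the math so much, here is a small straight-line depreciation sketch. The $210,000 purchase price is a hypothetical figure chosen to give round numbers; only the three-year and seven-year schedules come from the conversation.

```python
def annual_depreciation(purchase_cost: float, useful_life_years: int,
                        salvage_value: float = 0.0) -> float:
    """Straight-line depreciation expense booked per year."""
    if useful_life_years <= 0:
        raise ValueError("useful life must be a positive number of years")
    return (purchase_cost - salvage_value) / useful_life_years


def book_value(purchase_cost: float, useful_life_years: int,
               years_elapsed: int, salvage_value: float = 0.0) -> float:
    """Value still on the books after years_elapsed years."""
    years = min(years_elapsed, useful_life_years)
    expense = annual_depreciation(purchase_cost, useful_life_years, salvage_value)
    return purchase_cost - expense * years
```

For a hypothetical $210,000 machine, a three-year schedule books $70,000 of expense per year and a seven-year schedule books $30,000, so the longer schedule reports $40,000 more profit each year. If the hardware is actually obsolete after three years, the seven-year books still carry $120,000 of value that has to be written off at once, which is the kind of restatement described in the garbage truck story.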

- Nice. Let's go into some personal nuggets. You received the IOI gold medal, which is the International Olympiad in Informatics. - 20 years ago. - Yeah. - How are these models going to change competitive programming? Like, do you think people still love the craft? I feel like over time, programming has kind of lost maybe a little bit of its luster in the eyes of a lot of people.

Yeah, I'm curious to see what you think. - I mean, maybe, but like, I don't know, like, you know, I've been coding for almost 30, or more than 30 years. And like, I feel like, you know, you look at like programming and, you know, where it is today versus where it was, you know, 30, 40, 50 years ago, there's like probably thousand times more developers today than, you know, so like, and every year there's more and more developers.

And at the same time, developer productivity keeps going up. And so like, I actually don't expect, like, and when I look at the real world, I just think there's so much software that's still waiting to be built. Like, I think we can, you know, 10X the amount of developers and still, you know, have a lot of people making a lot of money, you know, building amazing software, while at the same time being more productive.

Like, I never understood this, like, you know, AI is gonna, you know, replace engineers. That's very rarely how this actually works. When AI makes engineers more productive, like the demand actually goes up because the cost of engineers goes down because you can build software more cheaply. And that's, I think, the story of software in the world over the last few decades.

So, I mean, I don't know how this like relates to, like, competitive programming is a, like, I don't know. Kind of going back to your question, competitive programming to me was always kind of a weird kind of, you know, niche, like kind of, I don't know, I loved it.

It's like puzzle solving. And like, my experience is like, you know, half of competitive programmers are able to translate that to actual, like, building cool stuff in the world. Half just like get really in, you know, sucked into this like puzzle stuff. And, you know, it never loses its grip on them.

But like, for me, it was an amazing way to get started with coding, or get very deep into coding and, you know, kind of battle it out with like other smart kids and travel to different countries when I was a teenager. - Yeah. There was another, oh, sorry. I was just going to mention, like, it's not just that he personally is a competitive programmer.

Like, I think a lot of people at Modal are competitive programmers. I think you met Akshat through-- - Akshat, co-founder, is also an IOI gold medalist. By the way, gold medal doesn't mean you win. Like, but, although we actually had an intern that won IOI. Gold medal is like the top 20, 30 people roughly.

- Yeah. And so like, obviously it's very hard to get hired at Modal, but like, what is it like to work with like such a talent density? Like, you know, how is that contributing to the culture at Modal? - Yeah, I mean, I think humans are the root cause of like everything at a company, right?

Like, you know, bad code is because of a bad human or like whatever, you know, bad culture. So like, I think, you know, like talent density is very important and like keeping the bar high and like hiring smart people. And, you know, it's not always the case that like hiring competitive programmers is the right strategy, right?

If you're building something very different, like you may not, you know, but we actually end up having a lot of like hard, you know, complex challenges. Like, you know, I talked about like the cloud, you know, the resource allocation, like turns out like that actually, like you can phrase that as a mixed integer programming problem.

Like we now have that running in production, like constantly optimizing how we allocate cloud resources. There's a lot of like interesting and like complex, like scheduling problems and like, how do you do all the bin packing of all the containers? Like, so I, you know, I think for, you know, for what we're building, you know, it makes a lot of sense to hire these people who like those very hard problems.
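The bin packing problem mentioned here can be illustrated with a classic heuristic. The sketch below is first-fit decreasing, not the mixed integer programming formulation Modal runs in production; the function name and the single "GPUs per machine" capacity dimension are simplifying assumptions (a real scheduler would also weigh CPU, memory, price, and region).

```python
def first_fit_decreasing(container_gpus: list[int],
                         machine_capacity: int) -> list[list[int]]:
    """Pack containers onto as few equal-sized machines as the heuristic finds.

    Containers are placed largest first, each onto the first machine that
    still has room; a new machine is opened only when none fits.
    """
    machines: list[list[int]] = []
    for size in sorted(container_gpus, reverse=True):
        if size > machine_capacity:
            raise ValueError(f"container needs {size} GPUs, machine has {machine_capacity}")
        for machine in machines:
            if sum(machine) + size <= machine_capacity:
                machine.append(size)
                break
        else:
            # No existing machine had room, so launch another one.
            machines.append([size])
    return machines
```

For example, packing containers that need 4, 2, 2, 1, 1, 8, 4 and 2 GPUs onto 8-GPU machines yields three fully used machines. The heuristic is fast but not always optimal, which is one reason to phrase the problem as a mixed integer program and hand it to a solver when the cost of wasted machines is high.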

- Yeah, and they don't necessarily have to know the details of the stack. They just need to be very good at algorithms. - No, but my feeling is like people who are like pretty good at competitive programming, they can also pick up like other stuff like elsewhere. Not always the case, but you know, there's definitely a high correlation.

- Yeah, oh yeah, I'm just, I'm interested in that just because, you know, like there's competitive mental talents in other areas, like competitive speed memorization or whatever. And like, you don't really see those transfer. And I always assumed in my narrow perception that competitive programming is so specialized, it's so obscure, even like so divorced from real world scenarios that it doesn't actually transfer that much.

But obviously I think for the problems that you work on, it does. - But it's also like, you know, frankly, it translates to some extent, not because like the problems are the same, but just because like it sort of filters for the, you know, people who are like willing to go very deep and work hard on things, right?

Like, I feel like a similar thing is like a lot of good developers are like talented musicians. Like why, like why is there a correlation? And like, my theory is like, you know, it's the same sort of skill. Like you have to like just hyper focus on something and practice a lot.

Like, and there's something similar that I think creates like good developers. - Sweden also had a lot of very good Counter-Strike players. I don't know, why does Sweden have fiber optics before all of Europe? I feel like, I grew up in Italy and our internet was terrible. And then I feel like all the Nordics had like amazing internet.

I remember getting online and people in the Nordics had like five ping, 10 ping. - Yeah, we had very good network back then. - Yeah, do you know why? - I mean, I'm sure like, you know, I think the government, you know, did certain things quite well, right? Like in the nineties, like there was like a bunch of tax rebates for like buying computers.

And I think there was similar like investments in infrastructure that, you know, I mean, like, and I think like I always think about, you know, it's like, I still can't use my phone in the subway in New York. And that's, you know, and that was something I could use in Sweden in '95.

You know, we're talking like 40 years almost, right? Like, why? And I don't know, like I think certain infrastructure, you know, Sweden was just better at, I don't know. - And also, you never owned a TV or a car? - Never owned a TV or a car. I never had a driver's license.

- How do you do that in Sweden though? Like that's cold. - I grew up in a city. I mean, like I took the subway everywhere, or biked or whatever. Yeah, I always lived in cities. So I don't, you know, I never felt, I mean, like we have, like me and my wife have a car, but like I-- - That doesn't count.

- I mean, it's in her name 'cause I don't have a driver's license. She drives me everywhere, it's nice. - Nice. - That's fantastic. Great. You know, any, I was gonna ask you, like the last thing I had on this list was, you know, your advice to people thinking about running some sort of run-code-in-the-cloud startup is only do it if you're genuinely excited about spending five years thinking about load balancing, page faults, cloud security, and DNS.

So basically like, it sounds like you're summing up a lot of pain running Modal. - Yeah. I mean, like, that's like, like one thing I struggle with, like I talked to a lot of people starting companies in the data space or like AI space or whatever, and they sort of come at it, you know, from like an application developer point of view, and they're like, I'm gonna make this better.

But like, guess how you have to make it better? It's like, you have to go very deep on the infrastructure layer. And so, and so one of my frustrations has been like, so many startups are like, in my opinion, like Kubernetes wrappers and like, you know, like, and not very like thick wrappers, like fairly thin wrappers.

And I think, you know, every startup is a wrapper to some extent, but like, you need to be like a fat wrapper. You need to like go deep and like build some stuff. And that's like, you know, if you build a tech company, you're gonna wanna have, you're gonna have to spend, you know, five, 10, 20 years of your life, like going very deep and like, you know, building the infrastructure you need in order to like, make your product truly stand out and be competitive.

And so, you know, I think that goes for everything. I mean, like you're starting a whatever, you know, online retailer of, I don't know, bathroom sinks, you probably have to spend, you know, 10, you have to be willing to spend 10 years of your life thinking about, you know, whatever, bathroom sinks.

Like, otherwise it's gonna be hard. - Yeah, yeah, makes sense. I think that's good advice for everyone. And yeah, congrats on all your success. It's pretty exciting to watch, and it's just the beginning. - Yeah, yeah, yeah, it's exciting. And everyone should sign up and try out Modal, modal.com.

- Yeah, now it's GA, yay. - Yeah. - Used to be behind a wait list. - Yeah. - Awesome, Erik. Thanks so much for coming on. - Yeah, it's amazing. Thank you so much. - Thanks. (upbeat music)
