
A Brief History of the Open Source AI Hacker - with Ben Firshman of Replicate


Chapters

0:00 Introductions
1:22 Low latency is all you need
4:39 Evolution of CLIs
6:47 How building ArxivVanity led to Replicate
13:13 Making ML research replicable with containers
19:47 Doing YC in 2020 and pivoting to tools for COVID
23:11 Launching the first version of Replicate
29:26 Embracing the generative image community
31:58 Getting reverse engineered into an API product
35:54 Growing to 2 million users
39:37 Indie vs Enterprise customers
42:58 How customers use Replicate
44:30 Learnings from Docker that went into Cog
52:24 Creating AI standards
57:49 Replicate's compute availability
62:38 Fixing GPU waste
70:58 What's open source AI?
75:19 Building for AI engineers
77:33 Hiring at Replicate

Transcript

Hey, everyone. Welcome to the Latent Space Podcast. This is Alessio, partner and CTO in Residence at Decibel Partners. And I'm joined by my co-host, swyx, founder of Smol.ai. Hey, and today we have Ben Firshman in the studio. Welcome, Ben. Hey, good to be here. Ben, you're a co-founder and CEO of Replicate.

Before that, you were most notably creator of Fig, or founder of Fig, which became Docker Compose. You also did a couple other things before that. But that's what a lot of people know you for. What should people know about you outside of your LinkedIn profile? Yeah, good question. I think I'm a builder and tinkerer in a very broad sense.

And I love using my hands to make things. So I work on things maybe a bit closer to tech, like electronics. But I also build things out of wood. And I fix cars, and I fix my bike, and build bicycles, and all this kind of stuff. And there's so much I think I've learned from transferable skills, from just working in the real world to building things in software.

And there's so much about being a builder, both in real life and in software, that crosses over. Is there a real-world analogy that you use often when you're thinking about a code architecture problem? I like to build software tools as if they were something real. I like to imagine-- so I wrote this thing called the command line interface guidelines, which was a bit like sort of the Mac human interface guidelines, but for command line interfaces.

I did it with the guy I created Docker Compose with and a few other people. And I think something in there-- I think I described that your command line interface should feel like a big iron machine, where you pull a lever and it goes clunk. And things should respond within 50 milliseconds, as if it was a real-life thing.

And another analogy here is in the real life, you know when you press a button on an electronic device and it's like a soft switch, and you press it, and nothing happens, and there's no physical feedback about anything happening? And then half a second later, something happens? That's how a lot of software feels.

But instead, software should feel more like something that's real, where you touch-- you pull a physical lever and the physical lever moves. And I've taken that lesson of human interface to software a ton. It's all about latency, things feeling really solid and robust, both the command lines and user interfaces as well.

And how did you operationalize that for Fig or Docker? A lot of it's just low latency. Actually, we didn't do it very well for Fig in the first place. We used Python, which was a big mistake, because Python's really hard to get booting up fast-- you have to load up the whole Python runtime before it can run anything.

Go is much better at this, where Go just instantly starts. So you have to be under 500 milliseconds to start up? Yeah, effectively. I mean, perception of human things being immediate is something like 100 milliseconds. So anything like that is good enough. Yeah. Also, I should mention, since we're talking about your side projects-- well, one thing is I am maybe one of a few fellow people who have actually written something about CLI design principles, because I was in charge of the Netlify CLI back in the day and had many thoughts.
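
As a rough illustration of that threshold, here is a minimal sketch (not from the episode) that times a CLI's cold start against the roughly 100 millisecond perception budget Ben mentions; the command being timed is just a placeholder.

```python
import statistics
import subprocess
import time

PERCEPTION_BUDGET_MS = 100          # roughly where a response stops feeling instant
COMMAND = ["git", "--version"]      # placeholder: substitute the CLI you are testing

samples = []
for _ in range(10):
    start = time.perf_counter()
    subprocess.run(COMMAND, capture_output=True, check=True)
    samples.append((time.perf_counter() - start) * 1000)

median_ms = statistics.median(samples)
verdict = "feels instant" if median_ms <= PERCEPTION_BUDGET_MS else "feels sluggish"
print(f"median cold start: {median_ms:.1f} ms ({verdict})")
```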

One of my fun thoughts-- I'll just share it in case you have thoughts-- is I think CLIs are effectively starting points for scripts that are then run. And the moment one of the script's preconditions is not fulfilled, it typically ends. So the CLI developer will just exit the program.

And the way that I really wanted to create the Netlify dev workflow was for it to be kind of a state machine that would resolve itself. If it detected a precondition wasn't fulfilled, it would actually delegate to a subprogram that would then fulfill that precondition, asking for more info or waiting until a condition is fulfilled.

Then it would go back to the original flow and continue that. Don't know if that was ever tried or is there a more formal definition of it, because I just came up with it randomly. But it felt like the beginnings of AI, in the sense that when you run a CLI command, you have an intent to do something.
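
A minimal sketch of that self-resolving flow, with hypothetical checks and fixers (none of this is the actual Netlify CLI), might look like this:

```python
import os
import sys

def has_config() -> bool:
    return os.path.exists(".toolrc")

def create_config() -> None:
    # Instead of exiting, ask the user and fulfill the precondition.
    if input("No .toolrc found. Create a default one? [y/N] ").lower() != "y":
        sys.exit("Cannot continue without a config.")
    with open(".toolrc", "w") as f:
        f.write("port=8888\n")

def has_port() -> bool:
    return "port=" in open(".toolrc").read()

def add_port() -> None:
    port = input("Which port should the dev server use? ")
    with open(".toolrc", "a") as f:
        f.write(f"port={port}\n")

# Ordered (check, fixer) pairs: the "state machine" of preconditions.
PRECONDITIONS = [(has_config, create_config), (has_port, add_port)]

def run_dev_server() -> None:
    print("Starting dev server with:", open(".toolrc").read().strip())

def main() -> None:
    # Fix the first unmet precondition, then re-check from the top,
    # so later checks can assume earlier ones hold.
    while True:
        for check, fix in PRECONDITIONS:
            if not check():
                fix()
                break
        else:
            break  # every precondition holds; return to the original intent
    run_dev_server()

if __name__ == "__main__":
    main()
```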

And you may not have given the CLI all the things that it needs to execute that intent. So that was my two cents. Yeah, that reminds me of a thing we sort of thought about when writing the CLI guidelines, where CLIs were designed in a world where the CLI was really a programming environment.

And it was primarily designed for machines to use all of these commands and scripts. Whereas over time, the CLI has evolved to humans-- it was back in a world where the primary way of using computers was writing shell scripts, effectively. And we've transitioned to a world where, actually, humans are using CLI programs much more than they used to.

And the current best practices about how Unix was designed-- there's lots of design documents about Unix from the '70s and '80s, where they say things like, command line commands should not output anything on success. It should be completely silent, which makes sense if you're using it in a shell script.

But if a user is using that, it just looks like it's broken. If you type copy and it just doesn't say anything, as a new user you assume that it didn't work. And yeah, so I think what's really interesting about the CLI is that-- to your point, it's a really good user interface where it can be like a conversation. Instead of you just telling the computer to do this thing and it either silently succeeding or saying, no, you failed, it can guide you in the right direction and tell you what your intent might be and that kind of thing. And that's almost more natural in a CLI than it is in a graphical user interface, because it feels like this back and forth with the computer-- almost funnily like a language model.

So I think there's some interesting intersection of CLIs and language models actually being very closely related and a good fit for each other. FRANCESC CAMPOY: Yeah. I would say one of the surprises from last year-- I worked on a coding agent, but I think the most successful coding agent of my cohort was Open Interpreter, which was a CLI implementation.

And I have chronically-- even as a CLI person, I have chronically underestimated the CLI as a useful interface. You also developed Arxiv Vanity, which you recently retired after a glorious seven years. Something like that, yeah. Something like that, which is, I guess, nice HTML instead of PDFs. Yeah, that was actually the start of where Replicate came from.

OK, we can tell that story. Which-- so when I quit Docker, I got really interested in science infrastructure, just as a problem area, because it is-- science has created so much progress in the world. The fact that we can talk to each other on a podcast, and we use computers, and the fact that we're alive is probably thanks to medical research.

But science is just completely archaic and broken. And it's like 19th century processes that just happen to be copied to the internet rather than taken into account that we can transfer information at the speed of light now. And the whole way science is funded and all this kind of thing is all very broken.

There's just so much potential for making science work better. And I realized that I wasn't a scientist, and I didn't really have any time to go and get a PhD and become a researcher. But I'm a tool builder, and I could make existing scientists better at their job. And if I could make a bunch of scientists a little bit better at their job, maybe that's the kind of equivalent of being a researcher.

So one particular thing I dialed in on is just how science is disseminated, in that it's all of these PDFs, quite often behind paywalls on the internet. And that's a whole thing, because it's funded by national grants, government grants, and then they're put behind paywalls. Yeah, exactly. That's like a whole-- yeah, I could talk for hours about that.

But the particular thing we got dialed in on was-- or I got kind of-- but interestingly, these PDFs are also-- there's a bunch of open science that happens as well. So math, physics, computer science, machine learning, notably, is all published on arXiv, which is actually a surprisingly old institution.

Some random Cornell science project. Yeah, it was just like somebody in Cornell who started a mailing list in the '80s. And then when the web was invented, they built a web interface around it. Like, it's super old. And it's kind of like a user group thing, right? That's why all these numbers and stuff.

Yeah, exactly. Like, it's a bit like Usenet or something. And that's where basically all of math, physics, and computer science happens. But it's still PDFs published to this thing, which is just so infuriating. So the web was invented at CERN, a physics institution, to share academic writing. Like, there are figure tags.

There are author tags. There are heading tags. There are cite tags. Hyperlinks are effectively citations, because you want to link to another academic paper. But instead, you have to copy and paste these things and try and get around paywalls. Like, it's absurd. And now we have social media and things, but still academic papers as PDFs.

It's just like, why? This is not what the web was for. So anyway, I got really frustrated with that. And I went on vacation with my old friend Andreas. We used to work together in London at somebody else's startup. And we were just on vacation in Greece for fun.

And he was trying to read a machine learning paper on his phone. We had to zoom in and scroll line by line on the PDF. And he was like, this is fucking stupid. And I was like, I know. This is something. We discovered our mutual hatred for this. And we spent our vacation sitting by the pool, making LaTeX to HTML converters and making the first version of Arxiv Vanity.

Anyway, that was then a whole thing. And the story-- we shut it down recently because it caught the eye of arXiv, who were like, oh, this is great. We just haven't had the time to work on this. And what's tragic about arXiv is it's like this department of-- it's like this project of Cornell that can barely scrounge together enough money to survive.

I think it might be better funded now than when we were collaborating with them. And compared to these scientific journals, this is actually where the work happens. But they just have a fraction of the money that these big scientific journals have, which is just so tragic. But anyway, they were like, yeah, this is great.

We can't afford to do it. But do you want to, as a volunteer, integrate Arxiv Vanity into arXiv? Oh, you did the work. We didn't do the work. We started doing the work. We did some. I think we worked on this for a few months to actually get it integrated into arXiv.

And then we got distracted by Replicate. So a guy called Dayan picked up the work and made it happen-- somebody who works on one of the libraries that powers Arxiv Vanity. OK, and relationship with Arxiv Sanity? None. Did you predate them? I actually don't know the lineage. We were after-- we were both users of Arxiv Sanity, which is like a sort of-- Which is Andrej's-- --like a recsys on top of arXiv.

Yeah, yeah. And we were both users of that. And I think we were trying to come up with a working name for it. And Andreas just cracked a joke of like, oh, let's call it Arxiv Vanity. Let's make the papers look nice. Yeah, yeah. And that was the working name.

And it just stuck. Got it. Got it. Yeah. And then from there, tell us more about why you got distracted, right? So Replicate maybe feels like an overnight success to a lot of people. But you've been building this since 2019. Yeah. So what prompted the start? And we've been collaborating for even longer.

We created Arxiv Vanity in 2017. So in some sense, we've been doing this for almost six, seven years now. A classic seven-year-- Overnight success. Yeah. Yes, we did Arxiv Vanity, and then worked on a bunch of surrounding projects. I was still really interested in science publishing at that point.

And I'm trying to remember-- because I tell a lot of the condensed story to people, because I can't really tell a seven-year history. So I'm trying to figure out the right-- Oh, we got room for you. --the right length to-- We want to nail the definitive Replicate story here.

One thing that's really interesting about these machine learning papers is that they're published on arXiv. And a lot of them are actual fundamental research, so it makes sense for them to be prose describing a theory. But a lot of them are just running pieces of software that a machine learning researcher made that did something.

It was like an image classification model or something. And they managed to make an image classification model that was better than the existing state of the art. And they've made an actual running piece of software that does image segmentation. And then what they had to do is they then had to take that piece of software and write it up as prose and math in a PDF.

And what's frustrating about that is if you want to-- so this was Andreas. Andreas was a machine learning engineer at Spotify. And some of his job was-- he did pure research as well. He did a PhD, and he was doing a lot of stuff internally. But part of his job was also being an engineer and taking some of these existing things that people have made and published and trying to apply them to actual problems at Spotify.

And he was like-- you get given a paper, which describes roughly how the model works. It's probably missing lots of crucial information. There's sometimes code on GitHub. More and more, there's code on GitHub. Back then, it was relatively rare. But it was quite often just scrappy research code that didn't actually run.

And there was maybe the weights that were on Google Drive, but they accidentally deleted the weights off Google Drive. And it was really hard to take this stuff and actually use it for real things. And we just started talking together about his problems at Spotify. And I connected this back to my work at Docker as well.

I was like, oh, this is what we created containers for. We solved this problem for normal software by putting the thing inside a container so you could ship it around and it kept on running. So we were sort of hypothesizing about, hmm, what if we put machine learning models inside containers so that they could actually be shipped around and they could be defined in some production-ready formats?

And other researchers could run them to generate baselines. And people who wanted to actually apply them to real problems in the world could just pick up the container and run it. And we then thought, this is where it gets-- normally, in this part of the story, I skip forward to be like, and then we created Cog, this container stuff for machine learning models.

And we created Replicate, the place for people to publish these machine learning models. But there's actually like two or three years between that. The thing we then got dialed into was Andreas was like, what if there was a CI system for machine learning? Because one of the things he really struggled with as a researcher is generating baselines.

So when he's writing a paper, he needs to get five other models from existing work and get them running-- FRANCESC CAMPOY: On the same evals. MARK MANDEL: Exactly, on the same evals so you can compare apples to apples, because you can't trust the numbers in the paper.

So-- FRANCESC CAMPOY: Or you can be Google and just publish them anyway. MARK MANDEL: So he was like, what if you could-- I think this was coming from the thinking of, there should be containers for machine learning, but why are people going to use that? OK, maybe we can create a supply of containers by creating this useful tool for researchers.

And the useful tool was like, let's get researchers to package up their models and push them to the central place where we run a standard set of benchmarks across the models so that you can trust those results and you can compare these models apples to apples. And for a researcher, for Andreas, doing a new piece of research, he could trust those numbers.

And he could pull down those models, confirm it on his machine, use the standard benchmark to then measure his model, and all this kind of stuff. And so we started building that. That's what we applied to YC with. We got into YC, and we started building a prototype of this.
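
To make the apples-to-apples idea concrete, here is a toy sketch (purely hypothetical, not their prototype) of running several models against one shared eval set and one shared metric:

```python
from typing import Callable, Iterable

def accuracy(model: Callable[[str], str], eval_set: Iterable[tuple[str, str]]) -> float:
    examples = list(eval_set)
    correct = sum(1 for text, label in examples if model(text) == label)
    return correct / len(examples)

def run_benchmark(models: dict[str, Callable[[str], str]],
                  eval_set: list[tuple[str, str]]) -> None:
    # Same inputs, same metric, for every model: numbers you can compare.
    for name, model in models.items():
        print(f"{name:>20}: {accuracy(model, eval_set):.3f}")

if __name__ == "__main__":
    eval_set = [("good movie", "positive"), ("terrible movie", "negative")]
    models = {
        "always-positive": lambda text: "positive",
        "keyword-baseline": lambda text: "negative" if "terrible" in text else "positive",
    }
    run_benchmark(models, eval_set)
```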

And then this is where it all starts to fall apart. We were like, OK, that sounds great. And we talked to a bunch of researchers, and they really wanted that. And that sounds brilliant. That's a great way to create a supply of models on this research platform. But how the hell is this a business?

How are we even going to make any money out of this? And we're like, oh, shit, that's the real unknown here of what the business is. So we thought it would be a really good idea to-- OK, before we get too deep into this, let's try and de-risk whether this can turn into a business.

So let's try and research what the business could be for this research tool, effectively. So we went and talked to a bunch of companies trying to sell them something which didn't exist. So we're like, hey, do you want a way to share research inside your company so that other researchers, or say the product manager, can test out the machine learning model?

Maybe. And we were like, do you want a deployment platform for deploying models? Do you want a central place for versioning models? We're trying to think of lots of different products we could sell that were related to this thing. And terrible idea. We're not salespeople, and people don't want to buy something that doesn't exist.

I think some people can pull this off, but we were just a bunch of product and engineering people, and we just couldn't pull this off. So we then got halfway through our YC batch. We hadn't built a product. We had no users. We had no idea what our business was going to be, because we couldn't get anybody to buy something which didn't exist.

And actually, this was quite a way through our-- I think it was like 2/3 the way through our YC batch or something. So we're like, OK, well, we're kind of screwed now, because we don't have anything to show at demo day. And then we then tried to figure out, OK, what can we build in two weeks that will be something?

So we desperately tried to-- I can't remember what we tried to build at that point. And then two weeks before demo day, I just remember this. I remember it was all-- we were going down to Mountain View every week for dinners, and we got called on to an all-hands Zoom call, which was super weird.

We're like, what's going on? And they were like, don't come to dinner tomorrow. And we realized-- we kind of looked at the news, and we were like, oh, there's a pandemic going on. We were so deep in our startup, we were just completely oblivious to what was going on around us.

Was this Jan or Feb 2020? This was March 2020. March 2020. 2020, yeah. Because I remember Silicon Valley at the time was early to COVID. Like, they started locking down a lot faster than the rest of the US. And I remember soon after that, there was the San Francisco lockdowns.

And then the YC batch just stopped. There wasn't demo day. And it was, in a sense, a blessing for us, because we just kind of couldn't raise money anyway. FRANCESC CAMPOY: In the normal course of events, you're actually allowed to defer to a future demo day. Yeah. So we didn't even take any defer, because it just kind of didn't happen.

So was YC helpful? Yes. We completely screwed up the batch, and that was our fault. I think the thing that YC has become incredibly valuable for us has been after YC. I think there was a reasonable argument that we didn't need to do YC to start with, because we were quite experienced.

We had done some startups before. We were kind of well-connected with VCs. It was relatively easy to raise money, because we were a known quantity. If you go to a VC and be like, hey, I made this piece of-- It's Docker Compose for AI. Exactly. Yeah, and people can pattern match like that, and they can have some trust that you know what you're doing.

Whereas it's much harder for people straight out of college, and that's where YC's sweet spot is helping people straight out of college who are super promising figure out how to do that. Yeah, no credentials. Yeah, exactly. So in some sense, we didn't need that. But the thing that's been incredibly useful for us since YC has been-- this was actually, I think-- so Docker was a YC company.

And Solomon, the founder of Docker, I think, told me this. He was like, a lot of people underestimate the value of YC after you finish the batch. And his biggest regret was not staying in touch with YC. I might be misattributing this, but I think it was him. And so we made a point of that, and we just stayed in touch with our batch partner, who-- Jared at YC has been fantastic.

Jared Harris. Jared Friedman. Friedman. And all of the team at YC-- there was the growth team at YC when they were still there, and they've been super helpful. And two things that have been super helpful about that is raising money. They just know exactly how to raise money, and they've been super helpful during that process in all of our rounds.

We've done three rounds since we did YC, and they've been super helpful during the whole process. And also just reaching a ton of customers. So the magic of YC is that you have all of-- there's thousands of YC companies, I think, on the order of thousands, I think. And they're all of your first customers.

And they're super helpful, super receptive, really want to try out new things. You have a warm intro to every one of them, basically. And there's this mailing list where you can post about updates to your product, which is really receptive. And that's just been fantastic for us. We've just got so many of our users and customers through YC.

Yeah, well, so the classic criticism or the pushback is people don't buy you because you are both from YC. But at least they'll open the email. Yeah. Right? That's the-- OK. Yeah, effectively. And yeah, so that's been a really, really positive experience for us. And sorry, I interrupted with the YC question.

You just made it out of the YC, survived the pandemic. And you-- yeah. I'll try and condense this a little bit. Then we started building tools for COVID, weirdly. We were like, OK, we don't have a startup. We haven't figured out anything. What's the most useful thing we could be doing right now?

Save lives. So yeah, let's try and save lives. I think we failed at that as well. We had a bunch of products that didn't really go anywhere. We worked on a bunch of stuff, like contact tracing, which didn't really turn out to be a useful thing. Andreas worked on a DoorDash-like thing for delivering food to people who were vulnerable.

What else did we do? We worked on a problem of helping people direct their efforts to what was most useful, and a few other things like that. It didn't really go anywhere. So we're like, OK, this is not really working either. We were considering actually just doing work for COVID.

We have this decision document early on in our company, which is like, should we become a government app contracting shop? We decided no. Because you also did work for the UK government-- for gov.uk. Yeah, exactly. We had experience doing some-- And the Guardian and all that. Yeah, for government stuff.

And we were just really good at building stuff. We were just product people. I was the front end product side, and Andreas was the back end side. So we were just a product. And we were working with a designer at the time, a guy called Mark, who did our early designs for Replicate.

And we were like, hey, what if we just team up and become it and build stuff? But yeah, we gave up on that in the end for-- I can't remember the details. So we went back to machine learning. And then we were like, well, we're not really sure if this is going to work.

And one of my most painful experiences from previous startups is shutting them down, when you realize it's not really working and having to shut it down. It's a ton of work. And people hate you. And it's just sort of, you know-- so we were like, how can we make something we don't have to shut down?

And even better, how can we make something that won't page us in the middle of the night? So we made an open source project. We made a thing which was an open source Weights & Biases, because we had this theory that people want open source tools. There should be an open source, version-control-like experiment tracking thing.

And it was intuitive to us. And we were like, oh, we're software developers. And we like command line tools. Everyone loves command line tools and open source stuff. But machine learning researchers just really didn't care. They just wanted to click on buttons. They didn't mind that it was a cloud service.

It was all very visual as well-- you need lots of graphs and charts and stuff like this. So it just wasn't right. Or, it was right in a sense-- we were actually rebuilding something that Andreas made at Spotify for just saving experiments to cloud storage automatically. But other people didn't really want this.

So we kind of gave up on that. And then that was actually originally called Replicate. And we renamed that out of the way. So it's now called Keepsake. And I think some people still use it. Then we sort of came back-- we looped back to our original idea. So we were like, oh, maybe there was a thing.

And that thing we were originally thinking about of researchers sharing their work in containers for machine learning models. So we just built that. And at that point, we were kind of running out of the YC money. So we were like, OK, this feels good, though. Let's give this a shot.

So that was the point we raised a seed round. We raised-- - Pre-launch. - We raised pre-launch, pre-launch and pre-team. It was an idea, basically. We had a little prototype. It was just an idea and a team. But we were like, OK, bootstrapping this thing is getting hard, so let's actually raise some money.

And then we made Cog and Replicate. It initially didn't have APIs, interestingly. It was just the bit that I was talking about before, of helping researchers share their work. So it was a way for researchers to put their work on a web page such that other people could try it out, and so that you could download the Docker container.

So that we didn't have-- we cut the benchmarks thing of it, because we thought it was too complicated. But it had a Docker container that Andreas, in a past life, could download and run with his benchmark. And you could compare all these models apples to apples. So that was the theory behind it.

And that kind of started to work. This was still a long time pre-AI hype. And there was lots of interesting stuff going on. But it was very much in the classic deep learning era, so image segmentation models, and sentiment analysis, and all these kinds of things that people were using deep learning models for.

And we were very much building for research, because all of this stuff was happening in research institutions. These are people who'd be publishing to archive. So we were creating an accompanying material for their models, basically. They wanted a demo for their models. And we were creating accompanying material for it.

And what was funny about that is they were not very good users. They were doing great work, obviously. But the way the research worked is that they just made one thing every six months, and they just fired and forgot it. They published this piece of paper, and like, done.

I've published it. So they put it on Replicate. And then they just stopped using Replicate. They were like once-every-six-months users. And that wasn't great for us. But we stumbled across this early community. This was early 2021, when OpenAI created CLIP. And people started smushing CLIP and GANs together to produce image generation models.

And this started with-- it was just a bunch of tinkerers on Discord, basically. There was an early model called Big Sleep by advadnoun. And then there was VQGAN-CLIP, which was a bit more popular, by Rivers Have Wings. And it was all just people tinkering on stuff in Colabs.

And it was very dynamic. And it was people just making copies of Colabs and playing around with things and forking. And to me, I saw this, and I was like, oh, this feels like open source software, so much more than the research world, where people are publishing these papers.

Yeah, you don't know their real names, and it's just like a Discord. Yeah, exactly. But crucially, it was like people were tinkering and forking. And people were-- things were moving really fast. And it just felt like this creative, dynamic, collaborative community in a way that research wasn't really. Like, it was still stuck in this kind of six-month publication cycle.

So we just kind of latched onto that and started building for this community. And a lot of those early models were published on Replicate. I think the first one that was really primarily on Replicate was one called Pixray, which was sort of mid-2021. And it had a really cool pixel art output-- they weren't crisp images, but they were quite aesthetically pleasing, like some of these early image generation models.

And that was published primarily on Replicate. And then a few other models around that were published on Replicate. And that's where we really started to find our early community and where we really found, oh, we've actually built a thing that people want. And they were great users as well, and people really want to try out these models.

Lots of people were running the models on Replicate. We still didn't have APIs, though, interestingly. And this is another really complicated part of the story. We had no idea what a business model was still at this point. I don't think people could even pay for it. It's just these web forms where people could run the model.

FRANCESC CAMPOY: Just before this API bit continues, just for historical interest, which Discords were they, and how did you find them? Was this the LAION Discord? MARK MANDEL: Yeah, LAION-- FRANCESC CAMPOY: This is Eleuther. MARK MANDEL: Eleuther, yeah. It was the Eleuther one. FRANCESC CAMPOY: These two, right? MARK MANDEL: Eleuther, I particularly remember.

There was a channel where VQGAN-CLIP-- this was early 2021-- was set up as a Discord bot. And I just remember being completely captivated by this thing. I was just playing around with it all afternoon and the sort of thing-- FRANCESC CAMPOY: In Discord. MARK MANDEL: --where, oh, shit, it's 2 AM.

FRANCESC CAMPOY: This is the beginnings of MidJourney. MARK MANDEL: Yeah, exactly. FRANCESC CAMPOY: And it was instability. MARK MANDEL: It was the start of MidJourney. And it's where that kind of user interface came from. What was beautiful about the user interface is you could see what other people are doing.

And you could riff off other people's ideas. And it was just so much fun to just play around with this in a channel full of 100 people. And yeah, that just completely captivated me. And I'm like, OK, this is something. So we should get these things on Replicate. And yeah, that's where that all came from.

FRANCESC CAMPOY: OK, sorry. I just wanted to capture that. MARK MANDEL: Yeah, yeah. FRANCESC CAMPOY: And then you moved on to-- so was it APIs Next or was it Stable Diffusion Next? MARK MANDEL: It was APIs Next. And the APIs happened because one of our users-- our web form had an internal API for making the web form work, like with an API that was called from JavaScript.

And somebody reverse engineered that to start generating images with a script. They did web inspector, copy as curl, figure out what the API request was. And it wasn't secured or anything. FRANCESC CAMPOY: Of course not. MARK MANDEL: And they started generating a bunch of images. And we got tons of traffic.

We're like, what's going on? And I think the usual reaction to that would be like, hey, you're abusing our API, and to shut them down. And instead we're like, oh, this is interesting. Like, people want to run these models. So we documented the API in a Notion document-- our internal API in a Notion document-- and messaged this person being like, hey, you seem to have found our API.

Here's the documentation. That'll be like $1,000 a month, please, with a Stripe form that we just clicked some buttons to make. And they were like, sure, that sounds great. So that was our first customer. FRANCESC CAMPOY: $1,000 a month? MARK MANDEL: It was a surprising amount of money, yeah.

FRANCESC CAMPOY: That's not casual. MARK MANDEL: It was on the order of $1,000 a month. FRANCESC CAMPOY: So was it a business? MARK MANDEL: It was the creator of PixRay. He generated NFT art. And so he made a bunch of art with these models and was selling these NFTs, effectively.

And I think lots of people in his community were doing similar things. And he then referred us to other people who were also generating NFTs and trying to save models. And that was the start of our API business, yeah. And then we made an official API and actually added some billing to it so it wasn't just like a fixed fee.
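
For a sense of the shape of that kind of API, here is a hypothetical sketch of a "generate images from a script" client: create a prediction, poll until it finishes, save the output. The endpoint paths, field names, and auth scheme here are illustrative assumptions, not Replicate's actual internal or public API.

```python
import time
import requests

API_BASE = "https://api.example.com/v1"          # placeholder base URL
HEADERS = {"Authorization": "Token YOUR_KEY"}    # assumed auth scheme

def generate(prompt: str) -> bytes:
    # Kick off a prediction for a text-to-image model.
    resp = requests.post(
        f"{API_BASE}/predictions",
        json={"model": "pixray", "input": {"prompt": prompt}},
        headers=HEADERS,
        timeout=30,
    )
    resp.raise_for_status()
    prediction = resp.json()

    # Poll until the model has finished running on a GPU somewhere.
    while prediction["status"] not in ("succeeded", "failed"):
        time.sleep(2)
        prediction = requests.get(
            f"{API_BASE}/predictions/{prediction['id']}",
            headers=HEADERS,
            timeout=30,
        ).json()

    if prediction["status"] == "failed":
        raise RuntimeError(prediction.get("error", "prediction failed"))
    # Assume the output is a list of image URLs.
    return requests.get(prediction["output"][0], timeout=30).content

if __name__ == "__main__":
    with open("out.png", "wb") as f:
        f.write(generate("a watercolor fox in a misty forest"))
```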

FRANCESC CAMPOY: And now people think of you as the hosted models API business. MARK MANDEL: Yeah, exactly. But that just turned out to be our business. But what ended up being beautiful about this is it was really fulfilling the original goal of what we wanted to do: we wanted to make this research that people were making accessible to other people and for it to be used in the real world.

And this was just ultimately the right way to do it because all of these people making these generative models could publish them to replicate, and they wanted a place to publish it. And software engineers, like myself-- I'm not a machine learning expert, but I want to use this stuff-- could just run these models with a single line of code.

And we thought, oh, maybe the Docker image is enough. But it's actually super hard to get the Docker image running on a GPU and stuff. So it really needed to be the hosted API for this to work and to make it accessible to software engineers. And we just wound our way to this-- FRANCESC CAMPOY: Yeah, two years to the first paying customer.

MARK MANDEL: Yeah, exactly. FRANCESC CAMPOY: Did you ever think about becoming MidJourney during that time? You have so much interest in image generation. MARK MANDEL: What could have been? I mean, you're doing fine, for the record, but you know. It was right there. You were playing with it. Yeah, I don't think it was our expertise.

I think our expertise was DevTools rather than-- MidJourney is almost like a consumer product. So I don't think it was our expertise. It certainly occurred to us. I think at the time, we were thinking about, like, oh, maybe we could hire some of these people in this community and make great models and stuff like this.

But we ended up more being at the tooling. I think before, I was saying, I'm not really a researcher. I'm more like the tool builder, the behind the scenes. And I think both me and Andreas are like that. FRANCESC CAMPOY: Yeah. I think this is also an illustration of the tool builder philosophy, something you latch onto in DevTools, which is when you see people behaving weird, it's not their fault, it's yours.

And you want to pave the cow paths, is what they say, right? Like, the unofficial paths that people are making, like, make it official and make it easy for them, and then maybe charge a bit of money. Yeah. And now fast forward a couple of years, you have two million developers using Replicate, maybe more.

That was the last public number that I found. Two million-- I think that got mangled, actually, by-- it's two million users. Not all those people are developers, but a lot of them are developers. Yeah. And then 30,000 paying customers was the number. That's awesome. Latent Space runs on Replicate.

So we're a small podcaster, and we host-- FRANCESC CAMPOY: We do transcription on-- MARK MANDEL: --Whisper diarization on Replicate. And we're paying. So Latent Space is in that 30,000. You raised $40 million, Series B. I would say that maybe the Stable Diffusion time, August '22, was really when the company started to break out.

Tell us a bit about that and the community that came out. And I know now you're expanding beyond just image generation. Yeah. I think we kind of set ourselves-- we saw there was this really interesting generative image world going on. So we're building the tools for that community already, really.

And we knew Stable Diffusion was coming out. We knew it was a really exciting thing. It was the best generative image model so far. I think the thing we underestimated was just what an inflection point it would be. I think Simon Willison put it this way, where he said something along the lines of, it was a model that was open source, tinkerable, and good enough that it just took off in a way that none of the models had before.

And what was really neat about Stable Diffusion is it was open source, so you could-- compared to DALL-E, for example, which was equivalent quality-- you could fork it and tinker on it. And the first week, we saw people making animation models out of it.

We saw people make game texture models that use circular convolutions to make repeatable textures. We saw-- what else did we see? A few weeks later, people were fine-tuning it so you could put your face in these models. And all of these other-- Textual inversion. Yeah, exactly. That happened a bit before that.

And all of this innovation was happening all of a sudden. And people were publishing on Replicate because you could just publish arbitrary models on Replicate. So we had this supply of interesting stuff being built. But because it was a sufficiently good model, there was also just a ton of people building with it.

They were like, oh, we can build products with this thing. And this was about the time where people were starting to get really interested in AI. So tons of product builders wanted to build stuff with it. And we were just sitting in there in the middle as the interface layer between all these people who wanted to build and all these machine learning experts who were building cool models.

And that's really where it took off. We were just incredible supply, incredible demand. And we were just in the middle. And then, yeah, since then we've just grown and grown, really. And we've been building a lot for the indie hacker community, these individual tinkerers, but also startups, and a lot of large companies as well who are exploring and building AI things.

And then the same thing happened in the middle of last year with language models and Llama 2, where the same Stable Diffusion effect happened with Llama. And Llama 2 was our biggest week of growth ever because tons of people wanted to tinker with it and run it. And since then, we've just been seeing a ton of growth in language models as well as image models.

And yeah, we're just riding a lot of the interest that's going on in AI and all the people building in AI. FRANCESC CAMPOY: That's-- yeah, kudos. Right place, right time. But also took a while to position for the right place before the wave came. I'm curious if you have any insights on these different markets.

So Pieter Levels, notably a very loud person, very picky about his tools. I wasn't sure, actually, if he used you. He does, because you cited him on your Series B blog post, and Danny Postma as well, his competitor, all in that wave. What are their needs versus the more enterprise or B2B type needs?

Did you come to a decision point where you're like, OK, how serious are these indie hackers versus the actual businesses that are bigger and perhaps better customers because they're less churny? They're surprisingly similar because I think a lot of people right now want to use and build with AI, but they're not AI experts.

And they're not infrastructure experts either. So they want to be able to use this stuff without having to figure out all the internals of the models and touch PyTorch and whatever. And they also don't want to be setting up and booting up servers. And that's the same all the way from indie hackers just getting started-- because obviously, you just want to get started as quickly as possible-- all the way through to large companies who want to be able to use this stuff, but don't have all of the experts on stuff.

I think some companies are quite-- big companies like Google and so on that do actually have a lot of experts on stuff, but the vast majority of companies don't. And they're all software engineers who want to be able to use this AI stuff, but they just don't know how to use it.

And it's like, you really need to be an expert. And it takes a long time to learn the skills to be able to use that. So they're surprisingly similar in that sense. And I think it's also kind of unfair on the indie community. They're not churny or spiky, surprisingly.

They're building real, established businesses, which is like, kudos to them for building these really large, sustainable businesses, often just as solo developers. And it's kind of remarkable how they can do that, actually. And it's a credit to a lot of their product skills. And we're just there to help them, being their machine learning team, effectively, to help them use all of this stuff.

So we're actually making-- a lot of these indie hackers are some of our largest customers, alongside some of our biggest customers that you would think would be spending a lot more money than them. Yeah. And we should name some of these. You have them on your landing page. You have BuzzFeed.

You have Unsplash, Character AI. What do they power? What can you say about their usage? Yeah, totally. It's kind of various things. I'm trying to think. Let me actually think. What can I say about what customers? Well, I mean, I'm naming them because they're on your landing page. So you have logo rights.

It's useful for people to-- I'm not imaginative. I see-- monkey see, monkey do, right? Like, if I see someone doing something that I want to do, then I'm like, OK, Replicate's great for that. So that's what I think about case studies on company landing pages, is that it's just a way of explaining, like, yep, this is something that we are good for.

Yeah, totally. I mean, these companies are doing things all the way up and down the stack at different levels of sophistication. So Unsplash, for example, they actually publicly posted this story on Twitter where they're using BLIP to annotate all of the images in their catalog. So they have lots of images in the catalog, and they want to create a text description of it so you can search for it.

And they're annotating images with an off-the-shelf open source model. We have this big library of open source models that you can run. And we've got lots of people who are running these open source models off the shelf. And then most of our larger customers are doing more sophisticated stuff. So they're fine-tuning the models.

They're running completely custom models on us. And so a lot of these larger companies are using us for a lot of their inference. But it's a lot of custom models and them writing the Python themselves, because they've got machine learning experts on the team. And they're using us for their inference infrastructure effectively.

So it's lots of different levels of sophistication, where some people are using these off-the-shelf models. Some people are fine-tuning models. Pieter Levels is a great example, where a lot of his products are based off fine-tuning image models, for example. And then we've also got larger customers who are just using us as infrastructure, effectively, as servers.
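
To picture the simplest, off-the-shelf end of that spectrum-- the Unsplash-style captioning described above-- here is a minimal sketch. It assumes the `replicate` Python client (`pip install replicate`), a REPLICATE_API_TOKEN set in the environment, and a model version pin copied from the BLIP model page; the hash below is a placeholder.

```python
import pathlib
import replicate

# Placeholder version pin: copy the real one from the model's page.
MODEL = "salesforce/blip:REPLACE_WITH_VERSION_HASH"

def caption_folder(folder: str) -> dict[str, str]:
    captions: dict[str, str] = {}
    for image_path in sorted(pathlib.Path(folder).glob("*.jpg")):
        with open(image_path, "rb") as image_file:
            # One API call per image; the GPU runs on Replicate's side.
            output = replicate.run(MODEL, input={"image": image_file})
        captions[image_path.name] = str(output)
        print(image_path.name, "->", captions[image_path.name])
    return captions

if __name__ == "__main__":
    caption_folder("catalog_images")
```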

So yeah, it's all things up and down the stack. Let's talk a bit about Cog and the technical layer. So there are a lot of GPU clouds. I think people have different pricing points. And I think everybody tries to offer a different developer experience on top of it, which then lets you charge a premium.

Why did you want to create Cog? What were some of the-- you worked at Docker. What were some of the issues with traditional container runtimes? And maybe, yeah, what were you surprised with as you built it? Cog came right from the start, actually, when we were thinking about this evaluation, the benchmarking system for machine learning researchers, where we wanted researchers to publish their models in a standard format that was guaranteed to keep on running, that you could replicate the results of.

That's where the name came from. And we realized that we needed something like Docker to make that work. And I think it was just natural, from my point of view, obviously, that should be open source, that we should try and create some kind of open standard here that people can share.

Because if more people use this format, then that's great for everyone involved. I think the magic of Docker is not really in the software. It's just the standard that people have agreed on. Here are a bunch of keys for a JSON document, basically. And that was the magic of the metaphor of real containerization as well.

It's not the containers that are interesting. It's like the size and shape of the damn box. And it's a similar thing here, where really we just wanted to get people to agree on this is what a machine learning model is. This is how a prediction works. This is what the inputs are.

This is what the outputs are. So Cog is really just a Docker container that attaches to a CUDA device, if it needs a GPU, that has a open API specification as a label on the Docker image. And the open API specification defines the interface for the machine learning model, like the inputs and outputs effectively, or the params in machine learning terminology.

And we just wanted to get people to agree on this thing. And it's general purpose enough. We weren't saying-- some of the existing things were at the graph level. But we really wanted something general purpose enough that you could just put anything inside this. And it was future compatible.

And it was just like arbitrary software. And it'd be future compatible with future inference servers and future machine learning model formats and all this kind of stuff. So that was the intent behind it. And it just came naturally that we wanted to define this format. And that's been really working for us.
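
For a feel of what that interface looks like in practice, here is a sketch of a Cog predictor with a deliberately trivial "model". In a real project, this predict.py sits next to a cog.yaml that declares the Python version, Python packages, and system packages, and `cog build` turns the pair into a Docker image whose OpenAPI schema is derived from the typed signature below. Treat it as an illustrative sketch rather than a canonical example.

```python
from cog import BasePredictor, Input


class Predictor(BasePredictor):
    def setup(self) -> None:
        # Runs once when the container boots: load weights here, not per request.
        self.suffixes = {"excited": "!!!", "calm": "."}

    def predict(
        self,
        text: str = Input(description="Text to decorate"),
        mood: str = Input(description="excited or calm", default="calm"),
    ) -> str:
        # This typed signature is the contract: it becomes the
        # inference server's request and response schema.
        return text + self.suffixes.get(mood, ".")
```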

A bunch of people have been using Cog outside of Replicate, which is kind of our original intention. This should be how machine learning models are packaged and how people should use it. It's common to use Cog in situations where maybe they can't use the SaaS service because they're in a big company.

And they're not allowed to use a SaaS service. But they can use Cog internally still. And they can download the models from Replicate and run them internally in their org, which we've been seeing happen. That works really well. People who want to build custom inference pipelines but don't want to reinvent the world, they can use Cog off the shelf and use it as a component in their inference pipelines.

We've been seeing tons of usage like that. And it's just been kind of happening organically. We haven't really been trying. But it's there if people want it. And we've been seeing people use it. So that's great. And yeah, so a lot of it is just sort of philosophical. This is how it should work from my experience at Docker.

And there's just a lot of value from the core being open, I think, and that other people can share it. And it's like an integration point. So if Replicate, for example, wanted to work with a testing system, like a CI system or whatever, we can just interface at the Cog level.

That system just needs to put Cog models. And then you can test your models on that CI system before they get deployed to Replicate. And it's just a format that we can get everyone to agree on. What do you think, I guess, Docker got wrong? Because if I look at a Docker Compose file and a Cog definition, first of all, the Cog is kind of like the Dockerfile plus the Compose file.

And Docker Compose is just exposing the services. And also, Docker Compose is very ports-driven, versus you have the actual predict-- this is what you have to run. Yeah, any learnings and maybe tips for other people building container-based runtimes? Like, how much should you separate the API services versus the image building, or how much do you want to build them together?

I think it was coming from two sides. We were thinking about the design from the point of view of user needs. Like, what do users-- what are their problems, and what problems can we solve for them? But also, what the interface should be for a machine learning model. And it's sort of the combination of two things that led us to this design.

So the thing I talked about before was a little bit of the interface around the machine learning model. So we realized that we wanted it to be general purpose. We wanted it to be at the JSON, human-readable level, rather than the tensor level. So it's like an OpenAPI specification that wraps a Docker container.

That's where that design came from. And it's really just a wrapper around Docker. So we were kind of standing on shoulders there. But Docker's too low-level. It's just arbitrary software. So we wanted to be able to have an OpenAPI specification there that defined the function, effectively, that is the machine learning model, but also how that function is written and how that function is run, which is all defined in code, and stuff like that.

So it's like a bunch of abstraction on top of Docker to make that work. And that's where that design came from. But the core problems we were solving for users was that Docker's really hard to use. And productionizing machine learning models is really hard. So on the first part of that, we knew we couldn't use Dockerfiles.

Dockerfiles are hard enough for software developers to write. I'm saying this with love as somebody who worked on Docker and on Dockerfiles. But they're really hard to use. And you need to know a bunch about Linux, basically, because you're running a bunch of CLI commands. You need to know a bunch of Linux and best practices, how apt works, and all this kind of stuff.

So we're like, OK, we can't get to that level. We need something that machine learning researchers will be able to understand. People who are used to Colab notebooks. And what they understand is they're like, I need this version of Python. I need these Python packages. And somebody told me to apt-get install something.

You know? MARK MANDEL: And throw sudo in there when I don't really know what that means. So we tried to create a format that was at that level. And that's what cog.yaml is. And we're really kind of trying to imagine, what is that machine learning researcher going to understand and trying to build for them?

And then the productionizing machine learning models thing is like, OK, how can we package up all of the complexity of productionizing machine learning models? Like picking CUDA versions, like hooking it up to GPUs, writing an inference server, defining a schema, doing batching, all of these just really gnarly things that everyone does again and again, and just provide that as a tool.

And that's where that side of it came from. So it's like combining those user needs with the world need of needing a common standard for what a machine learning model is. And that's how we thought about the design. I don't know whether that answers the question. FRANCESC CAMPOY: Yeah.

So your idea was like, hey, you really want what Docker stands for in terms of standard, but you actually don't want people to do all the work that goes into Docker. MARK MANDEL: It needs to be higher level. FRANCESC CAMPOY: So I want to, for the listener, you're not the only standard that is out there.

As with any standard, there must be 14 of them. You are surprisingly friendly with Ollama, who are your former colleagues from Docker, who came out with the Modelfile. Mozilla came out with the llamafile. And then I don't know if this is in the same category even, but I'm just going to throw it in there.

Like Hugging Face has the Transformers and Diffusers libraries, which are a way of disseminating models that, obviously, people use. How would you compare and contrast your approach of Cog versus all these? MARK MANDEL: It's kind of complementary, actually, which is kind of neat. It's a lot of-- Transformers, for example, is lower level than Cog.

So it's a Python library, effectively. But you still need to-- FRANCESC CAMPOY: Expose them. MARK MANDEL: Yeah, you still need to turn that into an inference server. You still need to install the Python packages and that kind of thing. So lots of Replicate models are Transformers models and Diffusers models inside Cog.

So that's the level that that sits at. So it's very complementary in some sense. And we're kind of working on integration with Hugging Face such that you can deploy models from Hugging Face into Cog models and stuff like that and to Replicate. And so some of these things, like llamafile and what Ollama are working on, are also very complementary in that they're doing a lot of the running these things locally on laptops, which is not a thing that works very well with Cog.

Cog is really designed around servers and attaching to CUDA devices and NVIDIA GPUs and this kind of thing. So we're trying to figure out-- we're actually figuring out ways that those things can be interoperable because they should be. And I think they are quite complementary in that you should be able to take a model on Replicate and run it on your local machine.

You should be able to take a model on your local machine and run it in the cloud. So, yeah. FRANCESC CAMPOY: Is the base layer something like-- is it at the GGUF level? Which, by the way, I need to get a primer on the different formats that have emerged.

Or is it at the *-file level, which is Modelfile, llamafile, whatever? Or is it at the Cog level? I don't know, to be honest. And I think this is something we still have to figure out. I think there's a lot yet. Exactly where those lines are drawn, I don't know exactly.

I think this is something we're trying to figure out ourselves. But I think there's certainly a lot of promise about these systems interoperating. I think we just want things to work together. We want to try and reduce the number of standards so the more these things can interoperate and convert between each other and that kind of stuff at the minute.

FRANCESC CAMPOY: Andreas comes out of Spotify. Erik from Modal also comes out of Spotify. You worked at Docker and the Ollama guys worked at Docker. Did you know that these ideas were-- did both you and Andreas know that there was somebody else you worked with that had a kind of similar-- not similar idea, but was interested in the same thing?

Or did you then just say, oh, I know those people. They're doing something very similar. We learned about both early on, actually. Because we know them both quite well. And it's funny how I think we're all seeing the same problems and just applying, trying to fix the same problems that we're all seeing.

I think the Ollama one's particularly funny, because I joined Docker through my startup. Funnily, actually, the thing which worked from my startup was Compose, but we were actually working on another thing, which was a bit like EC2 for Docker. So we were working on productionizing Docker containers, and the Ollama guys were working on a thing called Kitematic, which was a bit like a desktop app for Docker.

And our companies both got bought by Docker at the same time. Kitematic turned into Docker Desktop, and our thing turned into Compose. And it's funny how we're both applying the things we saw at Docker to the AI world, where they're building the local environment for it, and we're building the cloud for it.

And yeah, so that's just really pleasing. And I think we're collaborating closely, because there's just so much opportunity for working together there. When you have a hammer, everything's a nail. Yeah, exactly, exactly. And a lot of where we're coming from with AI is that, on the Replicate team, we're all people who have built developer tools in the past.

So we've got a team-- like, I worked at Docker. We've got people who worked at Heroku, and GitHub, and the iOS ecosystem, and all this kind of thing-- the previous generation of developer tools, where we figured out a bunch of stuff. And then AI has come along, and we just don't yet have those tools and abstractions to make it easy to use.

So we're trying to take the lessons that we learned from the previous generation of stuff and apply it to this new generation of stuff. And obviously, there's a bit of nuance there, because the trick is to take the right lessons and do new stuff where it makes sense. You can't just cut and paste, you know?

But that's how we're approaching this: we're trying to, as much as possible, take some of those lessons we learned from how Heroku and GitHub were built, for example, and apply them to AI. Excellent. We should also talk a little bit about your compute availability. We're trying to ask this of everyone-- it's Compute Provider Month.

Do you own your own GPUs? How many do you have access to? What do you feel about the tightness of the GPU market? We don't own our own GPUs. We've got a few that we play around with, but not for production workloads. And we are primarily built on public clouds, so primarily GCP and CoreWeave, and some smatterings elsewhere.

And-- Not from NVIDIA, which is your newest investor? We work with NVIDIA. So they're kind of helping us get GPU availability. GPUs are hard to get hold of. If you go to AWS and ask for one A100, they won't give you an A100. But if you go to AWS and say, I would like 100 A100s in two years, they're like, sure, we've got some.

I think the problem is the cloud providers-- and that makes sense from their point of view. They want reliable, sustained usage. They don't want spiky usage and wastage in their infrastructure, which makes total sense. But that makes it really hard for startups who just want to get hold of GPUs.

I think we're in a fortunate position where we can aggregate demand, so we can make commits to cloud providers. And then we actually have good availability. It's not-- we don't have infinite availability, obviously, but if you want an A100 from Replicate, you can get it. But we're seeing other companies pop up as well.

I guess SF Compute's a great example of this, where they're doing the same idea for training, almost, where a lot of startups need to be able to train a model, but they can't get hold of GPUs from large cloud providers. So SF Compute is letting people rent 10 H100s for two days, which is just impossible otherwise.

And what they're effectively doing there is they're aggregating demand such that they can make a big commit to the cloud provider and then let people use smaller chunks of it. And that's what we're doing with Replicate as well, where we're aggregating demand such that we make big commits to the cloud providers.

And then people can run a 100-millisecond API request on an A100. Coming from a finance background, this sounds surprisingly similar to banks, where the job of a bank is maturity transformation, as it's called. You take short-term deposits, which technically can be withdrawn at any time, and you turn that into long-term loans for mortgages and stuff.

And you pocket the difference in interest. And that's the bank. Yeah, that's exactly what we're doing. So you run a bank. Yeah, a GPU bank. And it's very much a finance problem as well, because we have to make bets on the future demand and value of GPUs.

What are you-- OK, I don't know how much you can disclose, but what are you forecasting? Up, down? Up a lot? Up 10x? I can't really-- we're projecting our growth with some educated guesses about what kind of models are going to come out and what kind of hardware they'll need to run.

So we need to bet that, OK, maybe language models are getting larger, so we need GPUs with a lot of RAM, or multi-GPU nodes; or maybe models are getting smaller, and we actually need smaller GPUs. We have to make some educated guesses about that kind of stuff. Yeah.

Speaking of which, the mixture-of-experts models must be throwing a spanner into the planning. Not so much. I mean, we've got multi-node A100 machines, which can run these, and multi-node H100 machines, which can run those, no problem. So we're set up for that world. OK, I didn't expect it to be so easy.

My impression was that the amount of RAM per model is increasing a lot, especially on a sort of per-parameter basis, per-active-parameter basis, going from Mixtral being eight experts to the DeepSeek MoE models-- I don't know if you saw them-- being like 30, 60 experts.

And you can see it keep going up, I guess. I think we might run into problems at some point. And yeah, I don't know exactly what's going on there. Something that we're finding, which is kind of interesting-- I don't know this in depth, but we're certainly seeing a lot of good results from lower-precision models.

So 90% of the performance with much less RAM required. And that means that we can run them on the GPUs we have available. And it's good for customers as well, because it runs faster. And they want that trade-off, where it's just slightly worse but way faster and cheaper.
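To make the lower-precision point concrete, here is a minimal sketch of loading a 7B model in 4-bit with Hugging Face Transformers and bitsandbytes. It is not Replicate's serving stack, and the model ID is just an example; the point is that the quantized weights fit on a much smaller GPU while most of the quality is retained.

```python
# Load a causal LM with 4-bit weights so it fits on a much smaller GPU.
# Requires transformers, accelerate, and bitsandbytes; the model ID is illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "mistralai/Mistral-7B-Instruct-v0.1"
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,  # store weights in 4-bit, compute in fp16
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, quantization_config=quant_config, device_map="auto"
)

inputs = tokenizer("The trade-off with quantization is", return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```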

Do you see a lot of GPU waste in terms of people running the thing on a GPU that is too advanced? I think we use a T4 to run Whisper, so we're at the bottom end of it. Yeah, any thoughts? I think at one of the hackathons we were at, people were like, oh, how do I get access to H100s?

And it's like, you don't need an H100. Yeah, yeah. Well, if you want low latency, sure-- spend a lot of money on an H100. Yeah, we see a ton of that kind of stuff. And it's surprisingly hard to optimize these models right now.

So a lot of people are just running really unoptimized models. We're doing the same, honestly. Like, a lot of models on Replicate have just not been optimized very well. So something we want to be able to help people with is optimizing those models. Either we show people how to with guides, or we make it easier to use some of these more optimized inference servers, or we show people how to compile the models, or we do that automatically, or something like that.

But that's still only something we're exploring. There's so much wastage, and it's not just wasting the GPUs-- it's also a bad experience, and the models run slow. The models on Replicate are almost all pushed by our community.

Like, people have pushed those models themselves. But it's a big-head distribution, where there's a long tail of lots of models that people have pushed, and then a big head of the models most people run. So for models like Llama 2 and Stable Diffusion, we work with Meta and Stability to maintain those models.

And we've done a ton of optimization to make those really fast. So yeah, those models are optimized, but the long tail is not, and there's a lot of wastage there. And going into the-- well, it's already the new year. Do you see customer demand and GPU supply kind of staying together?

Because I think a lot of people are saying, there's like hundreds of thousands of GPUs being shipped this year. Like, the crunch is going to be over. But you also have millions of people that now care about using AI. How do you see the two lines progressing? Are you seeing customer demand is going to outpace the GPU growth?

Do you see them together? Do you see maybe a lot of this model improvement work kind of helping alleviate that? From our point of view, demand is not outpacing supply of GPUs. We have enough-- from my point of view, we have enough GPUs to go around. And that might change, for sure.

Yeah. That's a very nicely put way, as a startup founder, to respond. Yeah, I'll maybe get into a little bit of this on the-- you said optimizing models. Actually, so when Alessio talked about GPU waste, he was more-- oh, that's you. Sorry. Yeah, it is getting a little bit warm in here, some greenhouse gas effect.

So Alessio framed it more as sort of picking the wrong box for the model, whereas yours is more about maybe the inference stack, if you can call it that. Were you referencing vLLM? What other sorts of techniques are you referencing? And also, keeping in mind that when I talk to your competitors-- we don't have to name any of them-- they are working on trying to optimize these kinds of models.

Basically, they'll quantize the models for you with their special stack, so you basically use their versions of Llama 2, their versions of Mistral. And that's one way to approach it. I don't see it as the Replicate DNA to do that, because then you would have to slap the Replicate house brand on something. I mean, just comment on any of that.

Like, what do you mean when you say optimize models? Yeah, I mean, things like quantizing the models-- you can imagine a way that we could help people quantize their models if we wanted to. We've had success using inference servers like vLLM and TensorRT-LLM, and we're using those kinds of things to serve language models.
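As a concrete illustration of what an optimized inference server buys you, here is a minimal vLLM sketch-- not Replicate's serving code, and the model ID is just an example. vLLM's continuous batching and paged attention are what keep the GPU busy instead of idling between requests.

```python
# Offline batched inference with vLLM; the model ID is illustrative.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.1")
params = SamplingParams(temperature=0.7, max_tokens=128)

# Many prompts can be passed at once and batched together,
# which is where most of the throughput gain comes from.
outputs = llm.generate(["Explain why batched inference keeps a GPU busy."], params)
print(outputs[0].outputs[0].text)
```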

We've had success with things like AITemplate, which compiles the models, all of those kinds of things. And there are even some really boring things, like just making the code more efficient. Some people, when they're just writing some Python code-- it's really easy to just write inefficient Python code.

There are really boring things like that as well. But it's a whole mix of things like that. So you will do that for a customer? You look at their code and-- Yeah, we've certainly helped some of our customers do some of that stuff.

And a lot of the popular models on Replicate, we've rewritten to use that stuff as well. The Stable Diffusion that we run, for example, is compiled with AITemplate to make it super fast. And it's all open source-- you can see all of this stuff on GitHub if you want to see how we do it.

But you can imagine ways that we could help people. It's almost like it's built into the Cog layer, maybe, where we could help people use these fast inference servers or use AITemplate to compile their models to make them faster. Whether it's manual, semi-manual, or automatic, we're not really sure.

But that's something we want to explore, because that benefits everyone. Yeah, awesome. And then on the competitive piece, there was a price war on Mixtral this last December. As far as I can tell, you guys did not enter that war. You have Mixtral, but it's just regular pricing.

I think some of these players are probably losing money on their pricing. You don't have to say anything, but the break-even is somewhere between $0.50 and $0.75 per million tokens served. How are you thinking about the overall competitiveness in the market? How should people choose when everyone's an API?

So for Llama 2 and Mixtral-- I think not Mixtral, but I can't remember exactly-- we have similar performance and similar price to some of these other services. We're not bargain basement compared to some of the others, because, to your point, we don't want to burn tons of money. But we're pricing it sensibly and sustainably, to a point where we think it's competitive with other people, such that-- the thing we don't want-- we want developers using Replicate.

And we don't want to price it such that it's only affordable by big companies. We want to make it cheap enough that developers can afford it. But we also don't want super cheap prices, because then it's almost like your customers are hostile, and the more customers you get, the worse it gets.

So we're pricing it sensibly, but still to the point where hopefully it's cheap enough to build on. And the thing we really care about-- obviously, we want models on Replicate to be comparable to other people's. But the really crucial thing about Replicate, and the way I think we think about it, is that it's not just the API for them.

Particularly in open source, it's not just the API for the model that is the important bit. Because quite often with open source models, the whole point of open source is that you can tinker on it, and you can customize it, and you can fine tune it, and you can smush it together with another model, like LLaVA, for example.

And you can't do that if it's just a hosted API, because you can't touch the code. So what we want to do with Replicate is build a platform that's actually open. So we've got all of these models where the performance and price is on par with everything else. But if you want to customize it, you can fine tune it.

You can go to GitHub and get the source code for it, and edit the source code, and push up your own custom version, and this kind of thing. Because that's the crucial thing for open source machine learning: being able to tinker on it and customize it. And we think that's really important to make open source AI work.

You mentioned open source. How do you think about levels of openness? When Llama 2 came out, I wrote a post about this. There's open source, there's open weights, then there's restrictive weights. It was on the front page of Hacker News, so there were all sorts of comments from people.

So I'm always curious to hear your thoughts. What do you think is OK for people to license? What's OK for people to not release? Yeah. We're seeing-- I mean, before, it was just closed-source big models and open-source little models, purely open source stuff. And we're now seeing lots of variations, where model companies are putting restrictive licenses on their models.

That means they can only be used non-commercially. And a lot of the open source crowd is complaining that it's not true open source, all this kind of thing. And I think a lot of that is coming from philosophy, the sort of free software movement kind of philosophy. And I don't think it's necessarily a bad thing.

I think it's good that model companies can make money out of their models. That's how you incentivize people to make more models, and this kind of thing. And I think it's totally fine for somebody who made something to ask for some money in return if you're making money out of it.

And I think that's totally OK. And I think there's some really interesting midpoints, as well, where people are releasing the code. So you can still tinker on it. But the person who trained the model still wants to get a cut of it if you're making a bunch of money out of it.

And I think that's good. And that's going to make the ecosystem more sustainable. And I think we're just going to see-- I don't think anybody's really figured it out yet. And we're going to see more experimentation with this and more people try to figure out, hmm, what are the business models around building models?

And how can I make money out of this? And we'll just see where it ends up. And I think it's something we want to support as Replicate, as well, because we believe in open source. We think it's great. But there's also going to be lots of models which are closed source, as well.

And these companies might not be-- there's probably going to be a long tail of a bunch of people building models that don't have the reach that OpenAI has. And hopefully, as Replicate, we can help those people find developers and help them make money and that kind of thing. I think the compute requirements of AI have kind of changed things.

I started an open source company. I'm a big open source fan. And before, man hours were really all that went into open source; there wasn't much monetary investment. Not that man hours aren't worth a lot. But if you think about Llama 2, it's like $25 million all in.

It's like you can't just spin up a Discord and spend $25 million. So I think it's net positive for everybody that Llama 2 is open source. And, well, on the open source term-- I think people, like you're saying, kind of argue about the semantics of it. But all we care about is that Llama 2 is open.

Because if Llama 2 wasn't open source today, if Mistral was not open source, we would be in a bad spot. And I think the nuance here is making sure that these models are still tinkerable. Because the beautiful thing about Llama 2 as a base model is that, yeah, it costs $25 million to train to start with.

But then you can fine tune it for like $50. And that's what's so beautiful about the open source ecosystem and something I think is really surprising as well. It completely surprised me. I think a lot of people assumed that it's not going to be-- open source machine learning is just not going to be practical because it's so expensive to train these models.

But fine tuning is unreasonably effective. And people are getting really good results out of it. And it's really cheap. So people can effectively create open source models really cheaply. And there's going to be this sort of ecosystem of tons of models being made. And I think the risk there from a licensing point of view is we need to make sure that the licenses let people do that.
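To see why fine-tuning on top of a $25 million base model can cost on the order of tens of dollars, here is a minimal LoRA sketch with Hugging Face PEFT. The model ID and hyperparameters are illustrative, not anything Replicate ships; the point is that only a tiny fraction of the weights is trained, so a single modest GPU for a few hours is often enough.

```python
# LoRA with Hugging Face PEFT: train small adapter matrices instead of the full model.
# Model ID and hyperparameters are illustrative.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
lora = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # only the attention projections get adapters
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora)
model.print_trainable_parameters()  # typically well under 1% of the parameters are trainable
```

And crucially, this only works because the weights and code are open enough to load, modify, and redistribute-- which is the licensing point being made here.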

Because if you release a big model under a non-commercial license and people can't fine tune it, you've lost the magic of it being open. And I'm sure there are ways to structure that such that the person paying $25 million feels like they're compensated somehow and they can feel like they should keep on training models.

And people can keep on fine tuning it. But I guess we just have to figure out exactly how that plays out. Excellent. So I just wanted to round it out. You've been an excellent, very open guest so far. I actually kind of-- I should have started my intro with this.

But I feel like you found the AI engineer crew before I did. And something that really resonated with me was that in the Series B announcement, you put in some stats about how there are two orders of magnitude more software engineers than there are machine learning engineers-- about 30 million software engineers and 500,000 machine learning engineers.

You can maybe plus/minus one of those orders of magnitude, but it's around that ballpark. And so obviously, there will be a lot more AI engineers than there will be ML engineers. How do you see this group? Like, is it all software engineers? Are they going to specialize? What would you advise someone trying to become an AI engineer?

Is this a legitimate career path? Yeah, absolutely. I mean, it's very clear that AI is going to be a large part of how we build software in the future now. It's a bit like being a software developer in the '90s and ignoring the internet. You just need to learn about this stuff.

You need to figure this stuff out. I don't think it needs to be-- you don't need to be super low level. You don't need to be like-- the metaphor here is that you don't need to be digging down into this sort of PyTorch level if you don't want to.

In the same way that, as a software engineer in the '90s, you didn't need to understand how network stacks work to be able to build a website. But you need to understand the shape of this thing and how to hold it, and what it's good at and what it's not.

And that's really important. So yeah, I certainly just advise people to just start playing around with it. Get a feel of how language models work. Get a feel of how these diffusion models work. Get a feel of what fine tuning is and how it works, because some of your job might be building data sets.

Get a feeling of how prompting works, because some of your job might be writing a prompt. And those are just all really important skills to sort of figure out. Well, thanks for building the definitive platform for doing all that. Yeah, of course. Any final call to actions? Who should come work at Replicate?

Yeah, anything for the audience? Yeah, I mean, we're hiring. If you click on Jobs at the bottom of replicate.com, there's some jobs. And I just encourage you to just try out AI, even if you think you're not smart enough. Like, the whole reason I started this company is because I was looking at the cool stuff that Andreas was making.

Like, Andreas is a proper machine learning person with a PhD, and I was just a sort of lowly software engineer. And I was like, you're doing really cool stuff, and I want to be able to do that. And by us working together, we've now made it accessible to dummies like me.

And I just encourage anyone who wants to try this stuff out, just give it a try. And I think I would also encourage people who are tool builders. Like, the limiting factor now on AI is not like the technology. Like, the technology has made incredible advances. And there's just so many incredible machine learning models that can do a ton of stuff.

The limiting factor is just like making that accessible to people who build products. Because it's really hard to use this stuff right now. And obviously, we're building some of that stuff as Replicate. But there's just like a ton of other tooling and abstractions that need to be built out to make this stuff usable.

So I just encourage people who like building developer tools to get stuck into it as well, because that's going to make this stuff accessible to everyone. Yeah. I especially want to highlight that you have a Hacker-in-Residence job opening available, which not every company has, and which basically means: join you and hack on stuff.

I think Charlie Holtz is doing a fantastic job of that. Yep. Effectively, a lot of our job is just showing people how to use AI. So we've got a team of software developers who have kind of figured this stuff out, who are writing about it, making videos about it, and making example applications to show people what you can do with this stuff.

Yeah. In my world, that used to be called DevRel. But now it's Hacker-in-Residence. And that's-- Yeah, this came from-- Zeke is another one of our hackers. Tell me this came from Chroma, to start that one. We developed-- Anton actually was like, hey, we came up with that first.

But I think we came up with it independently. Yeah, I made that page, yeah. I think we came up with it independently, because the story behind this is we originally called it the DevRel team. And-- DevRel is cursed now. No one wants to listen to DevRel.

And Zeke was like, that sounds so boring. I have to go to someone and say I'm a developer relations person. You don't want to be that-- you want to be a hacker, man. Or a developer advocate or something. So we were like, OK, what's the way we can make this sound the most fun?

All right, you're a hacker. I would say that is consistently the vibe I get from Replicate, from everyone on your team I interact with. When I go to your San Francisco office, that's the vibe you're generating. It's a hacker space more than an office. And you hold fantastic meetups there.

And I think you're a really positive presence in our community, so thank you for doing all that, and for instilling the hacker vibe and culture into AI. I'm really glad that's working. Yeah. Cool. That's a wrap, I think. Thank you so much for coming on, man. Yeah, of course.

Thank you. This was a lot of fun.