A Brief History of the Open Source AI Hacker - with Ben Firshman of Replicate
Chapters
0:00 Introductions
1:22 Low latency is all you need
4:39 Evolution of CLIs
6:47 How building Arxiv Vanity led to Replicate
13:13 Making ML research replicable with containers
19:47 Doing YC in 2020 and pivoting to tools for COVID
23:11 Launching the first version of Replicate
29:26 Embracing the generative image community
31:58 Getting reverse engineered into an API product
35:54 Growing to 2 million users
39:37 Indie vs Enterprise customers
42:58 How customers use Replicate
44:30 Learnings from Docker that went into Cog
52:24 Creating AI standards
57:49 Replicate's compute availability
62:38 Fixing GPU waste
70:58 What's open source AI?
75:19 Building for AI engineers
77:33 Hiring at Replicate
00:00:02.640 |
This is Alessio, Partner and CTO-in-Residence at Decibel Partners. 00:00:06.080 |
And I'm joined by my co-host, Swyx, founder of Smol.ai. 00:00:09.360 |
Hey, and today we have Ben Firshman in the studio. 00:00:14.560 |
Ben, you're a co-founder and CEO of Replicate. 00:00:17.280 |
Before that, you were most notably creator of Fig, 00:00:21.120 |
or founder of Fig, which became Docker Compose. 00:00:24.160 |
You also did a couple other things before that. 00:00:26.480 |
But that's what a lot of people know you for. 00:00:37.600 |
I think I'm a builder and tinkerer in a very broad sense. 00:00:43.320 |
So I work on things maybe a bit closer to tech, 00:00:55.040 |
and build bicycles, and all this kind of stuff. 00:01:01.560 |
from transferable skills, from just working in the real world 00:01:08.400 |
And there's so much about being a builder, both in real life 00:01:14.240 |
Is there a real-world analogy that you use often 00:01:16.160 |
when you're thinking about a code architecture problem? 00:01:22.040 |
I like to build software tools as if they were something real. 00:01:33.240 |
so I wrote this thing called the command line interface 00:01:36.520 |
guidelines, which was a bit like sort of the Mac human interface guidelines. 00:01:41.400 |
I did it with the guy I created Docker Compose with 00:01:53.120 |
I think I described that your command line interface should 00:01:55.460 |
feel like a big iron machine, where you pull a lever 00:02:00.520 |
And things should respond within 50 milliseconds, 00:02:07.120 |
And another analogy here is in the real life, 00:02:10.040 |
you know when you press a button on an electronic device 00:02:13.000 |
and it's like a soft switch, and you press it, 00:02:15.080 |
and nothing happens, and there's no physical feedback 00:02:19.040 |
And then half a second later, something happens? 00:02:24.540 |
like something that's real, where you touch-- 00:02:26.440 |
you pull a physical lever and the physical lever moves. 00:02:29.760 |
And I've taken that lesson of human interface 00:02:37.600 |
really solid and robust, both the command lines and user 00:02:44.240 |
And how did you operationalize that for Fig or Docker? 00:02:50.320 |
Actually, we didn't do it very well for Fig and [INAUDIBLE] 00:02:56.000 |
where Python's really hard to get booting up fast, 00:02:58.840 |
because you have to load up the whole Python runtime 00:03:02.800 |
Go is much better at this, where Go just instantly starts. 00:03:07.880 |
So you have to be under 500 milliseconds to start up? 00:03:16.200 |
being immediate is something like 100 milliseconds. 00:03:27.560 |
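The 50- and 100-millisecond budgets above can be made concrete. This is a minimal sketch of the principle (my own illustration, not code from the CLI guidelines; `run_with_feedback`, `task`, and `label` are hypothetical names): acknowledge the command within the immediacy budget even when the underlying work is slow.

```python
import sys
import time

def run_with_feedback(task, label):
    """Print feedback immediately, then do the (possibly slow) work.

    The lever-pulling principle: the *acknowledgment* should land well
    under ~100ms even if the task itself takes much longer.
    """
    start = time.monotonic()
    print(f"{label}...", file=sys.stderr, flush=True)  # immediate response
    ack_latency = time.monotonic() - start             # how fast we acknowledged
    result = task()                                    # the actual slow work
    print(f"{label} done", file=sys.stderr, flush=True)
    return result, ack_latency

# The acknowledgment lands in microseconds even though the sum takes longer.
result, ack = run_with_feedback(lambda: sum(range(1_000_000)), "Summing")
```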
well, one thing is I am maybe one of a few fellow people who 00:03:30.280 |
have actually written something about CLI design principles, 00:03:33.600 |
because I was in charge of the Netlify CLI back in the day 00:03:40.560 |
I'll just share it in case you have thoughts-- 00:03:42.480 |
is I think CLIs are effectively starting points 00:03:48.060 |
And the moment one of the script's preconditions 00:03:53.920 |
So the CLI developer will just exit the program. 00:03:58.920 |
And the way that I really wanted to create the Netlify dev 00:04:01.760 |
workflow was for it to be kind of a state machine that 00:04:06.640 |
If it detected a precondition wasn't fulfilled, 00:04:09.480 |
it would actually delegate to a subprogram that 00:04:13.640 |
asking for more info or waiting until a condition is fulfilled. 00:04:27.960 |
in the sense that when you run a CLI command, 00:04:32.040 |
And you may not have given the CLI all the things 00:04:39.520 |
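The state-machine workflow described here can be sketched roughly like this (an illustrative toy, not actual Netlify CLI code; the `(check, fixer)` pairs are hypothetical names): each precondition gets a fixer it can delegate to instead of exiting.

```python
def run_state_machine(preconditions, action):
    """Run `action` once every precondition holds.

    Instead of exiting on the first unmet precondition, delegate to a
    fixer (prompt for config, log in, wait for a port...) and re-check.
    `preconditions` is a list of (check, fixer) callables.
    """
    for check, fixer in preconditions:
        if not check():
            fixer()  # delegate to the subprogram that satisfies it
        if not check():
            raise SystemExit("could not satisfy a precondition")
    return action()

# Toy usage: a "logged in" precondition whose fixer performs the login.
state = {"logged_in": False}
print(run_state_machine(
    [(lambda: state["logged_in"], lambda: state.update(logged_in=True))],
    lambda: "deployed",
))  # deployed
```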
Yeah, that reminds me of a thing we sort of thought 00:04:43.160 |
about when writing the CLI guidelines, where CLIs were 00:04:57.560 |
Whereas over time, the CLI has evolved to humans-- 00:05:05.240 |
it was back in a world where the primary way of using 00:05:08.360 |
and computers was writing shell scripts, effectively. 00:05:14.960 |
where, actually, humans are using CLI programs 00:05:19.320 |
And the current best practices about how Unix was designed-- 00:05:29.240 |
from the '70s and '80s, where they say things like, 00:05:33.080 |
command line commands should not output anything on success. 00:05:40.040 |
makes sense if you're using it in a shell script. 00:05:42.120 |
But if a user is using that, it just looks like it's broken. 00:05:45.640 |
If you type copy and it just doesn't say anything, 00:05:47.780 |
you assume that it didn't work as a new user. 00:05:52.120 |
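One common way to reconcile the old silence-on-success rule with human users (a sketch of the idea, not any specific tool's code) is to check whether stdout is a terminal:

```python
import sys

def report_success(message: str) -> None:
    """Silent when piped into a script, chatty when a human is watching."""
    if sys.stdout.isatty():
        print(message)  # interactive terminal: confirm the action happened
    # non-TTY (shell script, pipe): stay silent on success, as Unix expects

report_success("Copied 3 files.")
```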
And yeah, so I think what's really interesting about the CLI 00:06:02.160 |
to your point, it's a really good user interface 00:06:12.800 |
and either silently succeeding or saying, no, you did-- 00:06:16.720 |
failed, it can guide you in the right direction 00:06:22.680 |
and that kind of thing in a way that's actually-- 00:06:28.960 |
because it feels like this back and forth with the computer, 00:06:36.920 |
So I think there's some interesting intersection 00:06:41.160 |
being very closely related and a good fit for each other. 00:06:47.200 |
I would say one of the surprises from last year-- 00:06:51.120 |
think the most successful coding agent of my cohort 00:06:53.800 |
was Open Interpreter, which was a CLI implementation. 00:06:56.740 |
And I have chronically-- even as a CLI person, 00:07:06.480 |
which you recently retired after a glorious seven years. 00:07:11.200 |
Something like that, which is nice, I guess, HTML PDFs. 00:07:22.120 |
Which-- so when I quit Docker, I got really interested 00:07:26.920 |
in science infrastructure, just as a problem area, 00:07:36.080 |
science has created so much progress in the world. 00:07:38.760 |
The fact that we can talk to each other on a podcast, 00:07:42.680 |
and we use computers, and the fact that we're alive 00:07:46.800 |
But science is just completely archaic and broken. 00:07:51.560 |
that just happen to be copied to the internet 00:07:55.080 |
rather than taken into account that we can transfer information 00:08:01.240 |
and all this kind of thing is all very broken. 00:08:04.040 |
There's just so much potential for making science work better. 00:08:08.560 |
and I didn't really have any time to go and get a PhD 00:08:12.520 |
But I'm a tool builder, and I could make existing scientists 00:08:16.240 |
And if I could make a bunch of scientists a little bit better 00:08:19.040 |
at their job, maybe that's the kind of equivalent 00:08:28.960 |
in that it's all of these PDFs, quite often behind paywalls 00:08:41.400 |
funded by national grants, government grants, 00:08:49.660 |
But the particular thing we got dialed in on was-- 00:08:58.600 |
there's a bunch of open science that happens as well. 00:09:00.800 |
So math, physics, computer science, machine learning, 00:09:03.680 |
notably, is all published on arXiv, which is actually 00:09:12.520 |
Yeah, it was just like somebody in Cornell who started 00:09:22.000 |
And it's kind of like a user group thing, right? 00:09:31.040 |
And that's where basically all of math, physics, 00:09:36.600 |
But it's still PDFs published to this thing, which 00:09:42.200 |
So the web was invented at CERN, a physics institution, 00:10:00.100 |
because you want to link to another academic paper. 00:10:02.280 |
But instead, you have to copy and paste these things 00:10:17.720 |
So anyway, I got really frustrated with that. 00:10:19.600 |
And I went on vacation with my old friend Andreas. 00:10:26.600 |
And we were just on vacation in Greece for fun. 00:10:33.520 |
We had to zoom in and scroll line by line on the PDF. 00:10:44.880 |
And we spent our vacation sitting by the pool, 00:10:48.080 |
making LaTeX to HTML converters and making the first version 00:10:59.920 |
because they caught the eye of arXiv, who were like, 00:11:04.320 |
We just haven't had the time to work on this. 00:11:06.200 |
And what's tragic about arXiv is it is like this 00:11:10.320 |
it's like this project of Cornell that's like, 00:11:12.920 |
they can barely scrounge together enough money to survive. 00:11:29.240 |
But anyway, they were like, yeah, this is great. 00:12:05.540 |
We were after-- we were both users of Arxiv Sanity, 00:12:17.680 |
And Andreas just like cracked a joke of like, 00:12:31.480 |
So Replicate maybe feels like an overnight success 00:12:40.800 |
And we've been collaborating for even longer. 00:12:45.840 |
So in some sense, we've been doing this almost like six, 00:12:57.360 |
I was still really interested in science publishing 00:13:02.800 |
because I tell a lot of the condensed story to people, 00:13:04.960 |
because I can't really tell a seven-year history. 00:13:10.760 |
We want to nail the definitive Replicate story here. 00:13:13.240 |
One thing that's really interesting about these machine 00:13:15.480 |
learning papers is that these machine learning papers 00:13:21.080 |
And a lot of them are actual fundamental research, 00:13:27.280 |
But a lot of them are just running pieces of software 00:13:40.040 |
And they managed to make an image classification 00:13:42.380 |
model that was better than the existing state of the art. 00:13:46.360 |
And they've made an actual running piece of software 00:13:58.640 |
And what's frustrating about that is if you want to-- 00:14:06.640 |
Andreas was a machine learning engineer at Spotify. 00:14:13.120 |
He did a PhD, and he was doing a lot of stuff internally. 00:14:15.480 |
But part of his job was also being an engineer 00:14:22.120 |
and trying to apply them to actual problems at Spotify. 00:14:31.960 |
It's probably missing lots of crucial information. 00:14:40.880 |
But it was quite often just scrappy research code 00:14:44.960 |
And there was maybe the weights that were on Google Drive, 00:14:47.520 |
but they accidentally deleted the weights off Google Drive. 00:15:00.200 |
And I connected this back to my work at Docker as well. 00:15:03.640 |
I was like, oh, this is what we created containers for. 00:15:10.740 |
so you could ship it around and it kept on running. 00:15:18.440 |
models inside containers so that they could actually 00:15:25.720 |
And other researchers could run them to generate baselines. 00:15:31.200 |
to real problems in the world could just pick up the container 00:15:44.140 |
created Cog, this container stuff for machine learning 00:15:48.920 |
for people to publish these machine learning models. 00:15:50.480 |
But there's actually like two or three years between that. 00:16:01.660 |
struggled with as a researcher is generating baselines. 00:16:06.680 |
to get five other models that are existing in work 00:16:16.540 |
because you can't trust the numbers in the paper. 00:16:24.560 |
MARK MANDEL: So he was like, what if you could-- 00:16:28.340 |
I think this was coming from the thinking of, 00:16:30.180 |
there should be containers for machine learning, 00:16:33.520 |
OK, maybe we can create a supply of containers 00:16:36.200 |
by creating this useful tool for researchers. 00:16:39.080 |
And the useful tool was like, let's get researchers 00:16:43.580 |
to the central place where we run a standard set of benchmarks 00:16:46.560 |
across the models so that you can trust those results 00:16:51.200 |
and you can compare these models apples to apples. 00:16:54.600 |
doing a new piece of research, he could trust those numbers. 00:17:02.440 |
confirm it on his machine, use the standard benchmark 00:17:04.560 |
to then measure his model, and all this kind of stuff. 00:17:12.560 |
We got into YC, and we started building a prototype of this. 00:17:16.000 |
And then this is where it all starts to fall apart. 00:17:22.960 |
That's a great way to create a supply of models 00:17:28.480 |
How are we even going to make any money out of this? 00:17:30.640 |
And we're like, oh, shit, that's the real unknown here 00:17:35.560 |
So we thought it would be a really good idea to-- 00:17:44.880 |
let's try and reduce the risk of this turning into a business. 00:17:49.720 |
So let's try and research what the business could 00:17:57.360 |
So we went and talked to a bunch of companies trying 00:18:06.320 |
so that other researchers, or say the product manager, 00:18:12.760 |
And we were like, do you want a deployment platform 00:18:20.360 |
Do you want a central place for versioning models? 00:18:22.880 |
We're trying to think of lots of different products 00:18:24.960 |
we could sell that were related to this thing. 00:18:32.100 |
don't want to buy something that doesn't exist. 00:18:36.180 |
but we were just a bunch of product people, product 00:18:39.540 |
and engineering people, and we just couldn't pull this off. 00:18:47.480 |
We had no idea what our business was going to be, 00:18:49.560 |
because we couldn't get anybody to buy something 00:18:53.860 |
And actually, this was quite a way through our-- 00:18:55.860 |
I think it was like two-thirds of the way through our YC batch 00:18:58.300 |
So we're like, OK, well, we're kind of screwed now, 00:19:00.460 |
because we don't have anything to show at demo day. 00:19:05.780 |
what can we build in two weeks that will be something? 00:19:10.260 |
I can't remember what we tried to build at that point. 00:19:13.300 |
And then two weeks before demo day, I just remember this. 00:19:22.580 |
we were going down to Mountain View every week for dinners, 00:19:29.100 |
And they were like, don't come to dinner tomorrow. 00:19:33.900 |
And we realized-- we kind of looked at the news, 00:19:37.400 |
and we were like, oh, there's a pandemic going on. 00:19:42.020 |
were just completely oblivious to what was going on around us. 00:19:49.340 |
Because I remember Silicon Valley at the time 00:20:07.820 |
because we just kind of couldn't raise money anyway. 00:20:11.520 |
FRANCESC CAMPOY: In the normal course of events, 00:20:13.480 |
you're actually allowed to defer to a future demo day. 00:20:27.620 |
that YC has become incredibly valuable for us 00:20:36.860 |
that we didn't need to do YC to start with, because we 00:20:50.180 |
If you go to a VC and be like, hey, I made this piece of-- 00:20:54.780 |
Yeah, and people can pattern match like that, 00:20:59.020 |
and they can have some trust you know what you're doing. 00:21:01.380 |
Whereas it's much harder for people straight out of college, 00:21:03.540 |
and that's where YC's sweet spot is helping people straight 00:21:05.740 |
out of college who are super promising figure out 00:21:11.180 |
But the thing that's been incredibly useful for us 00:21:20.500 |
And Solomon, the founder of Docker, I think, told me this. 00:21:22.900 |
He was like, a lot of people underestimate the value of YC 00:21:29.140 |
And his biggest regret was not staying in touch with YC. 00:21:32.780 |
I might be misattributing this, but I think it was him. 00:21:37.360 |
stayed in touch with our batch partner, who-- 00:21:47.540 |
there was the growth team at YC when they were still there, 00:21:52.660 |
And two things that have been super helpful about that 00:22:00.100 |
and they've been super helpful during that process 00:22:04.180 |
and they've been super helpful during the whole process. 00:22:23.900 |
You have a warm intro to every one of them, basically. 00:22:27.960 |
you can post about updates to your product, which 00:22:35.340 |
We've just got so many of our users and customers 00:22:56.820 |
And yeah, so that's been a really, really positive 00:23:02.100 |
And sorry, I interrupted with the YC question. 00:23:05.340 |
You just made it out of the YC, survived the pandemic. 00:23:12.780 |
Then we started building tools for COVID, weirdly. 00:23:17.820 |
What's the most useful thing we could be doing right now? 00:23:25.020 |
We had a bunch of products that didn't really go anywhere. 00:23:28.340 |
We worked on a bunch of stuff, like contact tracing, 00:23:36.060 |
Andreas worked on a DoorDash for people delivering food 00:23:46.220 |
We met a problem of helping people direct their efforts 00:23:48.540 |
to what was most useful and a few other things like that. 00:23:52.980 |
So we're like, OK, this is not really working either. 00:23:55.820 |
We were considering actually just doing work for COVID. 00:23:58.780 |
We have this decision document early on in our company, which 00:24:01.300 |
is like, should we become a government app contracting 00:24:18.100 |
And we were just really good at building stuff. 00:24:25.940 |
And we were working with a designer at the time, 00:24:28.140 |
a guy called Mark, who did our early designs for Replicate. 00:24:30.660 |
And we were like, hey, what if we just team up and become it 00:24:35.020 |
But yeah, we gave up on that in the end for-- 00:24:49.420 |
from previous startups is shutting them down, 00:25:04.240 |
that won't page us in the middle of the night? 00:25:10.700 |
We made a thing which was an open source Weights & Biases, 00:25:14.940 |
because we had this theory that people want open source tools. 00:25:18.780 |
There should be an open source version control experiment 00:25:24.300 |
And we were like, oh, we're software developers. 00:25:27.340 |
Everyone loves command line tools and open source stuff. 00:25:30.100 |
But machine learning researchers just really didn't care. 00:25:33.480 |
They didn't mind that it was a cloud service. 00:25:37.380 |
need lots of graphs and charts and stuff like this. 00:25:45.340 |
that Andreas made at Spotify for just saving experiments 00:25:54.680 |
And then that was actually originally called Replicate. 00:26:05.900 |
So we were like, oh, maybe there was a thing. 00:26:11.020 |
their work in containers for machine learning models. 00:26:15.000 |
And at that point, we were kind of running out of the YC money. 00:26:17.700 |
So we were like, OK, this feels good, though. 00:26:20.980 |
So that was the point we raised a seed round. 00:26:25.900 |
- We raised pre-launch, pre-launch and pre-team. 00:26:34.060 |
But we were like, OK, bootstrapping this thing 00:26:38.700 |
is getting hard, so let's actually raise some money. 00:26:46.940 |
It initially didn't have APIs, interestingly. 00:26:49.500 |
It was just the bit that I was talking about before, 00:26:53.780 |
So it was a way for researchers to put their work on a web page 00:27:02.420 |
and so that you could download the Docker container. 00:27:05.940 |
we cut the benchmarks thing of it, because we thought 00:27:09.580 |
But it had a Docker container that Andreas, in a past life, 00:27:15.740 |
And you could compare all these models apples to apples. 00:27:24.500 |
It was still when it was long time pre-AI hype. 00:27:29.740 |
And there was lots of interesting stuff going on. 00:27:31.740 |
But it was very much in the classic deep learning era, 00:27:35.060 |
so image segmentation models, and sentiment analysis, 00:27:39.700 |
and all these kind of things that people were using deep 00:27:49.860 |
These are people who'd be publishing to archive. 00:27:57.900 |
And we were creating accompanying material for it. 00:28:10.600 |
that they just made one thing every six months, 00:28:16.060 |
They published this piece of paper, and like, done. 00:28:43.980 |
And people started smushing CLIP and GANs together 00:28:51.940 |
it was just a bunch of tinkerers on Discord, basically. 00:28:56.860 |
It was-- there was an early model called BigSleep 00:29:05.900 |
was a bit more popular, by Rivers Have Wings. 00:29:08.560 |
And it was all just people tinkering on stuff in Colabs. 00:29:11.940 |
And it was people just making copies of Colabs 00:29:15.300 |
And to me, I saw this, and I was like, oh, this 00:29:17.300 |
feels like open source software, so much more 00:29:28.620 |
And people were-- things were moving really fast. 00:29:30.780 |
And it just felt like this creative, dynamic, 00:29:34.940 |
collaborative community in a way that research wasn't really. 00:29:41.460 |
Like, it was still stuck in this kind of six-month publication 00:29:51.220 |
And a lot of those early models were published on Replicate. 00:29:55.460 |
I think the first one that was really primarily on Replicate 00:29:58.580 |
was one called Pixray, which was sort of mid-2021. 00:30:10.880 |
like some of these early image generation models. 00:30:13.140 |
And that was published primarily on Replicate. 00:30:23.040 |
to find our early community and where we really found, 00:30:25.300 |
oh, we've actually built a thing that people want. 00:30:30.700 |
and people really want to try out these models. 00:30:32.700 |
Lots of people were running the models on Replicate. 00:30:35.020 |
We still didn't have APIs, though, interestingly. 00:30:37.220 |
And this is another really complicated part of the story. 00:30:39.340 |
We still had no idea what our business model was at this point. 00:30:43.460 |
It's just these web forms where people could run the model. 00:30:47.020 |
FRANCESC CAMPOY: Just before this API bit continues, 00:30:48.940 |
just for historical interests, which discords were they, 00:30:56.860 |
MARK MANDEL: Eleuther, I particularly remember. 00:31:06.860 |
And I just remember being completely just captivated 00:31:11.980 |
I was just playing around with it all afternoon 00:31:16.580 |
FRANCESC CAMPOY: This is the beginnings of Midjourney. 00:31:22.740 |
And it's where that kind of user interface came from. 00:31:26.540 |
is you could see what other people are doing. 00:31:32.180 |
And it was just so much fun to just play around 00:31:38.540 |
And yeah, that just completely captivated me. 00:31:54.780 |
so was it APIs next or was it Stable Diffusion next? 00:31:58.200 |
And the APIs happened because one of our users-- 00:32:02.700 |
our web form had an internal API for making the web form work, 00:32:05.860 |
like with an API that was called from JavaScript. 00:32:25.800 |
MARK MANDEL: And they started generating a bunch of images. 00:32:31.700 |
And I think a sort of usual reaction to that would be like, 00:32:36.820 |
hey, you're abusing our API, and to shut them down. 00:32:39.420 |
And instead we're like, oh, this is interesting. 00:32:44.220 |
So we documented the API in a Notion document, 00:32:58.180 |
That'll be like $1,000 a month, please, with a straight face. 00:33:10.140 |
MARK MANDEL: It was a surprising amount of money, yeah. 00:33:13.300 |
MARK MANDEL: It was on the order of $1,000 a month. 00:33:23.220 |
And so he made a bunch of art with these models 00:33:35.840 |
who were also generating NFTs and trying to save models. 00:33:39.860 |
And that was the start of our API business, yeah. 00:33:44.860 |
And then we made an official API and actually 00:33:47.620 |
added some billing to it so it wasn't just like a fixed fee. 00:33:52.720 |
FRANCESC CAMPOY: And now people think of you as the host 00:34:02.380 |
is it was really fulfilling, like the original goal of what 00:34:05.820 |
we wanted to do is that we wanted to make this research 00:34:08.220 |
that people were making accessible to other people 00:34:19.900 |
these generative models could publish them to replicate, 00:34:30.180 |
could just run these models with a single line of code. 00:34:32.500 |
And we thought, oh, maybe the Docker image is enough. 00:34:34.380 |
But it's actually super hard to get the Docker image running 00:34:37.300 |
So it really needed to be the hosted API for this to work 00:34:40.060 |
and to make it accessible to software engineers. 00:34:45.340 |
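For flavor, the hosted API centers on a predictions endpoint. This stdlib-only sketch constructs (without sending) such a request; the field names are based on Replicate's public HTTP API docs and should be treated as an assumption here, and the version ID and token are placeholders.

```python
import json
import urllib.request

def build_prediction_request(version: str, model_input: dict, token: str):
    """Build a POST to Replicate's predictions endpoint (not sent here)."""
    body = json.dumps({"version": version, "input": model_input}).encode()
    return urllib.request.Request(
        "https://api.replicate.com/v1/predictions",
        data=body,
        headers={
            "Authorization": "Bearer " + token,  # placeholder token goes here
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_prediction_request(
    "model-version-id",                        # placeholder version hash
    {"prompt": "an astronaut riding a horse"},
    "r8_example_token",                        # placeholder API token
)
print(req.full_url)  # https://api.replicate.com/v1/predictions
```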
FRANCESC CAMPOY: Yeah, two years to the first paying customer. 00:34:49.620 |
FRANCESC CAMPOY: Did you ever think about becoming 00:34:53.220 |
You have so much interest in image generation. 00:34:57.020 |
I mean, you're doing fine, for the record, but you know. 00:35:06.740 |
I think our expertise was DevTools rather than-- 00:35:08.740 |
Midjourney is almost like a consumer product. 00:35:18.060 |
like, oh, maybe we could hire some of these people 00:35:19.940 |
in this community and make great models and stuff like this. 00:35:26.380 |
I think before, I was saying, I'm not really a researcher. 00:35:28.740 |
I'm more like the tool builder, the behind the scenes. 00:35:30.500 |
And I think both me and Andreas are like that. 00:35:43.940 |
And you want to pave the cow paths, is what they say, right? 00:35:46.380 |
Like, the unofficial paths that people are making, 00:35:48.540 |
like, make it official and make it easy for them, 00:35:56.460 |
you have two million developers using Replicate, maybe more. 00:35:59.940 |
That was the last public number that I found. 00:36:01.980 |
Two million-- I think that got mangled, actually, by-- 00:36:09.780 |
And then 30,000 paying customers was the number. 00:36:20.620 |
MARK MANDEL: --Whisper diarization on Replicate. 00:36:24.180 |
So we're Latent Space, and we're in the 30,000. 00:36:31.620 |
I would say that maybe the Stable Diffusion time, August 00:36:34.740 |
'22, was really when the company started to break out. 00:36:39.320 |
Tell us a bit about that and the community that came out. 00:36:41.820 |
And I know now you're expanding beyond just image generation. 00:36:50.220 |
we saw there was this really interesting generative image 00:36:53.300 |
So we're building the tools for that community already, really. 00:37:05.040 |
It was the best generative image model so far. 00:37:10.020 |
was just what an inflection point it would be, 00:37:13.660 |
it was-- I think Simon Willison put it this way, 00:37:20.820 |
it was a model that was open source and tinkerable 00:37:32.260 |
and open source and tinkerable, such that it just took off 00:37:37.540 |
And what was really neat about Stable Diffusion 00:37:44.580 |
compared to DALL-E, for example, which was equivalent quality, 00:37:50.760 |
And the first week, we saw people making animation models 00:37:57.420 |
that use circular convolutions to make repeatable textures. 00:38:03.740 |
A few weeks later, people were fine-tuning it 00:38:13.940 |
And all of this innovation was happening all of a sudden. 00:38:19.860 |
because you could just publish arbitrary models on Replicate. 00:38:22.400 |
So we had this supply of interesting stuff being built. 00:38:25.140 |
But because it was a sufficiently good model, 00:38:28.580 |
there was also just a ton of people building with it. 00:38:33.100 |
They were like, oh, we can build products with this thing. 00:38:35.500 |
And this was about the time where people were starting 00:38:38.420 |
So tons of product builders wanted to build stuff with it. 00:38:41.900 |
in the middle as the interface layer between all these people 00:38:44.580 |
who wanted to build and all these machine learning 00:38:50.580 |
We were just incredible supply, incredible demand. 00:38:55.260 |
And then, yeah, since then we've just grown and grown, really. 00:38:58.840 |
And we've been building a lot for the indie hacker community, 00:39:02.080 |
these individual tinkerers, but also startups, 00:39:04.340 |
and a lot of large companies as well who are exploring 00:39:09.560 |
And then the same thing happened middle of last year 00:39:16.280 |
the same Stable Diffusion effect happened with LLaMA. 00:39:21.640 |
ever because tons of people wanted to tinker with it 00:39:25.000 |
And since then, we've just been seeing a ton of growth 00:39:29.720 |
And yeah, we're just riding a lot of the interest that's 00:39:33.880 |
going on in AI and all the people building in AI. 00:39:40.160 |
But also took a while to position for the right place 00:39:59.760 |
He does because you cited him on your Series B blog post, 00:40:02.680 |
and Danny Postma as well, his competitor, 00:40:07.040 |
What are their needs versus the more enterprise or B2B type 00:40:14.080 |
Did you come to a decision point where you're like, 00:40:20.040 |
are bigger and perhaps better customers because they're 00:40:25.960 |
think a lot of people right now want to use and build with AI, 00:40:32.040 |
And they're not infrastructure experts either. 00:40:35.780 |
without having to figure out all the internals of the models 00:40:42.040 |
And they also don't want to be setting up and booting up 00:40:46.800 |
And that's the same all the way from indie hackers just 00:40:51.360 |
getting started-- because obviously, you just 00:41:17.080 |
And it's like, you really need to be an expert. 00:41:21.760 |
So they're surprisingly similar in that sense. 00:41:24.200 |
And I think it's also kind of unfair on the indie community. 00:41:24.200 |
They're not churning, surprisingly, or churny 00:41:33.600 |
They're building real established businesses, 00:41:39.240 |
these really large, sustainable businesses, often just 00:41:47.800 |
And it's kind of remarkable how they can do that, actually. 00:41:50.260 |
And it's in credit to a lot of their product skills. 00:41:55.600 |
being their machine learning team, effectively, 00:42:02.280 |
a lot of these indie hackers are some of our largest customers, 00:42:06.720 |
that you would think would be spending a lot more money 00:42:35.680 |
Well, I mean, I'm naming them because they're 00:42:45.480 |
Like, if I see someone doing something that I want to do, 00:42:47.840 |
then I'm like, OK, Replicate's great for that. 00:42:50.040 |
So that's what I think about case studies on company 00:42:52.320 |
landing pages, is that it's just a way of explaining, 00:42:55.000 |
like, yep, this is something that we are good for. 00:43:20.920 |
and they want to create a text description of it 00:43:24.160 |
And they're annotating images with off-the-shelf open source 00:43:27.400 |
We have this big library of open source models that you can run. 00:43:30.360 |
And we've got lots of people who are running these open source 00:43:42.200 |
They're running completely custom models on us. 00:43:56.400 |
writing the Python themselves, because they've 00:44:01.280 |
And they're using us for their inference infrastructure 00:44:05.840 |
So it's lots of different levels of sophistication, 00:44:08.080 |
where some people are using these off-the-shelf models. 00:44:13.080 |
Pieter Levels is a great example, where a lot of his products 00:44:15.540 |
are based off fine-tuning image models, for example. 00:44:25.760 |
So yeah, it's all things up and down the stack. 00:44:30.000 |
Let's talk a bit about Cog and the technical layer. 00:44:37.080 |
I think people have different pricing points. 00:44:39.520 |
And I think everybody tries to offer a different developer 00:44:41.940 |
experience on top of it, which then lets you charge a premium. 00:44:48.120 |
What were some of the-- you worked at Docker. 00:44:49.960 |
What were some of the issues with traditional container 00:44:53.920 |
And maybe, yeah, what were you surprised with as you built it? 00:45:05.600 |
the benchmarking system for machine learning researchers, 00:45:08.760 |
where we wanted researchers to publish their models 00:45:11.520 |
in a standard format that was guaranteed to keep on running, 00:45:19.640 |
And we realized that we needed something like Docker 00:45:24.920 |
And I think it was just natural, from my point of view, 00:45:29.940 |
that we should try and create some kind of open standard 00:45:38.560 |
I think the magic of Docker is not really in the software. 00:45:41.560 |
It's just the standard that people have agreed on. 00:45:44.640 |
Here are a bunch of keys for a JSON document, basically. 00:45:49.000 |
And that was the magic of the metaphor of real containerization 00:45:53.760 |
It's not the containers that are interesting. 00:45:55.640 |
It's like the size and shape of the damn box. 00:45:59.540 |
And it's a similar thing here, where really we just 00:46:01.280 |
wanted to get people to agree on this is what 00:46:13.120 |
that attaches to a CUDA device, if it needs a GPU, that 00:46:17.400 |
has an OpenAPI specification as a label on the Docker image. 00:46:21.920 |
And the OpenAPI specification defines the interface 00:46:26.800 |
for the machine learning model, like the inputs and outputs 00:46:32.440 |
effectively, or the params in machine learning terminology. 00:46:36.680 |
And we just wanted to get people to agree on this thing. 00:46:41.440 |
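To make that concrete, here is an illustrative, deliberately simplified sketch of the kind of OpenAPI-style interface document that could ride along with a model image as a label. The field names are hypothetical; Cog's real generated schema is richer than this.

```python
import json

# Hypothetical, simplified sketch of an OpenAPI-style model interface of the
# kind Cog attaches to a Docker image as a label. The real schema Cog emits
# is more detailed; this just shows the idea: typed inputs and outputs.
schema = {
    "openapi": "3.0.2",
    "components": {
        "schemas": {
            "Input": {
                "type": "object",
                "properties": {
                    "prompt": {"type": "string"},
                    "num_outputs": {"type": "integer", "default": 1},
                },
                "required": ["prompt"],
            },
            "Output": {
                "type": "array",
                "items": {"type": "string", "format": "uri"},
            },
        }
    },
}

# A Docker label is just a string, so the schema round-trips through JSON.
label_value = json.dumps(schema)
parsed = json.loads(label_value)
print(sorted(parsed["components"]["schemas"]["Input"]["properties"]))
# → ['num_outputs', 'prompt']
```

Because the interface is plain JSON, anything that can read a Docker label, whether a CI system, a registry, or a UI generator, can discover the model's inputs and outputs without ever executing the model.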
We weren't saying-- some of the existing things 00:46:45.200 |
But we really wanted something general purpose 00:46:47.160 |
enough that you could just put anything inside this. 00:46:51.900 |
And it'd be future compatible with future inference servers 00:47:08.520 |
A bunch of people have been using Cog outside of Replicate, 00:47:13.080 |
This should be how machine learning models are packaged 00:47:19.000 |
where maybe they can't use the SaaS service because they're 00:47:23.800 |
And they're not allowed to use a SaaS service. 00:47:30.300 |
And they can download the models from Replicate 00:47:37.240 |
People who want to build custom inference pipelines 00:47:42.520 |
it as a component in their inference pipelines. 00:47:48.900 |
And it's just been kind of happening organically. 00:47:56.680 |
And yeah, so a lot of it is just sort of philosophical. 00:48:00.360 |
This is how it should work from my experience at Docker. 00:48:03.120 |
And there's just a lot of value from the core being open, 00:48:12.760 |
to work with a testing system, like a CI system or whatever, 00:48:22.840 |
And then you can test your models on that CI system 00:48:26.860 |
And it's just a format that we can get everyone to agree on. 00:48:30.040 |
What do you think, I guess, Docker got wrong? 00:48:33.280 |
Because if I look at a Docker Compose and a Cog definition, 00:48:36.000 |
first of all, the Cog is kind of like the Docker 00:48:40.800 |
And Docker Compose are just exposing the services. 00:48:43.960 |
And also, Docker Compose is very ports-driven, 00:48:53.120 |
Yeah, any learnings and maybe tips for other people building 00:49:23.540 |
And it's sort of the combination of two things 00:49:29.760 |
was a little bit of the interface around the machine 00:49:32.600 |
So we realized that we wanted it to be general purpose. 00:49:35.320 |
We wanted it to be at the JSON human-readable things, 00:49:49.200 |
And it's really just a wrapper around Docker. 00:49:57.160 |
So we wanted to be able to have an OpenAPI specification there 00:50:09.800 |
how that function is run, which is all defined in code, 00:50:12.520 |
So it's like a bunch of abstraction on top of Docker 00:50:18.600 |
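For reference, a minimal Cog package is roughly two files, following Cog's documented format; the model and helper names below are hypothetical:

```yaml
# cog.yaml -- illustrative, following Cog's documented configuration format
build:
  gpu: true
  python_version: "3.11"
  python_packages:
    - "torch==2.1.2"
predict: "predict.py:Predictor"
```

```python
# predict.py -- hypothetical model; requires the cog package to run
from cog import BasePredictor, Input, Path

class Predictor(BasePredictor):
    def setup(self):
        # Runs once when the container starts: load weights here.
        self.model = load_my_model("weights.pth")  # hypothetical helper

    def predict(self, prompt: str = Input(description="Text prompt")) -> Path:
        # Cog derives the OpenAPI schema from these type annotations.
        return self.model.generate(prompt)
```

Running `cog build` turns this into a Docker image, with the OpenAPI schema derived from the `predict` type annotations attached as a label.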
But the core problems we were solving for users 00:50:38.560 |
Dockerfiles are hard enough for software developers to write. 00:50:41.080 |
I'm saying this with love as somebody who works on Docker 00:50:48.200 |
And you need to know a bunch about Linux, basically, 00:50:50.360 |
because you're running a bunch of CLI commands. 00:50:52.360 |
You need to know a bunch of Linux and best practices, 00:50:56.480 |
So we're like, OK, we can't get to that level. 00:50:58.200 |
We need something that machine learning researchers will 00:51:08.000 |
And somebody told me to apt-get install something. 00:51:11.820 |
MARK MANDEL: And throw sudo in there when I don't really 00:51:15.320 |
So we tried to create a format that was at that level. 00:51:23.240 |
going to understand and trying to build for them? 00:51:26.280 |
And then the productionizing machine learning models thing 00:51:33.360 |
all of the complexity of productionizing machine 00:51:36.800 |
Like picking CUDA versions, like hooking it up to GPUs, 00:51:41.040 |
writing an inference server, defining a schema, 00:51:44.940 |
doing batching, all of these just really gnarly things 00:52:00.400 |
with the world's need for a common standard for what 01:08:06.620 |
I don't know whether that answers the question. 00:52:12.880 |
want what Docker stands for in terms of standard, 00:52:24.240 |
FRANCESC CAMPOY: So I want to, for the listener, 00:52:26.680 |
you're not the only standard that is out there. 00:52:28.600 |
As with any standard, there must be 14 of them. 00:52:34.040 |
who are your former colleagues from Docker, who 00:52:34.040 |
And then I don't know if this is in the same category even, 00:52:44.520 |
Like Hugging Face has the Transformers and Diffusers 00:52:46.480 |
library, which is a way of disseminating models 00:52:51.080 |
How would you compare your contrast, your approach 00:52:54.520 |
MARK MANDEL: It's kind of complementary, actually, 00:52:59.640 |
Transformers, for example, is lower level than Cog. 00:53:10.240 |
You still need to install the Python packages 00:53:12.800 |
So lots of Replicate models are Transformers models 00:53:24.040 |
And we're kind of working on integration with Hugging Face 00:53:26.560 |
such that you can deploy models from Hugging Face 00:53:29.020 |
into Cog models and stuff like that and to Replicate. 00:53:38.320 |
and what Ollama are working on, are also very complementary 00:53:41.280 |
in that they're doing a lot of the running these things 00:53:46.880 |
locally on laptops, which is not a thing that 00:53:53.400 |
and attaching to CUDA devices and NVIDIA GPUs 00:53:58.160 |
So we're trying to figure out-- we're actually 00:54:11.580 |
in that you should be able to take a model on Replicate 00:54:14.840 |
You should be able to take a model on your local machine 00:54:19.480 |
FRANCESC CAMPOY: Is the base layer something like-- 00:54:42.960 |
Exactly where those lines are drawn, I don't know exactly. 00:54:45.440 |
I think this is something we're trying to figure out ourselves. 00:54:47.960 |
But I think there's certainly a lot of promise 00:54:51.880 |
I think we just want things to work together. 00:54:54.000 |
We want to try and reduce the number of standards 00:54:56.080 |
so the more these things can interoperate and convert 00:54:58.880 |
between each other and that kind of stuff at the minute. 00:55:01.160 |
FRANCESC CAMPOY: Andreas comes out of Spotify. 00:55:07.680 |
You worked at Docker, and the Ollama guys worked at Docker. 00:55:18.480 |
had a kind of like similar-- not similar idea, 00:55:22.680 |
Or did you then just say, oh, I know those people. 00:55:33.480 |
And it's funny how I think we're all seeing the same problems 00:55:36.120 |
and just applying, trying to fix the same problems that we're 00:55:42.720 |
funny because I joined Docker through my startup. 00:55:48.300 |
Funnily, actually, the thing which worked from my startup 00:55:52.400 |
working on another thing, which was a bit like EC2 for Docker. 00:56:19.400 |
And it's funny how we're both applying the things we saw 00:56:36.460 |
because there's just so much opportunity for working there. 00:56:39.720 |
FRANCESC CAMPOY: When you have a hammer, everything's a nail. 00:56:46.560 |
this is-- I mean, where we're coming from a lot with AI 00:56:52.880 |
because we're all kind of, on the Replicator team, 00:56:55.680 |
we're all kind of people who have built developer 00:57:01.240 |
We've got people who worked at Heroku, and GitHub, 00:57:04.000 |
and the iOS ecosystem, and all this kind of thing. 00:57:07.160 |
Like, the previous generation of developer tools, 00:57:14.960 |
And we just don't yet have those tools and abstractions 00:57:22.080 |
that we learned from the previous generation of stuff 00:57:24.440 |
and apply it to this new generation of stuff. 00:57:26.840 |
And obviously, there's a bit of nuance there, 00:57:28.720 |
because the trick is to take the right lessons 00:57:40.280 |
take some of those lessons we learned from how Heroku and 00:57:44.280 |
GitHub was built, for example, and apply them to AI. 00:57:50.200 |
We should also talk a little bit about your compute 00:58:02.080 |
What do you feel about the tightness of the GPU market? 00:58:11.620 |
And we are primarily built on just public clouds, 00:58:14.560 |
so primarily GCP and CoreWeave, and some smatterings elsewhere. 00:58:21.360 |
FRANCESC CAMPOY: Not from NVIDIA, which is your newest 00:58:25.720 |
So they're kind of helping us get GPU availability. 00:58:29.400 |
GPUs are hard to get hold of. If you go to AWS 00:58:38.880 |
and ask for one A100, they won't give you an A100. 00:58:42.480 |
But if you go to AWS and say, I would like 100 A100s in two 00:58:50.600 |
The cloud providers, that makes sense from their point of view. 00:58:59.160 |
in their infrastructure, which makes total sense. 00:59:07.880 |
where we can aggregate demand, so we can make commits 00:59:16.600 |
It's not-- we don't have infinite availability, 00:59:20.720 |
obviously, but if you want an A100 from Replicate, 00:59:23.480 |
But we're seeing other companies pop up as well. 00:59:31.880 |
where they're doing the same idea for training almost, 00:59:34.120 |
where a lot of startups need to be able to train a model, 00:59:37.560 |
but they can't get hold of GPUs from large cloud providers. 00:59:39.980 |
So SF Compute is letting people rent 10 H100s for two days, 00:59:47.880 |
is they're aggregating demand such that they can make 00:59:50.960 |
and then let people use smaller chunks of it. 00:59:52.920 |
And that's what we're doing with Replicate as well, 00:59:54.540 |
where we're aggregating demand such that we make big commits 00:59:58.280 |
And then people can run a 100-millisecond API request 01:00:04.200 |
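The aggregation arithmetic can be sketched with made-up numbers; every figure below is an assumption for illustration, not Replicate data:

```python
# Illustrative demand-aggregation arithmetic (all numbers are assumptions).
customers = 1000
avg_busy_fraction = 0.02   # each customer keeps a GPU busy ~2% of the time
peak_headroom = 3.0        # provision for ~3x average concurrency

expected_concurrent = customers * avg_busy_fraction     # 20 GPUs busy on average
shared_pool = int(expected_concurrent * peak_headroom)  # 60 GPUs committed

# Versus every customer reserving a dedicated GPU for bursty traffic:
dedicated = customers                                   # 1000 GPUs
print(shared_pool, dedicated)  # → 60 1000
```

At these assumed numbers, a 60-GPU long-term commitment can serve the same bursty demand that would otherwise need 1,000 dedicated GPUs, which is why an aggregator can make big commits while users pay only for 100-millisecond requests.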
FRANCESC CAMPOY: Coming from a finance background, 01:00:08.900 |
where the job of a bank is maturity transformation, 01:00:14.040 |
You take short-term deposits, which technically 01:00:16.000 |
can be withdrawn at any time, and you turn that 01:00:17.920 |
into long-term loans for mortgages and stuff. 01:00:24.000 |
MARK MANDEL: Yeah, that's exactly what we're doing. 01:00:31.000 |
as well, because we have to make bets on the future demand 01:00:48.120 |
we're projecting our growth with some educated guesses 01:00:50.640 |
about what kind of models are going to come out 01:00:59.200 |
So we need to have GPUs with a lot of RAM, or multi-GPU nodes, 01:01:09.280 |
Speaking of which, the mixture of experts' models 01:01:11.760 |
must be throwing a spanner into the planning. 01:01:20.320 |
which can run this, and multi-node H100 machines, 01:01:30.440 |
FRANCESC CAMPOY: OK, I didn't expect it to be so easy. 01:01:33.920 |
My impression was that the amount of RAM per model 01:01:37.280 |
is increasing a lot, especially on a sort of per parameter 01:01:43.640 |
going from Mixtral being eight experts to the Deep 01:01:55.360 |
MARK MANDEL: I think we might run into problems at some point. 01:01:58.200 |
And yeah, I don't know exactly what's going on there. 01:02:04.080 |
I think something that we're finding, which is kind of 01:02:06.600 |
interesting-- like, I don't know this in depth. 01:02:10.440 |
But we're certainly seeing a lot of good results 01:02:19.920 |
So 90% of the performance with just much less RAM required. 01:02:25.840 |
And that means that we can run them on GPUs we have available. 01:02:30.720 |
And it's good for customers as well, because it runs faster. 01:02:33.400 |
And they want that trade-off of where it's just slightly worse, 01:02:39.480 |
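A weights-only back-of-envelope shows why quantization matters for fitting models on available GPUs. The parameter count and precisions below are illustrative, and KV cache and activations add more memory on top:

```python
# Rough weight-memory arithmetic for a ~70B-parameter model (illustrative).
# Weights only: KV cache and activations need additional GPU memory.
params = 70e9

fp16_gib = params * 2 / 2**30    # 2 bytes per parameter at fp16
int4_gib = params * 0.5 / 2**30  # 4 bits per parameter when quantized

print(round(fp16_gib), round(int4_gib))  # → 130 33
```

Dropping from roughly 130 GiB to roughly 33 GiB of weights is the difference between needing multi-GPU nodes and fitting on hardware that is actually available, which is the trade-off being described here.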
FRANCESC CAMPOY: Do you see a lot of GPU waste 01:02:41.760 |
in terms of people running the thing on a GPU that 01:02:54.920 |
people were like, oh, how do I get access to like H100s? 01:02:57.880 |
And it's like, you need to run [INTERPOSING VOICES] 01:03:11.560 |
And it's surprisingly hard to optimize these models right now. 01:03:28.200 |
So something we want to be able to help people with 01:03:33.840 |
Like, either we show people how to with guides, 01:03:37.960 |
or we make it easier to use some of these more optimized 01:03:43.200 |
how to compile the models, or we do that automatically, 01:03:52.520 |
It's also a bad experience, and the models run slow. 01:03:57.560 |
some of the most popular models on Replicate we have-- 01:04:05.280 |
Like, people have pushed those models themselves. 01:04:09.560 |
where there's like a long tail of lots of models 01:04:11.560 |
that people have pushed, and then a big head of the models 01:04:16.520 |
So models like Llama 2, like Stable Diffusion, 01:04:23.460 |
we work with Meta and Stability to maintain those models. 01:04:35.260 |
And going into the-- well, it's already the new year. 01:04:38.620 |
Do you see the customer demand and the GPU hardware 01:05:02.820 |
Do you see maybe a lot of this model improvement work 01:05:18.680 |
That's a very nicely put way, as a startup founder, to respond. 01:05:25.460 |
Yeah, I'll maybe get into a little bit of this on the-- 01:05:29.500 |
Actually, so when Alessio talked about GPU waste, he was more-- 01:05:36.020 |
Yeah, it is getting a little bit warm in here, 01:05:42.660 |
of picking the wrong box model, whereas yours 01:05:52.100 |
What other sort of techniques are you referencing? 01:05:57.340 |
I talk to your competitors, and I don't know if-- 01:06:04.460 |
Basically, they'll quantize their models for you 01:06:08.200 |
So you basically use their versions of Llama 2. 01:06:08.200 |
I don't see it as the Replicate DNA to do that, 01:06:20.140 |
you would have to slap the Replicate house brand 01:06:25.380 |
Like, what do you mean when you say optimize models? 01:06:27.700 |
Yeah, I mean, things like quantizing the models, 01:06:30.240 |
you can imagine a way that we could help people quantize 01:06:38.140 |
We've had success using inference servers like vLLM 01:06:43.700 |
and TRT-LLM, and we're using those kind of things 01:06:48.980 |
We've had success with things like AITemplate, which 01:06:52.180 |
compiles the models, all of those kind of things. 01:06:57.340 |
And there's some even really just boring things 01:07:02.860 |
Like, some people, when they're just writing some Python code, 01:07:05.780 |
it's really easy to just write inefficient Python code. 01:07:09.140 |
There's really boring things like that as well. 01:07:11.580 |
But it's like a whole smash of things like that. 01:07:14.220 |
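As one example of the boring kind of fix, a generic pattern rather than Replicate code: loading weights once per process instead of once per request.

```python
import functools

LOAD_CALLS = 0

@functools.lru_cache(maxsize=1)
def get_model():
    """Expensive one-time setup: stands in for loading multi-GB weights."""
    global LOAD_CALLS
    LOAD_CALLS += 1
    return {"weights": "..."}  # stand-in for a real model object

def predict(prompt: str) -> str:
    model = get_model()  # cached, so the load happens only on the first call
    return f"output for {prompt!r}"

for i in range(100):
    predict(f"request {i}")

print(LOAD_CALLS)  # → 1  (reloading per request would have been 100)
```

It looks trivial, but research code frequently reloads weights, re-tokenizes, or re-initializes a pipeline inside the request path, and fixing that alone can dwarf fancier optimizations.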
FRANCESC CAMPOY: So you will do that for a customer? 01:07:18.700 |
helped some of our customers be able to do that some stuff. 01:07:25.460 |
we've rewritten them to use that stuff as well. 01:07:28.860 |
And the stable diffusion that we run, for example, 01:07:31.260 |
is compiled with AITemplate to make it super fast. 01:07:40.420 |
But you can imagine ways that we could help people. 01:07:43.340 |
It's almost like built into the Cog layer maybe, 01:07:45.380 |
where we could help people use these fast inference servers 01:07:48.420 |
or use AI template to compile their models to make it faster. 01:07:51.980 |
Whether it's manual, semi-manual, or automatic, 01:08:02.060 |
there was a price war on Mixtral last year, this last December. 01:08:06.620 |
As far as I can tell, you guys did not enter that war. 01:08:09.780 |
You have Mixtral, but it's just regular pricing. 01:08:20.260 |
You don't have to say anything, but the break-even 01:08:23.020 |
is somewhere between $0.50 to $0.75 per million tokens served. 01:08:28.500 |
How are you thinking about just the overall competitiveness 01:08:32.340 |
How should people choose when everyone's an API? 01:08:41.540 |
I think not Mixtral, but I can't remember exactly-- 01:08:44.340 |
we have similar performance and similar price 01:08:50.540 |
We're not bargain basement compared to some of the others, 01:08:54.220 |
because to your point, we don't want to burn tons of money. 01:08:58.780 |
But we're pricing it sensibly and sustainably to a point 01:09:02.940 |
where we think it's competitive with other people, such 01:09:10.700 |
And we don't want to price it such that it's only 01:09:19.740 |
But we also don't want the super cheap prices, 01:09:22.700 |
because then it's almost like your customers are hostile. 01:09:26.780 |
And the more customers you get, the worse it gets. 01:09:29.980 |
So we're pricing it sensibly, but still to the point 01:09:33.020 |
where hopefully it's cheap enough to build on. 01:09:48.100 |
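A rough back-of-envelope shows where a per-million-token floor like the one quoted above can come from; both inputs are assumptions, not Replicate's actual numbers:

```python
# Assumed numbers, for illustration only.
gpu_cost_per_hour = 4.00   # on-demand A100-class pricing
tokens_per_second = 1500   # aggregate batched throughput on that GPU

tokens_per_hour = tokens_per_second * 3600
cost_per_million = gpu_cost_per_hour / tokens_per_hour * 1_000_000

print(round(cost_per_million, 2))  # → 0.74
```

At those assumed figures, the raw compute floor lands inside the $0.50 to $0.75 range mentioned earlier; anyone pricing below it is losing money on every token served.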
But I think the really crucial thing about Replicate 01:09:56.460 |
not just the API for the model that is the important bit. 01:10:03.900 |
the whole point of open source is that you can tinker on it, 01:10:05.860 |
and you can customize it, and you can fine tune it, 01:10:07.940 |
and you can smush it together with another model, 01:10:13.020 |
And you can't do that if it's just a hosted API, 01:10:29.260 |
So we've got all of these models where the performance and price 01:10:35.220 |
But if you want to customize it, you can fine tune it. 01:10:37.540 |
You can go to GitHub and get the source code for it, 01:10:39.660 |
and edit the source code, and push up your own custom 01:10:42.980 |
Because that's the crucial thing for open source machine 01:10:47.500 |
learning, is be able to tinker on it and customizing it. 01:11:01.820 |
When Llama 2 came out, I wrote a post about this. 01:11:05.620 |
It's like open source, and there's open weights, 01:11:11.620 |
so there were all sorts of comments from people. 01:11:16.740 |
What do you think is OK for people to license? 01:11:29.060 |
open source, little models, purely open source stuff. 01:11:35.780 |
where model companies putting restrictive licenses 01:11:41.980 |
That means it can only be used for non-commercial use. 01:11:52.420 |
And I think a lot of that is coming from philosophy, 01:11:56.180 |
the sort of free software movement kind of philosophy. 01:11:59.260 |
And I don't think it's necessarily a bad thing. 01:12:01.940 |
I think it's good that model companies can make money out 01:12:09.180 |
And I think it's totally fine if somebody made something 01:12:16.140 |
And I think there's some really interesting midpoints, as well, 01:12:23.140 |
still wants to get a cut of it if you're making 01:12:28.300 |
And that's going to make the ecosystem more sustainable. 01:12:33.220 |
I don't think anybody's really figured it out yet. 01:12:34.780 |
And we're going to see more experimentation with this 01:12:39.780 |
what are the business models around building models? 01:12:45.860 |
And I think it's something we want to support as Replicate, 01:12:53.140 |
But there's also going to be lots of models which 01:13:01.940 |
of a bunch of people building models that don't have 01:13:10.980 |
and help them make money and that kind of thing. 01:13:13.460 |
I think the compute requirements of AI kind of changed the thing. 01:13:19.780 |
And before, it was kind of man-hours that was really 01:13:27.340 |
Well, not that man-hours are not worth a lot. 01:13:30.260 |
But if you think about Llama 2, it's like $25 million all in. 01:13:53.740 |
But all we care about is that Llama 2 is open. 01:13:59.180 |
if Mistral was not open source, we would be in a bad spot. 01:14:06.860 |
Because the beautiful thing about Llama 2 as a base model 01:14:11.260 |
is that, yeah, it costs $25 million to train to start with. 01:14:17.700 |
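A back-of-envelope makes the asymmetry vivid. Only the $25 million figure comes from the conversation; the fine-tuning numbers below are assumptions for illustration:

```python
# Back-of-envelope: pretraining vs. fine-tuning cost (fine-tune figures assumed).
pretrain_cost = 25_000_000  # figure quoted for Llama 2 in the conversation

gpus = 8              # assumed: LoRA-style fine-tune on 8 A100s
hours = 4             # assumed run length
gpu_hour_price = 2.50 # assumed per-GPU-hour rate
finetune_cost = gpus * hours * gpu_hour_price  # $80

print(int(pretrain_cost / finetune_cost))  # → 312500
```

Under those assumptions a fine-tune costs hundreds of thousands of times less than the pretraining run it builds on, which is exactly why one open base model can seed a whole ecosystem of derivatives.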
And that's what's so beautiful about the open source ecosystem 01:14:21.340 |
and something I think is really surprising as well. 01:14:25.500 |
I think a lot of people assumed that it's not 01:14:30.220 |
going to be-- open source machine learning is just not 01:14:37.540 |
And people are getting really good results out of it. 01:14:40.900 |
So people can effectively create open source models really 01:14:46.580 |
And there's going to be this sort of ecosystem 01:14:50.860 |
And I think the risk there from a licensing point of view 01:14:53.260 |
is we need to make sure that the licenses let people do that. 01:14:58.020 |
under a non-commercial license and people can't fine tune it, 01:15:03.460 |
And I'm sure there are ways to structure that such 01:15:10.220 |
and they can feel like they should keep on training models. 01:15:23.340 |
You've been an excellent, very open guest so far. 01:15:31.700 |
But I feel like you found the AI engineer crew before I did. 01:15:42.260 |
about how there are two orders of magnitude more software 01:15:45.300 |
engineers than there are machine learning engineers, about 30 01:15:47.800 |
million software engineers and 500,000 machine learning engineers. 01:15:50.900 |
You can maybe plus/minus one of those orders of magnitude, 01:15:54.700 |
And so obviously, there will be a lot more AI engineers 01:16:18.580 |
going to be a large part of how we build software in the future 01:16:21.540 |
It's a bit like being a software developer in the '90s 01:16:40.540 |
need to be digging down into this sort of PyTorch level 01:16:46.880 |
In the same way as a software engineer in the '90s, 01:16:51.260 |
how network stacks work to be able to build a website. 01:16:53.380 |
But you need to understand the shape of this thing 01:16:55.000 |
and how to hold it and what it's good at and what it's not. 01:17:08.340 |
Get a feel of how these diffusion models work. 01:17:20.500 |
because some of your job might be writing a prompt. 01:17:22.620 |
And those are just all really important skills 01:17:29.900 |
Well, thanks for building the definitive platform 01:17:43.900 |
If you click on Jobs at the bottom of replicate.com, 01:17:56.260 |
Like, the whole reason I started this company 01:18:00.320 |
Like, Andreas is like a proper machine learning person 01:18:03.660 |
And I was just like a sort of lonely software engineer. 01:18:07.180 |
And I was like, you're doing really cool stuff, 01:18:17.260 |
And I just encourage anyone who wants to try this stuff out, 01:18:25.660 |
Like, the limiting factor now on AI is not like the technology. 01:18:29.460 |
Like, the technology has made incredible advances. 01:18:31.980 |
And there's just so many incredible machine learning 01:18:37.500 |
The limiting factor is just like making that accessible 01:18:41.900 |
Because it's really hard to use this stuff right now. 01:18:44.300 |
And obviously, we're building some of that stuff 01:18:46.380 |
But there's just like a ton of other tooling and abstractions 01:18:49.180 |
that need to be built out to make this stuff usable. 01:18:51.580 |
So I just encourage people who like building developer tools 01:18:56.220 |
Because that's going to make this stuff accessible 01:18:59.820 |
I especially want to highlight you have a Hacker-in-Residence 01:19:02.380 |
job opening available, which not every company has, 01:19:07.380 |
I think Charlie Holtz is doing a fantastic job of that. 01:19:12.060 |
a lot of our job is just like showing people how to use AI. 01:19:15.660 |
So we've just got a team of software developers. 01:19:17.700 |
And people have kind of figured this stuff out 01:19:19.820 |
who are writing about it, who are making videos about it, 01:19:26.740 |
to show people what you can do with this stuff. 01:19:40.680 |
FRANCESC CAMPOY: Tell me this came from Chroma. 01:19:45.220 |
Anton actually was like, hey, we came up with that first. 01:19:47.900 |
But I think we came up with it independently. 01:19:49.140 |
FRANCESC CAMPOY: Yeah, I made that page, yeah. 01:19:50.060 |
CHRIS BANES: I think we came up with it independently. 01:19:52.300 |
Because the story behind this is we originally 01:20:00.980 |
CHRIS BANES: And Zeke was like, that sounds so boring. 01:20:03.300 |
I have to go to someone and say I'm a developer relations 01:20:06.360 |
FRANCESC CAMPOY: You don't want to be a hacker man. 01:20:07.500 |
CHRIS BANES: Or a developer advocate or something. 01:20:17.020 |
I get from Replicate, everyone on your team I interact with. 01:20:28.860 |
And I think you're a really positive presence 01:20:33.020 |
And it's instilling the hacker vibe and culture into AI.