
A Brief History of the Open Source AI Hacker - with Ben Firshman of Replicate


Chapters

0:00 Introductions
1:22 Low latency is all you need
4:39 Evolution of CLIs
6:47 How building ArxivVanity led to Replicate
13:13 Making ML research replicable with containers
19:47 Doing YC in 2020 and pivoting to tools for COVID
23:11 Launching the first version of Replicate
29:26 Embracing the generative image community
31:58 Getting reverse engineered into an API product
35:54 Growing to 2 million users
39:37 Indie vs Enterprise customers
42:58 How customers use Replicate
44:30 Learnings from Docker that went into Cog
52:24 Creating AI standards
57:49 Replicate's compute availability
62:38 Fixing GPU waste
70:58 What's open source AI?
75:19 Building for AI engineers
77:33 Hiring at Replicate

Whisper Transcript

00:00:00.000 | Hey, everyone.
00:00:00.800 | Welcome to the Latent Space Podcast.
00:00:02.640 | This is Alessio, partner and CTO-in-Residence
00:00:05.040 | at Decibel Partners.
00:00:06.080 | And I'm joined by my co-host, Swyx, founder of Smol.ai.
00:00:09.360 | Hey, and today we have Ben Firshman in the studio.
00:00:12.080 | Welcome, Ben.
00:00:12.760 | Hey, good to be here.
00:00:14.560 | Ben, you're a co-founder and CEO of Replicate.
00:00:17.280 | Before that, you were most notably creator of Fig,
00:00:21.120 | or founder of Fig, which became Docker Compose.
00:00:24.160 | You also did a couple other things before that.
00:00:26.480 | But that's what a lot of people know you for.
00:00:29.520 | What should people know about you outside
00:00:32.640 | of your LinkedIn profile?
00:00:35.720 | Yeah, good question.
00:00:37.600 | I think I'm a builder and tinkerer in a very broad sense.
00:00:40.680 | And I love using my hands to make things.
00:00:43.320 | So I work on things maybe a bit closer to tech,
00:00:46.720 | like electronics.
00:00:48.040 | But I also build things out of wood.
00:00:50.480 | And I fix cars, and I fix my bike,
00:00:55.040 | and build bicycles, and all this kind of stuff.
00:00:58.320 | And there's so much I think I've learned
00:01:01.560 | from transferable skills, from just working in the real world
00:01:04.160 | to building things in software.
00:01:08.400 | And there's so much about being a builder, both in real life
00:01:11.440 | and in software, that crosses over.
00:01:14.240 | Is there a real-world analogy that you use often
00:01:16.160 | when you're thinking about a code architecture problem?
00:01:22.040 | I like to build software tools as if they were something real.
00:01:30.720 | I like to imagine--
00:01:33.240 | so I wrote this thing called the command line interface
00:01:36.520 | guidelines, which was a bit like sort of the Mac human interface
00:01:39.560 | guidelines, but for command line interfaces.
00:01:41.400 | I did it with the guy I created Docker Compose with
00:01:47.680 | and a few other people.
00:01:50.240 | And I think something in there--
00:01:53.120 | I think I described that your command line interface should
00:01:55.460 | feel like a big iron machine, where you pull a lever
00:01:58.920 | and it goes clunk.
00:02:00.520 | And things should respond within 50 milliseconds,
00:02:04.160 | as if it was a real-life thing.
00:02:07.120 | And another analogy here is in the real life,
00:02:10.040 | you know when you press a button on an electronic device
00:02:13.000 | and it's like a soft switch, and you press it,
00:02:15.080 | and nothing happens, and there's no physical feedback
00:02:17.800 | about anything happening?
00:02:19.040 | And then half a second later, something happens?
00:02:21.240 | That's how a lot of software feels.
00:02:22.700 | But instead, software should feel more
00:02:24.540 | like something that's real, where you touch--
00:02:26.440 | you pull a physical lever and the physical lever moves.
00:02:29.760 | And I've taken that lesson of human interface
00:02:33.400 | to software a ton.
00:02:35.320 | It's all about latency, things feeling
00:02:37.600 | really solid and robust, both the command lines and user
00:02:42.800 | interfaces as well.
00:02:44.240 | And how did you operationalize that for Fig or Docker?
00:02:49.000 | A lot of it's just low latency.
00:02:50.320 | Actually, we didn't do it very well for Fig and [INAUDIBLE]
00:02:53.040 | in the first place.
00:02:53.840 | We used Python, which was a big mistake,
00:02:56.000 | where Python's really hard to get booting up fast,
00:02:58.840 | because you have to load up the whole Python runtime
00:03:01.000 | before it can run anything.
00:03:02.800 | Go is much better at this, where Go just instantly starts.
00:03:07.880 | So you have to be under 500 milliseconds to start up?
00:03:12.120 | Yeah, effectively.
00:03:13.080 | I mean, perception of human things
00:03:16.200 | being immediate is something like 100 milliseconds.
00:03:19.280 | So anything like that is good enough.
00:03:23.520 | Yeah.
00:03:24.640 | Also, I should mention, since we're
00:03:26.100 | talking about your side projects--
00:03:27.560 | well, one thing is I am maybe one of a few fellow people who
00:03:30.280 | have actually written something about CLI design principles,
00:03:33.600 | because I was in charge of the Netlify CLI back in the day
00:03:37.480 | and had many thoughts.
00:03:39.360 | One of my fun thoughts--
00:03:40.560 | I'll just share it in case you have thoughts--
00:03:42.480 | is I think CLIs are effectively starting points
00:03:45.680 | for scripts that are then run.
00:03:48.060 | And the moment one of the script's preconditions
00:03:50.200 | isn't fulfilled, the script typically just ends.
00:03:53.920 | So the CLI developer will just exit the program.
00:03:58.920 | And the way that I really wanted to create the Netlify dev
00:04:01.760 | workflow was for it to be kind of a state machine that
00:04:04.560 | would resolve itself.
00:04:06.640 | If it detected a precondition wasn't fulfilled,
00:04:09.480 | it would actually delegate to a subprogram that
00:04:11.960 | would then fulfill that precondition,
00:04:13.640 | asking for more info or waiting until a condition is fulfilled.
00:04:16.640 | Then it would go back to the original flow
00:04:18.440 | and continue that.
00:04:20.360 | Don't know if that was ever tried
00:04:21.800 | or is there a more formal definition of it,
00:04:23.800 | because I just came up with it randomly.
00:04:25.920 | But it felt like the beginnings of AI,
00:04:27.960 | in the sense that when you run a CLI command,
00:04:30.000 | you have an intent to do something.
00:04:32.040 | And you may not have given the CLI all the things
00:04:34.220 | that it needs to execute that intent.
00:04:37.640 | So that was my two cents.
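
A minimal sketch of that self-resolving CLI idea in Python; the Precondition/check/fulfill names are invented for illustration and only show the control flow, not any real CLI:

```python
import sys
from dataclasses import dataclass
from typing import Callable

@dataclass
class Precondition:
    description: str
    check: Callable[[], bool]      # returns True if already satisfied
    fulfill: Callable[[], bool]    # delegates to a sub-step that tries to satisfy it

def run_command(preconditions: list, action: Callable[[], None]) -> None:
    # Instead of exiting when a precondition fails, resolve it and resume.
    for pre in preconditions:
        while not pre.check():
            print(f"Precondition not met: {pre.description}; resolving...")
            if not pre.fulfill():
                sys.exit(f"Could not resolve: {pre.description}")
    action()  # all preconditions hold, continue the original flow

# Example usage with a made-up "logged in" check.
logged_in = {"ok": False}
preconditions = [
    Precondition(
        description="user is logged in",
        check=lambda: logged_in["ok"],
        fulfill=lambda: logged_in.update(ok=True) or True,  # stand-in for an interactive login
    ),
]
run_command(preconditions, action=lambda: print("deploying site..."))
```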
00:04:39.520 | Yeah, that reminds me of a thing we sort of thought
00:04:43.160 | about when writing the CLI guidelines, where CLIs were
00:04:47.720 | designed in a world where the CLI was really
00:04:50.080 | a programming environment.
00:04:51.680 | And it was primarily designed for machines
00:04:54.840 | to use all of these commands and scripts.
00:04:57.560 | Whereas over time, the CLI has evolved to humans--
00:05:05.240 | it was back in a world where the primary way of using
00:05:08.360 | computers was writing shell scripts, effectively.
00:05:12.440 | And we've transitioned to a world
00:05:14.960 | where, actually, humans are using CLI programs
00:05:17.240 | much more than they used to.
00:05:19.320 | And the current best practices about how Unix was designed--
00:05:27.280 | there's lots of design documents about Unix
00:05:29.240 | from the '70s and '80s, where they say things like,
00:05:33.080 | command line commands should not output anything on success.
00:05:37.560 | It should be completely silent, which
00:05:40.040 | makes sense if you're using it in a shell script.
00:05:42.120 | But if a user is using that, it just looks like it's broken.
00:05:45.640 | If you type copy and it just doesn't say anything,
00:05:47.780 | you assume that it didn't work as a new user.
00:05:52.120 | And yeah, so I think what's really interesting about the CLI
00:06:00.040 | is that it's actually a really good--
00:06:02.160 | to your point, it's a really good user interface
00:06:05.960 | where it can be like a conversation, where
00:06:08.920 | it feels like you're-- instead of just you
00:06:10.840 | telling the computer to do this thing
00:06:12.800 | and either silently succeeding or saying, no, you did--
00:06:16.720 | failed, it can guide you in the right direction
00:06:20.280 | and tell you what your intent might be
00:06:22.680 | and that kind of thing in a way that's actually--
00:06:25.680 | it's almost more natural to a CLI
00:06:27.280 | than it is in a graphical user interface
00:06:28.960 | because it feels like this back and forth with the computer,
00:06:31.760 | almost funnily like a language model.
00:06:36.920 | So I think there's some interesting intersection
00:06:39.040 | of CLIs and language models actually
00:06:41.160 | being very closely related and a good fit for each other.
00:06:46.240 | FRANCESC CAMPOY: Yeah.
00:06:47.200 | I would say one of the surprises from last year--
00:06:49.600 | I worked on a coding agent, but I
00:06:51.120 | think the most successful coding agent of my cohort
00:06:53.800 | was Open Interpreter, which was a CLI implementation.
00:06:56.740 | And I have chronically-- even as a CLI person,
00:06:59.280 | I have chronically underestimated the CLI
00:07:01.320 | as a useful interface.
00:07:05.020 | You also developed Arxiv Vanity,
00:07:06.480 | which you recently retired after a glorious seven years.
00:07:10.120 | Something like that, yeah.
00:07:11.200 | Something like that, which made nice HTML, I guess, out of the PDFs.
00:07:16.360 | Yeah, that was actually the start
00:07:19.240 | of where Replicate came from.
00:07:20.720 | OK, we can tell that story.
00:07:22.120 | Which-- so when I quit Docker, I got really interested
00:07:26.920 | in science infrastructure, just as a problem area,
00:07:32.560 | because it is--
00:07:36.080 | science has created so much progress in the world.
00:07:38.760 | The fact that we can talk to each other on a podcast,
00:07:42.680 | and we use computers, and the fact that we're alive
00:07:44.760 | is probably thanks to medical research.
00:07:46.800 | But science is just completely archaic and broken.
00:07:49.800 | And it's like 19th century processes
00:07:51.560 | that just happen to be copied to the internet
00:07:55.080 | rather than taken into account that we can transfer information
00:07:57.740 | at the speed of light now.
00:07:59.740 | And the whole way science is funded
00:08:01.240 | and all this kind of thing is all very broken.
00:08:04.040 | There's just so much potential for making science work better.
00:08:06.840 | And I realized that I wasn't a scientist,
00:08:08.560 | and I didn't really have any time to go and get a PhD
00:08:11.000 | and become a researcher.
00:08:12.520 | But I'm a tool builder, and I could make existing scientists
00:08:15.400 | better at their job.
00:08:16.240 | And if I could make a bunch of scientists a little bit better
00:08:19.040 | at their job, maybe that's the kind of equivalent
00:08:21.240 | of being a researcher.
00:08:23.600 | So one particular thing I dialed in on
00:08:27.040 | is just how science is disseminated,
00:08:28.960 | in that it's all of these PDFs, quite often behind paywalls
00:08:37.600 | on the internet.
00:08:39.400 | And that's a whole thing, because it's
00:08:41.400 | funded by national grants, government grants,
00:08:44.320 | and then they're put behind paywalls.
00:08:46.120 | Yeah, exactly.
00:08:47.080 | That's like a whole--
00:08:48.040 | yeah, I could talk for hours about that.
00:08:49.660 | But the particular thing we got dialed in on was--
00:08:54.080 | or I got kind of--
00:08:55.680 | but interestingly, these PDFs are also--
00:08:58.600 | there's a bunch of open science that happens as well.
00:09:00.800 | So math, physics, computer science, machine learning,
00:09:03.680 | notably, is all published on arXiv, which is actually
00:09:09.480 | a surprisingly old institution.
00:09:10.960 | Some random Cornell science project.
00:09:12.520 | Yeah, it was just like somebody in Cornell who started
00:09:13.940 | a mailing list in the '80s.
00:09:15.720 | And then when the web was invented,
00:09:17.940 | they built a web interface around it.
00:09:19.440 | Like, it's super old.
00:09:22.000 | And it's kind of like a user group thing, right?
00:09:24.480 | That's why all these numbers and stuff.
00:09:26.320 | Yeah, exactly.
00:09:27.080 | Like, it's a bit like Usenet or something.
00:09:31.040 | And that's where basically all of math, physics,
00:09:34.520 | and computer science happens.
00:09:36.600 | But it's still PDFs published to this thing, which
00:09:39.200 | is just so infuriating.
00:09:42.200 | So the web was invented at CERN, a physics institution,
00:09:50.280 | to share academic writing.
00:09:52.040 | Like, there are figure tags.
00:09:54.240 | There are author tags.
00:09:55.920 | There are heading tags.
00:09:56.920 | There are cite tags.
00:09:57.920 | Hyperlinks are effectively citations,
00:10:00.100 | because you want to link to another academic paper.
00:10:02.280 | But instead, you have to copy and paste these things
00:10:04.320 | and try and get around paywalls.
00:10:05.600 | Like, it's absurd.
00:10:07.920 | And now we have social media and things,
00:10:10.040 | but still academic papers as PDFs.
00:10:13.840 | It's just like, why?
00:10:14.800 | This is not what the web was for.
00:10:17.720 | So anyway, I got really frustrated with that.
00:10:19.600 | And I went on vacation with my old friend Andreas.
00:10:22.080 | So we used to work together in London
00:10:23.840 | on a startup at somebody else's startup.
00:10:26.600 | And we were just on vacation in Greece for fun.
00:10:29.800 | And he was trying to read a machine learning
00:10:32.280 | paper on his phone.
00:10:33.520 | We had to zoom in and scroll line by line on the PDF.
00:10:37.400 | And he was like, this is fucking stupid.
00:10:39.720 | And I was like, I know.
00:10:40.760 | This is something.
00:10:41.480 | We discovered our mutual hatred for this.
00:10:44.880 | And we spent our vacation sitting by the pool,
00:10:48.080 | making LaTeX to HTML converters and making the first version
00:10:53.160 | of Arxiv Vanity.
00:10:55.240 | Anyway, that was then a whole thing.
00:10:56.920 | And the story-- we shut it down recently
00:10:59.920 | because it caught the eye of arXiv, who were like,
00:11:03.520 | oh, this is great.
00:11:04.320 | We just haven't had the time to work on this.
00:11:06.200 | And what's tragic about arXiv is it is like this
00:11:09.400 | department of--
00:11:10.320 | it's like this project of Cornell that's like,
00:11:12.920 | they can barely scrounge together enough money to survive.
00:11:15.320 | I think it might be better funded now
00:11:16.860 | than when we were collaborating with them.
00:11:19.480 | And compared to these scientific journals,
00:11:21.760 | this is actually where the work happens.
00:11:23.480 | But they just have a fraction of the money
00:11:25.240 | that these big scientific journals have,
00:11:27.440 | which is just so tragic.
00:11:29.240 | But anyway, they were like, yeah, this is great.
00:11:31.240 | We can't afford to do it.
00:11:32.560 | But do you want to, as a volunteer,
00:11:34.060 | integrate Arxiv Vanity into arXiv?
00:11:36.600 | Oh, you did the work.
00:11:37.720 | We didn't do the work.
00:11:38.680 | We started doing the work.
00:11:39.600 | We did some.
00:11:40.200 | I think we worked on this for a few months
00:11:42.280 | to actually get it integrated into arXiv.
00:11:44.760 | And then we got distracted by Replicate.
00:11:48.980 | So a guy called Dayan picked up the work
00:11:52.420 | and made it happen, like somebody
00:11:54.380 | who works on one of the libraries that
00:11:56.820 | powers Arxiv Vanity.
00:11:58.340 | OK, and relationship with Arxiv Sanity?
00:12:02.100 | None.
00:12:02.900 | Did you predate them?
00:12:03.980 | I actually don't know the lineage.
00:12:05.540 | We were after-- we were both users of Arxiv Sanity,
00:12:08.300 | which is like a sort of Arxiv--
00:12:09.740 | Which is Andrej's--
00:12:10.540 | [INTERPOSING VOICES]
00:12:11.700 | --like recsys on top of arXiv.
00:12:13.100 | Yeah, yeah.
00:12:13.780 | And we were both users of that.
00:12:15.240 | And I think we were trying to come up
00:12:16.240 | with a working name for it.
00:12:17.680 | And Andreas just like cracked a joke of like,
00:12:19.900 | oh, let's call it Arxiv Vanity.
00:12:21.280 | Let's make the papers look nice.
00:12:22.840 | Yeah, yeah.
00:12:23.360 | And that was the working name.
00:12:24.200 | And it just stuck.
00:12:25.240 | Got it.
00:12:25.800 | Got it.
00:12:27.040 | Yeah.
00:12:28.000 | And then from there, tell us more
00:12:29.440 | about why you got distracted, right?
00:12:31.480 | So Replicate maybe feels like an overnight success
00:12:34.240 | to a lot of people.
00:12:35.740 | But you've been building this since 2019.
00:12:38.480 | Yeah.
00:12:39.000 | So what prompted the start?
00:12:40.800 | And we've been collaborating for even longer.
00:12:42.680 | We created Arxiv Vanity in 2017.
00:12:45.840 | So in some sense, we've been doing this almost like six,
00:12:48.180 | seven years now.
00:12:49.320 | A classic seven-year--
00:12:50.760 | Overnight success.
00:12:51.600 | Yeah.
00:12:52.100 | [LAUGHTER]
00:12:53.640 | Yes, we did Arxiv Vanity, and then worked
00:12:55.900 | on a bunch of surrounding projects.
00:12:57.360 | I was still really interested in science publishing
00:12:59.440 | at that point.
00:13:01.520 | And I'm trying to remember--
00:13:02.800 | because I tell a lot of the condensed story to people,
00:13:04.960 | because I can't really tell a seven-year history.
00:13:06.720 | So I'm trying to figure out the right--
00:13:08.360 | Oh, we got room for you.
00:13:09.640 | --the right length to--
00:13:10.760 | We want to nail the definitive Replicate story here.
00:13:13.240 | One thing that's really interesting about these machine
00:13:15.480 | learning papers is that these machine learning papers
00:13:19.240 | are published on arXiv.
00:13:21.080 | And a lot of them are actual fundamental research,
00:13:23.520 | so should be prose describing a theory.
00:13:27.280 | But a lot of them are just running pieces of software
00:13:31.440 | that a machine learning researcher
00:13:32.880 | made that did something.
00:13:34.240 | It was like an image classification
00:13:38.940 | model or something.
00:13:40.040 | And they managed to make an image classification
00:13:42.380 | model that was better than the existing state of the art.
00:13:46.360 | And they've made an actual running piece of software
00:13:48.520 | that does image segmentation.
00:13:50.960 | And then what they had to do is they then
00:13:52.920 | had to take that piece of software
00:13:55.040 | and write it up as prose and math in a PDF.
00:13:58.640 | And what's frustrating about that is if you want to--
00:14:04.960 | so this was Andreas.
00:14:06.640 | Andreas was a machine learning engineer at Spotify.
00:14:09.480 | And some of his job was--
00:14:11.780 | he did pure research as well.
00:14:13.120 | He did a PhD, and he was doing a lot of stuff internally.
00:14:15.480 | But part of his job was also being an engineer
00:14:17.720 | and taking some of these existing things
00:14:20.340 | that people have made and published
00:14:22.120 | and trying to apply them to actual problems at Spotify.
00:14:25.760 | And he was like--
00:14:26.920 | you get given a paper, which describes
00:14:30.680 | roughly how the model works.
00:14:31.960 | It's probably listing lots of crucial information.
00:14:34.360 | There's sometimes code on GitHub.
00:14:36.060 | More and more, there's code on GitHub.
00:14:37.640 | Back then, it was kind of relatively rare.
00:14:40.880 | But it was quite often just scrappy research code
00:14:42.880 | and didn't actually run.
00:14:44.960 | And there was maybe the weights that were on Google Drive,
00:14:47.520 | but they accidentally deleted the weights off Google Drive.
00:14:50.240 | And it was really hard to take this stuff
00:14:52.560 | and actually use it for real things.
00:14:54.760 | And we just started talking together
00:14:57.520 | about his problems at Spotify.
00:15:00.200 | And I connected this back to my work at Docker as well.
00:15:03.640 | I was like, oh, this is what we created containers for.
00:15:07.560 | We solved this problem for normal software
00:15:09.040 | by putting the thing inside a container
00:15:10.740 | so you could ship it around and it kept on running.
00:15:13.000 | So we were sort of hypothesizing about,
00:15:16.880 | hmm, what if we put machine learning
00:15:18.440 | models inside containers so that they could actually
00:15:21.280 | be shipped around and they could be defined
00:15:23.080 | in some production-ready formats?
00:15:25.720 | And other researchers could run them to generate baselines.
00:15:29.040 | And people who wanted to actually apply them
00:15:31.200 | to real problems in the world could just pick up the container
00:15:33.800 | and run it.
00:15:36.800 | And we then thought, this is where it gets--
00:15:40.980 | normally, in this part of the story,
00:15:42.520 | I skip forward to be like, and then we
00:15:44.140 | created Cog, this container stuff for machine learning
00:15:47.060 | models.
00:15:47.560 | And we created Replicate, the place
00:15:48.920 | for people to publish these machine learning models.
00:15:50.480 | But there's actually like two or three years between that.
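
As a rough flavor of what that container format became, here is a sketch of a Cog-style predictor in Python. The model itself is stubbed out and the prompt field is illustrative, not taken from any shipped model; in practice a small cog.yaml alongside it declares the Python/CUDA dependencies and points at this class so the whole thing can be built into a Docker image.

```python
from cog import BasePredictor, Input

class Predictor(BasePredictor):
    def setup(self):
        # In a real model this is where weights load, once, at container start.
        self.model = lambda prompt: f"(stub output for: {prompt})"

    def predict(self, prompt: str = Input(description="Text prompt")) -> str:
        # Typed inputs and outputs are what let a web form and an API
        # be generated around the model automatically.
        return self.model(prompt)
```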
00:15:53.640 | The thing we then got dialed into
00:15:55.640 | was Andreas was like, what if there was a CI
00:15:58.840 | system for machine learning?
00:16:00.200 | Because one of the things he really
00:16:01.660 | struggled with as a researcher is generating baselines.
00:16:05.000 | So when he's writing a paper, he needs
00:16:06.680 | to get five other models that are existing in work
00:16:10.960 | and get them running--
00:16:12.460 | FRANCESC CAMPOY: On the same evals.
00:16:13.920 | MARK MANDEL: Exactly, on the same evals
00:16:15.400 | so you can compare apples to apples,
00:16:16.540 | because you can't trust the numbers in the paper.
00:16:19.040 | FRANCESC CAMPOY: Or you can be Google
00:16:20.540 | and just publish them anyway.
00:16:21.800 | [LAUGHTER]
00:16:24.560 | MARK MANDEL: So he was like, what if you could--
00:16:28.340 | I think this was coming from the thinking of,
00:16:30.180 | there should be containers for machine learning,
00:16:31.840 | but why are people going to use that?
00:16:33.520 | OK, maybe we can create a supply of containers
00:16:36.200 | by creating this useful tool for researchers.
00:16:39.080 | And the useful tool was like, let's get researchers
00:16:41.960 | to package up their models and push them
00:16:43.580 | to the central place where we run a standard set of benchmarks
00:16:46.560 | across the models so that you can trust those results
00:16:51.200 | and you can compare these models apples to apples.
00:16:53.200 | And for a researcher, for Andreas,
00:16:54.600 | doing a new piece of research, he could trust those numbers.
00:16:57.800 | And he could pull down those models,
00:17:02.440 | confirm it on his machine, use the standard benchmark
00:17:04.560 | to then measure his model, and all this kind of stuff.
00:17:08.480 | And so we started building that.
00:17:10.320 | That's what we applied to YC with.
00:17:12.560 | We got into YC, and we started building a prototype of this.
00:17:16.000 | And then this is where it all starts to fall apart.
00:17:19.160 | We were like, OK, that sounds great.
00:17:20.680 | And we talked to a bunch of researchers,
00:17:21.680 | and they really wanted that.
00:17:22.320 | And that sounds brilliant.
00:17:22.960 | That's a great way to create a supply of models
00:17:24.920 | on this research platform.
00:17:26.520 | But how the hell is this a business?
00:17:28.480 | How are we even going to make any money out of this?
00:17:30.640 | And we're like, oh, shit, that's the real unknown here
00:17:33.200 | of what the business is.
00:17:35.560 | So we thought it would be a really good idea to--
00:17:42.200 | OK, before we get too deep into this,
00:17:44.880 | let's try and reduce the risk of this turning into a business.
00:17:49.720 | So let's try and research what the business could
00:17:51.880 | be for this research tool, effectively.
00:17:57.360 | So we went and talked to a bunch of companies trying
00:18:00.660 | to sell them something which didn't exist.
00:18:02.360 | So we're like, hey, do you want a way
00:18:04.040 | to share research inside your company
00:18:06.320 | so that other researchers, or say the product manager,
00:18:09.000 | can test out the machine learning model?
00:18:11.240 | Maybe.
00:18:12.760 | And we were like, do you want a deployment platform
00:18:18.240 | for deploying models?
00:18:20.360 | Do you want a central place for versioning models?
00:18:22.880 | We're trying to think of lots of different products
00:18:24.960 | we could sell that were related to this thing.
00:18:27.640 | And terrible idea.
00:18:29.700 | We're not salespeople, and people
00:18:32.100 | don't want to buy something that doesn't exist.
00:18:34.640 | I think some people can pull this off,
00:18:36.180 | but we were just a bunch of product people, products
00:18:39.540 | and engineer people, and we just couldn't pull this off.
00:18:42.620 | So we then got halfway through our YC batch.
00:18:44.620 | We didn't have-- we hadn't built a product.
00:18:46.540 | We had no users.
00:18:47.480 | We had no idea what our business was going to be,
00:18:49.560 | because we couldn't get anybody to buy something
00:18:51.560 | which didn't exist.
00:18:53.860 | And actually, this was quite a way through our--
00:18:55.860 | I think it was like 2/3 the way through our YC batch
00:18:57.780 | or something.
00:18:58.300 | So we're like, OK, well, we're kind of screwed now,
00:19:00.460 | because we don't have anything to show at demo day.
00:19:02.740 | And then we then tried to figure out, OK,
00:19:05.780 | what can we build in two weeks that will be something?
00:19:09.060 | So we desperately tried to--
00:19:10.260 | I can't remember what we tried to build at that point.
00:19:13.300 | And then two weeks before demo day, I just remember this.
00:19:16.740 | I remember it was all--
00:19:22.580 | we were going down to Mountain View every week for dinners,
00:19:25.000 | and we got called on to an all-hands Zoom
00:19:26.620 | call, which was super weird.
00:19:27.860 | We're like, what's going on?
00:19:29.100 | And they were like, don't come to dinner tomorrow.
00:19:33.900 | And we realized-- we kind of looked at the news,
00:19:37.400 | and we were like, oh, there's a pandemic going on.
00:19:40.420 | We were so deep in our startup, we
00:19:42.020 | were just completely oblivious to what was going on around us.
00:19:44.640 | Was this Jan or Feb 2020?
00:19:46.540 | This was March 2020.
00:19:48.020 | March 2020.
00:19:48.540 | 2020, yeah.
00:19:49.340 | Because I remember Silicon Valley at the time
00:19:51.260 | was early to COVID.
00:19:52.940 | Like, they started locking down a lot faster
00:19:55.060 | than the rest of the US.
00:19:56.060 | And I remember soon after that, there
00:19:58.740 | was the San Francisco lockdowns.
00:20:00.580 | And then the YC batch just stopped.
00:20:02.900 | There wasn't demo day.
00:20:05.180 | And it was, in a sense, a blessing for us,
00:20:07.820 | because we just kind of couldn't raise money anyway.
00:20:11.520 | FRANCESC CAMPOY: In the normal course of events,
00:20:13.480 | you're actually allowed to defer to a future demo day.
00:20:16.140 | Yeah.
00:20:16.620 | So we didn't even take any defer,
00:20:17.700 | because it just kind of didn't happen.
00:20:19.240 | [LAUGHTER]
00:20:21.940 | So was YC helpful?
00:20:24.380 | We completely screwed up the batch,
00:20:25.840 | and that was our fault. I think the thing
00:20:27.620 | that YC has become incredibly valuable for us
00:20:30.220 | has been after YC.
00:20:33.700 | I think there was a reasonable argument
00:20:36.860 | that we didn't need to do YC to start with, because we
00:20:40.660 | were quite experienced.
00:20:42.340 | We had done some startups before.
00:20:45.220 | We were kind of well-connected with VCs.
00:20:47.020 | It was relatively easy to raise money,
00:20:48.600 | because we were a known quantity.
00:20:50.180 | If you go to a VC and be like, hey, I made this piece of--
00:20:53.020 | It's Docker Compose for AI.
00:20:54.280 | Exactly.
00:20:54.780 | Yeah, and people can pattern match like that,
00:20:59.020 | and they can have some trust that you know what you're doing.
00:21:01.380 | Whereas it's much harder for people straight out of college,
00:21:03.540 | and that's where YC's sweet spot is helping people straight
00:21:05.740 | out of college who are super promising figure out
00:21:07.780 | how to do that.
00:21:08.300 | Yeah, no credentials.
00:21:09.140 | Yeah, exactly.
00:21:09.640 | So in some sense, we didn't need that.
00:21:11.180 | But the thing that's been incredibly useful for us
00:21:13.220 | since YC has been--
00:21:15.980 | this was actually, I think--
00:21:17.980 | so Docker was a YC company.
00:21:20.500 | And Solomon, the founder of Docker, I think, told me this.
00:21:22.900 | He was like, a lot of people underestimate the value of YC
00:21:26.940 | after you finish the batch.
00:21:29.140 | And his biggest regret was not staying in touch with YC.
00:21:32.780 | I might be misattributing this, but I think it was him.
00:21:35.620 | And so we made a point of that, and we just
00:21:37.360 | stayed in touch with our batch partner, who--
00:21:39.900 | Jared at YC has been fantastic.
00:21:41.660 | Jared Harris.
00:21:43.120 | Jared Friedman.
00:21:43.860 | Friedman.
00:21:44.740 | And all of the team at YC--
00:21:47.540 | there was the growth team at YC when they were still there,
00:21:49.980 | and they've been super helpful.
00:21:52.660 | And two things that have been super helpful about that
00:21:57.060 | is raising money.
00:21:58.260 | They just know exactly how to raise money,
00:22:00.100 | and they've been super helpful during that process
00:22:01.580 | in all of our rounds.
00:22:02.540 | We've done three rounds since we did YC,
00:22:04.180 | and they've been super helpful during the whole process.
00:22:06.860 | And also just reaching a ton of customers.
00:22:09.500 | So the magic of YC is that you have all of--
00:22:11.700 | there's thousands of YC companies,
00:22:13.220 | I think, on the order of thousands, I think.
00:22:16.180 | And they're all of your first customers.
00:22:18.700 | And they're super helpful, super receptive,
00:22:20.980 | really want to try out new things.
00:22:23.900 | You have a warm intro to every one of them, basically.
00:22:26.500 | And there's this mailing list where
00:22:27.960 | you can post about updates to your product, which
00:22:31.860 | is really receptive.
00:22:33.620 | And that's just been fantastic for us.
00:22:35.340 | We've just got so many of our users and customers
00:22:39.940 | through YC.
00:22:41.580 | Yeah, well, so the classic criticism
00:22:43.780 | or the pushback is people don't buy from you
00:22:47.740 | just because you are both from YC.
00:22:51.380 | But at least they'll open the email.
00:22:52.980 | Yeah.
00:22:53.480 | Right?
00:22:53.980 | That's the-- OK.
00:22:54.780 | Yeah, effectively.
00:22:56.820 | And yeah, so that's been a really, really positive
00:23:00.980 | experience for us.
00:23:02.100 | And sorry, I interrupted with the YC question.
00:23:05.340 | You just made it out of the YC, survived the pandemic.
00:23:09.540 | And you-- yeah.
00:23:10.580 | I'll try and condense this a little bit.
00:23:12.780 | Then we started building tools for COVID, weirdly.
00:23:15.100 | We were like, OK, we don't have a startup.
00:23:16.420 | We haven't figured out anything.
00:23:17.820 | What's the most useful thing we could be doing right now?
00:23:20.900 | Save lives.
00:23:21.700 | So yeah, let's try and save lives.
00:23:23.640 | I think we failed at that as well.
00:23:25.020 | We had a bunch of products that didn't really go anywhere.
00:23:28.340 | We worked on a bunch of stuff, like contact tracing,
00:23:31.940 | which didn't really turn out to be a useful thing.
00:23:36.060 | Andreas worked on a DoorDash for people delivering food
00:23:42.780 | to people who are vulnerable.
00:23:45.380 | What else did we do?
00:23:46.220 | We worked on a problem of helping people direct their efforts
00:23:48.540 | to what was most useful and a few other things like that.
00:23:51.740 | It didn't really go anywhere.
00:23:52.980 | So we're like, OK, this is not really working either.
00:23:55.820 | We were considering actually just doing work for COVID.
00:23:58.780 | We have this decision document early on in our company, which
00:24:01.300 | is like, should we become a government app contracting
00:24:04.220 | shop?
00:24:06.860 | We decided no.
00:24:07.700 | Because you also did work for the US--
00:24:09.740 | for the gov.uk.
00:24:10.740 | Yeah, exactly.
00:24:11.340 | We had experience doing some--
00:24:14.780 | And the Guardian and all that.
00:24:16.060 | Yeah, for government stuff.
00:24:18.100 | And we were just really good at building stuff.
00:24:20.060 | We were just product people.
00:24:21.460 | I was the front end product side,
00:24:23.060 | and Andreas was the back end side.
00:24:24.520 | So we were just a product.
00:24:25.940 | And we were working with a designer at the time,
00:24:28.140 | a guy called Mark, who did our early designs for Replicate.
00:24:30.660 | And we were like, hey, what if we just team up
00:24:33.240 | and build stuff?
00:24:35.020 | But yeah, we gave up on that in the end for--
00:24:36.940 | I can't remember the details.
00:24:38.540 | So we went back to machine learning.
00:24:42.340 | And then we were like, well, we're
00:24:44.580 | not really sure if this is going to work.
00:24:46.340 | And one of my most painful experiences
00:24:49.420 | from previous startups is shutting them down,
00:24:51.380 | when you realize it's not really working
00:24:52.540 | and having to shut it down.
00:24:53.380 | It's a ton of work.
00:24:54.260 | And people hate you.
00:24:55.700 | And it's just sort of, you know--
00:24:59.380 | so we were like, how can we make something
00:25:01.040 | we don't have to shut down?
00:25:02.460 | And even better, how can we make something
00:25:04.240 | that won't page us in the middle of the night?
00:25:07.820 | So we made an open source project.
00:25:10.700 | We made a thing which was an open source weights and biases,
00:25:14.940 | because we had this theory that people want open source tools.
00:25:18.780 | There should be an open source version control experiment
00:25:21.580 | tracking-like thing.
00:25:22.860 | And it was intuitive to us.
00:25:24.300 | And we were like, oh, we're software developers.
00:25:26.300 | And we like command line tools.
00:25:27.340 | Everyone loves command line tools and open source stuff.
00:25:30.100 | But machine learning researchers just really didn't care.
00:25:31.940 | They just wanted to click on buttons.
00:25:33.480 | They didn't mind that it was a cloud service.
00:25:35.420 | It was all very visual as well, that you
00:25:37.380 | need lots of graphs and charts and stuff like this.
00:25:40.620 | So it just didn't-- it wasn't right.
00:25:43.460 | Like, it was right.
00:25:44.300 | We were actually rebuilding something
00:25:45.340 | that Andreas made at Spotify for just saving experiments
00:25:47.980 | to cloud storage automatically.
00:25:50.020 | But other people didn't really want this.
00:25:52.420 | So we kind of gave up on that.
00:25:54.680 | And then that was actually originally called Replicate.
00:25:56.940 | And we renamed that out of the way.
00:25:58.400 | So it's now called Keepsake.
00:25:59.680 | And I think some people still use it.
00:26:01.220 | Then we sort of came back-- we looped back
00:26:03.140 | to our original idea.
00:26:05.900 | So we were like, oh, maybe there was a thing.
00:26:07.820 | And that thing we were originally
00:26:09.220 | thinking about of researchers sharing
00:26:11.020 | their work in containers for machine learning models.
00:26:13.740 | So we just built that.
00:26:15.000 | And at that point, we were kind of running out of the YC money.
00:26:17.700 | So we were like, OK, this feels good, though.
00:26:19.580 | Let's give this a shot.
00:26:20.980 | So that was the point we raised a seed round.
00:26:23.260 | We raised--
00:26:24.820 | - Pre-launch.
00:26:25.900 | - We raised pre-launch, pre-launch and pre-team.
00:26:29.060 | It was an idea, basically.
00:26:30.140 | We had a little prototype.
00:26:31.220 | It was just an idea and a team.
00:26:34.060 | But we were like, OK, bootstrapping this thing
00:26:38.700 | is getting hard, so let's actually raise some money.
00:26:42.460 | And then we made Cog and Replicate.
00:26:46.940 | It initially didn't have APIs, interestingly.
00:26:49.500 | It was just the bit that I was talking about before,
00:26:52.080 | of helping researchers share their work.
00:26:53.780 | So it was a way for researchers to put their work on a web page
00:26:59.180 | such that other people could try it out,
00:27:02.420 | and so that you could download the Docker container.
00:27:04.580 | So that we didn't have--
00:27:05.940 | we cut the benchmarks thing of it, because we thought
00:27:08.140 | it was too complicated.
00:27:09.580 | But it had a Docker container that Andreas, in a past life,
00:27:13.300 | could download and run with his benchmark.
00:27:15.740 | And you could compare all these models apples to apples.
00:27:18.220 | So that was the theory behind it.
00:27:20.300 | And that kind of started to work.
00:27:24.500 | It was still when it was long time pre-AI hype.
00:27:29.740 | And there was lots of interesting stuff going on.
00:27:31.740 | But it was very much in the classic deep learning era,
00:27:35.060 | so image segmentation models, and sentiment analysis,
00:27:39.700 | and all these kind of things that people were using deep
00:27:43.900 | learning models for.
00:27:44.900 | And we were very much building for research,
00:27:46.740 | because all of this stuff was happening
00:27:48.580 | in research institutions.
00:27:49.860 | These are people who'd be publishing to arXiv.
00:27:51.820 | So we were creating an accompanying material
00:27:54.500 | for their models, basically.
00:27:55.720 | They wanted a demo for their models.
00:27:57.900 | And we were creating accompanying material for it.
00:28:01.540 | And what was funny about that is they
00:28:04.140 | were not very good users.
00:28:06.500 | They were doing great work, obviously.
00:28:09.100 | But the way the research worked is
00:28:10.600 | that they just made one thing every six months,
00:28:13.820 | and they just fired and forgot it.
00:28:16.060 | They published this piece of paper, and like, done.
00:28:18.780 | I've published it.
00:28:20.660 | So they output it to Replicate.
00:28:22.940 | And then they just stopped using Replicate.
00:28:25.100 | They were like once every six monthly users.
00:28:28.420 | And that wasn't great for us.
00:28:30.940 | But we stumbled across this early community.
00:28:34.220 | This was early 2021, when people started--
00:28:41.540 | OpenAI created this-- created Clip.
00:28:43.980 | And people started smushing Clip and GANs together
00:28:46.940 | to produce image generation models.
00:28:49.420 | And this started with--
00:28:51.940 | it was just a bunch of tinkerers on Discord, basically.
00:28:56.860 | It was-- there was an early model called BigSleep
00:29:01.740 | by advadnoun.
00:29:05.900 | And then there was VQGAN-CLIP, which
00:29:05.900 | was a bit more popular, by Rivers Have Wings.
00:29:08.560 | And it was all just people tinkering on stuff in Colabs.
00:29:10.900 | And it was very dynamic.
00:29:11.940 | And it was people just making copies of Colabs
00:29:13.420 | and playing around with things and forking.
00:29:15.300 | And to me, I saw this, and I was like, oh, this
00:29:17.300 | feels like open source software, so much more
00:29:19.180 | than the research world, where people
00:29:21.620 | are publishing these papers.
00:29:22.820 | Yeah, you don't know their real names,
00:29:24.020 | and it's just like a Discord.
00:29:25.420 | Yeah, exactly.
00:29:26.100 | But crucially, it was like people
00:29:27.440 | were tinkering and forking.
00:29:28.620 | And people were-- things were moving really fast.
00:29:30.780 | And it just felt like this creative, dynamic,
00:29:34.940 | collaborative community in a way that research wasn't really.
00:29:41.460 | Like, it was still stuck in this kind of six-month publication
00:29:44.040 | cycle.
00:29:44.940 | So we just kind of latched onto that
00:29:46.660 | and started building for this community.
00:29:51.220 | And a lot of those early models were published on Replicate.
00:29:55.460 | I think the first one that was really primarily on Replicate
00:29:58.580 | was one called Pixray, which was sort of mid-2021.
00:30:04.500 | And it had a really cool pixel art output,
00:30:06.420 | but it also just produced--
00:30:07.880 | they weren't crisp in images, but they
00:30:09.460 | were quite aesthetically pleasing,
00:30:10.880 | like some of these early image generation models.
00:30:13.140 | And that was published primarily on Replicate.
00:30:18.100 | And then a few other models around that
00:30:19.780 | were published on Replicate.
00:30:21.620 | And that's where we really started
00:30:23.040 | to find our early community and where we really found,
00:30:25.300 | oh, we've actually built a thing that people want.
00:30:29.100 | And they were great users as well,
00:30:30.700 | and people really want to try out these models.
00:30:32.700 | Lots of people were running the models on Replicate.
00:30:35.020 | We still didn't have APIs, though, interestingly.
00:30:37.220 | And this is another really complicated part of the story.
00:30:39.340 | We had no idea what a business model was still at this point.
00:30:41.660 | I don't think people could even pay for it.
00:30:43.460 | It's just these web forms where people could run the model.
00:30:47.020 | FRANCESC CAMPOY: Just before this API bit continues,
00:30:48.940 | just for historical interest, which Discords were they,
00:30:51.420 | and how did you find them?
00:30:52.140 | Was this the LAION Discord?
00:30:53.620 | MARK MANDEL: Yeah, LAION--
00:30:54.140 | FRANCESC CAMPOY: This is Eleuther.
00:30:54.740 | MARK MANDEL: Eleuther, yeah.
00:30:55.620 | It was the Eleuther one.
00:30:56.300 | FRANCESC CAMPOY: These two, right?
00:30:56.860 | MARK MANDEL: Eleuther, I particularly remember.
00:30:58.780 | There was a channel where VQGAN-CLIP--
00:31:00.860 | this was early 2021-- where VQGAN-CLIP
00:31:02.700 | was set up as a Discord bot.
00:31:06.860 | And I just remember being completely just captivated
00:31:11.260 | by this thing.
00:31:11.980 | I was just playing around with it all afternoon
00:31:13.820 | and the sort of thing--
00:31:14.580 | FRANCESC CAMPOY: In Discord.
00:31:15.060 | MARK MANDEL: --where, oh, shit, it's 2 AM.
00:31:16.580 | FRANCESC CAMPOY: This is the beginnings of MidJourney.
00:31:18.340 | MARK MANDEL: Yeah, exactly.
00:31:18.860 | FRANCESC CAMPOY: And it was instability.
00:31:20.560 | MARK MANDEL: It was the start of MidJourney.
00:31:22.740 | And it's where that kind of user interface came from.
00:31:25.020 | What was beautiful about the user interface
00:31:26.540 | is you could see what other people are doing.
00:31:28.900 | And you could riff off other people's ideas.
00:31:32.180 | And it was just so much fun to just play around
00:31:35.260 | with this in a channel full of 100 people.
00:31:38.540 | And yeah, that just completely captivated me.
00:31:40.620 | And I'm like, OK, this is something.
00:31:43.620 | So we should get these things on Replicate.
00:31:46.780 | And yeah, that's where that all came from.
00:31:49.320 | FRANCESC CAMPOY: OK, sorry.
00:31:50.860 | I just wanted to capture that.
00:31:52.260 | MARK MANDEL: Yeah, yeah.
00:31:53.260 | FRANCESC CAMPOY: And then you moved on to--
00:31:54.780 | so was it APIs Next or was it Stable Diffusion Next?
00:31:56.940 | MARK MANDEL: It was APIs Next.
00:31:58.200 | And the APIs happened because one of our users--
00:32:02.700 | our web form had an internal API for making the web form work,
00:32:05.860 | like with an API that was called from JavaScript.
00:32:08.740 | And somebody reverse engineered that
00:32:12.820 | to start generating images with a script.
00:32:15.020 | They did web inspector, copy as curl,
00:32:18.500 | figure out what the API request was.
00:32:22.180 | And it wasn't secured or anything.
00:32:24.460 | FRANCESC CAMPOY: Of course not.
00:32:25.800 | MARK MANDEL: And they started generating a bunch of images.
00:32:28.300 | And we got tons of traffic.
00:32:29.500 | We're like, what's going on?
00:32:31.700 | And I think a sort of usual reaction to that would be like,
00:32:36.820 | hey, you're abusing our API to shut them down.
00:32:39.420 | And instead we're like, oh, this is interesting.
00:32:41.420 | Like, people want to run these models.
00:32:44.220 | So we documented the API in a Notion document,
00:32:48.500 | like our internal API in a Notion document,
00:32:50.860 | and messaged this person being like, hey,
00:32:54.620 | you seem to have found our API.
00:32:57.140 | Here's the documentation.
00:32:58.180 | That'll be like $1,000 a month, please, with a Stripe form
00:33:01.980 | that we just clicked some buttons to make.
00:33:04.540 | And they were like, sure, that sounds great.
00:33:06.340 | So that was our first customer.
00:33:08.780 | FRANCESC CAMPOY: $1,000 a month?
00:33:10.140 | MARK MANDEL: It was a surprising amount of money, yeah.
00:33:11.780 | FRANCESC CAMPOY: That's not casual.
00:33:13.300 | MARK MANDEL: It was on the order of $1,000 a month.
00:33:14.860 | FRANCESC CAMPOY: So was it a business?
00:33:17.020 | MARK MANDEL: It was the creator of PixRay.
00:33:20.420 | He generated NFT art.
00:33:23.220 | And so he made a bunch of art with these models
00:33:27.180 | and was selling these NFTs, effectively.
00:33:31.300 | And I think lots of people in his community
00:33:33.100 | were doing similar things.
00:33:34.220 | And he then referred us to other people
00:33:35.840 | who were also generating NFTs and trying to save models.
00:33:39.860 | And that was the start of our API business, yeah.
00:33:44.860 | And then we made an official API and actually
00:33:47.620 | added some billing to it so it wasn't just like a fixed fee.
00:33:52.720 | FRANCESC CAMPOY: And now people think of you as the host
00:33:55.020 | and models API business.
00:33:56.140 | MARK MANDEL: Yeah, exactly.
00:33:57.660 | But that just turned out to be our business.
00:33:59.500 | But what ended up being beautiful about this
00:34:02.380 | is it was really fulfilling, like the original goal of what
00:34:05.820 | we wanted to do is that we wanted to make this research
00:34:08.220 | that people were making accessible to other people
00:34:12.220 | and for it to be used in the real world.
00:34:14.460 | And this was just ultimately the right way
00:34:17.860 | to do it because all of these people making
00:34:19.900 | these generative models could publish them to replicate,
00:34:22.460 | and they wanted a place to publish it.
00:34:24.380 | And software engineers, like myself--
00:34:26.900 | I'm not a machine learning expert,
00:34:28.300 | but I want to use this stuff--
00:34:30.180 | could just run these models with a single line of code.
00:34:32.500 | And we thought, oh, maybe the Docker image is enough.
00:34:34.380 | But it's actually super hard to get the Docker image running
00:34:36.380 | on a GPU and stuff.
00:34:37.300 | So it really needed to be the hosted API for this to work
00:34:40.060 | and to make it accessible to software engineers.
00:34:42.100 | And we just wound our way to this--
00:34:45.340 | FRANCESC CAMPOY: Yeah, two years to the first paying customer.
00:34:47.940 | MARK MANDEL: Yeah, exactly.
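
Roughly what that "single line of code" looks like today with Replicate's Python client; the model reference below is a placeholder rather than a specific published model, and a REPLICATE_API_TOKEN environment variable is assumed:

```python
import replicate  # pip install replicate; needs REPLICATE_API_TOKEN set

# Placeholder model reference, for illustration only.
output = replicate.run(
    "some-owner/some-image-model:version-hash",
    input={"prompt": "an astronaut riding a horse"},
)
print(output)
```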
00:34:49.620 | FRANCESC CAMPOY: Did you ever think about becoming
00:34:51.580 | MidJourney during that time?
00:34:53.220 | You have so much interest in image generation.
00:34:55.140 | MARK MANDEL: What could have been?
00:34:57.020 | I mean, you're doing fine, for the record, but you know.
00:35:01.020 | It was right there.
00:35:01.820 | You were playing with it.
00:35:04.100 | Yeah, I don't think it was our expertise.
00:35:06.740 | I think our expertise was DevTools rather than--
00:35:08.740 | MidJourney is almost like a consumer products.
00:35:12.420 | So I don't think it was our expertise.
00:35:14.300 | It certainly occurred to us.
00:35:16.220 | I think at the time, we were thinking about,
00:35:18.060 | like, oh, maybe we could hire some of these people
00:35:19.940 | in this community and make great models and stuff like this.
00:35:22.440 | But we ended up more being at the tooling.
00:35:26.380 | I think before, I was saying, I'm not really a researcher.
00:35:28.740 | I'm more like the tool builder, the behind the scenes.
00:35:30.500 | And I think both me and Andreas are like that.
00:35:32.420 | FRANCESC CAMPOY: Yeah.
00:35:33.380 | I think this is also like an illustration
00:35:35.260 | of the tool builder philosophy, something
00:35:37.220 | where you latch onto in DevTools, which
00:35:39.500 | is when you see people behaving weird,
00:35:41.500 | it's not their fault, it's yours.
00:35:43.940 | And you want to pave the cow paths, is what they say, right?
00:35:46.380 | Like, the unofficial paths that people are making,
00:35:48.540 | like, make it official and make it easy for them,
00:35:50.580 | and then maybe charge a bit of money.
00:35:52.780 | Yeah.
00:35:54.300 | And now fast forward a couple of years,
00:35:56.460 | you have two million developers using Replicate, maybe more.
00:35:59.940 | That was the last public number that I found.
00:36:01.980 | Two million-- I think that got mangled, actually, by--
00:36:04.700 | it's two million users.
00:36:05.900 | Not all those people are developers,
00:36:07.100 | but a lot of them are developers.
00:36:08.460 | Yeah.
00:36:09.780 | And then 30,000 paying customers was the number.
00:36:13.660 | That's awesome.
00:36:14.940 | Latent Space runs on Replicate.
00:36:16.620 | So we're a small podcaster, and we host--
00:36:18.860 | FRANCESC CAMPOY: We do a transcription on--
00:36:20.620 | MARK MANDEL: --Whisper diarization on Replicate.
00:36:23.300 | And we're paying.
00:36:24.180 | So we at Latent Space are in that 30,000.
00:36:27.860 | You raised $40 million, Series B.
00:36:31.620 | I would say that maybe the stable diffusion time, August
00:36:34.740 | '22, was really when the company started to break out.
00:36:39.320 | Tell us a bit about that and the community that came out.
00:36:41.820 | And I know now you're expanding beyond just image generation.
00:36:45.500 | Yeah.
00:36:48.500 | I think we kind of set ourselves--
00:36:50.220 | we saw there was this really interesting generative image
00:36:52.660 | world going on.
00:36:53.300 | So we're building the tools for that community already, really.
00:36:59.420 | And we knew stable diffusion was coming out.
00:37:03.420 | We knew it was a really exciting thing.
00:37:05.040 | It was the best generative image model so far.
00:37:08.060 | I think the thing we underestimated
00:37:10.020 | was just what an inflection point it would be,
00:37:12.860 | where it was--
00:37:13.660 | it was-- I think Simon Willison put it this way,
00:37:19.020 | where he said something along the lines of,
00:37:20.820 | it was a model that was open source and tinkerable
00:37:24.580 | and good enough, such that it just took off
00:37:34.900 | in a way that none of the models had before.
00:37:37.540 | And what was really neat about stable diffusion
00:37:40.520 | is it was open source, so you could--
00:37:44.580 | compared to DALL-E, for example, which was equivalent quality,
00:37:48.340 | it was open source.
00:37:49.180 | So you could fork it and tinker on it.
00:37:50.760 | And the first week, we saw people making animation models
00:37:53.580 | out of it.
00:37:54.080 | We saw people make game texture models
00:37:57.420 | that use circular convolutions to make repeatable textures.
00:38:00.460 | We saw-- what else did we see?
00:38:03.740 | A few weeks later, people were fine-tuning it
00:38:05.620 | so you could put your face in these models.
00:38:07.820 | And all of these other--
00:38:09.940 | Textual inversion.
00:38:11.220 | Yeah, exactly.
00:38:12.020 | That happened a bit before that.
00:38:13.940 | And all of this innovation was happening all of a sudden.
00:38:18.140 | And people were publishing on Replicate
00:38:19.860 | because you could just publish arbitrary models on Replicate.
00:38:22.400 | So we had this supply of interesting stuff being built.
00:38:25.140 | But because it was a sufficiently good model,
00:38:28.580 | there was also just a ton of people building with it.
00:38:33.100 | They were like, oh, we can build products with this thing.
00:38:35.500 | And this was about the time where people were starting
00:38:37.380 | to get really interested in AI.
00:38:38.420 | So tons of product builders wanted to build stuff with it.
00:38:40.980 | And we were just sitting in there
00:38:41.900 | in the middle as the interface layer between all these people
00:38:44.580 | who wanted to build and all these machine learning
00:38:46.660 | experts who were building cool models.
00:38:48.860 | And that's really where it took off.
00:38:50.580 | It was just incredible supply, incredible demand.
00:38:53.000 | And we were just in the middle.
00:38:55.260 | And then, yeah, since then we've just grown and grown, really.
00:38:58.840 | And we've been building a lot for the indie hacker community,
00:39:02.080 | these individual tinkerers, but also startups,
00:39:04.340 | and a lot of large companies as well who are exploring
00:39:07.200 | and building AI things.
00:39:09.560 | And then the same thing happened middle of last year
00:39:13.880 | with language models and Llama 2, where
00:39:16.280 | the same Stable Diffusion effect happened with Llama.
00:39:19.840 | And Llama 2 was our biggest week of growth
00:39:21.640 | ever because tons of people wanted to tinker with it
00:39:23.760 | and run it.
00:39:25.000 | And since then, we've just been seeing a ton of growth
00:39:27.320 | in language models as well as image models.
00:39:29.720 | And yeah, we're just riding a lot of the interest that's
00:39:33.880 | going on in AI and all the people building in AI.
00:39:37.560 | FRANCESC CAMPOY: That's-- yeah, kudos.
00:39:39.160 | Right place, right time.
00:39:40.160 | But also took a while to position for the right place
00:39:44.480 | before the wave came.
00:39:46.360 | I'm curious if you have any insights
00:39:49.240 | on these different markets.
00:39:51.880 | So Pieter Levels, notably a very loud person,
00:39:56.200 | very picky about his tools.
00:39:58.120 | I wasn't sure, actually, if he used you.
00:39:59.760 | He does because you cited him on your Series B blog post,
00:40:05.240 | and Danny Postma as well, his competitor,
00:40:05.240 | all in that wave.
00:40:07.040 | What are their needs versus the more enterprise or B2B type
00:40:12.760 | needs?
00:40:14.080 | Did you come to a decision point where you're like,
00:40:16.240 | OK, how serious are these indie hackers
00:40:18.280 | versus the actual businesses that
00:40:20.040 | are bigger and perhaps better customers because they're
00:40:22.720 | less churny?
00:40:23.840 | They're surprisingly similar because I
00:40:25.960 | think a lot of people right now want to use and build with AI,
00:40:29.480 | but they're not AI experts.
00:40:32.040 | And they're not infrastructure experts either.
00:40:34.080 | So they want to be able to use this stuff
00:40:35.780 | without having to figure out all the internals of the models
00:40:38.680 | and touch PyTorch and whatever.
00:40:42.040 | And they also don't want to be setting up and booting up
00:40:44.680 | servers.
00:40:46.800 | And that's the same all the way from indie hackers just
00:40:51.360 | getting started-- because obviously, you just
00:40:53.280 | want to get started as quickly as possible--
00:40:55.200 | all the way through to large companies
00:40:57.160 | who want to be able to use this stuff,
00:41:00.000 | but don't have all of the experts on staff.
00:41:04.200 | I think some companies are quite--
00:41:07.680 | big companies like Google and so on that
00:41:09.380 | do actually have a lot of experts on staff,
00:41:10.840 | but the vast majority of companies don't.
00:41:12.560 | And they're all software engineers
00:41:13.600 | who want to be able to use this AI stuff,
00:41:15.300 | but they just don't know how to use it.
00:41:17.080 | And it's like, you really need to be an expert.
00:41:19.320 | And it takes a long time to learn the skills
00:41:20.800 | to be able to use that.
00:41:21.760 | So they're surprisingly similar in that sense.
00:41:24.200 | And I think that's also kind of unfair to the indie community.
00:41:30.160 | They're surprisingly not churny
00:41:32.360 | or spiky.
00:41:33.600 | They're building real established businesses,
00:41:36.040 | which is like, kudos to them of building
00:41:39.240 | these really large, sustainable businesses, often just
00:41:44.760 | as solo developers.
00:41:47.800 | And it's kind of remarkable how they can do that, actually.
00:41:50.260 | And it's a credit to a lot of their product skills.
00:41:53.320 | And we're just there to help them,
00:41:55.600 | being their machine learning team, effectively,
00:41:57.640 | to help them use all of this stuff.
00:41:59.760 | So actually, a lot of these
00:42:02.280 | indie hackers are some of our largest customers,
00:42:04.880 | alongside the big companies
00:42:06.720 | that you would think would be spending a lot more money
00:42:12.560 | than them.
00:42:13.200 | Yeah.
00:42:13.720 | And we should name some of these.
00:42:14.760 | You have them on your landing page.
00:42:16.180 | You have BuzzFeed.
00:42:16.920 | You have Unsplash, Character AI.
00:42:21.400 | What do they power?
00:42:22.720 | What can you say about their usage?
00:42:24.260 | Yeah, totally.
00:42:24.880 | It's kind of various things.
00:42:28.200 | I'm trying to think.
00:42:28.960 | Let me actually think.
00:42:33.360 | What can I say about what customers?
00:42:35.680 | Well, I mean, I'm naming them because they're
00:42:37.120 | on your landing page.
00:42:38.000 | So you have logo rights.
00:42:40.480 | It's useful for people to--
00:42:42.640 | I'm not imaginative.
00:42:43.640 | I see-- monkey see, monkey do, right?
00:42:45.480 | Like, if I see someone doing something that I want to do,
00:42:47.840 | then I'm like, OK, Replicate's great for that.
00:42:50.040 | So that's what I think about case studies on company
00:42:52.320 | landing pages, is that it's just a way of explaining,
00:42:55.000 | like, yep, this is something that we are good for.
00:42:57.800 | Yeah, totally.
00:42:58.800 | I mean, these companies are doing things
00:43:01.800 | all the way up and down the stack
00:43:03.300 | at different levels of sophistication.
00:43:05.320 | So Unsplash, for example, they actually
00:43:10.160 | publicly posted this story on Twitter
00:43:12.440 | where they're using Blip to annotate all
00:43:17.280 | of the images in their catalog.
00:43:19.180 | So they have lots of images in the catalog,
00:43:20.920 | and they want to create a text description of it
00:43:22.920 | so you can search for it.
00:43:24.160 | And they're annotating images with an off-the-shelf open source
00:43:26.880 | model.
00:43:27.400 | We have this big library of open source models that you can run.
00:43:30.360 | And we've got lots of people who are running these open source
00:43:32.940 | models off the shelf.
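For a concrete sense of what running an off-the-shelf model like that looks like, here is a minimal sketch using the Replicate Python client. The BLIP model slug, version hash, and input fields are illustrative assumptions, not the exact call Unsplash makes; check replicate.com for the current ones.

```python
# Minimal sketch: caption an image with an off-the-shelf model on Replicate.
# Assumes REPLICATE_API_TOKEN is set in the environment. The model slug,
# version hash, and input names below are placeholders to illustrate the shape.
import replicate

output = replicate.run(
    "salesforce/blip:VERSION_HASH",        # hypothetical version pin
    input={
        "image": open("photo.jpg", "rb"),  # a local file, or a public URL string
        "task": "image_captioning",        # assumed input name for this model
    },
)
print(output)  # e.g. "a person riding a bike down a mountain trail"
```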
00:43:34.280 | And then most of our larger customers
00:43:37.920 | are doing more sophisticated stuff.
00:43:39.880 | So they're fine-tuning the models.
00:43:42.200 | They're running completely custom models on us.
00:43:45.000 | And so a lot of these larger companies
00:43:47.640 | are using us for a lot of their inference.
00:43:54.000 | But it's a lot of custom models and them
00:43:56.400 | writing the Python themselves, because they've
00:43:58.800 | got machine learning experts on the team.
00:44:01.280 | And they're using us for their inference infrastructure
00:44:04.640 | effectively.
00:44:05.840 | So it's lots of different levels of sophistication,
00:44:08.080 | where some people are using these off-the-shelf models.
00:44:10.760 | Some people are fine-tuning models.
00:44:13.080 | Pieter Levels is a great example, where a lot of his products
00:44:15.540 | are based off fine-tuning image models, for example.
00:44:19.420 | And then we've also got larger customers
00:44:21.120 | who are just using us as infrastructure,
00:44:23.060 | effectively, as servers.
00:44:25.760 | So yeah, it's all things up and down the stack.
00:44:30.000 | Let's talk a bit about Cog and the technical layer.
00:44:33.520 | So there are a lot of GPU clouds.
00:44:37.080 | I think people have different pricing points.
00:44:39.520 | And I think everybody tries to offer a different developer
00:44:41.940 | experience on top of it, which then lets you charge a premium.
00:44:46.080 | Why did you want to create Cog?
00:44:48.120 | What were some of the-- you worked at Docker.
00:44:49.960 | What were some of the issues with traditional container
00:44:52.240 | runtimes?
00:44:53.920 | And maybe, yeah, what were you surprised with as you built it?
00:44:57.080 | Cog came right from the start, actually,
00:44:58.920 | when we were thinking about this evaluation,
00:45:05.600 | the benchmarking system for machine learning researchers,
00:45:08.760 | where we wanted researchers to publish their models
00:45:11.520 | in a standard format that was guaranteed to keep on running,
00:45:16.000 | that you could replicate the results of.
00:45:17.920 | That's where the name came from.
00:45:19.640 | And we realized that we needed something like Docker
00:45:22.800 | to make that work.
00:45:24.920 | And I think it was just natural, from my point of view,
00:45:28.360 | obviously, that should be open source,
00:45:29.940 | that we should try and create some kind of open standard
00:45:32.080 | here that people can share.
00:45:33.200 | Because if more people use this format,
00:45:35.400 | then that's great for everyone involved.
00:45:38.560 | I think the magic of Docker is not really in the software.
00:45:41.560 | It's just the standard that people have agreed on.
00:45:44.640 | Here are a bunch of keys for a JSON document, basically.
00:45:49.000 | And that was the magic of the metaphor of real containerization
00:45:53.280 | as well.
00:45:53.760 | It's not the containers that are interesting.
00:45:55.640 | It's like the size and shape of the damn box.
00:45:59.540 | And it's a similar thing here, where really we just
00:46:01.280 | wanted to get people to agree on this is what
00:46:03.320 | a machine learning model is.
00:46:04.800 | This is how a prediction works.
00:46:07.760 | This is what the inputs are.
00:46:09.000 | This is what the outputs are.
00:46:10.480 | So Cog is really just a Docker container
00:46:13.120 | that attaches to a CUDA device, if it needs a GPU, that
00:46:17.400 | has an OpenAPI specification as a label on the Docker image.
00:46:21.920 | And the OpenAPI specification defines the interface
00:46:26.800 | for the machine learning model, like the inputs and outputs
00:46:32.440 | effectively, or the params in machine learning terminology.
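To make the shape of that interface concrete, a Cog predictor looks roughly like the sketch below. This is an illustrative example rather than any particular production model: the typed predict() signature is what Cog turns into the OpenAPI schema attached to the built image, and load_my_model() is a hypothetical helper.

```python
# predict.py -- a minimal sketch of the interface Cog standardizes.
from cog import BasePredictor, Input, Path


class Predictor(BasePredictor):
    def setup(self):
        # Runs once when the container starts, e.g. to load weights onto the GPU.
        self.model = load_my_model()  # hypothetical helper standing in for real loading code

    def predict(
        self,
        prompt: str = Input(description="Text prompt for the model"),
        steps: int = Input(description="Number of inference steps", default=25),
    ) -> Path:
        # The typed inputs and the return type become the model's public schema.
        image = self.model.generate(prompt, steps=steps)
        out = Path("/tmp/output.png")
        image.save(out)
        return out
```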
00:46:36.680 | And we just wanted to get people to agree on this thing.
00:46:40.040 | And it's general purpose enough.
00:46:41.440 | We weren't saying-- some of the existing things
00:46:43.620 | were at the graph level.
00:46:45.200 | But we really wanted something general purpose
00:46:47.160 | enough that you could just put anything inside this.
00:46:48.680 | And it was future compatible.
00:46:49.980 | And it was just like arbitrary software.
00:46:51.900 | And it'd be future compatible with future inference servers
00:46:54.360 | and future machine learning model formats
00:46:56.080 | and all this kind of stuff.
00:46:57.760 | So that was the intent behind it.
00:47:00.000 | And it just came naturally that we
00:47:04.800 | wanted to define this format.
00:47:06.080 | And that's been really working for us.
00:47:08.520 | A bunch of people have been using Cog outside of Replicate,
00:47:11.240 | which is kind of our original intention.
00:47:13.080 | This should be how machine learning models are packaged
00:47:15.320 | and how people should use it.
00:47:16.520 | It's common to use Cog in situations
00:47:19.000 | where maybe they can't use the SaaS service because they're
00:47:23.080 | in a big company.
00:47:23.800 | And they're not allowed to use a SaaS service.
00:47:28.760 | But they can use Cog internally still.
00:47:30.300 | And they can download the models from Replicate
00:47:32.260 | and run them internally in their org, which
00:47:34.680 | we've been seeing happen.
00:47:35.720 | That works really well.
00:47:37.240 | People who want to build custom inference pipelines
00:47:39.440 | but don't want to reinvent the world,
00:47:40.980 | they can use Cog off the shelf and use
00:47:42.520 | it as a component in their inference pipelines.
00:47:45.720 | We've been seeing tons of usage like that.
00:47:48.900 | And it's just been kind of happening organically.
00:47:50.900 | We haven't really been trying.
00:47:52.080 | But it's there if people want it.
00:47:53.440 | And we've been seeing people use it.
00:47:54.980 | So that's great.
00:47:56.680 | And yeah, so a lot of it is just sort of philosophical.
00:48:00.360 | This is how it should work from my experience at Docker.
00:48:03.120 | And there's just a lot of value from the core being open,
00:48:05.640 | I think, and that other people can share it.
00:48:06.920 | And it's like an integration point.
00:48:08.380 | So if Replicate, for example, wanted
00:48:12.760 | to work with a testing system, like a CI system or whatever,
00:48:18.520 | we can just interface at the Cog level.
00:48:21.080 | That system just needs to put Cog models.
00:48:22.840 | And then you can test your models on that CI system
00:48:25.280 | before they get deployed to Replicate.
00:48:26.860 | And it's just a format that we can get everyone to agree on.
00:48:30.040 | What do you think, I guess, Docker got wrong?
00:48:33.280 | Because if I look at a Docker Compose and a Cog definition,
00:48:36.000 | first of all, the Cog definition is kind of like the Dockerfile
00:48:38.800 | plus the Compose file.
00:48:40.800 | And Docker Compose is just exposing the services.
00:48:43.960 | And also, Docker Compose is very ports-driven,
00:48:47.600 | versus you have the actual predict,
00:48:51.040 | this is what you have to run.
00:48:53.120 | Yeah, any learnings and maybe tips for other people building
00:48:55.800 | container-based runtimes?
00:48:57.040 | Like, how much should you separate
00:49:00.080 | the API services versus the image building,
00:49:04.200 | or how much you want to build them together?
00:49:06.560 | I think it was coming from two sides.
00:49:09.640 | We were thinking about the design
00:49:11.860 | from the point of view of user needs.
00:49:14.800 | Like, what do users--
00:49:16.560 | what are their problems, and what problems
00:49:18.960 | can we solve for them?
00:49:20.360 | But also, what the interface should
00:49:22.040 | be for a machine learning model.
00:49:23.540 | And it's sort of the combination of two things
00:49:25.480 | that led us to this design.
00:49:27.880 | So the thing I talked about before
00:49:29.760 | was a little bit of the interface around the machine
00:49:32.000 | learning model.
00:49:32.600 | So we realized that we wanted it to be general purpose.
00:49:35.320 | We wanted it to be at the level of JSON, human-readable things,
00:49:41.800 | rather than the tensor level.
00:49:44.400 | So it's like an OpenAPI specification that
00:49:46.240 | wrapped a Docker container.
00:49:47.360 | That's where that design came from.
00:49:49.200 | And it's really just a wrapper around Docker.
00:49:51.080 | So we were kind of building on--
00:49:52.920 | standing on shoulders there.
00:49:54.160 | But Docker's too low-level.
00:49:55.320 | So it's just like arbitrary software.
00:49:57.160 | So we wanted to be able to have an OpenAPI specification there
00:50:02.680 | that defined the function, effectively,
00:50:04.440 | that is the machine learning model,
00:50:06.200 | but also how that function is written,
00:50:09.800 | how that function is run, which is all defined in code,
00:50:12.080 | and stuff like that.
00:50:12.520 | So it's like a bunch of abstraction on top of Docker
00:50:15.040 | to make that work.
00:50:16.360 | And that's where that design came from.
00:50:18.600 | But the core problems we were solving for users
00:50:21.880 | was that Docker's really hard to use.
00:50:27.320 | And productionizing machine learning models
00:50:29.920 | is really hard.
00:50:31.600 | So on the first part of that, we knew
00:50:37.240 | we couldn't use Dockerfiles.
00:50:38.560 | Dockerfiles are hard enough for software developers to write.
00:50:41.080 | I'm saying this with love as somebody who works on Docker
00:50:43.400 | and works on Dockerfiles.
00:50:46.840 | But it's really hard to use.
00:50:48.200 | And you need to know a bunch about Linux, basically,
00:50:50.360 | because you're running a bunch of CLI commands.
00:50:52.360 | You need to know a bunch of Linux and best practices,
00:50:54.600 | how apt works, and all this kind of stuff.
00:50:56.480 | So we're like, OK, we can't get to that level.
00:50:58.200 | We need something that machine learning researchers will
00:50:59.960 | be able to understand.
00:51:00.880 | People who are used to Colab notebooks.
00:51:03.400 | And what they understand is they're like,
00:51:05.200 | I need this version of Python.
00:51:06.440 | I need these Python packages.
00:51:08.000 | And somebody told me to apt-get install something.
00:51:11.320 | You know?
00:51:11.820 | MARK MANDEL: And throw sudo in there when I don't really
00:51:14.120 | know what that means.
00:51:15.320 | So we tried to create a format that was at that level.
00:51:17.560 | And that's what cog.yaml is.
00:51:19.480 | And we're really kind of trying to imagine,
00:51:21.480 | what is that machine learning researcher
00:51:23.240 | going to understand and trying to build for them?
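For reference, cog.yaml sits at roughly that level of abstraction. The sketch below is illustrative; the package versions and predictor path are placeholders, but the overall shape follows Cog's documented format.

```yaml
# cog.yaml -- a sketch of the researcher-level config Cog builds an image from.
build:
  gpu: true                     # let Cog pick a compatible CUDA base image
  python_version: "3.10"
  python_packages:
    - "torch==2.0.1"            # placeholder versions; pin what the model needs
    - "transformers==4.35.0"
  system_packages:
    - "ffmpeg"                  # the "somebody told me to apt-get install" bit
predict: "predict.py:Predictor" # the class that implements the model's interface
```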
00:51:26.280 | And then the productionizing machine learning models thing
00:51:30.680 | is like, OK, how can we package up
00:51:33.360 | all of the complexity of productionizing machine
00:51:35.480 | learning models?
00:51:36.800 | Like picking CUDA versions, like hooking it up to GPUs,
00:51:41.040 | writing an inference server, defining a schema,
00:51:44.940 | doing batching, all of these just really gnarly things
00:51:48.840 | that everyone does again and again,
00:51:51.000 | and just provide that as a tool.
00:51:56.040 | And that's where that side of it came from.
00:51:58.600 | So it's like combining those user needs
00:52:00.400 | with the world need of needing a common standard for what
00:52:06.200 | a machine learning model is.
00:52:07.440 | And that's how we thought about the design.
00:52:09.200 | I don't know whether that answers the question.
00:52:10.560 | FRANCESC CAMPOY: Yeah.
00:52:10.820 | So your idea was like, hey, you really
00:52:12.880 | want what Docker stands for in terms of standard,
00:52:17.080 | but you actually don't want people
00:52:18.800 | to do all the work that goes into Docker.
00:52:21.200 | MARK MANDEL: It needs to be higher level.
00:52:24.240 | FRANCESC CAMPOY: So I want to, for the listener,
00:52:26.680 | you're not the only standard that is out there.
00:52:28.600 | As with any standard, there must be 14 of them.
00:52:31.960 | You are surprisingly friendly with Ollama,
00:52:34.040 | who are your former colleagues from Docker, who
00:52:36.840 | came out with the Modelfile.
00:52:38.680 | Mozilla came out with the llamafile.
00:52:41.040 | And then I don't know if this is in the same category even,
00:52:43.640 | but I'm just going to throw it in there.
00:52:44.520 | Like Hugging Face has the Transformers and Diffusers
00:52:46.480 | library, which is a way of disseminating models
00:52:48.480 | that, obviously, people use.
00:52:51.080 | How would you compare your contrast, your approach
00:52:53.320 | of Cog versus all these?
00:52:54.520 | MARK MANDEL: It's kind of complementary, actually,
00:52:56.560 | which is kind of neat.
00:52:57.520 | It's a lot of--
00:52:59.640 | Transformers, for example, is lower level than Cog.
00:53:01.920 | So it's a Python library, effectively.
00:53:05.360 | But you still need to--
00:53:07.440 | FRANCESC CAMPOY: Expose them.
00:53:08.400 | MARK MANDEL: Yeah, you still need
00:53:09.000 | to turn that into an inference server.
00:53:10.240 | You still need to install the Python packages
00:53:11.880 | and that kind of thing.
00:53:12.800 | So lots of Replicate models are Transformers models
00:53:17.800 | and Diffusers models inside Cog.
00:53:20.520 | So that's the level that that sits.
00:53:22.320 | So it's very complementary in some sense.
00:53:24.040 | And we're kind of working on integration with Hugging Face
00:53:26.560 | such that you can deploy models from Hugging Face
00:53:29.020 | into Cog models and stuff like that and to Replicate.
00:53:32.840 | And so some of these things, like Llamafile
00:53:41.280 | and what Ollama are working on, are also very complementary
00:53:41.280 | in that they're doing a lot of the running these things
00:53:46.880 | locally on laptops, which is not a thing that
00:53:50.280 | works very well with Cog.
00:53:51.480 | Cog is really designed around servers
00:53:53.400 | and attaching to CUDA devices and NVIDIA GPUs
00:53:56.480 | and this kind of thing.
00:53:58.160 | So we're trying to figure out-- we're actually
00:54:03.720 | figuring out ways that those things can
00:54:06.080 | be interoperable because they should be.
00:54:09.000 | And I think they are quite complementary
00:54:11.580 | in that you should be able to take a model and Replicate
00:54:13.880 | and run it on your local machine.
00:54:14.840 | You should be able to take a model on your local machine
00:54:17.140 | and run it in the cloud.
00:54:18.280 | So, yeah.
00:54:19.480 | FRANCESC CAMPOY: Is the base layer something like--
00:54:22.720 | is it at the GGUF level?
00:54:24.680 | Which, by the way, I need to get a primer
00:54:26.920 | on the different formats that have emerged.
00:54:29.800 | Or is it at the *.file level, which
00:54:32.760 | is Modelfile, llamafile, whatever?
00:54:34.980 | Or is it at the Cog level?
00:54:36.760 | I don't know, to be honest.
00:54:37.960 | And I think this is something we still
00:54:39.540 | have to figure out.
00:54:41.200 | I think there's a lot yet.
00:54:42.960 | Exactly where those lines are drawn, I don't know exactly.
00:54:45.440 | I think this is something we're trying to figure out ourselves.
00:54:47.960 | But I think there's certainly a lot of promise
00:54:50.000 | about these systems interoperating.
00:54:51.880 | I think we just want things to work together.
00:54:54.000 | We want to try and reduce the number of standards
00:54:56.080 | so the more these things can interoperate and convert
00:54:58.880 | between each other and that kind of stuff at the minute.
00:55:01.160 | FRANCESC CAMPOY: Andreas comes out of Spotify.
00:55:03.360 | Erik from Modal also comes out of Spotify.
00:55:07.680 | You worked at Docker and the Ollama guys worked at Docker.
00:55:13.000 | Did you know that these ideas were--
00:55:15.120 | did both you and Andreas know that there
00:55:16.840 | was somebody else you work with that
00:55:18.480 | had a kind of like similar-- not similar idea,
00:55:20.880 | but was interested in the same thing?
00:55:22.680 | Or did you then just say, oh, I know those people.
00:55:26.160 | They're doing something very similar.
00:55:28.320 | We learned about both early on, actually.
00:55:31.000 | Because we know them both quite well.
00:55:33.480 | And it's funny how I think we're all seeing the same problems
00:55:36.120 | and just applying, trying to fix the same problems that we're
00:55:39.720 | all seeing.
00:55:40.720 | I think the Ollama one's particularly
00:55:42.720 | funny because I joined Docker through my startup.
00:55:48.300 | Funnily, actually, the thing which worked from my startup
00:55:50.640 | was Compose, but we were actually
00:55:52.400 | working on another thing, which was a bit like EC2 for Docker.
00:55:55.640 | So we were working on productionizing Docker
00:55:57.440 | containers, and the Ollama guys were working
00:56:01.400 | on a thing called Kitematic, which was a bit
00:56:03.960 | like a desktop app for Docker.
00:56:08.760 | And our companies both got bought by Docker
00:56:10.560 | at the same time.
00:56:12.640 | And Kitematic turned into Docker Desktop,
00:56:15.880 | and then our thing then turned into Compose.
00:56:19.400 | And it's funny how we're both applying the things we saw
00:56:23.480 | at Docker to the AI world, where they're
00:56:25.640 | building the local environment for us,
00:56:28.000 | and we're building the cloud for it.
00:56:30.960 | And yeah, so that's just really pleasing.
00:56:33.640 | And I think we're collaborating closely
00:56:36.460 | because there's just so much opportunity for working there.
00:56:39.720 | FRANCESC CAMPOY: When you have a hammer, everything's a nail.
00:56:42.260 | MARK MANDEL: Yeah, exactly, exactly.
00:56:43.840 | And I think a lot of--
00:56:46.560 | this is-- I mean, where we're coming from a lot with AI
00:56:50.560 | is we're taking a lot of things that--
00:56:52.880 | because we're all kind of, on the Replicate team,
00:56:55.680 | we're all kind of people who have built developer
00:56:57.960 | tools in the past.
00:56:58.720 | So we've got a team--
00:56:59.600 | like, I worked at Docker.
00:57:01.240 | We've got people who worked at Heroku, and GitHub,
00:57:04.000 | and the iOS ecosystem, and all this kind of thing.
00:57:07.160 | Like, the previous generation of developer tools,
00:57:10.600 | where we figured out a bunch of stuff,
00:57:13.320 | and then AI has come along.
00:57:14.960 | And we just don't yet have those tools and abstractions
00:57:18.840 | to make it easy to use.
00:57:20.440 | So we're trying to take the lessons
00:57:22.080 | that we learned from the previous generation of stuff
00:57:24.440 | and apply it to this new generation of stuff.
00:57:26.840 | And obviously, there's a bit of nuance there,
00:57:28.720 | because the trick is to take the right lessons
00:57:30.960 | and do new stuff where it makes sense.
00:57:33.320 | You can't just cut and paste, you know?
00:57:36.280 | But that's how we're approaching this,
00:57:38.120 | is we're trying to, as much as possible,
00:57:40.280 | take some of those lessons we learned from how Heroku and
00:57:44.280 | GitHub was built, for example, and apply them to AI.
00:57:48.960 | FRANCESC CAMPOY: Excellent.
00:57:50.200 | We should also talk a little bit about your compute
00:57:54.360 | availability.
00:57:55.040 | We're trying to ask this of all--
00:57:56.420 | it's Compute Provider Month.
00:57:58.560 | Do you own your own GPUs?
00:57:59.960 | How many do you have access to?
00:58:02.080 | What do you feel about the tightness of the GPU market?
00:58:05.800 | ALEX DANILO: We don't own our own GPUs.
00:58:07.440 | We've got a few that we play around with,
00:58:09.160 | but not for production workloads.
00:58:11.620 | And we are primarily built on just public clouds,
00:58:14.560 | so primarily GCP and CoreWeave, and some smatterings elsewhere.
00:58:18.960 | And--
00:58:21.360 | FRANCESC CAMPOY: Not from NVIDIA, which is your newest
00:58:23.560 | investor?
00:58:24.160 | ALEX DANILO: We work with NVIDIA.
00:58:25.720 | So they're kind of helping us get GPU availability.
00:58:29.400 | GPUs are hard to get hold of. If you go to AWS
00:58:38.880 | and ask for one A100, they won't give you an A100.
00:58:42.480 | But if you go to AWS and say, I would like 100 A100s for two
00:58:45.080 | years, they're like, sure, we've got some.
00:58:47.760 | I think the problem is the cloud providers.
00:58:50.600 | The cloud providers, that makes sense from their point of view.
00:58:54.440 | They want just reliable, sustained usage.
00:58:57.160 | They don't want spiky usage and wastage
00:58:59.160 | in their infrastructure, which makes total sense.
00:59:01.280 | But that makes it really hard for startups
00:59:04.080 | who are wanting to just get a hold of GPUs.
00:59:06.380 | I think we're in a fortunate position
00:59:07.880 | where we can aggregate demand, so we can make commits
00:59:11.040 | to cloud providers.
00:59:12.920 | And then we actually have good availability.
00:59:16.600 | It's not-- we don't have infinite availability,
00:59:20.720 | obviously, but if you want an A100 from Replicate,
00:59:22.840 | you can get it.
00:59:23.480 | But we're seeing other companies pop up as well.
00:59:30.040 | I guess SF Compute's a great example of this,
00:59:31.880 | where they're doing the same idea for training almost,
00:59:34.120 | where a lot of startups need to be able to train a model,
00:59:37.560 | but they can't get hold of GPUs from large cloud providers.
00:59:39.980 | So SF Compute is letting people rent 10 H100s for two days,
00:59:44.760 | which is just impossible otherwise.
00:59:46.200 | And what they're effectively doing there
00:59:47.880 | is they're aggregating demand such that they can make
00:59:49.880 | a big commit to the cloud provider
00:59:50.960 | and then let people use smaller chunks of it.
00:59:52.920 | And that's what we're doing with Replicate as well,
00:59:54.540 | where we're aggregating demand such that we make big commits
00:59:57.080 | to the cloud providers.
00:59:58.280 | And then people can run a 100-millisecond API request
01:00:03.080 | on an A100.
01:00:04.200 | FRANCESC CAMPOY: Coming from a finance background,
01:00:06.280 | this sounds surprisingly similar to banks,
01:00:08.900 | where the job of a bank is maturity transformation,
01:00:12.400 | is what you call it.
01:00:14.040 | You take short-term deposits, which technically
01:00:16.000 | can be withdrawn at any time, and you turn that
01:00:17.920 | into long-term loans for mortgages and stuff.
01:00:20.440 | And you pocket the difference in interest.
01:00:22.560 | And that's the bank.
01:00:24.000 | MARK MANDEL: Yeah, that's exactly what we're doing.
01:00:26.080 | FRANCESC CAMPOY: So you run a bank.
01:00:27.080 | MARK MANDEL: Yeah, a GPU bank.
01:00:28.360 | Yeah, and it's so much a finance problem
01:00:31.000 | as well, because we have to make bets on the future demand
01:00:35.240 | and the value of GPUs.
01:00:37.560 | FRANCESC CAMPOY: What are you--
01:00:39.200 | OK, I don't know how much you can disclose,
01:00:41.000 | but what are you forecasting?
01:00:43.520 | Up, down?
01:00:44.840 | Up a lot?
01:00:45.720 | Up 10x?
01:00:46.680 | MARK MANDEL: I can't really--
01:00:48.120 | we're projecting our growth with some educated guesses
01:00:50.640 | about what kind of models are going to come out
01:00:52.600 | and what kind of models these will run.
01:00:54.720 | So we need to bet that, OK, maybe language
01:00:58.160 | models are getting larger.
01:00:59.200 | So we need to have GPUs with a lot of RAM, or multi-GPU nodes,
01:01:03.220 | or maybe models are getting smaller.
01:01:04.720 | We actually need smaller GPUs.
01:01:06.040 | We have to make some educated guesses
01:01:07.280 | about that kind of stuff.
01:01:08.360 | FRANCESC CAMPOY: Yeah.
01:01:09.280 | Speaking of which, the mixture of experts' models
01:01:11.760 | must be throwing a spanner into the planning.
01:01:15.800 | MARK MANDEL: Not so much.
01:01:16.840 | I mean, we've got multi-node A100 machines,
01:01:20.320 | which can run this, and multi-node H100 machines,
01:01:22.400 | which can run those, no problem.
01:01:23.740 | So we're set up for that world.
01:01:30.440 | FRANCESC CAMPOY: OK, I didn't expect it to be so easy.
01:01:33.920 | My impression was that the amount of RAM per model
01:01:37.280 | is increasing a lot, especially on a sort of per parameter
01:01:40.960 | basis, per active parameter basis,
01:01:43.640 | going from Mixtral being eight experts to the DeepSeek
01:01:47.840 | MoE models--
01:01:48.740 | I don't know if you saw them--
01:01:50.040 | being like 30, 60 experts.
01:01:53.480 | And you can see it keep going up, I guess.
01:01:55.360 | MARK MANDEL: I think we might run into problems at some point.
01:01:58.200 | And yeah, I don't know exactly what's going on there.
01:02:04.080 | I think something that we're finding, which is kind of
01:02:06.600 | interesting-- like, I don't know this in depth.
01:02:10.440 | But we're certainly seeing a lot of good results
01:02:16.000 | from lower-precision models.
01:02:19.920 | So 90% of the performance with just much less RAM required.
01:02:25.840 | And that means that we can run them on GPUs we have available.
01:02:30.720 | And it's good for customers as well, because it runs faster.
01:02:33.400 | And they want that trade-off of where it's just slightly worse,
01:02:37.080 | but way faster and cheaper.
01:02:39.480 | FRANCESC CAMPOY: Do you see a lot of GPU waste
01:02:41.760 | in terms of people running the thing on a GPU that
01:02:44.560 | is too advanced?
01:02:45.440 | I think we use a T4 to run Whisper.
01:02:48.360 | So we're at the bottom end of it.
01:02:51.280 | Yeah, any thoughts?
01:02:52.080 | I think at one of the hackathons we were at,
01:02:54.920 | people were like, oh, how do I get access to like H100s?
01:02:57.880 | And it's like, you need to run [INTERPOSING VOICES]
01:03:00.440 | It's like, you don't need an H100.
01:03:02.320 | Yeah, yeah.
01:03:03.560 | Well, if you want low latency, sure.
01:03:06.840 | Like, spend a lot of money on an H100.
01:03:09.840 | Yeah, we see a ton of that kind of stuff.
01:03:11.560 | And it's surprisingly hard to optimize these models right now.
01:03:19.360 | So a lot of people are just running really
01:03:21.160 | unoptimized models.
01:03:22.120 | We're doing the same, honestly.
01:03:23.420 | Like, a lot of models on Replicate
01:03:25.080 | have just not been optimized very well.
01:03:28.200 | So something we want to be able to help people with
01:03:31.600 | is optimizing those models.
01:03:33.840 | Like, either we show people how to with guides,
01:03:37.960 | or we make it easier to use some of these more optimized
01:03:41.000 | inference servers, or we show people
01:03:43.200 | how to compile the models, or we do that automatically,
01:03:46.520 | or something like that.
01:03:47.600 | But that's only something we're exploring.
01:03:49.340 | Like, there's so much wastage.
01:03:50.780 | Like, it's not just wasting the GPUs.
01:03:52.520 | It's also a bad experience, and the models run slow.
01:03:55.440 | So a lot of the models on Replicate--
01:03:57.560 | some of the most popular models on Replicate we have--
01:04:00.640 | so the models on Replicate are almost all
01:04:03.840 | pushed by our community.
01:04:05.280 | Like, people have pushed those models themselves.
01:04:07.600 | But it's like a big head of distribution,
01:04:09.560 | where there's like a long tail of lots of models
01:04:11.560 | that people have pushed, and then a big head of the models
01:04:15.160 | most people run.
01:04:16.520 | So models like Llama 2, like Stable Diffusion,
01:04:23.460 | we work with Meta and Stability to maintain those models.
01:04:26.480 | And we've done a ton of optimization
01:04:28.020 | to make those really fast.
01:04:29.660 | So yeah, those models are optimized,
01:04:31.820 | but the long tail is not.
01:04:32.900 | And there's a lot of wastage there.
01:04:35.260 | And going into the-- well, it's already the new year.
01:04:38.620 | Do you see the customer demand and the GPU hardware
01:04:42.660 | demand kind of staying together?
01:04:44.300 | Because I think a lot of people are saying,
01:04:46.100 | there's like hundreds of thousands of GPUs
01:04:48.140 | being shipped this year.
01:04:49.140 | Like, the crunch is going to be over.
01:04:51.220 | But you also have millions of people
01:04:52.720 | that now care about using AI.
01:04:55.220 | How do you see the two lines progressing?
01:04:57.100 | Are you seeing customer demand is
01:04:59.900 | going to outpace the GPU growth?
01:05:01.460 | Do you see them together?
01:05:02.820 | Do you see maybe a lot of this model improvement work
01:05:06.900 | kind of helping alleviate that?
01:05:09.420 | From our point of view, demand is not
01:05:11.060 | outpacing supply of GPUs.
01:05:12.740 | We have enough-- from my point of view,
01:05:14.400 | we have enough GPUs to go around.
01:05:15.780 | And that might change, for sure.
01:05:18.180 | Yeah.
01:05:18.680 | That's a very nicely put way, as a startup founder, to respond.
01:05:25.460 | Yeah, I'll maybe get into a little bit of this on the--
01:05:28.320 | you said optimizing models.
01:05:29.500 | Actually, so when Alessio talked about GPU waste, he was more--
01:05:33.540 | oh, that's you.
01:05:34.380 | Sorry.
01:05:36.020 | Yeah, it is getting a little bit warm in here,
01:05:38.260 | some greenhouse gas effect.
01:05:40.740 | So Alessio framed it more as sort
01:05:42.660 | of picking the wrong box for the model, whereas yours
01:05:44.780 | is more about maybe the inference stack,
01:05:48.340 | if you can call it that.
01:05:49.380 | Were you referencing vLLM?
01:05:52.100 | What other sort of techniques are you referencing?
01:05:55.100 | And also keeping in mind that when
01:05:57.340 | I talk to your competitors, and I don't know if--
01:05:59.940 | we don't have to name any of them,
01:06:01.360 | but they are working on trying to optimize
01:06:03.340 | the kinds of models.
01:06:04.460 | Basically, they'll quantize their models for you
01:06:06.740 | with their special stack.
01:06:08.200 | So you basically use their versions of Llama 2.
01:06:11.420 | You use their versions of Mistral.
01:06:13.260 | And that's one way to approach it.
01:06:16.180 | I don't see it as the Replicate DNA to do that,
01:06:18.460 | because that would be sort of--
01:06:20.140 | you would have to slap the Replicate house brand
01:06:22.180 | on something, which--
01:06:23.820 | I mean, just comment on any of that.
01:06:25.380 | Like, what do you mean when you say optimize models?
01:06:27.700 | Yeah, I mean, things like quantizing the models,
01:06:30.240 | you can imagine a way that we could help people quantize
01:06:32.580 | their models if we want to.
01:06:38.140 | We've had success using inference servers like vLLM
01:06:43.700 | and TRT-LLM, and we're using those kind of things
01:06:47.220 | to serve language models.
01:06:48.980 | We've had success with things like AITemplate, which
01:06:52.180 | compiles the models, all of those kind of things.
01:06:57.340 | And there's some even really just boring things
01:06:59.380 | of just making the code more efficient.
01:07:02.860 | Like, some people, when they're just writing some Python code,
01:07:05.780 | it's really easy to just write inefficient Python code.
01:07:09.140 | There's really boring things like that as well.
01:07:11.580 | But it's like a whole smash of things like that.
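As one concrete example of the inference-server side of that, here is a minimal sketch of serving a lower-precision checkpoint with vLLM. The model repo and quantization settings are assumptions for illustration, not a description of how Replicate runs things internally.

```python
# Minimal sketch: offline inference with vLLM on a quantized checkpoint.
from vllm import LLM, SamplingParams

llm = LLM(
    model="TheBloke/Mistral-7B-Instruct-v0.1-AWQ",  # assumed AWQ-quantized repo
    quantization="awq",                             # less VRAM, small quality hit
)
params = SamplingParams(temperature=0.7, max_tokens=256)

outputs = llm.generate(["Explain what a GPU does in one paragraph."], params)
print(outputs[0].outputs[0].text)
```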
01:07:14.220 | FRANCESC CAMPOY: So you will do that for a customer?
01:07:16.380 | You look at their code and--
01:07:17.620 | MARK MANDEL: Yeah, we've certainly
01:07:18.700 | helped some of our customers be able to do some of that stuff,
01:07:21.140 | yeah.
01:07:22.080 | And a lot of the models on--
01:07:23.820 | like, the popular models on Replicate,
01:07:25.460 | we've rewritten them to use that stuff as well.
01:07:28.860 | And the stable diffusion that we run, for example,
01:07:31.260 | is compiled with AITemplate to make it super fast.
01:07:34.260 | And it's all open source, that you
01:07:35.720 | can see all of this stuff on GitHub
01:07:37.220 | if you want to see how we do it.
01:07:40.420 | But you can imagine ways that we could help people.
01:07:43.340 | It's almost like built into the Cog layer maybe,
01:07:45.380 | where we could help people use these fast inference servers
01:07:48.420 | or use AITemplate to compile their models to make them faster.
01:07:51.980 | Whether it's manual, semi-manual, or automatic,
01:07:54.480 | we're not really sure.
01:07:55.360 | But that's something we want to explore,
01:07:57.020 | because that benefits everyone.
01:07:58.480 | FRANCESC CAMPOY: Yeah, awesome.
01:07:59.780 | And then on the competitive piece,
01:08:02.060 | there was a price war on Mixtral last year, this last December.
01:08:06.620 | As far as I can tell, you guys did not enter that war.
01:08:09.780 | You have Mixtral, but it's just regular pricing.
01:08:13.940 | I think also some of these players
01:08:16.660 | are probably losing money on their pricing.
01:08:20.260 | You don't have to say anything, but the break-even
01:08:23.020 | is somewhere between $0.50 to $0.75 per million tokens served.
01:08:28.500 | How are you thinking about just the overall competitiveness
01:08:31.020 | in the market?
01:08:32.340 | How should people choose when everyone's an API?
01:08:36.420 | So for Llama 2 and Mixtral--
01:08:41.540 | I think not Mixtral, but I can't remember exactly--
01:08:44.340 | we have similar performance and similar price
01:08:48.420 | to some of these other services.
01:08:50.540 | We're not bargain basement to some of the others,
01:08:54.220 | because to your point, we don't want to burn tons of money.
01:08:58.780 | But we're pricing it sensibly and sustainably to a point
01:09:02.940 | where we think it's competitive with other people, such
01:09:05.980 | that-- the thing we don't want--
01:09:08.260 | we want developers using Replicate.
01:09:10.700 | And we don't want to price it such that it's only
01:09:14.740 | affordable by big companies.
01:09:16.120 | We want to make it cheap enough such
01:09:18.340 | that the developers can afford it.
01:09:19.740 | But we also don't want the super cheap prices,
01:09:22.700 | because then it's almost like then your customers are hostile.
01:09:26.780 | And the more customers you get, the worse it gets.
01:09:29.980 | So we're pricing it sensibly, but still to the point
01:09:33.020 | where hopefully it's cheap enough to build on.
01:09:39.020 | And I think the thing we really care about--
01:09:43.700 | obviously, we want models and Replicate
01:09:45.980 | to be comparable to other people.
01:09:48.100 | But I think the really crucial thing about Replicate
01:09:50.460 | and the way I think we think about it
01:09:52.260 | is that it's not just the API for them.
01:09:55.120 | Particularly in open source, it's
01:09:56.460 | not just the API for the model that is the important bit.
01:09:59.260 | Because quite often with open source models,
01:10:03.900 | the whole point of open source is that you can tinker on it,
01:10:05.860 | and you can customize it, and you can fine tune it,
01:10:07.940 | and you can smush it together with another model,
01:10:10.180 | like LLaVA, for example.
01:10:13.020 | And you can't do that if it's just a hosted API,
01:10:15.620 | because you can't touch the code.
01:10:21.900 | So what we want to do with Replicate
01:10:26.700 | is build a platform that's actually open.
01:10:29.260 | So we've got all of these models where the performance and price
01:10:32.940 | is on par with everything else.
01:10:35.220 | But if you want to customize it, you can fine tune it.
01:10:37.540 | You can go to GitHub and get the source code for it,
01:10:39.660 | and edit the source code, and push up your own custom
01:10:41.260 | version, and this kind of thing.
01:10:42.980 | Because that's the crucial thing for open source machine
01:10:47.500 | learning, is being able to tinker on it and customize it.
01:10:50.180 | And we think that's really important to make
01:10:55.180 | open source AI work.
01:10:58.020 | You mentioned open source.
01:10:59.820 | How do you think about levels of openness?
01:11:01.820 | When Lama 2 came out, I wrote a post about this.
01:11:05.620 | It's like open source, and there's open weights,
01:11:07.940 | then there's restrictive weights.
01:11:09.860 | It was on the front page of "Hacker News,"
01:11:11.620 | so there was all sort of comments from people.
01:11:14.140 | So I'm always curious to hear your thoughts.
01:11:16.740 | What do you think is OK for people to license?
01:11:20.500 | What's OK for people to not release?
01:11:23.740 | Yeah.
01:11:24.500 | Yeah, we're seeing-- I mean, before, it
01:11:26.620 | was just like closed source, big models,
01:11:29.060 | open source, little models, purely open source stuff.
01:11:33.660 | And we're now seeing lots of variations
01:11:35.780 | where model companies putting restrictive licenses
01:11:40.060 | on their models.
01:11:41.980 | That means it can only be used for non-commercial use.
01:11:44.460 | And a lot of the open source crowd
01:11:47.940 | is complaining it's not true open source,
01:11:50.620 | all this kind of thing.
01:11:52.420 | And I think a lot of that is coming from philosophy,
01:11:56.180 | the sort of free software movement kind of philosophy.
01:11:59.260 | And I don't think it's necessarily a bad thing.
01:12:01.940 | I think it's good that model companies can make money out
01:12:04.500 | of their models.
01:12:05.340 | That's what incentivizes people
01:12:07.420 | to make more models and this kind of thing.
01:12:09.180 | And I think it's totally fine for somebody who made something
01:12:11.540 | to ask for some money in return if you're
01:12:13.820 | making money out of it.
01:12:14.780 | And I think that's totally OK.
01:12:16.140 | And I think there's some really interesting midpoints, as well,
01:12:18.720 | where people are releasing the code.
01:12:20.220 | So you can still tinker on it.
01:12:21.460 | But the person who trained the model
01:12:23.140 | still wants to get a cut of it if you're making
01:12:25.140 | a bunch of money out of it.
01:12:26.260 | And I think that's good.
01:12:28.300 | And that's going to make the ecosystem more sustainable.
01:12:31.600 | And I think we're just going to see--
01:12:33.220 | I don't think anybody's really figured it out yet.
01:12:34.780 | And we're going to see more experimentation with this
01:12:37.420 | and more people try to figure out, hmm,
01:12:39.780 | what are the business models around building models?
01:12:42.020 | And how can I make money out of this?
01:12:43.900 | And we'll just see where it ends up.
01:12:45.860 | And I think it's something we want to support as Replicate,
01:12:49.420 | as well, because we believe in open source.
01:12:51.980 | We think it's great.
01:12:53.140 | But there's also going to be lots of models which
01:12:56.820 | are closed source, as well.
01:12:57.980 | And these companies might not be--
01:13:00.240 | there's probably going to be a long tail
01:13:01.940 | of a bunch of people building models that don't have
01:13:04.700 | the reach that OpenAI have.
01:13:06.620 | And hopefully, as Replicate, we can
01:13:08.380 | help those people find developers
01:13:10.980 | and help them make money and that kind of thing.
01:13:13.460 | I think the compute requirements of AI kind of changed the thing.
01:13:16.860 | I started an open source company.
01:13:18.420 | I'm a big open source fan.
01:13:19.780 | And before, it was kind of like man hours were really
01:13:23.060 | all that went into open source.
01:13:24.300 | It wasn't much monetary investment.
01:13:27.340 | Well, not that man hours are not worth a lot.
01:13:30.260 | But if you think about Llama 2, it's like $25 million all in.
01:13:36.420 | It's like you can't just spin up a Discord
01:13:38.620 | and spend $25 million.
01:13:40.140 | So I think it's net positive for everybody
01:13:43.100 | that Llama 2 is open source.
01:13:44.340 | And, well, is the open source term--
01:13:48.860 | I think people, like you're saying,
01:13:50.460 | they kind of argue on the semantics of it.
01:13:53.740 | But all we care about is that Llama 2 is open.
01:13:56.620 | Because if Llama 2 wasn't open source today,
01:13:59.180 | if Mistral was not open source, we would be in a bad spot.
01:14:03.100 | And I think the nuance here is making sure
01:14:05.060 | that these models are still tinkerable.
01:14:06.860 | Because the beautiful thing about Llama 2 as a base model
01:14:11.260 | is that, yeah, it costs $25 million to train to start with.
01:14:14.260 | But then you can fine tune it for like $50.
01:14:17.700 | And that's what's so beautiful about the open source ecosystem
01:14:21.340 | and something I think is really surprising as well.
01:14:23.420 | It completely surprised me.
01:14:25.500 | I think a lot of people assumed that it's not
01:14:30.220 | going to be-- open source machine learning is just not
01:14:32.660 | going to be practical because it's so
01:14:33.740 | expensive to train these models.
01:14:35.080 | But fine tuning is unreasonably effective.
01:14:37.540 | And people are getting really good results out of it.
01:14:39.740 | And it's really cheap.
01:14:40.900 | So people can effectively create open source models really
01:14:46.000 | cheaply.
01:14:46.580 | And there's going to be this sort of ecosystem
01:14:49.180 | of tons of models being made.
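To give a sense of how mechanical that fine-tuning step can be, here is a minimal sketch using the Replicate Python client's training API. The base model version, destination, and input fields are illustrative assumptions; each trainable model documents its own expected inputs.

```python
# Sketch: kick off a fine-tune of an open model on Replicate.
import replicate

training = replicate.trainings.create(
    version="meta/llama-2-7b:VERSION_HASH",          # hypothetical base model pin
    input={
        "train_data": "https://example.com/dataset.jsonl",  # placeholder dataset URL
        "num_train_epochs": 3,                              # assumed input name
    },
    destination="my-username/my-fine-tuned-llama",   # where the new weights land
)
print(training.status)  # poll this (or use webhooks) until the training finishes
```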
01:14:50.860 | And I think the risk there from a licensing point of view
01:14:53.260 | is we need to make sure that the licenses let people do that.
01:14:56.380 | Because if you release a big model
01:14:58.020 | under a non-commercial license and people can't fine tune it,
01:15:00.860 | you've lost the magic of it being open.
01:15:03.460 | And I'm sure there are ways to structure that such
01:15:05.500 | that the person paying $25 million
01:15:07.020 | feels like they're compensated somehow
01:15:10.220 | and they can feel like they should keep on training models.
01:15:14.300 | And people can keep on fine tuning it.
01:15:15.900 | But I guess we just have to figure out
01:15:17.620 | exactly how that plays out.
01:15:20.060 | FRANCESC CAMPOY: Excellent.
01:15:21.380 | So just wanted to round it out.
01:15:23.340 | You've been an excellent, very open guest so far.
01:15:28.620 | I actually kind of--
01:15:29.580 | I should have started my intro with this.
01:15:31.700 | But I feel like you found the AI engineer crew before I did.
01:15:36.140 | And something that really resonated
01:15:37.940 | with you in the Series B announcement
01:15:40.480 | was that you put in some stats here
01:15:42.260 | about how there are two orders of magnitude more software
01:15:45.300 | engineers than there are machine learning engineers, about 30
01:15:47.800 | million software engineers and 500,000 machine learning
01:15:50.060 | engineers.
01:15:50.900 | You can maybe plus/minus one of those orders of magnitude,
01:15:53.360 | but it's around that ballpark.
01:15:54.700 | And so obviously, there will be a lot more AI engineers
01:15:57.280 | than there will be ML engineers.
01:15:59.460 | How do you see this group?
01:16:01.540 | Like, is it all software engineers?
01:16:03.620 | Are they going to specialize?
01:16:06.940 | What would you advise someone trying
01:16:08.940 | to become an AI engineer?
01:16:10.620 | Is this a legitimate career path?
01:16:14.020 | Yeah, absolutely.
01:16:14.980 | I mean, it's very clear that AI is
01:16:18.580 | going to be a large part of how we build software in the future.
01:16:21.540 | It's a bit like being a software developer in the '90s
01:16:24.340 | and ignoring the internet.
01:16:26.580 | You just need to learn about this stuff.
01:16:28.660 | You need to figure this stuff out.
01:16:31.060 | I don't think it needs to be--
01:16:33.260 | you don't need to be super low level.
01:16:36.700 | You don't need to be like--
01:16:38.300 | the metaphor here is that you don't
01:16:40.540 | need to be digging down into this sort of PyTorch level
01:16:45.280 | if you don't want to.
01:16:46.880 | In the same way as a software engineer in the '90s,
01:16:49.700 | you don't need to be understanding
01:16:51.260 | how network stacks work to be able to build a website.
01:16:53.380 | But you need to understand the shape of this thing
01:16:55.000 | and how to hold it and what it's good at and what it's not.
01:16:57.780 | And that's really important.
01:17:02.340 | So yeah, I certainly just advise people
01:17:04.580 | to just start playing around with it.
01:17:06.260 | Get a feel of how language models work.
01:17:08.340 | Get a feel of how these diffusion models work.
01:17:11.900 | Get a feel of what fine tuning is
01:17:14.620 | and how it works, because some of your job
01:17:17.500 | might be building data sets.
01:17:18.940 | Get a feeling of how prompting works,
01:17:20.500 | because some of your job might be writing a prompt.
01:17:22.620 | And those are just all really important skills
01:17:26.340 | to sort of figure out.
01:17:29.900 | Well, thanks for building the definitive platform
01:17:32.180 | for doing all that.
01:17:33.620 | Yeah, of course.
01:17:34.980 | Any final call to actions?
01:17:37.020 | Who should come work at Replicate?
01:17:39.460 | Yeah, anything for the audience?
01:17:41.740 | Yeah, I mean, we're hiring.
01:17:43.900 | If you click on Jobs at the bottom of replicate.com,
01:17:47.780 | there's some jobs.
01:17:50.380 | And I just encourage you to just try out AI,
01:17:54.100 | even if you think you're not smart enough.
01:17:56.260 | Like, the whole reason I started this company
01:17:58.100 | is because I was looking at the cool stuff
01:17:59.560 | that Andreas was making.
01:18:00.320 | Like, Andreas is like a proper machine learning person
01:18:02.560 | with a PhD.
01:18:03.660 | And I was just like a sort of lonely software engineer.
01:18:07.180 | And I was like, you're doing really cool stuff,
01:18:09.140 | and I want to be able to do that.
01:18:11.380 | And by us working together, we've
01:18:13.740 | now made it accessible to dummies like me.
01:18:17.260 | And I just encourage anyone who wants to try this stuff out,
01:18:20.700 | just give it a try.
01:18:21.580 | And I think I would also encourage
01:18:24.020 | people who are tool builders.
01:18:25.660 | Like, the limiting factor now on AI is not like the technology.
01:18:29.460 | Like, the technology has made incredible advances.
01:18:31.980 | And there's just so many incredible machine learning
01:18:34.980 | models that can do a ton of stuff.
01:18:37.500 | The limiting factor is just like making that accessible
01:18:40.380 | to people who build products.
01:18:41.900 | Because it's really hard to use this stuff right now.
01:18:44.300 | And obviously, we're building some of that stuff
01:18:45.860 | as Replicate.
01:18:46.380 | But there's just like a ton of other tooling and abstractions
01:18:49.180 | that need to be built out to make this stuff usable.
01:18:51.580 | So I just encourage people who like building developer tools
01:18:54.620 | to just like get stuck into it as well.
01:18:56.220 | Because that's going to make this stuff accessible
01:18:58.260 | to everyone.
01:18:58.900 | FRANCESC CAMPOY: Yeah.
01:18:59.820 | I especially want to highlight you have a Hacker-in-Residence
01:19:02.380 | job opening available, which not every company has,
01:19:04.580 | which means just join you and hack stuff.
01:19:07.380 | I think Charlie Holtz is doing a fantastic job of that.
01:19:09.660 | CHRIS BANES: Yep.
01:19:10.380 | Effectively, most of our--
01:19:12.060 | a lot of our job is just like showing people how to use AI.
01:19:15.660 | So we've just got a team of software developers.
01:19:17.700 | And people have kind of figured this stuff out
01:19:19.820 | who are writing about it, who are making videos about it,
01:19:25.340 | who are making example applications
01:19:26.740 | to show people what you can do with this stuff.
01:19:28.700 | FRANCESC CAMPOY: Yeah.
01:19:29.240 | In my world, that used to be called DevRel.
01:19:31.440 | But now it's Hacker-in-Residence.
01:19:33.620 | And that's--
01:19:34.460 | [LAUGHTER]
01:19:34.960 | CHRIS BANES: Yeah, this came from--
01:19:37.540 | Zeke is another one of our hackers.
01:19:40.680 | FRANCESC CAMPOY: Tell me this came from Chroma.
01:19:42.980 | To start that one.
01:19:43.780 | CHRIS BANES: We developed--
01:19:45.220 | Anton actually was like, hey, we came up with that first.
01:19:47.900 | But I think we came up with it independently.
01:19:49.140 | FRANCESC CAMPOY: Yeah, I made that page, yeah.
01:19:50.060 | CHRIS BANES: I think we came up with it independently.
01:19:52.300 | Because the story behind this is we originally
01:19:55.700 | called it the DevRel team.
01:19:56.940 | And--
01:19:58.020 | FRANCESC CAMPOY: DevRel is cursed now.
01:19:59.620 | No one wants to listen to DevRel.
01:20:00.980 | CHRIS BANES: And Zeke was like, that sounds so boring.
01:20:03.300 | I have to go to someone and say I'm a developer relations
01:20:05.860 | person.
01:20:06.360 | FRANCESC CAMPOY: You don't want to be a hacker man.
01:20:07.500 | CHRIS BANES: Or a developer advocate or something.
01:20:09.340 | So we were like, OK, what's the way
01:20:11.020 | we can make this sound the most fun?
01:20:13.020 | All right, you're a hacker.
01:20:14.920 | I would say that is consistently the vibe
01:20:17.020 | I get from Replicate, everyone on your team I interact with.
01:20:20.260 | When I go to your San Francisco office,
01:20:22.740 | that's the vibe that you're generating.
01:20:24.660 | It's a hacker space more than an office.
01:20:27.060 | And you hold fantastic meetups there.
01:20:28.860 | And I think you're a really positive presence
01:20:30.660 | in our community.
01:20:31.460 | So thank you for doing all that.
01:20:33.020 | And it's instilling the hacker vibe and culture into AI.
01:20:36.980 | I'm really glad that's working.
01:20:38.700 | Yeah.
01:20:39.340 | Cool.
01:20:39.820 | That's a wrap, I think.
01:20:41.140 | Thank you so much for coming on, man.
01:20:42.660 | Yeah, of course.
01:20:43.060 | Thank you.
01:20:43.560 | This was a lot of fun.
01:20:45.340 | [MUSIC PLAYING]
01:20:48.700 | [MUSIC ENDS]