A Brief History of the Open Source AI Hacker - with Ben Firshman of Replicate
Chapters
0:00 Introductions
1:22 Low latency is all you need
4:39 Evolution of CLIs
6:47 How building Arxiv Vanity led to Replicate
13:13 Making ML research replicable with containers
19:47 Doing YC in 2020 and pivoting to tools for COVID
23:11 Launching the first version of Replicate
29:26 Embracing the generative image community
31:58 Getting reverse engineered into an API product
35:54 Growing to 2 million users
39:37 Indie vs Enterprise customers
42:58 How customers use Replicate
44:30 Learnings from Docker that went into Cog
52:24 Creating AI standards
57:49 Replicate's compute availability
62:38 Fixing GPU waste
70:58 What's open source AI?
75:19 Building for AI engineers
77:33 Hiring at Replicate
00:00:02.640 |
This is Alessio, Partner and CTO-in-Residence at Decibel Partners. 00:00:06.080 |
And I'm joined by my co-host, Swyx, founder of Smol.ai. 00:00:09.360 |
Hey, and today we have Ben Firshman in the studio. 00:00:14.560 |
Ben, you're a co-founder and CEO of Replicate. 00:00:17.280 |
Before that, you were most notably creator of Fig, 00:00:21.120 |
or founder of Fig, which became Docker Compose. 00:00:24.160 |
You also did a couple other things before that. 00:00:26.480 |
But that's what a lot of people know you for. 00:00:37.600 |
I think I'm a builder and tinkerer in a very broad sense. 00:00:43.320 |
So I work on things maybe a bit closer to tech, 00:00:55.040 |
and build bicycles, and all this kind of stuff. 00:01:01.560 |
from transferable skills, from just working in the real world 00:01:08.400 |
And there's so much about being a builder, both in real life 00:01:14.240 |
Is there a real-world analogy that you use often 00:01:16.160 |
when you're thinking about a code architecture problem? 00:01:22.040 |
I like to build software tools as if they were something real. 00:01:33.240 |
so I wrote this thing called the command line interface 00:01:36.520 |
guidelines, which was a bit like sort of the Mac human interface guidelines. 00:01:41.400 |
I did it with the guy I created Docker Compose with 00:01:53.120 |
I think I described that your command line interface should 00:01:55.460 |
feel like a big iron machine, where you pull a lever 00:02:00.520 |
And things should respond within 50 milliseconds, 00:02:07.120 |
And another analogy here is in the real life, 00:02:10.040 |
you know when you press a button on an electronic device 00:02:13.000 |
and it's like a soft switch, and you press it, 00:02:15.080 |
and nothing happens, and there's no physical feedback 00:02:19.040 |
And then half a second later, something happens? 00:02:24.540 |
like something that's real, where you touch-- 00:02:26.440 |
you pull a physical lever and the physical lever moves. 00:02:29.760 |
And I've taken that lesson of human interface 00:02:37.600 |
really solid and robust, both the command lines and user 00:02:44.240 |
And how did you operationalize that for Fig or Docker? 00:02:50.320 |
Actually, we didn't do it very well for Fig and [INAUDIBLE] 00:02:56.000 |
where Python's really hard to get booting up fast, 00:02:58.840 |
because you have to load up the whole Python runtime 00:03:02.800 |
Go is much better at this, where Go just instantly starts. 00:03:07.880 |
So you have to be under 500 milliseconds to start up? 00:03:16.200 |
being immediate is something like 100 milliseconds. 00:03:27.560 |
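The 50- and 100-millisecond budgets above can be made concrete. This is a minimal sketch of the principle (my own illustration, not code from the CLI guidelines; `run_with_feedback`, `task`, and `label` are hypothetical names): acknowledge the command within the immediacy budget even when the underlying work is slow.

```python
import sys
import time

def run_with_feedback(task, label):
    """Print feedback immediately, then do the (possibly slow) work.

    The lever-pulling principle: the *acknowledgment* should land well
    under ~100ms even if the task itself takes much longer.
    """
    start = time.monotonic()
    print(f"{label}...", file=sys.stderr, flush=True)  # immediate response
    ack_latency = time.monotonic() - start             # how fast we acknowledged
    result = task()                                    # the actual slow work
    print(f"{label} done", file=sys.stderr, flush=True)
    return result, ack_latency

# The acknowledgment lands in microseconds even though the sum takes longer.
result, ack = run_with_feedback(lambda: sum(range(1_000_000)), "Summing")
```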
well, one thing is I am maybe one of a few fellow people who 00:03:30.280 |
have actually written something about CLI design principles, 00:03:33.600 |
because I was in charge of the Netlify CLI back in the day 00:03:40.560 |
I'll just share it in case you have thoughts-- 00:03:42.480 |
is I think CLIs are effectively starting points 00:03:48.060 |
And the moment one of the script's preconditions 00:03:53.920 |
So the CLI developer will just exit the program. 00:03:58.920 |
And the way that I really wanted to create the Netlify dev 00:04:01.760 |
workflow was for it to be kind of a state machine that 00:04:06.640 |
If it detected a precondition wasn't fulfilled, 00:04:09.480 |
it would actually delegate to a subprogram that 00:04:13.640 |
asking for more info or waiting until a condition is fulfilled. 00:04:27.960 |
in the sense that when you run a CLI command, 00:04:32.040 |
And you may not have given the CLI all the things 00:04:39.520 |
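The state-machine workflow described here can be sketched roughly like this (an illustrative toy, not actual Netlify CLI code; the `(check, fixer)` pairs are hypothetical names): each precondition gets a fixer it can delegate to instead of exiting.

```python
def run_state_machine(preconditions, action):
    """Run `action` once every precondition holds.

    Instead of exiting on the first unmet precondition, delegate to a
    fixer (prompt for config, log in, wait for a port...) and re-check.
    `preconditions` is a list of (check, fixer) callables.
    """
    for check, fixer in preconditions:
        if not check():
            fixer()  # delegate to the subprogram that satisfies it
        if not check():
            raise SystemExit("could not satisfy a precondition")
    return action()

# Toy usage: a "logged in" precondition whose fixer performs the login.
state = {"logged_in": False}
print(run_state_machine(
    [(lambda: state["logged_in"], lambda: state.update(logged_in=True))],
    lambda: "deployed",
))  # deployed
```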
Yeah, that reminds me of a thing we sort of thought 00:04:43.160 |
about when writing the CLI guidelines, where CLIs were 00:04:57.560 |
Whereas over time, the CLI has evolved to humans-- 00:05:05.240 |
it was back in a world where the primary way of using 00:05:08.360 |
and computers was writing shell scripts, effectively. 00:05:14.960 |
where, actually, humans are using CLI programs 00:05:19.320 |
And the current best practices about how Unix was designed-- 00:05:29.240 |
from the '70s and '80s, where they say things like, 00:05:33.080 |
command line commands should not output anything on success. 00:05:40.040 |
makes sense if you're using it in a shell script. 00:05:42.120 |
But if a user is using that, it just looks like it's broken. 00:05:45.640 |
If you type copy and it just doesn't say anything, 00:05:47.780 |
you assume that it didn't work as a new user. 00:05:52.120 |
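One common way to reconcile the old silence-on-success rule with human users (a sketch of the idea, not any specific tool's code) is to check whether stdout is a terminal:

```python
import sys

def report_success(message: str) -> None:
    """Silent when piped into a script, chatty when a human is watching."""
    if sys.stdout.isatty():
        print(message)  # interactive terminal: confirm the action happened
    # non-TTY (shell script, pipe): stay silent on success, as Unix expects

report_success("Copied 3 files.")
```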
And yeah, so I think what's really interesting about the CLI 00:06:02.160 |
to your point, it's a really good user interface 00:06:12.800 |
and either silently succeeding or saying, no, you did-- 00:06:16.720 |
failed, it can guide you in the right direction 00:06:22.680 |
and that kind of thing in a way that's actually-- 00:06:28.960 |
because it feels like this back and forth with the computer, 00:06:36.920 |
So I think there's some interesting intersection 00:06:41.160 |
being very closely related and a good fit for each other. 00:06:47.200 |
I would say one of the surprises from last year-- 00:06:51.120 |
think the most successful coding agent of my cohort 00:06:53.800 |
was Open Interpreter, which was a CLI implementation. 00:06:56.740 |
And I have chronically-- even as a CLI person, 00:07:06.480 |
which you recently retired after a glorious seven years. 00:07:11.200 |
Something like that, which is nice, I guess, HTML PDFs. 00:07:22.120 |
Which-- so when I quit Docker, I got really interested 00:07:26.920 |
in science infrastructure, just as a problem area, 00:07:36.080 |
science has created so much progress in the world. 00:07:38.760 |
The fact that we can talk to each other on a podcast, 00:07:42.680 |
and we use computers, and the fact that we're alive 00:07:46.800 |
But science is just completely archaic and broken. 00:07:51.560 |
that just happen to be copied to the internet 00:07:55.080 |
rather than taken into account that we can transfer information 00:08:01.240 |
and all this kind of thing is all very broken. 00:08:04.040 |
There's just so much potential for making science work better. 00:08:08.560 |
and I didn't really have any time to go and get a PhD 00:08:12.520 |
But I'm a tool builder, and I could make existing scientists 00:08:16.240 |
And if I could make a bunch of scientists a little bit better 00:08:19.040 |
at their job, maybe that's the kind of equivalent 00:08:28.960 |
in that it's all of these PDFs, quite often behind paywalls 00:08:41.400 |
funded by national grants, government grants, 00:08:49.660 |
But the particular thing we got dialed in on was-- 00:08:58.600 |
there's a bunch of open science that happens as well. 00:09:00.800 |
So math, physics, computer science, machine learning, 00:09:03.680 |
notably, is all published on arXiv, which is actually 00:09:12.520 |
Yeah, it was just like somebody in Cornell who started 00:09:22.000 |
And it's kind of like a user group thing, right? 00:09:31.040 |
And that's where basically all of math, physics, 00:09:36.600 |
But it's still PDFs published to this thing, which 00:09:42.200 |
So the web was invented at CERN, a physics institution, 00:10:00.100 |
because you want to link to another academic paper. 00:10:02.280 |
But instead, you have to copy and paste these things 00:10:17.720 |
So anyway, I got really frustrated with that. 00:10:19.600 |
And I went on vacation with my old friend Andreas. 00:10:26.600 |
And we were just on vacation in Greece for fun. 00:10:33.520 |
We had to zoom in and scroll line by line on the PDF. 00:10:44.880 |
And we spent our vacation sitting by the pool, 00:10:48.080 |
making LaTeX to HTML converters and making the first version 00:10:59.920 |
because they caught the eye of arXiv, who were like, 00:11:04.320 |
We just haven't had the time to work on this. 00:11:06.200 |
And what's tragic about arXiv is it is like this 00:11:10.320 |
it's like this project of Cornell that's like, 00:11:12.920 |
they can barely scrounge together enough money to survive. 00:11:29.240 |
But anyway, they were like, yeah, this is great. 00:12:05.540 |
We were after-- we were both users of Arxiv Sanity, 00:12:17.680 |
And Andreas just like cracked a joke of like, 00:12:31.480 |
So Replicate maybe feels like an overnight success 00:12:40.800 |
And we've been collaborating for even longer. 00:12:45.840 |
So in some sense, we've been doing this almost like six, 00:12:57.360 |
I was still really interested in science publishing 00:13:02.800 |
because I tell a lot of the condensed story to people, 00:13:04.960 |
because I can't really tell a seven-year history. 00:13:10.760 |
We want to nail the definitive Replicate story here. 00:13:13.240 |
One thing that's really interesting about these machine 00:13:15.480 |
learning papers is that these machine learning papers 00:13:21.080 |
And a lot of them are actual fundamental research, 00:13:27.280 |
But a lot of them are just running pieces of software 00:13:40.040 |
And they managed to make an image classification 00:13:42.380 |
model that was better than the existing state of the art. 00:13:46.360 |
And they've made an actual running piece of software 00:13:58.640 |
And what's frustrating about that is if you want to-- 00:14:06.640 |
Andreas was a machine learning engineer at Spotify. 00:14:13.120 |
He did a PhD, and he was doing a lot of stuff internally. 00:14:15.480 |
But part of his job was also being an engineer 00:14:22.120 |
and trying to apply them to actual problems at Spotify. 00:14:31.960 |
It's probably missing lots of crucial information. 00:14:40.880 |
But it was quite often just scrappy research code 00:14:44.960 |
And there was maybe the weights that were on Google Drive, 00:14:47.520 |
but they accidentally deleted the weights off Google Drive. 00:15:00.200 |
And I connected this back to my work at Docker as well. 00:15:03.640 |
I was like, oh, this is what we created containers for. 00:15:10.740 |
so you could ship it around and it kept on running. 00:15:18.440 |
models inside containers so that they could actually 00:15:25.720 |
And other researchers could run them to generate baselines. 00:15:31.200 |
to real problems in the world could just pick up the container 00:15:44.140 |
created Cog, this container stuff for machine learning 00:15:48.920 |
for people to publish these machine learning models. 00:15:50.480 |
But there's actually like two or three years between that. 00:16:01.660 |
struggled with as a researcher is generating baselines. 00:16:06.680 |
to get five other models that are existing in work 00:16:16.540 |
because you can't trust the numbers in the paper. 00:16:24.560 |
MARK MANDEL: So he was like, what if you could-- 00:16:28.340 |
I think this was coming from the thinking of, 00:16:30.180 |
there should be containers for machine learning, 00:16:33.520 |
OK, maybe we can create a supply of containers 00:16:36.200 |
by creating this useful tool for researchers. 00:16:39.080 |
And the useful tool was like, let's get researchers 00:16:43.580 |
to the central place where we run a standard set of benchmarks 00:16:46.560 |
across the models so that you can trust those results 00:16:51.200 |
and you can compare these models apples to apples. 00:16:54.600 |
doing a new piece of research, he could trust those numbers. 00:17:02.440 |
confirm it on his machine, use the standard benchmark 00:17:04.560 |
to then measure his model, and all this kind of stuff. 00:17:12.560 |
We got into YC, and we started building a prototype of this. 00:17:16.000 |
And then this is where it all starts to fall apart. 00:17:22.960 |
That's a great way to create a supply of models 00:17:28.480 |
How are we even going to make any money out of this? 00:17:30.640 |
And we're like, oh, shit, that's the real unknown here 00:17:35.560 |
So we thought it would be a really good idea to-- 00:17:44.880 |
let's try and reduce the risk of this turning into a business. 00:17:49.720 |
So let's try and research what the business could 00:17:57.360 |
So we went and talked to a bunch of companies trying 00:18:06.320 |
so that other researchers, or say the product manager, 00:18:12.760 |
And we were like, do you want a deployment platform 00:18:20.360 |
Do you want a central place for versioning models? 00:18:22.880 |
We're trying to think of lots of different products 00:18:24.960 |
we could sell that were related to this thing. 00:18:32.100 |
don't want to buy something that doesn't exist. 00:18:36.180 |
but we were just a bunch of product people, product 00:18:39.540 |
and engineering people, and we just couldn't pull this off. 00:18:47.480 |
We had no idea what our business was going to be, 00:18:49.560 |
because we couldn't get anybody to buy something 00:18:53.860 |
And actually, this was quite a way through our-- 00:18:55.860 |
I think it was like two-thirds of the way through our YC batch 00:18:58.300 |
So we're like, OK, well, we're kind of screwed now, 00:19:00.460 |
because we don't have anything to show at demo day. 00:19:05.780 |
what can we build in two weeks that will be something? 00:19:10.260 |
I can't remember what we tried to build at that point. 00:19:13.300 |
And then two weeks before demo day, I just remember this. 00:19:22.580 |
we were going down to Mountain View every week for dinners, 00:19:29.100 |
And they were like, don't come to dinner tomorrow. 00:19:33.900 |
And we realized-- we kind of looked at the news, 00:19:37.400 |
and we were like, oh, there's a pandemic going on. 00:19:42.020 |
were just completely oblivious to what was going on around us. 00:19:49.340 |
Because I remember Silicon Valley at the time 00:20:07.820 |
because we just kind of couldn't raise money anyway. 00:20:11.520 |
FRANCESC CAMPOY: In the normal course of events, 00:20:13.480 |
you're actually allowed to defer to a future demo day. 00:20:27.620 |
that YC has become incredibly valuable for us 00:20:36.860 |
that we didn't need to do YC to start with, because we 00:20:50.180 |
If you go to a VC and be like, hey, I made this piece of-- 00:20:54.780 |
Yeah, and people can pattern match like that, 00:20:59.020 |
and they can have some trust you know what you're doing. 00:21:01.380 |
Whereas it's much harder for people straight out of college, 00:21:03.540 |
and that's where YC's sweet spot is helping people straight 00:21:05.740 |
out of college who are super promising figure out 00:21:11.180 |
But the thing that's been incredibly useful for us 00:21:20.500 |
And Solomon, the founder of Docker, I think, told me this. 00:21:22.900 |
He was like, a lot of people underestimate the value of YC 00:21:29.140 |
And his biggest regret was not staying in touch with YC. 00:21:32.780 |
I might be misattributing this, but I think it was him. 00:21:37.360 |
stayed in touch with our batch partner, who-- 00:21:47.540 |
there was the growth team at YC when they were still there, 00:21:52.660 |
And two things that have been super helpful about that 00:22:00.100 |
and they've been super helpful during that process 00:22:04.180 |
and they've been super helpful during the whole process. 00:22:23.900 |
You have a warm intro to every one of them, basically. 00:22:27.960 |
you can post about updates to your product, which 00:22:35.340 |
We've just got so many of our users and customers 00:22:56.820 |
And yeah, so that's been a really, really positive 00:23:02.100 |
And sorry, I interrupted with the YC question. 00:23:05.340 |
You just made it out of the YC, survived the pandemic. 00:23:12.780 |
Then we started building tools for COVID, weirdly. 00:23:17.820 |
What's the most useful thing we could be doing right now? 00:23:25.020 |
We had a bunch of products that didn't really go anywhere. 00:23:28.340 |
We worked on a bunch of stuff, like contact tracing, 00:23:36.060 |
Andreas worked on a DoorDash for people delivering food 00:23:46.220 |
We met a problem of helping people direct their efforts 00:23:48.540 |
to what was most useful and a few other things like that. 00:23:52.980 |
So we're like, OK, this is not really working either. 00:23:55.820 |
We were considering actually just doing work for COVID. 00:23:58.780 |
We have this decision document early on in our company, which 00:24:01.300 |
is like, should we become a government app contracting 00:24:18.100 |
And we were just really good at building stuff. 00:24:25.940 |
And we were working with a designer at the time, 00:24:28.140 |
a guy called Mark, who did our early designs for Replicate. 00:24:30.660 |
And we were like, hey, what if we just team up and become it 00:24:35.020 |
But yeah, we gave up on that in the end for-- 00:24:49.420 |
from previous startups is shutting them down, 00:25:04.240 |
that won't page us in the middle of the night? 00:25:10.700 |
We made a thing which was an open source Weights & Biases, 00:25:14.940 |
because we had this theory that people want open source tools. 00:25:18.780 |
There should be an open source version control experiment 00:25:24.300 |
And we were like, oh, we're software developers. 00:25:27.340 |
Everyone loves command line tools and open source stuff. 00:25:30.100 |
But machine learning researchers just really didn't care. 00:25:33.480 |
They didn't mind that it was a cloud service. 00:25:37.380 |
need lots of graphs and charts and stuff like this. 00:25:45.340 |
that Andreas made at Spotify for just saving experiments 00:25:54.680 |
And then that was actually originally called Replicate. 00:26:05.900 |
So we were like, oh, maybe there was a thing. 00:26:11.020 |
their work in containers for machine learning models. 00:26:15.000 |
And at that point, we were kind of running out of the YC money. 00:26:17.700 |
So we were like, OK, this feels good, though. 00:26:20.980 |
So that was the point we raised a seed round. 00:26:25.900 |
- We raised pre-launch, pre-launch and pre-team. 00:26:34.060 |
But we were like, OK, bootstrapping this thing 00:26:38.700 |
is getting hard, so let's actually raise some money. 00:26:46.940 |
It initially didn't have APIs, interestingly. 00:26:49.500 |
It was just the bit that I was talking about before, 00:26:53.780 |
So it was a way for researchers to put their work on a web page 00:27:02.420 |
and so that you could download the Docker container. 00:27:05.940 |
we cut the benchmarks thing of it, because we thought 00:27:09.580 |
But it had a Docker container that Andreas, in a past life, 00:27:15.740 |
And you could compare all these models apples to apples. 00:27:24.500 |
It was still when it was long time pre-AI hype. 00:27:29.740 |
And there was lots of interesting stuff going on. 00:27:31.740 |
But it was very much in the classic deep learning era, 00:27:35.060 |
so image segmentation models, and sentiment analysis, 00:27:39.700 |
and all these kind of things that people were using deep 00:27:49.860 |
These are people who'd be publishing to archive. 00:27:57.900 |
And we were creating accompanying material for it. 00:28:10.600 |
that they just made one thing every six months, 00:28:16.060 |
They published this piece of paper, and like, done. 00:28:43.980 |
And people started smushing CLIP and GANs together 00:28:51.940 |
it was just a bunch of tinkerers on Discord, basically. 00:28:56.860 |
It was-- there was an early model called BigSleep 00:29:05.900 |
was a bit more popular, by Rivers Have Wings. 00:29:08.560 |
And it was all just people tinkering on stuff in Colabs. 00:29:11.940 |
And it was people just making copies of Colabs 00:29:15.300 |
And to me, I saw this, and I was like, oh, this 00:29:17.300 |
feels like open source software, so much more 00:29:28.620 |
And people were-- things were moving really fast. 00:29:30.780 |
And it just felt like this creative, dynamic, 00:29:34.940 |
collaborative community in a way that research wasn't really. 00:29:41.460 |
Like, it was still stuck in this kind of six-month publication 00:29:51.220 |
And a lot of those early models were published on Replicate. 00:29:55.460 |
I think the first one that was really primarily on Replicate 00:29:58.580 |
was one called Pixray, which was sort of mid-2021. 00:30:10.880 |
like some of these early image generation models. 00:30:13.140 |
And that was published primarily on Replicate. 00:30:23.040 |
to find our early community and where we really found, 00:30:25.300 |
oh, we've actually built a thing that people want. 00:30:30.700 |
and people really want to try out these models. 00:30:32.700 |
Lots of people were running the models on Replicate. 00:30:35.020 |
We still didn't have APIs, though, interestingly. 00:30:37.220 |
And this is another really complicated part of the story. 00:30:39.340 |
We still had no idea what our business model was at this point. 00:30:43.460 |
It's just these web forms where people could run the model. 00:30:47.020 |
FRANCESC CAMPOY: Just before this API bit continues, 00:30:48.940 |
just for historical interests, which discords were they, 00:30:56.860 |
MARK MANDEL: Eleuther, I particularly remember. 00:31:06.860 |
And I just remember being completely just captivated 00:31:11.980 |
I was just playing around with it all afternoon 00:31:16.580 |
FRANCESC CAMPOY: This is the beginnings of Midjourney. 00:31:22.740 |
And it's where that kind of user interface came from. 00:31:26.540 |
is you could see what other people are doing. 00:31:32.180 |
And it was just so much fun to just play around 00:31:38.540 |
And yeah, that just completely captivated me. 00:31:54.780 |
so was it APIs next or was it Stable Diffusion next? 00:31:58.200 |
And the APIs happened because one of our users-- 00:32:02.700 |
our web form had an internal API for making the web form work, 00:32:05.860 |
like with an API that was called from JavaScript. 00:32:25.800 |
MARK MANDEL: And they started generating a bunch of images. 00:32:31.700 |
And I think a sort of usual reaction to that would be like, 00:32:36.820 |
hey, you're abusing our API, and to shut them down. 00:32:39.420 |
And instead we're like, oh, this is interesting. 00:32:44.220 |
So we documented the API in a Notion document, 00:32:58.180 |
That'll be like $1,000 a month, please, with a straight face. 00:33:10.140 |
MARK MANDEL: It was a surprising amount of money, yeah. 00:33:13.300 |
MARK MANDEL: It was on the order of $1,000 a month. 00:33:23.220 |
And so he made a bunch of art with these models 00:33:35.840 |
who were also generating NFTs and trying to save models. 00:33:39.860 |
And that was the start of our API business, yeah. 00:33:44.860 |
And then we made an official API and actually 00:33:47.620 |
added some billing to it so it wasn't just like a fixed fee. 00:33:52.720 |
FRANCESC CAMPOY: And now people think of you as the host 00:34:02.380 |
is it was really fulfilling, like the original goal of what 00:34:05.820 |
we wanted to do is that we wanted to make this research 00:34:08.220 |
that people were making accessible to other people 00:34:19.900 |
these generative models could publish them to replicate, 00:34:30.180 |
could just run these models with a single line of code. 00:34:32.500 |
And we thought, oh, maybe the Docker image is enough. 00:34:34.380 |
But it's actually super hard to get the Docker image running 00:34:37.300 |
So it really needed to be the hosted API for this to work 00:34:40.060 |
and to make it accessible to software engineers. 00:34:45.340 |
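For flavor, the hosted API centers on a predictions endpoint. This stdlib-only sketch constructs (without sending) such a request; the field names are based on Replicate's public HTTP API docs and should be treated as an assumption here, and the version ID and token are placeholders.

```python
import json
import urllib.request

def build_prediction_request(version: str, model_input: dict, token: str):
    """Build a POST to Replicate's predictions endpoint (not sent here)."""
    body = json.dumps({"version": version, "input": model_input}).encode()
    return urllib.request.Request(
        "https://api.replicate.com/v1/predictions",
        data=body,
        headers={
            "Authorization": "Bearer " + token,  # placeholder token goes here
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_prediction_request(
    "model-version-id",                        # placeholder version hash
    {"prompt": "an astronaut riding a horse"},
    "r8_example_token",                        # placeholder API token
)
print(req.full_url)  # https://api.replicate.com/v1/predictions
```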
FRANCESC CAMPOY: Yeah, two years to the first paying customer. 00:34:49.620 |
FRANCESC CAMPOY: Did you ever think about becoming 00:34:53.220 |
You have so much interest in image generation. 00:34:57.020 |
I mean, you're doing fine, for the record, but you know. 00:35:06.740 |
I think our expertise was DevTools rather than-- 00:35:08.740 |
Midjourney is almost like a consumer product. 00:35:18.060 |
like, oh, maybe we could hire some of these people 00:35:19.940 |
in this community and make great models and stuff like this. 00:35:26.380 |
I think before, I was saying, I'm not really a researcher. 00:35:28.740 |
I'm more like the tool builder, the behind the scenes. 00:35:30.500 |
And I think both me and Andreas are like that. 00:35:43.940 |
And you want to pave the cow paths, is what they say, right? 00:35:46.380 |
Like, the unofficial paths that people are making, 00:35:48.540 |
like, make it official and make it easy for them, 00:35:56.460 |
you have two million developers using Replicate, maybe more. 00:35:59.940 |
That was the last public number that I found. 00:36:01.980 |
Two million-- I think that got mangled, actually, by-- 00:36:09.780 |
And then 30,000 paying customers was the number. 00:36:20.620 |
MARK MANDEL: --Whisper diarization on Replicate. 00:36:24.180 |
So we're Latent Space, and we're in the 30,000. 00:36:31.620 |
I would say that maybe the Stable Diffusion time, August 00:36:34.740 |
'22, was really when the company started to break out. 00:36:39.320 |
Tell us a bit about that and the community that came out. 00:36:41.820 |
And I know now you're expanding beyond just image generation. 00:36:50.220 |
we saw there was this really interesting generative image 00:36:53.300 |
So we're building the tools for that community already, really. 00:37:05.040 |
It was the best generative image model so far. 00:37:10.020 |
was just what an inflection point it would be, 00:37:13.660 |
it was-- I think Simon Willison put it this way, 00:37:20.820 |
it was a model that was open source and tinkerable 00:37:32.260 |
and open source and tinkerable, such that it just took off 00:37:37.540 |
And what was really neat about Stable Diffusion 00:37:44.580 |
compared to DALL-E, for example, which was equivalent quality, 00:37:50.760 |
And the first week, we saw people making animation models 00:37:57.420 |
that use circular convolutions to make repeatable textures. 00:38:03.740 |
A few weeks later, people were fine-tuning it 00:38:13.940 |
And all of this innovation was happening all of a sudden. 00:38:19.860 |
because you could just publish arbitrary models on Replicate. 00:38:22.400 |
So we had this supply of interesting stuff being built. 00:38:25.140 |
But because it was a sufficiently good model, 00:38:28.580 |
there was also just a ton of people building with it. 00:38:33.100 |
They were like, oh, we can build products with this thing. 00:38:35.500 |
And this was about the time where people were starting 00:38:38.420 |
So tons of product builders wanted to build stuff with it. 00:38:41.900 |
in the middle as the interface layer between all these people 00:38:44.580 |
who wanted to build and all these machine learning 00:38:50.580 |
We were just incredible supply, incredible demand. 00:38:55.260 |
And then, yeah, since then we've just grown and grown, really. 00:38:58.840 |
And we've been building a lot for the indie hacker community, 00:39:02.080 |
these individual tinkerers, but also startups, 00:39:04.340 |
and a lot of large companies as well who are exploring 00:39:09.560 |
And then the same thing happened middle of last year 00:39:16.280 |
the same Stable Diffusion effect happened with LLaMA. 00:39:21.640 |
ever because tons of people wanted to tinker with it 00:39:25.000 |
And since then, we've just been seeing a ton of growth 00:39:29.720 |
And yeah, we're just riding a lot of the interest that's 00:39:33.880 |
going on in AI and all the people building in AI. 00:39:40.160 |
But also took a while to position for the right place 00:39:59.760 |
He does because you cited him on your Series B blog post, 00:40:02.680 |
and Danny Postma as well, his competitor, 00:40:07.040 |
What are their needs versus the more enterprise or B2B type 00:40:14.080 |
Did you come to a decision point where you're like, 00:40:20.040 |
are bigger and perhaps better customers because they're 00:40:25.960 |
think a lot of people right now want to use and build with AI, 00:40:32.040 |
And they're not infrastructure experts either. 00:40:35.780 |
without having to figure out all the internals of the models 00:40:42.040 |
And they also don't want to be setting up and booting up 00:40:46.800 |
And that's the same all the way from indie hackers just 00:40:51.360 |
getting started-- because obviously, you just 00:41:17.080 |
And it's like, you really need to be an expert. 00:41:21.760 |
So they're surprisingly similar in that sense. 00:41:24.200 |
And I think it's also kind of unfair on the indie community. 00:41:24.200 |
They're not churning, surprisingly, or churny 00:41:33.600 |
They're building real established businesses, 00:41:39.240 |
these really large, sustainable businesses, often just 00:41:47.800 |
And it's kind of remarkable how they can do that, actually. 00:41:50.260 |
And it's in credit to a lot of their product skills. 00:41:55.600 |
being their machine learning team, effectively, 00:42:02.280 |
a lot of these indie hackers are some of our largest customers, 00:42:06.720 |
that you would think would be spending a lot more money 00:42:35.680 |
Well, I mean, I'm naming them because they're 00:42:45.480 |
Like, if I see someone doing something that I want to do, 00:42:47.840 |
then I'm like, OK, Replicate's great for that. 00:42:50.040 |
So that's what I think about case studies on company 00:42:52.320 |
landing pages, is that it's just a way of explaining, 00:42:55.000 |
like, yep, this is something that we are good for. 00:43:20.920 |
and they want to create a text description of it 00:43:24.160 |
And they're annotating images with off-the-shelf open source 00:43:27.400 |
We have this big library of open source models that you can run. 00:43:30.360 |
And we've got lots of people who are running these open source 00:43:42.200 |
They're running completely custom models on us. 00:43:56.400 |
writing the Python themselves, because they've 00:44:01.280 |
And they're using us for their inference infrastructure 00:44:05.840 |
So it's lots of different levels of sophistication, 00:44:08.080 |
where some people are using these off-the-shelf models. 00:44:13.080 |
Pieter Levels is a great example, where a lot of his products 00:44:15.540 |
are based off fine-tuning image models, for example. 00:44:25.760 |
So yeah, it's all things up and down the stack. 00:44:30.000 |
Let's talk a bit about Cog and the technical layer. 00:44:37.080 |
I think people have different pricing points. 00:44:39.520 |
And I think everybody tries to offer a different developer 00:44:41.940 |
experience on top of it, which then lets you charge a premium. 00:44:48.120 |
What were some of the-- you worked at Docker. 00:44:49.960 |
What were some of the issues with traditional container 00:44:53.920 |
And maybe, yeah, what were you surprised with as you built it? 00:45:05.600 |
the benchmarking system for machine learning researchers, 00:45:08.760 |
where we wanted researchers to publish their models 00:45:11.520 |
in a standard format that was guaranteed to keep on running, 00:45:19.640 |
And we realized that we needed something like Docker 00:45:24.920 |
And I think it was just natural, from my point of view, 00:45:29.940 |
that we should try and create some kind of open standard 00:45:38.560 |
I think the magic of Docker is not really in the software. 00:45:41.560 |
It's just the standard that people have agreed on. 00:45:44.640 |
Here are a bunch of keys for a JSON document, basically. 00:45:49.000 |
And that was the magic of the metaphor of real containerization 00:45:53.760 |
It's not the containers that are interesting. 00:45:55.640 |
It's like the size and shape of the damn box. 00:45:59.540 |
And it's a similar thing here, where really we just 00:46:01.280 |
wanted to get people to agree on this is what 00:46:13.120 |
that attaches to a CUDA device, if it needs a GPU, that 00:46:17.400 |
has an OpenAPI specification as a label on the Docker image. 00:46:21.920 |
And the OpenAPI specification defines the interface 00:46:26.800 |
for the machine learning model, like the inputs and outputs 00:46:32.440 |
effectively, or the params in machine learning terminology. 00:46:36.680 |
And we just wanted to get people to agree on this thing. 00:46:41.440 |
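To make that concrete, here is an illustrative, deliberately simplified sketch of the kind of OpenAPI-style interface document that could ride along with a model image as a label. The field names are hypothetical; Cog's real generated schema is richer than this.

```python
import json

# Hypothetical, simplified sketch of an OpenAPI-style model interface of the
# kind Cog attaches to a Docker image as a label. The real schema Cog emits
# is more detailed; this just shows the idea: typed inputs and outputs.
schema = {
    "openapi": "3.0.2",
    "components": {
        "schemas": {
            "Input": {
                "type": "object",
                "properties": {
                    "prompt": {"type": "string"},
                    "num_outputs": {"type": "integer", "default": 1},
                },
                "required": ["prompt"],
            },
            "Output": {
                "type": "array",
                "items": {"type": "string", "format": "uri"},
            },
        }
    },
}

# A Docker label is just a string, so the schema round-trips through JSON.
label_value = json.dumps(schema)
parsed = json.loads(label_value)
print(sorted(parsed["components"]["schemas"]["Input"]["properties"]))
# → ['num_outputs', 'prompt']
```

Because the interface is plain JSON, anything that can read a Docker label, whether a CI system, a registry, or a UI generator, can discover the model's inputs and outputs without ever executing the model.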
We weren't saying-- some of the existing things 00:46:45.200 |
But we really wanted something general purpose 00:46:47.160 |
enough that you could just put anything inside this. 00:46:51.900 |
And it'd be future compatible with future inference servers 00:47:08.520 |
A bunch of people have been using Cog outside of Replicate, 00:47:13.080 |
This should be how machine learning models are packaged 00:47:19.000 |
where maybe they can't use the SaaS service because they're 00:47:23.800 |
And they're not allowed to use a SaaS service. 00:47:30.300 |
And they can download the models from Replicate 00:47:37.240 |
People who want to build custom inference pipelines 00:47:42.520 |
it as a component in their inference pipelines. 00:47:48.900 |
And it's just been kind of happening organically. 00:47:56.680 |
And yeah, so a lot of it is just sort of philosophical. 00:48:00.360 |
This is how it should work from my experience at Docker. 00:48:03.120 |
And there's just a lot of value from the core being open, 00:48:12.760 |
to work with a testing system, like a CI system or whatever, 00:48:22.840 |
And then you can test your models on that CI system 00:48:26.860 |
And it's just a format that we can get everyone to agree on. 00:48:30.040 |
What do you think, I guess, Docker got wrong? 00:48:33.280 |
Because if I look at a Docker Compose and a Cog definition, 00:48:36.000 |
first of all, the Cog is kind of like the Docker 00:48:40.800 |
And Docker Compose are just exposing the services. 00:48:43.960 |
And also, Docker Compose is very ports-driven, 00:48:53.120 |
Yeah, any learnings and maybe tips for other people building 00:49:23.540 |
And it's sort of the combination of two things 00:49:29.760 |
was a little bit of the interface around the machine 00:49:32.600 |
So we realized that we wanted it to be general purpose. 00:49:35.320 |
We wanted it to be at the JSON human-readable things, 00:49:49.200 |
And it's really just a wrapper around Docker. 00:49:57.160 |
So we wanted to be able to have an OpenAPI specification there 00:50:09.800 |
how that function is run, which is all defined in code, 00:50:12.520 |
So it's like a bunch of abstraction on top of Docker 00:50:18.600 |
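For reference, a minimal Cog package is roughly two files, following Cog's documented format; the model and helper names below are hypothetical:

```yaml
# cog.yaml -- illustrative, following Cog's documented configuration format
build:
  gpu: true
  python_version: "3.11"
  python_packages:
    - "torch==2.1.2"
predict: "predict.py:Predictor"
```

```python
# predict.py -- hypothetical model; requires the cog package to run
from cog import BasePredictor, Input, Path

class Predictor(BasePredictor):
    def setup(self):
        # Runs once when the container starts: load weights here.
        self.model = load_my_model("weights.pth")  # hypothetical helper

    def predict(self, prompt: str = Input(description="Text prompt")) -> Path:
        # Cog derives the OpenAPI schema from these type annotations.
        return self.model.generate(prompt)
```

Running `cog build` turns this into a Docker image, with the OpenAPI schema derived from the `predict` type annotations attached as a label.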
But the core problems we were solving for users 00:50:38.560 |
Dockerfiles are hard enough for software developers to write. 00:50:41.080 |
I'm saying this with love as somebody who works on Docker 00:50:48.200 |
And you need to know a bunch about Linux, basically, 00:50:50.360 |
because you're running a bunch of CLI commands. 00:50:52.360 |
You need to know a bunch of Linux and best practices, 00:50:56.480 |
So we're like, OK, we can't get to that level. 00:50:58.200 |
We need something that machine learning researchers will 00:51:08.000 |
And somebody told me to apt-get install something. 00:51:11.820 |
MARK MANDEL: And throw sudo in there when I don't really 00:51:15.320 |
So we tried to create a format that was at that level. 00:51:23.240 |
going to understand and trying to build for them? 00:51:26.280 |
And then the productionizing machine learning models thing 00:51:33.360 |
all of the complexity of productionizing machine 00:51:36.800 |
Like picking CUDA versions, like hooking it up to GPUs, 00:51:41.040 |
writing an inference server, defining a schema, 00:51:44.940 |
doing batching, all of these just really gnarly things 00:52:00.400 |
with the world's need for a common standard for what 01:08:06.620 |
I don't know whether that answers the question. 00:52:12.880 |
want what Docker stands for in terms of standard, 00:52:24.240 |
FRANCESC CAMPOY: So I want to, for the listener, 00:52:26.680 |
you're not the only standard that is out there. 00:52:28.600 |
As with any standard, there must be 14 of them. 00:52:34.040 |
who are your former colleagues from Docker, who 00:52:34.040 |
And then I don't know if this is in the same category even, 00:52:44.520 |
Like Hugging Face has the Transformers and Diffusers 00:52:46.480 |
library, which is a way of disseminating models 00:52:51.080 |
How would you compare your contrast, your approach 00:52:54.520 |
MARK MANDEL: It's kind of complementary, actually, 00:52:59.640 |
Transformers, for example, is lower level than Cog. 00:53:10.240 |
You still need to install the Python packages 00:53:12.800 |
So lots of Replicate models are Transformers models 00:53:24.040 |
And we're kind of working on integration with Hugging Face 00:53:26.560 |
such that you can deploy models from Hugging Face 00:53:29.020 |
into Cog models and stuff like that and to Replicate. 00:53:38.320 |
and what Ollama are working on, are also very complementary 00:53:41.280 |
in that they're doing a lot of the running these things 00:53:46.880 |
locally on laptops, which is not a thing that 00:53:53.400 |
and attaching to CUDA devices and NVIDIA GPUs 00:53:58.160 |
So we're trying to figure out-- we're actually 00:54:11.580 |
in that you should be able to take a model on Replicate 00:54:14.840 |
You should be able to take a model on your local machine 00:54:19.480 |
FRANCESC CAMPOY: Is the base layer something like-- 00:54:42.960 |
Exactly where those lines are drawn, I don't know exactly. 00:54:45.440 |
I think this is something we're trying to figure out ourselves. 00:54:47.960 |
But I think there's certainly a lot of promise 00:54:51.880 |
I think we just want things to work together. 00:54:54.000 |
We want to try and reduce the number of standards 00:54:56.080 |
so the more these things can interoperate and convert 00:54:58.880 |
between each other and that kind of stuff at the minute. 00:55:01.160 |
FRANCESC CAMPOY: Andreas comes out of Spotify. 00:55:07.680 |
You worked at Docker, and the Ollama guys worked at Docker. 00:55:18.480 |
had a kind of like similar-- not similar idea, 00:55:22.680 |
Or did you then just say, oh, I know those people. 00:55:33.480 |
And it's funny how I think we're all seeing the same problems 00:55:36.120 |
and just applying, trying to fix the same problems that we're 00:55:42.720 |
funny because I joined Docker through my startup. 00:55:48.300 |
Funnily, actually, the thing which worked from my startup 00:55:52.400 |
working on another thing, which was a bit like EC2 for Docker. 00:56:19.400 |
And it's funny how we're both applying the things we saw 00:56:36.460 |
because there's just so much opportunity for working there. 00:56:39.720 |
FRANCESC CAMPOY: When you have a hammer, everything's a nail. 00:56:46.560 |
this is-- I mean, where we're coming from a lot with AI 00:56:52.880 |
because we're all kind of, on the Replicator team, 00:56:55.680 |
we're all kind of people who have built developer 00:57:01.240 |
We've got people who worked at Heroku, and GitHub, 00:57:04.000 |
and the iOS ecosystem, and all this kind of thing. 00:57:07.160 |
Like, the previous generation of developer tools, 00:57:14.960 |
And we just don't yet have those tools and abstractions 00:57:22.080 |
that we learned from the previous generation of stuff 00:57:24.440 |
and apply it to this new generation of stuff. 00:57:26.840 |
And obviously, there's a bit of nuance there, 00:57:28.720 |
because the trick is to take the right lessons 00:57:40.280 |
take some of those lessons we learned from how Heroku and 00:57:44.280 |
GitHub was built, for example, and apply them to AI. 00:57:50.200 |
We should also talk a little bit about your compute 00:58:02.080 |
What do you feel about the tightness of the GPU market? 00:58:11.620 |
And we are primarily built on just public clouds, 00:58:14.560 |
so primarily GCP and CoreWeave, and some smatterings elsewhere. 00:58:21.360 |
FRANCESC CAMPOY: Not from NVIDIA, which is your newest 00:58:25.720 |
So they're kind of helping us get GPU availability. 00:58:29.400 |
GPUs are hard to get hold of. If you go to AWS 00:58:38.880 |
and ask for one A100, they won't give you an A100. 00:58:42.480 |
But if you go to AWS and say, I would like 100 A100s in two 00:58:50.600 |
The cloud providers, that makes sense from their point of view. 00:58:59.160 |
in their infrastructure, which makes total sense. 00:59:07.880 |
where we can aggregate demand, so we can make commits 00:59:16.600 |
It's not-- we don't have infinite availability, 00:59:20.720 |
obviously, but if you want an A100 from Replicate, 00:59:23.480 |
But we're seeing other companies pop up as well. 00:59:31.880 |
where they're doing the same idea for training almost, 00:59:34.120 |
where a lot of startups need to be able to train a model, 00:59:37.560 |
but they can't get hold of GPUs from large cloud providers. 00:59:39.980 |
So SF Compute is letting people rent 10 H100s for two days, 00:59:47.880 |
is they're aggregating demand such that they can make 00:59:50.960 |
and then let people use smaller chunks of it. 00:59:52.920 |
And that's what we're doing with Replicate as well, 00:59:54.540 |
where we're aggregating demand such that we make big commits 00:59:58.280 |
And then people can run a 100-millisecond API request 01:00:04.200 |
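The aggregation arithmetic can be sketched with made-up numbers; every figure below is an assumption for illustration, not Replicate data:

```python
# Illustrative demand-aggregation arithmetic (all numbers are assumptions).
customers = 1000
avg_busy_fraction = 0.02   # each customer keeps a GPU busy ~2% of the time
peak_headroom = 3.0        # provision for ~3x average concurrency

expected_concurrent = customers * avg_busy_fraction     # 20 GPUs busy on average
shared_pool = int(expected_concurrent * peak_headroom)  # 60 GPUs committed

# Versus every customer reserving a dedicated GPU for bursty traffic:
dedicated = customers                                   # 1000 GPUs
print(shared_pool, dedicated)  # → 60 1000
```

At these assumed numbers, a 60-GPU long-term commitment can serve the same bursty demand that would otherwise need 1,000 dedicated GPUs, which is why an aggregator can make big commits while users pay only for 100-millisecond requests.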
FRANCESC CAMPOY: Coming from a finance background, 01:00:08.900 |
where the job of a bank is maturity transformation, 01:00:14.040 |
You take short-term deposits, which technically 01:00:16.000 |
can be withdrawn at any time, and you turn that 01:00:17.920 |
into long-term loans for mortgages and stuff. 01:00:24.000 |
MARK MANDEL: Yeah, that's exactly what we're doing. 01:00:31.000 |
as well, because we have to make bets on the future demand 01:00:48.120 |
we're projecting our growth with some educated guesses 01:00:50.640 |
about what kind of models are going to come out 01:00:59.200 |
So we need to have GPUs with a lot of RAM, or multi-GPU nodes, 01:01:09.280 |
Speaking of which, the mixture of experts' models 01:01:11.760 |
must be throwing a spanner into the planning. 01:01:20.320 |
which can run this, and multi-node H100 machines, 01:01:30.440 |
FRANCESC CAMPOY: OK, I didn't expect it to be so easy. 01:01:33.920 |
My impression was that the amount of RAM per model 01:01:37.280 |
is increasing a lot, especially on a sort of per parameter 01:01:43.640 |
going from Mixtral being eight experts to the Deep 01:01:55.360 |
MARK MANDEL: I think we might run into problems at some point. 01:01:58.200 |
And yeah, I don't know exactly what's going on there. 01:02:04.080 |
I think something that we're finding, which is kind of 01:02:06.600 |
interesting-- like, I don't know this in depth. 01:02:10.440 |
But we're certainly seeing a lot of good results 01:02:19.920 |
So 90% of the performance with just much less RAM required. 01:02:25.840 |
And that means that we can run them on GPUs we have available. 01:02:30.720 |
And it's good for customers as well, because it runs faster. 01:02:33.400 |
And they want that trade-off of where it's just slightly worse, 01:02:39.480 |
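A weights-only back-of-envelope shows why quantization matters for fitting models on available GPUs. The parameter count and precisions below are illustrative, and KV cache and activations add more memory on top:

```python
# Rough weight-memory arithmetic for a ~70B-parameter model (illustrative).
# Weights only: KV cache and activations need additional GPU memory.
params = 70e9

fp16_gib = params * 2 / 2**30    # 2 bytes per parameter at fp16
int4_gib = params * 0.5 / 2**30  # 4 bits per parameter when quantized

print(round(fp16_gib), round(int4_gib))  # → 130 33
```

Dropping from roughly 130 GiB to roughly 33 GiB of weights is the difference between needing multi-GPU nodes and fitting on hardware that is actually available, which is the trade-off being described here.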
FRANCESC CAMPOY: Do you see a lot of GPU waste 01:02:41.760 |
in terms of people running the thing on a GPU that 01:02:54.920 |
people were like, oh, how do I get access to like H100s? 01:02:57.880 |
And it's like, you need to run [INTERPOSING VOICES] 01:03:11.560 |
And it's surprisingly hard to optimize these models right now. 01:03:28.200 |
So something we want to be able to help people with 01:03:33.840 |
Like, either we show people how to with guides, 01:03:37.960 |
or we make it easier to use some of these more optimized 01:03:43.200 |
how to compile the models, or we do that automatically, 01:03:52.520 |
It's also a bad experience, and the models run slow. 01:03:57.560 |
some of the most popular models on Replicate we have-- 01:04:05.280 |
Like, people have pushed those models themselves. 01:04:09.560 |
where there's like a long tail of lots of models 01:04:11.560 |
that people have pushed, and then a big head of the models 01:04:16.520 |
So models like Llama 2, like Stable Diffusion, 01:04:23.460 |
we work with Meta and Stability to maintain those models. 01:04:35.260 |
And going into the-- well, it's already the new year. 01:04:38.620 |
Do you see the customer demand and the GPU hardware 01:05:02.820 |
Do you see maybe a lot of this model improvement work 01:05:18.680 |
That's a very nicely put way, as a startup founder, to respond. 01:05:25.460 |
Yeah, I'll maybe get into a little bit of this on the-- 01:05:29.500 |
Actually, so when Alessio talked about GPU waste, he was more-- 01:05:36.020 |
Yeah, it is getting a little bit warm in here, 01:05:42.660 |
of picking the wrong box model, whereas yours 01:05:52.100 |
What other sort of techniques are you referencing? 01:05:57.340 |
I talk to your competitors, and I don't know if-- 01:06:04.460 |
Basically, they'll quantize their models for you 01:06:08.200 |
So you basically use their versions of Llama 2. 01:06:08.200 |
I don't see it as the Replicate DNA to do that, 01:06:20.140 |
you would have to slap the Replicate house brand 01:06:25.380 |
Like, what do you mean when you say optimize models? 01:06:27.700 |
Yeah, I mean, things like quantizing the models, 01:06:30.240 |
you can imagine a way that we could help people quantize 01:06:38.140 |
We've had success using inference servers like vLLM 01:06:43.700 |
and TRT-LLM, and we're using those kind of things 01:06:48.980 |
We've had success with things like AITemplate, which 01:06:52.180 |
compiles the models, all of those kind of things. 01:06:57.340 |
And there's some even really just boring things 01:07:02.860 |
Like, some people, when they're just writing some Python code, 01:07:05.780 |
it's really easy to just write inefficient Python code. 01:07:09.140 |
There's really boring things like that as well. 01:07:11.580 |
But it's like a whole smash of things like that. 01:07:14.220 |
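As one example of the boring kind of fix, a generic pattern rather than Replicate code: loading weights once per process instead of once per request.

```python
import functools

LOAD_CALLS = 0

@functools.lru_cache(maxsize=1)
def get_model():
    """Expensive one-time setup: stands in for loading multi-GB weights."""
    global LOAD_CALLS
    LOAD_CALLS += 1
    return {"weights": "..."}  # stand-in for a real model object

def predict(prompt: str) -> str:
    model = get_model()  # cached, so the load happens only on the first call
    return f"output for {prompt!r}"

for i in range(100):
    predict(f"request {i}")

print(LOAD_CALLS)  # → 1  (reloading per request would have been 100)
```

It looks trivial, but research code frequently reloads weights, re-tokenizes, or re-initializes a pipeline inside the request path, and fixing that alone can dwarf fancier optimizations.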
FRANCESC CAMPOY: So you will do that for a customer? 01:07:18.700 |
helped some of our customers be able to do that some stuff. 01:07:25.460 |
we've rewritten them to use that stuff as well. 01:07:28.860 |
And the stable diffusion that we run, for example, 01:07:31.260 |
is compiled with AITemplate to make it super fast. 01:07:40.420 |
But you can imagine ways that we could help people. 01:07:43.340 |
It's almost like built into the Cog layer maybe, 01:07:45.380 |
where we could help people use these fast inference servers 01:07:48.420 |
or use AI template to compile their models to make it faster. 01:07:51.980 |
Whether it's manual, semi-manual, or automatic, 01:08:02.060 |
there was a price war on Mixtral last year, this last December. 01:08:06.620 |
As far as I can tell, you guys did not enter that war. 01:08:09.780 |
You have Mixtral, but it's just regular pricing. 01:08:20.260 |
You don't have to say anything, but the break-even 01:08:23.020 |
is somewhere between $0.50 to $0.75 per million tokens served. 01:08:28.500 |
How are you thinking about just the overall competitiveness 01:08:32.340 |
How should people choose when everyone's an API? 01:08:41.540 |
I think not Mixtral, but I can't remember exactly-- 01:08:44.340 |
we have similar performance and similar price 01:08:50.540 |
We're not bargain basement compared to some of the others, 01:08:54.220 |
because to your point, we don't want to burn tons of money. 01:08:58.780 |
But we're pricing it sensibly and sustainably to a point 01:09:02.940 |
where we think it's competitive with other people, such 01:09:10.700 |
And we don't want to price it such that it's only 01:09:19.740 |
But we also don't want the super cheap prices, 01:09:22.700 |
because then it's almost like your customers are hostile. 01:09:26.780 |
And the more customers you get, the worse it gets. 01:09:29.980 |
So we're pricing it sensibly, but still to the point 01:09:33.020 |
where hopefully it's cheap enough to build on. 01:09:48.100 |
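A rough back-of-envelope shows where a per-million-token floor like the one quoted above can come from; both inputs are assumptions, not Replicate's actual numbers:

```python
# Assumed numbers, for illustration only.
gpu_cost_per_hour = 4.00   # on-demand A100-class pricing
tokens_per_second = 1500   # aggregate batched throughput on that GPU

tokens_per_hour = tokens_per_second * 3600
cost_per_million = gpu_cost_per_hour / tokens_per_hour * 1_000_000

print(round(cost_per_million, 2))  # → 0.74
```

At those assumed figures, the raw compute floor lands inside the $0.50 to $0.75 range mentioned earlier; anyone pricing below it is losing money on every token served.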
But I think the really crucial thing about Replicate 01:09:56.460 |
not just the API for the model that is the important bit. 01:10:03.900 |
the whole point of open source is that you can tinker on it, 01:10:05.860 |
and you can customize it, and you can fine tune it, 01:10:07.940 |
and you can smush it together with another model, 01:10:13.020 |
And you can't do that if it's just a hosted API, 01:10:29.260 |
So we've got all of these models where the performance and price 01:10:35.220 |
But if you want to customize it, you can fine tune it. 01:10:37.540 |
You can go to GitHub and get the source code for it, 01:10:39.660 |
and edit the source code, and push up your own custom 01:10:42.980 |
Because that's the crucial thing for open source machine 01:10:47.500 |
learning, is be able to tinker on it and customizing it. 01:11:01.820 |
When Llama 2 came out, I wrote a post about this. 01:11:05.620 |
It's like open source, and there's open weights, 01:11:11.620 |
so there were all sorts of comments from people. 01:11:16.740 |
What do you think is OK for people to license? 01:11:29.060 |
open source, little models, purely open source stuff. 01:11:35.780 |
where model companies putting restrictive licenses 01:11:41.980 |
That means it can only be used for non-commercial use. 01:11:52.420 |
And I think a lot of that is coming from philosophy, 01:11:56.180 |
the sort of free software movement kind of philosophy. 01:11:59.260 |
And I don't think it's necessarily a bad thing. 01:12:01.940 |
I think it's good that model companies can make money out 01:12:09.180 |
And I think it's totally fine if somebody made something 01:12:16.140 |
And I think there's some really interesting midpoints, as well, 01:12:23.140 |
still wants to get a cut of it if you're making 01:12:28.300 |
And that's going to make the ecosystem more sustainable. 01:12:33.220 |
I don't think anybody's really figured it out yet. 01:12:34.780 |
And we're going to see more experimentation with this 01:12:39.780 |
what are the business models around building models? 01:12:45.860 |
And I think it's something we want to support as Replicate, 01:12:53.140 |
But there's also going to be lots of models which 01:13:01.940 |
of a bunch of people building models that don't have 01:13:10.980 |
and help them make money and that kind of thing. 01:13:13.460 |
I think the compute requirements of AI kind of changed the thing. 01:13:19.780 |
And before, it was kind of man-hours that was really 01:13:27.340 |
Well, not that man-hours are not worth a lot. 01:13:30.260 |
But if you think about Llama 2, it's like $25 million all in. 01:13:53.740 |
But all we care about is that Llama 2 is open. 01:13:59.180 |
if Mistral was not open source, we would be in a bad spot. 01:14:06.860 |
Because the beautiful thing about Llama 2 as a base model 01:14:11.260 |
is that, yeah, it costs $25 million to train to start with. 01:14:17.700 |
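A back-of-envelope makes the asymmetry vivid. Only the $25 million figure comes from the conversation; the fine-tuning numbers below are assumptions for illustration:

```python
# Back-of-envelope: pretraining vs. fine-tuning cost (fine-tune figures assumed).
pretrain_cost = 25_000_000  # figure quoted for Llama 2 in the conversation

gpus = 8              # assumed: LoRA-style fine-tune on 8 A100s
hours = 4             # assumed run length
gpu_hour_price = 2.50 # assumed per-GPU-hour rate
finetune_cost = gpus * hours * gpu_hour_price  # $80

print(int(pretrain_cost / finetune_cost))  # → 312500
```

Under those assumptions a fine-tune costs hundreds of thousands of times less than the pretraining run it builds on, which is exactly why one open base model can seed a whole ecosystem of derivatives.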
And that's what's so beautiful about the open source ecosystem 01:14:21.340 |
and something I think is really surprising as well. 01:14:25.500 |
I think a lot of people assumed that it's not 01:14:30.220 |
going to be-- open source machine learning is just not 01:14:37.540 |
And people are getting really good results out of it. 01:14:40.900 |
So people can effectively create open source models really 01:14:46.580 |
And there's going to be this sort of ecosystem 01:14:50.860 |
And I think the risk there from a licensing point of view 01:14:53.260 |
is we need to make sure that the licenses let people do that. 01:14:58.020 |
under a non-commercial license and people can't fine tune it, 01:15:03.460 |
And I'm sure there are ways to structure that such 01:15:10.220 |
and they can feel like they should keep on training models. 01:15:23.340 |
You've been an excellent, very open guest so far. 01:15:31.700 |
But I feel like you found the AI engineer crew before I did. 01:15:42.260 |
about how there are two orders of magnitude more software 01:15:45.300 |
engineers than there are machine learning engineers, about 30 01:15:47.800 |
million software engineers and 500,000 machine learning engineers. 01:15:50.900 |
You can maybe plus/minus one of those orders of magnitude, 01:15:54.700 |
And so obviously, there will be a lot more AI engineers 01:16:18.580 |
going to be a large part of how we build software in the future 01:16:21.540 |
It's a bit like being a software developer in the '90s 01:16:40.540 |
need to be digging down into this sort of PyTorch level 01:16:46.880 |
In the same way as a software engineer in the '90s, 01:16:51.260 |
how network stacks work to be able to build a website. 01:16:53.380 |
But you need to understand the shape of this thing 01:16:55.000 |
and how to hold it and what it's good at and what it's not. 01:17:08.340 |
Get a feel of how these diffusion models work. 01:17:20.500 |
because some of your job might be writing a prompt. 01:17:22.620 |
And those are just all really important skills 01:17:29.900 |
Well, thanks for building the definitive platform 01:17:43.900 |
If you click on Jobs at the bottom of replicate.com, 01:17:56.260 |
Like, the whole reason I started this company 01:18:00.320 |
Like, Andreas is like a proper machine learning person 01:18:03.660 |
And I was just like a sort of lonely software engineer. 01:18:07.180 |
And I was like, you're doing really cool stuff, 01:18:17.260 |
And I just encourage anyone who wants to try this stuff out, 01:18:25.660 |
Like, the limiting factor now on AI is not like the technology. 01:18:29.460 |
Like, the technology has made incredible advances. 01:18:31.980 |
And there's just so many incredible machine learning 01:18:37.500 |
The limiting factor is just like making that accessible 01:18:41.900 |
Because it's really hard to use this stuff right now. 01:18:44.300 |
And obviously, we're building some of that stuff 01:18:46.380 |
But there's just like a ton of other tooling and abstractions 01:18:49.180 |
that need to be built out to make this stuff usable. 01:18:51.580 |
So I just encourage people who like building developer tools 01:18:56.220 |
Because that's going to make this stuff accessible 01:18:59.820 |
I especially want to highlight you have a Hacker-in-Residence 01:19:02.380 |
job opening available, which not every company has, 01:19:07.380 |
I think Charlie Holtz is doing a fantastic job of that. 01:19:12.060 |
a lot of our job is just like showing people how to use AI. 01:19:15.660 |
So we've just got a team of software developers. 01:19:17.700 |
And people have kind of figured this stuff out 01:19:19.820 |
who are writing about it, who are making videos about it, 01:19:26.740 |
to show people what you can do with this stuff. 01:19:40.680 |
FRANCESC CAMPOY: Tell me this came from Chroma. 01:19:45.220 |
Anton actually was like, hey, we came up with that first. 01:19:47.900 |
But I think we came up with it independently. 01:19:49.140 |
FRANCESC CAMPOY: Yeah, I made that page, yeah. 01:19:50.060 |
CHRIS BANES: I think we came up with it independently. 01:19:52.300 |
Because the story behind this is we originally 01:20:00.980 |
CHRIS BANES: And Zeke was like, that sounds so boring. 01:20:03.300 |
I have to go to someone and say I'm a developer relations 01:20:06.360 |
FRANCESC CAMPOY: You don't want to be a hacker man. 01:20:07.500 |
CHRIS BANES: Or a developer advocate or something. 01:20:17.020 |
I get from Replicate, everyone on your team I interact with. 01:20:28.860 |
And I think you're a really positive presence 01:20:33.020 |
And it's instilling the hacker vibe and culture into AI.