Open Source AI is AI we can Trust — with Soumith Chintala of Meta AI
Chapters
0:00 Introductions
0:51 Extrinsic vs Intrinsic Success
2:58 Importance of Open Source and Its Impact
4:25 PyTorch vs TinyGrad
10:23 Why PyTorch is the Switzerland of frameworks
12:44 Modular's Mojo + PyTorch?
16:12 PyTorch vs Apple's MLX
19:46 FAIR / PyTorch Alumni
22:38 How can AI inference providers differentiate?
25:48 How to build good benchmarks and learnings from Anyscale's
29:51 Most interesting unexplored ideas
33:23 What people get wrong about synthetic data
42:00 Meta AI's evolution
45:20 How do you allocate 600,000 GPUs?
49:24 Even the GPU Rich are GPU Poor
55:49 Meta's MTIA silicon
58:56 Why we need open source
67:00 Open source's coordination problem for feedback gathering
81:16 Beyond text generation
89:02 Osmo and the Future of Smell Recognition Technology
This is Alessio, partner and CTO in Residence 00:00:06.480 |
And I'm joined by my co-host, Swyx, founder of Smol.ai. 00:00:09.480 |
Hey, and today we have in the studio, Soumith Chintala. 00:00:13.800 |
On one of your rare visits from New York, where you live. 00:00:18.640 |
You got your start in computer vision at NYU with Yann LeCun. 00:00:30.440 |
So if people want to know more about the history of Soumith, 00:00:33.320 |
history of PyTorch, they can go to that podcast. 00:00:39.400 |
or I don't know if it's your luck or your drive 00:00:42.280 |
to find AI early and then find the right quality mentor. 00:00:47.460 |
Because I guess Yann really introduced you to that world. 00:00:51.480 |
You're talking about extrinsic success, right? 00:01:02.760 |
be extrinsically perceived as good and successful. 00:01:10.620 |
that is now like one of the coolest things in the world 00:01:18.640 |
the first thing I tried to become was a 3D VFX artist. 00:01:45.060 |
is this person successful or not might be different. 00:01:48.120 |
But I think after a baseline, your happiness is probably 00:01:59.680 |
that I often refer to about the power of intrinsic motivation 00:02:03.020 |
versus extrinsic and how long extrinsic lasts. 00:02:07.440 |
But anyway, now you are an investor in Runway. 00:02:19.680 |
He actually tried to become an animator in his early years 00:02:22.520 |
and failed, or didn't get accepted by Disney, 00:02:25.260 |
and then went and created Pixar and then got bought by Disney 00:02:31.340 |
So you joined Facebook in 2014 and eventually became 00:02:41.060 |
But you also-- I think maybe people don't know that you also 00:02:44.040 |
involved in more of the hardware and cluster decisions. 00:02:48.500 |
there, because we're all about hardware this month. 00:02:52.600 |
And then finally, I don't know what else should people 00:02:55.860 |
know about you on the personal side or the professional side. 00:03:00.940 |
like a big passion of mine and probably forms 00:03:11.980 |
It's like one of those things that I attribute to-- 00:03:21.220 |
to distribute opportunity in a way that is very powerful. 00:03:33.940 |
And in college, actually, I didn't have internet, 00:03:40.880 |
So just having-- and knowledge was very centralized. 00:03:49.940 |
And that ended up helping me learn quicker and faster 00:04:04.900 |
I always push regardless of what I get paid for. 00:04:10.100 |
I think I would do that as a passion project on the side. 00:04:16.500 |
as well that open source has, open models versus closed 00:04:20.860 |
But maybe you want to touch a little bit on PyTorch 00:04:23.020 |
before we move on to sort of meta AI in general. 00:04:25.500 |
Yeah, we kind of touched on PyTorch in a lot of episodes. 00:04:31.660 |
He called PyTorch a CISC and TinyGrad a RISC. 00:04:36.500 |
I would love to get your thoughts on PyTorch design 00:04:42.420 |
I know you talk a lot about kind of having a happy path 00:04:45.900 |
to start with and then making complexity hidden away, 00:04:52.340 |
is I think you have like 250 primitive operators in PyTorch. 00:04:57.600 |
So how do you think about some of the learnings 00:05:02.020 |
that maybe he's going to run into that you already 00:05:04.520 |
had in the past seven, eight years almost of running PyTorch? 00:05:13.940 |
think it's two different models that people generally 00:05:23.060 |
And my v1 is like super complex, feature complete, whatever. 00:05:28.840 |
Or other people say they will get incrementally ambitious. 00:05:33.160 |
They say, oh, we'll start with something simple, 00:05:35.120 |
and then we'll slowly layer out complexity in a way 00:05:37.700 |
that optimally applies Huffman coding or whatever. 00:05:42.860 |
Where the density of users are and what they're using, 00:05:47.680 |
I would want to keep it in the easy, happy path. 00:06:01.360 |
George, I think, just like we started with PyTorch, 00:06:05.000 |
George started with the incrementally ambitious thing. 00:06:19.440 |
So I think there is no real magic to which why PyTorch 00:06:26.640 |
I think it's probably partly necessitated and partly 00:06:32.100 |
because we built with the technology available under us 00:06:43.120 |
I think if we had to rewrite it, we would probably 00:06:45.960 |
think about ways to rewrite it in a vastly simplified way, 00:06:52.980 |
But a lot of that complexity comes from the fact 00:07:07.720 |
and then you have DRAM and SSD, and then you have network. 00:07:17.220 |
and then you have different levels of network hierarchies, 00:07:19.960 |
NVLink plus InfiniBand or RoCE or something like that. 00:07:26.680 |
And the way the flops are available on your hardware, 00:07:37.880 |
onto both the memory hierarchy and the flops available. 00:07:45.040 |
like a fairly hard mathematical problem to do this setup, 00:07:55.000 |
And finding the optimal thing is like, what is optimal? 00:07:58.440 |
What is optimal depends on the input variables themselves. 00:08:02.440 |
So like, OK, what is the shape of your input tensors, 00:08:05.240 |
and what is the operation you're trying to do, 00:08:20.240 |
the same for every input configuration you have. 00:08:27.400 |
For example, just as the shape of the tensors change, 00:08:31.560 |
let's say you have three input tensors into a sparse dot 00:08:43.000 |
will vastly change how you do this optimally placing 00:08:48.640 |
this operation onto the hardware in a way that will 00:08:53.440 |
So a lot of our complexity comes from writing out 00:08:59.240 |
like hundreds of configurations for each single PyTorch 00:09:07.200 |
and symbolically generating the final CUDA code or CPU code. 00:09:15.000 |
There's no way to avoid it, because mathematically we 00:09:17.080 |
haven't found symbolic ways to do this that also 00:09:40.520 |
I don't think, unless we have great breakthroughs, 00:09:47.640 |
Or he should be thinking about a narrower problem, such as, 00:09:51.240 |
I'm only going to make this work for self-driving car ConvNets. 00:09:55.920 |
Or I'm only going to make this work for LLM transformers 00:10:10.480 |
to power all of the AI research that is happening 00:10:13.720 |
and keep zero compile time and all these other factors, 00:10:18.120 |
I think it's not easy to avoid the complexity. 00:10:28.160 |
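To make the shape-dependence concrete, here is a toy Python sketch, not PyTorch internals: the helper names are made up, but it shows why the "best" implementation of a single operator changes with the input configuration, which is where the hundreds of per-operator code paths come from.

```python
import torch

# Toy dispatcher: pick an implementation of matmul based on the input shapes.
# Real frameworks also branch on dtype, strides, device, and hardware features,
# which multiplies the number of code paths per operator.

def matmul_naive(a, b):
    # fine when the working set fits comfortably in cache
    return a @ b

def matmul_chunked(a, b, chunk=1024):
    # process row blocks to keep the working set small for large inputs
    return torch.cat([a[i:i + chunk] @ b for i in range(0, a.shape[0], chunk)])

def dispatch_matmul(a, b):
    if a.shape[0] * a.shape[1] < (1 << 20):
        return matmul_naive(a, b)
    return matmul_chunked(a, b)

out = dispatch_matmul(torch.randn(4096, 512), torch.randn(512, 256))
print(out.shape)  # torch.Size([4096, 256])
```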
If you think about frameworks, they have the model target. 00:10:36.560 |
TensorFlow is trying to be optimized to make TPUs go brr 00:10:43.000 |
I think George is trying to make, especially AMD stack, 00:10:47.880 |
How come PyTorch has been such a Switzerland 00:10:54.320 |
First, Meta is not in the business of selling hardware. 00:10:57.760 |
Meta is not in the business of cloud compute. 00:11:03.640 |
We kind of-- the way Meta thinks about funding PyTorch is it's 00:11:11.640 |
just like we're funding it because it's net good for Meta 00:11:17.000 |
to fund PyTorch because PyTorch has become a standard 00:11:27.240 |
It gives us various leverage and all that within our own work. 00:11:40.440 |
I think the way we think about it is not in terms of Switzerland 00:11:44.440 |
Actually, the way we articulated to all hardware vendors 00:11:47.940 |
and software vendors and all who come to us being like, 00:11:51.600 |
we want to build a backend in core for PyTorch 00:12:00.160 |
If users are using a particular piece of hardware, 00:12:05.460 |
We very much don't want to king make the hardware 00:12:11.720 |
So as the MacBooks have GPUs and as that stuff 00:12:19.960 |
we pushed Apple to push some engineers and work 00:12:25.040 |
And we spent significant time from like meta funded 00:12:29.820 |
Because a lot of people are using the Apple GPUs 00:12:35.360 |
So we kind of mostly look at it from the demand side. 00:12:40.960 |
which hardware should we start taking opinions on? 00:12:44.480 |
Is there a future in which-- because Mojo or Modular 00:13:01.760 |
So if Mojo is like a PIP install and it's readily available 00:13:06.960 |
and users feel like they can use Mojo so smoothly 00:13:21.440 |
In the same way, PyTorch now depends on Triton, 00:13:26.500 |
And we never had a conversation that was like, huh, 00:13:38.720 |
It almost doesn't-- those conversations don't really 00:13:43.000 |
The conversations are more like, well, does Triton 00:13:45.200 |
have 10,000 dependencies and is it hard to install? 00:13:54.160 |
We look at these things from a user experience point of view. 00:14:16.100 |
would you look to solve that you have right now? 00:14:25.680 |
It's more performance, mainly a performance pitch, 00:14:30.960 |
Yeah, I think the performance pitch for Mojo was like, 00:14:56.180 |
So PyTorch exposes-- it's actually not 250 operators, 00:15:04.400 |
and people write their ideas in the 1,000 operators of PyTorch. 00:15:10.080 |
Mojo is like, well, maybe it's OK to completely sidestep 00:15:17.240 |
those 1,000 operators of PyTorch and just write it 00:15:20.160 |
in a more natural form, just write like raw Python, 00:15:25.400 |
So from the consideration of how do we intersect PyTorch 00:15:33.600 |
where you have custom stuff for some parts of your program, 00:15:42.880 |
how to make it easier for, say, torch.compile to smoothly also 00:15:49.200 |
consume Mojo subgraphs, and the interoperability 00:16:00.480 |
would be replacing PyTorch, not augmenting PyTorch. 00:16:06.240 |
So in that sense, I don't see a synergy in more deeply 00:16:15.040 |
have written something in Mojo and there's some performance 00:16:24.160 |
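As a rough illustration of that torch.compile interoperability point, here is a hedged sketch using PyTorch's public custom-backend hook; the backend below just falls back to eager execution, but a hypothetical Mojo or other vendor compiler could return an optimized callable for the captured subgraph instead.

```python
import torch

def inspect_backend(gm: torch.fx.GraphModule, example_inputs):
    # torch.compile hands captured FX subgraphs to the backend; a third-party
    # compiler could lower `gm` here and return its own optimized callable.
    gm.graph.print_tabular()
    return gm.forward  # fall back to eager so the sketch runs anywhere

@torch.compile(backend=inspect_backend)
def fn(x, y):
    return torch.relu(x @ y) + 1.0

fn(torch.randn(8, 8), torch.randn(8, 8))
```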
what should people think of PyTorch versus MLX? 00:16:26.720 |
I mean, MLX is early, and I know the folks well. 00:16:32.160 |
Awni used to work at FAIR, and I used to chat with him 00:16:42.560 |
The way I think about MLX is that MLX is specialized 00:17:05.040 |
be supporting Apple and we will just focus on enabling-- 00:17:13.060 |
but once you go server side or whatever, that's not my problem 00:17:19.000 |
Or MLX, it enters the server side set of things as well. 00:17:26.240 |
If the first thing will happen, MLX's overall addressable 00:17:44.120 |
and they will have vastly more complex work to do. 00:17:49.460 |
They probably wouldn't be able to move as fast in certain ways. 00:17:52.800 |
Like having to deal with distributed compute. 00:17:58.520 |
like having a generalization of the concept of a back end, 00:18:02.400 |
how they treat compilation with plus overheads. 00:18:07.000 |
Right now, they deeply assume the whole MPS graph thing. 00:18:12.480 |
So they need to think about all these additional things 00:18:16.680 |
if they end up expanding onto the server side. 00:18:19.480 |
And they'll probably build something like PyTorch 00:18:26.020 |
And I think there they will fail on the lack of differentiation. 00:18:31.780 |
It wouldn't be obvious to people why they would want to use it. 00:18:36.200 |
I mean, there are some cloud companies offering M1 and M2 00:18:41.120 |
I feel like it might be interesting for Apple 00:18:43.320 |
to pursue that market, but it's not their core. 00:18:45.760 |
Yeah, I mean, if Apple can figure out their interconnect 00:18:52.480 |
Honestly, that's more interesting than the cars. 00:18:56.160 |
I think the mode that NVIDIA has right now, I feel like, 00:19:06.940 |
I'm sure there is other silicon that is not bad at all. 00:19:10.660 |
But the interconnect, like NVLink, is uniquely awesome. 00:19:16.340 |
So I'm sure the other hardware providers are working on it. 00:19:21.060 |
I feel like when you say it's uniquely awesome, you 00:19:23.260 |
have some appreciation of it that the rest of us don't. 00:19:28.800 |
do you mean when you say NVIDIA is very good at networking? 00:19:32.000 |
Obviously, they made the acquisition maybe 15 years ago. 00:19:46.700 |
Who are some of the other FAIR/PyTorch alumni that 00:19:51.220 |
I know you have Fireworks AI, Lightning AI, Lepton. 00:20:00.060 |
Yeah, so Yangqing and I used to be framework rivals, 00:20:06.460 |
I mean, we were all a very small, close-knit community 00:20:13.060 |
Caffe, Torch, Theano, Chainer, Keras, various frameworks. 00:20:22.820 |
I mean, it used to be more like 20 frameworks. 00:20:39.900 |
and saw if someone wrote their own convolution kernel, 00:20:47.140 |
And there were four or five convolution kernels 00:21:08.180 |
And at some point there, I built out these benchmarks 00:21:18.020 |
benchmarking all the convolution kernels that 00:21:25.380 |
And it hilariously became big enough that at that time, 00:21:30.060 |
AI was getting important, but not important enough 00:21:37.020 |
in to do these kind of benchmarking and standardization. 00:21:41.780 |
So a lot of the startups were using convnet-benchmarks 00:21:55.820 |
I remember Nervana actually was at the top of the pack 00:21:58.420 |
because Scott Gray wrote amazingly fast convolution 00:22:10.660 |
I think mainly Lepton and Fireworks are the two most obvious ones. 00:22:10.660 |
But I'm sure the fingerprints are a lot wider. 00:22:27.060 |
They're just people who worked within the PyTorch and Caffe 00:22:38.980 |
I think both as an investor and people looking 00:22:59.060 |
And they're like, you know, we are deep in the PyTorch 00:23:02.380 |
ecosystem, and we serve billions of inferences a day 00:23:05.140 |
or whatever at Facebook, and now we can do it for you. 00:23:22.580 |
What should people know about these sort of new inference 00:23:28.140 |
At that point, you would be investing in them 00:23:43.660 |
is that they're really good at GPU programming 00:23:48.140 |
or understanding the complexity of serving models 00:23:52.780 |
once it hits a certain scale, various expertise 00:23:58.380 |
from the infra and AI and GPUs point of view. 00:24:06.980 |
is whether their understanding of the external markets 00:24:19.980 |
understanding how to be disciplined about making money, 00:24:26.980 |
actually, I will de-emphasize the investing bit, 00:24:31.820 |
It's more like, OK, you're PyTorch gods, of course. 00:24:39.020 |
I mean, I would not care about who's building something 00:24:48.580 |
And it's usability, and reliability, and speed. 00:24:53.980 |
Yeah, if someone from some random unknown place 00:25:04.100 |
and I have the bandwidth, I probably will give it a shot. 00:25:06.780 |
And if it turns out to be great, I'll just use it. 00:25:11.700 |
And then maybe one more thing about benchmarks, 00:25:13.660 |
since we already brought it up, and you brought up 00:25:16.620 |
There was some recent drama around Anyscale. 00:25:22.340 |
and obviously they looked great on their own benchmarks. 00:25:28.220 |
I feel like there are two lines of criticism. 00:25:30.260 |
One, which is they didn't test apples for apples 00:25:33.620 |
on the kind of endpoints that the other providers 00:25:36.940 |
that they are competitors with on their benchmarks. 00:25:48.060 |
Yeah, I mean, in summary, basically my criticism 00:25:53.140 |
that Anyscale built these benchmarks for end users 00:26:06.060 |
is give that end user a full understanding of what 00:26:22.980 |
You need to understand your total cost of ownership 00:26:27.700 |
Not like, oh, like one API call is like $0.01, 00:26:36.580 |
People can misprice to cheat on those benchmarks. 00:26:39.220 |
So you want to understand, OK, how much is it 00:26:42.860 |
going to cost me if I actually subscribe to you 00:26:45.980 |
and do like a million API calls a month or something? 00:26:49.460 |
And then you want to understand the latency and reliability, 00:26:55.340 |
not just from one call you made, but an aggregate of calls 00:27:01.140 |
you made over various times of the day and times of the week 00:27:08.260 |
Is it just like some generic single paragraph 00:27:22.460 |
It was a much more narrow sliver of what should 00:27:30.060 |
And I'm pretty sure if before they released it, 00:27:33.580 |
they showed it to their other stakeholders who 00:27:43.020 |
would have easily just pointed out these gaps. 00:27:46.020 |
And I think they didn't do that, and they just released it. 00:27:50.020 |
So I think those were the two main criticisms. 00:27:52.620 |
And I think they were fair, and Robert took it well. 00:27:56.060 |
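For readers who want to apply that critique, a minimal sketch of the aggregate-over-time idea follows; the endpoint URL and payload are placeholders, and a real benchmark would also vary prompts, output lengths, and time of day and week.

```python
import statistics
import time
import requests  # assumes the requests package is installed

ENDPOINT = "https://api.example.com/v1/completions"  # placeholder, not a real provider

def sample_latencies(prompt: str, n: int = 20) -> list[float]:
    # Measure many calls instead of trusting a single request.
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        requests.post(ENDPOINT, json={"prompt": prompt, "max_tokens": 256}, timeout=60)
        samples.append(time.perf_counter() - start)
    return samples

lat = sample_latencies("Summarize the following paragraph ...")
print("p50:", statistics.median(lat))
print("p95:", statistics.quantiles(lat, n=20)[18])  # 95th percentile cut point
```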
Yeah, we'll have him on at some point, and we'll discuss it. 00:28:07.740 |
because otherwise everyone's going to play dirty. 00:28:11.860 |
My view of the LLM inference market in general 00:28:19.260 |
The margins are going to drive down towards the bare minimum. 00:28:23.940 |
It's going to be all kinds of arbitrage between how much you 00:28:26.820 |
can get the hardware for and then how much you sell the API 00:28:34.500 |
You need to figure out how to squeeze your margins. 00:28:40.260 |
I think Together and Fireworks and all these people 00:28:42.860 |
are trying to build some faster CUDA kernels and faster 00:28:50.540 |
But those modes only last for a month or two. 00:28:57.580 |
Even if they're not published, the idea space is small. 00:29:06.460 |
the discovery rate is going to be pretty high. 00:29:09.020 |
It's not like we're talking about a combinatorial thing 00:29:13.300 |
You're talking about like llama-style LLM models, 00:29:23.180 |
It's not even like we have a huge diversity of hardware 00:29:32.940 |
the rate at which these ideas are going to get figured out 00:29:38.180 |
The standard one that I know of is fusing operators 00:29:43.420 |
on figuring out how to improve your memory bandwidth 00:29:51.420 |
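A minimal sketch of that fusion idea, using torch.compile as a stand-in for the hand-written kernels being discussed: eager mode materializes an intermediate tensor for every pointwise op, while the compiled version can fuse the chain and touch memory once.

```python
import torch

def pointwise_chain(x):
    y = x * 2.0        # eager: writes an intermediate tensor
    z = torch.sin(y)   # eager: reads it back, writes another
    return torch.relu(z)

fused = torch.compile(pointwise_chain)  # the compiler can fuse the pointwise chain

x = torch.randn(1 << 20)
torch.testing.assert_close(pointwise_chain(x), fused(x))
```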
Any ideas instead of things that are not being beaten to death 00:29:54.700 |
that people should be paying more attention to? 00:29:56.900 |
One thing I was like, you have 1,000 operators. 00:30:01.260 |
that you're seeing maybe outside of this little bubble? 00:30:08.940 |
But basically, it's used in a lot of exotic ways 00:30:13.740 |
from the ML angle, like, OK, what kind of models 00:30:18.180 |
And you get all the way from state space model 00:30:21.980 |
then all these things to stuff like nth-order differentiable 00:30:35.220 |
I think there's one set of interestingness factor 00:30:42.500 |
And then there's the other set of interesting factor 00:30:46.620 |
It's used in Mars Rover simulations, to drug discovery, 00:31:06.940 |
I think in terms of the most interesting application 00:31:17.380 |
are also very critical and really important it is used in. 00:31:39.300 |
And I was scared more about the fact that they were using GANs 00:31:43.620 |
Because at that time, I was a researcher focusing on GANs. 00:31:47.420 |
The diversity is probably the most interesting, 00:31:49.740 |
how many different things it is being used in. 00:32:09.660 |
search and symbolic stuff with differentiable models. 00:32:16.300 |
I think the whole AlphaGo style model is one example. 00:32:26.620 |
to do it for LLMs as well with various reward 00:32:34.340 |
but the whole AlphaGeometry thing was interesting. 00:32:39.380 |
the symbolic models with the gradient-based ones. 00:32:50.820 |
when you intersect biology and chemistry with ML. 00:33:03.340 |
So yeah, maybe from the ML side, those things to me 00:33:09.780 |
People are very excited about the AlphaGeometry thing. 00:33:18.740 |
into the real-world applications, but I'm sure it-- 00:33:25.740 |
You know how the whole thing about synthetic data 00:33:39.820 |
People think synthetic data is some kind of magic wand 00:33:50.340 |
right now because we, as humans, have figured out 00:34:06.100 |
So we've figured out how to ground particle physics 00:34:42.540 |
and just understanding how language can be broken down 00:34:46.420 |
into formal symbolism is something that we've figured out. 00:34:53.060 |
all this knowledge on these subjects, either synthetically-- 00:34:57.340 |
I mean, we created those subjects in our heads, 00:35:05.380 |
But we haven't figured out how to teach neural networks 00:35:19.820 |
So in areas where we have the symbolic models 00:35:23.340 |
and we need to teach all the knowledge we have 00:35:29.820 |
that is better encoded in the symbolic models, 00:35:34.100 |
a bunch of synthetic data, a bunch of input-output pairs, 00:35:42.580 |
that we already have a better low-rank model of 00:35:46.420 |
in gradient descent in a much more overparameterized way. 00:35:50.420 |
Outside of this, where we don't have good symbolic models, 00:35:55.020 |
synthetic data obviously doesn't make any sense. 00:36:00.020 |
where it'll work in all cases and every case or whatever. 00:36:09.140 |
we need to impart that knowledge to neural networks 00:36:12.700 |
and we figured out the synthetic data is a vehicle 00:36:18.540 |
But people, because maybe they don't know enough 00:36:27.060 |
but they hear the next wave of data revolution 00:36:30.100 |
is synthetic data, they think it's some kind of magic 00:36:32.940 |
where we just create a bunch of random data somehow. 00:36:38.500 |
And then they think that's just a revolution, 00:36:40.940 |
and I think that's maybe a gap in understanding 00:36:49.220 |
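As a toy illustration of what "synthetic data grounded in a symbolic model" means here: when you already hold the symbolic model (exact integer arithmetic below), you can emit unlimited labeled input-output pairs for a network to fit. This is only a sketch of the idea, not any particular pipeline.

```python
import random

def make_example() -> tuple[str, str]:
    # The symbolic model (exact integer arithmetic) supplies the ground-truth label.
    a, b = random.randint(0, 999), random.randint(0, 999)
    op = random.choice(["+", "-", "*"])
    question = f"{a} {op} {b} ="
    answer = str({"+": a + b, "-": a - b, "*": a * b}[op])
    return question, answer

dataset = [make_example() for _ in range(10_000)]
print(dataset[:3])
```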
- There's two more that I'll put in front of you 00:36:54.380 |
One is, I have this joke that it's only synthetic data 00:37:04.980 |
They're distilling GPT-4 by creating synthetic data 00:37:07.660 |
from GPT-4, creating mock textbooks inspired by Phi-2, 00:37:11.540 |
and then fine-tuning open source models like Llama. 00:37:15.940 |
- And so, should we call that synthetic data? 00:37:19.900 |
- Yeah, I mean, the outputs of LLMs, are they synthetic data? 00:37:29.340 |
If your goal is you're creating synthetic data 00:37:36.540 |
with the goal of trying to distill GPT-4's superiority 00:37:40.780 |
into another model, I guess you can call it synthetic data, 00:37:45.300 |
but it also feels disingenuous because your goal is like, 00:37:57.980 |
- I've often thought of this as data set washing. 00:38:07.120 |
that has all the data in it that we don't know 00:38:14.920 |
- But they also, to be fair, they also use larger models 00:38:21.560 |
- That is, I think, a very, very accepted use of synthetic. 00:38:25.460 |
I think it's a very interesting time where we don't really 00:38:28.960 |
have good social models of what is acceptable 00:38:33.960 |
depending on how many bits of information you use 00:38:44.560 |
It's like, okay, you use like one bit, is that okay? 00:38:51.920 |
Okay, what about if you use like 20 bits, is that okay? 00:38:59.280 |
Like, I don't think we as society have ever been 00:39:08.480 |
or where is the boundary of socially accepted understanding 00:39:15.960 |
Like, we haven't been tested this mathematically before, 00:39:23.600 |
- So yeah, I think this New York Times OpenAI case 00:39:23.600 |
is solving this very stark paradigm difference 00:39:49.180 |
All you need is variation or diversity of samples 00:40:04.440 |
that is like you're basically trying to create, 00:40:10.500 |
well, language, I know how to parameterize language 00:40:26.580 |
- Yeah, so I think that's 100% like synthetic, right? 00:40:37.460 |
or like some implicit symbolic model of language. 00:40:45.940 |
is just the architecture of the language models 00:40:50.380 |
I think like the, maybe the thing that people grasp 00:40:55.340 |
to deal with numbers because of the tokenizer. 00:41:03.040 |
that will be better with symbolic understanding? 00:41:06.180 |
- I am not sure if it's a fundamental issue or not. 00:41:09.500 |
I think we just don't understand transformers enough. 00:41:13.220 |
I don't even mean transformers as an architecture. 00:41:19.460 |
like combining the tokenizer and transformers 00:41:24.700 |
like when you show math heavy questions versus not. 00:41:35.180 |
I, you know, there's common criticisms that are like, 00:41:38.340 |
well, you know, transformers will just fail at X 00:41:42.260 |
but then when you scale them up to sufficient scale, 00:41:51.940 |
where they're trying to figure out these answers 00:41:53.580 |
called like the science of deep learning or something. 00:42:11.700 |
And Llama 1 was, you know, you are such a believer 00:42:16.680 |
Llama 1 was more or less like the real breakthrough 00:42:20.840 |
The most interesting thing for us covering in this podcast 00:42:32.200 |
the scaling models for open source models or smaller models 00:42:45.860 |
There was OPT before, which I'm also very proud of. 00:42:53.160 |
- Because we bridged the gap in understanding 00:42:56.620 |
of how complex it is to train these models to the world. 00:43:11.800 |
But no one really talked about why it's complex. 00:43:20.660 |
- I met Susan and she's very, very outspoken. 00:43:28.540 |
Like, you know, that's kind of obvious in retrospect. 00:43:34.420 |
- But you trained it according to Chinchilla at the time or? 00:43:40.420 |
but I think it's a commonly held belief at this point 00:43:50.860 |
Guillaume Lample and team. Guillaume is fantastic 00:43:56.740 |
I wasn't too involved in that side of things. 00:44:06.660 |
how did they think about scaling laws and all of that? 00:44:19.580 |
with like their infrastructure needs and stuff. 00:44:35.040 |
what we were missing from the industry's understanding 00:44:45.000 |
and we needed more to train the models for longer. 00:44:48.100 |
And we made, I think, a few tweaks to the architecture 00:44:51.600 |
and we scaled up more and like that is Llama 2. 00:44:56.120 |
I think Llama 2, you can think of it as like, 00:45:00.160 |
the team kind of rebuilt their muscle around Llama 2. 00:45:04.320 |
And Hugo, I think, who's the first author, is fantastic. 00:45:07.760 |
And I think he did play a reasonably big role 00:45:11.320 |
in Llama 1 as well, and he overlaps between Llama 1 00:45:14.520 |
So Llama 3, obviously, hopefully will be awesome. 00:45:21.680 |
and then we'll try and fish Llama 3 spoilers out of you. 00:45:38.560 |
Could they have just gone longer or were you just like, 00:46:13.840 |
and including all of the other GPU or accelerator stuff, 00:46:20.960 |
it would be 600 and something K aggregate capacity. 00:46:25.960 |
That's a lot of GPUs, we'll talk about it separately, 00:46:45.600 |
- Yeah, so I think it's all a matter of time. 00:46:53.160 |
It's like when do you stop training the previous one 00:47:15.360 |
When you start working on iPhone 2, where is iPhone 1? 00:47:15.360 |
So mostly the considerations are time and generation 00:47:28.320 |
- So one of the things with the scaling laws, 00:47:37.600 |
you would rather pay a lot more maybe at training 00:47:45.220 |
I think in your tweet you say you can try and guess 00:47:50.320 |
Can you just give people a bit of understanding? 00:47:52.240 |
It's like, because I've already seen a lot of VCs say, 00:47:58.900 |
How do you allocate between the research like FAIR 00:48:09.280 |
like AI generated stickers on WhatsApp and all that? 00:48:12.720 |
- Yeah, we haven't talked about any of this publicly 00:48:24.900 |
You run a company, you run like a VC portfolio, 00:48:39.580 |
and you kind of decide should I invest in this project 00:48:42.260 |
or this other project or how much should I invest 00:48:52.820 |
and it also comes into play like how is your, 00:48:59.700 |
Like overall, like what you can fit of what size 00:49:08.460 |
Like, I mean, I think the details would add more spice 00:49:18.960 |
I mean, this looks like they just think about this 00:49:24.000 |
- Right, so even the GPU rich run through the same struggles 00:49:27.920 |
while having to decide where to allocate things? 00:49:30.800 |
- Yeah, I mean like at some point, I forgot who said it 00:49:44.700 |
you figure out how to make do with smaller models 00:49:48.140 |
but like no one as of today, I think would feel like 00:50:07.760 |
So like that conversation, I don't think I've heard 00:50:20.900 |
and she's trying to put it to interesting uses 00:50:28.820 |
- I mean, that's a cool high conviction opinion 00:50:42.060 |
and she probably will have very differentiated ideas 00:50:46.080 |
and I mean, think about the correlation of ideas 00:51:04.180 |
I used to be a, I used to do image models and stuff 00:51:17.780 |
because oh yeah, someone else did the same thing you did. 00:51:24.260 |
I don't understand why I need to fight for the same pie. 00:51:32.980 |
- And how do you reconcile that with how we started 00:51:36.740 |
the discussion about intrinsic versus extrinsic 00:51:50.600 |
I walked through a lot of the posters and whatnot, 00:51:52.980 |
there seems to be multiple apps in a way in the research, 00:52:01.480 |
on something that is like maybe not as interesting, 00:52:04.500 |
just because of funding and visibility and whatnot 00:52:10.260 |
- I think there's a baseline level of compatibility 00:52:22.020 |
Like, and like whatever reasonable, normal lifestyle 00:52:34.220 |
Like you wouldn't want to be doing something so obscure 00:52:39.220 |
that people are like, I don't know, like you can work on it. 00:52:42.960 |
With a limit on fundability, I'm just like observing 00:52:47.020 |
something like three months of compute, right? 00:52:50.220 |
That's the like max that you can spend on any one project. 00:52:53.440 |
- But like, I think that's very ill specified, 00:52:58.820 |
- So I think the notion of fundability is broader. 00:53:03.820 |
It's more like, hey, are these family of models 00:53:06.780 |
within the acceptable set of you're not crazy 00:53:33.820 |
like image classification to them or something, 00:53:42.640 |
Maybe if you're a neuroscientist, it actually is feasible. 00:53:46.320 |
But if you're like an AI engineer, like the audience 00:53:50.120 |
of these podcasts, then it's less, it's more questionable. 00:53:54.760 |
So I think like, the way I think about it is like, 00:53:57.680 |
you need to figure out how you can be in the baseline level 00:54:01.800 |
of fundability just so that you can just live. 00:54:06.400 |
And then after that, really focus on intrinsic motivation 00:54:11.400 |
and depends on your strengths, like how you can play 00:54:16.740 |
to your strengths and your interests at the same time. 00:54:21.060 |
Like you, like I try to look at a bunch of ideas 00:54:26.060 |
that are interesting to me, but also try to play 00:54:34.960 |
I'm interested in it, but when I want to work 00:54:38.720 |
on something like that, I try to partner with someone 00:54:40.800 |
who is actually a good like theoretical ML person 00:54:43.440 |
and see if I actually have any value to provide. 00:54:48.280 |
So I think you'd want to find that intersection 00:54:50.840 |
of ideas you like, and that also play to your strengths. 00:54:57.520 |
Everything else, like actually finding extrinsic success 00:55:01.160 |
and all of that I think is, the way I think about it 00:55:06.820 |
When you're talking about building ecosystems and stuff, 00:55:10.560 |
like slightly different considerations come into play, 00:55:16.600 |
- Yeah, I should, we're gonna pivot a little bit 00:55:23.600 |
But one more thing I wanted to establish for meta 00:55:25.720 |
is like this 600K number, just kind of rounding out 00:55:31.060 |
So including your own inference needs, right? 00:55:39.380 |
- Yeah, so like, there's a decent amount of workload 00:55:42.400 |
serving Facebook and Instagram and you know, whatever. 00:55:45.920 |
And then is there interest in like your own hardware? 00:55:57.620 |
I think we've even showed like the standard photograph 00:56:05.000 |
I mean, like as in the chip that you basically 00:56:25.220 |
- Like what gaps do you have that the market doesn't offer? 00:56:31.120 |
So basically, remember how I told you about the whole, 00:56:39.360 |
Fundamentally, like when you build a hardware, 00:56:42.080 |
like you make it general enough that a wide set of customers 00:56:46.680 |
and a wide set of workloads can use it effectively 00:56:49.800 |
while trying to get the maximum level of performance 00:56:58.460 |
the more hardware efficient it's going to be, 00:57:04.460 |
the more easier it's going to be to find like the software, 00:57:14.020 |
that one or two workloads to that hardware and so on. 00:57:17.080 |
So it's pretty well understood across the industry 00:57:21.840 |
that if you have a sufficiently large volume enough workload, 00:57:26.840 |
you can specialize it and get some efficiency gains, 00:57:35.460 |
So the way you can think about everyone building, 00:57:42.560 |
like I think a bunch of the other large companies 00:57:48.860 |
is each large company has a sufficient enough set 00:57:53.840 |
of verticalized workloads that have a pattern to them 00:58:03.920 |
like an Nvidia or an AMD GPU does not exploit. 00:58:11.520 |
that you're leaving on the table by not exploiting that. 00:58:21.120 |
that those workloads will exist in the same form, 00:58:25.100 |
that it's worth spending the time to build out a chip 00:58:32.640 |
Like obviously something like this is only useful 00:58:42.040 |
of those kinds of workloads being in the same kind 00:58:49.860 |
So yeah, that's why we're building our own chips. 00:59:00.560 |
and going back to open source, you had a very good tweet. 00:59:03.600 |
You said that a single company's close source effort 00:59:06.360 |
rate limits against people's imaginations and needs. 00:59:13.960 |
that some of the meta AI work in open source has been doing 00:59:17.200 |
and maybe directions of the whole open source AI space? 00:59:20.960 |
In general, I think first I think it's worth talking 00:59:25.280 |
about this in terms of open and not just open source 00:59:28.940 |
because like with the whole notion of model weights, 00:59:31.920 |
no one even knows what source means for these things. 00:59:35.500 |
But just for the discussion, when I say open source, 00:59:39.360 |
you can assume it's just I'm talking about open. 00:59:42.240 |
And then there's the whole notion of like licensing 00:59:47.280 |
- Commercial, non-commercial, commercial with clauses 00:59:57.160 |
is that you make the distribution to be very wide. 01:00:06.300 |
and like people can do transformative things. 01:00:27.420 |
and do something with it is very transformative to me. 01:00:32.260 |
Like I got this thing in a very accessible way. 01:00:38.700 |
And then like so it's very, very various degrees, right? 01:00:44.100 |
but it's like actually like a commercial license, 01:00:50.020 |
from like gaining value that they didn't previously have 01:00:54.780 |
that they maybe had to pay a closed source company for it. 01:00:59.100 |
So open source is just a very interesting tool 01:01:06.540 |
One is like some large company doing a lot of work 01:01:12.260 |
And that kind of effort is not really feasible 01:01:15.820 |
by say like a band of volunteers doing it the same way. 01:01:19.860 |
So there's both a capital and operational expenditure 01:01:33.740 |
They're not as tangible as like direct revenue 01:01:37.660 |
So in that part, Meta has been doing incredibly good things. 01:01:42.660 |
They fund a huge amount of the PyTorch development. 01:01:47.900 |
They've open sourced Llama and those family of models. 01:01:52.060 |
And several other fairly transformative projects. 01:02:19.220 |
and we have a high talent density of great AI people. 01:02:27.700 |
And the thesis for that, I remember when FAIR was started, 01:02:38.300 |
What exactly is the benefit from a commercial perspective? 01:02:53.280 |
Our ability to build various product integrations, 01:03:11.380 |
was uniquely in our possession or not for us. 01:03:31.380 |
Still the same to a large extent with the Llama stuff 01:03:35.220 |
and it's a bit more, I think it's the same values, 01:03:46.160 |
And then there's the second kind of open source, 01:03:50.420 |
which is oh, we built this project nights and weekends 01:03:54.140 |
and we're very smart people and we open sourced it 01:04:15.980 |
They're different and beneficial in their own ways. 01:04:28.580 |
If someone's not really looking at a particular space, 01:04:33.780 |
because it's not commercially viable or whatever, 01:04:35.980 |
like a band of volunteers can just coordinate online 01:04:44.820 |
I wanna cover a little bit about open source LLMs maybe. 01:04:51.820 |
So open source LLMs have been very interesting 01:04:54.620 |
because I think we were trending towards an increase 01:05:08.200 |
Like where more and more pressure within the community 01:05:17.580 |
And then the LLM revolution kind of took the opposite effect. 01:05:28.020 |
and DeepMind kind of like all the other cloud 01:06:07.800 |
What is my accessibility to any of these closer models? 01:06:40.380 |
And I actually have seen, living in New York, 01:06:52.700 |
like Dyson spheres or whatever, that's a thing. 01:07:07.960 |
they're probably not globally optimal decisions. 01:07:11.780 |
So I think open source, the distribution of open source, 01:07:27.740 |
it's going great in the fact that LoRA, I think, 01:07:31.560 |
came out of the necessity of open source models 01:07:44.580 |
out of the academic open source side of things. 01:07:54.480 |
did any of them already have LoRA or DPO internally? 01:08:00.540 |
Maybe, but that does not advance humanity in any way. 01:08:14.680 |
So I don't know, it just feels fundamentally good. 01:08:22.860 |
well, what are the ways in which it is not okay? 01:08:37.860 |
very much related to what kind of cultural culture 01:08:42.860 |
they grew up in, what kind of society they grew up in. 01:08:50.140 |
If they grew up in a society that they trusted, 01:08:52.960 |
then I think they take the closed source argument. 01:08:57.900 |
And if they grew up in a society that they couldn't trust, 01:09:00.420 |
where the norm was that you didn't trust your government, 01:09:05.500 |
then I think the open source argument is what they take. 01:09:10.360 |
to people's innate biases from their childhood 01:09:15.360 |
and their trust in society and governmental aspects 01:09:21.900 |
that push them towards one opinion or the other. 01:09:26.260 |
And I'm definitely in the camp of open source 01:09:33.860 |
Closed source to me just means that centralization of power, 01:09:46.180 |
We're actively disaggregating the centralization of power 01:09:55.180 |
We are, I think, benefiting from so many people 01:10:00.660 |
that aren't allowed by say Silicon Valley left wing tropes. 01:10:13.180 |
but they're not culturally accepted universally in the world. 01:10:20.420 |
And I think open source is not winning in certain ways. 01:10:25.420 |
These are all the things in which, as I mentioned, 01:10:29.980 |
it's actually being very good and beneficial and winning. 01:10:33.060 |
I think one of the ways in which it's not winning, 01:10:36.340 |
at some point I should write a long form post about this, 01:10:39.220 |
is I think it has a classic coordination problem. 01:10:50.480 |
they will just be better coordinated than open source. 01:11:12.100 |
like if you go to the LocalLlama subreddit on Reddit, 01:11:12.100 |
that are being produced from say, Nous Research. 01:11:19.020 |
And one common theme is they're all using these fine tuning 01:11:34.420 |
or human preferences data sets that are very limited 01:11:49.080 |
like say front-ends like Ooba or HuggingChat or Ollama, 01:11:49.080 |
All the people using all of these front-ends, 01:12:04.380 |
but there's no way for them to give feedback. 01:12:24.580 |
they are not exposing the ability to give feedback. 01:12:31.340 |
Maybe open source models are being as used as GPT is 01:12:34.940 |
at this point in all kinds of, in a very fragmented way. 01:12:39.700 |
Like in aggregate, all the open source models together 01:12:47.180 |
But the amount of feedback that is driving back 01:12:50.240 |
into the open source ecosystem is like negligible, 01:13:05.000 |
you'd want someone to create a sinkhole for the feedback, 01:13:09.260 |
like maybe Hugging Face or someone just finds like, 01:13:12.960 |
okay, like I will make available a call to log a string 01:13:17.960 |
along with like a bit of information of positive or negative 01:13:24.300 |
And then you would want to send pull requests 01:13:26.540 |
to all the open source front ends, like Ooba and all, 01:13:26.540 |
being like, hey, we're just integrating like a feedback UI. 01:13:34.660 |
And then work with like the closed source people 01:13:37.260 |
is also being like, look, it doesn't cost you anything, 01:13:47.480 |
And then I think a bunch of open source researchers 01:13:54.640 |
I'm sure like it will be exploited by spam bots 01:13:59.280 |
to inject your advertising product into like the next-- 01:14:08.760 |
In the same way, I'm sure like all the close providers 01:14:17.920 |
I'm sure they are figuring out if that's legit or not. 01:14:21.600 |
That kind of data filtering needs to be done. 01:14:31.160 |
and that like data cleaning effort both to be like there. 01:14:46.360 |
who's gonna go coordinate all of this integration 01:14:49.840 |
across all of these like open source front ends. 01:14:52.840 |
But I think if we do that, if that actually happens, 01:15:00.800 |
of the open source models having a runaway effect 01:15:10.000 |
Probably doesn't have a chance against Google 01:15:13.360 |
because Google has Android and Chrome and Gmail 01:15:41.720 |
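A hypothetical sketch of the feedback "sinkhole" call described above: the endpoint, schema, and client here are invented for illustration, but the point is that any open source front end could emit one small record per thumbs up or down to a shared collector.

```python
import json
import urllib.request

SINKHOLE_URL = "https://feedback.example.org/v1/log"  # hypothetical collector

def log_feedback(model: str, prompt: str, response: str, positive: bool) -> None:
    # One record per rating: the generated string plus a positive/negative bit.
    payload = json.dumps({
        "model": model,
        "prompt": prompt,
        "response": response,
        "rating": "positive" if positive else "negative",
    }).encode("utf-8")
    req = urllib.request.Request(
        SINKHOLE_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    urllib.request.urlopen(req, timeout=10)

# A front end would call this when the user clicks thumbs up/down, e.g.:
# log_feedback("llama-2-70b-chat", user_prompt, model_reply, positive=True)
```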
Because in a way like OpenAI's goal is to get to AGI, right? 01:15:47.160 |
we're more focused on personal better usage or like-- 01:15:52.680 |
But I think like, largely I actually don't think people 01:16:05.560 |
AGI means it's powering 40% of world economic output 01:16:21.520 |
Like, generally the notion of like powering X percent 01:16:26.400 |
of economic output is not defined well at all 01:16:31.160 |
for me to understand like how to know when we got to AGI 01:16:40.640 |
Like, you know, you can look at it in terms of intelligence 01:16:48.200 |
We're basically integrating like the current set 01:16:50.520 |
of AI technologies into so many real world use cases 01:16:55.160 |
where we find value that if some new version of AI comes in, 01:17:01.360 |
we can find, like we can be like, ah, this helps me more. 01:17:05.300 |
In that sense, I think like the whole process 01:17:10.180 |
of like how we think we got to AGI will be continuous 01:17:13.700 |
and not like not discontinuous like how I think 01:17:21.280 |
So I think the open source thing will be very much in line 01:17:26.280 |
with getting to AGI because open source has that 01:17:39.900 |
really no one says, huh, I don't wanna use it 01:17:49.080 |
I don't know if I like the models, you know, whatever. 01:18:00.800 |
So I definitely think it has a good chance of achieving 01:18:19.480 |
this very interesting concept of a feedback sinkhole 01:18:22.120 |
for open source to really catch up in terms of 01:18:33.860 |
I think the criticism there was like the kind of people 01:18:35.720 |
that go to a specific website to give feedback 01:18:40.760 |
and that's why the models trained on Open Assistant 01:18:43.640 |
didn't really seem like they'd have caught on 01:18:48.760 |
are LMSys out of UC Berkeley who have the LMSys arena 01:18:53.080 |
which is being touted as one of the only ways, 01:18:59.720 |
'cause there's nothing to cheat on except for ELO. 01:19:07.560 |
I don't know if you've talked to any of these people. 01:19:14.340 |
I haven't talked to them directly about this yet 01:19:23.300 |
is always going to be way more distributed than centralized. 01:19:26.880 |
Like which is the power of the open source movement. 01:19:31.580 |
Like the UI within which these models are going to be used 01:19:50.200 |
Like the LMSys leaderboard is the best thing we have 01:19:50.200 |
right now to understand whether a model is better or not 01:19:59.640 |
But it's also biased and only having a sliver of view 01:20:07.880 |
to the LMSys leaderboard and then using a model 01:20:07.880 |
Like GitHub co-pilot style usage is not captured 01:20:18.000 |
in say like the LMSys thing and so many other styles 01:20:18.000 |
like the Character AI style things is not captured in LMSys. 01:20:22.560 |
- Yeah, so like I think like yeah, my point is like 01:20:45.420 |
with all these like ways in which it's being used. 01:20:49.460 |
Even if you get like the top hundred front ends 01:20:54.180 |
that the model like open source models are used through, 01:21:01.860 |
I think that's already like a substantial thing. 01:21:25.520 |
You're an investor in 1X, which is a humanoid assistant. 01:21:34.580 |
You advise a bunch of robotics projects at NYU. 01:21:42.040 |
On a more, yeah, maybe you have another thing. 01:21:43.800 |
What are like the things that you're most excited about 01:21:51.800 |
I have more things I'm generally excited about 01:21:56.920 |
Investing is one way to try to clear those urges. 01:22:01.920 |
I'm generally excited about robotics being a possibility, 01:22:09.800 |
home robotics being like five to seven years away 01:22:17.560 |
I think like it's not like next year or two years from now, 01:22:24.040 |
I think like a lot more robotics companies might pop out. 01:22:36.300 |
My view is actually hardware is still the bottleneck 01:22:43.240 |
but like I don't think there's any like obvious 01:22:53.600 |
I spend a lot of time, a lot of personal time, 01:22:55.980 |
I spend like every Wednesday afternoon at NYU 01:23:10.300 |
- As of today, we just deployed a couple months ago, 01:23:19.520 |
into like several tens of New York City homes 01:23:26.960 |
And we're basically starting to build out a framework 01:23:34.120 |
on fairly simple tasks, like picking this cup 01:23:40.440 |
or like taking a few pieces of cloth on the ground 01:23:44.560 |
and put it somewhere else or open your microwave. 01:23:55.480 |
So like the key thing, I think one of the things 01:24:12.740 |
A lot of the current robotics research if you see, 01:24:15.660 |
they're like, oh yeah, we collected like 50 demos 01:24:22.940 |
It's a sample, the number of samples you need 01:24:24.860 |
for this thing to do the task is really high. 01:24:32.700 |
and that's sufficient for it to actually like do the task. 01:24:35.860 |
But it comes with like less generalization, right? 01:24:47.360 |
That's very interesting in general, the space. 01:24:57.580 |
for it to be truly useful in the home or whatever. 01:25:10.700 |
But I think like lots of work is happening in the field. 01:25:15.500 |
- Yeah, one of my friends, Carlo at Berkeley, 01:25:28.960 |
or is it just like the actual servos and the... 01:25:33.020 |
- Yeah, by hardware I mean like the actual like servos, 01:25:37.380 |
like the motors, servos, even like the sensors, 01:25:47.380 |
that still like is so much better compared to 01:25:59.120 |
We have, our skin is like all available touch sensing 01:26:14.180 |
that can lift large loads in like the dexterity 01:26:19.860 |
So in terms of hardware, I mean like in terms 01:26:24.500 |
of those capabilities, like we haven't figured out 01:26:31.860 |
I mean Tesla has been making incredible progress. 01:26:34.660 |
1X, I think, announced their new thing that looks incredible. 01:26:34.660 |
But we're really not anywhere close to like the hardware 01:26:50.120 |
And there's obviously the other thing I want to call out is 01:27:02.400 |
I mean like that's the other thing we are incredible at. 01:27:10.520 |
If you buy a product, an electronics product of any kind, 01:27:20.900 |
and I have to like do some reasonable amount of work on it. 01:27:28.540 |
where it's very controlled and specialized or whatever, 01:27:31.580 |
like you're talking about reliability like in those ranges. 01:27:42.880 |
I mean like we're gonna start thinking about, 01:27:45.320 |
okay, now we have this thing and we need to figure out 01:27:47.460 |
how to get reliability high enough to deploy it into homes 01:27:59.180 |
- I just realized that Google has a play in this 01:28:12.820 |
- I used to, we have a small robotics program 01:28:17.280 |
I actually used to do it at FAIR a little bit 01:28:19.580 |
before I moved into Infra and focused on my Meta time 01:28:23.340 |
on a lot of like other infrastructural stuff. 01:28:26.940 |
So yeah, Meta's robotics program is a lot smaller. 01:28:30.700 |
- Seems like it would be a fit in personal computing. 01:28:36.140 |
- You can think of it as like Meta has a ridiculously 01:28:50.920 |
I think for Meta, like the robot is not as important 01:28:56.280 |
as like the physical devices kind of stuff, for sure. 01:29:15.140 |
is that he realized that you can smell cancer. 01:29:21.020 |
- Yeah, I mean first like the very interesting reason 01:29:43.660 |
He built this thing called Tangent from Google, 01:29:48.120 |
like another like alternative framework and stuff. 01:29:59.540 |
He just happens to also love like neural networks 01:30:05.140 |
So incredibly smart guy, one of the smartest people I know. 01:30:16.620 |
is something that we haven't even started to scrape 01:30:22.660 |
When we think about audio or images or video, 01:30:26.580 |
they're like so advanced that we have the concept 01:30:31.240 |
We have the concept of like frequency spectrums. 01:30:34.300 |
Like, you know, we figured out how ears process 01:30:37.080 |
like frequencies in mel spectrum or whatever, 01:30:39.880 |
like logarithmically scaled images for like RGB, YUV. 01:30:44.100 |
Like we have so many different kinds of parameterizations. 01:30:47.020 |
We have formalized these two senses ridiculously. 01:30:58.740 |
We're like where we were with images and say in 1920 01:31:10.060 |
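For contrast with how little formalism smell has, here is what one established audio parameterization looks like in code: a mel spectrogram with logarithmic scaling, using torchaudio's standard transforms on a fake one-second waveform.

```python
import torch
import torchaudio

mel = torchaudio.transforms.MelSpectrogram(sample_rate=16_000, n_fft=400, n_mels=64)
to_db = torchaudio.transforms.AmplitudeToDB()  # the logarithmic scaling step

waveform = torch.randn(1, 16_000)   # one second of fake audio at 16 kHz
log_mel = to_db(mel(waveform))      # shape: (1, 64 mel bins, time frames)
print(log_mel.shape)
```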
of like having a smell sensor just eventually 01:31:18.380 |
Like as of today, you don't really think about 01:31:22.100 |
like when you're watching an Instagram reel or something, 01:31:24.380 |
huh, like I also would love to know what it smelled like 01:31:28.700 |
and if you're watching a reel of a food or something. 01:31:32.020 |
You don't because we really haven't as a society 01:31:41.500 |
I think the more near term effects are obviously 01:31:44.360 |
going to be around things that provide more obvious utility 01:31:49.420 |
in the short term, like maybe smelling cancer 01:31:52.580 |
or like repelling mosquitoes better or stuff like that. 01:32:02.420 |
- Yeah, like I mean think about how you can customize 01:32:09.180 |
you can customize a shoe or something, right? 01:32:15.780 |
I think if he's able to figure out a near term value for it, 01:32:26.940 |
on the long term which is really in uncharted territory. 01:32:35.660 |
it would be pretty obvious to like kids of the generation 01:32:48.540 |
they're watching something and then they immediately get 01:32:53.380 |
like a smell sense off that remote experience as well. 01:32:57.140 |
Like we haven't really progressed enough in that dimension 01:33:07.300 |
Awesome, I mean we touched on a lot of things. 01:33:18.060 |
- I don't really have a lot of calls to action