
Open Source AI is AI we can Trust — with Soumith Chintala of Meta AI


Chapters

0:00 Introductions
0:51 Extrinsic vs Intrinsic Success
2:58 Importance of Open Source and Its Impact
4:25 PyTorch vs TinyGrad
10:23 Why PyTorch is the Switzerland of frameworks
12:44 Modular's Mojo + PyTorch?
16:12 PyTorch vs Apple's MLX
19:46 FAIR / PyTorch Alumni
22:38 How can AI inference providers differentiate?
25:48 How to build good benchmarks and learnings from Anyscale's
29:51 Most interesting unexplored ideas
33:23 What people get wrong about synthetic data
42:00 Meta AI's evolution
45:20 How do you allocate 600,000 GPUs?
49:24 Even the GPU Rich are GPU Poor
55:49 Meta's MTIA silicon
58:56 Why we need open source
67:00 Open source's coordination problem for feedback gathering
81:16 Beyond text generation
89:02 Osmo and the Future of Smell Recognition Technology

Whisper Transcript

00:00:00.000 | Hey, everyone.
00:00:00.960 | Welcome to the Latent Space Podcast.
00:00:02.820 | This is Alessio, partner and CTO in Residence
00:00:05.400 | at Decibel Partners.
00:00:06.480 | And I'm joined by my co-host, Swyx, founder of Smol.ai.
00:00:09.480 | Hey, and today we have in the studio, Soumith Chintala.
00:00:11.720 | Welcome.
00:00:12.240 | Thanks for having me.
00:00:13.800 | On one of your rare visits from New York, where you live.
00:00:18.640 | You got your start in computer vision at NYU with Yann LeCun.
00:00:25.320 | That was a very fortuitous start.
00:00:27.300 | I was actually listening to your interview
00:00:29.460 | on the Gradient Podcast.
00:00:30.440 | So if people want to know more about the history of Soumith,
00:00:33.320 | history of PyTorch, they can go to that podcast.
00:00:35.580 | We won't spend that much time there.
00:00:37.240 | But I just was marveling at your luck,
00:00:39.400 | or I don't know if it's your luck or your drive
00:00:42.280 | to find AI early and then find the right quality mentor.
00:00:47.460 | Because I guess Yann really introduced you to that world.
00:00:51.480 | You're talking about extrinsic success, right?
00:00:54.000 | A lot of people just have drive to do things
00:00:57.920 | that they think is fun.
00:01:00.080 | And a lot of those things might or might not
00:01:02.760 | be extrinsically perceived as good and successful.
00:01:06.960 | I think I just happen to like something
00:01:10.620 | that is now like one of the coolest things in the world
00:01:15.460 | or whatever.
00:01:16.580 | But if I happen--
00:01:18.640 | the first thing I tried to become was a 3D VFX artist.
00:01:25.080 | And I was really interested in doing that,
00:01:28.560 | but I turned out to be very bad at it.
00:01:31.360 | So I ended up not doing that further.
00:01:33.300 | But even if I was good at that, whatever,
00:01:36.400 | and I ended up going down that path,
00:01:39.480 | I probably would have been equally happy.
00:01:41.800 | It's just like maybe the perception of, oh,
00:01:45.060 | is this person successful or not might be different.
00:01:48.120 | But I think after a baseline, your happiness is probably
00:01:53.540 | more correlated with your intrinsic stuff.
00:01:57.020 | I think Dan Pink has this book on drive
00:01:59.680 | that I often refer to about the power of intrinsic motivation
00:02:03.020 | versus extrinsic and how long extrinsic lasts.
00:02:05.520 | It's not very long at all.
00:02:07.440 | But anyway, now you are an investor in Runway.
00:02:09.640 | So in a way, you're working on VFX.
00:02:13.160 | I mean, in a very convoluted way.
00:02:15.560 | It reminds me of the Ed Catmull.
00:02:17.840 | I don't know if you guys know.
00:02:19.680 | He actually tried to become an animator in his early years
00:02:22.520 | and failed, or didn't get accepted by Disney,
00:02:25.260 | and then went and created Pixar and then got bought by Disney
00:02:28.760 | and created Toy Story.
00:02:31.340 | So you joined Facebook in 2014 and eventually became
00:02:35.480 | creator and maintainer of PyTorch.
00:02:37.500 | And there's this long story there
00:02:39.500 | you can refer to on the gradient.
00:02:41.060 | But you also-- I think maybe people don't know that you're also
00:02:44.040 | involved in a lot of the hardware and cluster decisions at Meta.
00:02:47.000 | And we can dive into more details
00:02:48.500 | there, because we're all about hardware this month.
00:02:52.600 | And then finally, I don't know what else should people
00:02:55.860 | know about you on the personal side or the professional side.
00:02:58.360 | I think open source is definitely
00:03:00.940 | like a big passion of mine and probably forms
00:03:03.620 | a little bit of my identity at this point.
00:03:05.400 | I am irrationally interested in open source.
00:03:11.980 | It's like one of those things that I attribute to--
00:03:17.600 | I think open source has that fundamental way
00:03:21.220 | to distribute opportunity in a way that is very powerful.
00:03:27.680 | I grew up in India.
00:03:30.740 | I didn't have internet for a while.
00:03:33.940 | And in college, actually, I didn't have internet,
00:03:36.620 | except for like GPRS or whatever.
00:03:40.880 | So just having-- and knowledge was very centralized.
00:03:45.920 | But I saw that evolution of knowledge
00:03:47.820 | slowly getting decentralized.
00:03:49.940 | And that ended up helping me learn quicker and faster
00:03:54.560 | for like $0.
00:03:56.460 | And I think that was a strong reason why
00:04:00.460 | I ended up where I am.
00:04:02.420 | So the open source side of things,
00:04:04.900 | I always push regardless of what I get paid for.
00:04:10.100 | I think I would do that as a passion project on the side.
00:04:13.440 | Yeah, that's wonderful.
00:04:14.480 | And we will talk about the challenges
00:04:16.500 | as well that open source has, open models versus closed
00:04:19.780 | models.
00:04:20.860 | But maybe you want to touch a little bit on PyTorch
00:04:23.020 | before we move on to sort of meta AI in general.
00:04:25.500 | Yeah, we kind of touched on PyTorch in a lot of episodes.
00:04:28.500 | So we had George Hotz from TinyGrad.
00:04:31.660 | He called PyTorch a CISC and TinyGrad a RISC.
00:04:36.500 | I would love to get your thoughts on PyTorch design
00:04:40.460 | direction as far as--
00:04:42.420 | I know you talk a lot about kind of having a happy path
00:04:45.900 | to start with and then making complexity hidden away,
00:04:48.420 | but then available to the end user.
00:04:50.800 | One of the things that George mentioned
00:04:52.340 | is I think you have like 250 primitive operators in PyTorch.
00:04:56.120 | I think TinyGrad is four.
00:04:57.600 | So how do you think about some of the learnings
00:05:02.020 | that maybe he's going to run into that you already
00:05:04.520 | had in the past seven, eight years almost of running PyTorch?
00:05:08.720 | Yeah, I think everyone starts--
00:05:12.480 | there's different models here, but I
00:05:13.940 | think it's two different models that people generally
00:05:16.560 | start with.
00:05:17.040 | Either they go like, I have a grand vision,
00:05:19.760 | and I'm going to build a giant system that
00:05:22.000 | achieves this grand vision.
00:05:23.060 | And my V1 is like super complex, feature complete, whatever.
00:05:28.840 | Or other people say they will get incrementally ambitious.
00:05:33.160 | They say, oh, we'll start with something simple,
00:05:35.120 | and then we'll slowly layer out complexity in a way
00:05:37.700 | that optimally applies Huffman coding or whatever.
00:05:42.860 | Where the density of users are and what they're using,
00:05:47.680 | I would want to keep it in the easy, happy path.
00:05:50.440 | And where the more niche advanced use cases,
00:05:53.520 | I still want people to try them, but they
00:05:57.140 | need to take additional frictional steps.
00:06:01.360 | George, I think, just like we started with PyTorch,
00:06:05.000 | George started with the incrementally ambitious thing.
00:06:09.660 | I remember TinyGrad used to be like we
00:06:14.780 | would be limited to 1,000 lines of code,
00:06:16.680 | and I think now it's like 5,000.
00:06:19.440 | So I think there is no real magic to why PyTorch
00:06:25.320 | has the kind of complexity it has.
00:06:26.640 | I think it's probably partly necessitated and partly
00:06:32.100 | because we built with the technology available under us
00:06:35.840 | at that time.
00:06:38.320 | PyTorch is like 190,000 lines of code
00:06:41.220 | or something at this point.
00:06:43.120 | I think if we had to rewrite it, we would probably
00:06:45.960 | think about ways to rewrite it in a vastly simplified way,
00:06:51.480 | for sure.
00:06:52.980 | But a lot of that complexity comes from the fact
00:06:55.840 | that in a very simple, explainable way,
00:07:02.180 | you have memory hierarchies.
00:07:05.160 | CPU has like three levels of caches,
00:07:07.720 | and then you have DRAM and SSD, and then you have network.
00:07:13.680 | Similarly, GPU has several levels of memory,
00:07:17.220 | and then you have different levels of network hierarchies,
00:07:19.960 | NVLink plus InfiniBand or RoCE or something like that.
00:07:26.680 | And the way the flops are available on your hardware,
00:07:31.960 | they are available in a certain way,
00:07:34.040 | and your computation is in a certain way,
00:07:36.040 | and you have to retrofit your computation
00:07:37.880 | onto both the memory hierarchy and the flops available.
00:07:42.620 | When you're doing this, it is actually
00:07:45.040 | like a fairly hard mathematical problem to do this setup,
00:07:52.960 | like find the optimal thing.
00:07:55.000 | And finding the optimal thing is like, what is optimal?
00:07:58.440 | What is optimal depends on the input variables themselves.
00:08:02.440 | So like, OK, what is the shape of your input tensors,
00:08:05.240 | and what is the operation you're trying to do,
00:08:08.560 | and various things like that.
00:08:12.000 | Finding that optimal configuration
00:08:16.280 | and writing it down in code is not
00:08:20.240 | the same for every input configuration you have.
00:08:27.400 | For example, just as the shape of the tensors change,
00:08:31.560 | let's say you have three input tensors into a sparse dot
00:08:38.280 | product or something like that.
00:08:40.800 | The shape of each of these input tensors
00:08:43.000 | will vastly change how you do this optimally placing
00:08:48.640 | this operation onto the hardware in a way that will
00:08:51.440 | get you maximal throughput.
00:08:53.440 | So a lot of our complexity comes from writing out
00:08:59.240 | like hundreds of configurations for each single PyTorch
00:09:03.840 | operator and templatizing these things
00:09:07.200 | and symbolically generating the final CUDA code or CPU code.
00:09:15.000 | There's no way to avoid it, because mathematically we
00:09:17.080 | haven't found symbolic ways to do this that also
00:09:23.000 | keep compile time near zero.
00:09:26.640 | You can write a very simple framework,
00:09:29.600 | but then you also should be willing to eat
00:09:33.040 | the long compile times of searching
00:09:35.440 | for that optimal performance at runtime.
00:09:38.000 | So that's the trade-off.
00:09:40.520 | I don't think, unless we have great breakthroughs,
00:09:43.920 | George's vision is achievable.
00:09:47.640 | Or he should be thinking about a narrower problem, such as,
00:09:51.240 | I'm only going to make this work for self-driving cars.
00:09:55.920 | Or I'm only going to make this work for LLM transformers
00:10:00.440 | of the llama style.
00:10:02.400 | If you start narrowing the problem down,
00:10:04.320 | you can make a vastly simpler framework.
00:10:07.800 | But if you don't, if you need the generality
00:10:10.480 | to power all of the AI research that is happening
00:10:13.720 | and keep zero compile time and all these other factors,
00:10:18.120 | I think it's not easy to avoid the complexity.
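
To make the point above concrete, here is a toy sketch (in no way PyTorch's actual dispatch code) of why per-shape kernel configuration multiplies complexity: the best launch parameters depend on the input shapes, so a general framework ends up carrying many templated variants plus heuristics or an autotuner to choose between them. The tile sizes and thresholds below are invented for illustration.

```python
# Toy illustration only: picking a kernel configuration based on input shapes.
# The thresholds and tile sizes are made up; real frameworks carry hundreds of
# such variants per operator and tune them per hardware generation.
from dataclasses import dataclass

@dataclass
class KernelConfig:
    block_m: int
    block_n: int
    num_warps: int

def pick_matmul_config(m: int, n: int, k: int) -> KernelConfig:
    if m * n < 128 * 128:   # small problem: small tiles, few warps
        return KernelConfig(block_m=32, block_n=32, num_warps=2)
    if m >= 8 * n:          # tall-and-skinny: asymmetric tiling
        return KernelConfig(block_m=128, block_n=32, num_warps=4)
    return KernelConfig(block_m=128, block_n=128, num_warps=8)

print(pick_matmul_config(64, 64, 64))
print(pick_matmul_config(8192, 512, 512))
print(pick_matmul_config(4096, 4096, 4096))
```
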
00:10:23.240 | That's interesting.
00:10:24.080 | We kind of touched on this with Chris Ladner
00:10:26.760 | when he was on the podcast.
00:10:28.160 | If you think about frameworks, they have the model target.
00:10:31.520 | They have the hardware target.
00:10:32.840 | They have different things to think about.
00:10:34.880 | He mentioned, when he was at Google,
00:10:36.560 | TensorFlow is trying to be optimized to make TPUs go brr
00:10:40.840 | and go as fast.
00:10:43.000 | I think George is trying to make, especially the AMD stack,
00:10:46.200 | be better than ROCm.
00:10:47.880 | How come PyTorch has been such a Switzerland
00:10:50.840 | versus just making Meta hardware go brr?
00:10:54.320 | First, Meta is not in the business of selling hardware.
00:10:57.760 | Meta is not in the business of cloud compute.
00:11:03.640 | We kind of-- the way Meta thinks about funding PyTorch is it's
00:11:11.640 | just like we're funding it because it's net good for Meta
00:11:17.000 | to fund PyTorch because PyTorch has become a standard
00:11:20.280 | and a big open source project.
00:11:22.440 | And generally, it gives us a timeline edge.
00:11:27.240 | It gives us various leverage and all that within our own work.
00:11:32.920 | So why is PyTorch more of a Switzerland
00:11:38.080 | rather than being opinionated?
00:11:40.440 | I think the way we think about it is not in terms of Switzerland
00:11:43.800 | or not.
00:11:44.440 | Actually, the way we articulated to all hardware vendors
00:11:47.940 | and software vendors and all who come to us being like,
00:11:51.600 | we want to build a backend in core for PyTorch
00:11:54.080 | and ship it by default is we just only look
00:11:57.880 | at our user side of things.
00:12:00.160 | If users are using a particular piece of hardware,
00:12:03.560 | then we want to support it.
00:12:05.460 | We very much don't want to king make the hardware
00:12:09.260 | side of things.
00:12:11.720 | So as the MacBooks have GPUs and as that stuff
00:12:16.880 | started getting increasingly interesting,
00:12:19.960 | we pushed Apple to push some engineers and work
00:12:24.080 | on the MPS support.
00:12:25.040 | And we spent significant time from like meta funded
00:12:28.320 | engineers on that as well.
00:12:29.820 | Because a lot of people are using the Apple GPUs
00:12:34.040 | and there is demand.
00:12:35.360 | So we kind of mostly look at it from the demand side.
00:12:38.360 | We never look at it from like, oh,
00:12:40.960 | which hardware should we start taking opinions on?
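
For context, the MPS backend mentioned above is exposed in PyTorch as a regular device; a minimal usage sketch (falling back to CPU on machines where MPS isn't available) looks like this:

```python
# Minimal sketch of using the MPS backend on Apple-silicon GPUs,
# with a CPU fallback when MPS is not available.
import torch

device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
x = torch.randn(1024, 1024, device=device)
w = torch.randn(1024, 1024, device=device)
y = torch.relu(x @ w)
print(y.device)
```
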
00:12:44.480 | Is there a future in which-- because Mojo, or Modular's
00:12:48.720 | Mojo, is kind of a superset of Python--
00:12:50.360 | is there a future in which PyTorch might use
00:12:53.400 | Mojo features optionally?
00:12:55.560 | I think it depends on how well integrated
00:12:57.920 | it is into the Python ecosystem.
00:13:01.760 | So if Mojo is like a PIP install and it's readily available
00:13:06.960 | and users feel like they can use Mojo so smoothly
00:13:11.680 | within their workflows within--
00:13:14.840 | in a way that just is low friction,
00:13:19.780 | we would definitely look into that.
00:13:21.440 | In the same way, PyTorch now depends on Triton,
00:13:24.720 | like OpenAI Triton.
00:13:26.500 | And we never had a conversation that was like, huh,
00:13:32.280 | that's like a dependency.
00:13:33.640 | Should we just build a Triton of our own
00:13:36.640 | or should we use Triton?
00:13:38.720 | It almost doesn't-- those conversations don't really
00:13:41.600 | come up for us.
00:13:43.000 | The conversations are more like, well, does Triton
00:13:45.200 | have 10,000 dependencies and is it hard to install?
00:13:48.760 | We almost don't look at these things
00:13:51.160 | from a strategic leverage point of view.
00:13:54.160 | We look at these things from a user experience point of view.
00:13:57.880 | Is it easy to install?
00:13:59.080 | Is it like smoothly integrated?
00:14:00.600 | If so, we should consider--
00:14:02.840 | and does it give enough benefits for us
00:14:04.680 | to start depending on it?
00:14:05.760 | If so, yeah, we should consider it.
00:14:06.800 | That's how we think about it.
00:14:07.960 | You're inclusive by default as long as it
00:14:09.720 | meets the minimum bar.
00:14:11.160 | Yeah.
00:14:12.440 | But maybe I phrased it wrongly.
00:14:14.360 | Maybe it's more like, OK, what problems
00:14:16.100 | would you look to solve that you have right now?
00:14:20.320 | I think it depends on what problems
00:14:21.880 | Mojo will be useful at.
00:14:25.680 | It's more performance, mainly a performance pitch,
00:14:28.520 | some amount of cross-compiling pitch.
00:14:30.960 | Yeah, I think the performance pitch for Mojo was like,
00:14:34.320 | we're going to be performant even if you
00:14:37.760 | have a lot of custom stuff.
00:14:41.360 | You can write arbitrary custom things,
00:14:43.400 | and we will be performant.
00:14:45.760 | And that value proposition is not
00:14:49.520 | clear to us from the PyTorch side
00:14:53.840 | to consider it for PyTorch.
00:14:56.180 | So PyTorch exposes-- it's actually not 250 operators,
00:15:01.160 | like 1,000 operators.
00:15:02.360 | PyTorch exposes about 1,000 operators,
00:15:04.400 | and people write their ideas in the 1,000 operators of PyTorch.
00:15:10.080 | Mojo is like, well, maybe it's OK to completely sidestep
00:15:17.240 | those 1,000 operators of PyTorch and just write it
00:15:20.160 | in a more natural form, just write like raw Python,
00:15:23.440 | write for loops or whatever.
00:15:25.400 | So from the consideration of how do we intersect PyTorch
00:15:31.160 | with Mojo, I can see one use case
00:15:33.600 | where you have custom stuff for some parts of your program,
00:15:39.800 | but mostly it's PyTorch.
00:15:41.480 | And so we can probably figure out
00:15:42.880 | how to make it easier for, say, torch.compile to smoothly also
00:15:49.200 | consume Mojo subgraphs, and the interoperability
00:15:53.640 | being actually usable.
00:15:56.520 | That I think is valuable.
00:15:57.560 | But Mojo as a fundamental front end
00:16:00.480 | would be replacing PyTorch, not augmenting PyTorch.
00:16:06.240 | So in that sense, I don't see a synergy in more deeply
00:16:10.800 | integrating Mojo.
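
The integration point being described exists today in the form of custom torch.compile backends, which receive the captured FX graph; a hypothetical Mojo (or any other external) code generator would plug in there. The sketch below uses a trivial backend that just runs the captured graph as-is; the Mojo hand-off itself is imagined, not an existing API.

```python
# Sketch of a custom torch.compile backend. A real integration would lower
# the captured FX graph to external kernels (e.g. a hypothetical Mojo
# compiler) instead of falling back to eager execution.
import torch

def my_backend(gm: torch.fx.GraphModule, example_inputs):
    print(f"captured subgraph with {len(list(gm.graph.nodes))} nodes")
    return gm.forward  # run the captured graph unchanged

@torch.compile(backend=my_backend)
def f(x):
    return torch.sin(x) + torch.cos(x)

print(f(torch.randn(8)))
```
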
00:16:12.760 | So call out to Mojo whenever they
00:16:15.040 | have written something in Mojo and there's some performance
00:16:18.720 | related thing going on.
00:16:20.340 | And then since you mentioned Apple,
00:16:24.160 | what should people think of PyTorch versus MLX?
00:16:26.720 | I mean, MLX is early, and I know the folks well.
00:16:32.160 | Awni used to work at FAIR, and I used to chat with him
00:16:38.260 | all the time.
00:16:38.840 | He used to be based out of New York as well.
00:16:42.560 | The way I think about MLX is that MLX is specialized
00:16:49.800 | for Apple right now.
00:16:53.760 | It has a happy path because it's defined
00:16:58.140 | its product in a narrow way.
00:17:00.400 | At some point, MLX either says we will only
00:17:05.040 | be supporting Apple and we will just focus on enabling--
00:17:11.000 | this is a framework if you use your MacBook,
00:17:13.060 | but once you go server side or whatever, that's not my problem
00:17:16.400 | and I don't care.
00:17:19.000 | Or MLX, it enters the server side set of things as well.
00:17:24.240 | One of these two things will happen, right?
00:17:26.240 | If the first thing will happen, MLX's overall addressable
00:17:29.280 | market will be small, but it'll probably
00:17:32.040 | do well within that addressable market.
00:17:34.920 | If it enters the second phase, they're
00:17:37.640 | going to run into all the same complexities
00:17:39.440 | that we have to deal with.
00:17:42.200 | They will not have any magic wand,
00:17:44.120 | and they will have vastly more complex work to do.
00:17:49.460 | They probably wouldn't be able to move as fast in certain ways.
00:17:52.800 | Like having to deal with distributed compute.
00:17:55.020 | Distributed compute, NVIDIA and AMD GPUs, just
00:17:58.520 | like having a generalization of the concept of a backend,
00:18:02.400 | how they treat compilation and its overheads.
00:18:07.000 | Right now, they deeply assume the whole MPS graph thing.
00:18:12.480 | So they need to think about all these additional things
00:18:16.680 | if they end up expanding onto the server side.
00:18:19.480 | And they'll probably build something like PyTorch
00:18:22.720 | as well, right?
00:18:23.760 | Eventually, that's where it will land.
00:18:26.020 | And I think there they will fail on the lack of differentiation.
00:18:31.780 | It wouldn't be obvious to people why they would want to use it.
00:18:36.200 | I mean, there are some cloud companies offering M1 and M2
00:18:39.000 | chips on servers.
00:18:41.120 | I feel like it might be interesting for Apple
00:18:43.320 | to pursue that market, but it's not their core.
00:18:45.760 | Yeah, I mean, if Apple can figure out their interconnect
00:18:49.080 | story, maybe, then it can become a thing.
00:18:52.480 | Honestly, that's more interesting than the cars.
00:18:56.160 | I think the moat that NVIDIA has right now, I feel like,
00:18:59.760 | is that they have the interconnect
00:19:01.940 | that no one else has.
00:19:03.820 | AMD GPUs are pretty good.
00:19:06.940 | I'm sure their silicon is not bad at all.
00:19:10.660 | But the interconnect, like NVLink, is uniquely awesome.
00:19:16.340 | So I'm sure the other hardware providers are working on it.
00:19:21.060 | I feel like when you say it's uniquely awesome, you
00:19:23.260 | have some appreciation of it that the rest of us don't.
00:19:25.840 | I mean, the rest of us just like--
00:19:27.220 | we hear marketing lines, but what
00:19:28.800 | do you mean when you say NVIDIA is very good at networking?
00:19:32.000 | Obviously, they made the acquisition maybe five years ago.
00:19:34.420 | It's just like the bandwidth it offers
00:19:37.120 | and the latency it offers.
00:19:38.660 | I mean, TPUs also have a good interconnect,
00:19:41.660 | but you can't buy them.
00:19:43.080 | So you have to go to Google to use it.
00:19:46.700 | Who are some of the other fair PyTorch alumni that
00:19:50.100 | are building cool companies?
00:19:51.220 | I know you have Fireworks AI, Lightning AI, Lepton.
00:19:55.180 | And Yangqing, you knew since college
00:19:58.420 | when he was building Caffe.
00:20:00.060 | Yeah, so Yangqing and I used to be framework rivals,
00:20:03.780 | like Caffe, Torch.
00:20:06.460 | I mean, we were all a very small, close-knit community
00:20:09.180 | back then.
00:20:13.060 | Caffe, Torch, Theano, Chainer, Keras, various frameworks.
00:20:22.820 | I mean, it used to be more like 20 frameworks.
00:20:25.820 | I can't remember all the names.
00:20:27.540 | CCV by Liu Liu, who is also based out of SF.
00:20:33.700 | And one of the ways it was interesting
00:20:37.420 | is you went into the framework guts
00:20:39.900 | and saw if someone wrote their own convolution kernel,
00:20:43.540 | or they were just copying someone else's.
00:20:47.140 | And there were four or five convolution kernels
00:20:50.940 | that were unique and interesting.
00:20:53.900 | There was one from this guy out of Russia.
00:20:57.620 | I forgot the name.
00:21:00.380 | But I remembered who was awesome enough
00:21:03.900 | to have written their own kernel.
00:21:08.180 | And at some point there, I built out these benchmarks
00:21:13.340 | called convnet-benchmarks that were just
00:21:18.020 | benchmarking all the convolution kernels that
00:21:21.780 | were available at that time.
00:21:25.380 | And it hilariously became big enough that at that time,
00:21:30.060 | AI was getting important, but not important enough
00:21:34.020 | that industrial strength players came
00:21:37.020 | in to do these kind of benchmarking and standardization.
00:21:39.660 | Like we have MLPerf today.
00:21:41.780 | So a lot of the startups were using convnet-benchmarks
00:21:47.980 | in their pitch decks as like, oh, you know,
00:21:51.380 | on convnet-benchmarks, this is how we fare,
00:21:54.220 | so you should fund us.
00:21:55.820 | I remember Nervana actually was at the top of the pack
00:21:58.420 | because Scott Gray wrote amazingly fast convolution
00:22:03.220 | kernels at that time.
00:22:06.540 | Very interesting, but separate times.
00:22:08.020 | But to answer your question, Alessio,
00:22:10.660 | I think mainly Lepton and Fireworks are the two most obvious ones.
00:22:17.460 | But I'm sure the fingerprints are a lot wider.
00:22:27.060 | They're just people who worked within the PyTorch, Caffe,
00:22:31.620 | Caffe2 cohort of things and now end up
00:22:35.260 | at various other places.
00:22:38.980 | I think both as an investor and people looking
00:22:45.100 | to build on top of their services,
00:22:47.940 | it's an uncomfortable slash I don't
00:22:51.740 | know what I don't know pitch.
00:22:53.500 | Because I've met Yangqing and I've met--
00:22:56.260 | Lin Qiao.
00:22:57.060 | Yeah, I've met these folks.
00:22:59.060 | And they're like, you know, we are deep in the PyTorch
00:23:02.380 | ecosystem, and we serve billions of inferences a day
00:23:05.140 | or whatever at Facebook, and now we can do it for you.
00:23:07.980 | And I'm like, OK, that's great.
00:23:10.220 | What should I be wary of or cautious of
00:23:12.740 | when these things happen?
00:23:13.900 | Because I'm like, obviously, this experience
00:23:16.580 | is extremely powerful and valuable.
00:23:20.660 | I just don't know what I don't know.
00:23:22.580 | What should people know about these sort of new inference
00:23:26.380 | as a service companies?
00:23:28.140 | At that point, you would be investing in them
00:23:30.420 | for their expertise of one kind.
00:23:33.940 | So if they've been at a large company,
00:23:38.180 | but they've been doing amazing work,
00:23:39.900 | you would be thinking about it as like, OK,
00:23:41.860 | what these people bring to the table
00:23:43.660 | is that they're really good at GPU programming
00:23:48.140 | or understanding the complexity of serving models
00:23:52.780 | once it hits a certain scale, various expertise
00:23:58.380 | from the infra and AI and GPUs point of view.
00:24:03.780 | What you would obviously want to figure out
00:24:06.980 | is whether their understanding of the external markets
00:24:12.300 | is clear, whether they know and understand
00:24:15.420 | how to think about running a business,
00:24:19.980 | understanding how to be disciplined about making money,
00:24:23.860 | or various things like that.
00:24:25.540 | Maybe I'll put it--
00:24:26.980 | actually, I will de-emphasize the investing bit,
00:24:29.020 | and just more as a potential customer.
00:24:31.820 | It's more like, OK, you're PyTorch gods, of course.
00:24:37.260 | What else should I know?
00:24:39.020 | I mean, I would not care about who's building something
00:24:42.820 | if I'm trying to be a customer.
00:24:44.220 | I would care about whether--
00:24:45.860 | The benchmarks.
00:24:46.660 | Yeah, I use it.
00:24:48.580 | And it's usability, and reliability, and speed.
00:24:53.020 | Quality as well.
00:24:53.980 | Yeah, if someone from some random unknown place
00:24:58.860 | came to me and said, our stuff is great,
00:25:04.100 | and I have the bandwidth, I probably will give it a shot.
00:25:06.780 | And if it turns out to be great, I'll just use it.
00:25:10.300 | OK, great.
00:25:11.700 | And then maybe one more thing about benchmarks,
00:25:13.660 | since we already brought it up, and you brought up
00:25:16.620 | convnet-benchmarks.
00:25:20.660 | There was some recent drama around Anyscale.
00:25:22.340 | Anyscale released their own benchmarks,
00:25:22.340 | and obviously they looked great on their own benchmarks.
00:25:24.660 | But maybe didn't give the other--
00:25:28.220 | I feel like there are two lines of criticism.
00:25:30.260 | One, which is they didn't test apples for apples
00:25:33.620 | on the kind of endpoints that the other providers
00:25:36.940 | that they are competitors with on their benchmarks.
00:25:39.900 | And that is due diligence baseline.
00:25:41.980 | And then the second would be more just
00:25:43.700 | optimizing for the right thing.
00:25:45.700 | You had some commentary on it.
00:25:46.940 | I'll just let you riff.
00:25:48.060 | Yeah, I mean, in summary, basically my criticism was
00:25:53.140 | that Anyscale built these benchmarks for end users
00:25:58.780 | to just understand what they should pick.
00:26:01.340 | And that's a very good thing to do.
00:26:03.900 | I think what they didn't do a good job of
00:26:06.060 | is give that end user a full understanding of what
00:26:11.060 | they should pick.
00:26:11.780 | They just gave them a very narrow slice
00:26:14.900 | of understanding.
00:26:15.660 | I think they just gave them latency numbers,
00:26:19.340 | and that's not sufficient.
00:26:22.980 | You need to understand your total cost of ownership
00:26:26.300 | at some reasonable scale.
00:26:27.700 | Not like, oh, like one API call is like $0.01,
00:26:30.540 | but like 1,000 API calls are like $0.10.
00:26:36.580 | People can misprice to cheat on those benchmarks.
00:26:39.220 | So you want to understand, OK, how much is it
00:26:42.860 | going to cost me if I actually subscribe to you
00:26:45.980 | and do like a million API calls a month or something?
00:26:49.460 | And then you want to understand the latency and reliability,
00:26:55.340 | not just from one call you made, but an aggregate of calls
00:27:01.140 | you made over various times of the day and times of the week
00:27:05.540 | and the nature of the workloads.
00:27:08.260 | Is it just like some generic single paragraph
00:27:11.220 | that you're sending that is cashable,
00:27:13.260 | or is it like testing a real world workload?
00:27:17.540 | I think that kind of rigor in presenting
00:27:21.060 | that benchmark wasn't there.
00:27:22.460 | It was a much more narrow sliver of what should
00:27:26.300 | have been a good benchmark.
00:27:28.300 | That was my main criticism.
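
As a rough sketch of the kind of measurement being described: aggregate latency over many calls (ideally spread across times of day and realistic workloads) rather than a single request, and estimate total cost at a realistic monthly volume rather than the per-call sticker price. `call_endpoint` and all prices below are placeholders, not any real provider's numbers.

```python
# Sketch of benchmarking an inference endpoint: aggregate latency percentiles
# plus a cost-at-scale estimate. call_endpoint and the prices are placeholders.
import statistics
import time

def call_endpoint(prompt: str) -> str:
    time.sleep(0.05)   # stand-in for a real API call
    return "response"

latencies = []
for _ in range(200):
    start = time.perf_counter()
    call_endpoint("summarize this paragraph ...")
    latencies.append(time.perf_counter() - start)

p50 = statistics.median(latencies)
p95 = statistics.quantiles(latencies, n=20)[18]   # ~95th percentile
print(f"p50={p50*1000:.1f} ms  p95={p95*1000:.1f} ms")

# Total cost of ownership at a realistic volume, not per-call sticker price.
price_per_1k_tokens = 0.0005      # placeholder
tokens_per_call = 1_500           # placeholder workload
calls_per_month = 1_000_000
monthly_cost = calls_per_month * tokens_per_call / 1_000 * price_per_1k_tokens
print(f"estimated monthly cost: ${monthly_cost:,.0f}")
```
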
00:27:30.060 | And I'm pretty sure if before they released it,
00:27:33.580 | they showed it to their other stakeholders who
00:27:38.580 | would be caring about this benchmark
00:27:40.940 | because they are present in it, they
00:27:43.020 | would have easily just pointed out these gaps.
00:27:46.020 | And I think they didn't do that, and they just released it.
00:27:50.020 | So I think those were the two main criticisms.
00:27:52.620 | And I think they were fair, and Robert took it well.
00:27:54.980 | He took it very well.
00:27:56.060 | Yeah, we'll have him on at some point, and we'll discuss it.
00:27:58.820 | But I think it's important for--
00:28:00.140 | I think the market being maturing enough
00:28:01.900 | that people start caring and competing
00:28:03.500 | on these kinds of things means that we
00:28:05.340 | need to establish what best practice is,
00:28:07.740 | because otherwise everyone's going to play dirty.
00:28:09.780 | Yeah, absolutely.
00:28:11.860 | My view of the LLM inference market in general
00:28:14.380 | is that it's like the laundromat model.
00:28:19.260 | The margins are going to drive down towards the bare minimum.
00:28:23.940 | It's going to be all kinds of arbitrage between how much you
00:28:26.820 | can get the hardware for and then how much you sell the API
00:28:30.300 | and how much latency your customers are
00:28:32.660 | willing to let go.
00:28:34.500 | You need to figure out how to squeeze your margins.
00:28:36.620 | What is your unique thing here?
00:28:40.260 | I think Together and Fireworks and all these people
00:28:42.860 | are trying to build some faster CUDA kernels and faster
00:28:48.060 | hardware kernels in general.
00:28:50.540 | But those moats only last for a month or two.
00:28:53.180 | These ideas quickly propagate.
00:28:55.340 | Even if they're not published?
00:28:57.580 | Even if they're not published, the idea space is small.
00:29:03.460 | So even if they're not published,
00:29:06.460 | the discovery rate is going to be pretty high.
00:29:09.020 | It's not like we're talking about a combinatorial thing
00:29:11.620 | that is really large.
00:29:13.300 | You're talking about like llama-style LLM models,
00:29:17.460 | and we're going to beat those to death
00:29:19.900 | on a few different hardware SKUs.
00:29:23.180 | It's not even like we have a huge diversity of hardware
00:29:26.860 | you're going to aim to run it on.
00:29:28.740 | Now when you have such a narrow problem
00:29:31.020 | and you have a lot of people working on it,
00:29:32.940 | the rate at which these ideas are going to get figured out
00:29:35.740 | is going to be pretty rapid.
00:29:36.620 | Is it like a standard bag of tricks?
00:29:38.180 | The standard one that I know of is fusing operators
00:29:41.500 | and--
00:29:41.620 | Yeah, it's the standard bag of tricks
00:29:43.420 | on figuring out how to improve your memory bandwidth
00:29:48.780 | and all that.
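
For readers who haven't seen the fusion trick in practice, here is a small illustration: the unfused version materializes intermediates in memory between each elementwise step, while torch.compile can fuse that tail into one kernel, which mostly helps memory-bandwidth-bound workloads. (It assumes a working compiler toolchain for the default Inductor backend.)

```python
# Small illustration of operator fusion: the elementwise tail of this
# function re-reads and re-writes the full tensor at every step when run
# eagerly; torch.compile can fuse it into a single kernel.
import torch

def unfused(x, w, b):
    y = x @ w
    y = y + b
    y = torch.relu(y)
    return y * 2.0

fused = torch.compile(unfused)

x, w, b = torch.randn(512, 512), torch.randn(512, 512), torch.randn(512)
torch.testing.assert_close(unfused(x, w, b), fused(x, w, b))
```
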
00:29:49.580 | OK, interesting.
00:29:51.420 | Any ideas instead of things that are not being beaten to death
00:29:54.700 | that people should be paying more attention to?
00:29:56.900 | One thing I was like, you have 1,000 operators.
00:29:59.180 | What's the most interesting usage of PyTorch
00:30:01.260 | that you're seeing maybe outside of this little bubble?
00:30:04.380 | So PyTorch, it's very interesting and scary
00:30:08.140 | at the same time.
00:30:08.940 | But basically, it's used in a lot of exotic ways
00:30:13.740 | from the ML angle, like, OK, what kind of models
00:30:16.700 | are being built?
00:30:18.180 | And you get all the way from state space models
00:30:21.980 | and all these things to stuff like nth-order differentiable
00:30:29.220 | models, like neural ODEs and stuff like that.
00:30:35.220 | I think there's one set of interestingness factor
00:30:39.020 | from the ML side of things.
00:30:42.500 | And then there's the other set of interesting factor
00:30:44.900 | from the applications point of view.
00:30:46.620 | It's used in Mars Rover simulations, to drug discovery,
00:30:51.620 | to Tesla cars.
00:30:54.780 | And there's a huge diversity of applications
00:31:00.020 | in which it is used.
00:31:01.940 | So in terms of the most--
00:31:06.940 | I think in terms of the most interesting application
00:31:10.260 | side of things, I think I am scared
00:31:15.020 | at how many interesting things that
00:31:17.380 | are also very critical and really important it is used in.
00:31:20.340 | I think the scariest was when I went
00:31:27.740 | to visit CERN at some point.
00:31:30.820 | And they said they were using PyTorch.
00:31:34.180 | And they were using GANs at the same time
00:31:37.220 | for particle physics research.
00:31:39.300 | And I was scared more about the fact that they were using GANs
00:31:42.100 | than they were using PyTorch.
00:31:43.620 | Because at that time, I was a researcher focusing on GANs.
00:31:47.420 | The diversity is probably the most interesting,
00:31:49.740 | how many different things it is being used in.
00:31:53.100 | I think that's the most interesting to me
00:31:55.020 | from the application's perspective.
00:31:57.060 | From the model's perspective, I think
00:32:00.540 | I've seen a lot of them.
00:32:02.340 | The really interesting ones to me
00:32:04.140 | are where we're starting to combine
00:32:09.660 | search and symbolic stuff with differentiable models.
00:32:16.300 | I think the whole AlphaGo style model is one example.
00:32:25.180 | And then I think we're attempting
00:32:26.620 | to do it for LLMs as well with various reward
00:32:29.180 | models and then search.
00:32:31.940 | I don't think PyTorch is being used in this,
00:32:34.340 | but the whole AlphaGeometry thing was interesting.
00:32:37.540 | Because again, it's an example of combining
00:32:39.380 | the symbolic models with the gradient-based ones.
00:32:45.540 | But there is stuff like AlphaGeometry
00:32:48.020 | that PyTorch is used in, especially
00:32:50.820 | when you intersect biology and chemistry with ML.
00:32:56.100 | In those areas, you want stronger guarantees
00:32:59.900 | on the output.
00:33:03.340 | So yeah, maybe from the ML side, those things to me
00:33:05.940 | are very interesting right now.
00:33:08.820 | - Yeah.
00:33:09.780 | People are very excited about the AlphaGeometry thing.
00:33:12.900 | For me, it's theoretical.
00:33:14.820 | It's great.
00:33:15.340 | You can solve some Olympiad questions.
00:33:16.940 | I'm not sure how to make that bridge over
00:33:18.740 | into the real-world applications, but I'm sure it--
00:33:21.740 | - Well, OK.
00:33:22.380 | - --will figure it out.
00:33:23.340 | - Let me give you an example of it.
00:33:25.740 | You know how the whole thing about synthetic data
00:33:29.780 | will be the next rage in LLMs is a thing?
00:33:32.380 | - Already is a rage.
00:33:34.060 | - Which I think is fairly misplaced
00:33:38.100 | in how people perceive it.
00:33:39.820 | People think synthetic data is some kind of magic wand
00:33:42.540 | that you wave, and it's going to be amazing.
00:33:45.940 | Synthetic data is useful in neural networks
00:33:50.340 | right now because we, as humans, have figured out
00:33:55.780 | a bunch of symbolic models of the world
00:34:01.460 | or made up certain symbolic models because
00:34:03.900 | of human innate biases.
00:34:06.100 | So we've figured out how to ground particle physics
00:34:11.900 | in a 30-parameter model.
00:34:16.540 | And it's just very hard to compute.
00:34:20.660 | As in, it takes a lot of flops to compute,
00:34:22.780 | but it only has 30 parameters or so.
00:34:25.420 | I mean, I'm not a physics expert,
00:34:26.780 | but it's a very low-rank model.
00:34:29.900 | We built mathematics as a field that
00:34:33.940 | basically is very low-rank.
00:34:37.900 | Language, a deep understanding of language,
00:34:40.740 | like the whole syntactic parse trees
00:34:42.540 | and just understanding how language can be broken down
00:34:46.420 | into formal symbolism is something that we've figured out.
00:34:50.660 | So we basically, as humans, have accumulated
00:34:53.060 | all this knowledge on these subjects, either synthetically--
00:34:57.340 | I mean, we created those subjects in our heads,
00:35:00.500 | or we've grounded some real-world phenomenon
00:35:03.260 | into a set of symbols.
00:35:05.380 | But we haven't figured out how to teach neural networks
00:35:09.940 | symbolic world models directly.
00:35:12.860 | The only way we have to teach them
00:35:14.700 | is generating a bunch of inputs and outputs
00:35:17.580 | and gradient descending over them.
00:35:19.820 | So in areas where we have the symbolic models
00:35:23.340 | and we need to teach all the knowledge we have
00:35:29.820 | that is better encoded in the symbolic models,
00:35:32.620 | what we're doing is we're generating
00:35:34.100 | a bunch of synthetic data, a bunch of input-output pairs,
00:35:38.580 | and then giving that to the neural network
00:35:40.420 | and asking it to learn the same thing
00:35:42.580 | that we already have a better low-rank model of
00:35:46.420 | in gradient descent in a much more overparameterized way.
00:35:50.420 | Outside of this, where we don't have good symbolic models,
00:35:55.020 | synthetic data obviously doesn't make any sense.
00:35:58.020 | So synthetic data is not a magic wand
00:36:00.020 | where it'll work in all cases and every case or whatever.
00:36:02.820 | It's just where we as humans
00:36:05.180 | already have good symbolic models of,
00:36:09.140 | we need to impart that knowledge to neural networks
00:36:12.700 | and we figured out the synthetic data is a vehicle
00:36:16.180 | to impart this knowledge to.
00:36:18.540 | But people, because maybe they don't know enough
00:36:23.940 | about synthetic data as a notion,
00:36:27.060 | but they hear the next wave of data revolution
00:36:30.100 | is synthetic data, they think it's some kind of magic
00:36:32.940 | where we just create a bunch of random data somehow.
00:36:36.900 | They don't think about how.
00:36:38.500 | And then they think that's just a revolution,
00:36:40.940 | and I think that's maybe a gap in understanding
00:36:43.820 | most people have in this hype cycle.
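
A toy version of the argument above, assuming nothing beyond standard PyTorch: when we already hold a symbolic model (here, integer addition), "synthetic data" is just sampling input-output pairs from that model and gradient-descending a far more over-parameterized network on them.

```python
# Toy sketch: distilling a symbolic model (integer addition) into a neural
# network by generating synthetic input-output pairs and training on them.
import random
import torch
import torch.nn as nn

def sample_pair():
    a, b = random.randint(0, 99), random.randint(0, 99)
    return torch.tensor([a, b], dtype=torch.float32), torch.tensor([float(a + b)])

model = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

for step in range(2000):
    xs, ys = zip(*(sample_pair() for _ in range(64)))
    x, y = torch.stack(xs), torch.stack(ys)
    loss = nn.functional.mse_loss(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()

print(model(torch.tensor([[12.0, 30.0]])))  # should land near 42
```
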
00:36:46.380 | - Yeah, well, it's a relatively new concept.
00:36:48.380 | - Yeah.
00:36:49.220 | - There's two more that I'll put in front of you
00:36:52.020 | and then see what you respond.
00:36:54.380 | One is, I have this joke that it's only synthetic data
00:36:58.940 | if it's from the Mistral region of France,
00:37:01.260 | otherwise it's a sparkling distillation,
00:37:03.060 | which is what Nous Research is doing.
00:37:04.980 | They're distilling GPT-4 by creating synthetic data
00:37:07.660 | from GPT-4, creating mock textbooks inspired by Phi-2,
00:37:11.540 | and then fine-tuning open source models like Llama.
00:37:14.900 | - Yeah.
00:37:15.940 | - And so, should we call that synthetic data?
00:37:17.580 | Should we call it something else?
00:37:18.500 | I don't know, but it's--
00:37:19.900 | - Yeah, I mean, the outputs of LLMs, are they synthetic data?
00:37:24.240 | They probably are, but I think it depends
00:37:27.660 | on the goal you have.
00:37:29.340 | If your goal is you're creating synthetic data
00:37:36.540 | with the goal of trying to distill GPT-4's superiority
00:37:40.780 | into another model, I guess you can call it synthetic data,
00:37:45.300 | but it also feels disingenuous because your goal is like,
00:37:49.580 | I need to copy the behavior of GPT-4 and--
00:37:53.740 | - It's also not just behavior, but data set.
00:37:57.100 | - Yeah.
00:37:57.980 | - I've often thought of this as data set washing.
00:38:00.100 | You need one model at the top of the chain.
00:38:02.220 | - Yeah, yeah.
00:38:03.480 | - Unnamed French company that makes a model
00:38:07.120 | that has all the data in it that we don't know
00:38:08.720 | where it's from, but it's open source, hey,
00:38:09.920 | and then we distill from that.
00:38:11.160 | - Yeah.
00:38:12.000 | - And it's great.
00:38:12.840 | (laughing)
00:38:13.680 | - Yeah.
00:38:14.920 | - But they also, to be fair, they also use larger models
00:38:18.560 | as judges or for preference ranking, right?
00:38:20.720 | - Yes.
00:38:21.560 | - That is, I think, a very, very accepted use of synthetic.
00:38:24.620 | - Correct.
00:38:25.460 | I think it's a very interesting time where we don't really
00:38:28.960 | have good social models of what is acceptable
00:38:33.960 | depending on how many bits of information you use
00:38:43.000 | from someone else, right?
00:38:44.560 | It's like, okay, you use like one bit, is that okay?
00:38:49.560 | Yeah, that's accepted to be okay.
00:38:51.920 | Okay, what about if you use like 20 bits, is that okay?
00:38:55.840 | But I don't know.
00:38:57.080 | What if you use like 200 bits?
00:38:59.280 | Like, I don't think we as society have ever been
00:39:03.160 | in this conundrum where we have to be like,
00:39:05.760 | where is the boundary of copyright
00:39:08.480 | or where is the boundary of socially accepted understanding
00:39:13.340 | of copying someone else?
00:39:15.960 | Like, we haven't been tested this mathematically before,
00:39:19.880 | in my opinion.
00:39:20.720 | - Yeah, where there's transformative use.
00:39:22.760 | - Yes.
00:39:23.600 | - So yeah, I think this New York Times OpenAI case
00:39:26.180 | is gonna go to the Supreme Court.
00:39:27.480 | - Yeah.
00:39:28.320 | - And we'll have to decide it 'cause--
00:39:29.320 | - I think it'll be very interesting.
00:39:30.440 | - Never had to deal with it before.
00:39:31.960 | And then finally, for synthetic data,
00:39:34.040 | the thing that I'm personally exploring
00:39:35.320 | is solving this very stark paradigm difference
00:39:38.960 | between rag and fine tuning,
00:39:40.900 | where you can kind of create synthetic data
00:39:43.600 | off of your retrieved documents.
00:39:46.000 | - Yeah.
00:39:46.840 | - And then fine tune on that.
00:39:47.720 | That's kind of synthetic.
00:39:49.180 | All you need is variation or diversity of samples
00:39:53.700 | for you to fine tune on.
00:39:55.120 | And then you can fine tune your knowledge
00:39:56.340 | into your model.
00:39:58.380 | - Yeah.
00:39:59.340 | - I don't know if you've seen that
00:40:00.300 | as a direction for synthetic data.
00:40:03.000 | - I think that is,
00:40:04.440 | that is like you're basically trying to create,
00:40:08.480 | like what you're doing is you're saying,
00:40:10.500 | well, language, I know how to parameterize language
00:40:13.660 | to an extent.
00:40:14.500 | - Yeah.
00:40:15.340 | - And I need to teach my model variations
00:40:18.260 | of this input data so that it's resilient
00:40:22.300 | or invariant to language uses of that data.
00:40:25.640 | - Yeah, it doesn't overfit on--
00:40:26.580 | - Yeah, so I think that's 100% like synthetic, right?
00:40:29.760 | You understand, like the key is like,
00:40:32.340 | you create variations of your documents
00:40:34.700 | and you know how to do that
00:40:36.100 | because you have a symbolic model
00:40:37.460 | or like some implicit symbolic model of language.
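
A minimal sketch of that direction, with everything specific treated as a placeholder: take your retrieved documents, use a generator model to produce varied question-answer and paraphrase samples about them, and write those out as fine-tuning data. The `generate` stub below stands in for whichever LLM you would actually call.

```python
# Sketch: turning retrieved documents into varied synthetic fine-tuning data.
# `generate` is a placeholder for a real generator model.
import json

PROMPT_TEMPLATES = [
    "Write a question answered by this passage, and its answer:\n{doc}",
    "Paraphrase this passage in two sentences:\n{doc}",
    "Turn this passage into a short user/assistant dialogue:\n{doc}",
]

def generate(prompt: str) -> str:
    return f"<model output for: {prompt.splitlines()[0]}>"  # placeholder

def synthesize(retrieved_docs, path="finetune_data.jsonl"):
    with open(path, "w") as f:
        for doc in retrieved_docs:                 # each doc yields several samples
            for template in PROMPT_TEMPLATES:
                sample = generate(template.format(doc=doc))
                f.write(json.dumps({"source_doc": doc, "text": sample}) + "\n")

synthesize(["PyTorch builds computation graphs dynamically at runtime."])
```
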
00:40:41.620 | - Okay.
00:40:42.680 | Do you think the issue with symbolic models
00:40:45.940 | is just the architecture of the language models
00:40:49.540 | that we're building?
00:40:50.380 | I think like the, maybe the thing that people grasp
00:40:52.860 | is like the inability of transformers
00:40:55.340 | to deal with numbers because of the tokenizer.
00:40:58.580 | Is it a fundamental issue there too
00:41:00.620 | and do you see alternative architectures
00:41:03.040 | that will be better with symbolic understanding?
00:41:06.180 | - I am not sure if it's a fundamental issue or not.
00:41:09.500 | I think we just don't understand transformers enough.
00:41:13.220 | I don't even mean transformers as an architecture.
00:41:15.820 | I mean like the use of transformers today,
00:41:19.460 | like combining the tokenizer and transformers
00:41:22.740 | and the dynamics of training,
00:41:24.700 | like when you show math heavy questions versus not.
00:41:29.220 | I don't have a good calibration
00:41:32.780 | of whether I know the answer or not.
00:41:35.180 | I, you know, there's common criticisms that are like,
00:41:38.340 | well, you know, transformers will just fail at X
00:41:42.260 | but then when you scale them up to sufficient scale,
00:41:46.500 | they actually don't fail at that X.
00:41:48.820 | I think this is, this entire subfield
00:41:51.940 | where they're trying to figure out these answers
00:41:53.580 | called like the science of deep learning or something.
00:41:55.720 | So we'll get to know more.
00:41:57.860 | I don't know the answer.
00:42:00.180 | - Got it.
00:42:01.380 | Let's touch a little bit on just meta AI
00:42:04.020 | and you know, stuff that's going on there.
00:42:05.480 | Maybe, I don't know how deeply
00:42:07.460 | you're personally involved in it
00:42:08.420 | but you're our first guest from meta AI
00:42:10.280 | which is really fantastic.
00:42:11.700 | And LlamaOne was, you know, you are such a believer
00:42:15.860 | in open source.
00:42:16.680 | LlamaOne was more or less like the real breakthrough
00:42:19.520 | in open source AI.
00:42:20.840 | The most interesting thing for us covering in this podcast
00:42:26.700 | was the depth of Chinchilla, as people say.
00:42:30.040 | Any interesting insights there around like
00:42:32.200 | the scaling models for open source models or smaller models
00:42:36.680 | or whatever that design decision was
00:42:38.480 | when you guys were doing it?
00:42:40.440 | - So Llama 1 was Guillaume Lample and team.
00:42:45.860 | There was OPT before, which I'm also very proud of.
00:42:50.820 | - That's true.
00:42:53.160 | - Because we bridged the gap in understanding
00:42:56.620 | of how complex it is to train these models to the world.
00:43:01.620 | Like until then, no one really,
00:43:04.200 | in gory detail, published--
00:43:06.600 | - The logs.
00:43:07.440 | - Yeah, like why is it complex?
00:43:09.460 | And everyone says like, oh, it's complex.
00:43:11.800 | But no one really talked about why it's complex.
00:43:16.800 | So I think OPT was cool.
00:43:19.820 | We probably--
00:43:20.660 | - I met Susan and she's very, very outspoken.
00:43:22.500 | - Yeah, we probably, I think,
00:43:25.900 | didn't train it for long enough, right?
00:43:28.540 | Like, you know, that's kind of obvious in retrospect.
00:43:31.800 | - For a 175B?
00:43:33.580 | - Yeah.
00:43:34.420 | - But you trained it according to Chinchilla at the time or?
00:43:38.540 | - I can't remember the details,
00:43:40.420 | but I think it's a commonly held belief at this point
00:43:42.740 | that like, well, if we trained OPT for longer,
00:43:45.340 | it would actually end up being better.
00:43:47.840 | Llama 1, I think, was, yeah,
00:43:50.860 | Guillaume Lample and team. Guillaume is fantastic
00:43:54.480 | and went on to build Mistral.
00:43:56.740 | I wasn't too involved in that side of things.
00:44:00.220 | So I don't know what you're asking me,
00:44:05.700 | which is like, well, like,
00:44:06.660 | how did they think about scaling laws and all of that?
00:44:10.600 | Llama 2, I was more closely involved in.
00:44:15.600 | I helped them a reasonable amount
00:44:19.580 | with like their infrastructure needs and stuff.
00:44:24.580 | Llama 2, I think, was more like,
00:44:27.700 | let's get to the evolution.
00:44:31.040 | At that point, we kind of understood
00:44:35.040 | what we were missing from the industry's understanding
00:44:40.040 | of LLMs and we needed more data
00:44:45.000 | and we needed more to train the models for longer.
00:44:48.100 | And we made, I think, a few tweaks to the architecture
00:44:51.600 | and we scaled up more, and like that is Llama 2.
00:44:56.120 | I think Llama 2, you can think of it as like,
00:44:58.840 | after Guillaume left,
00:45:00.160 | the team kind of rebuilt their muscle around Llama 2.
00:45:04.320 | And Hugo, I think, who's the first author, is fantastic.
00:45:07.760 | And I think he did play a reasonably big role
00:45:11.320 | in Llama 1 as well, and he overlaps between Llama 1
00:45:13.680 | and 2.
00:45:14.520 | So Llama 3, obviously, hopefully will be awesome.
00:45:18.940 | - Just one question on Llama 2
00:45:21.680 | and then we'll try and fish Llama 3 spoilers out of you.
00:45:25.960 | In the Llama 2 paper,
00:45:27.080 | the loss curves of the 34 and 70B parameter,
00:45:30.840 | they still seem kind of steep,
00:45:32.880 | but they could go lower.
00:45:34.320 | How, from an infrastructure level,
00:45:37.040 | how do you allocate resources?
00:45:38.560 | Could they have just gone longer or were you just like,
00:45:41.920 | hey, this is all the GPUs that we can burn
00:45:43.920 | and let's just move on to Llama 3
00:45:45.480 | and then make that one better?
00:45:46.960 | - Instead of answering specifically
00:45:48.760 | about that Llama 2 situation or whatever,
00:45:51.200 | I'll tell you how we think about things.
00:45:54.240 | Generally, we have,
00:45:58.320 | I mean, Mark really is some numbers, right?
00:46:01.800 | So let's cite those things again.
00:46:04.140 | All I remember is like 600K GPUs.
00:46:07.200 | - That is by the end of this year
00:46:08.840 | and 600K H100 equivalents with 350K H100s
00:46:13.840 | and including all of the other GPU or accelerator stuff,
00:46:20.960 | it would be 600 and something K aggregate capacity.
00:46:25.960 | That's a lot of GPUs, we'll talk about it separately,
00:46:29.840 | but the way we think about it
00:46:32.840 | is we have a train of models, right?
00:46:35.800 | Llama 1, 2, 3, 4.
00:46:38.040 | And we have a bunch of GPUs.
00:46:41.880 | I don't think we're short of GPUs.
00:46:44.280 | - Yeah, no, I wouldn't say so.
00:46:45.600 | - Yeah, so I think it's all a matter of time.
00:46:50.600 | I think time is the biggest bottleneck.
00:46:53.160 | It's like when do you stop training the previous one
00:46:55.960 | and when do you start training the next one
00:46:58.400 | and how do you make those decisions?
00:47:00.720 | The data, do you have net new data,
00:47:04.320 | better clean data for the next one
00:47:06.480 | in a way that it's not worth
00:47:08.000 | like really focusing on the previous one.
00:47:10.560 | It's just a standard iterative product.
00:47:13.220 | You're like, when is the iPhone 1 done?
00:47:15.360 | When do you start working on the iPhone 2? And where is the iPhone 3?
00:47:18.780 | And so on, right?
00:47:19.740 | So mostly the considerations are time and generation
00:47:26.200 | rather than GPUs in my opinion.
00:47:28.320 | - So one of the things with the scaling laws,
00:47:30.480 | like Chinchilla is like optimal to balance
00:47:33.040 | training and inference costs.
00:47:34.520 | I think at Facebook scale or Metascale,
00:47:37.600 | you would rather pay a lot more maybe at training
00:47:39.760 | and then save on inference.
00:47:41.680 | How do you think about that
00:47:42.640 | from a infrastructure perspective?
00:47:45.220 | I think in your tweet you say you can try and guess
00:47:47.920 | on like how we're using these GPUs.
00:47:50.320 | Can you just give people a bit of understanding?
00:47:52.240 | It's like, because I've already seen a lot of VCs say,
00:47:54.640 | Llama 3 has been trained on 600,000 GPUs
00:47:56.760 | and that's obviously not true, I'm sure.
00:47:58.900 | How do you allocate between the research like FAIR
00:48:03.040 | and the Llama training, the inference on
00:48:07.000 | Instagram suggestions that got me to scroll,
00:48:09.280 | like AI generated stickers on WhatsApp and all that?
00:48:12.720 | - Yeah, we haven't talked about any of this publicly
00:48:16.760 | but like as a broad stroke,
00:48:19.120 | it's like how we would allocate resources
00:48:21.600 | of any other kinds at any company.
00:48:24.900 | You run a company, you run like a VC portfolio,
00:48:29.900 | like how do you allocate your investments
00:48:36.300 | between different companies or whatever?
00:48:38.000 | You kind of make various trade offs
00:48:39.580 | and you kind of decide should I invest in this project
00:48:42.260 | or this other project or how much should I invest
00:48:45.020 | in this project?
00:48:46.180 | It's very much like a zero sum of trade offs
00:48:52.820 | and it also comes into play like how is your,
00:48:57.300 | how are your like clusters configured?
00:48:59.700 | Like overall, like what you can fit of what size
00:49:02.960 | and what cluster and so on.
00:49:04.400 | So broadly, there's no magic sauce here.
00:49:08.460 | Like, I mean, I think the details would add more spice
00:49:12.820 | but also wouldn't add more understanding.
00:49:16.560 | It's just gonna be like, oh, okay.
00:49:18.960 | I mean, this looks like they just think about this
00:49:22.080 | as I would normally do.
00:49:24.000 | - Right, so even the GPU rich run through the same struggles
00:49:27.920 | while having to decide where to allocate things?
00:49:30.800 | - Yeah, I mean like at some point, I forgot who said it
00:49:34.760 | but it's like you kind of fit your models
00:49:39.760 | to the amount of compute you have.
00:49:43.320 | If you don't have enough compute,
00:49:44.700 | you figure out how to make do with smaller models
00:49:48.140 | but like no one as of today, I think would feel like
00:49:53.140 | they have enough compute.
00:49:55.180 | I don't think like I have heard any company
00:49:59.820 | within the AI space be like, oh yeah,
00:50:03.500 | like we feel like we have sufficient compute
00:50:05.920 | and we couldn't have done better.
00:50:07.760 | So like that conversation, I don't think I've heard
00:50:12.860 | from any of my friends at other companies.
00:50:16.340 | - Stella from Eleuther sometimes says that
00:50:18.900 | because she has a lot of donated compute
00:50:20.900 | and she's trying to put it to interesting uses
00:50:23.720 | but for some reason, she's decided to stop
00:50:26.900 | making large models.
00:50:28.820 | - I mean, that's a cool high conviction opinion
00:50:33.280 | that might pay out, right?
00:50:35.900 | I mean, she's taking a path that most people
00:50:39.940 | don't care to take about in this climate
00:50:42.060 | and she probably will have very differentiated ideas
00:50:46.080 | and I mean, think about the correlation of ideas
00:50:49.860 | in AI right now, it's so bad, right?
00:50:53.220 | So everyone's fighting for the same pie.
00:50:56.860 | In some weird sense, like that's partly why
00:51:01.620 | I don't really directly work on LLMs.
00:51:04.180 | I used to be a, I used to do image models and stuff
00:51:08.080 | and I actually stopped doing GANs
00:51:10.020 | because GANs were getting so hot
00:51:12.820 | that I didn't have any calibration
00:51:14.900 | of whether my work would be useful or not
00:51:17.780 | because oh yeah, someone else did the same thing you did.
00:51:21.120 | It's like, there's so much to do,
00:51:24.260 | I don't understand why I need to fight for the same pie.
00:51:27.980 | So I think Stella's decision is very smart.
00:51:32.980 | - And how do you reconcile that with how we started
00:51:36.740 | the discussion about intrinsic versus extrinsic
00:51:39.860 | kind of like accomplishment or success?
00:51:42.540 | How should people think about that one,
00:51:44.300 | especially when they're doing a PhD
00:51:45.980 | or like early in their career?
00:51:48.900 | It seems like, I think at NeurIPS,
00:51:50.600 | I walked through a lot of the posters and whatnot,
00:51:52.980 | there seems to be a lot of overlap in a way in the research,
00:51:56.100 | a lot of people working on the same things.
00:51:58.540 | Is it worth for like a PhD to not take a bet
00:52:01.480 | on something that is like maybe not as interesting,
00:52:04.500 | just because of funding and visibility and whatnot
00:52:07.300 | or yeah, what suggestions would you give?
00:52:10.260 | - I think there's a baseline level of compatibility
00:52:13.180 | you need to have with the field.
00:52:16.100 | Basically, you need to figure out
00:52:19.440 | if you will get paid enough to eat, right?
00:52:22.020 | Like, and like whatever reasonable, normal lifestyle
00:52:25.440 | you want to have as a baseline.
00:52:29.500 | So you at least have to pick a problem
00:52:31.380 | within the neighborhood of like fundable.
00:52:34.220 | Like you wouldn't want to be doing something so obscure
00:52:39.220 | that people are like, I don't know, like you can work on it.
00:52:42.960 | The limit on fundability, I'm just like observing,
00:52:47.020 | is something like three months of compute, right?
00:52:49.380 | That's the top line.
00:52:50.220 | That's the like max that you can spend on any one project.
00:52:53.440 | - But like, I think that's very ill specified,
00:52:56.280 | like how much compute?
00:52:57.180 | - Yeah.
00:52:58.820 | - So I think the notion of fundability is broader.
00:53:03.820 | It's more like, hey, are these family of models
00:53:06.780 | within the acceptable set of you're not crazy
00:53:10.700 | or something, right?
00:53:11.520 | Like even something like neural ODEs,
00:53:14.540 | which is a very like boundary pushing thing
00:53:18.120 | or like state space models or whatever.
00:53:20.160 | Like all of these things I think are still
00:53:22.280 | in fundable territory.
00:53:23.760 | When you're talking about, I'm gonna do one
00:53:28.820 | of the neuromorphic models and then apply
00:53:33.820 | like image classification to them or something,
00:53:38.280 | then it becomes like a bit questionable.
00:53:41.140 | Again, it depends on your motivation.
00:53:42.640 | Maybe if you're a neuroscientist, it actually is feasible.
00:53:46.320 | But if you're like a AI engineer, like the audience
00:53:50.120 | of these podcasts, then it's less, it's more questionable.
00:53:54.760 | So I think like, the way I think about it is like,
00:53:57.680 | you need to figure out how you can be in the baseline level
00:54:01.800 | of fundability just so that you can just live.
00:54:06.400 | And then after that, really focus on intrinsic motivation
00:54:11.400 | and depends on your strengths, like how you can play
00:54:16.740 | to your strengths and your interests at the same time.
00:54:21.060 | Like you, like I try to look at a bunch of ideas
00:54:26.060 | that are interesting to me, but also try to play
00:54:29.800 | to my strengths.
00:54:31.420 | I'm not gonna go work on theoretical ML.
00:54:34.960 | I'm interested in it, but when I want to work
00:54:38.720 | on something like that, I try to partner with someone
00:54:40.800 | who is actually a good like theoretical ML person
00:54:43.440 | and see if I actually have any value to provide.
00:54:45.720 | And if they think I do, then I come in.
00:54:48.280 | So I think you'd want to find that intersection
00:54:50.840 | of ideas you like, and that also play to your strengths.
00:54:55.840 | And I'd go from there.
00:54:57.520 | Everything else, like actually finding extrinsic success
00:55:01.160 | and all of that I think is, the way I think about it
00:55:05.200 | is like somewhat immaterial.
00:55:06.820 | When you're talking about building ecosystems and stuff,
00:55:10.560 | like slightly different considerations come into play,
00:55:13.200 | but that's a different conversation.
00:55:16.600 | - Yeah, I should, we're gonna pivot a little bit
00:55:20.800 | to just talk about open source AI.
00:55:23.600 | But one more thing I wanted to establish for meta
00:55:25.720 | is like this 600K number, just kind of rounding out
00:55:28.160 | the discussion, that's for all Meta.
00:55:31.060 | So including your own inference needs, right?
00:55:32.640 | It's not just about training.
00:55:33.900 | - It's for all, it's gonna be the number
00:55:36.960 | in our data centers for all of Meta, yeah.
00:55:39.380 | - Yeah, so like, there's a decent amount of workload
00:55:42.400 | serving Facebook and Instagram and you know, whatever.
00:55:45.920 | And then is there interest in like your own hardware?
00:55:49.740 | - We already talked about our own hardware.
00:55:53.640 | It's called MTIA, our own silicon.
00:55:57.620 | I think we've even showed like the standard photograph
00:56:02.380 | of you holding the chip that doesn't work.
00:56:05.000 | I mean, like as in the chip that you basically
00:56:10.000 | just get like--
00:56:11.520 | - As a test?
00:56:12.680 | - Yeah, a test chip or whatever.
00:56:14.280 | So we are working on our silicon
00:56:18.800 | and we'll probably talk more about it
00:56:21.720 | when the time is right, but--
00:56:25.220 | - Like what gaps do you have that the market doesn't offer?
00:56:29.000 | - Okay, I mean this is easy to answer.
00:56:31.120 | So basically, remember how I told you about the whole,
00:56:34.680 | like there's this memory hierarchy
00:56:36.640 | and like sweet spots and all of that?
00:56:39.360 | Fundamentally, like when you build a hardware,
00:56:42.080 | like you make it general enough that a wide set of customers
00:56:46.680 | and a wide set of workloads can use it effectively
00:56:49.800 | while trying to get the maximum level of performance
00:56:53.160 | they can.
00:56:55.000 | The more specialized you make the chip,
00:56:58.460 | the more hardware efficient it's going to be,
00:57:02.660 | the more power efficient it's gonna be,
00:57:04.460 | the easier it's going to be to find like the software,
00:57:08.820 | like the kernels, right, to just map
00:57:14.020 | that one or two workloads to that hardware and so on.
00:57:17.080 | So it's pretty well understood across the industry
00:57:21.840 | that if you have a sufficiently large volume of workload,
00:57:26.840 | you can specialize it and get some efficiency gains,
00:57:33.580 | like power gains and so on.
00:57:35.460 | So the way you can think about everyone building,
00:57:40.380 | every large company building silicon,
00:57:42.560 | like I think a bunch of the other large companies
00:57:46.180 | are building their own silicon as well,
00:57:48.860 | is each large company has a sufficient enough set
00:57:53.840 | of verticalized workloads that have a pattern to them
00:57:58.840 | that say a more generic accelerator
00:58:03.920 | like an Nvidia or an AMD GPU does not exploit.
00:58:07.880 | So there is some level of power efficiency
00:58:11.520 | that you're leaving on the table by not exploiting that.
00:58:14.920 | And you have sufficient scale
00:58:16.480 | and you have sufficient forecasted stability
00:58:21.120 | that those workloads will exist in the same form,
00:58:25.100 | that it's worth spending the time to build out a chip
00:58:28.520 | to exploit that sweet spot.
00:58:32.640 | Like obviously something like this is only useful
00:58:36.220 | if you hit a certain scale
00:58:38.700 | and that your like forecasted prediction
00:58:42.040 | of those kinds of workloads being in the same kind
00:58:45.880 | of specializable exploitable way is true.
00:58:49.860 | So yeah, that's why we're building our own chips.
00:58:54.960 | - Amazing, awesome.
00:58:56.080 | Yeah, I know we've been talking a lot
00:58:59.040 | on a lot of different topics
00:59:00.560 | and going back to open source, you had a very good tweet.
00:59:03.600 | You said that a single company's closed-source effort
00:59:06.360 | rate limits against people's imaginations and needs.
00:59:09.160 | How do you think about that?
00:59:11.320 | How do you think about all the impact
00:59:13.960 | that some of the meta AI work in open source has been doing
00:59:17.200 | and maybe directions of the whole open source AI space?
00:59:20.120 | - Yeah.
00:59:20.960 | In general, I think first I think it's worth talking
00:59:25.280 | about this in terms of open and not just open source
00:59:28.940 | because like with the whole notion of model weights,
00:59:31.920 | no one even knows what source means for these things.
00:59:35.500 | But just for the discussion, when I say open source,
00:59:39.360 | you can assume it's just I'm talking about open.
00:59:42.240 | And then there's the whole notion of like licensing
00:59:45.040 | and all that like, you know--
00:59:46.440 | - Commercial.
00:59:47.280 | - Commercial, non-commercial, commercial with clauses
00:59:49.240 | and all that.
00:59:50.060 | I think like at a fundamental level,
00:59:53.480 | the biggest benefit of open source
00:59:57.160 | is that you make the distribution very wide.
01:00:02.160 | Like it's just available with no friction
01:00:06.300 | and like people can do transformative things.
01:00:10.860 | In a way that's very accessible.
01:00:14.060 | Like maybe like it's open source,
01:00:17.100 | but it has a commercial license
01:00:18.660 | and I'm a student like in India.
01:00:20.860 | I don't care about the license.
01:00:22.980 | I just don't even understand the license.
01:00:25.340 | But like the fact that I can use it
01:00:27.420 | and do something with it is very transformative to me.
01:00:32.260 | Like I got this thing in a very accessible way.
01:00:38.700 | And then like so it's very, very various degrees, right?
01:00:42.260 | And then like if it's open source,
01:00:44.100 | but it's like actually like a commercial license,
01:00:47.260 | then a lot of companies are gonna benefit
01:00:50.020 | from like gaining value that they didn't previously have
01:00:54.780 | that they maybe had to pay a closed source company for it.
01:00:59.100 | So open source is just a very interesting tool
01:01:02.460 | that you can use in various ways.
01:01:04.420 | So there's, again, two kinds of open source.
01:01:06.540 | One is like some large company doing a lot of work
01:01:09.300 | and then open sourcing it.
01:01:12.260 | And that kind of effort is not really feasible
01:01:15.820 | by say like a band of volunteers doing it the same way.
01:01:19.860 | So there's both a capital and operational expenditure
01:01:22.900 | that the large company just decided to
01:01:25.220 | ignore and give it away to the world
01:01:30.780 | for some benefits of some kind.
01:01:33.740 | They're not as tangible as like direct revenue
01:01:36.300 | or something.
01:01:37.660 | So in that part, Meta has been doing incredibly good things.
01:01:42.660 | They fund a huge amount of the PyTorch development.
01:01:47.900 | They've open sourced Llama and those family of models.
01:01:52.060 | And several other fairly transformative projects.
01:01:58.060 | FAISS is one, Segment Anything, Detectron,
01:02:03.140 | Detectron 2, DensePose, I mean it's--
01:02:06.900 | - Seamless. - Yeah, Seamless.
01:02:08.660 | It's just like the list is so long
01:02:10.500 | that we're not gonna cover.
01:02:12.660 | So I think Meta comes into that category
01:02:15.860 | where we spend a lot of capex and opex
01:02:19.220 | and we have a high talent density of great AI people.
01:02:24.220 | And we open our stuff.
01:02:27.700 | And the thesis for that, I remember when FAIR was started,
01:02:31.420 | the common thing was like wait,
01:02:33.300 | why would Meta wanna start an open AI lab?
01:02:38.300 | What exactly is the benefit from a commercial perspective?
01:02:44.380 | And then the thesis was very simple.
01:02:46.780 | It was like AI is currently rate limiting
01:02:50.300 | Meta's ability to do things.
01:02:53.280 | Our ability to build various product integrations,
01:02:58.980 | moderation, various other factors.
01:03:01.660 | AI was the limiting factor.
01:03:04.020 | And we just wanted AI to advance more.
01:03:06.980 | And we didn't care if the IP of the AI
01:03:11.380 | was uniquely in our possession or not for us.
01:03:15.460 | However the field advances, that accelerates
01:03:17.900 | Meta's ability to build a better product.
01:03:20.620 | So we just built an open AI lab and we said,
01:03:24.180 | if this helps accelerate the progress of AI,
01:03:27.260 | that's strictly great for us.
01:03:29.340 | But very easy rational, right?
01:03:31.380 | Still the same to a large extent with the Llama stuff
01:03:35.220 | and it's a bit more, I think it's the same values,
01:03:40.220 | but the argument, it's a bit more nuanced.
01:03:46.160 | And then there's the second kind of open source,
01:03:50.420 | which is oh, we built this project nights and weekends
01:03:54.140 | and we're very smart people and we open sourced it
01:03:56.580 | and then we built a community around it.
01:03:58.140 | This is like the Linux kernel
01:03:59.660 | and various software projects like that.
01:04:03.420 | So I think about open source,
01:04:08.420 | like both of these things being beneficial
01:04:13.940 | and both of these things being different.
01:04:15.980 | They're different and beneficial in their own ways.
01:04:22.080 | The second one is really useful
01:04:24.380 | when there's an active arbitrage to be done.
01:04:28.580 | If someone's not really looking at a particular space,
01:04:33.780 | because it's not commercially viable or whatever,
01:04:35.980 | like a band of volunteers can just coordinate online
01:04:39.680 | and do something and then make that happen.
01:04:43.820 | And that's great.
01:04:44.820 | I wanna cover a little bit about open source LLMs maybe.
01:04:51.820 | So open source LLMs have been very interesting
01:04:54.620 | because I think we were trending towards an increase
01:04:58.580 | in open source in AI from 2010
01:05:02.980 | all the way to like 2017 or something.
01:05:08.200 | Like where more and more pressure within the community
01:05:11.520 | was to open source their stuff
01:05:13.080 | so that their methods and stuff get adopted.
01:05:17.580 | And then the LLM revolution kind of took the opposite effect.
01:05:22.580 | Open AI stopped open sourcing their stuff
01:05:28.020 | and DeepMind, kind of like all the other cloud providers
01:05:33.020 | and all these other providers,
01:05:35.160 | they didn't open source their stuff.
01:05:38.300 | And it was not good in the sense that first,
01:05:46.260 | like science done in isolation
01:05:48.120 | probably will just form its own bubble
01:05:51.400 | where like people believe their own bullshit
01:05:53.020 | or whatever, right?
01:05:54.260 | So there's that problem.
01:05:56.220 | And then there was the other problem
01:05:59.180 | which was the accessibility part.
01:06:01.840 | Like, okay, I again always go back to like,
01:06:05.740 | I'm a student in India with no money.
01:06:07.800 | What is my accessibility to any of these closed models?
01:06:15.060 | At some scale I have to pay money.
01:06:18.100 | That makes it a non-starter and stuff.
01:06:22.620 | And there is also the control thing.
01:06:24.640 | I strongly believe the best,
01:06:27.280 | if you want human-aligned stuff,
01:06:31.540 | you want all humans to give feedback
01:06:35.700 | and you want all humans to have access
01:06:37.720 | to their technology in the first place.
01:06:40.380 | And I actually have seen, living in New York,
01:06:44.140 | whenever I come to Silicon Valley
01:06:45.580 | I see a different cultural bubble.
01:06:47.600 | Like all the friends I hang out with
01:06:50.060 | talk about some random thing,
01:06:52.700 | like Dyson spheres or whatever, that's a thing.
01:06:55.940 | And most of the world doesn't know
01:06:58.220 | or care about any of this stuff.
01:06:59.860 | Like it's definitely like a bubble
01:07:02.380 | and bubbles can form very easily.
01:07:04.240 | And when you make a lot of decisions
01:07:05.980 | because you're in a bubble,
01:07:07.960 | they're probably not globally optimal decisions.
01:07:11.780 | So I think open source, the distribution of open source,
01:07:15.140 | powers a certain kind of non-falsifiability
01:07:20.140 | that I think is very important.
01:07:22.120 | So I think on the open source models,
01:07:27.740 | it's going great in the fact that LoRA, I think,
01:07:31.560 | came out of the necessity of open source models
01:07:36.420 | needing to be fine-tunable in some way.
01:07:41.060 | - Yeah, and I think DPO also came
01:07:44.580 | out of the academic open source side of things.
01:07:49.480 | So do any of the closed source labs,
01:07:54.480 | did any of them already have LoRA or DPO internally?
01:08:00.540 | Maybe, but that does not advance humanity in any way.
01:08:05.540 | It advances some company's probability
01:08:09.780 | of doing the winner takes all
01:08:11.940 | that I talked about earlier in the podcast.
01:08:14.680 | So I don't know, it just feels fundamentally good.
01:08:19.180 | Like when people try to, people are like,
01:08:22.860 | well, what are the ways in which it is not okay?
01:08:27.300 | And this might be a little controversial,
01:08:29.180 | but I find a lot of arguments based on
01:08:33.260 | whether closed source models are safer
01:08:35.460 | or open source models are safer,
01:08:37.860 | very much related to what kind of culture
01:08:42.860 | they grew up in, what kind of society they grew up in.
01:08:50.140 | If they grew up in a society that they trusted,
01:08:52.960 | then I think they take the closed source argument.
01:08:57.900 | And if they grew up in a society that they couldn't trust,
01:09:00.420 | where the norm was that you didn't trust your government,
01:09:03.260 | obviously, like it's corrupt or whatever,
01:09:05.500 | then I think the open source argument is what they take.
01:09:08.620 | I think there's a deep connection
01:09:10.360 | to people's innate biases from their childhood
01:09:15.360 | and their trust in society and governmental aspects
01:09:21.900 | that push them towards one opinion or the other.
01:09:26.260 | And I'm definitely in the camp of open source
01:09:29.900 | is definitely going to actually have
01:09:31.940 | better outcomes for society.
01:09:33.860 | Closed source to me just means that centralization of power,
01:09:37.340 | which is really hard to trust.
01:09:39.220 | So I think it's going well in so many ways.
01:09:46.180 | We're actively disaggregating the centralization of power
01:09:52.540 | to just two or three providers.
01:09:55.180 | We are, I think, benefiting from so many people
01:09:58.420 | using these models in so many ways
01:10:00.660 | that aren't allowed by say Silicon Valley left wing tropes.
01:10:05.660 | Some of these things are good or bad,
01:10:13.180 | but they're not culturally accepted universally in the world.
01:10:16.700 | So those are things worth thinking about.
01:10:20.420 | And I think open source is not winning in certain ways.
01:10:25.420 | These are all the things in which, as I mentioned,
01:10:29.980 | it's actually being very good and beneficial and winning.
01:10:33.060 | I think one of the ways in which it's not winning,
01:10:36.340 | at some point I should write a long form post about this,
01:10:39.220 | is I think it has a classic coordination problem.
01:10:43.260 | I mean, open source in general
01:10:44.420 | always has a coordination problem.
01:10:46.620 | If there's a vertically integrated provider
01:10:48.940 | with more resources,
01:10:50.480 | they will just be better coordinated than open source.
01:10:54.980 | And so now open source has to figure out
01:10:57.780 | how to have coordinated benefits.
01:10:59.560 | And the reason you want coordinated benefits
01:11:01.780 | is because these models are getting better
01:11:06.780 | based on human feedback.
01:11:09.580 | And if you see with open source models,
01:11:12.100 | like if you go to Reddit, the LocalLlama subreddit,
01:11:16.860 | there's so many variations of models
01:11:19.020 | that are being produced from, say, Nous Research.
01:11:23.340 | I mean, there's so many variations
01:11:26.820 | built by so many people.
01:11:29.420 | And one common theme is they're all using these fine tuning
01:11:34.420 | or human preferences data sets that are very limited
01:11:39.660 | and someone published them somewhere
01:11:42.500 | and they're not sufficiently diverse.
01:11:46.940 | And you look at the other side,
01:11:49.080 | like say front-ends like Ooba or HuggingChat or Ollama,
01:11:54.080 | they don't really have feedback buttons.
01:11:58.900 | All the people using all of these front-ends,
01:12:01.920 | they probably want to give feedback
01:12:04.380 | but there's no way for them to give feedback.
01:12:07.600 | So these models are being built,
01:12:10.180 | they're being arbitrarily measured
01:12:13.660 | and then they are being deployed
01:12:14.940 | into all these open source front-ends
01:12:16.780 | or like apps that are closed source,
01:12:19.900 | they're serving open source models.
01:12:21.720 | And these front-ends don't have,
01:12:24.580 | they are not exposing the ability to give feedback.
01:12:27.620 | So we're just losing all of this feedback.
01:12:31.340 | Maybe open source models are being used as much as GPT is
01:12:34.940 | at this point in all kinds of, in a very fragmented way.
01:12:39.700 | Like in aggregate, all the open source models together
01:12:41.980 | are probably being used as much as GPT is,
01:12:44.500 | maybe close to that.
01:12:47.180 | But the amount of feedback that is driving back
01:12:50.240 | into the open source ecosystem is like negligible,
01:12:53.140 | maybe less than 1% of the usage.
01:12:56.940 | So I think like some,
01:13:00.060 | like the blueprint here I think is,
01:13:05.000 | you'd want someone to create a sinkhole for the feedback,
01:13:08.140 | some centralized sinkhole,
01:13:09.260 | like maybe Hugging Face or someone just finds like,
01:13:12.960 | okay, like I will make available a call to log a string
01:13:17.960 | along with like a bit of information of positive or negative
01:13:22.620 | or something like that.
01:13:24.300 | And then you would want to send pull requests
01:13:26.540 | to all the open source front ends, like Ooba and all,
01:13:30.860 | being like, hey, we're just integrating like a feedback UI.
01:13:34.660 | And then work with like the closed source people
01:13:37.260 | is also being like, look, it doesn't cost you anything,
01:13:40.200 | just like have a button.
01:13:42.140 | And then the sinkhole will have a bunch
01:13:45.700 | of this data coming in.
01:13:47.480 | And then I think a bunch of open source researchers
01:13:50.580 | should figure out how to filter the feedback
01:13:52.900 | into only the like high quality one.
01:13:54.640 | I'm sure like it will be exploited by spam bots
01:13:56.760 | or whatever, right?
01:13:58.000 | This is like the perfect way
01:13:59.280 | to inject your advertising product into like the next--
01:14:03.200 | - Buy Coca Cola now.
01:14:05.040 | - So there needs to be some level of that.
01:14:08.760 | In the same way, I'm sure like all the closed providers
01:14:13.080 | are doing today, like OpenAI, Claude,
01:14:15.880 | like the feedback that comes in,
01:14:17.920 | I'm sure they are figuring out if that's legit or not.
01:14:21.600 | That kind of data filtering needs to be done.
01:14:24.240 | And that loop has to be set up.
01:14:28.600 | And this requires that central sinkhole
01:14:31.160 | and that like data cleaning effort both to be like there.
01:14:35.760 | They're not there right now.
01:14:37.200 | They're not there right now.
01:14:38.360 | I think for capital reasons,
01:14:42.920 | but also for coordination reasons.
01:14:44.680 | Okay, if that central sinkhole is there,
01:14:46.360 | who's gonna go coordinate all of this integration
01:14:49.840 | across all of these like open source front ends.
01:14:52.840 | But I think if we do that, if that actually happens,
01:14:57.200 | I think that probably has a real chance
01:15:00.800 | of the open source models having a runaway effect
01:15:03.080 | against OpenAI with their current
01:15:06.640 | rumored daily active users.
01:15:10.000 | Probably doesn't have a chance against Google
01:15:13.360 | because Google has Android and Chrome and Gmail
01:15:18.360 | and Google Docs and everything.
01:15:22.000 | So people just use that a lot.
01:15:25.280 | But like I think like there's a clear chance
01:15:29.160 | we can take at truly winning open source.
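As a minimal sketch of the feedback sinkhole idea described above: the endpoint URL, payload fields, and function name below are hypothetical, invented purely for illustration, not an existing Hugging Face or front-end API.

```python
# Hypothetical sketch of a "central sinkhole" feedback call, as described above.
# The endpoint, field names, and function are made up; no such shared service exists today.
import json
import urllib.request

SINKHOLE_URL = "https://example.org/v1/feedback"  # hypothetical shared endpoint


def log_feedback(model_id: str, prompt: str, response: str, positive: bool) -> None:
    """Send one (prompt, response, thumbs-up/down) record to the shared sinkhole."""
    record = {
        "model": model_id,  # e.g. whichever open source model the front end served
        "prompt": prompt,
        "response": response,
        "label": "positive" if positive else "negative",
    }
    req = urllib.request.Request(
        SINKHOLE_URL,
        data=json.dumps(record).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    # Fire-and-forget here; a real front end would batch, retry, and rate-limit,
    # and the sinkhole side would need the spam/quality filtering discussed above.
    urllib.request.urlopen(req, timeout=5)
```

The hard part, as the discussion notes, is not the call itself but the server-side filtering and the coordination work of getting all the front ends to adopt it.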
01:15:34.160 | - Do you think this feedback is helpful
01:15:37.000 | to make open source models better
01:15:38.960 | or to get to like open source AGI?
01:15:41.720 | Because in a way like OpenAI's goal is to get to AGI, right?
01:15:44.920 | So versus I think in open source,
01:15:47.160 | we're more focused on personal better usage or like--
01:15:50.960 | - Yeah, I think that's a good question.
01:15:52.680 | But I think like, largely I actually don't think people
01:15:57.680 | have a good understanding of AGI
01:16:00.720 | and I don't mean definition level.
01:16:02.280 | I mean, people are like, okay, we're gonna,
01:16:05.560 | AGI means it's powering 40% of world economic output
01:16:10.000 | or something like that, right?
01:16:12.700 | But what does that mean?
01:16:14.680 | So do you think electricity is powering 40%
01:16:18.360 | of world economic output or is it not?
01:16:21.520 | Like, generally the notion of like powering X percent
01:16:26.400 | of economic output is not defined well at all
01:16:31.160 | for me to understand like how to know when we got to AGI
01:16:36.160 | or how to measure whether we're getting AGI.
01:16:40.640 | Like, you know, you can look at it in terms of intelligence
01:16:43.420 | or task automation or whatever.
01:16:46.000 | I think that's what we are doing right now.
01:16:48.200 | We're basically integrating like the current set
01:16:50.520 | of AI technologies into so many real world use cases
01:16:55.160 | where we find value that if some new version of AI comes in,
01:17:01.360 | we can find, like we can be like, ah, this helps me more.
01:17:05.300 | In that sense, I think like the whole process
01:17:10.180 | of like how we think we got to AGI will be continuous
01:17:13.700 | and not discontinuous like how I think
01:17:18.700 | the question is posed.
01:17:21.280 | So I think the open source thing will be very much in line
01:17:26.280 | with getting to AGI because open source has that
01:17:31.460 | like natural selection effect.
01:17:36.860 | Like if a better open source model comes,
01:17:39.900 | really no one says, huh, I don't wanna use it
01:17:43.540 | because there are ecosystem effect.
01:17:46.120 | I'm logged into my ecosystem or like,
01:17:49.080 | I don't know if I like the models, you know, whatever.
01:17:52.080 | It's just a very pure direct thing.
01:17:55.440 | So if there's a better model that comes out,
01:17:58.560 | then it will be used.
01:18:00.800 | So I definitely think it has a good chance of achieving
01:18:05.160 | how I would think about as a continuous
01:18:09.860 | path to what we might define as AGI.
01:18:13.520 | - For the listeners, I will actually mention
01:18:16.680 | a couple other maybe related notes on just
01:18:19.480 | this very interesting concept of feedback sinkholes
01:18:22.120 | for open source to really catch up in terms of
01:18:25.380 | the overall Google versus OpenAI debate.
01:18:28.060 | Open Assistant was led by Yannic Kilcher
01:18:32.480 | who recently ended his effort.
01:18:33.860 | I think the criticism there was like the kind of people
01:18:35.720 | that go to a specific website to give feedback
01:18:38.860 | is not representative of real world usage
01:18:40.760 | and that's why the models trained on Open Assistant
01:18:43.640 | didn't really seem like they caught on
01:18:45.680 | in the open source world.
01:18:47.400 | The two leading candidates in my mind
01:18:48.760 | are LMSYS out of UC Berkeley who have the LMSYS arena
01:18:53.080 | which is being touted as one of the only ways,
01:18:56.040 | only reliable benchmarks anymore.
01:18:57.680 | I kinda call them non-parametric benchmarks
01:18:59.720 | 'cause there's nothing to cheat on except for Elo.
01:19:03.780 | And then the other one is OpenRouter
01:19:06.280 | which is Alex Atallah's thing.
01:19:07.560 | I don't know if you've talked to any of these people.
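For context on the arena-style ranking mentioned here: pairwise human votes get turned into a leaderboard via Elo-style ratings. The sketch below is the textbook Elo update with a conventional K-factor and starting rating, an illustration of the idea rather than LMSYS's exact scoring code.

```python
# Minimal sketch of the standard Elo update behind arena-style pairwise rankings.
# K and the starting rating are conventional choices, not LMSYS's exact settings.
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def elo_update(rating_a: float, rating_b: float, a_won: bool, k: float = 32.0):
    """Return updated (rating_a, rating_b) after one human preference vote."""
    e_a = expected_score(rating_a, rating_b)
    s_a = 1.0 if a_won else 0.0
    new_a = rating_a + k * (s_a - e_a)
    new_b = rating_b + k * ((1.0 - s_a) - (1.0 - e_a))
    return new_a, new_b


# Example: two models start at 1000; A wins one head-to-head vote.
a, b = elo_update(1000.0, 1000.0, a_won=True)  # -> (1016.0, 984.0)
```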
01:19:10.020 | - I obviously know all of the efforts
01:19:12.880 | that you talked about.
01:19:14.340 | I haven't talked to them directly about this yet
01:19:17.700 | but the way I think about it is
01:19:20.060 | the way these models are going to be used
01:19:23.300 | is always going to be way more distributed than centralized.
01:19:26.880 | Like which is the power of the open source movement.
01:19:31.580 | Like the UI within which these models are going to be used
01:19:35.460 | is going to be decentralized.
01:19:37.800 | Like these models are going to be integrated
01:19:39.860 | into like hundreds and thousands of projects
01:19:43.320 | and products and all of that, right?
01:19:45.400 | And I think that is important to recognize.
01:19:50.200 | Like the LMSYS leaderboard is the best thing we have
01:19:54.320 | right now to understand whether a model is better or not
01:19:57.900 | versus another model.
01:19:59.640 | But it's also biased and only having a sliver of view
01:20:04.080 | into how people actually use these models.
01:20:05.880 | Like the people who actually end up coming
01:20:07.880 | to the LMSYS leaderboard and then using a model
01:20:10.960 | only use it for certain things.
01:20:13.000 | Like GitHub Copilot-style usage is not captured
01:20:18.000 | in say like the LMSYS thing, and so many other styles,
01:20:22.560 | like the Character AI-style things, are not captured in LMSYS.
01:20:26.980 | - Which OpenRouter could do.
01:20:28.440 | They don't do it right now.
01:20:29.640 | - Yeah, so like I think like yeah, my point is like
01:20:33.920 | the way these models are going to be used
01:20:35.880 | is going to be always a large surface area.
01:20:40.280 | And I think we need to figure out
01:20:41.560 | how to provide infrastructure to integrate
01:20:45.420 | with all these like ways in which it's being used.
01:20:49.460 | Even if you get like the top hundred front ends
01:20:54.180 | that open source models are used through
01:20:58.560 | to subscribe to like the sinkhole,
01:21:01.860 | I think that's already like a substantial thing.
01:21:04.220 | I think like thinking that one or two things
01:21:09.140 | will by themselves get a lot of data
01:21:11.700 | is not going to happen.
01:21:14.180 | - Yep, fair enough.
01:21:15.180 | Before we let you go,
01:21:18.620 | can we do just a quick beyond text segment?
01:21:21.740 | So you're an investor in Runway,
01:21:23.780 | which is a leader in video generation.
01:21:25.520 | You're an investor in 1X, which is building humanoid assistants.
01:21:29.660 | Osmo, which is focused on using AI
01:21:32.040 | for smell recognition and synthesis.
01:21:34.580 | You advise a bunch of robotics projects at NYU.
01:21:37.540 | - And he builds his own home robot.
01:21:40.240 | - Yeah, exactly.
01:21:42.040 | On a more, yeah, maybe you have another thing.
01:21:43.800 | What are like the things that you're most excited about
01:21:46.120 | beyond like text generation
01:21:47.800 | and kind of the more mundane usage?
01:21:50.040 | - Yeah, I mean in general,
01:21:51.800 | I have more things I'm generally excited about
01:21:54.420 | than I can possibly do.
01:21:56.920 | Investing is one way to try to clear those urges.
01:22:01.920 | I'm generally excited about robotics being a possibility,
01:22:09.800 | home robotics being like five to seven years away
01:22:16.080 | from commercialization.
01:22:17.560 | I think like it's not like next year or two years from now,
01:22:21.700 | but like five to seven years from now,
01:22:24.040 | I think like a lot more robotics companies might pop out.
01:22:27.960 | There's not a good consensus
01:22:31.680 | on whether hardware is a bottleneck
01:22:33.680 | or AI is a bottleneck in robotics right now.
01:22:36.300 | My view is actually hardware is still the bottleneck
01:22:40.440 | and AI is also a little bit of bottleneck,
01:22:43.240 | but like I don't think there's any like obvious
01:22:46.520 | breakthroughs we need.
01:22:50.240 | I think it's just work.
01:22:51.720 | So I'm generally excited about robotics.
01:22:53.600 | I spend a lot of time, a lot of personal time,
01:22:55.980 | I spend like every Wednesday afternoon at NYU
01:22:58.960 | working with Lerrel Pinto and team
01:23:01.440 | and just getting towards my like home robot
01:23:05.380 | that just does my dishes and stuff.
01:23:07.920 | - What's the status of it?
01:23:08.760 | Like what does it do for you now?
01:23:10.300 | - As of today, we just deployed a couple months ago,
01:23:15.300 | we deployed our home robotic stuff
01:23:19.520 | into like several tens of New York City homes
01:23:24.160 | and like try to make it do a bunch of tasks.
01:23:26.960 | And we're basically starting to build out a framework
01:23:31.220 | that gets to a certain level of robustness
01:23:34.120 | on fairly simple tasks, like picking this cup
01:23:39.260 | and putting it somewhere else
01:23:40.440 | or like taking a few pieces of cloth on the ground
01:23:44.560 | and put it somewhere else or open your microwave.
01:23:48.820 | Like various like baseline tasks like that
01:23:51.980 | with low sample complexity.
01:23:55.480 | So like the key thing, I think one of the things
01:23:59.080 | people don't spend their time in robotics
01:24:00.640 | is like the user experience,
01:24:02.240 | which I think in the research I do at NYU,
01:24:07.000 | we spend a huge amount of time on.
01:24:09.140 | I think the key there is sample complexity
01:24:11.180 | has to be really low.
01:24:12.740 | A lot of the current robotics research if you see,
01:24:15.660 | they're like, oh yeah, we collected like 50 demos
01:24:18.000 | and now it's able to do this task
01:24:19.800 | or we collected like 300 demos or like...
01:24:22.940 | It's a sample, the number of samples you need
01:24:24.860 | for this thing to do the task is really high.
01:24:27.220 | So we're focusing a lot on...
01:24:29.780 | You show it like two or three times
01:24:32.700 | and that's sufficient for it to actually like do the task.
01:24:35.860 | But it comes with like less generalization, right?
01:24:39.740 | Like there's some initial conditions
01:24:41.560 | that have to be true for it to do the task.
01:24:43.860 | So we're making progress.
01:24:47.360 | That's very interesting in general, the space.
01:24:49.980 | I don't think people in this space
01:24:52.880 | have settled on the hardware,
01:24:55.340 | like what the hardware looks like
01:24:57.580 | for it to be truly useful in the home or whatever.
01:25:00.120 | Or the UX or the like AI/ML stuff needed
01:25:06.580 | to make it sample efficient and all of that.
01:25:10.700 | But I think like lots of work is happening in the field.
01:25:15.500 | - Yeah, one of my friends, Carlo at Berkeley,
01:25:18.140 | he worked on a project called M3L,
01:25:20.020 | which is two CNNs, one for tactile feedback
01:25:23.400 | and one for image.
01:25:25.280 | When you say hardware,
01:25:26.120 | is it running all these things on the edge
01:25:28.960 | or is it just like the actual servos and the...
01:25:33.020 | - Yeah, by hardware I mean like the actual like servos,
01:25:37.380 | like the motors, servos, even like the sensors,
01:25:42.380 | I think we have incredible vision
01:25:47.380 | that still like is so much better
01:25:53.460 | in the field of view and in resolution
01:25:55.320 | compared to any of the cameras we can buy.
01:25:59.120 | We have, our skin is like touch sensing available everywhere,
01:26:04.120 | and we have like some of the most efficient,
01:26:09.320 | some of the most high capacity motors
01:26:14.180 | that can lift large loads in like the dexterity
01:26:18.300 | of a hand and stuff.
01:26:19.860 | So in terms of hardware, I mean like in terms
01:26:24.500 | of those capabilities, like we haven't figured out
01:26:28.620 | how to do a lot of this stuff.
01:26:31.860 | I mean Tesla has been making incredible progress.
01:26:34.660 | 1X I think announced their new thing that looks incredible.
01:26:39.660 | Some of the other companies, Figure
01:26:42.540 | and like others, are doing great work.
01:26:44.860 | But we're really not anywhere close to like the hardware
01:26:48.000 | that we feel like we need.
01:26:50.120 | And there's obviously the other thing I want to call out is
01:26:53.600 | a lot of what people show works,
01:27:00.720 | but like has to be fixed all the time.
01:27:02.400 | I mean like that's the other thing we are incredible at.
01:27:05.580 | Like we don't need any maintenance
01:27:07.980 | or like the maintenance is part of us.
01:27:10.520 | If you buy a product, an electronics product of any kind,
01:27:16.020 | you buy a PS5, you don't say, oh yeah,
01:27:18.580 | my PS5 breaks like every six days
01:27:20.900 | and I have to like do some reasonable amount of work on it.
01:27:23.940 | But like that's robotics,
01:27:26.700 | like if it's not industrial robotics
01:27:28.540 | where it's very controlled and specialized or whatever,
01:27:31.580 | like you're talking about reliability like in those ranges.
01:27:35.020 | So I think people don't talk
01:27:37.420 | about the reliability thing enough.
01:27:38.780 | Like when I mean like we're gonna enter
01:27:41.660 | the commercialization phase,
01:27:42.880 | I mean like we're gonna start thinking about,
01:27:45.320 | okay, now we have this thing and we need to figure out
01:27:47.460 | how to get reliability high enough to deploy it into homes
01:27:50.460 | and like just sell it to people
01:27:52.380 | and like Best Buy or something.
01:27:54.340 | So that's the other factor
01:27:56.020 | that we have to make a lot of progress on.
01:27:59.180 | - I just realized that Google has a play in this
01:28:02.740 | with like PaLM-E and stuff
01:28:04.060 | and OpenAI obviously has a long history
01:28:06.220 | of doing this stuff.
01:28:07.220 | Is there anything at Meta?
01:28:09.760 | No robotics stuff at Meta?
01:28:12.820 | - I used to, we have a small robotics program
01:28:15.620 | at Meta out of FAIR.
01:28:17.280 | I actually used to do it at FAIR a little bit
01:28:19.580 | before I moved into Infra and focused on my Meta time
01:28:23.340 | on a lot of like other infrastructural stuff.
01:28:26.940 | So yeah, Meta's robotics program is a lot smaller.
01:28:30.700 | - Seems like it would be a fit in personal computing.
01:28:36.140 | - You can think of it as like Meta has a ridiculously
01:28:40.380 | large device strategy, right?
01:28:42.380 | Like this is how our reality labs stuff.
01:28:45.840 | Like we're going at it from VR and AR
01:28:48.500 | and we showcase all that stuff.
01:28:50.920 | I think for Meta, like the robot is not as important
01:28:56.280 | as like the physical devices kind of stuff, for sure.
01:29:01.280 | - Okay, I want to touch on Osmo a bit
01:29:04.460 | because very unusual company too.
01:29:06.660 | The stuff that we normally discuss,
01:29:08.240 | not robotics, sense of smell.
01:29:10.360 | The original pitch I heard from the founder,
01:29:14.300 | maybe you can correct me,
01:29:15.140 | is that he realized that you can smell cancer.
01:29:17.580 | Is that intuitive?
01:29:20.060 | Is that what you get?
01:29:21.020 | - Yeah, I mean first like the very interesting reason
01:29:25.540 | I invested in Osmo is because Alex Wiltschko,
01:29:28.740 | the founder of Osmo, also was like a,
01:29:33.740 | before PyTorch there was Torch.
01:29:38.420 | And Alex Wiltschko actually worked on Torch.
01:29:41.020 | He's actually like a frameworks guy.
01:29:43.660 | He built this thing called Tangent from Google,
01:29:48.120 | like another like alternative framework and stuff.
01:29:52.560 | So I know him from that side of things.
01:29:55.100 | And then he's a neurobiologist by training.
01:29:59.540 | He just happens to also love like neural networks
01:30:03.060 | and like hacking on those frameworks.
01:30:05.140 | So incredibly smart guy, one of the smartest people I know.
01:30:08.560 | So when he was going in this direction,
01:30:11.620 | I thought it was incredible that like smell
01:30:16.620 | is something that we haven't even started to scrape
01:30:20.860 | in terms of digitization.
01:30:22.660 | When we think about audio or images or video,
01:30:26.580 | they're like so advanced that we have the concept
01:30:30.260 | of color spaces.
01:30:31.240 | We have the concept of like frequency spectrums.
01:30:34.300 | Like, you know, we figured out how ears process
01:30:37.080 | like frequencies in mouth spectrum or whatever,
01:30:39.880 | like logarithmically scaled images for like RGB, YUV.
01:30:44.100 | Like we have so many different kinds of parameterizations.
01:30:47.020 | We have formalized these two senses ridiculously.
01:30:53.020 | (laughing)
01:30:55.080 | Touch and smell, nada.
01:30:58.740 | We're like where we were with images and say in 1920
01:31:03.740 | or maybe even the 1800s, right?
01:31:06.500 | That's where we're at.
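As a concrete example of the kind of formalized parameterization vision already has and smell does not, here is the standard analog RGB-to-YUV conversion (BT.601 coefficients); the function is just an illustrative sketch.

```python
# A tiny example of a formalized parameterization of vision:
# converting an RGB triple to YUV using BT.601 analog coefficients.
def rgb_to_yuv(r: float, g: float, b: float):
    """RGB in [0, 1] -> (luma Y, chroma U, chroma V) under BT.601."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    u = 0.492 * (b - y)
    v = 0.877 * (r - y)
    return y, u, v


print(rgb_to_yuv(1.0, 0.0, 0.0))  # pure red -> Y≈0.299, U≈-0.147, V≈0.615
```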
01:31:07.500 | And Alex has this incredible vision
01:31:10.060 | of like having a smell sensor just eventually
01:31:15.060 | just be part of your daily life.
01:31:18.380 | Like as of today, you don't really think about
01:31:22.100 | like when you're watching an Instagram reel or something,
01:31:24.380 | huh, like I also would love to know what it smelled like
01:31:28.700 | and if you're watching a reel of a food or something.
01:31:32.020 | You don't because we really haven't as a society
01:31:36.180 | got that muscle to even understand
01:31:38.920 | what a smell sensor can do.
01:31:41.500 | I think the more near term effects are obviously
01:31:44.360 | going to be around things that provide more obvious utility
01:31:49.420 | in the short term, like maybe smelling cancer
01:31:52.580 | or like repelling mosquitoes better or stuff like that.
01:31:57.580 | - More recently he's been talking about
01:31:58.900 | like categorizing perfumes.
01:32:00.260 | - Yeah, exactly.
01:32:01.220 | - That's a market that you can pursue.
01:32:02.420 | - Yeah, like I mean think about how you can customize
01:32:06.200 | a perfume to your own liking in the same way
01:32:09.180 | you can customize a shoe or something, right?
01:32:11.940 | So that's I think all the near term stuff.
01:32:15.780 | I think if he's able to figure out a near term value for it,
01:32:20.780 | they as a company can sustain themselves
01:32:24.480 | to then eventually like try to make progress
01:32:26.940 | on the long term which is really in uncharted territory.
01:32:31.600 | Like think about it, 50 years from now,
01:32:35.660 | it would be pretty obvious to like kids of the generation
01:32:38.900 | to just like, I guess I was saying,
01:32:41.780 | I was gonna say scroll a reel on their phone
01:32:43.660 | and maybe phones won't even be there.
01:32:46.020 | They're just like on their glasses,
01:32:48.540 | they're watching something and then they immediately get
01:32:53.380 | like a smell sense off that remote experience as well.
01:32:57.140 | Like we haven't really progressed enough in that dimension
01:33:02.140 | and I think they have a chance to do it.
01:33:05.460 | - Awesome.
01:33:07.300 | Awesome, I mean we touched on a lot of things.
01:33:09.060 | Anything, we're missing anything
01:33:10.980 | you wanna direct people to or?
01:33:13.360 | - Yeah, call to action, call for research,
01:33:16.660 | call for startups.
01:33:18.060 | - I don't really have a lot of calls to action
01:33:20.060 | because usually I think people
01:33:22.980 | should be intrinsically like.
01:33:24.460 | (laughing)
01:33:25.420 | - That's a good--
01:33:26.260 | - Look inside yourself.
01:33:27.100 | (laughing)
01:33:28.920 | - That's good, awesome.
01:33:30.480 | Thank you so much for coming on.
01:33:31.560 | - Yeah, for sure.
01:33:32.600 | - Thanks a bit.
01:33:33.440 | (upbeat music)