
Large Scale AI on Apple Silicon — Alex Cheema, EXO Labs



00:00:02.000 | - Yeah, thank you all for coming.
00:00:17.140 | I'm sure you're wondering what this has to do with AI,
00:00:19.980 | but we'll get there.
00:00:20.980 | So let me set the stage.
00:00:24.880 | At the turn of the 20th century,
00:00:26.440 | physics had a big problem.
00:00:29.080 | So the problem was that the current theory
00:00:32.940 | said that there would be an infinite amount of energy
00:00:35.020 | in the universe, which was very strange,
00:00:36.920 | because clearly that would violate
00:00:40.180 | a lot of physical principles.
00:00:42.640 | So what did physicists do about this?
00:00:47.940 | Well, this guy called Max Planck came along,
00:00:50.080 | and he was like, well, if we just assume
00:00:53.740 | that all the energy is quantized in some way,
00:00:56.520 | then we can solve this problem.
00:00:58.160 | Mathematically, it works out fine.
00:00:59.880 | But he introduced this new thing called h,
00:01:02.020 | which is like this constant that comes out of nowhere.
00:01:04.100 | And, you know, they didn't really know at the time
00:01:07.880 | like how to measure that.
00:01:09.960 | So another chap in 1909 called Millikan came up with an experiment
00:01:17.880 | to sort of measure this value of h.
00:01:20.820 | So he did it indirectly by measuring the charge of an electron.
00:01:24.140 | And the way he did it is he put this sort of charged oil spray
00:01:29.900 | in between some charged plates and then looked at how fast the charged droplets move.
00:01:35.760 | The details aren't so important, but he got a result, and, you know, it was like big news in the scientific community.
00:01:44.220 | And everyone was like, oh, man, now we know what this charge is, we know what this h is.
00:01:50.280 | And all was good, and then, you know, sort of, he had this result and everyone was using it for their calculations and stuff.
00:01:58.960 | And, you know, many years went by and all the experiments seemed to agree with his reading.
00:02:04.660 | And, you know, it took until just after that last data point, like 1929, for the sort of people to realize that,
00:02:26.140 | okay, well, you know, this isn't the actual value.
00:02:29.380 | And I think what's interesting is when you look at the history,
00:02:33.220 | you've got this, like, period of 15 years or so where everyone thought that this was the sort of the right value.
00:02:42.440 | And, you know, you look back and you think, okay, why did they think this?
00:02:48.780 | And it turns out that, you know, people are doing the right experiments.
00:02:54.200 | They were doing a lot of experiments.
00:02:56.200 | And there's a lot of data points that would be in between here as well that's just, you know, been lost in the history books.
00:03:03.200 | But, you know, and part of that is because of embarrassment.
00:03:07.220 | This is, like, a very embarrassing thing for the scientific community.
00:03:09.880 | So, what happened?
00:03:12.420 | Well, basically, these scientists, they were running their experiments.
00:03:16.640 | They got a result and then they saw, oh, wait, this great Millikan guy that had this result in 1909, it doesn't agree with him.
00:03:25.420 | So I must be wrong.
00:03:27.640 | And then they sort of, like, fudged, you know, the experiments to, you know, make it work such that they get the same value as him.
00:03:36.640 | And that went on for a long time.
00:03:38.860 | And this is, like, this is not a trivial thing.
00:03:40.660 | This is quite an important thing to science in general.
00:03:42.700 | So this kind of gets me up to the point of, like, you know, sort of scientific rigor.
00:03:49.180 | And, you know, it's actually very tricky to do science properly.
00:03:54.640 | And, you know, the fact that, you know, these subsequent experiments, basically, they all agreed on the wrong thing.
00:04:04.700 | And it says something about, you know, sort of the way in which scientific progress happens.
00:04:11.120 | So, you know, there's a lot of inertia behind the current way of doing things.
00:04:15.160 | And, you know, there's another example, which is sort of about questioning assumptions.
00:04:20.320 | And, you know, there's another guy who is looking at something completely different.
00:04:25.120 | He was looking at experiments with rats.
00:04:28.800 | And he wanted to basically, he had a hypothesis.
00:04:32.380 | He wanted to test, like, some weird esoteric thing.
00:04:35.040 | He wanted to test that, like, you know, that rats, he could basically get rats to navigate a maze in a specific way.
00:04:43.760 | He wanted them to go through this corridor of doors, right?
00:04:47.720 | And they would go through a random door.
00:04:49.380 | And they wanted them, he wanted the rats to come out of a door that was three doors along from the one that they went in.
00:04:56.080 | It doesn't matter which one they go in.
00:04:57.420 | He always wanted it to be three.
00:04:58.620 | So he wanted to show that, like, they could actually think and, you know, be able to consistently go, like, three doors along.
00:05:05.320 | And basically, he tried, like, a bunch of stuff.
00:05:10.640 | And what kept happening was whatever door, like, he basically put, like, a piece of food on the door that was three doors along to make them go through that one.
00:05:21.820 | But what ended up happening is they just always went to the previous door.
00:05:25.080 | So if it was, like, you know, door one that they went through and he wanted it to come out of door four, and then he tested it again where they go through door two and they should go out of door five.
00:05:36.100 | They would just still go out of door four.
00:05:38.340 | And he tried a bunch of stuff.
00:05:41.660 | So, like, he was, like, okay, why is this happening?
00:05:43.660 | Like, how did the rats know, like, to go back to that same door?
00:05:47.780 | So he was, like, he very meticulously went through and he made sure that there was no pattern or anything that they could distinguish on the doors.
00:05:55.760 | He painted them all the same way.
00:05:57.340 | He made sure the patterns were all the same.
00:05:59.200 | But still it didn't work.
00:06:00.920 | He thought it might be the smell.
00:06:03.460 | So maybe there was, like, a smell that came from the food.
00:06:06.900 | He tried basically putting chemicals in so that they couldn't smell the food.
00:06:10.560 | And that didn't work either.
00:06:11.980 | And then he thought, okay, maybe it's something to do with the lighting.
00:06:15.880 | You know, like, a human could do this, like, through common sense, just sort of see, like, okay, the lighting is in such and such a way and see the pattern.
00:06:22.820 | And so he covered up the corridors and stuff and made sure that that couldn't be a thing.
00:06:26.560 | And still, you know, the same thing happened.
00:06:29.640 | And eventually what he found out was that the reason they could consistently go to that same door was because they remembered the sounds.
00:06:38.480 | So as they walked along, they remembered the pattern of the sounds in this corridor.
00:06:44.620 | So what he did was he put some sand in there so that they couldn't distinguish the sounds, basically.
00:06:52.120 | Now, from a scientific perspective, this is, like, S-tier science in terms of, you know, really clearly looking at, like, what are all the assumptions I'm making and just, like, systematically eliminating them.
00:07:05.360 | And, you know, this is great, great science.
00:07:08.900 | But, you know, the problem was that the scientific community didn't agree.
00:07:12.540 | So the people that were conducting experiments at the time, they made a lot of these assumptions.
00:07:17.840 | And they were kind of stuck in their ways.
00:07:20.160 | So they, you know, they discarded this.
00:07:22.540 | So, you know, it wasn't cited, you know, this was basically just forgotten.
00:07:27.780 | And so I think, you know, there's sort of this tendency to stick to the current way that things are done.
00:07:38.000 | And even the methodology, if it's spot on, if there's a certain way of doing things, then that kind of, you know, has a lot of inertia behind it.
00:07:47.080 | And, you know, Feynman talked about this in one of his talks, called Cargo Cult Science, in, like, the 70s.
00:07:54.360 | And he had this, like, you know, quote, the first principle is that you must not fool yourself.
00:08:02.040 | And you are the easiest person to fool.
00:08:03.620 | And I think, you know, there's this tendency to sort of oversimplify the science.
00:08:13.280 | You know, it is hard to get it right.
00:08:14.920 | And if you're interested, Gwern wrote, like, a blog post all about this, and there's some controversy about, like, who this guy was and stuff.
00:08:22.060 | But getting to AI, so, like, you know, a very similar thing happened in AI, you know, sort of about questioning assumptions, right?
00:08:33.480 | And, you know, just having a good idea out there is not enough.
00:08:36.720 | So, you know, in 1963, backpropagation was introduced in this paper.
00:08:43.540 | And then it was, you know, reinvented in 1976 in this paper and reinvented again in 1988.
00:08:50.320 | And then, you know, sort of deep convolutional neural networks were introduced here.
00:08:56.760 | And then it was only in 1989 that these two things were combined.
00:08:59.480 | So, you had, like, deep CNNs and backpropagation.
00:09:04.420 | And then it was only, like, three decades later that, you know, CNNs were widely accepted.
00:09:14.240 | There was still, like, this massive skepticism, even though the ideas were out there.
00:09:17.600 | And why was that?
00:09:20.060 | I think, like, a big part of it is sort of, again, being stuck in the way of doing things
00:09:25.420 | and looking at the existing hardware.
00:09:27.140 | So, if you look at, obviously, CPUs, they have this von Neumann bottleneck.
00:09:31.680 | You know, they have really good single-core performance.
00:09:35.180 | But, you know, if you're sort of having to read memory often,
00:09:39.540 | then it's usually bottlenecked by that.
00:09:41.980 | And, you know, you can sort of look at, like, at a systems level,
00:09:46.400 | why did GPUs fix that?
00:09:47.940 | And it sort of changed this, like, ratio of how many, you know,
00:09:52.840 | how many bytes you have to load to how many flops you can execute.
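To make that bytes-to-flops ratio concrete, here is a rough roofline-style sketch; the device numbers below are illustrative assumptions, not figures from the talk.

```python
# Rough roofline-style check: is a workload compute-bound or memory-bound?
# All device numbers are illustrative assumptions, not figures from the talk.

def machine_balance(peak_flops: float, mem_bandwidth: float) -> float:
    """FLOPs the device can execute per byte it can load from memory."""
    return peak_flops / mem_bandwidth

def is_memory_bound(workload_flops_per_byte: float, balance: float) -> bool:
    """If the workload does fewer FLOPs per loaded byte than the device's
    balance point, the memory system, not the ALUs, sets the speed."""
    return workload_flops_per_byte < balance

# Hypothetical CPU: strong single-core compute, modest memory bandwidth.
cpu_balance = machine_balance(peak_flops=1e12, mem_bandwidth=100e9)    # 10 FLOPs/byte
# Hypothetical GPU: far more compute and far more bandwidth, so the ratio shifts.
gpu_balance = machine_balance(peak_flops=100e12, mem_bandwidth=2e12)   # 50 FLOPs/byte

# A big dense matmul reuses each loaded byte many times; a streaming op does not.
print(is_memory_bound(workload_flops_per_byte=2.0, balance=cpu_balance))    # True: bandwidth-limited
print(is_memory_bound(workload_flops_per_byte=200.0, balance=gpu_balance))  # False: compute-limited
```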
00:09:55.680 | And, you know, it's kind of striking, you know, like,
00:10:00.200 | when you look at the history of this,
00:10:02.840 | this was, like, a groundbreaking paper and a very famous paper,
00:10:05.940 | like, where they trained a network on 1,000 machines,
00:10:09.980 | 16,000 CPU cores, and it took three days.
00:10:12.880 | And then, like, less than a year later,
00:10:15.700 | there was this other paper that got the exact same results,
00:10:18.500 | but it took, like, you know, three machines in a couple of days.
00:10:22.580 | And this was using, you know, hardware acceleration,
00:10:26.080 | like, using GPUs.
00:10:26.980 | So, you know, this gets me to the hardware lottery,
00:10:33.260 | which is essentially this idea introduced by Sara Hooker in 2020,
00:10:37.000 | which says that, you know,
00:10:39.800 | the best research ideas don't necessarily win.
00:10:42.460 | There's a lot of factors that sort of make it
00:10:44.680 | so that a great idea can be out there,
00:10:46.760 | great science can be being done,
00:10:48.240 | but, you know, it doesn't necessarily get adopted and accepted.
00:10:52.120 | There's a recent example of this,
00:10:56.660 | which I think is quite interesting,
00:10:57.860 | that, you know, preparing for this,
00:11:00.760 | I had the realization that, you know,
00:11:04.480 | LLMs are kind of a forcing function,
00:11:06.020 | like, they're kind of creating inertia as well,
00:11:08.980 | because the things that they're good at
00:11:11.060 | are the things that people will work on.
00:11:12.400 | So if they're good at generating Python code,
00:11:14.460 | then more people will write Python code.
00:11:15.740 | And it's this feedback loop of, you know,
00:11:19.080 | more people use it,
00:11:20.460 | and then the LLMs get better at that thing.
00:11:21.920 | And, you know, now,
00:11:22.800 | if you wanted to come out with a new programming language,
00:11:24.460 | maybe it is a really good idea.
00:11:25.780 | But
00:11:27.140 | it's much harder to get adoption
00:11:29.420 | if the tooling is, like, way worse
00:11:31.000 | because the LLMs don't have good support for it.
00:11:32.920 | And there's this paper with, you know,
00:11:34.520 | this result where, like, basically,
00:11:37.160 | they made a table of, like,
00:11:39.600 | all the tasks that, like,
00:11:41.960 | different languages are best at,
00:11:42.940 | and basically everything was Python.
00:11:44.140 | And, yeah, like,
00:11:47.320 | what is it?
00:11:49.120 | 90 to 97% of all problems,
00:11:51.260 | yeah, Python was the best thing.
00:11:53.900 | So what if we did question our assumptions?
00:11:59.560 | So that brings me to sort of
00:12:02.280 | what I'm working on with EXO.
00:12:03.460 | So what we're doing is
00:12:06.460 | we're building an orchestration layer
00:12:10.000 | for, you know,
00:12:13.980 | AI that runs on different hardware targets.
00:12:16.780 | And it's sort of,
00:12:18.260 | we're sitting at this layer
00:12:19.240 | that I haven't seen much activity in right now,
00:12:22.060 | and it's kind of a pain point
00:12:23.660 | in terms of just having this, like,
00:12:26.280 | reliable thing that can orchestrate
00:12:28.180 | a lot of different kinds of devices
00:12:30.980 | with different connections
00:12:32.400 | in, like, this ad hoc kind of mesh network.
00:12:34.860 | And just to give you, like,
00:12:37.000 | some idea of, like,
00:12:38.320 | kind of the kind of things
00:12:39.280 | that we're doing
00:12:41.080 | and the kind of, you know,
00:12:43.820 | the solution space that we sit in,
00:12:45.220 | like, you know,
00:12:46.200 | everything in EXO is modeled
00:12:48.180 | as a causally consistent set of events.
00:12:52.780 | So there's essentially, like,
00:12:55.700 | an ordering on everything
00:12:57.280 | that happens across
00:12:58.040 | the whole distributed system.
00:12:59.160 | If there's all these things going on,
00:13:00.540 | it's really hard to reason about, like,
00:13:01.840 | sort of, if you want to move
00:13:03.180 | a KV cache around,
00:13:04.160 | like, how do you know
00:13:05.000 | that it happened successfully?
00:13:05.920 | How do you know then
00:13:06.600 | if something depends on that?
00:13:08.040 | So you build this sort of causal graph,
00:13:11.000 | and, you know,
00:13:11.600 | you can then reason about the system
00:13:13.780 | and get some guarantees about,
00:13:15.760 | you know, sort of where the data is
00:13:19.900 | and, you know, what's going on.
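As a purely illustrative sketch of that idea (this is not EXO's actual event model or API), you can picture causally consistent events as records that carry explicit dependencies, so nothing is acted on until everything it depends on has completed.

```python
# Toy sketch of causally ordered events; not EXO's actual data structures.
from dataclasses import dataclass, field

@dataclass
class Event:
    name: str
    deps: list["Event"] = field(default_factory=list)  # events this one causally depends on
    done: bool = False

def apply(event: Event) -> None:
    # An event may only be applied after every event it depends on has been
    # applied, e.g. don't decode against a KV cache whose move hasn't finished.
    assert all(d.done for d in event.deps), f"{event.name} applied out of causal order"
    event.done = True

# "Move the KV cache to device B" must happen before "decode on device B".
move_kv = Event("move_kv_cache_to_device_B")
decode = Event("decode_on_device_B", deps=[move_kv])

apply(move_kv)
apply(decode)  # succeeds because its dependency is done
```

The point is just that an explicit ordering lets you answer "did the move actually happen, and what is allowed to run now?" across the whole distributed system.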
00:13:21.580 | So just to give you a quick example
00:13:23.100 | to put this into, like,
00:13:24.060 | you know, more practical terms,
00:13:26.780 | like, what does this enable?
00:13:27.740 | Spark is coming out soon.
00:13:30.200 | I'm still waiting,
00:13:31.620 | but it's been delayed a few times,
00:13:33.420 | I think, but hopefully soon.
00:13:34.860 | And this is NVIDIA's new, like,
00:13:37.840 | consumer thing,
00:13:38.440 | if you haven't seen that.
00:13:39.460 | And it's, like,
00:13:40.060 | it has a lot of flops.
00:13:40.840 | It's pretty good for the cost.
00:13:42.560 | But the memory bandwidth
00:13:44.860 | is kind of lacking.
00:13:45.540 | It doesn't have that much memory.
00:13:46.580 | Like, the Mac Studio has a lot more memory bandwidth,
00:13:48.760 | but a lot less flops.
00:13:49.700 | So if you look at sort of just
00:13:51.440 | if you wanted to generate
00:13:53.760 | with an LLM,
00:13:55.400 | like, there's two phases, right?
00:13:56.780 | There's the pre-fill phase,
00:13:57.980 | which is compute bound,
00:13:58.840 | and the generation phase,
00:13:59.900 | which is memory bandwidth bound.
00:14:01.320 | As far as I know,
00:14:04.120 | there isn't really anything
00:14:04.900 | that can nicely do this reliably,
00:14:06.460 | where you would have these, like,
00:14:07.940 | different sets of devices,
00:14:09.160 | and then figure out
00:14:10.360 | the best way
00:14:10.920 | to, like, run this whole workload
00:14:12.660 | across, you know,
00:14:14.840 | all the devices
00:14:15.560 | that you have available.
00:14:16.280 | But this is possible now
00:14:18.200 | with EXO.
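A hedged sketch of what that placement decision looks like: estimate each phase's time on each device from rough specs, then put the compute-bound prefill on the FLOPs-heavy box and the bandwidth-bound decode on the bandwidth-heavy box. The specs and workload sizes below are made-up placeholders, not real DGX Spark or Mac Studio numbers, and a real scheduler would also have to account for transfer costs.

```python
# Toy phase-placement estimate; all specs and workload sizes are assumptions.

def prefill_seconds(prompt_flops: float, peak_flops: float) -> float:
    # Prefill is roughly compute-bound: time ~ total FLOPs / peak FLOP rate.
    return prompt_flops / peak_flops

def decode_seconds(tokens: int, bytes_read_per_token: float, mem_bw: float) -> float:
    # Decode is roughly bandwidth-bound: each new token re-reads the weights.
    return tokens * bytes_read_per_token / mem_bw

devices = {
    # name: (peak FLOP/s, memory bandwidth in bytes/s) -- illustrative only
    "flops_heavy_box": (1000e12, 270e9),
    "bandwidth_heavy_box": (30e12, 800e9),
}

prompt_flops = 5e15                   # pretend cost of processing the prompt
tokens, bytes_per_token = 500, 70e9   # pretend decode cost (weights read per token)

best_prefill = min(devices, key=lambda d: prefill_seconds(prompt_flops, devices[d][0]))
best_decode = min(devices, key=lambda d: decode_seconds(tokens, bytes_per_token, devices[d][1]))
print(best_prefill, "->", best_decode)  # prefill on the FLOPs-heavy box, decode on the bandwidth-heavy one
```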
00:14:18.880 | And just another example,
00:14:21.560 | this is some research
00:14:22.680 | that we're working on.
00:14:23.840 | It's more on the training side.
00:14:26.380 | So this will be out pretty soon.
00:14:32.220 | It will be made public.
00:14:33.500 | Essentially,
00:14:35.020 | we're kind of questioning
00:14:37.180 | all these assumptions
00:14:37.800 | about, like, you know,
00:14:38.620 | what hardware is best to run on.
00:14:39.800 | If you look at Apple Silicon,
00:14:40.860 | it has a lot more memory,
00:14:41.940 | obviously,
00:14:42.420 | but it's a lot more expensive
00:14:43.780 | per flop.
00:14:44.220 | But what if you could use
00:14:47.360 | that memory for something useful
00:14:49.040 | to make training more efficient?
00:14:51.340 | And there's been this whole area
00:14:53.060 | of research of, like,
00:14:53.860 | sort of second-order methods
00:14:54.900 | and, you know,
00:14:55.920 | different ways of making training
00:14:58.240 | more efficient,
00:14:58.700 | but a lot of them have been discarded
00:15:00.460 | because of the memory requirements.
00:15:02.160 | And so what we've done
00:15:05.040 | is we've come out with,
00:15:05.820 | we're going to come out with this,
00:15:06.940 | like, new optimizer,
00:15:08.120 | which is essentially, like,
00:15:09.840 | two times more efficient
00:15:11.500 | per flop than Adam,
00:15:16.180 | but it uses a lot more memory.
00:15:17.800 | So if you look at this sort of ratio
00:15:19.180 | of, like, memory to flops
00:15:20.220 | of Apple Silicon,
00:15:20.960 | it's probably, like, 20x,
00:15:22.140 | it's around 20x
00:15:24.320 | that of NVIDIA,
00:15:25.260 | depending on which one you get.
00:15:26.280 | So you've got all that spare memory
00:15:27.840 | that you can, you know,
00:15:29.020 | think about, okay,
00:15:29.660 | what does the solution space look like
00:15:31.780 | if we make use of that memory?
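To make the memory trade-off concrete: Adam keeps two extra values per parameter (first and second moment estimates), while many second-order or preconditioned methods keep substantially more state per parameter. The sketch below uses a made-up 16 values per parameter as a stand-in; it is not the unreleased EXO optimizer, just an illustration of why spare memory changes which training methods are viable.

```python
# Back-of-the-envelope optimizer-state memory; illustrative assumptions only.

def adam_state_bytes(num_params: int, bytes_per_value: int = 4) -> int:
    # Adam stores first- and second-moment estimates: 2 extra values per parameter.
    return 2 * num_params * bytes_per_value

def memory_heavy_state_bytes(num_params: int, values_per_param: int = 16,
                             bytes_per_value: int = 4) -> int:
    # Placeholder for a second-order-style method that keeps much more state
    # (curvature / preconditioner statistics); 16 values/param is a made-up number.
    return values_per_param * num_params * bytes_per_value

n = 7_000_000_000  # a 7B-parameter model
print(adam_state_bytes(n) / 1e9, "GB of Adam state")             # 56.0 GB
print(memory_heavy_state_bytes(n) / 1e9, "GB of heavier state")  # 448.0 GB
```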
00:15:33.160 | And, yeah,
00:15:35.840 | we're buying a lot of Macs.
00:15:37.540 | We're, like, trying a lot of stuff.
00:15:38.680 | This is the first batch,
00:15:41.080 | but there's going to be another batch.
00:15:42.580 | And, yeah,
00:15:45.480 | we're just running a lot
00:15:46.420 | of experiments at scale.
00:15:47.780 | Like, there's a lot of stuff
00:15:48.480 | in that paper
00:15:48.940 | that's really interesting
00:15:49.720 | in terms of, like,
00:15:50.620 | not many people have tried
00:15:51.720 | doing large training runs
00:15:54.380 | on Apple Silicon.
00:15:54.860 | I mean, nobody.
00:15:55.440 | Nobody has.
00:15:56.080 | Even, I mean,
00:15:57.780 | talking to people at Apple,
00:15:58.760 | like, they're also surprised
00:16:00.100 | at these results.
00:16:01.020 | Like, they're not really aware
00:16:02.900 | of what the hardware is capable of.
00:16:06.500 | Just a final thing as well
00:16:09.060 | around sort of, you know,
00:16:10.880 | going back to the process
00:16:15.020 | of doing science, right?
00:16:16.400 | Like, if anything,
00:16:17.860 | like, publishing kind of,
00:16:19.320 | like, publishing results
00:16:22.360 | that are maybe not the best,
00:16:23.400 | I think, is something
00:16:24.680 | that we need to, like,
00:16:25.580 | normalize more.
00:16:26.460 | And, you know,
00:16:27.120 | we're sort of,
00:16:27.980 | we're doing a lot
00:16:30.560 | to make all the data accessible,
00:16:34.420 | whether it's good or bad.
00:16:35.600 | Like, you know,
00:16:36.220 | right now you can go
00:16:37.040 | to benchmarks.exolabs.net
00:16:38.540 | and there's a lot
00:16:39.620 | of configurations there
00:16:40.480 | that are just really bad.
00:16:41.420 | But, you know,
00:16:43.640 | I think having that data
00:16:46.600 | out there
00:16:47.100 | and publishing stuff
00:16:47.980 | that isn't necessarily,
00:16:49.460 | like, the best thing
00:16:51.140 | is super important.
00:16:53.420 | So we've got this,
00:16:55.360 | these benchmarks.
00:16:57.700 | Part of that is also running,
00:16:59.320 | like, these Macs continuously
00:17:00.960 | in CI
00:17:01.380 | and different devices.
00:17:02.460 | We'll probably end up
00:17:03.200 | with, like,
00:17:03.560 | at least one
00:17:04.940 | of pretty much,
00:17:05.560 | like, any device
00:17:06.700 | that you can reasonably run
00:17:08.460 | an AI workload on
00:17:10.900 | and just have those
00:17:12.180 | continuously pushing out
00:17:13.160 | to the benchmarks.
00:17:13.780 | Yeah, so we're coming out
00:17:18.420 | with a big release
00:17:19.100 | at the end of this week.
00:17:22.840 | So that will sort of,
00:17:26.320 | you know,
00:17:26.640 | be this orchestration layer
00:17:32.760 | that I talked about.
00:17:34.280 | And with that will come
00:17:35.580 | a lot of this sort of tooling
00:17:36.720 | around it.
00:17:37.840 | So the benchmarks website,
00:17:39.040 | a lot of stuff around,
00:17:41.540 | you know,
00:17:42.060 | being able to test
00:17:43.980 | different algorithms
00:17:44.800 | that run on different devices.
00:17:46.480 | So we kind of have
00:17:50.540 | EXO Gym, for example,
00:17:50.540 | which is a way
00:17:51.500 | to run experiments
00:17:52.460 | on, you know,
00:17:54.620 | if you don't have
00:17:55.620 | 16 Macs like this,
00:17:57.740 | then, you know,
00:17:58.860 | being able to actually
00:17:59.940 | just run them locally
00:18:01.040 | quickly
00:18:02.280 | and test different
00:18:03.120 | distributed algorithms,
00:18:04.060 | that would be part
00:18:05.840 | of those releases.
00:18:06.880 | Yep, that's it.
00:18:12.560 | We've actually got
00:18:23.280 | like four minutes,
00:18:24.380 | so we're ahead of schedule.
00:18:25.080 | Anyone have questions?
00:18:25.740 | Don't get mad
00:18:28.400 | if I don't pick you,
00:18:28.980 | but I'll tell you,
00:18:29.640 | we'll do these three right here.
00:18:30.960 | Do you guys do
00:18:32.880 | communication primitives
00:18:34.880 | like all the issues
00:18:35.760 | that are at a higher level?
00:18:36.920 | Right now,
00:18:41.800 | like right now,
00:18:43.160 | it's a bit of a,
00:18:44.200 | like ideally not.
00:18:45.620 | Like ideally,
00:18:46.920 | we want to sit at
00:18:48.140 | like this,
00:18:48.760 | like higher up
00:18:51.200 | in the stack
00:18:51.720 | and just focus
00:18:52.480 | on this orchestration piece
00:18:53.440 | because I think
00:18:54.140 | that's where there's
00:18:54.740 | like not much.
00:18:56.120 | And, you know,
00:18:57.700 | the MLX team,
00:18:58.700 | for example,
00:18:59.060 | they've done a lot of work
00:19:00.020 | on like MLX distributed,
00:19:01.340 | which is really good.
00:19:02.160 | It's just,
00:19:03.120 | it's really fast,
00:19:04.080 | but it's like kind of brittle.
00:19:05.080 | So if you lose a connection,
00:19:06.300 | it would just break completely.
00:19:07.240 | Like you just,
00:19:08.000 | you get errors everywhere.
00:19:08.900 | And it's super specialized,
00:19:10.580 | obviously,
00:19:11.040 | to their configuration.
00:19:13.060 | You know,
00:19:15.940 | our hope is that
00:19:16.640 | there'll be a lot of work
00:19:17.700 | on that layer
00:19:18.320 | that is just done
00:19:19.760 | by all these frameworks
00:19:20.960 | like MLX and vLLM.
00:19:22.200 | And then,
00:19:23.440 | and then we can sit on top.
00:19:24.460 | But right now,
00:19:24.980 | we're doing a lot of sort of
00:19:25.940 | just work with
00:19:26.760 | like the MLX team,
00:19:27.720 | for example,
00:19:28.120 | and just,
00:19:28.620 | you know,
00:19:29.420 | building out
00:19:30.900 | those primitives.
00:19:31.500 | Yeah,
00:19:42.420 | it's the orange one.
00:19:43.500 | how difficult would it be
00:19:46.480 | to scale up
00:19:46.900 | like, the AMD part,
00:19:48.220 | because it's going to be
00:19:49.100 | like,
00:19:49.420 | it's going to be
00:19:50.100 | to be able to do that.
00:19:50.800 | Or is it optimistically
00:19:54.020 | that's like smaller
00:19:55.160 | than that?
00:19:56.040 | Yeah,
00:19:59.700 | so a lot of this stuff is,
00:20:00.800 | the absolute numbers
00:20:02.480 | don't matter too much.
00:20:03.260 | It's more about the ratios.
00:20:04.980 | AMD has way more flops
00:20:06.200 | than Apple Silicon would have.
00:20:07.600 | The ratio is probably
00:20:08.380 | going to be,
00:20:08.860 | you know,
00:20:09.620 | still really high.
00:20:10.520 | And then,
00:20:12.720 | yeah,
00:20:13.100 | it's just sort of,
00:20:14.220 | you know,
00:20:14.560 | what do you end up
00:20:15.100 | being bottlenecked by?
00:20:15.900 | Obviously,
00:20:16.320 | they have a lot more
00:20:16.860 | network bandwidth,
00:20:17.380 | but again,
00:20:17.780 | it's relative
00:20:18.900 | to the flops,
00:20:19.440 | right?
00:20:20.260 | If you look at the ratio
00:20:21.560 | of network bandwidth
00:20:22.260 | to flops of Apple Silicon,
00:20:23.300 | it's actually better
00:20:23.960 | than AMD.
00:20:26.880 | I'm not sure,
00:20:27.440 | like I would need
00:20:27.880 | to look at the specific
00:20:28.860 | device that you're
00:20:30.380 | talking about,
00:20:30.760 | but maybe.
00:20:34.600 | I have one more.
00:20:35.260 | Short answer.
00:20:35.760 | How would you...
00:20:37.300 | Yeah,
00:20:45.700 | we're not working
00:20:48.220 | on that.
00:20:48.740 | I think there's
00:20:49.540 | other projects
00:20:51.440 | that might be working
00:20:52.080 | on that kind of thing,
00:20:52.820 | like maybe
00:20:53.620 | Prime Intellect,
00:20:54.860 | Hyperbolic,
00:20:58.300 | perhaps,
00:20:58.680 | as well.
00:20:59.080 | A few of them
00:21:01.900 | in this room.
00:21:02.380 | But maybe
00:21:04.720 | there's some synergy
00:21:05.540 | that I don't know,
00:21:06.080 | like for now,
00:21:06.900 | at least there just
00:21:07.660 | seems to be
00:21:07.960 | a lot of work
00:21:08.380 | to be done
00:21:08.720 | on these private
00:21:10.340 | clusters where
00:21:11.080 | you have a fully
00:21:11.900 | trusted setup
00:21:12.480 | and you don't
00:21:15.280 | really care
00:21:15.660 | about all the
00:21:17.200 | hard problems
00:21:18.080 | that come
00:21:18.440 | with untrusted
00:21:20.680 | public networks.
00:21:22.260 | Have a great day.
00:21:24.260 | We'll see you next time.