
Large Scale AI on Apple Silicon — Alex Cheema, EXO Labs



00:00:02.000 | - Yeah, thank you all for coming.
00:00:17.140 | I'm sure you're wondering what this has to do with AI,
00:00:19.980 | but we'll get there.
00:00:20.980 | So let me set the stage.
00:00:24.880 | At the turn of the 20th century,
00:00:26.440 | physics had a big problem.
00:00:29.080 | So the problem was that the current theory
00:00:32.940 | said that there would be an infinite amount of energy
00:00:35.020 | in the universe, which was very strange,
00:00:36.920 | because clearly that would violate
00:00:40.180 | a lot of physical principles.
00:00:42.640 | So what did physicists do about this?
00:00:47.940 | Well, this guy called Max Planck came along,
00:00:50.080 | and he was like, well, if we just assume
00:00:53.740 | that all the energy is quantized in some way,
00:00:56.520 | then we can solve this problem.
00:00:58.160 | Mathematically, it works out fine.
00:00:59.880 | But he introduced this new thing called h,
00:01:02.020 | which is like this constant that comes out of nowhere.
00:01:04.100 | And, you know, they didn't really know at the time
00:01:07.880 | like how to measure that.
00:01:09.960 | So another chap in 1909 called Millikan came up with an experiment
00:01:17.880 | to sort of measure this value of h.
00:01:20.820 | So he did it indirectly by measuring the charge of an electron.
00:01:24.140 | And the way he did it is he put this sort of charged oil spray
00:01:29.900 | in between some charged plates and then looked at how fast the charged droplets move.
00:01:35.760 | The details aren't so important, but he got a result, and, you know, it was like big news in the scientific community.
00:01:44.220 | And everyone was like, oh, man, now we know what this charge is, we know what this h is.
00:01:50.280 | And all was good, and then, you know, sort of, he had this result and everyone was using it for their calculations and stuff.
00:01:58.960 | And, you know, many years went by and all the experiments seemed to agree with his reading.
00:02:04.660 | And, you know, it took until just after that last data point, like 1929, for the sort of people to realize that,
00:02:26.140 | okay, well, you know, this isn't the actual value.
00:02:29.380 | And I think what's interesting is when you look at the history,
00:02:33.220 | you've got this, like, period of 15 years or so where everyone thought that this was the sort of the right value.
00:02:42.440 | And, you know, you look back and you think, okay, why did they think this?
00:02:48.780 | And it turns out that, you know, people are doing the right experiments.
00:02:54.200 | They were doing a lot of experiments.
00:02:56.200 | And there's a lot of data points that would be in between here as well that's just, you know, been lost in the history books.
00:03:03.200 | But, you know, and part of that is because of embarrassment.
00:03:07.220 | This is, like, a very embarrassing thing for the scientific community.
00:03:09.880 | So, what happened?
00:03:12.420 | Well, basically, these scientists, they were running their experiments.
00:03:16.640 | They got a result and then they saw, oh, wait, this great Millikan guy that had this result in 1909, it doesn't agree with him.
00:03:25.420 | So I must be wrong.
00:03:27.640 | And then they sort of, like, fudged, you know, the experiments to, you know, make it work such that they get the same value as him.
00:03:36.640 | And that went on for a long time.
00:03:38.860 | And this is, like, this is not a trivial thing.
00:03:40.660 | This is quite an important thing to science in general.
00:03:42.700 | So this kind of gets me up to the point of, like, you know, sort of scientific rigor.
00:03:49.180 | And, you know, it's actually very tricky to do science properly.
00:03:54.640 | And, you know, the fact that, you know, these subsequent experiments, basically, they all agreed on the wrong thing.
00:04:04.700 | And it says something about, you know, sort of the way in which scientific progress happens.
00:04:11.120 | So, you know, there's a lot of inertia behind the current way of doing things.
00:04:15.160 | And, you know, there's another example, which is sort of about questioning assumptions.
00:04:20.320 | And, you know, there's another guy who is looking at something completely different.
00:04:25.120 | He was looking at experiments with rats.
00:04:28.800 | And he wanted to basically, he had a hypothesis.
00:04:32.380 | He wanted to test, like, some weird esoteric thing.
00:04:35.040 | He wanted to test that, like, you know, that rats, he could basically get rats to navigate a maze in a specific way.
00:04:43.760 | He wanted them to go through this corridor of doors, right?
00:04:47.720 | And they would go through a random door.
00:04:49.380 | And they wanted them, he wanted the rats to come out of a door that was three doors along from the one that they went in.
00:04:56.080 | It doesn't matter which one they go in.
00:04:57.420 | He always wanted it to be three.
00:04:58.620 | So he wanted to show that, like, they could actually think and, you know, be able to consistently go, like, three doors along.
00:05:05.320 | And basically, he tried, like, a bunch of stuff.
00:05:10.640 | And what kept happening was whatever door, like, he basically put, like, a piece of food on the door that was three doors along to make them go through that one.
00:05:21.820 | But what ended up happening is they just always went to the previous door.
00:05:25.080 | So if it was, like, you know, door one that they went through and he wanted it to come out of door four, and then he tested it again where they go through door two and they should go out of door five.
00:05:36.100 | They would just still go out of door four.
00:05:38.340 | And he tried a bunch of stuff.
00:05:41.660 | So, like, he was, like, okay, why is this happening?
00:05:43.660 | Like, how did the rats know, like, to go back to that same door?
00:05:47.780 | So he was, like, he very meticulously went through and he made sure that there was no pattern or anything that they could distinguish on the doors.
00:05:55.760 | He painted them all the same way.
00:05:57.340 | He made sure the patterns were all the same.
00:05:59.200 | But still it didn't work.
00:06:00.920 | He thought it might be the smell.
00:06:03.460 | So maybe there was, like, a smell that came from the food.
00:06:06.900 | He tried basically putting chemicals in so that they couldn't smell the food.
00:06:10.560 | And that didn't work either.
00:06:11.980 | And then he thought, okay, maybe it's something to do with the lighting.
00:06:15.880 | You know, like, a human could do this, like, through common sense, just sort of see, like, okay, the lighting is in such and such a way and see the pattern.
00:06:22.820 | And so he covered up the corridors and stuff and made sure that that couldn't be a thing.
00:06:26.560 | And still, you know, the same thing happened.
00:06:29.640 | And eventually what he found out was that the reason they could consistently go to that same door was because they remembered the sounds.
00:06:38.480 | So as they walked along, they remembered the pattern of the sounds in this corridor.
00:06:44.620 | So what he did was he put some sand in there so that they couldn't distinguish the sounds, basically.
00:06:52.120 | Now, from a scientific perspective, this is, like, S-tier science in terms of, you know, really clearly looking at, like, what are all the assumptions I'm making and just, like, systematically eliminating them.
00:07:05.360 | And, you know, this is great, great science.
00:07:08.900 | But, you know, the problem was that the scientific community didn't agree.
00:07:12.540 | So the people that were conducting experiments at the time, they made a lot of these assumptions.
00:07:17.840 | And they were kind of stuck in their ways.
00:07:20.160 | So they, you know, they discarded this.
00:07:22.540 | So, you know, it wasn't cited, you know, this was basically just forgotten.
00:07:27.780 | And so I think, you know, there's sort of this tendency to stick to the current way that things are done.
00:07:38.000 | And even the methodology, if it's spot on, if there's a certain way of doing things, then that kind of, you know, has a lot of inertia behind it.
00:07:47.080 | And, you know, Feynman talked about this in one of his talks, called Cargo Cult Science, in, like, the 70s.
00:07:54.360 | And he had this, like, you know, quote, the first principle is that you must not fool yourself.
00:08:02.040 | And you are the easiest person to fool.
00:08:03.620 | And I think, you know, there's this tendency to sort of oversimplify the science.
00:08:13.280 | You know, it is hard to get it right.
00:08:14.920 | And if you're interested, Gwern wrote, like, a blog post all about this, and there's some controversy about, like, who this guy was and stuff.
00:08:22.060 | But getting to AI, so, like, you know, a very similar thing happened in AI, you know, sort of about questioning assumptions, right?
00:08:33.480 | And, you know, just having a good idea out there is not enough.
00:08:36.720 | So, you know, in 1963, backpropagation was introduced in this paper.
00:08:43.540 | And then it was, you know, reinvented in 1976 in this paper and reinvented again in 1988.
00:08:50.320 | And then, you know, sort of deep convolutional neural networks were introduced here.
00:08:56.760 | And then it was only in 1989 that these two things were combined.
00:08:59.480 | So, you had, like, deep CNNs and backpropagation.
00:09:04.420 | And then it was only, like, three decades later that, you know, CNNs were widely accepted.
00:09:14.240 | There was still, like, this massive skepticism, even though the ideas were out there.
00:09:17.600 | And why was that?
00:09:20.060 | I think, like, a big part of it is sort of, again, being stuck in the way of doing things
00:09:25.420 | and looking at the existing hardware.
00:09:27.140 | So, if you look at, obviously, CPUs, they have this von Neumann bottleneck.
00:09:31.680 | You know, they have really good single-core performance.
00:09:35.180 | But, you know, if you're sort of having to read memory often,
00:09:39.540 | then it's usually bottlenecked by that.
00:09:41.980 | And, you know, you can sort of look at, like, at a systems level,
00:09:46.400 | why did GPUs fix that?
00:09:47.940 | And it sort of changed this, like, ratio of how many, you know,
00:09:52.840 | how many bytes you have to load to how many flops you can execute.
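To make that bytes-to-flops ratio concrete, here is a rough roofline-style sketch; the device numbers below are illustrative assumptions, not figures from the talk.

```python
# Rough roofline-style check: is a workload compute-bound or memory-bound?
# All device numbers are illustrative assumptions, not figures from the talk.

def machine_balance(peak_flops: float, mem_bandwidth: float) -> float:
    """FLOPs the device can execute per byte it can load from memory."""
    return peak_flops / mem_bandwidth

def is_memory_bound(workload_flops_per_byte: float, balance: float) -> bool:
    """If the workload does fewer FLOPs per loaded byte than the device's
    balance point, the memory system, not the ALUs, sets the speed."""
    return workload_flops_per_byte < balance

# Hypothetical CPU: strong single-core compute, modest memory bandwidth.
cpu_balance = machine_balance(peak_flops=1e12, mem_bandwidth=100e9)    # 10 FLOPs/byte
# Hypothetical GPU: far more compute and far more bandwidth, so the ratio shifts.
gpu_balance = machine_balance(peak_flops=100e12, mem_bandwidth=2e12)   # 50 FLOPs/byte

# A big dense matmul reuses each loaded byte many times; a streaming op does not.
print(is_memory_bound(workload_flops_per_byte=2.0, balance=cpu_balance))    # True: bandwidth-limited
print(is_memory_bound(workload_flops_per_byte=200.0, balance=gpu_balance))  # False: compute-limited
```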
00:09:55.680 | And, you know, it's kind of striking, you know, like,
00:10:00.200 | when you look at the history of this,
00:10:02.840 | this was, like, a groundbreaking paper and a very famous paper,
00:10:05.940 | like, where they trained a network on 1,000 machines,
00:10:09.980 | 16,000 CPU cores, and it took three days.
00:10:12.880 | And then, like, less than a year later,
00:10:15.700 | there was this other paper that got the exact same results,
00:10:18.500 | but it took, like, you know, three machines in a couple of days.
00:10:22.580 | And this was using, you know, hardware acceleration,
00:10:26.080 | like, using GPUs.
00:10:26.980 | So, you know, this gets me to the hardware lottery,
00:10:33.260 | which is essentially this idea introduced by Sara Hooker in 2020,
00:10:37.000 | which says that, you know,
00:10:39.800 | the best research ideas don't necessarily win.
00:10:42.460 | There's a lot of factors that sort of make it
00:10:44.680 | so that a great idea can be out there,
00:10:46.760 | great science can be being done,
00:10:48.240 | but, you know, it doesn't necessarily get adopted and accepted.
00:10:52.120 | There's a recent example of this,
00:10:56.660 | which I think is quite interesting,
00:10:57.860 | that, you know, preparing for this,
00:11:00.760 | I had the realization that, you know,
00:11:04.480 | LLMs are kind of a forcing function,
00:11:06.020 | like, they're kind of creating inertia as well,
00:11:08.980 | because the things that they're good at
00:11:11.060 | are the things that people will work on.
00:11:12.400 | So if they're good at generating Python code,
00:11:14.460 | then more people will write Python code.
00:11:15.740 | And it's this feedback loop of, you know,
00:11:19.080 | more people use it,
00:11:20.460 | and then the LLMs get better at that thing.
00:11:21.920 | And, you know, now,
00:11:22.800 | if you wanted to come out with a new programming language,
00:11:24.460 | maybe it is a really good idea.
00:11:25.780 | But
00:11:27.140 | it's much harder to get adoption
00:11:29.420 | if the tooling is, like, way worse
00:11:31.000 | because the LLMs don't have good support for it.
00:11:32.920 | And there's this paper with, you know,
00:11:34.520 | this result where, like, basically,
00:11:37.160 | they made a table of, like,
00:11:39.600 | all the tasks that, like,
00:11:41.960 | different languages are best at,
00:11:42.940 | and basically everything was Python.
00:11:44.140 | And, yeah, like,
00:11:47.320 | what is it?
00:11:49.120 | 90 to 97% of all problems,
00:11:51.260 | yeah, Python was the best thing.
00:11:53.900 | So what if we did question our assumptions?
00:11:59.560 | So that brings me to sort of
00:12:02.280 | what I'm working on with EXO.
00:12:03.460 | So what we're doing is
00:12:06.460 | we're building an orchestration layer
00:12:10.000 | for, you know,
00:12:13.980 | AI that runs on different hardware targets.
00:12:16.780 | And it's sort of,
00:12:18.260 | we're sitting at this layer
00:12:19.240 | that I haven't seen much activity in right now,
00:12:22.060 | and it's kind of a pain point
00:12:23.660 | in terms of just having this, like,
00:12:26.280 | reliable thing that can orchestrate
00:12:28.180 | a lot of different kinds of devices
00:12:30.980 | with different connections
00:12:32.400 | in, like, this ad hoc kind of mesh network.
00:12:34.860 | And just to give you, like,
00:12:37.000 | some idea of, like,
00:12:38.320 | kind of the kind of things
00:12:39.280 | that we're doing
00:12:41.080 | and the kind of, you know,
00:12:43.820 | the solution space that we sit in,
00:12:45.220 | like, you know,
00:12:46.200 | everything in EXO is modeled
00:12:48.180 | as a causally consistent set of events.
00:12:52.780 | So there's essentially, like,
00:12:55.700 | an ordering on everything
00:12:57.280 | that happens across
00:12:58.040 | the whole distributed system.
00:12:59.160 | If there's all these things going on,
00:13:00.540 | it's really hard to reason about, like,
00:13:01.840 | sort of, if you want to move
00:13:03.180 | a KV cache around,
00:13:04.160 | like, how do you know
00:13:05.000 | that it happened successfully?
00:13:05.920 | How do you know then
00:13:06.600 | if something depends on that?
00:13:08.040 | So you build this sort of causal graph,
00:13:11.000 | and, you know,
00:13:11.600 | you can then reason about the system
00:13:13.780 | and get some guarantees about,
00:13:15.760 | you know, sort of where the data is
00:13:19.900 | and, you know, what's going on.
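As a purely illustrative sketch of that idea (this is not EXO's actual event model or API), you can picture causally consistent events as records that carry explicit dependencies, so nothing is acted on until everything it depends on has completed.

```python
# Toy sketch of causally ordered events; not EXO's actual data structures.
from dataclasses import dataclass, field

@dataclass
class Event:
    name: str
    deps: list["Event"] = field(default_factory=list)  # events this one causally depends on
    done: bool = False

def apply(event: Event) -> None:
    # An event may only be applied after every event it depends on has been
    # applied, e.g. don't decode against a KV cache whose move hasn't finished.
    assert all(d.done for d in event.deps), f"{event.name} applied out of causal order"
    event.done = True

# "Move the KV cache to device B" must happen before "decode on device B".
move_kv = Event("move_kv_cache_to_device_B")
decode = Event("decode_on_device_B", deps=[move_kv])

apply(move_kv)
apply(decode)  # succeeds because its dependency is done
```

The point is just that an explicit ordering lets you answer "did the move actually happen, and what is allowed to run now?" across the whole distributed system.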
00:13:21.580 | So just to give you a quick example
00:13:23.100 | to put this into, like,
00:13:24.060 | you know, more practical terms,
00:13:26.780 | like, what does this enable?
00:13:27.740 | Spark is coming out soon.
00:13:30.200 | I'm still waiting,
00:13:31.620 | but it's been delayed a few times,
00:13:33.420 | I think, but hopefully soon.
00:13:34.860 | And this is NVIDIA's new, like,
00:13:37.840 | consumer thing,
00:13:38.440 | if you haven't seen that.
00:13:39.460 | And it's, like,
00:13:40.060 | it has a lot of flops.
00:13:40.840 | It's pretty good for the cost.
00:13:42.560 | But the memory bandwidth
00:13:44.860 | is kind of lacking.
00:13:45.540 | It doesn't have that much memory.
00:13:46.580 | Like, the Mac Studio has a lot more memory bandwidth,
00:13:48.760 | but a lot less flops.
00:13:49.700 | So if you look at sort of just
00:13:51.440 | if you wanted to generate
00:13:53.760 | with an LLM,
00:13:55.400 | like, there's two phases, right?
00:13:56.780 | There's the pre-fill phase,
00:13:57.980 | which is compute bound,
00:13:58.840 | and the generation phase,
00:13:59.900 | which is memory bandwidth bound.
00:14:01.320 | As far as I know,
00:14:04.120 | there isn't really anything
00:14:04.900 | that can nicely do this reliably,
00:14:06.460 | where you would have these, like,
00:14:07.940 | different sets of devices,
00:14:09.160 | and then figure out
00:14:10.360 | the best way
00:14:10.920 | to, like, run this whole workload
00:14:12.660 | across, you know,
00:14:14.840 | all the devices
00:14:15.560 | that you have available.
00:14:16.280 | But this is possible now
00:14:18.200 | with EXO.
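A hedged sketch of what that placement decision looks like: estimate each phase's time on each device from rough specs, then put the compute-bound prefill on the FLOPs-heavy box and the bandwidth-bound decode on the bandwidth-heavy box. The specs and workload sizes below are made-up placeholders, not real DGX Spark or Mac Studio numbers, and a real scheduler would also have to account for transfer costs.

```python
# Toy phase-placement estimate; all specs and workload sizes are assumptions.

def prefill_seconds(prompt_flops: float, peak_flops: float) -> float:
    # Prefill is roughly compute-bound: time ~ total FLOPs / peak FLOP rate.
    return prompt_flops / peak_flops

def decode_seconds(tokens: int, bytes_read_per_token: float, mem_bw: float) -> float:
    # Decode is roughly bandwidth-bound: each new token re-reads the weights.
    return tokens * bytes_read_per_token / mem_bw

devices = {
    # name: (peak FLOP/s, memory bandwidth in bytes/s) -- illustrative only
    "flops_heavy_box": (1000e12, 270e9),
    "bandwidth_heavy_box": (30e12, 800e9),
}

prompt_flops = 5e15                   # pretend cost of processing the prompt
tokens, bytes_per_token = 500, 70e9   # pretend decode cost (weights read per token)

best_prefill = min(devices, key=lambda d: prefill_seconds(prompt_flops, devices[d][0]))
best_decode = min(devices, key=lambda d: decode_seconds(tokens, bytes_per_token, devices[d][1]))
print(best_prefill, "->", best_decode)  # prefill on the FLOPs-heavy box, decode on the bandwidth-heavy one
```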
00:14:18.880 | And just another example,
00:14:21.560 | this is some research
00:14:22.680 | that we're working on.
00:14:23.840 | It's more on the training side.
00:14:26.380 | So this will be out pretty soon.
00:14:32.220 | It will be made public.
00:14:33.500 | Essentially,
00:14:35.020 | we're kind of questioning
00:14:37.180 | all these assumptions
00:14:37.800 | about, like, you know,
00:14:38.620 | what hardware is best to run on.
00:14:39.800 | If you look at Apple Silicon,
00:14:40.860 | it has a lot more memory,
00:14:41.940 | obviously,
00:14:42.420 | but it's a lot more expensive
00:14:43.780 | per flop.
00:14:44.220 | But what if you could use
00:14:47.360 | that memory for something useful
00:14:49.040 | to make training more efficient?
00:14:51.340 | And there's been this whole area
00:14:53.060 | of research of, like,
00:14:53.860 | sort of second-order methods
00:14:54.900 | and, you know,
00:14:55.920 | different ways of making training
00:14:58.240 | more efficient,
00:14:58.700 | but a lot of them have been discarded
00:15:00.460 | because of the memory requirements.
00:15:02.160 | And so what we've done
00:15:05.040 | is we've come out with,
00:15:05.820 | we're going to come out with this,
00:15:06.940 | like, new optimizer,
00:15:08.120 | which is essentially, like,
00:15:09.840 | two times more efficient
00:15:11.500 | per flop than Adam,
00:15:16.180 | but it uses a lot more memory.
00:15:17.800 | So if you look at this sort of ratio
00:15:19.180 | of, like, memory to flops
00:15:20.220 | of Apple Silicon,
00:15:20.960 | it's probably, like, 20x,
00:15:22.140 | it's around 20x
00:15:24.320 | that of NVIDIA,
00:15:25.260 | depending on which one you get.
00:15:26.280 | So you've got all that spare memory
00:15:27.840 | that you can, you know,
00:15:29.020 | think about, okay,
00:15:29.660 | what does the solution space look like
00:15:31.780 | if we make use of that memory?
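To make the memory trade-off concrete: Adam keeps two extra values per parameter (first and second moment estimates), while many second-order or preconditioned methods keep substantially more state per parameter. The sketch below uses a made-up 16 values per parameter as a stand-in; it is not the unreleased EXO optimizer, just an illustration of why spare memory changes which training methods are viable.

```python
# Back-of-the-envelope optimizer-state memory; illustrative assumptions only.

def adam_state_bytes(num_params: int, bytes_per_value: int = 4) -> int:
    # Adam stores first- and second-moment estimates: 2 extra values per parameter.
    return 2 * num_params * bytes_per_value

def memory_heavy_state_bytes(num_params: int, values_per_param: int = 16,
                             bytes_per_value: int = 4) -> int:
    # Placeholder for a second-order-style method that keeps much more state
    # (curvature / preconditioner statistics); 16 values/param is a made-up number.
    return values_per_param * num_params * bytes_per_value

n = 7_000_000_000  # a 7B-parameter model
print(adam_state_bytes(n) / 1e9, "GB of Adam state")             # 56.0 GB
print(memory_heavy_state_bytes(n) / 1e9, "GB of heavier state")  # 448.0 GB
```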
00:15:33.160 | And, yeah,
00:15:35.840 | we're buying a lot of Macs.
00:15:37.540 | We're, like, trying a lot of stuff.
00:15:38.680 | This is the first batch,
00:15:41.080 | but there's going to be another batch.
00:15:42.580 | And, yeah,
00:15:45.480 | we're just running a lot
00:15:46.420 | of experiments at scale.
00:15:47.780 | Like, there's a lot of stuff
00:15:48.480 | in that paper
00:15:48.940 | that's really interesting
00:15:49.720 | in terms of, like,
00:15:50.620 | not many people have tried
00:15:51.720 | doing large training runs
00:15:54.380 | on Apple Silicon.
00:15:54.860 | I mean, nobody.
00:15:55.440 | Nobody has.
00:15:56.080 | Even, I mean,
00:15:57.780 | talking to people at Apple,
00:15:58.760 | like, they're also surprised
00:16:00.100 | at these results.
00:16:01.020 | Like, they're not really aware
00:16:02.900 | of what the hardware is capable of.
00:16:06.500 | Just a final thing as well
00:16:09.060 | around sort of, you know,
00:16:10.880 | going back to the process
00:16:15.020 | of doing science, right?
00:16:16.400 | Like, if anything,
00:16:17.860 | like, publishing kind of,
00:16:19.320 | like, publishing results
00:16:22.360 | that are maybe not the best,
00:16:23.400 | I think, is something
00:16:24.680 | that we need to, like,
00:16:25.580 | normalize more.
00:16:26.460 | And, you know,
00:16:27.120 | we're sort of,
00:16:27.980 | we're doing a lot
00:16:30.560 | to make all the data accessible,
00:16:34.420 | whether it's good or bad.
00:16:35.600 | Like, you know,
00:16:36.220 | right now you can go
00:16:37.040 | to benchmarks.exolabs.net
00:16:38.540 | and there's a lot
00:16:39.620 | of configurations there
00:16:40.480 | that are just really bad.
00:16:41.420 | But, you know,
00:16:43.640 | I think having that data
00:16:46.600 | out there
00:16:47.100 | and publishing stuff
00:16:47.980 | that isn't necessarily,
00:16:49.460 | like, the best thing
00:16:51.140 | is super important.
00:16:53.420 | So we've got this,
00:16:55.360 | these benchmarks.
00:16:57.700 | Part of that is also running,
00:16:59.320 | like, these Macs continuously
00:17:00.960 | in CI
00:17:01.380 | and different devices.
00:17:02.460 | We'll probably end up
00:17:03.200 | with, like,
00:17:03.560 | at least one
00:17:04.940 | of pretty much,
00:17:05.560 | like, any device
00:17:06.700 | that you can reasonably run
00:17:08.460 | an AI workload on
00:17:10.900 | and just have those
00:17:12.180 | continuously pushing out
00:17:13.160 | to the benchmarks.
00:17:13.780 | Yeah, so we're coming out
00:17:18.420 | with a big release
00:17:19.100 | at the end of this week.
00:17:22.840 | So that will sort of,
00:17:26.320 | you know,
00:17:26.640 | be this orchestration layer
00:17:32.760 | that I talked about.
00:17:34.280 | And with that will come
00:17:35.580 | a lot of this sort of tooling
00:17:36.720 | around it.
00:17:37.840 | So the benchmarks website,
00:17:39.040 | a lot of stuff around,
00:17:41.540 | you know,
00:17:42.060 | being able to test
00:17:43.980 | different algorithms
00:17:44.800 | that run on different devices.
00:17:46.480 | So we kind of have
00:17:50.540 | EXO Gym, for example,
00:17:50.540 | which is a way
00:17:51.500 | to run experiments
00:17:52.460 | on, you know,
00:17:54.620 | if you don't have
00:17:55.620 | 16 Macs like this,
00:17:57.740 | then, you know,
00:17:58.860 | being able to actually
00:17:59.940 | just run them locally
00:18:01.040 | quickly
00:18:02.280 | and test different
00:18:03.120 | distributed algorithms,
00:18:04.060 | that would be part
00:18:05.840 | of those releases.
00:18:06.880 | Yep, that's it.
00:18:12.560 | We've actually got
00:18:23.280 | like four minutes,
00:18:24.380 | so we're ahead of schedule.
00:18:25.080 | Anyone have questions?
00:18:25.740 | Don't get mad
00:18:28.400 | if I don't pick you,
00:18:28.980 | but I'll tell you,
00:18:29.640 | we'll do these three right here.
00:18:30.960 | Do you guys do
00:18:32.880 | communication primitives
00:18:34.880 | like all the issues
00:18:35.760 | that are at a higher level?
00:18:36.920 | Right now,
00:18:41.800 | like right now,
00:18:43.160 | it's a bit of a,
00:18:44.200 | like ideally not.
00:18:45.620 | Like ideally,
00:18:46.920 | we want to sit at
00:18:48.140 | like this,
00:18:48.760 | like higher up
00:18:51.200 | in the stack
00:18:51.720 | and just focus
00:18:52.480 | on this orchestration piece
00:18:53.440 | because I think
00:18:54.140 | that's where there's
00:18:54.740 | like not much.
00:18:56.120 | And, you know,
00:18:57.700 | the MLX team,
00:18:58.700 | for example,
00:18:59.060 | they've done a lot of work
00:19:00.020 | on like MLX distributed,
00:19:01.340 | which is really good.
00:19:02.160 | It's just,
00:19:03.120 | it's really fast,
00:19:04.080 | but it's like kind of brittle.
00:19:05.080 | So if you lose a connection,
00:19:06.300 | it would just break completely.
00:19:07.240 | Like you just,
00:19:08.000 | you get errors everywhere.
00:19:08.900 | And it's super specialized,
00:19:10.580 | obviously,
00:19:11.040 | to their configuration.
00:19:13.060 | You know,
00:19:15.940 | our hope is that
00:19:16.640 | there'll be a lot of work
00:19:17.700 | on that layer
00:19:18.320 | that is just done
00:19:19.760 | by all these frameworks
00:19:20.960 | like MLX and vLLM.
00:19:22.200 | And then,
00:19:23.440 | and then we can sit on top.
00:19:24.460 | But right now,
00:19:24.980 | we're doing a lot of sort of
00:19:25.940 | just work with
00:19:26.760 | like the MLX team,
00:19:27.720 | for example,
00:19:28.120 | and just,
00:19:28.620 | you know,
00:19:29.420 | building out
00:19:30.900 | those primitives.
00:19:31.500 | Yeah,
00:19:42.420 | it's the orange one.
00:19:43.500 | how difficult would it be
00:19:46.480 | to scale up
00:19:46.900 | like, the AMD part,
00:19:48.220 | because it's going to be
00:19:49.100 | like,
00:19:49.420 | it's going to be
00:19:50.100 | to be able to do that.
00:19:50.800 | Or is it optimistically
00:19:54.020 | that's like smaller
00:19:55.160 | than that?
00:19:56.040 | Yeah,
00:19:59.700 | so a lot of this stuff is,
00:20:00.800 | the absolute numbers
00:20:02.480 | don't matter too much.
00:20:03.260 | It's more about the ratios.
00:20:04.980 | AMD has way more flops
00:20:06.200 | than Apple Silicon would have.
00:20:07.600 | The ratio is probably
00:20:08.380 | going to be,
00:20:08.860 | you know,
00:20:09.620 | still really high.
00:20:10.520 | And then,
00:20:12.720 | yeah,
00:20:13.100 | it's just sort of,
00:20:14.220 | you know,
00:20:14.560 | what do you end up
00:20:15.100 | being bottlenecked by?
00:20:15.900 | Obviously,
00:20:16.320 | they have a lot more
00:20:16.860 | network bandwidth,
00:20:17.380 | but again,
00:20:17.780 | it's relative
00:20:18.900 | to the flops,
00:20:19.440 | right?
00:20:20.260 | If you look at the ratio
00:20:21.560 | of network bandwidth
00:20:22.260 | to flops of Apple Silicon,
00:20:23.300 | it's actually better
00:20:23.960 | than AMD.
00:20:26.880 | I'm not sure,
00:20:27.440 | like I would need
00:20:27.880 | to look at the specific
00:20:28.860 | device that you're
00:20:30.380 | talking about,
00:20:30.760 | but maybe.
00:20:34.600 | I have one more.
00:20:35.260 | Short answer.
00:20:35.760 | How would you...
00:20:37.300 | Yeah,
00:20:45.700 | we're not working
00:20:48.220 | on that.
00:20:48.740 | I think there's
00:20:49.540 | other projects
00:20:51.440 | that might be working
00:20:52.080 | on that kind of thing,
00:20:52.820 | like maybe
00:20:53.620 | Prime Intellect,
00:20:54.860 | Hyperbolic,
00:20:58.300 | perhaps,
00:20:58.680 | as well.
00:20:59.080 | A few of them
00:21:01.900 | in this room.
00:21:02.380 | But maybe
00:21:04.720 | there's some synergy
00:21:05.540 | that I don't know,
00:21:06.080 | like for now,
00:21:06.900 | at least there just
00:21:07.660 | seems to be
00:21:07.960 | a lot of work
00:21:08.380 | to be done
00:21:08.720 | on these private
00:21:10.340 | clusters where
00:21:11.080 | you have a fully
00:21:11.900 | trusted setup
00:21:12.480 | and you don't
00:21:15.280 | really care
00:21:15.660 | about all the
00:21:17.200 | hard problems
00:21:18.080 | that come
00:21:18.440 | with untrusted
00:21:20.680 | public networks.
00:21:22.260 | Have a great day.
00:21:24.260 | We'll see you next time.