
Making AI accessible with Andrej Karpathy and Stephanie Zhan



00:00:00.000 | [MUSIC PLAYING]
00:00:03.120 | I'm thrilled to introduce our next and final speaker,
00:00:05.560 | Andrej Karpathy.
00:00:06.580 | I think Karpathy probably needs no introduction.
00:00:08.760 | Most of us have probably watched his YouTube videos at length.
00:00:12.840 | But he's renowned for his research in deep learning.
00:00:17.620 | He designed the first deep learning class at Stanford,
00:00:20.800 | was part of the founding team at OpenAI,
00:00:23.440 | led the computer vision team at Tesla,
00:00:25.640 | and is now a mystery man again, now
00:00:27.480 | that he has just left OpenAI.
00:00:28.800 | So we're very lucky to have you here.
00:00:30.360 | I think, Andrej, you've been such a dream speaker.
00:00:32.520 | And so we're excited to have you and Stephanie close out
00:00:34.940 | the day.
00:00:35.800 | Thank you.
00:00:37.280 | [APPLAUSE]
00:00:42.240 | Andrej's first reaction as we walked up here
00:00:44.320 | was, oh my god, to his picture.
00:00:46.600 | It's like a very intimidating photo.
00:00:48.140 | I don't know what year it was taken, but he's impressed.
00:00:51.680 | OK, amazing.
00:00:53.000 | Andrej, thank you so much for joining us today,
00:00:55.440 | and welcome back.
00:00:57.080 | Thank you.
00:00:59.480 | Fun fact that most people don't actually know--
00:01:02.360 | how many folks here know where OpenAI's original office was?
00:01:07.960 | It's amazing.
00:01:10.480 | Nick?
00:01:11.840 | I'm going to guess right here.
00:01:13.200 | Right here.
00:01:13.840 | Right here on the opposite side of our San Francisco office,
00:01:18.440 | where actually many of you guys were just in huddles.
00:01:20.560 | So this is fun for us, because it brings us back
00:01:22.560 | to our roots, back when I first started at Sequoia,
00:01:25.000 | and when Andrej first started co-founding OpenAI.
00:01:29.040 | Andrej, in addition to living out the Willy Wonka
00:01:32.120 | dream of working atop a chocolate factory,
00:01:34.900 | what were some of your favorite moments working from here?
00:01:37.280 | Yeah, so OpenAI was right there.
00:01:39.640 | And this was the first office after, I guess,
00:01:41.960 | Greg's apartment, which maybe doesn't count.
00:01:44.960 | And so, yeah, we spent maybe two years here.
00:01:46.960 | And the chocolate factory was just downstairs,
00:01:48.840 | so it always smelled really nice.
00:01:50.640 | And yeah, I guess the team was 10, 20 plus.
00:01:55.360 | And yeah, we had a few very fun episodes here.
00:01:58.960 | One of them was alluded to by Jensen at GTC that happened
00:02:03.760 | just yesterday or two days ago.
00:02:05.800 | So Jensen was describing how he brought the first DGX
00:02:09.280 | and how he delivered it to OpenAI.
00:02:10.820 | So that happened right there.
00:02:12.320 | So that's where we all signed it.
00:02:13.700 | It's in the room over there.
00:02:15.480 | So Andrej needs no introduction, but I
00:02:17.120 | wanted to give a little bit of backstory on some
00:02:19.120 | of his journey to date.
00:03:20.920 | As Sonya had introduced, he was trained by Geoff Hinton and then
00:02:24.680 | Fei-Fei.
00:02:26.240 | His first claim to fame was his deep learning course
00:02:28.400 | at Stanford.
00:02:29.840 | He co-founded OpenAI back in 2015.
00:02:32.440 | In 2017, he was poached by Elon.
00:02:35.040 | I remember this very, very clearly.
00:02:37.040 | For folks who don't remember the context then,
00:02:39.960 | Elon had just transitioned through six different autopilot
00:02:43.040 | leaders, each of whom lasted about six months.
00:02:46.040 | And I remember when Andrej took this job,
00:02:47.980 | I thought, congratulations and good luck.
00:02:50.480 | Not too long after that, he went back to OpenAI
00:02:56.800 | and has been there for the last year.
00:02:59.000 | Now, unlike all the rest of us today,
00:03:01.880 | he is basking in the ultimate glory of freedom
00:03:05.720 | in all time and responsibility.
00:03:08.320 | And so we're really excited to see what you have to share
00:03:10.840 | today.
00:03:11.720 | A few things that I appreciate the most from Andrej
00:03:13.840 | are that he is an incredible, fascinating, futurist thinker.
00:03:17.920 | He is a relentless optimist.
00:03:20.460 | And he's a very practical builder.
00:03:22.260 | And so I think he'll share some of his insights around that
00:03:24.760 | today.
00:03:25.260 | To kick things off, AGI, even seven years ago,
00:03:29.180 | seemed like an almost impossible task
00:03:32.180 | to achieve, even in the span of our lifetimes.
00:03:34.820 | Now it seems within sight.
00:03:36.900 | What is your view of the future over the next N years?
00:03:39.660 | Yes, I think you're right.
00:03:43.260 | I think a few years ago, I sort of
00:03:44.900 | felt like AGI was--
00:03:48.920 | it wasn't clear how it was going to happen.
00:03:50.700 | It was very sort of academic.
00:03:51.900 | And you would think about different approaches.
00:03:53.340 | And now I think it's very clear.
00:03:54.380 | And there's a lot of space.
00:03:55.500 | And everyone is trying to fill it.
00:03:56.900 | And so there's a lot of optimization.
00:04:01.380 | And I think, roughly speaking, the way things are happening
00:04:04.020 | is everyone is trying to build what I refer to
00:04:07.740 | as kind of like this LLM OS.
00:04:10.820 | And basically, I like to think of it as an operating system.
00:04:13.860 | You have to get a bunch of peripherals
00:04:16.020 | that you plug into this new CPU or something like that.
00:04:18.460 | The peripherals are, of course, like text, images, audio,
00:04:21.380 | and all the modalities.
00:04:22.540 | And then you have a CPU, which is the LLM transformer itself.
00:04:25.720 | And then it's also connected to all the software 1.0
00:04:28.020 | infrastructure that we've already built up for ourselves.
00:04:30.460 | And so I think everyone is kind of trying
00:04:32.180 | to build something like that and then make it available
00:04:37.620 | as something that's customizable to all the different nooks
00:04:40.120 | and crannies of the economy.
00:04:41.380 | And so I think that's kind of roughly what everyone
00:04:43.220 | is trying to build out and what we sort of also
00:04:46.420 | heard about earlier today.
00:04:48.380 | So I think that's roughly where it's
00:04:50.540 | headed is we can bring up and down these relatively
00:04:56.660 | self-contained agents that we can give high-level tasks to
00:04:59.180 | and specialize in various ways.
00:05:00.540 | So yeah, I think it's going to be
00:05:01.920 | very interesting and exciting.
00:05:03.300 | And it's not just one agent.
00:05:04.500 | It's many agents.
00:05:05.180 | And what does that look like?
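For concreteness, a minimal sketch of the "LLM OS" loop being described, with the LLM as the CPU and tools and modalities as peripherals. Every name here (call_llm, the tool table, the message format) is a hypothetical stand-in rather than any particular vendor's API:

```python
# The LLM is the "CPU"; modalities and tools are "peripherals"; existing
# software 1.0 infrastructure is reached through tool calls.

def call_llm(messages):
    """Stand-in for whatever model endpoint you use (hosted or local)."""
    raise NotImplementedError

TOOLS = {
    "browser": lambda query: f"<search results for {query!r}>",
    "python":  lambda code:  f"<stdout from running {code!r}>",  # software 1.0
}

def agent(task, max_steps=10):
    messages = [
        {"role": "system", "content": "You may call tools: " + ", ".join(TOOLS)},
        {"role": "user", "content": task},
    ]
    for _ in range(max_steps):
        reply = call_llm(messages)           # one "CPU" step
        if reply.get("tool"):                # route work to a peripheral
            result = TOOLS[reply["tool"]](reply["input"])
            messages.append({"role": "tool", "content": result})
        else:
            return reply["content"]          # high-level task finished
    return None
```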
00:05:06.700 | And if that view of the future is true,
00:05:08.460 | how should we all be living our lives differently?
00:05:11.500 | I don't know.
00:05:15.420 | I guess we have to try to build it, influence it,
00:05:17.420 | make sure it's good, and just try to make
00:05:21.220 | sure it turns out well.
00:05:23.220 | So now that you're a free, independent agent,
00:05:25.980 | I want to address the elephant in the room, which
00:05:27.980 | is that OpenAI is dominating the ecosystem.
00:05:32.940 | And most of our audience here today
00:05:34.900 | are founders who are trying to carve out a little niche,
00:05:38.100 | praying that OpenAI doesn't take them out overnight.
00:05:41.260 | Where do you think opportunities exist for other players
00:05:45.180 | to build new independent companies
00:05:47.060 | versus what areas do you think OpenAI will continue
00:05:49.860 | to dominate, even as its ambition grows?
00:05:53.460 | Yes, so my high-level impression is basically
00:05:55.340 | OpenAI is trying to build out this LLM OS.
00:05:57.420 | And I think, as we heard earlier today,
00:06:02.060 | it's trying to develop this platform on top
00:06:03.860 | of which you can position different companies
00:06:05.300 | in different verticals.
00:06:06.300 | Now, I think the OS analogy is also really interesting,
00:06:08.180 | because when you look at something like Windows
00:06:10.180 | or something like that-- these are also operating systems--
00:06:12.180 | they come with a few default apps,
00:06:13.740 | like a browser comes with Windows, right?
00:06:15.620 | You can use the Edge browser.
00:06:16.980 | And so I think, in the same way, OpenAI or any
00:06:19.140 | of the other companies might come up with a few default
00:06:21.100 | apps, quote unquote.
00:06:21.980 | But that doesn't mean that you can't
00:06:22.860 | have different browsers that are running on it,
00:06:24.780 | just like you can have different chat agents running
00:06:27.660 | on that infrastructure.
00:06:29.180 | And so there will be a few default apps,
00:06:31.020 | but there will also be, potentially,
00:06:32.520 | a vibrant ecosystem of all kinds of apps
00:06:34.580 | that are fine-tuned to all the different nooks
00:06:35.740 | and crannies of the economy.
00:06:36.980 | And I really like the analogy of the early iPhone apps
00:06:40.380 | and what they looked like.
00:06:41.500 | And they were all kind of like jokes.
00:06:43.080 | And it took time for that to develop.
00:06:44.980 | And I think, absolutely, I'd agree
00:06:46.620 | that we're going through the same thing right now.
00:06:48.180 | People are trying to figure out, what is this thing good at?
00:06:50.060 | What is it not good at?
00:06:51.620 | How do I work it?
00:06:52.580 | How do I program with it?
00:06:53.620 | How do I debug it?
00:06:54.420 | How do I just actually get it to perform real tasks?
00:06:59.180 | And what kind of oversight-- because it's quite autonomous,
00:07:01.660 | but not fully autonomous.
00:07:02.500 | So what does the oversight look like?
00:07:03.700 | What does the evaluation look like?
00:07:04.940 | So there's many things to think through and just
00:07:06.940 | to understand the psychology of it.
00:07:08.640 | And I think that's what's going to take some time to figure out
00:07:11.220 | exactly how to work with this infrastructure.
00:07:13.500 | So I think we'll see that over the next few years.
00:07:16.660 | So the race is on right now with LLMs--
00:07:18.800 | OpenAI, Anthropic, Mistral, Llama, Gemini--
00:07:23.500 | the whole ecosystem of open source models,
00:07:26.220 | now a whole long tail of small models.
00:07:28.700 | How do you foresee the future of the ecosystem playing out?
00:07:32.620 | So again, I think the operating systems analogy
00:07:35.880 | is interesting, because we have, say--
00:07:37.540 | we have basically an oligopoly of a few proprietary systems,
00:07:40.260 | like, say, Windows, Mac OS, et cetera.
00:07:42.460 | And then we also have Linux.
00:07:44.340 | And Linux has an infinity of distributions.
00:07:47.380 | And so I think maybe it's going to look something like that.
00:07:49.540 | I also think we have to be careful with the naming,
00:07:51.620 | because a lot of the ones that you listed,
00:07:53.100 | like Llama, Mistral, and so on, I wouldn't actually
00:07:54.660 | say they're open source, right?
00:07:55.960 | And so it's kind of like tossing over a binary
00:07:59.100 | for an operating system.
00:08:00.380 | Like, you can kind of work with it, and it's useful,
00:08:04.340 | but it's not fully useful, right?
00:08:06.740 | And there are a number of what I would say
00:08:11.140 | is fully open source LLMs.
00:08:14.020 | So there's the Pythia models, LLM360, OLMo, et cetera.
00:08:20.060 | And they're fully releasing the entire infrastructure that's
00:08:22.560 | required to compile the operating system,
00:08:24.840 | to train the model from the data,
00:08:26.180 | to gather the data, et cetera.
00:08:28.260 | And so when you're just given a binary,
00:08:30.100 | it's much better than nothing, of course, because you can fine-tune
00:08:32.520 | the model, which is useful.
00:08:33.740 | But also, I think it's subtle, but you can't fully
00:08:36.380 | fine-tune the model, because the more you fine-tune the model,
00:08:39.260 | the more it's going to start regressing on everything else.
00:08:42.240 | And so what you actually really want to do, for example,
00:08:44.620 | if you want to add capability, and not regress
00:08:47.220 | the other capabilities, you may want
00:08:48.660 | to train on some kind of like a mixture of the previous data
00:08:52.820 | set distribution and the new data set distribution.
00:08:54.980 | Because you don't want to regress the old distribution,
00:08:56.460 | you just want to add knowledge.
00:08:57.780 | And if you're just given the weights,
00:08:59.100 | you can't do that, actually.
00:09:00.420 | You need the training loop, you need the data set, et cetera.
00:09:02.660 | So you are actually constrained in how
00:09:03.780 | you can work with these models.
00:09:05.120 | And again, I think it's definitely helpful,
00:09:08.260 | but I think we need slightly better language for it, almost.
00:09:12.140 | So there's open weights models, open source models,
00:09:14.700 | and then proprietary models, I guess.
00:09:17.020 | And that might be the ecosystem.
00:09:20.340 | And yeah, probably it's going to look very similar to the ones
00:09:22.840 | that we have today.
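A rough sketch of the earlier point about why weights alone constrain you: to add a capability without regressing everything else, you would fine-tune on a blend of the original data distribution and the new data, which is only possible if the full data pipeline is released. The dataset objects and training call below are hypothetical stand-ins:

```python
import random

def mixed_batches(old_dataset, new_dataset, new_fraction=0.3, batch_size=32):
    """Yield batches mixing the new-capability data with the original
    training distribution, so the model adds knowledge without regressing."""
    while True:
        batch = []
        for _ in range(batch_size):
            source = new_dataset if random.random() < new_fraction else old_dataset
            batch.append(random.choice(source))
        yield batch

# Hypothetical usage:
# for batch in mixed_batches(original_data, new_capability_data):
#     loss = model.training_step(batch)  # needs the released training loop, not just weights
```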
00:09:24.300 | And hopefully you'll continue to help build some of that out.
00:09:27.760 | So I'd love to address the other elephant in the room, which
00:09:30.120 | is scale.
00:09:31.280 | Simplistically, it seems like scale is all that matters.
00:09:33.800 | Scale of data, scale of compute, and therefore
00:09:36.480 | the large research labs, large tech giants
00:09:38.640 | have an immense advantage today.
00:09:41.200 | What is your view of that?
00:09:42.520 | And is that all that matters?
00:09:44.480 | And if not, what else does?
00:09:47.880 | So I would say scale is definitely number one.
00:09:51.440 | I do think there are details there to get right.
00:09:53.400 | And I think a lot also goes into the data set preparation
00:09:57.760 | and so on, making it very good and clean, et cetera.
00:10:00.080 | That matters a lot.
00:10:01.120 | These are all compute efficiency gains that you can get.
00:10:04.100 | So there's the data, the algorithms,
00:10:05.600 | and then, of course, the training of the model
00:10:08.440 | and making it really large.
00:10:09.560 | So I think scale will be the primary determining factor.
00:10:11.560 | It's like the first principal component of things, for sure.
00:10:14.280 | But there are many of the other things
00:10:16.600 | that you need to get right.
00:10:19.160 | So it's almost like the scale sets some kind of a speed
00:10:21.420 | limit.
00:10:22.980 | But you do need some of the other things.
00:10:24.680 | But it's like, if you don't have the scale,
00:10:26.120 | then you fundamentally just can't
00:10:27.420 | train some of these massive models
00:10:28.960 | if you are going to be training models.
00:10:31.040 | If you're just going to be doing fine-tuning and so on,
00:10:33.380 | then I think maybe less scale is necessary.
00:10:36.240 | But we haven't really seen that just yet fully play out.
00:10:39.160 | And can you share more about some of the ingredients
00:10:41.280 | that you think also matter, maybe lower in priority
00:10:44.040 | behind scale?
00:10:47.020 | Yeah, so the first thing, I think,
00:10:49.280 | is you can't just train these models.
00:10:51.760 | If you're just given the money and the scale,
00:10:53.720 | it's actually still really hard to build these models.
00:10:55.960 | And part of it is that the infrastructure is still so new.
00:10:58.320 | And it's still being developed and not quite there.
00:10:59.960 | But training these models at scale is extremely difficult.
00:11:02.880 | And it's a very complicated distributed optimization
00:11:05.520 | problem.
00:11:06.160 | And there's actually-- the talent for this
00:11:07.900 | is fairly scarce right now.
00:11:09.560 | And it just basically turns into this insane thing running
00:11:13.560 | on tens of thousands of GPUs.
00:11:15.080 | All of them are failing at random
00:11:16.540 | at different points in time.
00:11:17.460 | And so instrumenting that and getting that to work
00:11:19.640 | is actually an extremely difficult challenge.
00:11:22.120 | GPUs were not intended for 10,000 GPU workloads
00:11:26.040 | until very recently.
00:11:27.480 | And so I think a lot of the infrastructure
00:11:30.320 | is creaking under that pressure.
00:11:32.320 | And we need to work through that.
00:11:34.040 | But right now, if you just give someone
00:11:35.280 | a ton of money or a ton of scale or GPUs,
00:11:36.880 | it's not obvious to me that they can just
00:11:38.080 | produce one of these models, which
00:11:39.500 | is why it's not just about scale.
00:11:42.560 | You actually need a ton of expertise,
00:11:44.680 | both on the infrastructure side, the algorithm side,
00:11:48.100 | and then the data side, and being careful with that.
00:11:50.500 | So I think those are the major components.
00:11:52.980 | The ecosystem is moving so quickly.
00:11:55.500 | Even some of the challenges we thought existed a year ago
00:11:58.180 | are being solved more and more today--
00:12:00.680 | hallucinations, context windows, multimodal capabilities,
00:12:04.500 | inference getting better, faster, cheaper.
00:12:07.940 | What are the LLM research challenges today
00:12:11.620 | that keep you up at night?
00:12:12.780 | What do you think are meaty enough problems, but also
00:12:15.360 | solvable problems, that we can continue to go after?
00:12:19.600 | So I would say on the algorithm side,
00:12:21.220 | one thing I'm thinking about quite a bit
00:12:22.920 | is this distinct split between diffusion models
00:12:26.640 | and autoregressive models.
00:12:27.800 | They're both ways of representing probability
00:12:29.320 | distributions.
00:12:30.200 | And it just turns out that different modalities
00:12:32.120 | are apparently a good fit for one of the two.
00:12:34.840 | I think that there's probably some space to unify them
00:12:37.360 | or to connect them in some way.
00:12:40.000 | And also, get some best of both worlds,
00:12:44.080 | or figure out how we can get a hybrid architecture, and so on.
00:12:48.160 | So it's just odd to me that we have
00:12:49.920 | two separate points in the space of models.
00:12:52.720 | And they're both extremely good.
00:12:54.280 | And it just feels wrong to me that there's nothing in between.
00:12:57.040 | So I think we'll see that carved out.
00:12:58.700 | And I think there are interesting problems there.
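For concreteness, the two objective families being contrasted, written in their standard textbook forms (not notation from the talk), where x_t in the diffusion loss is the data corrupted with noise at level t:

```latex
\mathcal{L}_{\text{AR}} = -\sum_{t} \log p_\theta(x_t \mid x_{<t})
\qquad\qquad
\mathcal{L}_{\text{diff}} = \mathbb{E}_{t,\,\epsilon}\,\big\| \epsilon - \epsilon_\theta(x_t, t) \big\|^2
```

Both are ways of fitting a probability distribution over data: one factorizes it token by token, the other learns to invert a gradual noising process, which is why a hybrid or unified view seems plausible.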
00:13:00.840 | And then the other thing that maybe I would point to
00:13:03.000 | is there's still a massive gap in just
00:13:08.080 | the energetic efficiency of running all this stuff.
00:13:10.760 | So my brain is 20 watts, roughly.
00:13:13.280 | Jensen was just talking at GTC about the massive supercomputers
00:13:16.220 | that they're going to be building now.
00:13:17.800 | These are-- the numbers are in megawatts, right?
00:13:21.420 | And so maybe you don't need all that to run a brain.
00:13:23.680 | I don't know how much you need exactly.
00:13:25.480 | But I think it's safe to say we're probably
00:13:27.320 | off by a factor of 1,000 to a million somewhere there,
00:13:29.840 | in terms of the efficiency of running these models.
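The rough arithmetic behind that range, with an illustrative cluster size (the 10 MW figure below is an assumption for the example, not a number from the talk):

```python
brain_watts = 20
cluster_watts = 10e6                   # assume a ~10 MW cluster for illustration
print(cluster_watts / brain_watts)     # 500000.0 -- within the 1,000x to 1,000,000x gap
```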
00:13:32.820 | And I think part of it is just because the computers we've
00:13:35.240 | designed, of course, are just not
00:13:36.620 | a good fit for this workload.
00:13:38.960 | And I think NVIDIA GPUs are a good step in that direction,
00:13:44.240 | in terms of you need extremely high parallelism.
00:13:46.520 | We don't actually care about sequential computation that
00:13:48.860 | is data-dependent in some way.
00:13:50.840 | We just have these--
00:13:52.480 | we just need to blast the same algorithm
00:13:54.780 | across many different array elements,
00:13:57.960 | or something like that-- you can think about it that way.
00:13:59.840 | So I would say number one is just
00:14:01.720 | adapting the computer architecture to the new data
00:14:04.480 | workflows.
00:14:05.020 | Number two is pushing on a few things that we're currently
00:14:07.480 | seeing improvements on.
00:14:08.640 | So number one, maybe, is precision.
00:14:10.600 | We're seeing precision come down from what
00:14:12.280 | was originally 64-bit doubles.
00:14:15.180 | We're now down to--
00:14:16.560 | I don't know what it is--
00:14:17.520 | 4, 5, 6, or even 1.58, depending on which papers you read.
00:14:20.920 | And so I think precision is one big lever
00:14:22.960 | of getting a handle on this.
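A back-of-the-envelope look at what the precision lever alone does to weight memory (illustrative parameter count; weights only, ignoring activations and overhead):

```python
params = 70e9                           # assume a 70B-parameter model
for bits in (64, 32, 16, 8, 4, 1.58):   # fp64 down to the ~1.58-bit schemes in recent papers
    print(f"{bits:>5} bits -> {params * bits / 8 / 1e9:7.1f} GB of weights")
```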
00:14:25.160 | And then the second one, of course, is sparsity.
00:14:27.360 | So that's also another big delta, I would say.
00:14:29.520 | Your brain is not always fully activated.
00:14:31.440 | And so sparsity, I think, is another big lever.
00:14:33.440 | But then the last lever, I also feel
00:14:34.940 | like just the von Neumann architecture of computers
00:14:37.200 | and how they're built, where you're shuttling data in and out
00:14:39.080 | and doing a ton of data movement between memory
00:14:41.000 | and the cores that are doing all the compute.
00:14:42.880 | This is all broken as well, and it's not how your brain works.
00:14:45.400 | And that's why it's so efficient.
00:14:46.720 | And so I think it should be a very exciting time
00:14:48.440 | in computer architecture.
00:14:49.480 | I'm not a computer architect.
00:14:50.680 | But I think it seems like we're off
00:14:52.800 | by a factor of a million, 1,000 to a million,
00:14:54.680 | something like that.
00:14:55.560 | And there should be really exciting innovations there
00:14:59.960 | that bring that down.
00:15:02.200 | I think there are at least a few builders in the audience
00:15:04.540 | working on this problem.
00:15:06.600 | OK, switching gears a little bit,
00:15:08.280 | you've worked alongside many of the greats of our generation--
00:15:12.480 | Sam, Greg from OpenAI, and the rest of the OpenAI team,
00:15:15.540 | Elon Musk.
00:15:17.100 | Who here knows the joke about the rowing team,
00:15:20.680 | the American team versus the Japanese team?
00:15:25.080 | OK, great.
00:15:25.640 | So this will be a good one.
00:15:27.080 | Elon shared this at our last Base Camp.
00:15:28.920 | And I think it reflects a lot of his philosophy
00:15:31.080 | around how he builds cultures and teams.
00:15:33.720 | So you have two teams.
00:15:35.000 | The Japanese team has four rowers and one steerer.
00:15:38.360 | And the American team has four steerers and one rower.
00:15:42.880 | And can anyone guess, when the American team loses,
00:15:45.920 | what do they do?
00:15:48.640 | Shout it out.
00:15:51.240 | Exactly.
00:15:52.080 | They fire the rower.
00:15:53.520 | And Elon shared this example, I think,
00:15:55.840 | as a reflection of how he thinks about hiring
00:15:57.880 | the right people, building the right people,
00:16:00.280 | building the right teams at the right ratio.
00:16:03.640 | From working so closely with folks
00:16:05.400 | like these incredible leaders, what have you learned?
00:16:10.360 | Yeah, so I would say, definitely,
00:16:12.000 | Elon runs his companies in an extremely unique style.
00:16:14.400 | I don't actually think that people
00:16:15.820 | appreciate how unique it is.
00:16:17.280 | You sort of even read about it in some way,
00:16:19.080 | but you don't understand it, I think.
00:16:21.160 | It's even hard to describe.
00:16:22.400 | I don't even know where to start.
00:16:23.780 | But it's a very unique, different thing.
00:16:25.560 | I like to say that he runs the biggest startups.
00:16:28.240 | And I think it's just--
00:16:32.880 | I don't even know, basically, how to describe it.
00:16:35.440 | It almost feels like it's a longer sort of thing
00:16:37.280 | that I have to think through.
00:16:38.480 | But number one is, so he likes very small, strong, highly
00:16:42.280 | technical teams.
00:16:44.640 | So that's number one.
00:16:45.720 | So I would say, at companies, by default,
00:16:49.920 | the teams grow and they get large.
00:16:52.120 | Elon was always a force against growth.
00:16:54.040 | I would have to work and expend effort to hire people.
00:16:56.680 | I would have to basically plead to hire people.
00:16:59.860 | And then the other thing is that big companies, usually,
00:17:02.160 | you want--
00:17:03.280 | it's really hard to get rid of low performers.
00:17:05.200 | And I think Elon is very friendly to, by default,
00:17:08.200 | getting rid of low performers.
00:17:09.480 | So I actually had to fight for people
00:17:10.980 | to keep them on the team, because he would, by default,
00:17:13.280 | want to remove people.
00:17:15.160 | And so that's one thing.
00:17:16.440 | So keep a small, strong, highly technical team.
00:17:19.200 | No middle management
00:17:20.200 | that is kind of non-technical, for sure.
00:17:23.360 | So that's number one.
00:17:24.360 | Number two is the vibes of how everything runs
00:17:27.160 | and how it feels when he walks into the office.
00:17:29.440 | He wants it to be a vibrant place.
00:17:31.120 | People are walking around.
00:17:32.880 | They're pacing around.
00:17:34.240 | They're working on exciting stuff.
00:17:36.040 | They're charting something.
00:17:37.200 | They're coding.
00:17:38.080 | He doesn't like stagnation.
00:17:39.320 | He doesn't like for it to look that way.
00:17:41.640 | He doesn't like large meetings.
00:17:43.480 | He always encourages people to leave meetings
00:17:45.520 | if they're not being useful.
00:17:46.960 | And you actually do see this.
00:17:49.200 | If it's a large meeting
00:17:50.320 | and you're not contributing and you're not learning,
00:17:52.400 | just walk out.
00:17:53.040 | And this is fully encouraged.
00:17:54.680 | And I think this is something that you don't normally see.
00:17:57.160 | So I think vibes is a second big lever that I think he really
00:18:00.720 | instills culturally.
00:18:02.320 | Maybe part of that also is, I think a lot of big companies,
00:18:06.480 | they pamper their employees.
00:18:06.480 | I think there's much less of that.
00:18:08.960 | The culture of it is you're there
00:18:10.280 | to do your best technical work.
00:18:12.240 | And there's the intensity and so on.
00:18:15.840 | And I think maybe the last one that
00:18:17.360 | is very unique and very interesting and very strange
00:18:19.560 | is just how connected he is to the team.
00:18:23.120 | So usually, a CEO of a company is a remote person,
00:18:27.360 | five layers up, who talks to their VPs,
00:18:29.440 | who talk to their reports and directors.
00:18:32.000 | And eventually, you talk to your manager.
00:18:33.760 | That's not how it is at Elon's companies, right?
00:18:35.520 | He will come to the office.
00:18:37.080 | He will talk to the engineers.
00:18:38.760 | Many of the meetings that we had were like, OK,
00:18:42.120 | 50 people in the room with Elon.
00:18:44.120 | And he talks directly to the engineers.
00:18:46.800 | He doesn't want to talk just to the VPs and the directors.
00:18:50.000 | So normally, people would spend like 99% of the time
00:18:54.320 | maybe talking to the VPs.
00:18:55.400 | He spends maybe 50% of the time.
00:18:56.720 | And he just wants to talk to the engineers.
00:18:58.480 | So if the team is small and strong,
00:19:00.600 | then engineers and the code are the source of truth.
00:19:03.320 | And so they have the source of truth, not some manager.
00:19:05.840 | And he wants to talk to them to understand
00:19:08.240 | the actual state of things and what
00:19:10.200 | should be done to improve it.
00:19:11.960 | So I would say the degree to which
00:19:13.520 | he's connected with the team and not something remote
00:19:15.680 | is also unique.
00:19:16.840 | And also, just like his large hammer and his willingness
00:19:20.360 | to exercise it within the organization.
00:19:22.320 | So maybe if he talks to the engineers
00:19:24.400 | and they bring up that, what's blocking you?
00:19:26.840 | OK, I just don't have enough GPUs to run my thing.
00:19:29.080 | And he's like, oh, OK.
00:19:30.680 | And if he hears that twice, he's going to be like, OK,
00:19:33.080 | this is a problem.
00:19:34.040 | So what is our timeline?
00:19:35.920 | And when you don't have satisfying answers, he's like,
00:19:38.360 | OK, I want to talk to the person in charge of the GPU cluster.
00:19:41.240 | And someone dials the phone.
00:19:42.840 | And he's just like, OK, double the cluster right now.
00:19:45.280 | [LAUGHTER]
00:19:47.400 | Like, let's have a meeting tomorrow.
00:19:49.040 | From now on, send me daily updates until the cluster
00:19:51.200 | is twice the size.
00:19:53.240 | And then they push back.
00:19:54.400 | And they're like, OK, well, we have this procurement set up.
00:19:56.600 | We have this timeline.
00:19:57.480 | And NVIDIA says that we don't have enough GPUs.
00:20:00.000 | And it will take six months or something.
00:20:02.280 | And then you get a rise of an eyebrow.
00:20:04.120 | And then he's like, OK, I want to talk to Jensen.
00:20:06.160 | And then he just removes bottlenecks.
00:20:08.080 | So I think the extent to which he's extremely involved
00:20:10.840 | and removes bottlenecks and applies his hammer,
00:20:13.080 | I think is also not appreciated.
00:20:15.040 | So I think there's a lot of these kinds of aspects
00:20:16.820 | that are very unique, I would say, and very interesting.
00:20:19.200 | And honestly, going to a normal company outside of that,
00:20:24.120 | you definitely miss aspects of that.
00:20:26.800 | And so I think, yeah, maybe that's a long rant.
00:20:30.760 | But that's just kind of like--
00:20:32.000 | I don't think I hit all the points.
00:20:33.460 | But it is a very unique thing.
00:20:34.760 | And it's very interesting.
00:20:36.560 | And yeah, I guess that's my rant.
00:20:40.360 | Hopefully, tactics that most people here can employ.
00:20:44.640 | Taking a step back, you've helped
00:20:46.160 | build some of the most generational companies.
00:20:48.360 | You've also been such a key enabler
00:20:49.960 | for many people, many of whom are in the audience today,
00:20:52.780 | of getting into the field of AI.
00:20:55.360 | Knowing you, what you care most about
00:20:57.400 | is democratizing access to AI--
00:21:00.440 | education, tools, helping create more quality
00:21:04.360 | in the whole ecosystem.
00:21:05.840 | At large, there are many more winners.
00:21:08.700 | As you think about the next chapter in your life,
00:21:11.000 | what gives you the most meaning?
00:21:13.960 | Yeah, I think you've described it in the right way.
00:21:17.520 | Where my brain goes by default is--
00:21:21.300 | I've worked for a few companies.
00:21:22.720 | But I think, ultimately, I care not
00:21:24.560 | about any one specific company.
00:21:25.880 | I care a lot more about the ecosystem.
00:21:27.460 | I want the ecosystem to be healthy.
00:21:29.000 | I want it to be thriving.
00:21:30.020 | I want it to be like a coral reef
00:21:31.440 | of a lot of cool, exciting startups
00:21:33.040 | and all the nooks and crannies of the economy.
00:21:35.040 | And I want the whole thing to be like this boiling
00:21:37.120 | soup of cool stuff.
00:21:38.840 | Genuinely, Andrej dreams about coral reefs.
00:21:43.200 | I want it to be like a cool place.
00:21:44.600 | And I think that's why I love startups and I love companies.
00:21:48.040 | And I want there to be a vibrant ecosystem of them.
00:21:50.760 | And by default, I would say I'm a little bit more hesitant
00:21:54.520 | about five megacorps taking over.
00:22:01.800 | Especially with AGI being such a magnifier of power,
00:22:05.760 | I'm worried about what that could look like and so on.
00:22:08.760 | So I have to think that through more.
00:22:10.480 | But yeah, I love the ecosystem.
00:22:13.480 | And I want it to be healthy and vibrant.
00:22:16.000 | Amazing.
00:22:17.120 | We'd love to have some questions from the audience.
00:22:20.120 | Yes, Brian.
00:22:21.520 | Hi, I'm Brian Halligan.
00:22:24.240 | Would you recommend founders follow Elon's management
00:22:27.520 | methods?
00:22:28.120 | Or is it kind of unique to him, and you
00:22:30.240 | shouldn't try to copy him?
00:22:31.320 | Yeah, I think that's a good question.
00:22:39.760 | I think it's up to the DNA of the founder.
00:22:41.520 | Like, you have to have that same kind of a DNA
00:22:43.440 | and that same kind of vibe.
00:22:44.720 | And I think when you're hiring the team,
00:22:46.420 | it's really important that you're
00:22:48.960 | making it clear upfront that this is the kind of company
00:22:51.160 | that you have.
00:22:51.920 | And when people sign up for it, they're
00:22:54.800 | very happy to go along with it, actually.
00:22:56.520 | But if you change it later, I think
00:22:58.040 | people are unhappy with that.
00:22:59.240 | And that's very messy.
00:23:00.360 | So as long as you do it from the start and you're consistent,
00:23:02.860 | I think you can run a company like that.
00:23:05.280 | And it has its own pros and cons as well.
00:23:12.560 | And I think-- so up to the people.
00:23:17.880 | But I think it's a consistent model of company
00:23:21.360 | building and running.
00:23:23.800 | Yes, Alex.
00:23:28.160 | I'm curious if there are any types of model composability
00:23:31.600 | that you're really excited about,
00:23:33.640 | maybe other than mixture of experts.
00:23:35.800 | I'm not sure what you think about model merges,
00:23:38.320 | Franken merges, or any other things
00:23:41.160 | to make model development more composable.
00:23:44.040 | Yeah, that's a good question.
00:23:46.720 | I see papers in this area, but I don't know that anything
00:23:49.480 | has really stuck.
00:23:50.240 | Maybe the composability-- I don't
00:23:51.560 | know exactly what you mean.
00:23:52.680 | But there's a ton of work on parameter-efficient training
00:23:56.040 | and things like that.
00:23:56.880 | I don't know if you would put that
00:23:57.880 | in the category of composability in the way I understand it.
00:24:01.380 | It's certainly the case that, like, traditional code is very
00:24:04.140 | composable.
00:24:04.980 | And I would say neural nets are a lot more fully connected
00:24:08.180 | and less composable by default.
00:24:10.180 | But they do compose and can fine tune as a part of a whole.
00:24:13.300 | So as an example, if you're doing, like,
00:24:15.060 | a system that you want to have [INAUDIBLE]
00:24:17.180 | and just images or something like that,
00:24:18.840 | it's very common that you pre-train components.
00:24:20.860 | And then you plug them in and fine tune maybe
00:24:22.220 | through the whole thing, as an example.
00:24:23.900 | So there's composability in those aspects
00:24:25.320 | where you can pre-train small pieces of the cortex
00:24:27.520 | outside and compose them later.
00:24:29.300 | Also through initialization and fine tuning.
00:24:31.180 | So I think to some extent, it's--
00:24:33.460 | so maybe those are my scattered thoughts on it.
00:24:35.420 | But I don't know if I have anything very coherent
00:24:37.620 | otherwise.
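A sketch of the kind of composition described above: pre-train pieces separately, join them through a small new interface layer, then fine-tune through the whole thing. PyTorch-style pseudocode; the encoder, language model, and their attributes are hypothetical stand-ins, not real library classes:

```python
import torch.nn as nn

class VisionLanguageModel(nn.Module):
    def __init__(self, vision_encoder, language_model, lm_dim=4096):
        super().__init__()
        self.vision = vision_encoder                           # pre-trained separately
        self.proj = nn.Linear(vision_encoder.out_dim, lm_dim)  # new, randomly initialized
        self.lm = language_model                               # pre-trained separately

    def forward(self, image, tokens):
        vis_tokens = self.proj(self.vision(image))             # compose at the interface
        return self.lm(prefix=vis_tokens, tokens=tokens)       # then fine-tune end to end
```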
00:24:38.120 | Yes, Nick.
00:24:42.060 | So we've got these next word prediction things.
00:24:45.740 | Do you think there's a path towards building
00:24:47.620 | a physicist or a von Neumann type model that
00:24:50.420 | has a mental model of physics that's self-consistent
00:24:53.460 | and can generate new ideas for how you actually do fusion?
00:24:56.420 | How do you get faster than light travel,
00:24:58.620 | if it's even possible?
00:24:59.900 | Is there any path towards that?
00:25:01.900 | Or is it a fundamentally different vector
00:25:04.180 | in terms of these AI model developments?
00:25:06.500 | I think it's fundamentally different in one aspect.
00:25:08.900 | I guess what you're talking about maybe
00:25:10.100 | is just a capability question.
00:25:11.580 | Because the current models are just not good enough.
00:25:13.740 | And I think there are big rocks to be turned here.
00:25:15.980 | And I think people still haven't really seen what's
00:25:18.540 | possible in this space at all.
00:25:21.620 | And roughly speaking, I think we've done step one of AlphaGo.
00:25:25.460 | That is, we've done the imitation learning part.
00:25:28.180 | There's step two of AlphaGo, which is the RL.
00:25:31.140 | And people haven't done that yet.
00:25:32.580 | And I think it's going to fundamentally--
00:25:33.940 | this is the part that actually made it work
00:25:35.780 | and made something superhuman.
00:25:37.620 | And so I think there's big rocks and capability
00:25:43.660 | to still be turned over here.
00:25:47.300 | And the details of that are kind of tricky, potentially.
00:25:51.060 | But I think we just haven't done step two of AlphaGo,
00:25:53.460 | long story short.
00:25:54.220 | And we've just done imitation.
00:25:55.500 | And I don't think that people appreciate--
00:25:57.340 | for example, number one, how terrible the data collection
00:26:00.580 | is for things like Chai-CPT.
00:26:02.940 | Say you have a problem.
00:26:04.140 | Some prompt is some kind of a mathematical problem.
00:26:06.300 | A human comes in and gives the ideal solution
00:26:09.740 | to that problem.
00:26:10.660 | The problem is that the human psychology is different
00:26:12.980 | from the model psychology.
00:26:14.300 | What's easy or hard for the human
00:26:16.300 | are different to what's easy or hard for the model.
00:26:18.900 | And so the human kind of fills out some kind of a trace
00:26:22.020 | that comes to the solution.
00:26:23.540 | But some parts of that are trivial to the model.
00:26:25.540 | And some parts of that are a massive leap
00:26:26.900 | that the model doesn't understand.
00:26:28.340 | And so you're kind of just losing it.
00:26:30.580 | And then everything else is polluted by that later.
00:26:32.900 | And so fundamentally, what you need
00:26:34.420 | is the model needs to practice itself
00:26:38.260 | how to solve these problems.
00:26:39.900 | It needs to figure out what works for it
00:26:41.660 | or does not work for it.
00:26:43.060 | Maybe it's not very good at four-digit addition,
00:26:45.820 | so it's going to fall back and use a calculator.
00:26:48.180 | But it needs to learn that for itself based
00:26:49.980 | on its own capability and its own knowledge.
00:26:51.900 | So that's number one.
00:26:52.780 | That's totally broken, I think.
00:26:54.780 | It's a good initializer, though, for something agent-like.
00:26:57.620 | And then the other thing is we're doing reinforcement
00:26:59.860 | learning from human feedback.
00:27:01.220 | But that's a super weak form of reinforcement learning.
00:27:03.620 | It doesn't even count as reinforcement learning,
00:27:05.620 | I think.
00:27:06.380 | What is the equivalent in AlphaGo for RLHF?
00:27:09.980 | What is the reward model?
00:27:12.620 | What I call it is a vibe check.
00:27:15.660 | If you wanted to train an AlphaGo RLHF,
00:27:17.980 | you would be giving two people two boards and said,
00:27:21.140 | which one do you prefer?
00:27:22.260 | And then you would take those labels
00:27:23.220 | and you would train the model.
00:27:24.180 | And then you would RL against that.
00:27:25.660 | Well, what are the issues with that?
00:27:27.180 | It's like, number one, it's just vibes of the board.
00:27:29.980 | That's what you're training against.
00:27:31.060 | Number two, if it's a reward model that's a neural net,
00:27:33.460 | then it's very easy to overfit to that reward model
00:27:36.180 | for the model you're optimizing over.
00:27:37.700 | And it's going to find all these spurious ways of hacking
00:27:42.140 | that massive model, and that is the problem.
00:27:44.500 | So AlphaGo gets around these problems
00:27:46.660 | because they have a very clear objective function
00:27:48.660 | that you can RL against.
00:27:50.380 | So RLHF is like nowhere near, I would say, RL.
00:27:53.060 | It's like silly.
00:27:54.140 | And the other thing is imitation learning, super silly.
00:27:56.380 | RLHF is a nice improvement, but it's still silly.
00:27:59.780 | And I think people need to look for better ways of training
00:28:02.580 | these models so that it's in the loop with itself
00:28:04.420 | and its own psychology.
00:28:05.580 | And I think there will probably be unlocks in that direction.
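For reference, the "vibe check" being described is a reward model trained on pairwise human preferences, which the policy is then optimized against. A minimal sketch of that preference objective (a Bradley-Terry style loss); reward_model, the policy, and the data are hypothetical stand-ins:

```python
import math

def preference_loss(reward_chosen, reward_rejected):
    # Maximize the probability that the human-preferred completion scores higher.
    return -math.log(1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected))))

# Hypothetical training loop:
# for prompt, chosen, rejected in preference_data:
#     loss = preference_loss(reward_model(prompt, chosen),
#                            reward_model(prompt, rejected))
# The policy is then pushed (e.g. with PPO) to maximize the reward model's score,
# which is exactly where the over-optimization and reward hacking creep in.
```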
00:28:09.300 | So it's sort of like graduate school for AI models.
00:28:11.700 | It needs to sit in a room with a book
00:28:13.660 | and quietly question itself for a decade?
00:28:15.820 | Yeah.
00:28:17.300 | I think that would be part of it, yes.
00:28:18.940 | And I think when you are learning stuff
00:28:20.620 | and you're going through textbooks,
00:28:22.940 | there's exercises in the textbook.
00:28:24.180 | Where are those?
00:28:24.880 | Those are prompts to you to exercise the material, right?
00:28:28.740 | And when you're learning material,
00:28:30.140 | not just reading left to right, number one, you're exercising.
00:28:33.100 | But maybe you're taking notes.
00:28:34.340 | You're rephrasing, reframing.
00:28:36.220 | You're doing a lot of manipulation of this knowledge
00:28:38.300 | in a way of you learning that knowledge.
00:28:41.020 | And we haven't seen equivalence of that at all in LLMs.
00:28:43.700 | So it's super early days, I think.
00:28:45.620 | Mm-hmm.
00:28:46.120 | Yes, Yuzi?
00:28:51.140 | Yeah, it's cool to be optimal and practical at the same time.
00:28:59.420 | So I would be asking, how would you
00:29:01.060 | be aligning the priority of A, either doing cost reduction
00:29:04.420 | and revenue generation, or B, finding the better quality
00:29:07.980 | models with better reasoning capabilities?
00:29:10.180 | How would you be aligning that?
00:29:12.440 | So maybe I understand the question.
00:29:13.940 | I think what I see a lot of people
00:29:14.940 | do is they start out with the most capable model that
00:29:18.980 | doesn't matter what the cost is.
00:29:20.320 | So you use GPT-4, you use super prompting, et cetera.
00:29:23.740 | You do RAG, et cetera.
00:29:24.700 | So you're just trying to get your thing to work.
00:29:26.700 | So you're going after accuracy first.
00:29:30.420 | And then you make concessions later.
00:29:31.980 | You check if you can fall back to 3.5
00:29:33.660 | for certain types of queries.
00:29:36.460 | And you make it cheaper later.
00:29:38.300 | So I would say, go after performance first.
00:29:40.620 | And then you make it cheaper later.
00:29:43.740 | It's kind of like the paradigm that I've seen--
00:29:45.580 | a few people that I've talked to about this say works for them.
00:29:51.540 | And maybe it's not even just a single prompt.
00:29:53.460 | I like to think about, what are the ways in which you can even
00:29:56.380 | just make it work at all?
00:29:57.700 | Because if you just can make it work at all,
00:29:59.500 | say you make 10 prompts or 20 prompts,
00:30:01.460 | and you pick the best one, and you have some debate,
00:30:03.100 | or I don't know what kind of a crazy flow
00:30:04.800 | you can come up with, just get your thing to work really well.
00:30:07.540 | Because if you have a thing that works really well,
00:30:09.660 | then one other thing you can do is you can distill that.
00:30:12.420 | So you can get a large distribution
00:30:13.540 | of possible problem types.
00:30:14.740 | You run your super expensive thing on it to get your labels.
00:30:16.860 | And then you get a smaller, cheaper thing
00:30:18.580 | that you fine-tune on it.
00:30:20.020 | And so I would say, I would always
00:30:21.620 | go after getting it to work as well as possible,
00:30:24.300 | no matter what, first.
00:30:25.460 | And then make it cheaper, is the thing I would suggest.
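A sketch of the "get it working expensively first, then distill" recipe just described; big_model, small_model, and their methods are hypothetical stand-ins:

```python
def distill(big_model, small_model, prompts, epochs=3):
    # 1. Label a broad distribution of problem types with the expensive pipeline.
    labeled = [(p, big_model.generate(p)) for p in prompts]

    # 2. Fine-tune the cheaper model to imitate those outputs.
    for _ in range(epochs):
        for prompt, target in labeled:
            small_model.training_step(prompt, target)
    return small_model
```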
00:30:28.540 | Hi, Sam.
00:30:30.300 | One question.
00:30:31.580 | So this past year, we saw a lot of impressive results
00:30:35.220 | from the open source ecosystem.
00:30:36.940 | I'm curious what your opinion is of how
00:30:38.820 | that will continue to keep pace, or not keep pace,
00:30:41.300 | with closed source development as the models continue
00:30:44.060 | to improve in scale?
00:30:46.980 | Yeah, I think that's a very good question.
00:30:52.140 | I don't really know.
00:30:54.740 | Fundamentally, these models are so capital intensive, right?
00:30:57.180 | Like, one thing that is really interesting is, for example,
00:30:58.780 | you have Facebook and Meta and so on who
00:31:00.720 | can afford to train these models at scale.
00:31:02.920 | But then it's also not part of-- it's not the thing that they do.
00:31:05.300 | And it's not involved-- like, their money printer
00:31:07.340 | is unrelated to that.
00:31:08.740 | And so they have actual incentive
00:31:11.260 | to potentially release some of these models
00:31:13.460 | so that they empower the ecosystem as a whole,
00:31:15.500 | so they can actually borrow all the best ideas.
00:31:17.420 | So that, to me, makes sense.
00:31:19.140 | But so far, I would say they've only just
00:31:21.140 | done the open weights model.
00:31:22.460 | And so I think they should actually go further.
00:31:24.500 | And that's what I would hope to see.
00:31:26.220 | And I think it would be better for everyone.
00:31:28.180 | And I think, potentially, maybe they're
00:31:29.800 | squeamish about some of the aspects of it
00:31:32.820 | eventually with respect to data and so on.
00:31:34.540 | I don't know how to overcome that.
00:31:36.700 | Maybe they should try to just find data sources
00:31:40.900 | that they think are very easy to use or something like that
00:31:44.700 | and try to constrain themselves to those.
00:31:46.400 | So I would say those are kind of our champions, potentially.
00:31:50.580 | And I would like to see more transparency also coming from--
00:31:55.020 | and I think Meta and Facebook are doing pretty well.
00:31:57.180 | They've released papers.
00:31:58.140 | They published a logbook and so on.
00:32:01.460 | So I think they're doing well.
00:32:04.220 | But they could do much better in terms
00:32:06.260 | of fostering the ecosystem.
00:32:07.340 | And I think maybe that's coming.
00:32:08.640 | We'll see.
00:32:10.100 | Peter.
00:32:10.700 | Yeah.
00:32:11.380 | Maybe this is an obvious answer given the previous question.
00:32:13.940 | But what do you think would make the AI ecosystem cooler
00:32:17.300 | and more vibrant?
00:32:18.140 | Or what's holding it back?
00:32:19.420 | Is it openness?
00:32:20.700 | Or do you think there's other stuff that is also a big thing
00:32:23.620 | that you'd want to work on?
00:32:32.540 | Yeah, I certainly think one big aspect is just
00:32:35.020 | like the stuff that's available.
00:32:36.460 | I had a tweet recently about, number one, build the thing.
00:32:39.220 | Number two, build the ramp.
00:32:40.480 | I would say there's a lot of people building a thing.
00:32:42.120 | I would say there's a lot less happening
00:32:43.780 | of building the ramps so that people can actually
00:32:45.420 | understand all this stuff.
00:32:46.500 | And I think we're all new to all of this.
00:32:48.460 | We're all trying to understand how it works.
00:32:50.780 | We all need to ramp up and collaborate to some extent
00:32:53.540 | to figure out how to use this effectively.
00:32:55.620 | So I would love for people to be a lot more open with respect
00:32:59.120 | to what they've learned, how they've trained all this,
00:33:01.980 | what works, what doesn't work for them, et cetera.
00:33:04.700 | And yes, just for us to learn a lot more from each other,
00:33:08.460 | that's number one.
00:33:09.380 | And then number two, I also think
00:33:11.500 | there is quite a bit of momentum in the open ecosystems as well.
00:33:15.520 | So I think that's already good to see.
00:33:17.100 | And maybe there's some opportunities for improvement
00:33:19.020 | I talked about already.
00:33:20.020 | So yeah.
00:33:25.440 | Last question from the audience.
00:33:26.820 | Michael.
00:33:30.220 | To get to the next big performance leap from models,
00:33:34.740 | do you think that it's sufficient to modify
00:33:36.820 | the transformer architecture with, say, thought tokens
00:33:39.620 | or activation beacons?
00:33:40.680 | Or do we need to throw that out entirely
00:33:42.820 | and come up with a new fundamental building block
00:33:44.900 | to take us to the next big step forward or AGI?
00:33:47.820 | Yeah, I think that's a good question.
00:33:52.180 | Well, the first thing I would say is transformer is amazing.
00:33:58.020 | It's just so incredible.
00:33:59.280 | I don't think I would have seen that coming for sure.
00:34:03.300 | For a while before the transformer arrived,
00:34:05.140 | I thought there would be an insane diversification
00:34:07.220 | of neural networks.
00:34:08.340 | And that was not the case.
00:34:09.620 | It's the complete opposite, actually.
00:34:12.260 | It's a complete-- it's all the same model, actually.
00:34:15.740 | So it's incredible to me that we have that.
00:34:17.500 | I don't know that it's the final neural network.
00:34:19.660 | I think there will definitely be--
00:34:22.180 | I would say it's really hard to say that,
00:34:24.540 | given the history of the field, and I've
00:34:26.340 | been in it for a while, it's really hard to say
00:34:27.940 | that this is the end of it.
00:34:29.540 | Absolutely, it's not.
00:34:30.400 | And I feel very optimistic that someone
00:34:33.080 | will be able to find a pretty big change to how
00:34:35.240 | we do things today.
00:34:36.320 | I would say on the front of the autoregressive
00:34:38.160 | or diffusion, which is kind of like the modeling
00:34:40.160 | and the loss setup, I would say there's definitely
00:34:43.080 | some fruit there, probably.
00:34:44.360 | But also on the transformer, and like I mentioned,
00:34:46.840 | these levers of precision and sparsity and as we drive that,
00:34:50.120 | and together with the co-design of the hardware
00:34:52.320 | and how that might evolve, and just making network
00:34:55.320 | architectures that are a lot more sort of well-tuned
00:34:58.760 | to those constraints and how all that works.
00:35:00.760 | To some extent, also, I would say
00:35:05.280 | like transformer is kind of designed for the GPU,
00:35:07.120 | by the way.
00:35:07.620 | That was the big leap, I would say, in the transformer paper.
00:35:10.260 | And that's where they were coming from,
00:35:11.420 | is we want an architecture that is fundamentally
00:35:15.140 | extremely parallelizable.
00:35:15.140 | And because the recurrent neural network
00:35:16.760 | has sequential dependencies, which is terrible for the GPU,
00:35:19.500 | the transformer basically broke that through attention.
00:35:21.760 | And this was like the major sort of insight there.
00:35:25.440 | And it has some predecessors of insights,
00:35:27.360 | like the neural GPU and other papers at Google
00:35:29.560 | that are sort of thinking about this.
00:35:31.100 | But that is a way of targeting the algorithm to the hardware
00:35:34.300 | that you have available.
00:35:35.320 | So I would say that's kind of like in that same spirit.
00:35:37.600 | But long story short, I think it's
00:35:39.560 | very likely we'll see changes to it still.
00:35:42.840 | But it's been proven remarkably resilient.
00:35:45.720 | I have to say, like, it came out many years ago now.
00:35:49.960 | Like, I don't know, six, seven?
00:35:54.080 | Yeah, so you know, like the original transformer
00:35:58.280 | and what we're using today are not super different.
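A schematic of the parallelism point: recurrence carries a dependency along the sequence, while attention reduces to batched matrix products with no such dependency. Plain NumPy, single head, no masking, purely illustrative:

```python
import numpy as np

def rnn_forward(xs, h, step):
    # Each step needs the previous hidden state: inherently sequential.
    for x in xs:
        h = step(h, x)
    return h

def attention_forward(Q, K, V):
    # Every position attends to every other via matrix products:
    # no dependency along the sequence dimension, so it parallelizes well on GPUs.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```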
00:36:03.840 | Yeah.
00:36:05.160 | As a parting message to all the founders and builders
00:36:07.520 | in the audience, what advice would you
00:36:09.640 | give them as they dedicate the rest of their lives
00:36:11.840 | to helping shape the future of AI?
00:36:17.640 | So yeah, I don't usually have crazy generic advice.
00:36:21.560 | I think maybe the thing that's top of my mind is I
00:36:24.560 | think founders, of course, care a lot about their startup.
00:36:28.660 | I also want, like, how do we have a vibrant ecosystem
00:36:31.840 | of startups?
00:36:32.440 | How do startups continue to win, especially with respect
00:36:35.120 | to, like, big tech?
00:36:37.280 | And how does the ecosystem become healthier?
00:36:41.160 | And what can you do?
00:36:43.720 | Sounds like you should become an investor.
00:36:47.120 | Amazing.
00:36:47.840 | Thank you so much for joining us, Andrej, for this
00:36:49.800 | and also for the whole day today.
00:36:51.240 | [APPLAUSE]