Making AI accessible with Andrej Karpathy and Stephanie Zhan


Transcript

I'm thrilled to introduce our next and final speaker, Andrej Karpathy. I think Karpathy probably needs no introduction. Most of us have probably watched his YouTube videos at length. But he's renowned for his research in deep learning. He designed the first deep learning class at Stanford, was part of the founding team at OpenAI, led the computer vision team at Tesla, and is now a mystery man again, now that he has just left OpenAI.

So we're very lucky to have you here. I think, Andrej, you've been such a dream speaker. And so we're excited to have you and Stephanie close out the day. Thank you. Andrej's first reaction as we walked up here was, oh my god, to his picture. It's like a very intimidating photo.

I don't know what year it was taken, but he's impressed. OK, amazing. Andrej, thank you so much for joining us today, and welcome back. Thank you. Fun fact that most people don't actually know-- how many folks here know where OpenAI's original office was? It's amazing. Nick? I'm going to guess right here.

Right here. Right here on the opposite side of our San Francisco office, where actually many of you guys were just in huddles. So this is fun for us, because it brings us back to our roots, back when I first started at Sequoia, and when Andrej first started co-founding OpenAI.

Andrej, in addition to living out the Willy Wonka dream of working atop a chocolate factory, what were some of your favorite moments working from here? Yeah, so OpenAI was right there. And this was the first office after, I guess, Greg's apartment, which maybe doesn't count. And so, yeah, we spent maybe two years here.

And the chocolate factory was just downstairs, so it always smelled really nice. And yeah, I guess the team was 10, 20 plus. And yeah, we had a few very fun episodes here. One of them was alluded to by Jensen at GTC that happened just yesterday or two days ago.

So Jensen was describing how he brought the first DGX and how he delivered it to OpenAI. So that happened right there. So that's where we all signed it. It's in the room over there. So Andrej needs no introduction, but I wanted to give a little bit of backstory on some of his journey to date.

As Sonia had introduced, he was trained by Geoff Hinton and then Fei-Fei. His first claim to fame was his deep learning course at Stanford. He co-founded OpenAI back in 2015. In 2017, he was poached by Elon. I remember this very, very clearly. For folks who don't remember the context then, Elon had just transitioned through six different autopilot leaders, each of whom lasted about six months.

And I remember when Andrej took this job, I thought, congratulations and good luck. Not too long after that, he went back to OpenAI and has been there for the last year. Now, unlike all the rest of us today, he is basking in the ultimate glory of freedom, in both time and responsibility.

And so we're really excited to see what you have to share today. A few things that I appreciate the most from Andrej are that he is an incredible, fascinating, futurist thinker. He is a relentless optimist. And he's a very practical builder. And so I think he'll share some of his insights around that today.

To kick things off, AGI, even seven years ago, seemed like an incredibly impossible task to achieve, even in the span of our lifetimes. Now it seems within sight. What is your view of the future over the next N years? Yes, I think you're right. I think a few years ago, I sort of felt like AGI was-- it wasn't clear how it was going to happen.

It was very sort of academic. And you would think about different approaches. And now I think it's very clear. And there's a lot of space. And everyone is trying to fill it. And so there's a lot of optimization. And I think, roughly speaking, the way things are happening is everyone is trying to build what I refer to as kind of like this LLM OS.

And basically, I like to think of it as an operating system. You have to get a bunch of peripherals that you plug into this new CPU or something like that. The peripherals are, of course, like text, images, audio, and all the modalities. And then you have a CPU, which is the LLM transformer itself.

And then it's also connected to all the software 1.0 infrastructure that we've already built up for ourselves. And so I think everyone is kind of trying to build something like that and then make it available as something that's customizable to all the different nooks and crannies of the economy.

And so I think that's kind of roughly what everyone is trying to build out and what we sort of also heard about earlier today. So I think that's roughly where it's headed: we can bring up and down these relatively self-contained agents that we can give high-level tasks to and specialize in various ways.

So yeah, I think it's going to be very interesting and exciting. And it's not just one agent. It's many agents. And what does that look like? And if that view of the future is true, how should we all be living our lives differently? I don't know. I guess we have to try to build it, influence it, make sure it's good, and just try to make sure it turns out well.

So now that you're a free, independent agent, I want to address the elephant in the room, which is that OpenAI is dominating the ecosystem. And most of our audience here today are founders who are trying to carve out a little niche, praying that OpenAI doesn't take them out overnight.

Where do you think opportunities exist for other players to build new independent companies, versus what areas do you think OpenAI will continue to dominate, even as its ambition grows? Yes, so my high-level impression is basically OpenAI is trying to build out this LLM OS. And I think, as we heard earlier today, it's trying to develop this platform on top of which you can position different companies in different verticals.

Now, I think the OS analogy is also really interesting, because when you look at something like Windows or something like that-- these are also operating systems-- they come with a few default apps, like a browser comes with Windows, right? You can use the Edge browser. And so I think, in the same way, OpenAI or any of the other companies might come up with a few default apps, quote unquote.

But that doesn't mean that you can't have different browsers running on it, just like you can have different chat agents running on that infrastructure. And so there will be a few default apps, but there will also be, potentially, a vibrant ecosystem of all kinds of apps that are fine-tuned to all the different nooks and crannies of the economy.

And I really like the analogy of the early iPhone apps and what they looked like. And they were all kind of like jokes. And it took time for that to develop. And I think, absolutely, I'd agree that we're going through the same thing right now. People are trying to figure out, what is this thing good at?

What is it not good at? How do I work it? How do I program with it? How do I debug it? How do I just actually get it to perform real tasks? And what kind of oversight-- because it's quite autonomous, but not fully autonomous. So what does the oversight look like?

What does the evaluation look like? So there's many things to think through and just to understand the psychology of it. And I think that's what's going to take some time to figure out exactly how to work with this infrastructure. So I think we'll see that over the next few years.

So the race is on right now with LLMs-- OpenAI, Anthropic, Mistral, Llama, Gemini-- the whole ecosystem of open source models, now a whole long tail of small models. How do you foresee the future of the ecosystem playing out? So again, I think the operating systems analogy is interesting, because we have, say-- we have basically an oligopoly of a few proprietary systems, like, say, Windows, macOS, et cetera.

And then we also have Linux. And Linux has an infinity of distributions. And so I think maybe it's going to look something like that. I also think we have to be careful with the naming, because a lot of the ones that you listed, like Llama, Mistral, and so on, I wouldn't actually say they're open source, right?

And so it's kind of like tossing over a binary for an operating system. Like, you can kind of work with it, and it's useful, but it's not fully useful, right? And there are a number of what I would say are fully open source LLMs. So there's the Pythia models, LLM360, OLMo, et cetera.

And they're fully releasing the entire infrastructure that's required to compile the operating system-- to train the model from the data, to gather the data, et cetera. And so when you're just given a binary, it's much better than nothing, of course, because you can fine-tune the model, which is useful. But also, I think it's subtle: you can't fully fine-tune the model, because the more you fine-tune the model, the more it's going to start regressing on everything else.

And so what you actually really want to do, for example, if you want to add a capability and not regress the other capabilities, is to train on some kind of mixture of the previous data set distribution and the new data set distribution. Because you don't want to regress the old distribution; you just want to add knowledge.
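To make that mixture idea concrete, here is a minimal sketch in a PyTorch style; the dataset wrapper and the 10% mixture weight are illustrative assumptions, not something from the talk:

```python
import random
from torch.utils.data import Dataset

class MixtureDataset(Dataset):
    """Sample mostly from the old data distribution, occasionally from the new
    one, so fine-tuning adds a capability without regressing the old ones."""

    def __init__(self, old_data, new_data, new_fraction=0.1):
        self.old_data = old_data          # examples from the previous distribution
        self.new_data = new_data          # examples teaching the new capability
        self.new_fraction = new_fraction  # illustrative 10% mixture weight

    def __len__(self):
        return len(self.old_data) + len(self.new_data)

    def __getitem__(self, idx):
        # Ignore idx and sample stochastically according to the mixture weight.
        if random.random() < self.new_fraction:
            return random.choice(self.new_data)
        return random.choice(self.old_data)
```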

And if you're just given the weights, you can't do that, actually. You need the training loop, you need the data set, et cetera. So you are actually constrained in how you can work with these models. And again, I think it's definitely helpful, but I think we need slightly better language for it, almost.

So there's open weights models, open source models, and then proprietary models, I guess. And that might be the ecosystem. And yeah, probably it's going to look very similar to the ones that we have today. And hopefully you'll continue to help build some of that out. So I'd love to address the other elephant in the room, which is scale.

Simplistically, it seems like scale is all that matters. Scale of data, scale of compute, and therefore the large research labs, large tech giants have an immense advantage today. What is your view of that? And is that all that matters? And if not, what else does? So I would say scale is definitely number one.

I do think there are details there to get right. And I think a lot also goes into the data set preparation and so on, making it very good and clean, et cetera. That matters a lot. These are all compute efficiency gains that you can get. So there's the data, the algorithms, and then, of course, the training of the model and making it really large.

So I think scale will be the primary determining factor. It's like the first principal component of things, for sure. But there are many of the other things that you need to get right. So it's almost like the scale sets some kind of a speed limit, almost. But you do need some of the other things.

But if you don't have the scale, then you fundamentally just can't train some of these massive models, if training models is what you're going to be doing. If you're just going to be doing fine-tuning and so on, then I think maybe less scale is necessary. But we haven't really seen that fully play out just yet.

And can you share more about some of the ingredients that you think also matter, maybe lower in priority behind scale? Yeah, so the first thing, I think, is you can't just train these models. If you're just given the money and the scale, it's actually still really hard to build these models.

And part of it is that the infrastructure is still so new. And it's still being developed and not quite there. But training these models at scale is extremely difficult. And it's a very complicated distributed optimization problem. And there's actually-- the talent for this is fairly scarce right now. And it just basically turns into this insane thing running on tens of thousands of GPUs.

All of them are failing at random at different points in time. And so instrumenting that and getting that to work is actually an extremely difficult challenge. GPUs were not intended for 10,000 GPU workloads until very recently. And so I think a lot of the infrastructure is creaking under that pressure.

And we need to work through that. But right now, if you just give someone a ton of money or a ton of scale or GPUs, it's not obvious to me that they can just produce one of these models, which is why it's not just about scale. You actually need a ton of expertise, both on the infrastructure side, the algorithm side, and then the data side, and being careful with that.

So I think those are the major components. The ecosystem is moving so quickly. Even some of the challenges we thought existed a year ago are being solved more and more today-- hallucinations, context windows, multimodal capabilities, inference getting better, faster, cheaper. What are the LLM research challenges today that keep you up at night?

What do you think are meaty enough problems, but also solvable problems, that we can continue to go after? So I would say on the algorithm side, one thing I'm thinking about quite a bit is this distinct split between diffusion models and autoregressive models. They're both ways of representing probability distributions.

And it just turns out that different modalities are apparently a good fit for one of the two. I think that there's probably some space to unify them or to connect them in some way, and also get the best of both worlds, or figure out how we can get a hybrid architecture, and so on.

So it's just odd to me that we have two separate points in the space of models. And they're both extremely good. And it just feels wrong to me that there's nothing in between. So I think we'll see that carved out. And I think there are interesting problems there. And then the other thing that maybe I would point to is there's still a massive gap in just the energetic efficiency of running all this stuff.

So my brain is 20 watts, roughly. Jensen was just talking at GTC about the massive supercomputers that they're going to be building now. The numbers are in megawatts, right? And so maybe you don't need all that to run a brain. I don't know how much you need exactly.

But I think it's safe to say we're probably off by a factor of 1,000 to a million somewhere there, in terms of the efficiency of running these models. And I think part of it is just because the computers we've designed, of course, are just not a good fit for this workload.

And I think NVIDIA GPUs are a good step in that direction, in terms of you need extremely high parallelism. We don't actually care about sequential computation that is data-dependent in some way. We just need to blast the same algorithm across many different array elements; you can think about it that way.

So I would say number one is just adapting the computer architecture to the new data workflows. Number two is pushing on a few things that we're currently seeing improvements on. So number one, maybe, is precision. We're seeing precision come down from what was originally 64-bit doubles. We're now down to-- I don't know what it is-- 4, 5, 6 bits, or even 1.58, depending on which papers you read.
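To illustrate the precision lever, here is a toy example of symmetric int8 quantization in NumPy; real low-bit schemes (4-bit formats, or the ternary 1.58-bit idea he alludes to) are considerably more involved, so treat this as a sketch of the core trade-off only:

```python
import numpy as np

def quantize_int8(w):
    # Map the largest-magnitude weight to 127 and round everything to int8,
    # shrinking storage and memory bandwidth by 4x relative to float32.
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    # Approximate reconstruction; the rounding error is the price paid
    # for the memory and bandwidth savings.
    return q.astype(np.float32) * scale

w = np.random.randn(512, 512).astype(np.float32)
q, s = quantize_int8(w)
print("max reconstruction error:", np.abs(w - dequantize(q, s)).max())
```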

And so I think precision is one big lever of getting a handle on this. And then the second one, of course, is sparsity. So that's also another big delta, I would say. Your brain is not always fully activated. And so sparsity, I think, is another big lever. But then the last lever, I would say, is the von Neumann architecture of computers and how they're built, where you're shuttling data in and out and doing a ton of data movement between memory and the cores that are doing all the compute.

This is all broken as well, and it's not how your brain works. And that's why it's so efficient. And so I think it should be a very exciting time in computer architecture. I'm not a computer architect. But I think it seems like we're off by a factor of a million, 1,000 to a million, something like that.

And there should be really exciting innovations there that bring that down. I think there are at least a few builders in the audience working on this problem. OK, switching gears a little bit, you've worked alongside many of the greats of our generation-- Sam, Greg from OpenAI, and the rest of the OpenAI team, Elon Musk.

Who here knows the joke about the rowing team, the American team versus the Japanese team? OK, great. So this will be a good one. Elon shared this at our last Base Camp. And I think it reflects a lot of his philosophy around how he builds cultures and teams. So you have two teams.

The Japanese team has four rowers and one steerer. And the American team has four steerers and one rower. And can anyone guess, when the American team loses, what do they do? Shout it out. Exactly. They fire the rower. And Elon shared this example, I think, as a reflection of how he thinks about hiring the right people and building the right teams at the right ratio.

From working so closely with folks like these incredible leaders, what have you learned? Yeah, so I would say, definitely, Elon runs his companies in an extremely unique style. I don't actually think that people appreciate how unique it is. You sort of even read about it in some way, but you don't understand it, I think.

It's even hard to describe. I don't even know where to start. But it's a very unique, different thing. I like to say that he runs the biggest startups. And I think it's just-- I don't even know, basically, how to describe it. It almost feels like it's a longer sort of thing that I have to think through.

But number one is, so he likes very small, strong, highly technical teams. So that's number one. So I would say, at companies, by default, the teams grow and they get large. Elon was always a force against growth. I would have to work and expend effort to hire people. I would have to basically plead to hire people.

And then the other thing is that big companies, usually, you want-- it's really hard to get rid of low performers. And I think Elon is very friendly to, by default, getting rid of low performers. So I actually had to fight for people to keep them on the team, because he would, by default, want to remove people.

And so that's one thing: keep a small, strong, highly technical team, with no middle management that is non-technical, for sure. So that's number one. Number two is the vibes of how everything runs and how it feels when he walks into the office. He wants it to be a vibrant place.

People are walking around. They're pacing around. They're working on exciting stuff. They're charting something. They're coding. He doesn't like stagnation. He doesn't like for it to look that way. He doesn't like large meetings. He always encourages people to leave meetings if they're not being useful. And you actually do see this: it's a large meeting, and if you're not contributing and you're not learning, you just walk out.

And this is fully encouraged. And I think this is something that you don't normally see. So I think vibes is a second big lever that I think he really instills culturally. Maybe part of that also is, I think a lot of big companies, they pamper their employees.

I think there's much less of that. The culture of it is you're there to do your best technical work. And there's the intensity and so on. And I think maybe the last one that is very unique and very interesting and very strange is just how connected he is to the team.

So usually, a CEO of a company is a remote person, five layers up, who talks to their VPs, who talk to their reports and directors. And eventually, you talk to your manager. That's not how Elon runs companies, right? He will come to the office. He will talk to the engineers.

Many of the meetings that we had were like, OK, 50 people in the room with Elon. And he talks directly to the engineers. He doesn't want to talk just to the VPs and the directors. So normally, people would spend like 99% of the time maybe talking to the VPs.

He spends maybe 50% of the time. And he just wants to talk to the engineers. So if the team is small and strong, then engineers and the code are the source of truth. And so they have the source of truth, not some manager. And he wants to talk to them to understand the actual state of things and what should be done to improve it.

So I would say the degree to which he's connected with the team and not something remote is also unique. And also, just his large hammer and his willingness to exercise it within the organization. So maybe he talks to the engineers, and they bring up what's blocking them.

OK, I just don't have enough GPUs to run my thing. And he's like, oh, OK. And if he hears that twice, he's going to be like, OK, this is a problem. So what is our timeline? And when you don't have satisfying answers, he's like, OK, I want to talk to the person in charge of the GPU cluster.

And someone dials the phone. And he's just like, OK, double the cluster right now. Like, let's have a meeting tomorrow. From now on, send me daily updates until the cluster is twice the size. And then they push back. And they're like, OK, well, we have this procurement set up.

We have this timeline. And NVIDIA says that we don't have enough GPUs. And it will take six months or something. And then you get a raised eyebrow. And then he's like, OK, I want to talk to Jensen. And then he just removes bottlenecks. So I think the extent to which he's extremely involved and removes bottlenecks and applies his hammer, I think, is also not appreciated.

So I think there's a lot of these kinds of aspects that are very unique, I would say, and very interesting. And honestly, going to a normal company outside of that, you definitely miss aspects of that. And so I think, yeah, maybe that's a long rant. But that's just kind of like-- I don't think I hit all the points.

But it is a very unique thing. And it's very interesting. And yeah, I guess that's my rant. Hopefully, tactics that most people here can employ. Taking a step back, you've helped build some of the most generational companies. You've also been such a key enabler for many people, many of whom are in the audience today, of getting into the field of AI.

Knowing you, what you care most about is democratizing access to AI-- education, tools, helping create more equality in the whole ecosystem at large, so there are many more winners. As you think about the next chapter in your life, what gives you the most meaning? Yeah, I think you've described it in the right way.

Where my brain goes by default is-- I've worked for a few companies. But I think, ultimately, I care not about any one specific company. I care a lot more about the ecosystem. I want the ecosystem to be healthy. I want it to be thriving. I want it to be like a coral reef of a lot of cool, exciting startups in all the nooks and crannies of the economy.

And I want the whole thing to be like this boiling soup of cool stuff. Genuinely, Andrej dreams about coral reefs. I want it to be like a cool place. And I think that's why I love startups and I love companies. And I want there to be a vibrant ecosystem of them.

And by default, I would say I'm a little bit more hesitant about five megacorps taking over. Especially with AGI being such a magnifier of power, I'm worried about what that could look like and so on. So I have to think that through more. But yeah, I love the ecosystem. And I want it to be healthy and vibrant.

Amazing. We'd love to have some questions from the audience. Yes, Brian. Hi, I'm Brian Halligan. Would you recommend founders follow Elon's management methods? Or is it kind of unique to him, and you shouldn't try to copy him? Yeah, I think that's a good question. I think it's up to the DNA of the founder.

Like, you have to have that same kind of DNA and that same kind of vibe. And I think when you're hiring the team, it's really important that you're making it clear upfront that this is the kind of company that you have. And when people sign up for it, they're very happy to go along with it, actually.

But if you change it later, I think people are unhappy with that. And that's very messy. So as long as you do it from the start and you're consistent, I think you can run a company like that. And it has its own pros and cons as well. So I think it's up to the people.

But I think it's a consistent model of company building and running. Yes, Alex. Hi. I'm curious if there are any types of model composability that you're really excited about, maybe other than mixture of experts. I'm not sure what you think about model merges, Frankenmerges, or any other things to make model development more composable.

Yeah, that's a good question. I see papers in this area, but I don't know that anything has really stuck. Maybe the composability-- I don't know exactly what you mean. But there's a ton of work on parameter-efficient training and things like that. I don't know if you would put that in the category of composability in the way I understand it.

It's certainly the case that traditional code is very composable. And I would say neural nets are a lot more fully connected and less composable by default. But they do compose, and you can fine-tune them as part of a whole. So as an example, if you're building a system that works with, say, text and images or something like that, it's very common that you pre-train components.

And then you plug them in and fine-tune, maybe through the whole thing, as an example. So there's composability in those aspects, where you can pre-train small pieces of the cortex outside and compose them later, also through initialization and fine-tuning. So maybe those are my scattered thoughts on it.

But I don't know if I have anything very coherent otherwise. Yes, Nick. So we've got these next word prediction things. Do you think there's a path towards building a physicist or a von Neumann type model that has a mental model of physics that's self-consistent and can generate new ideas for how you actually do fusion?

How do you get faster than light travel, if it's even possible? Is there any path towards that? Or is it a fundamentally different vector in terms of these AI model developments? I think it's fundamentally different in one aspect. I guess what you're talking about maybe is just a capability question.

Because the current models are just not good enough. And I think there are big rocks to be turned here. And I think people still haven't really seen what's possible in this space at all. And roughly speaking, I think we've done step one of AlphaGo: we've done the imitation learning part.

There's step two of AlphaGo, which is the RL. And people haven't done that yet. And this is the part that actually made it work and made something superhuman. So I think there are big rocks in capability to still be turned over here. And the details of that are kind of tricky, potentially.

But I think we just haven't done step two of AlphaGo, long story short. And we've just done imitation. And I don't think that people appreciate-- for example, number one, how terrible the data collection is for things like ChatGPT. Say you have some prompt that is some kind of a mathematical problem.

A human comes in and gives the ideal solution to that problem. The problem is that the human psychology is different from the model psychology. What's easy or hard for the human is different from what's easy or hard for the model. And so the human kind of fills out some kind of trace that arrives at the solution.

But some parts of that are trivial to the model. And some parts of that are a massive leap that the model doesn't understand. And so you're kind of just losing it. And then everything else is polluted by that later. And so fundamentally, what you need is for the model to practice itself how to solve these problems.

It needs to figure out what works for it or does not work for it. Maybe it's not very good at four-digit addition, so it's going to fall back and use a calculator. But it needs to learn that for itself based on its own capability and its own knowledge. So that's number one.

That's totally broken, I think. It's a good initializer, though, for something agent-like. And then the other thing is we're doing reinforcement learning from human feedback. But that's a super weak form of reinforcement learning. It doesn't even count as reinforcement learning, I think. What is the equivalent in AlphaGo for RLHF?

What is the reward model? What I call it is a vibe check. If you wanted to train AlphaGo with RLHF, you would be giving two people two boards and asking, which one do you prefer? And then you would take those labels and you would train the model. And then you would RL against that.

Well, what are the issues with that? Number one, it's just vibes of the board. That's what you're training against. Number two, if the reward model is a neural net, then it's very easy for the model you're optimizing over to overfit to that reward model. It's going to find all these spurious ways of hacking that reward model, and that's the problem.
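To make the reward-model setup concrete: a typical RLHF reward model is trained on exactly those pairwise "which one do you prefer?" labels, with a loss along these lines; this is a generic sketch in PyTorch, not the recipe of any particular lab:

```python
import torch.nn.functional as F

def preference_loss(reward_model, preferred, rejected):
    # reward_model maps an encoded (prompt + response) to a scalar score.
    r_good = reward_model(preferred)
    r_bad = reward_model(rejected)
    # Push the preferred response's score above the rejected one's:
    # -log sigmoid(r_good - r_bad), the Bradley-Terry pairwise loss.
    return -F.logsigmoid(r_good - r_bad).mean()
```

Because the reward here is itself just a learned network rather than a ground-truth objective like winning a Go game, the policy optimized against it can drift toward inputs that score well for spurious reasons.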

So AlphaGo gets around these problems because they have a very clear objective function you can RL against. So RLHF is nowhere near RL, I would say. It's silly. And the other thing is imitation learning, super silly. RLHF is a nice improvement, but it's still silly. And I think people need to look for better ways of training these models, so that it's in the loop with itself and its own psychology.

And I think there will probably be unlocks in that direction. So it's sort of like graduate school for AI models. It needs to sit in a room with a book and quietly question itself for a decade? Yeah. I think that would be part of it, yes. And I think when you are learning stuff and you're going through textbooks, there's exercises in the textbook.

Where are those? Those are prompts to you to exercise the material, right? And when you're learning material, you're not just reading left to right. Number one, you're exercising. But maybe you're taking notes. You're rephrasing, reframing. You're doing a lot of manipulation of this knowledge in a way that helps you learn that knowledge.

And we haven't seen equivalents of that at all in LLMs. So it's super early days, I think. Mm-hmm. Yes, Yuzi? Yeah, it's cool to be optimal and practical at the same time. So I would be asking, how would you align the priority between A, doing cost reduction and revenue generation, and B, finding better quality models with better reasoning capabilities?

How would you be aligning that? So maybe I understand the question. I think what I see a lot of people do is they start out with the most capable model, no matter what the cost is. So you use GPT-4, you use super prompts, et cetera. You do RAG, et cetera.

So you're just trying to get your thing to work. So you're going after accuracy first. And then you make concessions later. You check if you can fall back to 3.5 for certain types of queries. And you make it cheaper later. So I would say, go after performance first. And then you make it cheaper later.

It's kind of like the paradigm that I've seen-- a few people that I've talked to about this say works for them. And maybe it's not even just a single prompt. I like to think about, what are the ways in which you can even just make it work at all?

Because if you just can make it work at all, say you make 10 prompts or 20 prompts, and you pick the best one, and you have some debate, or I don't know what kind of a crazy flow you can come up with, just get your thing to work really well.

Because if you have a thing that works really well, then one other thing you can do is you can distill that. So you can get a large distribution of possible problem types. You run your super expensive thing on it to get your labels. And then you get a smaller, cheaper thing that you fine-tune on it.
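As a minimal sketch of that recipe: sample several answers from the expensive setup, keep the majority answer (a simple stand-in for the multi-prompt or debate flows mentioned above), and collect the results as labels for fine-tuning the smaller model. `ask_big_model` and `finetune_small_model` are hypothetical stand-ins for your own pipeline, not a specific API:

```python
from collections import Counter

def best_of_n(problem, ask_big_model, n=10):
    # Sample the expensive model several times and take a majority vote.
    answers = [ask_big_model(problem) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

def distill(problems, ask_big_model, finetune_small_model):
    # Run the expensive, accurate thing over a broad distribution of problems
    # to produce labels, then fine-tune the smaller, cheaper model on them.
    labeled = [(p, best_of_n(p, ask_big_model)) for p in problems]
    return finetune_small_model(labeled)
```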

And so I would say, I would always go after getting it to work as well as possible, no matter what, first. And then make it cheaper, is the thing I would suggest. Hi, Sam. Hi. One question. So this past year, we saw a lot of impressive results from the open source ecosystem.

I'm curious what your opinion is of how that will continue to keep pace, or not keep pace, with closed source development as the models continue to improve in scale? Yeah, I think that's a very good question. I don't really know. Fundamentally, these models are so capital intensive, right?

Like, one thing that is really interesting is, for example, you have Facebook and Meta and so on who can afford to train these models at scale. But then it's also not the thing that they do-- their money printer is unrelated to that.

And so they have actual incentive to potentially release some of these models so that they empower the ecosystem as a whole, so they can actually borrow all the best ideas. So that, to me, makes sense. But so far, I would say they've only just done the open weights model.

And so I think they should actually go further. And that's what I would hope to see. And I think it would be better for everyone. And I think, potentially, maybe there's some squeamishness about some of the aspects of it eventually, with respect to data and so on. I don't know how to overcome that.

Maybe they should try to just find data sources that they think are very easy to use or something like that and try to constrain themselves to those. So I would say those are kind of our champions, potentially. And I would like to see more transparency also coming from them. And I think Meta and Facebook are doing pretty well.

They've released papers. They published a logbook and so on. So I think they're doing well. But they could do much better in terms of fostering the ecosystem. And I think maybe that's coming. We'll see. Peter. Yeah. Maybe this is an obvious answer given the previous question. But what do you think would make the AI ecosystem cooler and more vibrant?

Or what's holding it back? Is it openness? Or do you think there's other stuff that is also a big thing that you'd want to work on? Yeah, I certainly think one big aspect is just like the stuff that's available. I had a tweet recently about, number one, build the thing.

Number two, build the ramp. I would say there's a lot of people building a thing. I would say there's a lot less happening of building the ramps so that people can actually understand all this stuff. And I think we're all new to all of this. We're all trying to understand how it works.

We all need to ramp up and collaborate to some extent to figure out how to use this effectively. So I would love for people to be a lot more open with respect to what they've learned, how they've trained all this, how what works, what doesn't work for them, et cetera.

And yes, just for us to learn a lot more from each other, that's number one. And then number two, I also think there is quite a bit of momentum in the open ecosystems as well. So I think that's already good to see. And maybe there are some opportunities for improvement I talked about already.

So yeah. Last question from the audience. Michael. To get to the next big performance leap from models, do you think that it's sufficient to modify the transformer architecture with, say, thought tokens or activation beacons? Or do we need to throw that out entirely and come up with a new fundamental building block to take us to the next big step forward or AGI?

Yeah, I think that's a good question. Well, the first thing I would say is the transformer is amazing. It's just so incredible. I don't think I would have seen that coming, for sure. For a while before the transformer arrived, I thought there would be an insane diversification of neural networks.

And that was not the case. It's the complete opposite, actually. It's all the same model. So it's incredible to me that we have that. I don't know that it's the final neural network. Given the history of the field, and I've been in it for a while, it's really hard to say that this is the end of it.

Absolutely, it's not. And I feel very optimistic that someone will be able to find a pretty big change to how we do things today. I would say on the front of the autoregressive or diffusion, which is kind of like the modeling and the loss setup, I would say there's definitely some fruit there, probably.

But also on the transformer side, like I mentioned, there are these levers of precision and sparsity. As we drive those, together with the co-design of the hardware and how that might evolve, we can make network architectures that are a lot more well-tuned to those constraints and how all that works.

To some extent, also, I would say the transformer is kind of designed for the GPU, by the way. That was the big leap, I would say, in the transformer paper. And that's where they were coming from: we want an architecture that is fundamentally extremely parallelizable. And because the recurrent neural network has sequential dependencies, which are terrible for the GPU, the transformer basically broke that through attention.

And this was like the major insight there. And it has some predecessor insights, like the Neural GPU and other papers at Google that were sort of thinking about this. But that is a way of targeting the algorithm to the hardware that you have available. So I would say that's kind of in that same spirit.

But long story short, I think it's very likely we'll see changes to it still. But it's been proven remarkably resilient. I have to say, like, it came out many years ago now. Like, I don't know, six, seven? Yeah, so you know, like the original transformer and what we're using today are not super different.

Yeah. As a parting message to all the founders and builders in the audience, what advice would you give them as they dedicate the rest of their lives to helping shape the future of AI? So yeah, I don't usually have crazy generic advice. I think maybe the thing that's top of my mind is I think founders, of course, care a lot about their startup.

I also want, like, how do we have a vibrant ecosystem of startups? How do startups continue to win, especially with respect to, like, big tech? And how does the ecosystem become healthier? And what can you do? Sounds like you should become an investor. Amazing. Thank you so much for joining us, Andrej, for this and also for the whole day today.

(audience applauding)