Hey, everyone. Welcome to the Latent Space Podcast. This is Alessio, partner and CTO in Residence at Decibel Partners. And I'm joined by my co-host, Swyx, founder of Smol AI. Hey, and today we have in the studio, Soumith Chintala. Welcome. Thanks for having me. On one of your rare visits from New York, where you live.
You got your start in computer vision at NYU with Yann LeCun. That was a very fortuitous start. I was actually listening to your interview on the Gradient Podcast. So if people want to know more about the history of Soumith, history of PyTorch, they can go to that podcast. We won't spend that much time there.
But I just was marveling at your luck, or I don't know if it's your luck or your drive to find AI early and then find the right quality mentor. Because I guess Yann really introduced you to that world. You're talking about extrinsic success, right? A lot of people just have drive to do things that they think is fun.
And a lot of those things might or might not be extrinsically perceived as good and successful. I think I just happen to like something that is now like one of the coolest things in the world or whatever. But if I happen-- the first thing I tried to become was a 3D VFX artist.
And I was really interested in doing that, but I turned out to be very bad at it. So I ended up not doing that further. But even if I was good at that, whatever, and I ended up going down that path, I probably would have been equally happy. It's just like maybe the perception of, oh, is this person successful or not might be different.
But I think after a baseline, your happiness is probably more correlated with your intrinsic stuff. Yes. I think Dan Pink has this book on drive that I often refer to about the power of intrinsic motivation versus extrinsic and how long extrinsic lasts. It's not very long at all. But anyway, now you are an investor in Runway.
So in a way, you're working on VFX. Yes. I mean, in a very convoluted way. It reminds me of Ed Catmull. I don't know if you guys know. He actually tried to become an animator in his early years and failed, or didn't get accepted by Disney, and then went and created Pixar and then got bought by Disney and created Toy Story.
So you joined Facebook in 2014 and eventually became creator and maintainer of PyTorch. And there's this long story there you can refer to on the Gradient. But you also-- I think maybe people don't know that you're also involved in more of the hardware and cluster decisions. And we can dive into more details there, because we're all about hardware this month.
And then finally, I don't know what else should people know about you on the personal side or the professional side. I think open source is definitely like a big passion of mine and probably forms a little bit of my identity at this point. I am irrationally interested in open source.
It's like one of those things that I attribute to-- I think open source has that fundamental way to distribute opportunity in a way that is very powerful. I grew up in India. I didn't have internet for a while. And in college, actually, I didn't have internet, except for like GPRS or whatever.
So just having-- and knowledge was very centralized. But I saw that evolution of knowledge slowly getting decentralized. And that ended up helping me learn quicker and faster for like $0. And I think that was a strong reason why I ended up where I am. So the open source side of things, I always push regardless of what I get paid for.
I think I would do that as a passion project on the side. Yeah, that's wonderful. And we will talk about the challenges as well that open source has, open models versus closed models. But maybe you want to touch a little bit on PyTorch before we move on to sort of Meta AI in general.
Yeah, we kind of touched on PyTorch in a lot of episodes. So we had George Hotz from tinygrad. He called PyTorch a CISC and tinygrad a RISC. I would love to get your thoughts on PyTorch design direction as far as-- I know you talk a lot about kind of having a happy path to start with and then making complexity hidden away, but then available to the end user.
One of the things that George mentioned is I think you have like 250 primitive operators in PyTorch. I think tinygrad is four. So how do you think about some of the learnings that maybe he's going to run into that you already had in the past seven, eight years almost of running PyTorch?
Yeah, I think everyone starts-- there's different models here, but I think it's two different models that people generally start with. Either they go like, I have a grand vision, and I'm going to build a giant system that achieves this grand vision. And my V1 is like super complex, feature complete, whatever.
Or other people say they will get incrementally ambitious. They say, oh, we'll start with something simple, and then we'll slowly layer out complexity in a way that optimally applies Huffman coding or whatever. Where the density of users is and what they're using, I would want to keep it in the easy, happy path.
And for the more niche advanced use cases, I still want people to try them, but they need to take additional frictional steps. George, I think, just like we started with PyTorch, George started with the incrementally ambitious thing. I remember tinygrad used to be like we would be limited to 1,000 lines of code, and I think now it's like 5,000.
So I think there is no real magic to why PyTorch has the kind of complexity it has. I think it's probably partly necessitated and partly because we built with the technology available under us at that time. PyTorch is like 190,000 lines of code or something at this point. I think if we had to rewrite it, we would probably think about ways to rewrite it in a vastly simplified way, for sure.
But a lot of that complexity comes from the fact that in a very simple, explainable way, you have memory hierarchies. CPU has like three levels of caches, and then you have DRAM and SSD, and then you have network. Similarly, GPU has several levels of memory, and then you have different levels of network hierarchies, NVLink plus InfiniBand or RoCE or something like that.
And the way the flops are available on your hardware, they are available in a certain way, and your computation is in a certain way, and you have to retrofit your computation onto both the memory hierarchy and the flops available. When you're doing this, it is actually like a fairly hard mathematical problem to do this setup, like find the optimal thing.
And finding the optimal thing is like, what is optimal? What is optimal depends on the input variables themselves. So like, OK, what is the shape of your input tensors, and what is the operation you're trying to do, and various things like that. Finding that optimal configuration and writing it down in code is not the same for every input configuration you have.
For example, just as the shape of the tensors changes, let's say you have three input tensors into a sparse dot product or something like that. The shape of each of these input tensors will vastly change how you optimally place this operation onto the hardware in a way that will get you maximal throughput.
So a lot of our complexity comes from writing out like hundreds of configurations for each single PyTorch operator and templatizing these things and symbolically generating the final CUDA code or CPU code. There's no way to avoid it, because mathematically we haven't found symbolic ways to do this that also keep compile time near zero.
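Editor's sketch: the idea of writing out many configurations per operator and picking one from the input shapes can be illustrated in a few lines of plain Python. Everything here is hypothetical (the `KernelConfig` fields, the size thresholds, the `pick_config` helper); it is not how PyTorch stores or selects its kernels, only a toy picture of shape-dependent dispatch.

```python
from typing import List, NamedTuple, Tuple


class KernelConfig(NamedTuple):
    """Hypothetical tiling parameters for a matrix-multiply kernel."""
    tile_m: int
    tile_n: int
    tile_k: int


# Hypothetical hand-written table: (largest dimension covered, config to use).
# A real framework would have many more entries, keyed on far more than size.
CONFIGS: List[Tuple[float, KernelConfig]] = [
    (64, KernelConfig(16, 16, 16)),              # small inputs: small tiles
    (1024, KernelConfig(64, 64, 32)),            # medium inputs
    (float("inf"), KernelConfig(128, 128, 32)),  # large inputs: big tiles
]


def pick_config(m: int, n: int, k: int) -> KernelConfig:
    """Return the first config whose size limit covers the largest dimension."""
    largest = max(m, n, k)
    for limit, config in CONFIGS:
        if largest <= limit:
            return config
    raise AssertionError("unreachable: the last limit is infinite")


if __name__ == "__main__":
    for shape in [(32, 32, 32), (512, 64, 64), (4096, 4096, 4096)]:
        print(shape, "->", pick_config(*shape))
```

The table is the part that grows: every new shape regime, data type, or hardware target adds rows, which is one way to picture where the line count comes from.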
You can write a very simple framework, but then you also should be willing to eat the long compile times of searching for that optimal performance at runtime. So that's the trade-off. I don't think, unless we have great breakthroughs, George's vision is achievable. Or he should be thinking about a narrower problem, such as, I'm only going to make this work for self-driving car convnets.
Or I'm only going to make this work for LLM transformers of the Llama style. If you start narrowing the problem down, you can make a vastly simpler framework. But if you don't, if you need the generality to power all of the AI research that is happening and keep zero compile time and all these other factors, I think it's not easy to avoid the complexity.
That's interesting. We kind of touched on this with Chris Lattner when he was on the podcast. If you think about frameworks, they have the model target. They have the hardware target. They have different things to think about. He mentioned, when he was at Google, TensorFlow was trying to be optimized to make TPUs go brr and go as fast as possible.
I think George is trying to make, especially AMD stack, be better than ROCm. How come PyTorch has been such a Switzerland versus just making Meta hardware go brr? First, Meta is not in the business of selling hardware. Meta is not in the business of cloud compute. We kind of-- the way Meta thinks about funding PyTorch is it's just like we're funding it because it's net good for Meta to fund PyTorch because PyTorch has become a standard and a big open source project.
And generally, it gives us a timeline edge. It gives us various leverage and all that within our own work. So why is PyTorch more of a Switzerland rather than being opinionated? I think the way we think about it is not in terms of Switzerland or not. Actually, the way we articulate it to all hardware vendors and software vendors and all who come to us being like, we want to build a backend in core for PyTorch and ship it by default is we just only look at our user side of things.
If users are using a particular piece of hardware, then we want to support it. We very much don't want to kingmake the hardware side of things. So as the MacBooks have GPUs and as that stuff started getting increasingly interesting, we pushed Apple to push some engineers and work on the MPS support.
And we spent significant time from like Meta-funded engineers on that as well. Because a lot of people are using the Apple GPUs and there is demand. So we kind of mostly look at it from the demand side. We never look at it from like, oh, which hardware should we start taking opinions on?
Is there a future in which-- because Mojo or Modular's Mojo is kind of a superset of Python-- is there a future in which PyTorch might use Mojo features optionally? I think it depends on how well integrated it is into the Python ecosystem. So if Mojo is like a pip install and it's readily available and users feel like they can use Mojo so smoothly within their workflows within-- in a way that just is low friction, we would definitely look into that.
In the same way, PyTorch now depends on Triton, like OpenAI Triton. And we never had a conversation that was like, huh, that's like a dependency. Should we just build a Triton of our own or should we use Triton? It almost doesn't-- those conversations don't really come up for us.
The conversations are more like, well, does Triton have 10,000 dependencies and is it hard to install? We almost don't look at these things from a strategic leverage point of view. We look at these things from a user experience point of view. Is it easy to install? Is it like smoothly integrated?
If so, we should consider-- and does it give enough benefits for us to start depending on it? If so, yeah, we should consider it. That's how we think about it. You're inclusive by default as long as it meets the minimum bar. Yeah. But maybe I phrased it wrongly. Maybe it's more like, OK, what problems would you look to solve that you have right now?
I think it depends on what problems Mojo will be useful at. It's more performance, mainly a performance pitch, some amount of cross-compiling pitch. Yeah, I think the performance pitch for Mojo was like, we're going to be performant even if you have a lot of custom stuff. You can write arbitrary custom things, and we will be performant.
And that value proposition is not clear to us from the PyTorch side to consider it for PyTorch. So PyTorch exposes-- it's actually not 250 operators, it's like 1,000 operators. PyTorch exposes about 1,000 operators, and people write their ideas in the 1,000 operators of PyTorch. Mojo is like, well, maybe it's OK to completely sidestep those 1,000 operators of PyTorch and just write it in a more natural form, just write like raw Python, write for loops or whatever.
So from the consideration of how do we intersect PyTorch with Mojo, I can see one use case where you have custom stuff for some parts of your program, but mostly it's PyTorch. And so we can probably figure out how to make it easier for, say, torch.compile to smoothly also consume Mojo subgraphs, and the interoperability being actually usable.
That I think is valuable. But Mojo as a fundamental front end would be replacing PyTorch, not augmenting PyTorch. So in that sense, I don't see a synergy in more deeply integrating Mojo. So call out to Mojo whenever they have written something in Mojo and there's some performance related thing going on.
And then since you mentioned Apple, what should people think of PyTorch versus MLX? I mean, MLX is early, and I know the folks well. Awni used to work at FAIR, and I used to chat with him all the time. He used to be based out of New York as well.
The way I think about MLX is that MLX is specialized for Apple right now. It has a happy path because it's defined its product in a narrow way. At some point, MLX either says we will only be supporting Apple and we will just focus on enabling-- this is a framework if you use your MacBook, but once you go server side or whatever, that's not my problem and I don't care.
Or MLX enters the server side set of things as well. One of these two things will happen, right? If the first thing happens, MLX's overall addressable market will be small, but it'll probably do well within that addressable market. If it enters the second phase, they're going to run into all the same complexities that we have to deal with.
They will not have any magic wand, and they will have vastly more complex work to do. They probably wouldn't be able to move as fast in certain ways. Like having to deal with distributed compute. Distributed, NVIDIA and AMD GPUs, just like having a generalization of the concept of a back end, how they treat compilation with plus overheads.
Right now, they deeply assume the whole MPS graph thing. So they need to think about all these additional things if they end up expanding onto the server side. And they'll probably build something like PyTorch as well, right? Eventually, that's where it will land. And I think there they will fail on the lack of differentiation.
It wouldn't be obvious to people why they would want to use it. I mean, there are some cloud companies offering M1 and M2 chips on servers. I feel like it might be interesting for Apple to pursue that market, but it's not their core. Yeah, I mean, if Apple can figure out their interconnect story, maybe, then it can become a thing.
Honestly, that's more interesting than the cars. Yes. I think the moat that NVIDIA has right now, I feel like, is that they have the interconnect that no one else has. AMD GPUs are pretty good. I'm sure there is various silicon that is not bad at all. But the interconnect, like NVLink, is uniquely awesome.
So I'm sure the other hardware providers are working on it. I feel like when you say it's uniquely awesome, you have some appreciation of it that the rest of us don't. I mean, the rest of us just like-- we hear marketing lines, but what do you mean when you say NVIDIA is very good at networking?
Obviously, they made the acquisition maybe 15 years ago. It's just like the bandwidth it offers and the latency it offers. I mean, TPUs also have a good interconnect, but you can't buy them. So you have to go to Google to use it. Who are some of the other FAIR PyTorch alumni that are building cool companies?
I know you have Fireworks AI, Lightning AI, Lepton. And Yangqing, you knew since college when he was building Caffe. Yeah, so Yangqing and I used to be framework rivals, like Caffe, Torch. I mean, we were all a very small, close-knit community back then. Caffe, Torch, Theano, Chainer, Keras, various frameworks.
I mean, it used to be more like 20 frameworks. I can't remember all the names. ccv by Liu Liu, who is also based out of SF. And one of the ways it was interesting is you went into the framework guts and saw if someone wrote their own convolution kernel, or they were just copying someone else's.
And there were four or five convolution kernels that were unique and interesting. There was one from this guy out of Russia. I forgot the name. But I remembered who was awesome enough to have written their own kernel. And at some point there, I built out these benchmarks called convnet-benchmarks that were just benchmarking all the convolution kernels that were available at that time.
And it hilariously became big enough that at that time, AI was getting important, but not important enough that industrial strength players came in to do this kind of benchmarking and standardization. Like we have MLPerf today. So a lot of the startups were using convnet-benchmarks in their pitch decks as like, oh, you know, on convnet-benchmarks, this is how we fare, so you should fund us.
I remember Nervana actually was at the top of the pack because Scott Gray wrote amazingly fast convolution kernels at that time. Very interesting, but separate times. But to answer your question, Alessio, I think mainly Lepton and Fireworks are the two most obvious ones. But I'm sure the fingerprints are a lot wider.
They're just people who worked within the PyTorch/Caffe2 cohort of things and now end up at various other places. I think both as an investor and people looking to build on top of their services, it's an uncomfortable slash I don't know what I don't know pitch. Because I've met Yangqing and I've met-- Lin Qiao.
Yeah, I've met these folks. And they're like, you know, we are deep in the PyTorch ecosystem, and we serve billions of inferences a day or whatever at Facebook, and now we can do it for you. And I'm like, OK, that's great. What should I be wary of or cautious of when these things happen?
Because I'm like, obviously, this experience is extremely powerful and valuable. I just don't know what I don't know. What should people know about these sort of new inference as a service companies? At that point, you would be investing in them for their expertise of one kind. So if they've been at a large company, but they've been doing amazing work, you would be thinking about it as like, OK, what these people bring to the table is that they're really good at GPU programming or understanding the complexity of serving models once it hits a certain scale, various expertise from the infra and AI and GPUs point of view.
What you would obviously want to figure out is whether their understanding of the external markets is clear, whether they know and understand how to think about running a business, understanding how to be disciplined about making money, or various things like that. Maybe I'll put it-- actually, I will de-emphasize the investing bit, and just more as a potential customer.
It's more like, OK, you're PyTorch gods, of course. What else should I know? I mean, I would not care about who's building something if I'm trying to be a customer. I would care about whether-- The benchmarks. Yeah, I use it. And it's usability, and reliability, and speed. Quality as well.
Yeah, if someone from some random unknown place came to me and said, our stuff is great, and I have the bandwidth, I probably will give it a shot. And if it turns out to be great, I'll just use it. OK, great. And then maybe one more thing about benchmarks, since we already brought it up, and you brought up convnet-benchmarks.
There was some recent drama around Anyscale. Anyscale released their own benchmarks, and obviously they looked great on their own benchmarks. But maybe didn't give the other-- I feel like there are two lines of criticism. One, which is they didn't test apples to apples on the kind of endpoints offered by the other providers that they compete with on their benchmarks.
And that is due diligence baseline. And then the second would be more just optimizing for the right thing. You had some commentary on it. I'll just let you riff. Yeah, I mean, in summary, basically my criticism was that Anyscale built these benchmarks for end users to just understand what they should pick.
And that's a very good thing to do. I think what they didn't do a good job of is give that end user a full understanding of what they should pick. They just gave them a very narrow slice of understanding. I think they just gave them latency numbers, and that's not sufficient.
You need to understand your total cost of ownership at some reasonable scale. Not like, oh, like one API call is like $0.01, but like 1,000 API calls are like $0.10. People can misprice to cheat on those benchmarks. So you want to understand, OK, how much is it going to cost me if I actually subscribe to you and do like a million API calls a month or something?
And then you want to understand the latency and reliability, not just from one call you made, but an aggregate of calls you made over various times of the day and times of the week and the nature of the workloads. Is it just like some generic single paragraph that you're sending that is cacheable, or is it like testing a real world workload?
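Editor's sketch: the two measurements described here, cost at a realistic monthly volume and latency aggregated over many calls, can be written down in a few lines. The pricing shape (a per-1,000-call price plus an optional fixed fee), the sample latencies, and the function names are all made-up assumptions for illustration, not any provider's real pricing or numbers.

```python
import math
from typing import Dict, Sequence


def monthly_cost(calls_per_month: int,
                 price_per_1k_calls: float,
                 fixed_fee: float = 0.0) -> float:
    """Total monthly bill under a hypothetical flat per-1,000-call price."""
    return fixed_fee + (calls_per_month / 1000) * price_per_1k_calls


def percentile(samples: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize_latency(samples_ms: Sequence[float]) -> Dict[str, float]:
    """Aggregate view of latency instead of a single measured call."""
    return {
        "p50": percentile(samples_ms, 50),
        "p95": percentile(samples_ms, 95),
        "p99": percentile(samples_ms, 99),
        "max": max(samples_ms),
    }


if __name__ == "__main__":
    # Made-up latencies (ms) standing in for calls spread over a week.
    samples = [100, 120, 110, 900, 105, 115, 130, 125, 108, 112]
    print(summarize_latency(samples))
    print(monthly_cost(1_000_000, price_per_1k_calls=0.10))
```

In the made-up sample, one slow call out of ten leaves the median untouched but dominates the tail percentiles, which is why a single latency number says little about reliability.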
I think that kind of rigor in presenting that benchmark wasn't there. It was a much more narrow sliver of what should have been a good benchmark. That was my main criticism. And I'm pretty sure if before they released it, they showed it to their other stakeholders who would be caring about this benchmark because they are present in it, they would have easily just pointed out these gaps.
And I think they didn't do that, and they just released it. So I think those were the two main criticisms. And I think they were fair, and Robert took it well. He took it very well. Yeah, we'll have him on at some point, and we'll discuss it. But I think it's important for-- I think the market maturing enough that people start caring and competing on these kinds of things means that we need to establish what best practice is, because otherwise everyone's going to play dirty.
Yeah, absolutely. My view of the LLM inference market in general is that it's like the laundromat model. The margins are going to drive down towards the bare minimum. It's going to be all kinds of arbitrage between how much you can get the hardware for and then how much you sell the API and how much latency your customers are willing to let go.
You need to figure out how to squeeze your margins. What is your unique thing here? I think Together and Fireworks and all these people are trying to build some faster CUDA kernels and faster hardware kernels in general. But those moats only last for a month or two. These ideas quickly propagate.
Even if they're not published? Even if they're not published, the idea space is small. So even if they're not published, the discovery rate is going to be pretty high. It's not like we're talking about a combinatorial thing that is really large. You're talking about like Llama-style LLM models, and we're going to beat those to death on a few different hardware SKUs.
It's not even like we have a huge diversity of hardware you're going to aim to run it on. Now when you have such a narrow problem and you have a lot of people working on it, the rate at which these ideas are going to get figured out is going to be pretty rapid.
Is it like a standard bag of tricks? The standard one that I know of is fusing operators and-- Yeah, it's the standard bag of tricks on figuring out how to improve your memory bandwidth and all that. OK, interesting. Any ideas instead of things that are not being beaten to death that people should be paying more attention to?
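Editor's sketch: operator fusion, the trick named in this exchange, is about touching memory fewer times. The toy below uses plain Python lists rather than real GPU kernels, and the scale, shift, ReLU chain is an arbitrary example chosen for illustration. The unfused version makes three passes and writes two intermediate buffers; the fused version does the same arithmetic in a single pass.

```python
from typing import List


def unfused_scale_shift_relu(x: List[float], scale: float, shift: float) -> List[float]:
    """Three separate 'operators', each reading and writing a full buffer."""
    scaled = [v * scale for v in x]         # pass 1: read x, write scaled
    shifted = [v + shift for v in scaled]   # pass 2: read scaled, write shifted
    return [max(v, 0.0) for v in shifted]   # pass 3: read shifted, write output


def fused_scale_shift_relu(x: List[float], scale: float, shift: float) -> List[float]:
    """One fused 'operator': a single read of x and a single write of the output."""
    return [max(v * scale + shift, 0.0) for v in x]


if __name__ == "__main__":
    data = [-2.0, 0.5, 3.0]
    print(unfused_scale_shift_relu(data, 2.0, 1.0))
    print(fused_scale_shift_relu(data, 2.0, 1.0))
```

On hardware where these elementwise steps are limited by memory bandwidth rather than arithmetic, removing the intermediate reads and writes is where the speedup would come from.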
One thing I was like, you have 1,000 operators. What's the most interesting usage of PyTorch that you're seeing maybe outside of this little bubble? So PyTorch, it's very interesting and scary at the same time. But basically, it's used in a lot of exotic ways from the ML angle, like, OK, what kind of models are being built?
And you get all the way from state space models and all these things to stuff like nth-order differentiable models, like neural ODEs and stuff like that. I think there's one set of interestingness factor from the ML side of things. And then there's the other set of interesting factor from the applications point of view.
It's used in Mars Rover simulations, to drug discovery, to Tesla cars. And there's a huge diversity of applications in which it is used. So in terms of the most-- I think in terms of the most interesting application side of things, I think I am scared at how many interesting things that are also very critical and really important it is used in.
I think the scariest was when I went to visit CERN at some point. And they said they were using PyTorch. And they were using GANs at the same time for particle physics research. And I was scared more about the fact that they were using GANs than they were using PyTorch.
Because at that time, I was a researcher focusing on GANs. The diversity is probably the most interesting, how many different things it is being used in. I think that's the most interesting to me from the application's perspective. From the model's perspective, I think I've seen a lot of them.
The really interesting ones to me are where we're starting to combine search and symbolic stuff with differentiable models. I think the whole AlphaGo style model is one example. And then I think we're attempting to do it for LLMs as well with various reward models and then search. I don't think PyTorch is being used in this, but the whole AlphaGeometry thing was interesting.
Because again, it's an example of combining the symbolic models with the gradient-based ones. But there is stuff like AlphaGeometry that PyTorch is used in, especially when you intersect biology and chemistry with ML. In those areas, you want stronger guarantees on the output. So yeah, maybe from the ML side, those things to me are very interesting right now.
- Yeah. People are very excited about the AlphaGeometry thing. For me, it's theoretical. It's great. You can solve some Olympiad questions. I'm not sure how to make that bridge over into the real-world applications, but I'm sure it-- - Well, OK. - --will figure it out. - Let me give you an example of it.
You know how the whole thing about synthetic data will be the next rage in LLMs is a thing? - Already is a rage. - Which I think is fairly misplaced in how people perceive it. People think synthetic data is some kind of magic wand that you wave, and it's going to be amazing.
Synthetic data is useful in neural networks right now because we, as humans, have figured out a bunch of symbolic models of the world or made up certain symbolic models because of human innate biases. So we've figured out how to ground particle physics in a 30-parameter model. And it's just very hard to compute.
As in, it takes a lot of flops to compute, but it only has 30 parameters or so. I mean, I'm not a physics expert, but it's a very low-rank model. We built mathematics as a field that basically is very low-rank. Language, a deep understanding of language, like the whole syntactic parse trees and just understanding how language can be broken down into formal symbolism is something that we've figured out.
So we basically, as humans, have accumulated all this knowledge on these subjects, either synthetically-- I mean, we created those subjects in our heads, or we've grounded some real-world phenomenon into a set of symbols. But we haven't figured out how to teach neural networks symbolic world models directly. The only way we have to teach them is generating a bunch of inputs and outputs and gradient descending over them.
So in areas where we have the symbolic models and we need to teach all the knowledge we have that is better encoded in the symbolic models, what we're doing is we're generating a bunch of synthetic data, a bunch of input-output pairs, and then giving that to the neural network and asking it to learn the same thing that we already have a better low-rank model of in gradient descent in a much more overparameterized way.
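Editor's sketch: the loop described here, take a symbolic model you already trust, generate input-output pairs from it, and have a learner recover the same rule by gradient descent, fits in a short script. The rule `y = 3x + 2` and the two-parameter linear learner are stand-ins chosen for brevity; a real setup would use a far richer symbolic model and an overparameterized network.

```python
import random
from typing import List, Tuple


def symbolic_model(x: float) -> float:
    """Stand-in for knowledge we already hold in compact symbolic form."""
    return 3.0 * x + 2.0


def make_synthetic_pairs(n: int, seed: int = 0) -> List[Tuple[float, float]]:
    """Generate (input, output) pairs by querying the symbolic model."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, symbolic_model(x)) for x in xs]


def fit_by_gradient_descent(pairs: List[Tuple[float, float]],
                            steps: int = 500,
                            lr: float = 0.1) -> Tuple[float, float]:
    """Fit y ~ w * x + b by full-batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(pairs)
    for _ in range(steps):
        grad_w = sum(2.0 * (w * x + b - y) * x for x, y in pairs) / n
        grad_b = sum(2.0 * (w * x + b - y) for x, y in pairs) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


if __name__ == "__main__":
    learned_w, learned_b = fit_by_gradient_descent(make_synthetic_pairs(200))
    print(f"learned w={learned_w:.4f}, b={learned_b:.4f}")
```

The learner only ever sees pairs, never the formula. The data is useful precisely because a good symbolic model generated it.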
Outside of this, where we don't have good symbolic models, synthetic data obviously doesn't make any sense. So synthetic data is not a magic wand where it'll work in all cases and every case or whatever. It's just where we as humans already have good symbolic models of, we need to impart that knowledge to neural networks and we figured out the synthetic data is a vehicle to impart this knowledge to.
But people, because maybe they don't know enough about synthetic data as a notion, but they hear the next wave of data revolution is synthetic data, they think it's some kind of magic where we just create a bunch of random data somehow. They don't think about how. And then they think that's just a revolution, and I think that's maybe a gap in understanding most people have in this hype cycle.
- Yeah, well, it's a relatively new concept. - Yeah. - There's two more that I'll put in front of you and then see what you respond. One is, I have this joke that it's only synthetic data if it's from the Mistral region of France, otherwise it's a sparkling distillation, which is what Nous Research is doing.
They're distilling GPT-4 by creating synthetic data from GPT-4, creating mock textbooks inspired by Phi-2, and then fine-tuning open source models like Llama. - Yeah. - And so, should we call that synthetic data? Should we call it something else? I don't know, but it's-- - Yeah, I mean, the outputs of LLMs, are they synthetic data?
They probably are, but I think it depends on the goal you have. If your goal is you're creating synthetic data with the goal of trying to distill GPT-4's superiority into another model, I guess you can call it synthetic data, but it also feels disingenuous because your goal is like, I need to copy the behavior of GPT-4 and-- - It's also not just behavior, but data set.
- Yeah. - I've often thought of this as data set washing. You need one model at the top of the chain. - Yeah, yeah. - Unnamed French company that makes a model that has all the data in it that we don't know where it's from, but it's open source, hey, and then we distill from that.
- Yeah. - And it's great. (laughing) - Yeah. - But they also, to be fair, they also use larger models as judges or for preference ranking, right? - Yes. - That is, I think, a very, very accepted use of synthetic. - Correct. I think it's a very interesting time where we don't really have good social models of what is acceptable depending on how many bits of information you use from someone else, right?
It's like, okay, you use like one bit, is that okay? Yeah, that's accepted to be okay. Okay, what about if you use like 20 bits, is that okay? But I don't know. What if you use like 200 bits? Like, I don't think we as society have ever been in this conundrum where we have to be like, where is the boundary of copyright or where is the boundary of socially accepted understanding of copying someone else?
Like, we haven't tested this mathematically before, in my opinion. - Yeah, where there's transformative use. - Yes. - So yeah, I think this New York Times OpenAI case is gonna go to the Supreme Court. - Yeah. - And we'll have to decide it 'cause-- - I think it'll be very interesting.
- Never had to deal with it before. And then finally, for synthetic data, the thing that I'm personally exploring is solving this very stark paradigm difference between RAG and fine tuning, where you can kind of create synthetic data off of your retrieved documents. - Yeah. - And then fine tune on that.
That's kind of synthetic. All you need is variation or diversity of samples for you to fine tune on. And then you can fine tune your knowledge into your model. - Yeah. - I don't know if you've seen that as a direction for synthetic data. - I think that is, that is like you're basically trying to create, like what you're doing is you're saying, well, language, I know how to parameterize language to an extent.
- Yeah. - And I need to teach my model variations of this input data so that it's resilient or invariant to language uses of that data. - Yeah, it doesn't overfit on-- - Yeah, so I think that's 100% like synthetic, right? You understand, like the key is like, you create variations of your documents and you know how to do that because you have a symbolic model or like some implicit symbolic model of language.
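As a rough illustration of the idea discussed here (creating variations of a retrieved document so a model can be fine tuned on them), a minimal sketch might look like the following. Everything in it is an assumption made for illustration: the templates, the `make_variations` helper, and the prompt/completion record format are hypothetical, and a real pipeline would more likely ask a language model to paraphrase each passage rather than rely on fixed templates.

```python
import random

# Hypothetical templates standing in for an "implicit symbolic model of language".
# A real system would probably generate paraphrases with a language model instead.
QUESTION_TEMPLATES = [
    "What does the document say about {topic}?",
    "Summarize the key fact about {topic}.",
    "Explain {topic} in one sentence.",
]
ANSWER_TEMPLATES = [
    "{fact}",
    "According to the source: {fact}",
    "In short: {fact}",
]


def make_variations(topic, fact, n, seed=0):
    """Return up to n distinct prompt/completion pairs that restate one fact."""
    rng = random.Random(seed)  # seeded so the sample is reproducible
    pairs = [(q, a) for q in QUESTION_TEMPLATES for a in ANSWER_TEMPLATES]
    rng.shuffle(pairs)
    return [
        {"prompt": q.format(topic=topic), "completion": a.format(fact=fact)}
        for q, a in pairs[:n]
    ]


if __name__ == "__main__":
    # One retrieved fact becomes several differently worded training samples.
    for sample in make_variations("OPT", "OPT was a 175B-parameter model.", 4):
        print(sample)
```

The point of the sketch is only the shape of the data: one retrieved fact fans out into several differently worded samples, so fine tuning sees the content rather than one particular phrasing of it.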
- Okay. Do you think the issue with symbolic models is just the architecture of the language models that we're building? I think like the, maybe the thing that people grasp is like the inability of transformers to deal with numbers because of the tokenizer. Is it a fundamental issue there too and do you see alternative architectures that will be better with symbolic understanding?
- I am not sure if it's a fundamental issue or not. I think we just don't understand transformers enough. I don't even mean transformers as an architecture. I mean like the use of transformers today, like combining the tokenizer and transformers and the dynamics of training, like when you show math heavy questions versus not.
I don't have a good calibration of whether I know the answer or not. I, you know, there's common criticisms that are like, well, you know, transformers will just fail at X but then when you scale them up to sufficient scale, they actually don't fail at that X. I think there's this entire subfield where they're trying to figure out these answers, called like the science of deep learning or something.
So we'll get to know more. I don't know the answer. - Got it. Let's touch a little bit on just Meta AI and you know, stuff that's going on there. Maybe, I don't know how deeply you're personally involved in it but you're our first guest from Meta AI which is really fantastic.
And Llama 1 was, you know, you are such a believer in open source. Llama 1 was more or less like the real breakthrough in open source AI. The most interesting thing for us covering in this podcast was the death of Chinchilla, as people say. Any interesting insights there around like the scaling models for open source models or smaller models or whatever that design decision was when you guys were doing it?
- So Llama 1 was Guillaume Lample and team. There was OPT before, which I'm also very proud of. - That's true. - Because we bridged the gap in understanding of how complex it is to train these models to the world. Like until then, no one really, in gory detail, published-- - The logs.
- Yeah, like why is it complex? And everyone says like, oh, it's complex. But no one really talked about why it's complex. So I think OPT was cool. We probably-- - I met Susan and she's very, very outspoken. - Yeah, we probably, I think, didn't train it for long enough, right?
Like, you know, that's kind of obvious in retrospect. - For a 175B? - Yeah. - But you trained it according to Chinchilla at the time or? - I can't remember the details, but I think it's a commonly held belief at this point that like, well, if we trained OPT longer, it would actually end up being better.
Llama 1, I think was, yeah, Guillaume Lample and team. Guillaume is fantastic and went on to build Mistral. I wasn't too involved in that side of things. So I don't know what you're asking me, which is like, well, like, how did they think about scaling laws and all of that?
Llama 2, I was more closely involved in. I helped them a reasonable amount with like their infrastructure needs and stuff. Llama 2, I think was more like, let's get to the evolution. At that point, we kind of understood what we were missing from the industry's understanding of LLMs and we needed more data and we needed to train the models for longer.
And we made, I think, a few tweaks to the architecture and we scaled up more and like that is Llama 2. I think Llama 2, you can think of it as like, after Guillaume left, the team kind of rebuilt their muscle around Llama 2. And Hugo, I think, who's the first author, is fantastic.
And I think he did play a reasonably big role in Llama 1 as well and he overlaps between Llama 1 and 2. So Llama 3, obviously, hopefully will be awesome. - Just one question on Llama 2 and then we'll try and fish Llama 3 spoilers out of you.
In the Llama 2 paper, the loss curves of the 34B and 70B parameter models, they still seem kind of steep, like they could go lower. How, from an infrastructure level, how do you allocate resources? Could they have just gone longer or were you just like, hey, this is all the GPUs that we can burn and let's just move on to Llama 3 and then make that one better?
- Instead of answering specifically about that Llama 2 situation or whatever, I'll tell you how we think about things. Generally, we have, I mean, Mark released some numbers, right? So let's cite those things again. All I remember is like 600K GPUs. - That is by the end of this year and 600K H100 equivalents with 250K H100s and including all of the other GPU or accelerator stuff, it would be 600 and something K aggregate capacity.
That's a lot of GPUs, we'll talk about it separately, but the way we think about it is we have a train of models, right? Llama 1, 2, 3, 4. And we have a bunch of GPUs. I don't think we're short of GPUs. - Yeah, no, I wouldn't say so.
- Yeah, so I think it's all a matter of time. I think time is the biggest bottleneck. It's like when do you stop training the previous one and when do you start training the next one and how do you make those decisions? The data, do you have net new data, better clean data for the next one in a way that it's not worth like really focusing on the previous one.
It's just a standard iterative product. You're like, when is the iPhone one? When do you start working on iPhone two, and so on, right? So mostly the considerations are time and generation rather than GPUs in my opinion. - So one of the things with the scaling laws, like Chinchilla is like optimal to balance training and inference costs.
I think at Facebook scale or Meta scale, you would rather pay a lot more maybe at training and then save on inference. How do you think about that from an infrastructure perspective? I think in your tweet you say you can try and guess on like how we're using these GPUs.
Can you just give people a bit of understanding? It's like, because I've already seen a lot of VCs say, Llama 3 has been trained on 600,000 GPUs and that's obviously not true, I'm sure. How do you allocate between the research like FAIR and the Llama training, the inference on Instagram suggestions that got me to scroll, like AI generated stickers on WhatsApp and all that?
- Yeah, we haven't talked about any of this publicly but like as a broad stroke, it's like how we would allocate resources of any other kinds at any company. You run a company, you run like a VC portfolio, like how do you allocate your investments between different companies or whatever?
You kind of make various trade offs and you kind of decide should I invest in this project or this other project or how much should I invest in this project? It's very much like a zero-sum set of trade offs and it also comes into play like how are your clusters configured?
Like overall, like what you can fit of what size and what cluster and so on. So broadly, there's no magic sauce here. Like, I mean, I think the details would add more spice but also wouldn't add more understanding. It's just gonna be like, oh, okay. I mean, this looks like they just think about this as I would normally do.
- Right, so even the GPU rich run through the same struggles while having to decide where to allocate things? - Yeah, I mean like at some point, I forgot who said it but it's like you kind of fit your models to the amount of compute you have. If you don't have enough compute, you figure out how to make do with smaller models but like no one as of today, I think would feel like they have enough compute.
I don't think like I have heard any company within the AI space be like, oh yeah, like we feel like we have sufficient compute and we couldn't have done better. So like that conversation, I don't think I've heard from any of my friends at other companies. - Stella from Eleuther sometimes says that because she has a lot of donated compute and she's trying to put it to interesting uses but for some reason, she's decided to stop making large models.
- I mean, that's a cool high conviction opinion that might pay out, right? I mean, she's taking a path that most people don't care to take in this climate and she probably will have very differentiated ideas and I mean, think about the correlation of ideas in AI right now, it's so bad, right?
So everyone's fighting for the same pie. In some weird sense, like that's partly why I don't really directly work on LLMs. I used to be a, I used to do image models and stuff and I actually stopped doing GANs because GANs were getting so hot that I didn't have any calibration of whether my work would be useful or not because oh yeah, someone else did the same thing you did.
It's like, there's so much to do, I don't understand why I need to fight for the same pie. So I think Stella's decision is very smart. - And how do you reconcile that with how we started the discussion about intrinsic versus extrinsic kind of like accomplishment or success? How should people think about that one, especially when they're doing a PhD or like early in their career?
It seems like, I think at NeurIPS, I walked through a lot of the posters and whatnot, there seems to be mode collapse in a way in the research, a lot of people working on the same things. Is it worth for like a PhD to not take a bet on something that is like maybe not as interesting, just because of funding and visibility and whatnot or yeah, what suggestions would you give?
- I think there's a baseline level of compatibility you need to have with the field. Basically, you need to figure out if you will get paid enough to eat, right? Like, and like whatever reasonable, normal lifestyle you want to have as a baseline. So you at least have to pick a problem within the neighborhood of like fundable.
Like you wouldn't want to be doing something so obscure that people are like, I don't know, like you can work on it. - With a limit on fundability, I'm just like observing something like three months of compute, right? That's the top line. That's the like max that you can spend on any one project.
- But like, I think that's very ill specified, like how much compute? - Yeah. - So I think the notion of fundability is broader. It's more like, hey, is this family of models within the acceptable set of you're not crazy or something, right? Like even something like neural ODEs, which is a very like boundary pushing thing or like state space models or whatever.
Like all of these things I think are still in fundable territory. When you're talking about, I'm gonna do one of the neuromorphic models and then apply like image classification to them or something, then it becomes like a bit questionable. Again, it depends on your motivation. Maybe if you're a neuroscientist, it actually is feasible.
But if you're like an AI engineer, like the audience of these podcasts, then it's less, it's more questionable. So I think like, the way I think about it is like, you need to figure out how you can be in the baseline level of fundability just so that you can just live.
And then after that, really focus on intrinsic motivation and depends on your strengths, like how you can play to your strengths and your interests at the same time. Like you, like I try to look at a bunch of ideas that are interesting to me, but also try to play to my strengths.
I'm not gonna go work on theoretical ML. I'm interested in it, but when I want to work on something like that, I try to partner with someone who is actually a good like theoretical ML person and see if I actually have any value to provide. And if they think I do, then I come in.
So I think you'd want to find that intersection of ideas you like, and that also play to your strengths. And I'd go from there. Everything else, like actually finding extrinsic success and all of that I think is, the way I think about it is like somewhat immaterial. When you're talking about building ecosystems and stuff, like slightly different considerations come into play, but that's a different conversation.
- Yeah, I should, we're gonna pivot a little bit to just talk about open source AI. But one more thing I wanted to establish for Meta is like this 600K number, just kind of rounding out the discussion, that's for all Meta. So including your own inference needs, right? It's not just about training.
- It's for all, it's gonna be the number in our data centers for all of Meta, yeah. - Yeah, so like, there's a decent amount of workload serving Facebook and Instagram and you know, whatever. And then is there interest in like your own hardware? - We already talked about our own hardware.
It's called MTIA, our own silicon. I think we've even showed like the standard photograph of you holding the chip that doesn't work. I mean, like as in the chip that you basically just get like-- - As a test? - Yeah, a test chip or whatever. So we are working on our silicon and we'll probably talk more about it when the time is right, but-- - Like what gaps do you have that the market doesn't offer?
- Okay, I mean this is easy to answer. So basically, remember how I told you about the whole, like there's this memory hierarchy and like sweet spots and all of that? Fundamentally, like when you build a hardware, like you make it general enough that a wide set of customers and a wide set of workloads can use it effectively while trying to get the maximum level of performance they can.
The more specialized you make the chip, the more hardware efficient it's going to be, the more power efficient it's gonna be, the easier it's going to be to write the software, like the kernels, to just map that one or two workloads to that hardware and so on.
So it's pretty well understood across the industry that if you have a sufficiently large volume of workload, you can specialize it and get some efficiency gains, like power gains and so on. So the way you can think about everyone building, every large company building silicon, like I think a bunch of the other large companies are building their own silicon as well, is each large company has a sufficient enough set of verticalized workloads that have a pattern to them that say a more generic accelerator like an Nvidia or an AMD GPU does not exploit.
So there is some level of power efficiency that you're leaving on the table by not exploiting that. And you have sufficient scale and you have sufficient forecasted stability that those workloads will exist in the same form, that it's worth spending the time to build out a chip to exploit that sweet spot.
Like obviously something like this is only useful if you hit a certain scale and that your forecasted prediction of those kinds of workloads being in the same kind of specializable exploitable way is true. So yeah, that's why we're building our own chips. - Amazing, awesome. Yeah, I know we've been talking a lot on a lot of different topics and going back to open source, you had a very good tweet.
You said that a single company's closed source effort rate-limits against people's imaginations and needs. How do you think about that? How do you think about all the impact that some of the Meta AI work in open source has been doing and maybe directions of the whole open source AI space?
- Yeah. In general, I think first I think it's worth talking about this in terms of open and not just open source because like with the whole notion of model weights, no one even knows what source means for these things. But just for the discussion, when I say open source, you can assume it's just I'm talking about open.
And then there's the whole notion of like licensing and all that like, you know-- - Commercial. - Commercial, non-commercial, commercial with clauses and all that. I think like at a fundamental level, the most beneficial value of open source is that you make the distribution very wide. Like it's just available with no friction and like people can do transformative things.
In a way that's very accessible. Like maybe like it's open source, but it has a commercial license and I'm a student like in India. I don't care about the license. I just don't even understand the license. But like the fact that I can use it and do something with it is very transformative to me.
Like I got this thing in a very accessible way. And then like so there are various degrees, right? And then like if it's open source, but it's like actually like a commercial license, then a lot of companies are gonna benefit from like gaining value that they didn't previously have that they maybe had to pay a closed source company for it.
So open source is just a very interesting tool that you can use in various ways. So there's, again, two kinds of open source. One is like some large company doing a lot of work and then open sourcing it. And that kind of effort is not really feasible by say like a band of volunteers doing it the same way.
So there's both a capital and operational expenditure that the large company just decided to ignore and give it away to the world for some benefits of some kind. They're not as tangible as like direct revenue or something. So in that part, Meta has been doing incredibly good things. They fund a huge amount of the PyTorch development.
They've open sourced Llama and those family of models. And several other fairly transformative projects. FAISS is one, Segment Anything, Detectron, Detectron 2, DensePose, I mean it's-- - Seamless. - Yeah, Seamless. It's just like the list is so long that we're not gonna cover. So I think Meta comes into that category where we spend a lot of capex and opex and we have a high talent density of great AI people.
And we open our stuff. And the thesis for that, I remember when FAIR was started, the common thing was like wait, why would Meta wanna start an open AI lab? What exactly is the benefit from a commercial perspective? And then the thesis was very simple. It was like AI is currently rate limiting Meta's ability to do things.
Our ability to build various product integrations, moderation, various other factors. AI was the limiting factor. And we just wanted AI to advance more. And we didn't care if the IP of the AI was uniquely in our possession or not for us. However the field advances, that accelerates Meta's ability to build a better product.
So we just built an open AI lab and we said, if this helps accelerate the progress of AI, that's strictly great for us. A very easy rationale, right? Still the same to a large extent with the Llama stuff and it's a bit more, I think it's the same values, but the argument, it's a bit more nuanced.
And then there's the second kind of open source, which is oh, we built this project nights and weekends and we're very smart people and we open sourced it and then we built a community around it. This is like the Linux kernel and various software projects like that. So I think about open source, like both of these things being beneficial and both of these things being different.
They're different and beneficial in their own ways. The second one is really useful when there's an active arbitrage to be done. If someone's not really looking at a particular space, because it's not commercially viable or whatever, like a band of volunteers can just coordinate online and do something and then make that happen.
And that's great. I wanna cover a little bit about open source LLMs maybe. So open source LLMs have been very interesting because I think we were trending towards an increase in open source in AI from 2010 all the way to like 2017 or something. Like where more and more pressure within the community was to open source their stuff so that their methods and stuff get adopted.
And then the LLM revolution kind of took the opposite effect. OpenAI stopped open sourcing their stuff and DeepMind, kind of like all the other cloud and all these other providers, they didn't open source their stuff. And it was not good in the sense that first, like science done in isolation probably will just form its own bubble where like people believe their own bullshit or whatever, right?
So there's that problem. And then there was the other problem which was the accessibility part. Like, okay, I again always go back to like, I'm a student in India with no money. What is my accessibility to any of these closed source models? At some scale I have to pay money.
That makes it a non-starter and stuff. And there is also the control thing. I strongly believe the best, if you want human-aligned stuff, you want all humans to give feedback and you want all humans to have access to their technology in the first place. And I actually have seen, living in New York, whenever I come to Silicon Valley I see a different cultural bubble.
Like all the friends I hang out with talk about some random thing, like Dyson spheres or whatever, that's a thing. And most of the world doesn't know or care about any of this stuff. Like it's definitely like a bubble and bubbles can form very easily. And when you make a lot of decisions because you're in a bubble, they're probably not globally optimal decisions.
So I think open source, the distribution of open source, powers a certain kind of non-falsifiability that I think is very important. So I think on the open source models, it's going great in the fact that LoRA, I think, came out of the necessity of open source models needing to be fine-tunable in some way.
- Yeah, and I think DPO also came out of the academic open source side of things. So do any of the closed source labs, did any of them already have LoRA or DPO internally? Maybe, but that does not advance humanity in any way. It advances some company's probability of doing the winner takes all that I talked about earlier in the podcast.
So I don't know, it just feels fundamentally good. Like when people try to, people are like, well, what are the ways in which it is not okay? And this might be a little controversial, but I find a lot of arguments based on whether closed source models are safer or open source models are safer, very much related to what kind of culture they grew up in, what kind of society they grew up in.
If they grew up in a society that they trusted, then I think they take the closed source argument. And if they grew up in a society that they couldn't trust, where the norm was that you didn't trust your government, obviously, like it's corrupt or whatever, then I think the open source argument is what they take.
I think there's a deep connection to people's innate biases from their childhood and their trust in society and governmental aspects that push them towards one opinion or the other. And I'm definitely in the camp of open source is definitely going to actually have better outcomes for society. Closed source to me just means that centralization of power, which is really hard to trust.
So I think it's going well in so many ways. We're actively disaggregating the centralization of power to just two or three providers. We are, I think, benefiting from so many people using these models in so many ways that aren't allowed by say Silicon Valley left wing tropes. Some of these things are good or bad, but they're not culturally accepted universally in the world.
So those are things worth thinking about. And I think open source is not winning in certain ways. These are all the things in which, as I mentioned, it's actually being very good and beneficial and winning. I think one of the ways in which it's not winning, at some point I should write a long form post about this, is I think it has a classic coordination problem.
I mean, open source in general always has a coordination problem. If there's a vertically integrated provider with more resources, they will just be better coordinated than open source. And so now open source has to figure out how to have coordinated benefits. And the reason you want coordinated benefits is because these models are getting better based on human feedback.
And if you see with open source models, like if you go to Reddit, the r/LocalLLaMA subreddit, there's so many variations of models that are being produced from say, Nous Research. I mean, there's so many variations built by so many people. And one common theme is they're all using these fine tuning or human preferences data sets that are very limited and someone published them somewhere and they're not sufficiently diverse.
And you look at the other side, like say front-ends like Oobabooga or HuggingChat or Ollama, they don't really have feedback buttons. All the people using all of these front-ends, they probably want to give feedback but there's no way for them to give feedback. So these models are being built, they're being arbitrarily measured and then they are being deployed into all these open source front-ends or like apps that are closed source, they're serving open source models.
And these front-ends don't have, they are not exposing the ability to give feedback. So we're just losing all of this feedback. Maybe open source models are being used as much as GPT is at this point in all kinds of, in a very fragmented way. Like in aggregate, all the open source models together are probably being used as much as GPT is, maybe close to that.
But the amount of feedback that is driving back into the open source ecosystem is like negligible, maybe less than 1% of the usage. So I think like some, like the blueprint here I think is, you'd want someone to create a sinkhole for the feedback, some centralized sinkhole, like maybe Hugging Face or someone just finds like, okay, like I will make available a call to log a string along with like a bit of information of positive or negative or something like that.
And then you would want to send pull requests to all the open source front ends, like Ooba and all, being like, hey, we're just integrating like a feedback UI. And then work with like the closed source people also, being like, look, it doesn't cost you anything, just like have a button.
And then the sinkhole will have a bunch of this data coming in. And then I think a bunch of open source researchers should figure out how to filter the feedback into only the like high quality one. I'm sure like it will be exploited by spam bots or whatever, right?
This is like the perfect way to inject your advertising product into like the next-- - Buy Coca Cola now. - So there needs to be some level of that. In the same way, I'm sure like all the closed providers are doing today, like OpenAI, Claude, like the feedback that comes in, I'm sure they are figuring out if that's legit or not.
That kind of data filtering needs to be done. And that loop has to be set up. And this requires that central sinkhole and that like data cleaning effort both to be like there. They're not there right now. They're not there right now. I think for capital reasons, but also for coordination reasons.
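The blueprint described above (a central sink that accepts a string plus one bit of positive or negative signal, followed by a filtering pass) could be sketched roughly as below. This is only an illustrative sketch under stated assumptions: `FeedbackSink`, its methods, and the repeat-count spam heuristic are hypothetical names and choices made here, not an existing service or anything the speakers specified, and real filtering would need to be far more robust.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Feedback:
    frontend: str   # which front end sent the record (hypothetical field)
    text: str       # the logged string, e.g. the model response being rated
    positive: bool  # the single bit of feedback: thumbs up or thumbs down


class FeedbackSink:
    """In-memory stand-in for a centralized feedback sink."""

    def __init__(self):
        self.records = []

    def log(self, frontend, text, positive):
        # The one call a front end would make when a user clicks a button.
        self.records.append(Feedback(frontend, text, positive))

    def filtered(self, max_repeats=3):
        # Crude, assumed heuristic: drop blank strings and any string that
        # was submitted more than max_repeats times (likely spam or ads).
        counts = Counter(r.text for r in self.records)
        return [
            r for r in self.records
            if r.text.strip() and counts[r.text] <= max_repeats
        ]


if __name__ == "__main__":
    sink = FeedbackSink()
    sink.log("frontend-a", "The capital of France is Paris.", True)
    for _ in range(5):
        sink.log("frontend-b", "Buy Coca Cola now", True)
    print(len(sink.records), len(sink.filtered()))
```

The design choice worth noting is that the logging call is deliberately tiny, so adding a feedback button costs a front end almost nothing, while all of the hard work lives in the filtering step on the sink side.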
Okay, if that central sinkhole is there, who's gonna go coordinate all of this integration across all of these like open source front ends. But I think if we do that, if that actually happens, I think that probably has a real chance of the open source models having a runaway effect against OpenAI with their current like daily active users rumored.
Probably doesn't have a chance against Google because Google has Android and Chrome and Gmail and Google Docs and everything. So people just use that a lot. But like I think like there's a clear chance we can take at truly winning open source. - Do you think this feedback is helpful to make open source models better or to get to like open source AGI?
Because in a way like OpenAI's goal is to get to AGI, right? So versus I think in open source, we're more focused on personal better usage or like-- - Yeah, I think that's a good question. But I think like, largely I actually don't think people have a good understanding of AGI and I don't mean definition level.
I mean, people are like, okay, we're gonna, AGI means it's powering 40% of world economic output or something like that, right? But what does that mean? So do you think electricity is powering 40% of world economic output or is it not? Like, generally the notion of like powering X percent of economic output is not defined well at all for me to understand like how to know when we got to AGI or how to measure whether we're getting AGI.
Like, you know, you can look at it in terms of intelligence or task automation or whatever. I think that's what we are doing right now. We're basically integrating like the current set of AI technologies into so many real world use cases where we find value that if some new version of AI comes in, we can find, like we can be like, ah, this helps me more.
In that sense, I think like the whole process of like how we think we got to AGI will be continuous and not like not discontinuous like how I think the question is posed. So I think the open source thing will be very much in line with getting to AGI because open source has that like natural selection effect.
Like if a better open source model comes, really no one says, huh, I don't wanna use it because there's an ecosystem effect, I'm locked into my ecosystem or like, I don't know if I like the models, you know, whatever. It's just a very pure direct thing. So if there's a better model that comes out, then it will be used.
So I definitely think it has a good chance of achieving how I would think about as a continuous path to what we might define as AGI. - For the listeners, I will actually mention a couple other maybe related notes on just this very interesting concept of a feedback sinkhole for open source to really catch up in terms of the overall Google versus OpenAI debate.
Open Assistant was led by Yannic Kilcher who recently ended his effort. I think the criticism there was like the kind of people that go to a specific website to give feedback is not representative of real world usage and that's why the models trained on Open Assistant didn't really seem to have caught on in the open source world.
The two leading candidates in my mind are LMSYS out of UC Berkeley who have the LMSYS arena which is being touted as one of the only ways, only reliable benchmarks anymore. I kinda call them non-parametric benchmarks 'cause there's nothing to cheat on except for Elo. And then the other one is OpenRouter which is Alex Atallah's thing.
I don't know if you've talked to any of these people. - I obviously know all of the efforts that you talked about. I haven't talked to them directly about this yet but the way I think about it is the way these models are going to be used is always going to be way more distributed than centralized.
Like which is the power of the open source movement. Like the UI within which these models are going to be used is going to be decentralized. Like these models are going to be integrated into like hundreds and thousands of projects and products and all of that, right? And I think that is important to recognize.
Like the LMSYS leaderboard is the best thing we have right now to understand whether a model is better or not versus another model. But it's also biased and only having a sliver of view into how people actually use these models. Like the people who actually end up coming to the LMSYS leaderboard and then using a model only use it for certain things.
Like GitHub Copilot style usage is not captured in say like the LMSYS thing and so many other styles like the Character.AI style things are not captured in LMSYS. - Which OpenRouter could do. They don't do it right now. - Yeah, so like I think like yeah, my point is like the way these models are going to be used is going to be always a large surface area.
And I think we need to figure out how to provide infrastructure to integrate with all these like ways in which it's being used. Even if you get like the top hundred front ends that the model like open source models are used through, subscribe to like the sinkhole. I think that's already like a substantial thing.
I think like thinking one or two things will by themselves get a lot of data I think is not going to happen. - Yep, fair enough. Before we let you go, can we do just a quick beyond text segment? So you're an investor in Runway, which is video generation.
You're an investor in 1X, which is a humanoid assistant. Osmo, which is focused on using AI for smell recognition and synthesis. You advise a bunch of robotics projects at NYU. - And he builds his own home robot. - Yeah, exactly. On a more, yeah, maybe you have another thing.
What are like the things that you're most excited about beyond like text generation and kind of the more mundane usage? - Yeah, I mean in general, I have more things I'm generally excited about than I can possibly do. Investing is one way to try to clear those urges. I'm generally excited about robotics being a possibility, home robotics being like five to seven years away into commercialization.
I think like it's not like next year or two years from now, but like five to seven years from now, I think like a lot more robotics companies might pop out. There's not a good consensus on whether hardware is a bottleneck or AI is a bottleneck in robotics right now.
My view is actually hardware is still the bottleneck and AI is also a little bit of bottleneck, but like I don't think there's any like obvious breakthroughs we need. I think it's just work. So I'm generally excited about robotics. I spend a lot of time, a lot of personal time, I spend like every Wednesday afternoon at NYU working with Lerrel Pinto and team and just getting towards my like home robot that just does my dishes and stuff.
- What's the status of it? Like what does it do for you now? - As of today, we just deployed a couple months ago, we deployed our home robotic stuff into like several tens of New York City homes and like try to make it do a bunch of tasks.
And we're basically starting to build out a framework that gets to a certain level of robustness on fairly simple tasks, like picking this cup and putting it somewhere else or like taking a few pieces of cloth on the ground and put it somewhere else or open your microwave. Like various like baseline tasks like that with low sample complexity.
So like the key thing, I think one of the things people don't spend their time in robotics is like the user experience, which I think in the research I do at NYU, we spend a huge amount of time on. I think the key there is sample complexity has to be really low.
A lot of the current robotics research if you see, they're like, oh yeah, we collected like 50 demos and now it's able to do this task or we collected like 300 demos or like... It's a sample, the number of samples you need for this thing to do the task is really high.
So we're focusing a lot on... You show it like two or three times and that's sufficient for it to actually like do the task. But it comes with like less generalization, right? Like there's some initial conditions that have to be true for it to do the task. So we're making progress.
That's very interesting in general, the space. I don't think people in this space have settled on the hardware, like how the hardware looks like for it to be truly useful in the home or whatever. Or the UX or the like AI/ML stuff needed to make it sample efficient and all of that.
But I think like lots of work is happening in the field. - Yeah, one of my friends, Carlo at Berkeley, he worked on a project called M3L, which is two CNNs, one for tactile feedback and one for image. When you say hardware, is it running all these things on the edge or is it just like the actual servos and the...
- Yeah, by hardware I mean like the actual like servos, like the motors, servos, even like the sensors, I think we have incredible vision that still like is so much better compared to in the field of view and in resolution compared to any of the cameras we can buy.
We have, our skin is like all available touch sensing and we have like some of the most efficient, some of the most high capacity motors that can lift large loads in like the dexterity of a hand and stuff. So in terms of hardware, I mean like in terms of those capabilities, like we haven't figured out how to do a lot of this stuff.
I mean Tesla has been making incredible progress. 1X I think announced their new thing that looks incredible. Some of the other companies, Figure and like others, are doing great work. But we're really not anywhere close to like the hardware that we feel like we need. And there's obviously the other thing I want to call out is a lot of what people show works, but like has to be fixed all the time.
I mean like that's the other thing we are incredible at. Like we don't need any maintenance or like the maintenance is part of us. If you buy a product, an electronics product of any kind, you buy a PS5, you don't say, oh yeah, my PS5 breaks like every six days and I have to like do some reasonable amount of work on it.
But like that's robotics, like if it's not industrial robotics where it's very controlled and specialized or whatever, like you're talking about reliability like in those ranges. So I think people don't talk about the reliability thing enough. Like when I mean like we're gonna enter the commercialization phase, I mean like we're gonna start thinking about, okay, now we have this thing and we need to figure out how to get reliability high enough to deploy it into homes and like just sell it to people and like Best Buy or something.
So that's the other factor that we have to make a lot of progress on. - I just realized that Google has a play in this with like PaLM-E and stuff and OpenAI obviously has a long history of doing this stuff. Is there anything at Meta? No robotics stuff at Meta?
- I used to, we have a small robotics program at Meta out of FAIR. I actually used to do it at FAIR a little bit before I moved into Infra and focused on my Meta time on a lot of like other infrastructural stuff. So yeah, Meta's robotics program is a lot smaller.
- Seems like it would be a fit in personal computing. - You can think of it as like Meta has a ridiculously large device strategy, right? Like this is our Reality Labs stuff. Like we're going at it from VR and AR and we showcase all that stuff. I think for Meta, like the robot is not as important as like the physical devices kind of stuff, for sure.
- Okay, I want to touch on Osmo a bit because very unusual company too. The stuff that we normally discuss, not robotics, sense of smell. The original pitch I heard from the founder, maybe you can correct me, is that he realized that you can smell cancer. Is that intuitive?
Is that what you get? - Yeah, I mean first like the very interesting reason I invested in Osmo is because Alex Wiltschko, the founder of Osmo, also was like a, before PyTorch there was Torch. And Alex Wiltschko actually worked on Torch. He's actually like a frameworks guy. He built this thing called Tangent from Google, like another like alternative framework and stuff.
So I know him from that side of things. And then he's a neurobiologist by training. He just happens to also love like neural networks and like hacking on those frameworks. So incredibly smart guy, one of the smartest people I know. So when he was going in this direction, I thought it was incredible that like smell is something that we haven't even started to scrape in terms of digitization.
When we think about audio or images or video, they're like so advanced that we have the concept of color spaces. We have the concept of like frequency spectrums. Like, you know, we figured out how ears process like frequencies in mel spectrum or whatever, like logarithmically scaled. Images, for like RGB, YUV.
Like we have so many different kinds of parameterizations. We have formalized these two senses ridiculously. (laughing) Touch and smell, nada. We're like where we were with images and say in 1920 or maybe even the 1800s, right? That's where we're at. And Alex has this incredible vision of like having a smell sensor just eventually just be part of your daily life.
Like as of today, you don't really think about like when you're watching an Instagram reel or something, huh, like I also would love to know what it smelled like and if you're watching a reel of a food or something. You don't because we really haven't as a society got that muscle to even understand what a smell sensor can do.
I think the more near term effects are obviously going to be around things that provide more obvious utility in the short term, like maybe smelling cancer or like repelling mosquitoes better or stuff like that. - More recently he's been talking about like categorizing perfumes. - Yeah, exactly. - That's a market that you can pursue.
- Yeah, like I mean think about how you can customize a perfume to your own liking in the same way you can customize a shoe or something, right? So that's I think all the near term stuff. I think if he's able to figure out a near term value for it, they as a company can sustain themselves to then eventually like try to make progress on the long term which is really in uncharted territory.
Like think about it, 50 years from now, it would be pretty obvious to like kids of the generation to just like, I guess I was saying, I was gonna say scroll a reel on their phone and maybe phones would be there. They're just like on their glasses, they're watching something and then they immediately get like a smell sense off that remote experience as well.
Like we haven't really progressed enough in that dimension and I think they have a chance to do it. - Awesome. Awesome, I mean we touched on a lot of things. Anything, we're missing anything you wanna direct people to or? - Yeah, call to action, call for research, call for startups.
- I don't really have a lot of calls to action because usually I think people should be intrinsically like. (laughing) - That's a good-- - Look inside yourself. (laughing) - That's good, awesome. Thank you so much for coming on. - Yeah, for sure. - Thanks a bit. (upbeat music)