‘Everything is Going to Be Robotic’ Nvidia Promises, as AI Gets More Real
00:00:00.000 |
The CEO of NVIDIA revealed that he wants his company to become ultimately one giant AI. 00:00:07.200 |
Even if that feels a little ways away, he did showcase in the last couple of days 00:00:11.920 |
a string of capabilities that are possible now with AI. 00:00:16.560 |
Yes, we're going to hear three big promises about the future of AI, 00:00:20.160 |
but we're going to see a host of demos of things that are possible right now. 00:00:24.320 |
I'll bring in clips from some recent interviews I've conducted, 00:00:28.080 |
and we'll hear from the chief of staff of one prominent AI company predicting 00:00:32.800 |
the end of employment as we know it in three to five years, which I think is a tad overstated. 00:00:39.520 |
Speaking of which, you'll also see some AI fails as a spam campaign flops hard. 00:00:45.520 |
So what about those three promises I mentioned from the CEO of NVIDIA, 00:00:50.160 |
which looks set to become the largest company in the world if current trends hold? 00:00:55.440 |
Well, first we heard and saw that NVIDIA anticipates robots revolutionizing industry. 00:01:01.840 |
That's still pretty general though, right? So how about the prediction 00:01:23.280 |
The next wave of AI is physical AI. AI that understands the laws of physics. 00:01:34.400 |
Of course, when I say robotics, there's a humanoid robotics that's usually the representation of that. 00:01:41.200 |
Everything is going to be robotic. All of the factories will be robotic. 00:01:45.600 |
The factories will orchestrate robots and those robots will be building products that are robotic. 00:01:52.560 |
Robots interacting with robots, building products that are robotic. 00:01:58.320 |
And of course, we don't just have robots building robots. 00:02:01.520 |
We have artificial intelligence improving artificial intelligence. 00:02:05.600 |
Here is Jensen Huang on a separate, less-reported occasion, 00:02:17.440 |
At night, our AIs are exploring design spaces vast and wide that we would never do ourselves 00:02:24.800 |
because it costs too much money to explore it. 00:02:26.640 |
We can't write software without AI anymore. 00:02:29.120 |
We have to explore all the, you know, the design space of optimizing compilers is too large. 00:02:37.200 |
So our bug, you know, our bugs database actually tells you what's wrong with the code, 00:02:41.920 |
who's likely involved and activates that person to go fix it. 00:02:46.560 |
You know, and so I think we, I want everybody, 00:02:51.040 |
every organization or company to use AI very aggressively. 00:02:56.640 |
But it's well past time that I become a bit more concrete about what models can do right now, today. 00:03:03.040 |
Here is a 30-second clip from NVIDIA that actually undersold what AI is capable of. 00:03:11.040 |
Multimodal LLMs are breakthroughs that enable robots to learn, 00:03:15.760 |
perceive and understand the world around them and plan how they'll act. 00:03:22.960 |
robots can now learn the skills required to interact with the world using gross and fine motor skills. 00:03:29.840 |
But how was that underselling the capabilities of AI? 00:03:35.840 |
Well, they focused on AI learning from human demonstrations. 00:03:40.160 |
But if you've watched my DrEureka video recently, 00:03:42.800 |
you'll know that it's not just about LLMs coming up with high-level plans 00:03:47.360 |
and then relying on human demonstrations to exercise fine-grained robotic control. 00:03:54.720 |
LLMs are actually really good at programming the robo-dog to, 00:03:59.600 |
in this case, stay balanced on a moving, rolling yoga ball. 00:04:03.200 |
And I spoke with Jason Ma, the lead author of the DrEureka paper, 00:04:12.240 |
Robot capabilities will be bootstrapped by large language models. 00:04:16.080 |
And I think that's the most interesting thing of using LLM for robotics, honestly. 00:04:19.200 |
Like there's a lot of work in using large language models for robotics 00:04:24.560 |
I can plan the sequence of tasks the robot needs to do, 00:04:27.360 |
but I think fundamentally the bottleneck for robotics 00:04:29.760 |
is still like the low level of physical control, right? 00:04:34.640 |
but if the robot can't even pick up a knife properly, 00:04:37.920 |
But I think a lot of Eureka where my work is focused on 00:04:41.360 |
how do we use this highly capable reasoning, coding, 00:04:44.320 |
text models, multimodal models to supervise the low level learning. 00:04:48.240 |
So the robots can do the very complex tasks in the first place. 00:04:53.120 |
The key edge that AI has is that it can iterate thousands and thousands of times 00:04:58.480 |
in parallel in simulation until it's got a program it's happy with. 00:05:03.200 |
And dipping back into the virtual world for a moment, 00:05:06.560 |
how about the long-awaited promise of being able to interact live with video game characters? 00:05:33.600 |
And speaking of realism, before I get to the latest clips from NVIDIA, 00:05:38.320 |
here's me speaking six weeks ago about how good lip syncing was getting. 00:05:43.440 |
Using just a single photo of you, we can now get you to say anything. 00:06:01.280 |
I have to remind myself that these aren't projections. 00:06:05.920 |
Imagine that accuracy of lip syncing on a digital human of this level of realism. 00:06:27.920 |
I do wonder sometimes how many decades away we are 00:06:31.360 |
from a time where you could be speaking to someone and not be 00:06:35.280 |
entirely certain in the real world whether or not they are embodied AI. 00:06:40.160 |
I might previously have said that's a hundred years away, 00:06:45.600 |
But I'm off track because I promised more demos of things that are possible with AI today. 00:06:50.720 |
So how about a weather report that's localized to your building, your pavement? 00:06:58.240 |
The next frontier is hyperlocal forecasting down to tens of meters 00:07:03.520 |
where the effects of city infrastructure are taken into account. 00:07:06.560 |
When combined with weather simulation wind fields, it can model the airflow around buildings. 00:07:12.400 |
We expect to predict phenomena such as downwash, 00:07:15.920 |
where strong winds funnel down to street level, causing damage and affecting pedestrians. 00:07:23.440 |
NVIDIA Earth-2, an excellent example of a digital twin that fuses AI, 00:07:29.120 |
physics simulations, and observed data can help countries and companies 00:07:34.880 |
see the future and respond to the impact of extreme weather. 00:07:39.040 |
Or what about a coffee shop which is staffed by dozens of robots 00:07:42.880 |
with just one or two humans to oversee things? 00:07:47.840 |
All of these things feel futuristic and far away until they actually happen. 00:07:53.120 |
And how about a sound effect generator that can generate any sound? 00:07:58.240 |
Well, that is now possible today with ElevenLabs. 00:08:01.680 |
Actually, I'm going to test it with something like a robot being crushed. 00:08:05.920 |
Let's see if it comes up with something interesting or not. 00:08:10.560 |
So far, about five, six, seven seconds. Not too bad. 00:08:19.460 |
Not perfect, obviously, but if you feel that all of this is in the future, 00:08:27.200 |
let me bring you a video from a graphic designer who lost his job recently to AI. 00:08:33.680 |
I just lost my job and I lost it to AI, which is very unfortunate. 00:08:39.040 |
I think many people joke about the, you know, the fact that, 00:08:43.280 |
oh, AI is going to take all our jobs and we're all going to get replaced. 00:08:46.400 |
And especially within my industry, which is graphic design. 00:08:50.400 |
And it turns out basically all of the material that I've provided over the past six years 00:08:59.360 |
So a design that would take me 30 minutes now takes AI 30 seconds 00:09:08.560 |
Essentially, I think it just literally reuses my templates 00:09:11.760 |
and then they can input the hex codes they want the email or the website designed to be, 00:09:16.800 |
drag and drop in the client's logo, upload the client's font and boom, 00:09:21.680 |
it will generate my template by using their brand assets. 00:09:25.840 |
It's a reminder that even though almost all AI needs human-generated training data to get started, 00:09:32.160 |
it doesn't necessarily need more of it to keep going. 00:09:35.360 |
Or to put it another way, this is the worst that AI, embodied or not, will ever be. 00:09:42.720 |
including the chief of staff to the CEO of Anthropic, makers of the Claude chatbots, 00:09:48.960 |
think that this will massively impact the short-term outlook on employment. 00:09:54.160 |
This article by that chief of staff, Avital Balwit, came out just two weeks ago. 00:09:58.720 |
While I think the outlook isn't quite this stark, here's what she had to say. 00:10:03.920 |
She predicted these next three years might be the last few years that I work. 00:10:08.560 |
I stand at the edge of a technological development that seems likely, 00:10:12.080 |
should it arrive, to end employment as I know it. 00:10:15.040 |
And she makes the point that would have been relevant 00:10:20.080 |
The economically and politically relevant comparison on most tasks 00:10:24.240 |
is not whether the language model, or I would say the embodied AI, 00:10:29.360 |
It's whether they are better than the human who would otherwise do that task. 00:10:33.200 |
Doesn't have to be perfect in other words, just has to be a bit cheaper. 00:10:36.800 |
She makes the somewhat common prediction by now that things like copywriting, 00:10:41.440 |
tax preparation and customer service will be heavily automated. 00:10:45.600 |
But let me give you two examples of how the future is a bit more 00:10:52.080 |
First, I remember the frenzied reporting on this report from the think tank, 00:10:58.960 |
According to the headlines at least, they were warning of an AI jobs apocalypse. 00:11:04.240 |
But the very next day I contacted the lead author, Carsten Jung, 00:11:08.560 |
and we had a detailed discussion for AI Insiders. 00:11:11.920 |
First, he said head on that he was disappointed by the media's coverage. 00:11:16.080 |
No, I'm not fully happy with how this is being covered, 00:11:20.320 |
both our report but in general, because it can sound very scary. 00:11:25.200 |
And I think just scaring people doesn't necessarily lead to 00:11:33.600 |
When people talk about jobs apocalypse, I think some people might just switch off 00:11:37.920 |
and throw up their hands and say, oh God, we're all doomed. 00:11:40.240 |
Whereas what we try to do in the report is actually to say there's a range of scenarios 00:11:45.600 |
and it's not some kind of external event like a pandemic that's like happening to us 00:11:50.880 |
and it's all doom and gloom, but it's actually a thing that 00:11:54.000 |
totally depends on decisions by policymakers, but also by organisations that implement AI. 00:12:00.000 |
Then we discussed how a more likely medium-term outcome is wage inequality. 00:12:05.520 |
In short, low wages for many, but not for those who utilise AI to boost their productivity. 00:12:11.840 |
So those that remain in work, their productivity will be hugely aided by AI. 00:12:19.040 |
But then of course, and I think this is also Sam Altman's point, 00:12:24.800 |
So we have lower labour costs, AI likely is able to do things more cheaply. 00:12:30.720 |
So those that own companies will have higher returns. 00:12:38.000 |
And the second cautionary tale about how AI's impacts, say on jobs, can sometimes be overhyped 00:12:44.480 |
actually comes from OpenAI itself, albeit unintentionally. 00:12:49.840 |
we talk about customer service being revolutionised and productivity accelerating. 00:12:54.880 |
But when the focus is on people using AI for nefarious purposes, 00:13:01.600 |
This was a report released by OpenAI a few days ago 00:13:04.720 |
about how some bad actors were trying to generate disinformation campaigns en masse. 00:13:11.760 |
but gave a summary of the impact of these campaigns using the GPT models. 00:13:16.480 |
There was no significant audience increase due to our services. 00:13:24.560 |
So far, these operations from places like Russia, Israel and China 00:13:28.560 |
do not appear to have benefited from meaningfully 00:13:31.440 |
increased audience engagement or reach as a result of our services. 00:13:34.960 |
They basically describe how these guys came up with a load of spam, 00:13:41.040 |
For the most part, it was because the spam just wasn't very good. 00:13:44.560 |
I don't know, it might be me, but I just find it a little bit ironic 00:13:47.520 |
that when we're talking about a negative use of the technology, 00:13:51.200 |
the party line is that the models are kind of useless. 00:13:54.560 |
Of course, what we really need are better benchmarks. 00:13:57.520 |
And so I was pleased to see this initiative from Scale AI. 00:14:01.200 |
They describe these benchmarks and leaderboards that can't be gamed, 00:14:10.320 |
GPT-4o is not a million miles ahead of other models. 00:14:16.000 |
how we should always benchmark models on our own use cases, 00:14:20.320 |
because leaderboards chop and change quite a lot. 00:14:23.600 |
Notice how the table on the left is quite different 00:14:30.800 |
isn't the only reason to be optimistic about benchmarks, 00:14:36.400 |
In short, though, I think just about the only thing we can all agree on 00:14:40.720 |
is that the future is about as unpredictable as it has ever been. 00:14:45.360 |
In terms of at least referring to AI in academic papers, 00:14:49.120 |
you can see the recent exponential increase across virtually every field. 00:14:56.320 |
in the real world, in society, with jobs, with embodied physical AI, 00:15:02.560 |
Thank you, though, for being here with me as we watch it all unfold.