8 Signs It's The Future: Thought-to-Text, Nvidia Text-to-Video, Character AI, and P(Doom) @Ted

00:00:00.000 | I want to know if you agree that each of these eight developments would have shocked you not

00:00:05.120 | just six months ago but even six weeks ago. These all came in the last few days and range from text

00:00:11.220 | to video, thought to text, GPT models predicting stock moves and AI annihilation discussed at TED.

00:00:18.800 | But we start with NVIDIA's new text to video model. Rather than show the paper I'm just going

00:00:24.300 | to let different examples play on screen. From the paper one of the breakthroughs here is in temporal

00:00:30.600 | consistency. Essentially the series of images that are used to form the video are more aligned with

00:00:36.280 | each other so the sequence plays more smoothly with fewer sudden glitches or changes. The generated

00:00:42.400 | videos by the way have a resolution of 1280 by 2048 pixels rendered at 24 frames per second and

00:00:49.060 | there is a powerful line from the appendix of the paper that was released with the samples.

00:00:53.720 | The authors say that they expect enhanced versions of this model to reach even higher quality,

00:00:59.880 | potentially being able to generate videos that appear to be deceptively real. They go on to say

00:01:05.600 | this has important ethical and safety implications. This future might not be far away as I'm going to

00:01:12.620 | show you in a moment with the progression that's happened in text to image in one year. Just before

00:01:18.080 | I move on you may have wondered when this is going to become a product. Well they kind of admit in the

00:01:24.260 | paper that they can't yet make it commercially viable because it's not ethically sourced. It was

00:01:29.540 | largely trained on copyrighted internet data. And yesterday Blockade Labs showcased the add to this

00:01:36.080 | feature where, as you can see, you can do doodles and turn them into images that go in this 3D world.

00:01:42.560 | We are swiftly moving from two dimensions to three dimensions and as a bonus this is

00:01:48.320 | Zip-NeRF, a 3D neural rendering technique released this week. I'm not even counting this as a third major

00:01:54.240 | development; I'm lumping it in with Blockade Labs. This video shows what happens when a series of

00:01:59.780 | two-dimensional photographs are merged into a 3D drone-like video, probably a real estate agent's

00:02:07.200 | dream. Just imagine a cherished moment in time being crystallized into a permanent immersive

00:02:13.420 | experience and soon to be honest many people may not have to imagine with the Apple Reality Pro

00:02:19.580 | possibly debuting as early as June and costing around three thousand dollars.

00:02:24.220 | According to Bloomberg it might be available to buy in the autumn and have things like a dial

00:02:30.440 | where you can move between virtual and augmented reality. Coming back to what has already occurred

00:02:36.040 | do you remember when a Midjourney image won the Colorado State Fair digital art competition?

00:02:42.160 | Well now the same thing has happened to photography. The quote-unquote photo on

00:02:47.440 | the right was generated by DALL-E 2 and it won the 2023 Sony World Photography Award.

00:02:54.200 | Now the artist behind it, Boris Eldagsen, did refuse the award, but I want to show you a few images that

00:03:00.600 | show how far Midjourney in particular has come over the last year, because many people believe

00:03:06.040 | that Midjourney version 5 is actually superior to DALL-E 2, which won the award. Take a look at the

00:03:12.120 | progress of the different Midjourney versions and remember that version 1 was released in February

00:03:18.120 | of last year. There is almost exactly one year's difference between v1 and v5.

00:03:24.180 | Here is another example of the progress, and at this rate we will have Midjourney version 50

00:03:30.180 | within about 10 years. What will that version be like? Before I move on to the fourth development

00:03:36.180 | I'm going to show you two of the craziest images that I could find from Midjourney version 5.

00:03:41.300 | I would say I can still tell which images are AI generated around 90% of the time but my prediction

00:03:47.300 | would be that by the end of the year it will be 90% of the time that I can't tell. But if you thought

00:03:52.900 | text to image was getting close to the top of the list, then you're probably wrong.

00:03:54.160 | What about thought to image or even thought to text? Here is a fascinating extract from The AI Dilemma.

00:04:24.140 | It's a very interesting example of how AI can reconstruct what it sees. When you dream your visual cortex sort of runs in reverse so this means certainly in the next couple of years we'll be able to start decoding dreams.

00:04:32.140 | It can reconstruct what you're seeing but can it reconstruct what you're thinking, your inner monologue?

00:04:40.140 | They had people watch these videos and would try to reconstruct their inner monologue. Here's the video.

00:04:44.140 | It's this woman getting hit in the back, getting knocked forward. What would the AI reconstruct?

00:04:50.140 | I see a girl that looks just like me get hit on the back and then she's knocked forward.

00:04:54.120 | The fifth development concerns something rather more mundane which is making money.

00:04:58.120 | Now many of you may know that AI is already well on the way to conquering poker.

00:05:04.120 | I used to play a lot of poker myself and dare I say I was pretty darn good at it.

00:05:08.120 | But even though poker involves predicting human behavior, AI is starting to master it.

00:05:13.120 | Which brings me nicely to the development I actually wanted to talk about which was forecasting the stock market.

00:05:19.120 | According to this really quite interesting paper, accurately forecasting stock market growth

00:05:24.100 | and how much it returns is an emerging capacity of complex models.

00:05:28.100 | I'm going to let the author of the paper tell you exactly what prompt they used.

00:05:32.100 | So the exact question that we ask is, pretend you're a financial advisor. Here's a headline.

00:05:37.100 | Is this headline going to be good or bad for the company in the short term?

00:05:41.100 | Once you have enough headlines, you're basically just investing in the companies with good headlines and not investing in the companies with bad headlines.

00:05:47.100 | And here is the table summarizing the results. A 1 was a positive headline.

00:05:54.080 | A -1 was a negative headline. And a 0 was a neutral headline.

00:05:58.080 | And when GPT-3 analyzed those headlines, you can see the correlation with the average of the next day's return.

00:06:05.080 | That's positively correlated for good headlines according to GPT-3 and negatively correlated for bad headlines.
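The scoring scheme described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's code: the prompt wording is paraphrased from the author's spoken description, `answer_to_score` and `average_next_day_return` are hypothetical helper names, the model call itself is left out, and the answers and returns below are made-up numbers.

```python
from collections import defaultdict
from statistics import mean

# Prompt paraphrased from the spoken description; the paper's exact wording may differ.
PROMPT_TEMPLATE = (
    "Pretend you're a financial advisor. Here's a headline: {headline}\n"
    "Is this headline going to be good or bad for the company in the short term?"
)


def answer_to_score(answer: str) -> int:
    """Map a free-text model answer to 1 (good), -1 (bad) or 0 (neutral or unclear)."""
    text = answer.strip().lower()
    if text.startswith("good"):
        return 1
    if text.startswith("bad"):
        return -1
    return 0


def average_next_day_return(rows):
    """Group (score, next_day_return) pairs by score and average each group."""
    buckets = defaultdict(list)
    for score, next_day_return in rows:
        buckets[score].append(next_day_return)
    return {score: mean(returns) for score, returns in buckets.items()}


# Hypothetical model answers paired with hypothetical next-day returns.
answers_and_returns = [
    ("Good, the deal expands its market.", 0.012),
    ("Bad, the recall will hurt sales.", -0.008),
    ("Unclear from the headline alone.", 0.001),
    ("Good", 0.004),
]

rows = [(answer_to_score(answer), ret) for answer, ret in answers_and_returns]
summary = average_next_day_return(rows)
print(summary)
```

With real data, a positive average for score 1 and a negative average for score -1 would correspond to the pattern the table shows.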

00:06:12.080 | And you can see that earlier models really couldn't do this.

00:06:15.080 | And as many of you may well be thinking, this is GPT-3, this is not GPT-4.

00:06:21.080 | I bet that as we speak, there are thousands of traders

00:06:24.060 | using the GPT-4 API to predict the next day's stock movements.

00:06:28.060 | I think the results will get even more interesting as the context window expands and GPT-4 or GPT-5 can analyze entire articles, press conferences, etc.

00:06:40.060 | The next development is that fairly quietly, without attracting many headlines, role-playing chatbots are beginning to go mainstream.

00:06:47.060 | Character.ai, founded by one of the original authors of the Transformer paper, recently crossed a hundred million views.

00:06:54.040 | That is starting to resemble the growth trajectory of the original ChatGPT.

00:07:00.040 | GPT-4 is still smarter and better at doing these role-plays, but the interface of this site is easy to use and of course it's free.

00:07:08.040 | This is not sponsored in any way, but it was pretty fun playing this text-based adventure game.

00:07:14.040 | I especially enjoyed going on complete tangents to what the adventure was supposed to be about and I think confusing the AI.

00:07:20.040 | The seventh way that we can tell that the world has fundamentally changed

00:07:24.020 | is that Google doesn't seem all-conquering anymore, as it has for my entire adult lifetime.

00:07:29.020 | First I heard in the New York Times that Samsung was considering replacing Google with Bing as the default search engine on its devices.

00:07:37.020 | I am not surprised that that shocked Google employees.

00:07:41.020 | Then, yesterday, this article came out in Bloomberg.

00:07:44.020 | It says that many Google employees begged their leadership not to release Bard, saying that it was a pathological liar and cringeworthy.

00:07:52.020 | I've done videos on that myself.

00:07:54.000 | But even when Google employees tested it out, asking it how to land a plane or do scuba diving, the answers it gave would likely result in serious injury or death.

00:08:04.000 | In February, one employee said the following in an internal message group:

00:08:08.000 | "Bard is worse than useless. Please do not launch."

00:08:11.000 | The concern that many have is that now Google will ditch safety in order to catch up.

00:08:16.000 | As the article reports, the staffers who are responsible for the safety and ethical implications of new products have been told to

00:08:23.980 | "not get in the way or try to kill any of the generative AI tools in development."

00:08:28.980 | Which brings me to the final reason that we know we're in the future.

00:08:32.980 | The risks of AI annihilation are beginning to be taken seriously.

00:08:36.980 | It's not just lone voices anymore, like Eliezer Yudkowsky, who a couple of days ago got a standing ovation at TED.

00:08:43.980 | He believes the world is firmly on track for AI takeover.

00:08:47.980 | It's also other senior figures who believe our probability of doom from AI is non-zero.

00:08:53.960 | Here are some selected probabilities, but I want to focus on Paul Christiano, who gives a risk of doom between 10 and 20%.

00:09:01.960 | He previously ran the alignment team at OpenAI and now leads the Alignment Research Center.

00:09:07.960 | And you may remember them from the GPT-4 technical report.

00:09:11.960 | They were the guys that OpenAI trusted to run the model evaluation of GPT-4,

00:09:17.960 | testing whether it could autonomously replicate and gather resources, which they concluded may become possible

00:09:23.940 | with sufficiently advanced AI systems, but the conclusion is that the current model is probably not capable of doing so.

00:09:30.940 | These are quite senior figures and insiders giving a non-trivial risk of AI annihilation.

00:09:36.940 | I think that deserves a lot more public conversation than it's currently getting.

00:09:40.940 | And on that, Sam Altman seems to agree.

00:09:43.940 | And the bad case, and I think this is like important to say, is like lights out for all of us.

00:09:47.940 | Yeah, I think it's like impossible to overstate the importance of AI safety and alignment work.

00:09:53.920 | I would like to see much, much more happening.

00:09:56.900 | So the future is here and I think the media needs to catch up.
