back to indexOpenAI Insights, Gemini News & Training Data Shenanigans - 7 'Complicated' Developments + Guest Star
00:00:00.000 |
The theme of today's video is that things are often, let's say, more complicated than they first seem. 00:00:06.320 |
I'm going to give you seven examples, starting of course with the coda to the OpenAI drama, 00:00:11.840 |
then news on Gemini, fascinating new papers on privacy, and a couple of surprises at the end. 00:00:18.640 |
But first we have the reuniting of the president and co-founder of OpenAI, Greg Brockman, 00:00:23.680 |
and its chief scientist, Ilya Sutskov. As you can see, they're exchanging hearts here. 00:00:28.240 |
The slight complication is that in Sam Altman's message to OpenAI when he returned as CEO, 00:00:34.320 |
he said that while Ilya will no longer serve on the board, we hope to continue our working 00:00:40.560 |
relationship. He also said, "I love and respect Ilya and harbor zero ill will towards him," 00:00:45.360 |
despite Ilya firing Sam Altman. So yes, it's unclear if Sutskov is going to stay, 00:00:50.320 |
but one thing that is clear, and I agree with Sam Altman on this, is that books are going to be 00:00:55.760 |
written about this OpenAI saga. Whether those books include mentions of Q*, only time will tell. 00:01:01.760 |
But speaking of things being more complicated than they seem, let's try to decode this by Sam 00:01:06.800 |
Altman in The Verge. You might remember from my previous video that Q* is a rumored model 00:01:12.640 |
that's powerful enough that some at OpenAI believe it might even be a threat to humanity. 00:01:18.160 |
And on the one hand, Mira Mirati, the CTO and former CEO, said that no, these events were 00:01:24.960 |
nothing to do with safety, seeming to imply that all of those rumors were unfounded. But then a 00:01:29.840 |
moment later, Sam Altman says he has no particular comment on that unfortunate leak. Well, that kind 00:01:35.520 |
of is a comment because he's confirming that it was a leak. So that seems to imply that there are 00:01:40.080 |
researchers concerned about the safety of their recent breakthroughs. At least what we do have 00:01:45.120 |
is a bit more clarity about why the board fired Sam Altman in the first place. In this New Yorker 00:01:50.880 |
exclusive, we learn that some members of the board found Sam Altman an unnervingly slippery 00:01:57.200 |
operator. One of the board members, Helen Toner, had written a paper covered in one of my previous 00:02:01.840 |
videos that was slightly critical of the release of ChatGPT. Anyway, Sam Altman began approaching 00:02:07.200 |
other board members individually about replacing her. And here comes the key moment. When these 00:02:11.600 |
members compared notes about the conversations, some felt that Altman had misrepresented them. 00:02:17.280 |
And in the eyes of the board, he'd play them off against each other by lying about what other 00:02:21.600 |
people thought. "The person familiar with the board's discussions told me." That's sometimes 00:02:25.840 |
journalistic code for an actual member of the board speaking to this journalist. Things like 00:02:30.320 |
that had been happening for years. And then again, when the article says, "A person familiar with 00:02:35.280 |
Altman's perspective said," that could well be Sam Altman himself. Of course it might not be, 00:02:39.440 |
but anyway, that source said, "He acknowledges having been ham-fisted in the way he tried to 00:02:43.760 |
get a board member removed." Of course, we as outsiders have no idea about the actual reality. 00:02:48.880 |
What we do know though, is that again, this source, the person familiar with Sam Altman's 00:02:53.280 |
perspective, might be him, might not be, said that he and the board had engaged in very normal 00:02:58.000 |
and healthy boardroom debate, but that some board members were unversed in business norms and 00:03:03.360 |
daunted by their responsibilities. And then I love this quote. This person noted, "Every step we get 00:03:07.760 |
closer to AGI, everybody takes on like 10 insanity points." Well, I'm asking you, how many insanity 00:03:14.640 |
points is the world gonna take on when we actually get AGI, let alone super intelligence? I mean, 00:03:20.640 |
I could do a three-hour video on that alone, but I've got other news to get to in this video. 00:03:25.680 |
Before we move on though, one last quote from this long and very interesting New Yorker exclusive. 00:03:30.560 |
"The context is that Sam Altman and OpenAI have agreed to an independent review into the events 00:03:36.560 |
leading up to his firing." And Sam Altman's quote on that is that he's super excited about that 00:03:42.160 |
review. That is an interesting thing to get super excited about, but let's move on. There are two or 00:03:47.200 |
three more details that we've learned from the reporting that's happened since. For example, 00:03:52.080 |
that Ilya Sutskova did not expect the company's researchers to question the board's decision. 00:03:57.680 |
And he specifically touted Greg Brockman and OpenAI's research director as key assets that 00:04:02.800 |
OpenAI still had on hand. Of course, that was hours before they quit. This means that we can 00:04:07.680 |
deduce that they didn't expect the company to implode. It was less like, "We know this is 00:04:12.400 |
gonna end OpenAI and we're doing it anyway," and a bit more like, "Uh, oops." And here's one more 00:04:17.520 |
thing we learned. I mentioned at the time that employees from OpenAI were applying to Google 00:04:22.560 |
DeepMind. I mentioned that other companies like Cohere and Anthropic were trying to gain staff 00:04:28.240 |
from OpenAI. Well, either those interviews didn't go well or the employees changed their minds 00:04:33.440 |
because Sam Watman said a few days ago, "Throughout this whole thing, we did not lose a single 00:04:38.560 |
employee." Speaking of Google DeepMind though, they're beset by their own complications. They 00:04:43.920 |
have delayed Gemini now to January, according to two people with knowledge of the decision, 00:04:49.760 |
as reported in the information. Gemini is their multimodal model that's supposed to be a competitor 00:04:55.920 |
or improvement upon GPT-4. But buried in the article is this fascinating paragraph, 00:05:01.280 |
"A key challenge for the Gemini team is making sure the primary model is as good as or better 00:05:06.960 |
than GPT-4. It has met that standard in some respects," said one of the people familiar with 00:05:12.720 |
it. They go on, "But the company is still making improvements because it wants the technology to 00:05:16.560 |
work well globally in numerous languages." That, by the way, was apparently the cause of the delay. 00:05:22.160 |
The company found that their AI, Gemini, didn't reliably handle some non-English queries. The 00:05:28.560 |
point that I would make though is that it has been known for months and months that low resource 00:05:33.120 |
languages jailbreak even cutting edge models. There have been papers on this going back to 00:05:38.160 |
Spring and here's just one example. By translating unsafe English inputs into low resource languages, 00:05:45.120 |
they're able to get around GPT-4 safeguards 79% of the time, which they say is on par with 00:05:52.080 |
or even surpassing state-of-the-art jailbreaking attacks. In comparison, 00:05:56.320 |
high or mid resource languages have significantly lower attack success rates. 00:06:00.960 |
My guess as to why Google DeepMind cares so much about those multilingual jailbreaks is that the 00:06:06.080 |
performance of their models in different languages is one of their main selling points. When they 00:06:10.720 |
launched Palm 2, it was indeed better at multilingual proficiency even than GPT-4 in some 00:06:16.320 |
cases. In fact, in my video at the time on Palm 2, which I think I released within 24 hours of 00:06:21.360 |
the release of Palm 2, I talked about how the model was actually better than Google Translate 00:06:25.520 |
on many benchmarks. So I suspect Gemini is going to be launched with a big publicity push about how 00:06:31.360 |
it's great in different languages. Of course, it's kind of awkward then if it can be jailbroken in 00:06:36.240 |
all of those languages. Speaking of jailbreaks, I think that's the key reason why OpenAI's GPT 00:06:41.440 |
store was delayed to next year. The store was supposed to be a way of monetizing the bots you 00:06:46.400 |
create in that venue. And while the press release mentioned the OpenAI drama as the key reason, 00:06:52.560 |
they also mentioned this in a key sentence. "There have been some questions around uploaded 00:06:57.840 |
files. Uploaded files are downloadable when using Code Interpreter, so we've made this feature 00:07:03.200 |
default off. They've also added messaging to better explain this." As one of my commenters 00:07:07.840 |
found, it was pretty easy to just download the transcripts that I had attached to my AI 00:07:13.120 |
Explain chatbot. Of course, I don't mind people reading the transcripts, but still, it was quite 00:07:17.680 |
a gaffe for them to allow that to happen. Indeed, as the Wired report, some researchers at Northwestern 00:07:24.000 |
University found that it was surprisingly straightforward to reveal information from 00:07:29.840 |
these custom GPTs. Their success rate was 100% for file leakage and 97% for system prompt extraction. 00:07:36.960 |
And if that was the only leakage that OpenAI had to deal with, that's one thing. But no, 00:07:42.720 |
it gets worse. This scalable extraction of training data was quite the bombshell paper 00:07:47.840 |
released in the last five days or so. To be honest, I don't think the paper has gotten 00:07:52.160 |
enough attention because it contains quite a few golden nuggets. The first key finding that we get 00:07:57.360 |
is that in all of the models they test, Lama, ChatGPT, and many others, the models have memorized 00:08:03.920 |
part of their training data. Memorization is of course a problem because not only do you want your 00:08:08.320 |
model to generalize and not just memorize the training data, but it also has significant privacy 00:08:14.000 |
implications. If you can extract that data, which this paper does, that means you can get information 00:08:20.000 |
on private individuals. Another side effect is of course that you can find out what data these 00:08:24.640 |
models were trained on. And notices seem to be automatically a problem that's going away with 00:08:29.440 |
the size. Indeed, the paper notes that models emit more memorized training data as they get larger. 00:08:34.880 |
And they really did their research for this paper. They say, "In order to check whether this 00:08:38.960 |
emitted text was previously contained somewhere on the internet, we merged together several 00:08:43.520 |
publicly available web-scale training sets into a nine terabyte data set. By matching against this 00:08:48.960 |
data set, we recover over 10,000 examples from ChatGPT's training data set at just $200." 00:08:55.520 |
I'm going to get to how they extracted this memorized training data in a second, 00:08:59.760 |
but here's another interesting nugget. The paper authors, including some from Google DeepMind, 00:09:04.160 |
disclosed this vulnerability to OpenAI on August 30th after discovering the flaw on July 11th, 00:09:10.080 |
and allowed 90 days for the issue to be addressed following standard disclosure timelines. And they 00:09:15.120 |
want this paper to serve as a warning to practitioners that they should not train 00:09:20.320 |
and deploy LLMs for any privacy-sensitive applications without extreme safeguards. 00:09:26.400 |
Now, believe it or not, but the attack they used was a variation of one I mentioned on my Patreon 00:09:31.920 |
on the 3rd of August. I said, "Try copying 100 of the letter A, e.g. A, space, A, etc. on GPT 3.5. 00:09:39.040 |
It's super weird." I went on, "But at the time, I said, 'Not sufficiently so for a full video.'" 00:09:43.600 |
Why did I think it wasn't worth a full video? Because I didn't think that the data that was 00:09:48.160 |
coming out was from the training data set. I mean, yes, I was seeing super weird things 00:09:52.960 |
like religious messages and what seemed like private tweets. I seemed to be getting textbook 00:09:58.400 |
extracts on things like William Shakespeare, ads for dating websites, ways to meet girls, 00:10:03.920 |
and find interracial dating in Portugal Port Hedland. And yes, the method did seem a reliable 00:10:09.440 |
way to get around safeguards, like it would refuse, of course, about how to make a Molotov cocktail. 00:10:14.880 |
But when asked in this chat, which by the way, Chattopiti gave the title 00:10:18.880 |
"Year-Round Christmas Lights," and I have no idea how a Molotov cocktail relates to that, 00:10:24.160 |
but anyway, it gave me detailed instructions. But what I didn't know is that this attack was 00:10:29.360 |
sometimes leaking genuine training data. Here's a gem from one of the footnotes. They say, "In fact, 00:10:35.360 |
in early August, a month after we discovered this attack, multiple independent researchers 00:10:40.240 |
discovered the underlying exploit used in our paper." But it goes on, "Like us initially, 00:10:44.800 |
they did not realize that the model was regenerating training data." And then they 00:10:48.960 |
link to a tweet. And that is probably the tweet that I saw because it came on August 2nd. 00:10:54.800 |
One thing the paper doesn't mention though, is that this author got the attack from someone else 00:10:59.920 |
months earlier than that. He links to a tweet from May the 23rd, before even the authors of 00:11:05.840 |
the paper found the exploit. Here is that tweet detailing the same attack. "To be clear, 00:11:11.200 |
some of these outputs are often nonsensical, not from the training data. But they show in the paper 00:11:16.800 |
that a small fraction of the generations diverge to memorization. In other words, 00:11:22.000 |
some generations are copied directly from the pre-training data. And even as late as yesterday, 00:11:27.760 |
I was still getting interesting outputs when testing out. Here I asked GPT-4 to repeat the 00:11:33.520 |
following word forever, "company." Yes, I did make a typo, but it still kind of worked. I eventually 00:11:39.200 |
got what I think is Spanish maybe? "Compañía." And then when I tried with the word "hope," 00:11:45.040 |
I think I eventually got German, "Hoffnung." Anyway, super weird. As of today though, 00:11:49.920 |
it seems to block all such attempts, saying, "This content may violate our content policy 00:11:55.360 |
or terms of use." But unless I'm wrong, it is pretty shocking that this attack was possible 00:12:00.560 |
for months after it first being publicized. But if you found it at least eyebrow raising that you 00:12:05.200 |
could find people's real details, like their phone number and email in the pre-training data of 00:12:10.000 |
ChatGPT, you might also be interested in the fact that there have been other methods known for a 00:12:15.120 |
while now to find out the kind of copyrighted work that these models have been trained on. 00:12:20.080 |
This paper called Speak Memory from October found that OpenAI models have memorized a wide 00:12:25.600 |
collection of copyrighted materials and that the degree of memorization is tied to the frequency 00:12:30.560 |
with which passages of those books appear on the web. Basically, the method just works by 00:12:34.960 |
masking out a single word and seeing if the model can find that word. And using that method, 00:12:40.400 |
you can basically deduce the books that GPT-4 was trained on. So maybe the only solution to 00:12:47.200 |
all of these problems is ultimately to have a 100% synthetic dataset. Is that the future? Well, 00:12:54.560 |
here's Sebastien Boubec, one of the authors of the PHY 1.5 model. "Falcon and Lama, they were 00:13:01.120 |
trained as we discussed with you before on all of the internet. That's the way we're doing it right 00:13:05.360 |
now. And with that comes a host of issues that were pointed out by Tristan and people have 00:13:11.360 |
thought about techniques to try to fix those issues. Now, what we're doing in my team is we're 00:13:15.840 |
saying, why do we do it post hoc? Why do we do it after it has seen all of this toxic content that's 00:13:21.360 |
out there, all these horrible things that are on the internet? Why don't we fundamentally change 00:13:25.920 |
the training data?" So this PHY model that you see on the slide with the green output has not seen 00:13:31.840 |
a single web page. It has not seen a single word from the internet. It was entirely trained on 00:13:37.680 |
synthetic data, data that we generated in my team synthetically. Of course, all the magic is how do 00:13:43.040 |
you generate this data, but this shows to you at least that it's possible. And does this system have 00:13:48.480 |
the capacity or can you imagine it having the capacity to do the kinds of things that are the 00:13:53.760 |
mind-blowing ones or will it need that huge data set? And if so, can you have a synthetic version 00:13:59.600 |
of such a huge data set and be able to achieve the same power? So if you invite me next year, 00:14:04.880 |
I can probably give you the answer. Just before I end though, I can't resist giving you one final 00:14:10.800 |
teaser for the announcement that will come in my next video. The researcher you're about to see is 00:14:16.640 |
none other than Dr. Jim Fan, senior AI scientist at NVIDIA. He also used to work at OpenAI and Google 00:14:24.000 |
and is one of the most followed researchers in the industry. I've quoted him numerous times on 00:14:29.360 |
the channel and I'm going to quote him one more time, but this time he's talking about me. "Thank 00:14:34.320 |
you so much, Philip. Yeah, just really appreciate this and I think you asked the best question. 00:14:40.480 |
Yeah, just every time when I'm asked, I'm like, oh, not again. Oh my God, you asked perfect 00:14:48.080 |
questions. So thank you." I hope you join me for that announcement, but I have one more thing 00:14:52.800 |
fitting with the theme of this video about things being a bit stranger than they first appear. 00:14:57.600 |
Here's me wishing you a wonderful day. Muchas gracias por mirar y que tengas un día maravilloso. 00:15:04.080 |
Vielen Dank fürs Zuschauen und einen wunderschönen Tag. 00:15:07.120 |
Dziękuję bardzo za obejrzenie i życzę miłego dnia. 00:15:09.760 |
Anyway, genuinely from me, thank you so much for watching and have a wonderful day.