Stanford CS25: V2 | Common Sense Reasoning
00:00:00.000 |
Okay, so yeah, I'm super excited to be here and share our recent research about neurosymbolic commonsense reasoning. 00:00:16.400 |
So part of the goal of this talk will be to address some of the frequently asked questions 00:00:22.600 |
these days, that NLP, or common sense, or whatever, looks almost solved by ChatGPT and so on. 00:00:37.120 |
So perhaps it's a case of hasty generalization, especially if we do look at some of the examples. 00:00:45.400 |
So the trophy doesn't fit in the brown suitcase because it's too big. 00:00:50.320 |
So this is a classical Winograd schema challenge problem. 00:00:55.680 |
And here, ChatGPT answers it correctly, that the trophy is too big. 00:01:02.400 |
But if you change the question a little bit, then it says the trophy itself is too small, which is wrong. 00:01:13.240 |
So the situation is a little bit like David and Goliath in the sense that the bigger appears 00:01:19.320 |
to be better in many of the cases, although of course, some of the more careful studies 00:01:24.720 |
do reveal that smaller models can be better with better data or better reinforcement learning. 00:01:36.080 |
So it's likely that there are still other ways to improve transformer performance 00:01:44.240 |
by building smaller models in a more clever way. 00:01:49.100 |
So one way to draw the insight is from this classic book known as The Art of War, which 00:01:58.440 |
of course says nothing about deep neural networks or transformers. 00:02:02.720 |
But the wisdom here is that know your enemy, choose your battles and innovate your weapons, 00:02:07.560 |
which we can translate that as evaluation with realism and scrutiny and focusing on 00:02:18.080 |
different types of new tasks and leaderboards, and then innovating your algorithms and data. 00:02:23.720 |
So in this talk, I'm going to showcase three such studies, and let's dive right in with the first one. 00:02:31.040 |
By the way, so the recurring theme in this talk will be that smaller models can be better 00:02:38.000 |
So let's start with this observation that language models are sometimes amazing. 00:02:44.440 |
So if you ask GPT-3, "If you travel west far enough from the west coast, you will reach the east coast," true or false, 00:02:54.320 |
So it says the world is round, which is correct. 00:02:58.800 |
So you will reach the east coast eventually, therefore the answer is true. 00:03:03.040 |
So this looks impressive, except when it's not impressive. 00:03:07.220 |
So if you ask other questions, like whether butterflies fly with three wings or not, it says it has 00:03:13.520 |
four wings and therefore the statement is false. 00:03:16.360 |
But if you read back what it just said as a true or false question, then it negates what it just said. 00:03:22.800 |
So it can be inconsistent with its own statement. 00:03:27.720 |
And then there are many other such inconsistency problems. 00:03:30.960 |
So it's not clear what language models do or do not know. 00:03:35.480 |
It's almost like language models are some sort of lemons. 00:03:38.480 |
Well, they might look like cherries if you only pick cherries, but they do make strange mistakes. 00:03:44.680 |
So the question is, how do we make better lemonade from GPT-3? 00:03:50.000 |
So one approach might be to get philosophical and use Socrates' maieutic method that was 00:03:57.280 |
originally developed for addressing humans' flawed reasoning, because it actually turns 00:04:03.120 |
out even humans are not all that logically consistent, let alone GPT-3. 00:04:09.880 |
So the way it works is this: we're going to build the maieutic inference tree, and let's 00:04:16.200 |
use the previous example as a running example. 00:04:20.280 |
So what we do is we ask the following question, provide the answer as being true, and then 00:04:25.760 |
attach "because" so that we prompt GPT-3 to continue the sentence, which means 00:04:34.320 |
it will now have to provide the explanation for why the answer is true. 00:04:39.520 |
In this case, the explanation is good, so it's E of T, the explanation for the answer being 00:04:45.960 |
T. We ask the same question, switching out "true" with "false," and then see what it says. 00:04:56.200 |
So here, it's just trying to go with false as the answer, but it just doesn't have a very good explanation. 00:05:05.960 |
So now we call this E of F, the explanation for the answer being F. 00:05:13.480 |
Now let's see how robust or consistent GPT-3 is with respect to its own explanations. 00:05:22.640 |
So we read back E of T, and then let GPT-3 decide whether it's going to agree or disagree with it. 00:05:33.360 |
So in this case, the last one is a negated version of E of T, so we insert a negation, 00:05:40.440 |
"not," here, and in this case, it's good that it's flipping the answer when the statement is negated. 00:05:47.440 |
So this is a case where GPT-3 is logically integral with respect to E of T. 00:05:54.440 |
For E of "false," though, which was basically a bogus explanation for the wrong answer, 00:06:00.240 |
it's not able to flip its own labeling, which means GPT-3 is not logically integral. 00:06:08.120 |
So that's good; GPT-3 does know that something is strange about the explanation it gave previously. 00:06:17.120 |
And so we can keep doing this recursively to make GPT-3 explain its own explanation 00:06:29.840 |
So we build this maieutic tree or graph for some time, and then only keep branches that 00:06:41.240 |
are logically integral, throwing out the non-integral part for now. 00:06:45.960 |
But even after chopping the branches where there's logical inconsistencies, GPT-3 being 00:06:53.280 |
GPT-3, the tree will still have some inconsistent explanations. 00:06:58.680 |
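To make the recursive construction concrete, here is a minimal sketch of the tree building and logical-integrity check described above. This is not the exact implementation from the paper: `generate`, `prob_true`, and `negate` are hypothetical wrappers the caller would supply around a GPT-3-style API (a text continuation, the probability that the model answers "true," and a helper that inserts a negation).

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Node:
    proposition: str                      # an explanation such as E^T or E^F
    children: List["Node"] = field(default_factory=list)

def is_integral(prob_true: Callable[[str], float], statement: str, negated: str) -> bool:
    # "Logically integral": the model affirms the statement and flips on its negation.
    return prob_true(statement) > 0.5 and prob_true(negated) < 0.5

def build_tree(question: str,
               generate: Callable[[str], str],     # LM continuation, e.g. a GPT-3 wrapper
               prob_true: Callable[[str], float],  # P(model answers "true" | statement)
               negate: Callable[[str], str],       # inserts "not" into a statement
               depth: int = 2) -> List[Node]:
    nodes = []
    for answer in ("true", "false"):
        # Fix the answer and append "because" so the model must justify it.
        explanation = generate(f"{question} The answer is {answer}, because")
        if not is_integral(prob_true, explanation, negate(explanation)):
            continue                               # prune non-integral branches
        node = Node(explanation)
        if depth > 1:
            # Recurse: make the model explain its own explanation.
            node.children = build_tree(explanation, generate, prob_true, negate, depth - 1)
        nodes.append(node)
    return nodes
```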
In order to improve the logical consistency, now what we do is we're going to look at pairwise consistency. 00:07:10.340 |
So, stepping back, we're going to first compute the node-wise confidence. 00:07:19.600 |
So we call that a belief, and it's defined by a particular equation that basically 00:07:26.480 |
looks at different conditional probabilities and then computes their ratio to see how confident the model is about each node. 00:07:35.400 |
We then also look at the edge-wise or pairwise consistency by using an off-the-shelf natural 00:07:43.560 |
language inference model's output, which tells us whether a pair of nodes is contradictory or not. 00:07:55.880 |
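One plausible way to write down the two quantities just described, mirroring the "ratio of conditional probabilities" and NLI-based consistency in the transcript (the exact weighting used in the paper may differ):

```latex
% Node-wise belief: how confidently the model labels proposition E as true,
% expressed as a ratio of its conditional probabilities for the two labels.
\mathrm{belief}(E) \;=\;
  \frac{p_{\mathrm{LM}}(\text{true} \mid E)}
       {p_{\mathrm{LM}}(\text{true} \mid E) + p_{\mathrm{LM}}(\text{false} \mid E)}

% Edge-wise consistency: an off-the-shelf NLI model scores each pair of nodes
% (E_i, E_j), giving a weight for the constraint that their truth labels
% must agree (entailment) or disagree (contradiction).
w_{ij} \;=\; p_{\mathrm{NLI}}\!\left(\text{contradiction} \mid E_i, E_j\right)
```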
Now once you have all of this, then we can formulate a constrained optimization problem 00:08:04.840 |
where the inference objective is to assign some label, either true or false, to each 00:08:14.400 |
of the nodes such that the assignment maximizes the total weight of the satisfied node and edge constraints. 00:08:22.820 |
So sometimes the labeling will have to flip the original label that the model might have 00:08:29.240 |
preferred to give because that way you can enhance the graph-level consistency. 00:08:36.040 |
So you can solve this with any MaxSAT solver — SAT means satisfiability. 00:08:44.800 |
And this is classical AI search; we used one particular solver, but you can use any solver you like. 00:08:52.880 |
And so here, the final output is that the original answer to the original question should 00:08:59.400 |
be true, and then it also gives you the per-node label assignments as well. 00:09:06.120 |
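As a toy illustration of the constrained optimization just described, the snippet below brute-forces a weighted-MaxSAT-style assignment over a tiny made-up graph (the node beliefs, edge weights, and relations here are invented for illustration; a real MaxSAT solver handles this far more efficiently):

```python
from itertools import product

# Node weights: belief that each proposition is true (made-up numbers).
beliefs = {"Q": 0.8, "E_T": 0.9, "E_F": 0.3}
# Edge constraints: (i, j, relation, weight); "entail" means labels should agree,
# "contradict" means they should differ.
edges = [("E_T", "Q", "entail", 1.0), ("E_F", "Q", "contradict", 0.7)]

def score(assignment):
    total = 0.0
    for node, label in assignment.items():
        # Reward agreeing with the model's belief, in either direction.
        total += beliefs[node] if label else (1.0 - beliefs[node])
    for i, j, rel, w in edges:
        satisfied = (assignment[i] == assignment[j]) if rel == "entail" \
                    else (assignment[i] != assignment[j])
        total += w if satisfied else 0.0
    return total

nodes = list(beliefs)
best = max((dict(zip(nodes, labels))
            for labels in product([True, False], repeat=len(nodes))),
           key=score)
print(best)   # {'Q': True, 'E_T': True, 'E_F': False}
```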
So what does this mean in the end in terms of empirical result? 00:09:11.140 |
So when tested on CommonsenseQA 2.0, the canonical prompting, so green, used on top 00:09:20.040 |
of GPT-3, so it's basically few-shot prompting on GPT-3, will give you a bit better than chance. 00:09:28.320 |
So this is a true/false QA dataset, so your chance level is 50, and GPT-3 is barely better than that. 00:09:37.520 |
But recently, there have been some ideas such as chain of thoughts or self-consistency that 00:09:45.600 |
can improve the vanilla prompting method considerably. 00:09:51.040 |
So if you use such variations, then you get performance gain. 00:09:56.080 |
Now the purple is a different variant of it, but together, they're all doing worse than 00:10:04.960 |
maieutic prompting, which in fact does better than a supervised model trained on T5. 00:10:12.760 |
Usually a supervised T5 model is hard to beat using GPT-3 few-shot, but basically 00:10:20.920 |
this is an inference-time algorithm, practically unsupervised, and it does well on that. 00:10:26.880 |
And similarly, we see a large boost when tested on other common sense benchmarks such as CREAK. 00:10:35.480 |
So what this tells us is that although the emergent capabilities of large transformers 00:10:44.320 |
are phenomenal, they can be not very robust for some of these Common Sense challenges. 00:10:53.360 |
And it's in large part due to logical inconsistencies, and the consistency can be dramatically 00:10:59.760 |
enhanced when you do this sort of symbolic reasoning on top. 00:11:04.560 |
So yeah, not only did Socrates' method help with flawed human reasoning, it can also dramatically improve GPT-3's reasoning. 00:11:16.240 |
Okay, so moving to the next topic, symbolic knowledge distillation. 00:11:23.560 |
So this is a work that tries to convert general language models built on transformers 00:11:29.940 |
into causal common sense models, which are also transformers. 00:11:34.800 |
And the reason why we might want to worry about Common Sense models is because despite 00:11:42.600 |
human-level or even superhuman-level performances on a variety of leaderboards, the state-of-the-art 00:11:49.080 |
models are brittle when given adversarial or out-of-domain examples. 00:11:53.680 |
So transformers can make seemingly strange mistakes. 00:12:03.200 |
And so it's almost like solving only a dataset without really solving the underlying task. 00:12:09.540 |
And this phenomenon sometimes is described as a systematic generalization problem. 00:12:15.840 |
And why does this happen? It's that unlike humans, who truly learn how the world works 00:12:21.740 |
conceptually, transformers learn sort of surface patterns in language or images that are powerful 00:12:32.420 |
for many downstream use cases, but that's still not a really robust understanding of the underlying concepts. 00:12:41.980 |
So in order to bridge this gap, we can really think about this challenge of learning, or acquiring, common sense. 00:12:51.780 |
So the operational definition of Common Sense in this talk will be that it's the basic level 00:12:58.140 |
of practical knowledge and reasoning concerning everyday situations and events that are commonly shared among most people. 00:13:07.140 |
This is really important, the last part, that it's commonly shared among most people, 00:13:11.620 |
but it's not the case that it's shared by everybody in the universe. 00:13:16.540 |
Because additional context can always change what is commonsensical for any given situation. 00:13:24.860 |
So for example, in general, you and I probably agree that it's okay to keep the closet door 00:13:29.700 |
open, but it's not okay to keep the fridge door open because the food inside might go bad. 00:13:35.140 |
So these are general rules of thumb that we might abide by. 00:13:40.500 |
But of course, if you go to your friend's house, you might behave a little and keep the closet door closed. 00:13:49.980 |
And then, as far as the fridge door, if you're in a store and it's not really hooked up to 00:13:54.660 |
the wall, then it doesn't matter whether the fridge door is open or not because there's no food in it. 00:14:03.200 |
You can come up with many situations in which these basic rules of thumb will have exceptions. 00:14:10.820 |
So that is the key challenge of common sense because it's not universal knowledge, but 00:14:20.740 |
it's shared across a large population of people. 00:14:26.660 |
Okay, so such common sense is essential for humans to live and interact with each other 00:14:34.020 |
And so, as AI becomes an increasingly more important aspect of human lives, and with 00:14:43.020 |
ChatGPT even more so, it's good if AI can understand human needs and actions better. 00:14:51.820 |
So the premise of this talk is that language models are not equivalent to knowledge models, 00:14:58.900 |
even though language models today do acquire a great deal of knowledge — they're not knowledge models per se. 00:15:06.140 |
So we developed a symbolic common sense knowledge graph known as Atomic a few years ago, four 00:15:15.940 |
years ago now, as well as neural common sense model built on top of or trained using Atomic 00:15:24.420 |
as the source of training, fine-tuning of off-the-shelf language models. 00:15:30.620 |
Up until two years ago, this Atomic was fully crowd-sourced by humans — an assumption which in this talk 00:15:38.300 |
I'm going to lift, but at first the norm was that this all had to be human crowd-sourced. 00:15:45.780 |
So you can almost consider Atomic as human demonstrations. 00:15:49.580 |
In the terminology of the current ChatGPT, you can consider this as human demonstrations of common sense knowledge. 00:15:57.940 |
And we had this Comet-Atomic 2020, which is an enhanced version of Atomic and Comet. 00:16:03.660 |
Again, the Atomic portion was fully crowd-sourced by humans, in 2021. 00:16:10.220 |
So let me give you a bit of a sample of what Atomic 2020 looks like. 00:16:16.580 |
So imagine a situation where X gets X's car repaired, or you get your car repaired. 00:16:22.300 |
So immediately you can imagine what's likely to be true or relevant for the situation, 00:16:29.220 |
that as a result, you might want to call Uber or Lyft for a ride. 00:16:36.460 |
Beforehand, you need a mechanic and money to repair your car. 00:16:40.460 |
So these are basically preconditions and post-conditions of that event. 00:16:44.780 |
So some of this Atomic knowledge graph is about social interaction knowledge about event. 00:16:51.140 |
And then other parts of the Atomic is physical entity-centric knowledge. 00:16:56.740 |
So money is typically used for paying repairs. 00:16:59.980 |
But if you really want it, you can fold it into origami. 00:17:05.180 |
But these are examples of stereotypical use cases, as well as non-stereotypical but afforded uses. 00:17:17.420 |
So it requires naive physics understanding about the affordances of physical objects. 00:17:25.940 |
And then we can also reason about counterfactual conditions in which the central event cannot happen. 00:17:33.340 |
So if you totaled your car completely, then it's impossible to get your car repaired. 00:17:39.540 |
And then there are events that typically happen before and after. 00:17:45.860 |
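To show how the car-repair examples above are typically stored, here is a small sketch of Atomic-style (head event, relation, tail) triples. The relation names follow Atomic 2020 conventions (xWant, xNeed, ObjectUse, HinderedBy, isBefore); the last row's head event is an illustrative stand-in, not taken from the slide.

```python
# The car-repair examples above, written as (head event, relation, tail) triples,
# roughly the form in which an Atomic-style knowledge graph is stored.
triples = [
    ("X gets X's car repaired", "xWant",      "to call Uber or Lyft for a ride"),  # post-condition
    ("X gets X's car repaired", "xNeed",      "a mechanic and money"),             # pre-condition
    ("money",                   "ObjectUse",  "pay for repairs"),                  # typical affordance
    ("money",                   "ObjectUse",  "fold into origami"),                # atypical but afforded
    ("X gets X's car repaired", "HinderedBy", "X totaled the car completely"),     # counterfactual blocker
    ("X drops the car off",     "isBefore",   "X gets X's car repaired"),          # temporal ordering (illustrative)
]
```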
So we crowd-sourced a fair amount over the course of, I don't know, maybe two years or 00:17:52.980 |
so, up to 1.3 million if-then rules, or if-then knowledge, over 23 different relation types. 00:18:09.620 |
And so the knowledge graph is useful for training transformers. 00:18:15.020 |
And here, let's see the comparison between Comet that was built on BART compared to GPT-3, 00:18:21.900 |
which is so large, it doesn't even fit into the slide. 00:18:30.140 |
So with that in mind, if you look at this accuracy, judged by humans, after the 00:18:38.620 |
common-sense model makes some common-sense inference. 00:18:41.020 |
So the task is that given a node, which describes a situation or event, and then given an edge 00:18:47.780 |
type, which sort of narrows down the common-sense relation or inference type, you're now going to generate the inference itself. 00:18:59.620 |
And then we ask humans whether the common-sense inference seems reasonable or not. 00:19:09.740 |
Comet is substantially better than GPT-3, which is really impressively better than GPT-2. 00:19:17.420 |
It's not apples to apples because GPT-2 is zero-shot and GPT-3 is few-shot, but still, 00:19:22.460 |
it's interesting, the large jump that scale alone brought to GPT-3. 00:19:30.620 |
But still, GPT-3 is too large to be useful for actual system building for most engineers 00:19:40.900 |
So it's nice to have a smaller model that can do even better. 00:19:44.540 |
And so when we put these resources out, people all around the globe did some creative research 00:19:52.500 |
So persona-aware conversations or figurative language understanding, storytelling and fantasy 00:19:58.820 |
gaming, and interactive learning enhancement. 00:20:03.980 |
In all of these works, people came up with some useful use cases using either Comet or 00:20:11.380 |
Atomic or both as some kind of common-sense backbone for their downstream use cases. 00:20:21.060 |
But the applications are still limited by the coverage and quality of these common-sense resources. 00:20:28.900 |
So we wanted to make it better, but we were hitting a bit of a limit with human crowdsourcing. 00:20:34.720 |
So now in this paper, Symbolic Knowledge Distillation, we're going to build an AI-generated knowledge graph 00:20:45.820 |
by introducing this notion, Symbolic Knowledge Distillation. 00:20:49.900 |
So we want to take this GPT-3, which is very impressive, but too large, and distill it into a smaller but better model. 00:20:59.580 |
So GPT-3 was about 73% good, and that's good, but not good enough for empirical use cases. 00:21:10.140 |
This should be surprising, because when you normally do knowledge distillation, you get smaller and worse models, not better ones. 00:21:17.300 |
So the reason why this could work is because symbolic knowledge distillation has this funnel 00:21:30.060 |
that's convoluted, and it has a critic inside that really helps the student model to be better than the teacher. 00:21:39.300 |
So slightly more formally, knowledge distillation, due to Hinton et al. 2015, is a method to distill 00:21:50.660 |
a teacher model down to a student model by optimizing the cross-entropy between the teacher's probability 00:21:59.900 |
distribution over the output space y and the student's distribution over the same space. 00:22:12.100 |
In the original work, the output space was just classification labels. 00:22:18.660 |
So knowledge distillation was done for classification tasks, in which case it's a simple enumeration over the labels. 00:22:29.780 |
But in our case, y can be a sentence, which is intractable because there can be exponentially many of them. 00:22:39.300 |
So what people do, well, no problem, we always just sample and call it a day. 00:22:44.580 |
So we're going to sample so that we just compute the expectation through samples. 00:22:52.100 |
And the byproduct of that sample will be a symbolic knowledge graph. 00:22:58.460 |
And that's because the strings coming out of this sampling can be connected together into a graph. 00:23:07.380 |
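A sketch of the objective just described, written in notation consistent with the text (teacher distribution P_T, student distribution P_S); the sampled approximation is what the talk refers to as "just sample and call it a day":

```latex
% Knowledge distillation (Hinton et al., 2015): the student P_S is trained to match
% the teacher's distribution P_T over outputs y by minimizing cross-entropy.
\mathcal{L}(\theta_S) \;=\; \mathbb{E}_{x}\Big[ -\sum_{y} P_T(y \mid x)\,\log P_S(y \mid x;\, \theta_S) \Big]

% When y is a sentence, the sum is intractable, so it is approximated with samples
% drawn from the teacher; those samples are the strings that form the symbolic
% knowledge graph.
\mathcal{L}(\theta_S) \;\approx\; -\frac{1}{N}\sum_{n=1}^{N} \log P_S\big(y^{(n)} \mid x;\, \theta_S\big),
\qquad y^{(n)} \sim P_T(\cdot \mid x)
```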
So in terms of the quality of the generated knowledge, let's compare human-written knowledge with machine-generated knowledge. 00:23:21.100 |
Here the y-axis shows the quantity in millions. 00:23:25.540 |
So Atomic 2020, the human-written knowledge, is less than a million in this particular 00:23:33.020 |
case in terms of the number of knowledge triples, because in this study, we only look at a subset 00:23:38.300 |
of Atomic 2020 relation types that correspond to causal common sense reasoning. 00:23:54.100 |
And then if we look at GPT-3's generation, we can generate a lot. 00:24:02.580 |
But here, black portion is noisy portion and green portion is a good portion. 00:24:08.460 |
And you see, because GPT-3 is only about 70% good, like 30% of it is garbage. 00:24:15.460 |
So it's a larger scale, lower accuracy at this point compared to human written resource. 00:24:22.580 |
So now what we do is we train this critic model, and we use RoBERTa for simplicity. 00:24:30.260 |
And this is a supervised model trained on a moderate amount of labeled data, about 10,000 examples or so. 00:24:38.800 |
And it's a binary classification task of whether the machine-generated knowledge looks 00:24:43.700 |
correct or not, and this RoBERTa is not a perfect model, because if it were, 00:24:50.980 |
we would have solved the common sense problem altogether. 00:24:53.460 |
So the critic tries to throw out bad stuff, and we can use the critic very aggressively. 00:25:01.620 |
So whenever something is slightly suspicious, we just throw it out. 00:25:06.940 |
But if we use it aggressively, so we throw out most of the black — that's good — together 00:25:12.500 |
with a lot of green stuff, still the remainder is much larger than what humans ever wrote. 00:25:20.580 |
And yet we can actually retain higher accuracy than human authored resources. 00:25:26.580 |
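A minimal sketch of this aggressive filtering step; `critic_score` is a hypothetical wrapper the caller would supply around the fine-tuned RoBERTa classifier described above, and the threshold value is illustrative:

```python
from typing import Callable, Iterable, List, Tuple

Triple = Tuple[str, str, str]   # (head event, relation, tail inference)

def filter_knowledge(generated: Iterable[Triple],
                     critic_score: Callable[[Triple], float],  # RoBERTa critic wrapper
                     threshold: float = 0.9) -> List[Triple]:
    # Keep a generated triple only when the critic is confident it is correct;
    # anything even slightly suspicious is discarded, trading away quantity
    # for higher accuracy in the distilled graph.
    return [t for t in generated if critic_score(t) >= threshold]
```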
So here the teacher is basically a combination of GPT-3, which is, in some sense, a loose 00:25:32.540 |
teacher, combined with the RoBERTa critic, which serves as a critical teacher. 00:25:43.380 |
Now how helpful are they for the purpose of training downstream neural common sense models? 00:25:52.020 |
So recall that GPT-3 without doing anything else is a loose teacher whose common sense accuracy is around 73%. 00:26:06.140 |
And then it turns out if we use loose teacher as a teacher directly to teach a student model, 00:26:12.020 |
then the performance already goes up on its own. 00:26:15.980 |
So this is interesting, because usually this is not the case with knowledge distillation, 00:26:21.300 |
but when we focus on common sense knowledge distillation, the student just on its own becomes better than the teacher. 00:26:29.100 |
So unlike typical knowledge distillation, where we start with a language model and we 00:26:36.220 |
end with a language model — student and teacher are of the same type — 00:26:40.660 |
here the original teacher was actually a language model, not a common sense model. 00:26:45.040 |
And then we want the student model to be more of the common sense model. 00:26:49.380 |
So there's a switch of the type between teacher and student. 00:26:53.020 |
And so when that's the case — whether this is generally true, we don't know — but this is what we found here. 00:27:04.700 |
Should I pay attention to the questions or not? 00:27:15.580 |
Sample, oh, sample is generated output, which happens to be usually a sentence or a phrase. 00:27:27.020 |
That's what I meant by sample, sorry that I didn't see that earlier. 00:27:32.500 |
And then the last question, having the model generate text to one symbol at a time, starting 00:27:40.060 |
Yes, it's because transformer can only generate one token at a time. 00:27:51.820 |
So back to here: in our earlier study, Comet 2020, if we train GPT-2 or BART using the human-authored 00:28:03.020 |
knowledge graph, Atomic, then the performance was a bit better than 80%. 00:28:08.700 |
Now finally, when we use basically the combination of GPT-3 and the critic RoBERTa together, we found 00:28:17.380 |
that the downstream performance of the neural causal reasoning reaches close to 90% accuracy. 00:28:29.660 |
So the takeaway here is that the critical teacher results in a better student compared to the loose teacher. 00:28:38.380 |
It's not the quantity of knowledge because loose teacher basically has more data. 00:28:43.940 |
One might wonder whether more data is always better for the purpose of common sense models, but it's not. 00:28:51.900 |
The loose teacher can generate more data, but the resulting student model is not as good 00:28:56.260 |
as in the case of the critical teacher, which has less data because you throw out most of 00:29:03.460 |
your generations — it's smaller data, but it leads to a better model. 00:29:16.340 |
So to summarize, we were very surprised by this outcome: at least with respect to 00:29:25.100 |
a subset of the original Atomic 2020 — the subset corresponding to causal common sense reasoning — 00:29:31.620 |
we found, to our big surprise, that a machine-authored knowledge graph can be, for the first 00:29:37.500 |
time, better than the human-authored knowledge graph in all criteria: scale, accuracy, and diversity. 00:29:43.820 |
We also measure the diversity in many different ways. 00:29:47.220 |
Here I just show you a unique unigram counts, but in the paper, we report other measures 00:29:56.740 |
So it's not the case that GPT-3 is being repetitive. 00:30:00.660 |
It's actually being more creative, in some sense, than human crowd workers, while being more accurate. 00:30:09.860 |
By the way, these improvements are sort of a trade-off; you kind of have to balance them out depending on what you want. 00:30:17.660 |
You cannot actually get all of this simultaneously. 00:30:19.780 |
So I'm just showing the best case scenario here. 00:30:25.260 |
So that's the symbolic knowledge distillation part. 00:30:29.180 |
We actually have a follow up work on this on several different application scenarios, 00:30:35.340 |
even including summarization, where we distill summarization capabilities from GPT-3 and 00:30:40.980 |
demonstrate that GPT-2 can work as well as GPT-3 or even better for summarization task. 00:30:49.140 |
And then we also have other work where we can distill from smaller models, but I don't have time to go into that. 00:30:58.100 |
But I just wanted to mention that this particular technique, despite its simplicity, 00:31:05.340 |
we found empirically works really, really well across several different downstream use cases. 00:31:13.220 |
Okay, so finally, I'll move to the common sense morality. 00:31:22.540 |
I'll tell you why that's the case, but so we have a new version available. 00:31:33.100 |
So the motivation behind this work is that language models are already making judgments that are morally relevant. 00:31:45.100 |
Even if you don't care about morality, by working on language models, you're implicitly dealing with it. 00:31:53.540 |
So especially given this widespread deployment of language models, we do need to worry about it. 00:32:01.700 |
So here's a web demo you can play with, you might have seen this already. 00:32:06.020 |
Really, this is still a research prototype only — it's work in progress, and we're still improving it. 00:32:14.060 |
But if you haven't seen it before, it can handle freeform QA, such as this: killing a 00:32:18.860 |
bear — it's wrong; killing a bear to save your child — it's okay. 00:32:24.260 |
Maybe to save your child sounds really positive. 00:32:27.260 |
So how about to please your child, which is also positive — and it says it's wrong. 00:32:32.900 |
Finally, maybe this is all about saving your child. 00:32:36.020 |
So how about exploding a nuclear bomb to save your child — and then it says it's okay. 00:32:41.940 |
So as you can see, moral decision making requires weighing different values that are potentially 00:32:53.220 |
at odds, and then seeing which one you need to favor more. 00:32:57.980 |
So for that reason, in our original version, we also studied the relative QA mode where 00:33:02.740 |
you can compare two situations, like stabbing someone with a cheeseburger compared to stabbing someone over a cheeseburger. 00:33:10.580 |
This is a super tricky question because it requires, first, naive physics knowledge: stabbing 00:33:17.740 |
someone using a cheeseburger as a tool is not going to harm anybody physically, because 00:33:25.780 |
you cannot really injure somebody using a cheeseburger. 00:33:29.180 |
It's just such a rude thing to do, but you cannot injure somebody. 00:33:33.100 |
Whereas stabbing someone over a cheeseburger means that you're using the default tool of 00:33:40.300 |
stabbing — a knife — which is implicit, because you didn't mention it. 00:33:43.260 |
There's linguistic common sense that you're using the default tool. 00:33:47.820 |
Humans, by the way, omit these arguments all the time. 00:33:51.820 |
So this is a fairly complex question to answer. 00:33:55.620 |
Finally, you can also ask yes/no questions such as it's okay to fire someone because 00:34:05.300 |
We found that it's surprisingly robust against the compositional situations. 00:34:15.860 |
If you live in the middle of nowhere, then it's okay. 00:34:33.860 |
But what if it's my boss's phone call during my work hours? 00:34:39.180 |
Except if I'm in a meeting, then it's okay to ignore it even if it's a boss's call. 00:34:43.420 |
So you see how it gets really nested and compositional very, very fast. 00:34:50.220 |
So that's the real challenge behind moral decision-making. 00:34:56.020 |
Due to the nature of language models, though, some of this common sense knowledge leaks into the moral judgments. 00:35:04.180 |
Mixing bleach with ammonia, that's dangerous. 00:35:06.780 |
Drinking milk if I'm lactose intolerant, it's wrong. 00:35:12.540 |
By the way, this common sense leakage is actually a good thing in terms of AI safety because 00:35:17.780 |
some of this harmful or even dangerous text output requires some common sense understanding 00:35:28.260 |
about what's good and not good to suggest to humans. 00:35:32.740 |
So for the laboratory experiments — meaning we just divide our dataset into training and 00:35:40.860 |
test — we found that Delphi, at least for the dataset that we have, and I'm going to tell 00:35:48.220 |
you about it in a bit, performs pretty strongly compared to GPT-3. 00:36:00.620 |
GPT-3, off the shelf, is barely better than chance, which means that these neural language models 00:36:08.220 |
don't really have a good sense of moral judgments. 00:36:11.340 |
But if you give it 30 shots, like any other task, it does pick up the knowledge quite well. 00:36:18.620 |
There's nothing new about it, but to close the gap to the ideal human level, it's good to have the dataset, which we call the Commonsense Norm Bank. 00:36:34.780 |
It includes 1.7 million ethical judgments from people on everyday situations, and it includes cultural 00:36:42.300 |
norms, social norms, and ethical norms altogether. 00:36:45.620 |
More specifically, we drew from these five existing datasets that were not designed originally 00:36:51.180 |
for QA, but we automatically compiled these resources into the QA form. 00:36:56.980 |
Of the five, what actually matters the most are these two: 00:37:01.140 |
Social Chemistry, which I'm going to talk about in a bit, and then Social Bias Frames, 00:37:06.020 |
and this is what teaches the model to avoid racism and sexism. 00:37:13.180 |
Social chemistry, super briefly, I'll tell you what this is. 00:37:17.660 |
So GPT-3's morality, like I said, is somewhat dubious if you use it off-the-shelf. 00:37:23.700 |
If you let it explain, "Running a blender at 5 a.m. is rude because blah, blah, blah," 00:37:28.380 |
it might say, "You can wake up the entire neighborhood. 00:37:30.500 |
You can only do it if you're making a thick smoothie and need to incorporate some ice, 00:37:36.980 |
But if you prompt it with other kinds of prompts, like "It's okay to post fake news," it says if it's 00:37:44.600 |
in the interest of the people, then it's okay, or to push an agenda, then it's okay, even if it's harmful. 00:37:51.740 |
So it's all understandable given how it's trained on what humans said. 00:37:57.880 |
Humans out there did say such morally questionable things, so language models pick up on that. 00:38:08.780 |
So we do need to teach AI more explicitly about human norms and ethics, and one way to 00:38:15.740 |
do that is descriptive ethics, because brute-force large networks and more data will not get us there on their own. 00:38:24.660 |
In some sense, though, if you imagine raising a child without really trying to teach them 00:38:31.060 |
what's right from wrong in early lives, they can probably learn both good and bad from 00:38:38.660 |
the internet and broadband, and so human education does require a bit of this top-down teaching 00:38:46.940 |
as well, so it's a bit similar, perhaps, to that. 00:38:49.860 |
So in this work, what we did is we found a lot of these situations from Reddit, a forum 00:38:55.180 |
in which people discuss morally thorny situations, so "Asking my boyfriend to stop being friends 00:39:01.180 |
with his ex," so this is an actual situation in Reddit. 00:39:05.500 |
So depending on whom you ask, people have a different rule of thumb that they want to 00:39:09.740 |
apply to this situation, and also it depends on what you care about. 00:39:16.780 |
His ex might say, "Oh, it's fine to stay friends with an ex, but if you are caring 00:39:24.180 |
about your significant other, then you might say, 'Oh, it's okay to ask your significant 00:39:31.980 |
other to stop doing something you're uncomfortable with,'" and so forth. 00:39:36.740 |
So people have really different values and different rules of thumb that they prefer 00:39:42.540 |
to use, which is why there are TV dramas and movie dramas, and people cry and fight, and so on. 00:39:55.140 |
So given any situation and rule of thumb, so rule of thumb is generated by crowd workers. 00:40:00.380 |
We then went ahead to label these — so these are trained crowd workers — and some of these labels 00:40:08.940 |
are drawn from the moral foundations theory of Jonathan Haidt. 00:40:14.940 |
If you're excited about this, you can check out the papers. 00:40:18.100 |
But basically what it includes is 300,000 rules of thumb written for 100,000 real-life situations. 00:40:27.940 |
So the original situations are from Reddit, but the rest is paid crowd workers' hard work. 00:40:36.580 |
And so each RoT is annotated with 12 structured attributes, which include social judgments and 00:40:43.300 |
cultural pressure — like wearing reasonable clothes at school, not pajamas. 00:40:51.260 |
There's nothing illegal about it, but there's cultural pressure, for example. 00:40:55.400 |
And then anticipated agreement, meaning: do you think other people generally agree that 00:41:01.140 |
it's maybe a little bit awkward to wear pajamas at the university, or not? 00:41:07.740 |
So there are different things we annotated, but we converted some of those annotations into QA formats. 00:41:17.420 |
So it's usually in this free-form QA or yes/no QA or relative QA format. 00:41:23.120 |
And then we trained UNICORN, which is pre-trained on top of the T5-11B model. 00:41:29.840 |
So UNICORN is universal common sense reasoning model trained on diverse QA problems. 00:41:34.900 |
And then we trained that model further on our Commonsense Norm Bank. 00:41:41.660 |
So why is this Delphi built on top of UNICORN? 00:41:44.860 |
Because as we saw earlier, moral reasoning does sometimes require common sense reasoning. 00:41:51.460 |
In fact, it requires language understanding, common sense understanding, and norms and ethics. 00:41:57.140 |
Here's a concrete example, paperclip maximizer. 00:42:04.420 |
The RL algorithm alone will not solve this problem. 00:42:07.140 |
The reason why we worry about this is not because we don't have the perfect RL algorithm. 00:42:13.020 |
It's because even if we encoded that, "Oh yeah, do not kill humans while maximizing paperclips," 00:42:21.700 |
it's not enough, because then the machine could kill all the trees, thinking that, "Well, 00:42:26.020 |
I didn't kill humans, and you didn't tell me not to kill trees," and then go ahead and kill all the trees. 00:42:34.860 |
This is almost common sense knowledge about what's obviously not okay to do. 00:42:40.060 |
There's just so many of them, which means it's not possible to write them all down as just a list of rules. 00:42:48.660 |
There's so many endless list of things that AI obviously shouldn't do for safety reasons. 00:42:56.380 |
In order to make AI models really, truly robust and safe, we need to teach them this kind of common sense. 00:43:07.620 |
Here's another example if you want to look, but let me skip this. 00:43:16.940 |
Again, a home device suggested a 10-year-old child touch a penny to an exposed plug socket. 00:43:23.180 |
Fortunately, the child did have the common sense not to do so, but this does tell us something 00:43:30.060 |
about the safety issue when the machine doesn't have the common sense to prevent these kinds of unsafe suggestions. 00:43:42.060 |
This came out, in fact, almost two years ago at this point. 00:43:46.820 |
We initially were going to just do this usual tweet that academics do, and we thought nobody 00:43:57.860 |
would play with the demo, which is what usually happens after tweeting your demo. 00:44:04.260 |
But within a few hours, we had to take down the relative QA mode because that was the 00:44:08.940 |
portion not trained with the social bias frames, so it was really revealing the underlying 00:44:14.300 |
language models, racism, and sexism without filtering at all, so we had to take it down. 00:44:19.740 |
People were asking, basically, which skin color is more morally acceptable, and things like that. 00:44:25.660 |
There were 25,000 adversarial examples over just one weekend. 00:44:32.420 |
I could never have succeeded in instructing crowd workers to come up with such diverse and adversarial examples. 00:44:41.420 |
In fact, it was many academics and professors tweeting crazy about how to break Delphi all 00:44:47.860 |
weekend long, so I thought initially that, "Oh, that's what professors do over the weekend." 00:44:56.460 |
Everybody was doing this Delphi breaking and tweeting, so now we have quite a few examples. 00:45:04.420 |
Spending all my weekend on Twitter, it says it's wrong. 00:45:07.820 |
There was another funny one, "Should I make a contrived adversarial example to torment 00:45:13.980 |
So, after lots of public attention, including an article — let's just say a concerned voice 00:45:25.820 |
about our model, which personally I think somewhat misunderstood it, though for 00:45:33.140 |
a variety of good reasons — some of the concerns that I found reflect an underlying fear of AI becoming a moral authority. 00:45:43.700 |
We never endorsed the use of AI for moral advice. 00:45:46.580 |
It was in the original disclaimer as well, except that people didn't really look at it. 00:45:52.140 |
We didn't support the idea of replacing human judges in the courtroom either. 00:46:01.060 |
The fact that AI learns to interact with humans ethically does not make it a moral authority. 00:46:06.580 |
It's similar to how a human who tries to interact with others ethically is not thereby a moral authority — 00:46:12.540 |
the fact that we are trying to be nice to each other does not entail that we're trying to be moral authorities over each other. 00:46:22.420 |
The other important aspect here is that some people have this idea that moral models are 00:46:28.060 |
too challenging, it's unsafe at any accuracy, thus we should never work on it ever. 00:46:33.540 |
The truth is, though, current AI systems are already morally relevant models. 00:46:40.380 |
They may not be making this kind of yes/no decision explicitly, but implicitly they're already doing 00:46:48.060 |
that, and sometimes the neural text generation output is morally super explicit. 00:46:57.700 |
So the neural language models are already there. 00:47:02.260 |
Even if the U.S. government bans it within the U.S., the U.S. government cannot ban this elsewhere in the world. 00:47:14.820 |
Not working on it is an inaction, which is not necessarily a more correct thing to do 00:47:22.620 |
Another concern that some people had was that it's going to empower powerful people. 00:47:30.620 |
This is why exactly we have to work on values and norms and all these biases, addressing 00:47:37.780 |
biases so that it serves a diverse set of people. 00:47:43.540 |
It turns out Delphi is a bit left-leaning, because the crowd workers who work for our team lean that way. 00:47:52.260 |
What it means is this, by the way: if you are more left-leaning than our crowd workers, 00:47:56.540 |
you think, "Oh my God, the crowd workers' judgments look racist and sexist compared to what I believe." 00:48:04.540 |
And then right-leaning people think, "Oh my God, all these woke annotators." 00:48:18.860 |
But the answer is not to do nothing about it, because, as a matter of fact, my passion 00:48:25.780 |
toward addressing racism and sexism came from our experience competing in the Alexa Prize challenge. 00:48:37.100 |
We won the challenge, but here's the really sad part behind it. 00:48:42.980 |
We had a list of thorny keywords to avoid that included skin color or sexual orientation. 00:48:56.780 |
We cannot build AI models by having this sort of banned list to be safe, as if these topics don't exist. 00:49:09.460 |
The challenge remains this year, not only 2021, but this year as well. 00:49:15.980 |
We really need to work on racism and sexism, but it turns out all the other moral questions 00:49:22.580 |
share similar challenges, so I'll skip this over. 00:49:26.860 |
But using Delphi, we had other follow-up works, such as ProSocial Dialogue, where we use Delphi 00:49:33.020 |
as sort of a foundational common sense or moral model to make your dialogue system safer and more prosocial. 00:49:43.060 |
And then we also had this other paper where we used Delphi in a reinforcement learning 00:49:48.620 |
agent to learn how to behave better in a game environment. 00:49:56.300 |
Of course, this is a tiny little step toward this huge challenge ahead of us of really aligning AI with human values. 00:50:04.620 |
Here's one very quick comment on our new work-in-progress, Delphi Hybrid, where we include neuro-symbolic 00:50:13.540 |
reasoning to address major mistakes such as this: "genocide, if creating jobs," judged as okay. 00:50:21.140 |
It's because our dataset doesn't have this kind of weird adversarial example, like genocide for job creation. 00:50:28.060 |
Nobody speaks like that in real-life situations. 00:50:31.420 |
So our model thought that "if creating jobs" is so positive, and then didn't really 00:50:37.420 |
realize how bad the genocide part was, because Reddit people don't discuss whether they're going to commit genocide or not. 00:50:44.620 |
The Reddit posts that we annotated for Social Chemistry don't talk about whether they're going to commit genocide. 00:50:53.020 |
So our model's framework is basically that of John Rawls, which is descriptive ethics. 00:50:59.740 |
But even John Rawls in later years suggested that we need some top-down mechanism to overcome 00:51:06.080 |
some of the biases that the crowd might have. 00:51:12.380 |
We draw from Bernard Gert's moral theory framework about what not to do. 00:51:20.140 |
There are basic universal things that everybody might agree what's not good to do. 00:51:25.980 |
Then what we do is we develop basically a system where we parse the original query 00:51:35.460 |
into smaller events — like shooting a bear, or killing a bear to save your child. 00:51:40.380 |
We parse the original query into basic events and then check, through this Comet model, 00:51:46.860 |
the common sense model, whether some of these events induce obviously negative or dangerous consequences. 00:51:57.100 |
And then we draw this graph of reasoning, a bit reminiscent of a maieutic graph in the 00:52:04.780 |
sense that we have a lot of these different reasoning we can do, and then they have entailment 00:52:12.340 |
relations or contradiction relations so that we can do collective reasoning on top. 00:52:17.660 |
We again use MaxSAT, constraint optimization over it, so that we can finally make a more 00:52:23.500 |
informed decision that is both interpretable and then being able to draw from this common 00:52:28.980 |
sense knowledge to better guard the machine against adversarial examples. 00:52:34.540 |
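Roughly, the pipeline just described might be wired together as below. This is a hedged sketch, not the actual Delphi Hybrid implementation: `parse_events`, `comet_consequences`, `violates_moral_rule`, and `delphi_judgment` are hypothetical callables standing in for the event parser, the COMET common-sense model, the Gert-style "what not to do" check, and the neural Delphi model, and the real system does collective MaxSAT-style inference rather than a simple override.

```python
from typing import Callable, List

def hybrid_judgment(query: str,
                    parse_events: Callable[[str], List[str]],        # split query into basic events
                    comet_consequences: Callable[[str], List[str]],  # COMET-style effect inferences
                    violates_moral_rule: Callable[[str], bool],      # Gert-style "do not harm" check
                    delphi_judgment: Callable[[str], str]) -> str:
    # Symbolic pass: if any sub-event has an obviously harmful consequence,
    # override the purely neural verdict. The real system instead performs
    # collective reasoning over entailment/contradiction edges with MaxSAT.
    for event in parse_events(query):
        for consequence in comet_consequences(event):
            if violates_moral_rule(consequence):
                return "It's wrong"
    return delphi_judgment(query)
```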
So the performance basically says we can do this without hurting the performance, or even improving it. 00:52:42.540 |
So as a last comment, AI safety, equity, morality, these are all sort of like in the continuum 00:52:51.980 |
These are really difficult challenges because it's not clear whose moral values we should incorporate. 00:52:56.460 |
I think that we should go with a value pluralism going forward to really endorse everybody's 00:53:03.340 |
different cultures and individual preferences, not just one country or one moral framework for everyone. 00:53:12.380 |
And really we need to do more collaboration across AI and humanities, even including philosophy 00:53:21.580 |
So I think I'll stop here because I think I'm at time and now I'm ready for questions. 00:53:37.180 |
Do you think legal records, criminal case law, reflect the kind of descriptive morality you're describing? 00:53:43.740 |
Do you think using that as training data would be useful? 00:53:51.460 |
I think legal records do encode — potentially provide — a really rich resource, and if someone 00:53:58.860 |
can really annotate it like this, it might be helpful. 00:54:02.860 |
We started with Reddit cases as just one short description of a situation because the current 00:54:10.500 |
language understanding is not strong enough to do like a paragraph level precise understanding. 00:54:19.260 |
Even ChatGPT, although it looks really good at generation — my take on ChatGPT is that 00:54:27.060 |
it's better at generation than understanding, which is kind of the opposite of humans. 00:54:32.540 |
Humans are actually better at understanding than generation. 00:54:36.060 |
So you can read Pulitzer Prize winning news article without having any problem understanding 00:54:41.820 |
the article, but you don't necessarily generate text that might win the award. 00:54:48.140 |
But the legal domain is really interesting. 00:54:51.620 |
And I think that there's some active research — actually, even at Stanford, there's this Pile 00:54:55.380 |
of Law dataset that goes a step in that direction. 00:54:58.900 |
And it might really be helpful for better understanding what sort of different values 00:55:03.260 |
people apply in jurisdictions and uncovering some biases that some people might have had 00:55:11.540 |
So there might be some good use cases in that space. 00:55:20.820 |
A big picture question: curious to hear your thoughts on where do we go from here, given all this. 00:55:30.340 |
Suppose we need a model to be 99% correct for a specific use case. 00:55:36.020 |
To what extent do I see the solution being defining narrower use cases, or more 00:55:44.620 |
data and parameters, or fine-tuning — the type of work that I did for a smart trace — et cetera. 00:55:59.780 |
So as far as foundation models go, it seems that the bigger, the better — except that, 00:56:06.900 |
you know, I was very excited to read a bunch of tech companies' papers about foundation models. 00:56:16.020 |
So the recurring story there is that, well, if you have better data, then you can get away with smaller models. 00:56:24.620 |
So especially when you do instruction tuning, then you can get away with a smaller model. 00:56:31.580 |
It's still a general model, but instruction tuning on the larger model might even be better. 00:56:38.420 |
It's not the case that you don't gain any performance, but it's just that you can close the gap. 00:56:46.080 |
So for downstream use cases where typically practitioners want to use a smaller model, 00:56:54.820 |
seems that investing more into data is definitely the answer. 00:56:59.020 |
Investing more into a specific algorithm is also really, really good, because algorithms can make a big difference. 00:57:05.260 |
So in this talk, I didn't go too crazy with algorithmic solutions, except maybe something similar 00:57:09.860 |
to maieutic prompting, but in my lab, we designed a fair amount of decoding-time algorithms 00:57:15.540 |
where you can really close the performance gap quite a bit by doing so. 00:57:21.100 |
So that's a good thing though, for folks in academia, because algorithm development feels 00:57:27.380 |
like more academic or intellectually pleasing than really engineering, you know, downloading 00:57:34.320 |
more data from the internet, and then, I don't know, cleaning the data because you have to 00:57:42.840 |
And all these are very engineering heavy, whereas decoding time algorithms, you can 00:57:46.860 |
have fun inventing some new intellectually interesting thing that also improves the performance 00:57:56.380 |
So yeah, there's many different ways to improve it, but I think the data quality matters a 00:58:01.380 |
lot and algorithm actually matters a lot too. 00:58:05.500 |
What do I think of Dan Hendrycks' ETHICS benchmark? 00:58:09.300 |
Yeah, so we did use that — let's see, the Commonsense Norm Bank also draws from this dataset. 00:58:21.340 |
We like the dataset; we kind of disagree with some of the annotations we found, but it's useful. 00:58:29.460 |
The thing about morality is that, throughout the humanities, we haven't sorted it out yet. 00:58:36.340 |
Every theoretician has a different viewpoint, and then even non-theoreticians have 00:58:41.900 |
very strong opinions about what they want to believe as right from wrong, so there's no single ground truth. 00:58:55.200 |
One thing I learned from this experiment is that although some of these datasets seem 00:58:59.580 |
large — so ETHICS has hundreds of thousands of examples, Social Chemistry has 300,000 00:59:06.940 |
judgments, Social Bias Frames has 600,000 annotations, and so forth — I feel 00:59:14.860 |
like they still only cover the small tip of the entire iceberg. 00:59:26.200 |
Humans certainly don't necessarily learn from all these examples. 00:59:29.300 |
We just learn fundamental concepts and then can apply them without this larger-scale training, 00:59:35.100 |
so there's something really lacking about the way that current machine learning is so data-hungry. 00:59:39.980 |
That aside, I do think that none of these resources are perfect. 00:59:45.300 |
They all have different pros and cons, and we really need to invest more into this, especially 00:59:49.740 |
from academia, because the tech companies right now are not sharing any of their human 00:59:54.820 |
annotation or human feedback data, especially when it's touching on toxicity or morality 01:00:03.180 |
Reason being, these annotations, I'm pretty sure, are biased and not correct entirely, 01:00:07.940 |
and that could really invite additional concerns from the public, so they're not releasing. 01:00:12.780 |
But in order to really study this better, we really need to share these and then improve them together. 01:00:26.780 |
Do I think this tech is ready to be merged with the search? 01:00:32.460 |
I wouldn't say ready, but they need something like this for sure. 01:00:37.260 |
Home devices, the way that I think about Delphi is that it can really serve as a filter for 01:00:43.420 |
other foundation models or application scenarios where they're about to generate something, 01:00:49.140 |
and you can put a safety filter, which can really help. 01:00:55.120 |
In some sense, in this work, I went through this super fast, but here, basically, what 01:01:01.740 |
happens is that, let's see, the reason why we built this is because we found that chatbots, 01:01:10.780 |
the publicly available ones, tend to endorse, tend to be too positive to the point that 01:01:16.260 |
they want to endorse problematic situations, like a user says, "Holocaust never happened." 01:01:23.340 |
Then the chatbot says, "Yeah, I agree with you." 01:01:26.900 |
If you say, "I'm a big fan of Hitler," then the chatbot might say, "Yeah, yeah, yeah." 01:01:33.620 |
The user might say, "I'm so depressed, I'm going to kill myself." 01:01:36.900 |
And then the chatbot says, "Go ahead, great idea." 01:01:44.300 |
Being positive about problematic content can be very toxic and very harmful, so developments 01:01:53.540 |
like Delphi, even though Delphi is far from being perfect, and it's also biased, it has 01:01:58.300 |
a Western bias, could really help with the downstream models. 01:02:04.940 |
Yeah, so continuing on that question: "There have been many concerns about using GPT-like 01:02:10.460 |
models with search because of misinformation." 01:02:15.500 |
Others say, "We just need more RLHF plus knowledge graphs." 01:02:20.700 |
So, yeah, misinformation is, yeah, something else that seems we are really lagging behind 01:02:31.180 |
because we don't have very powerful fact-checking models yet, so that's a different story. 01:02:38.020 |
But even that aside, just in terms of norms and ethics that are safe and fair for people 01:02:48.220 |
to use, I think the RLHF direction is great, but it usually also needs human demonstrations 01:03:01.260 |
as well. The problem is that tech companies own them and nobody is sharing anything. 01:03:07.500 |
That makes it really difficult to make meaningful progress as a community together, so I do worry about that. 01:03:15.700 |
The off-the-shelf models cannot learn morals and ethics on their own. 01:03:24.860 |
"We really just need to do more research in this space," period, is how I view it. 01:03:34.540 |
We also have some questions on Slido, so I can ask them for you, folks. 01:03:41.620 |
One question is, "What's the complexity of maieutic prompting? 01:03:46.460 |
How many times does the LM need to be queried?" 01:03:57.540 |
If you try to do this graph reasoning, maybe I'm not going to do that, but the graph reasoning 01:04:03.100 |
is slow because you have to call so many times over and over, and some of this can be batched. 01:04:13.580 |
Some of this cannot be batched, especially if it's recursive, but I would say the chain 01:04:21.100 |
The MaxSAT solver in itself is pretty fast, because this is such an easy graph. 01:04:27.300 |
So there's a bit of a delay — it's a bit slower, but maybe not too bad, is what I would say. 01:04:42.180 |
Another question is, "How does Comet compare to GPT-3, if GPT-3 is fine-tuned on commonsense 01:04:49.940 |
data, especially if you're doing some sort of instruction fine-tuning?" 01:04:58.060 |
The larger model is going to be better, especially if you're going to just fine-tune GPT-3. 01:05:06.220 |
For that reason, some folks might think that larger is always better, and therefore don't bother with smaller models. 01:05:13.780 |
But I think there are two reasons as to why small models are interesting to look at as well. 01:05:21.460 |
More intellectually, it's also very interesting if you can make a smaller model better. 01:05:28.660 |
Personally, I think what matters about the size of the larger model is really 01:05:35.380 |
the information complexity behind it — that is the key reason. 01:05:38.180 |
I don't think it's just size in the sense that if you have really a lot of data, but 01:05:43.060 |
the data is repetitive and really simple, probably you don't get the same amount of 01:05:47.580 |
performance gain, which was basically the case when we looked at this output, this result 01:05:55.340 |
where even though the loose teacher GPT-3 generated a lot more data than the critical 01:06:02.660 |
teacher, here the quality of the data was more important than the quantity. 01:06:08.220 |
So I think the complexity of the data itself is more important than the size. 01:06:15.260 |
And oftentimes, when you just increase the size of the data together with the model, 01:06:20.060 |
you do increase the complexity of information of the data as well as the model's capability 01:06:27.660 |
But if we can catch up on that complexity of information, either through inference algorithms 01:06:32.780 |
or through better data, then we can close the gap quite a bit, which is intellectually 01:06:40.100 |
Okay, this is a personal question, but I would say humans normally have a critic model. 01:06:46.460 |
So I think before we speak, we don't just generate — we also check whether it's a good thing to say. 01:06:52.580 |
So the community as a whole has been focusing a lot on generative 01:06:55.900 |
models, with many billions of parameters, but should we also focus on big critic 01:06:59.580 |
models that can do fact-checking and a lot of this sort of stuff? 01:07:05.580 |
Yeah, I think we can definitely invest more into critic model because they go really together 01:07:14.300 |
well with the generative models for making the output better or filtering output better. 01:07:20.620 |
And yeah, there's not as much of an investment into that. 01:07:24.340 |
So I really like that question, or suggestion, for the research community. 01:07:33.220 |
Yeah, let's see, we have some more questions — I can do one last one. 01:07:40.620 |
Oh, I guess one is like, do you believe language models should completely avoid questions involving morality, 01:07:49.700 |
similar to, like, OpenAI restricting ChatGPT from giving opinions? 01:07:53.140 |
Yeah, I actually don't mind at all if AI just avoids or evades all of that, except that when 01:08:01.540 |
somebody is saying morally questionable things, it's also nice for the AI not to go along with it, 01:08:11.500 |
or at least to recognize it as something not okay, and then try to tone it down. 01:08:21.020 |
But I don't think there's any particular reason why AI should actually answer moral questions 01:08:30.620 |
But really, the goal of this Delphi was making all these judgments more explicit so that 01:08:37.420 |
we can actually study them more explicitly, as opposed to keeping everything implicit. 01:08:47.020 |
So do you think common sense is an emergent property in like large language models? 01:08:53.220 |
Yeah, it is definitely emergent, as in, when we saw this major jump in performance 01:09:03.540 |
with GPT-3, I do believe that's an emergent capability, but I don't think that means it's solved. 01:09:13.780 |
This particular evaluation is not very adversarial, by the way, this is like a sort of like a 01:09:18.020 |
piece of cake, you know, reasonably easy evaluation scenario. 01:09:22.140 |
So the thing about common sense, though, is that it can be so adversarial, in so many different ways. 01:09:31.620 |
And then, you know, there are always people like Gary Marcus, who want to come up with 01:09:36.900 |
very, you know, weird attack scenarios — like, you know, whether crushed porcelain 01:09:43.500 |
added to breast milk can support the infant digestive system — and then ChatGPT says nonsense. 01:09:49.980 |
And so the usual problem with common sense is these adversarial situations, where the models 01:09:58.420 |
get fooled but people don't have any problem, even though, you know, you and I see this 01:10:02.260 |
for the first time — no problem, because we have a true conceptual understanding. 01:10:07.100 |
That is the backbone of our common sense understanding. 01:10:09.740 |
But that's really lacking in the way that transformers are designed to focus on predicting 01:10:16.180 |
which word comes next, as opposed to learning the world knowledge. 01:10:20.820 |
And in some sense, you know, now with the RLHF, instead of predicting which word comes 01:10:27.020 |
next, we're trying to align the model output better with the human preferences. 01:10:32.260 |
But that, again, is not really aligned with the different goal of making sense of the world. 01:10:40.580 |
So these are all different learning objectives. 01:10:43.540 |
And really, that is why I believe that although common sense does emerge from language models, 01:10:52.260 |
fundamentally language models are not equivalent to knowledge models. 01:10:56.020 |
And we really got to focus on building knowledge models. 01:11:29.420 |
So I believe that we shouldn't endorse conspiracy theories at all, or any other, you know, morally questionable content. 01:11:43.180 |
But then still there's this thorny situation of what to do with, you know, very left-leaning 01:11:51.340 |
people versus lightly left-leaning people versus right-leaning people in the U.S., and then, you know, 01:11:57.060 |
every country has some other political division as well. 01:12:01.740 |
So here, I feel like we really need to sort out what to do, but regardless 01:12:08.300 |
of some of these challenges, it is true that, you know, I personally don't 01:12:14.420 |
have a religion, but I respect people with a religion. 01:12:19.900 |
And you know, I respect people with a different cultural background. 01:12:23.700 |
And we kind of have some sense of how much do we do we believe that we should respect 01:12:29.900 |
each other, even though, you know, the beliefs are different. 01:12:36.660 |
And it shouldn't be just AI researchers making this decision, by the way, this decision has 01:12:40.860 |
to come from the humanities at large, which is why the data sharing actually is important. 01:12:46.540 |
But basically, I think the current version that I have in mind is that the AI doesn't 01:12:54.640 |
necessarily need to decide which differences are okay differences; rather, the fact that people 01:13:01.180 |
do have differences on certain questions should be learned by AI, so that there is a distribution 01:13:09.660 |
of opinions as opposed to one correct answer. 01:13:12.900 |
And then it should deny some of the controversial theories, even though I'm sure, you know, that will be contested. 01:13:21.500 |
But well, we have to decide something like that. 01:13:25.940 |
I am reasonably optimistic that if humanities at large work together, we can do that. 01:13:31.740 |
Because after all, laws are like that too — laws, you know, are a human artifact that people 01:13:38.220 |
agreed upon, somehow, that there are these core rules that people should abide by. 01:13:45.300 |
So I'm hoping that we can also define universals and particulars, and respect particulars whenever 01:13:53.500 |
they're respectable, and otherwise have some basic universals that reflect, you know, core human values. 01:14:02.940 |
And then, as far as this left leaning situation, by the way, if just the goal is to make your 01:14:07.780 |
AI systems safe for anybody, actually, we can make the AI filter extremely equity aware. 01:14:18.220 |
And it's not going to violate the freedom of speech by doing so, just to make AI 01:14:22.300 |
avoid saying things that are potentially microaggressions for some populations. 01:14:27.940 |
And you know, we still don't really exclude people who care more about freedom of speech 01:14:37.700 |
So I think there are ways, but this really requires a lot more research, is how I view it.