Stanford XCS224U: Natural Language Understanding I Experiment Protocol Overview I Spring 2023
Chapters
0:00
0:23 Rationale
1:15 Requirements
5:49 Other tips and resources
00:00:06.000 |
This short screencast is an overview of the experiment protocol document. 00:00:10.200 |
This is the second document on your way to the final paper. 00:00:15.080 |
but I think it's really important in terms of helping you and your teammates and 00:00:18.840 |
your mentor get complete clarity on what you're going to achieve for the final paper. 00:00:23.500 |
The rationale at high level is the same as the one for the lit review. 00:00:28.780 |
We want you in productive dialogue with your teammates and with yourself and 00:00:33.480 |
with your mentor about the scope of the project and its overall goals. 00:00:38.300 |
We want you to identify the core questions you'll be addressing at this stage. 00:00:42.700 |
We want you to identify your core methods, that is, data, models, metrics, and everything 00:00:49.460 |
And maybe most importantly for this phase, we want you to identify obstacles and propose 00:00:56.380 |
The more the merrier in terms of uncovering these things that are threatening the convergence 00:01:01.960 |
If we can identify these things at this stage, we can probably find really productive workarounds. 00:01:07.140 |
But as it gets closer and closer to the final paper deadline, it becomes harder and harder 00:01:11.560 |
to get these things to converge in a happy way. 00:01:15.720 |
The requirements are linked from the course website. 00:01:22.580 |
This is a short, structured report that is establishing your core experimental framework 00:01:31.020 |
They can get long if you have a lot of obstacles or points of uncertainty, but by and large, 00:01:36.260 |
if things are going well, we expect this to be a short document. 00:01:39.120 |
We do specify that the max is eight pages, but I'll say that the norm is for them to 00:01:46.140 |
We have, as before, an optional Overleaf template. 00:01:51.060 |
We encourage you to use it, but it is not required at this stage. 00:01:57.220 |
I mentioned these in the lit review overview. 00:01:59.060 |
We want you now to state your core hypothesis or hypotheses as clearly as possible. 00:02:05.920 |
We want you to talk about the data resources that you're going to use and any limitations, 00:02:10.380 |
access limitations, producing data is a big limitation, anything that might threaten the 00:02:19.700 |
If it's a standard sort of classification problem, it might be a very short thing that 00:02:23.460 |
you report, like you say, macro F1, but if you're working on a specialized problem or 00:02:29.060 |
inventing your own metrics, this might be a more detailed discussion. 00:02:35.540 |
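As an illustration of the "standard classification" case mentioned above, here is a minimal sketch of macro F1 in plain Python. It is not taken from the course materials: the function name `macro_f1` and the toy labels are assumptions made for this example, and the label set is assumed to be the union of gold and predicted labels.

```python
def macro_f1(gold, pred):
    """Unweighted mean of per-class F1 scores (hypothetical helper)."""
    # Assumption: score every label seen in either the gold or predicted list.
    labels = sorted(set(gold) | set(pred))
    scores = []
    for label in labels:
        pairs = list(zip(gold, pred))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        scores.append(f1)
    # Each class counts equally, regardless of how many examples it has.
    return sum(scores) / len(scores)


# Toy labels, invented for illustration only.
gold = ["pos", "pos", "neg", "neg"]
pred = ["pos", "neg", "neg", "neg"]
score = macro_f1(gold, pred)
```

Because every class contributes equally to the average, macro F1 is a common choice when classes are imbalanced. A specialized problem may call for a different metric, which is where the more detailed discussion comes in.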
We want to know about the baselines that you plan, the comparison points, the ablations, 00:02:40.640 |
all of that stuff, and we would like to see how it keys into the core hypotheses you listed 00:02:47.700 |
And then the general reasoning, how does the project come together? 00:02:50.760 |
How do the data and the models and the metrics connect with the hypothesis? 00:02:55.560 |
I think that's the most important thing, most powerful thing to convey at that point. 00:03:03.900 |
I emphasize you are not required to have any results at this stage. 00:03:07.700 |
It could be purely a planning document, but if you do have results, it's great to report 00:03:14.060 |
It means that you've got a minimal viable project and we can start to build on whatever 00:03:21.940 |
But we would like to know what you've done so far. 00:03:24.260 |
Even if it's just assembling the raw ingredients, let us know. 00:03:28.240 |
If you've run some experiments and you got stuck, that's an obstacle that we'll want 00:03:35.000 |
It's not evaluative in terms of us giving you a grade based on how far along you are. 00:03:41.740 |
It is entirely evaluated based on the extent to which you can give us a clear insight into 00:03:49.740 |
And then finally, as always, a required references section. 00:03:53.540 |
In terms of tips, I would say first, there's no particular length we have in mind. 00:03:58.580 |
A short or a long document could be bad or good depending on the state of your project and 00:04:08.820 |
As I said before, please try to call out concerns you have, even if they are distant ones. 00:04:13.060 |
This is meant to be a last chance to make sure the project will converge in the time 00:04:17.880 |
So you might as well err on the side of more disclosures. 00:04:21.580 |
Yes, you need to be able to state a hypothesis. 00:04:25.220 |
It is common for engineers to come to me and say, but I don't have a hypothesis. 00:04:30.500 |
All I want to do is see whether this model is a good model for my problem. 00:04:37.260 |
Just state that as a claim about what you think will work. 00:04:40.500 |
And that will actually guide us intellectually and also guide us in terms of choosing baselines 00:04:45.140 |
and ablation studies and other things that will give us insight into whether you're right 00:04:49.460 |
about this thing that you feel about this model that you're evaluating. 00:04:53.740 |
No, you do not need to report results, as I said before. 00:04:57.480 |
But they are very welcome because they're a sign that you've kind of got all the working 00:05:01.580 |
pieces in place and the project machine is functioning. 00:05:06.720 |
We want you to have a full working pipeline as soon as possible. 00:05:10.100 |
It is, I confess, tempting to insist on initial results so that that pipeline would be in 00:05:16.060 |
But the spirit of this is that we want you basically to be at a state where any day you 00:05:24.220 |
It might not be the one that you envisioned, but you could submit it. 00:05:27.380 |
Once you get to that point, it's a really happy state in which you're mainly just adding 00:05:31.540 |
new experimental results, improving the reporting, adding analyses and other things that allow 00:05:39.580 |
So get that minimal viable project in the bag soon so that you can do creative exploration 00:05:45.500 |
without feeling like you're under a lot of undue pressure. 00:05:49.940 |
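One way to picture a "minimal viable project" is an end-to-end pipeline with the simplest possible model in place. The sketch below is an assumption for illustration, not something prescribed by the course: the names `train_majority_baseline`, `accuracy`, and `run_pipeline` are hypothetical, and the data is a toy list of (text, label) pairs.

```python
from collections import Counter


def train_majority_baseline(train_labels):
    """Return a 'model' that always predicts the most frequent training label."""
    most_common_label, _ = Counter(train_labels).most_common(1)[0]
    return lambda example: most_common_label


def accuracy(gold, pred):
    """Fraction of predictions that match the gold labels."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def run_pipeline(train, test):
    """Train on (text, label) pairs, predict on the test set, and score."""
    model = train_majority_baseline([label for _, label in train])
    predictions = [model(text) for text, _ in test]
    return accuracy([label for _, label in test], predictions)


# Toy data, invented for illustration only.
train = [("great film", "pos"), ("loved it", "pos"), ("dull plot", "neg")]
test = [("really fun", "pos"), ("too long", "neg")]
baseline_score = run_pipeline(train, test)
```

Once a skeleton like this runs, the baseline can be swapped for the real model and the metric for the one named in the protocol, without changing the surrounding pipeline.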
For other tips and resources, I have a very large markdown document that covers lots of 00:05:55.140 |
FAQs that I've seen in the past, discussions of each one of the documents that's associated 00:06:00.740 |
with the final project work, examples of past final papers that have gone on after some 00:06:05.900 |
work to be publications, and a whole lot else besides. 00:06:09.700 |
So if you feel like you just need some guidance on crucial points about how to develop a project 00:06:14.620 |
in this space, I highly recommend this document. 00:06:19.700 |
And if your paper goes on to be a publication, I would love to hear about it. 00:06:24.380 |
Drop me a note and I will, with your permission, add that to the list of published papers stemming 00:06:30.300 |
I'm very proud of how long and diverse that list is.