Stanford XCS224U: Natural Language Understanding | Experiment Protocol Overview | Spring 2023


Chapters

0:00
0:23 Rationale
1:15 Requirements
5:49 Other tips and resources

00:00:00.000 | Welcome, everyone.
00:00:06.000 | This short screencast is an overview of the experiment protocol document.
00:00:10.200 | This is the second document on your way to the final paper.
00:00:13.320 | It's a somewhat unusual document,
00:00:15.080 | but I think it's really important in terms of helping you and your teammates and
00:00:18.840 | your mentor get complete clarity on what you're going to achieve for the final paper.
00:00:23.500 | The rationale at a high level is the same as the one for the lit review.
00:00:27.400 | It's about productive dialogue.
00:00:28.780 | We want you in productive dialogue with your teammates and with yourself and
00:00:33.480 | with your mentor about the scope of the project and its overall goals.
00:00:38.300 | We want you to identify the core questions you'll be addressing at this stage.
00:00:42.700 | We want you to identify your core methods, that is, data, models, metrics, and everything
00:00:47.840 | that goes along with that.
00:00:49.460 | And maybe most importantly for this phase, we want you to identify obstacles and propose
00:00:54.980 | some workarounds.
00:00:56.380 | The more the merrier in terms of uncovering these things that are threatening the convergence
00:01:00.960 | of the project.
00:01:01.960 | If we can identify these things at this stage, we can probably find really productive workarounds.
00:01:07.140 | But as it gets closer and closer to the final paper deadline, it becomes harder and harder
00:01:11.560 | to get these things to converge in a happy way.
00:01:15.720 | The requirements are linked from the course website.
00:01:18.160 | You can see the link at the top here.
00:01:20.340 | Let me just offer a quick rundown.
00:01:22.580 | This is a short, structured report that is establishing your core experimental framework
00:01:28.100 | and kind of talking about it.
00:01:29.660 | I emphasize short.
00:01:31.020 | They can get long if you have a lot of obstacles or points of uncertainty, but by and large,
00:01:36.260 | if things are going well, we expect this to be a short document.
00:01:39.120 | We do specify that the max is eight pages, but I'll say that the norm is for them to
00:01:43.660 | be shorter than that.
00:01:46.140 | We have, as before, an optional Overleaf template.
00:01:48.920 | This is just derived from the ACL templates.
00:01:51.060 | We encourage you to use it, but it is not required at this stage.
00:01:55.580 | And then we come to the required pieces.
00:01:57.220 | I mentioned these in the lit review overview.
00:01:59.060 | We want you now to state your core hypothesis or hypotheses as clearly as possible.
00:02:05.920 | We want you to talk about what data resources you're going to use and any limitations,
00:02:10.380 | access limitations, producing data is a big limitation, anything that might threaten the
00:02:15.460 | need that you have for data.
00:02:17.620 | Metrics.
00:02:18.700 | This is important.
00:02:19.700 | If it's a standard sort of classification problem, it might be a very short thing that
00:02:23.460 | you report, like you say, macro F1, but if you're working on a specialized problem or
00:02:29.060 | inventing your own metrics, this might be a more detailed discussion.
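
For readers who want a concrete picture of the macro F1 metric mentioned above, here is a minimal sketch in plain Python. It is an illustration only, not something taken from the course materials: the function name `macro_f1` and the toy labels are assumptions made for this example. It computes F1 separately for each class and then takes the unweighted mean, so small classes count as much as large ones.

```python
from collections import Counter


def macro_f1(gold, predicted):
    """Unweighted mean of per-class F1 over every label seen in gold or predicted.

    Hypothetical helper for illustration; not part of any course codebase.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    if not gold:
        raise ValueError("cannot score an empty set of examples")

    true_pos, false_pos, false_neg = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            true_pos[g] += 1
        else:
            false_pos[p] += 1  # p was predicted but is not the gold label
            false_neg[g] += 1  # g was the gold label but was missed

    labels = sorted(set(gold) | set(predicted))
    per_class = []
    for label in labels:
        # F1 = 2*TP / (2*TP + FP + FN); the denominator is nonzero because
        # every label here occurs at least once in gold or predicted.
        denom = 2 * true_pos[label] + false_pos[label] + false_neg[label]
        per_class.append(2 * true_pos[label] / denom)
    return sum(per_class) / len(per_class)


if __name__ == "__main__":
    # Toy data, invented for this example.
    gold = ["pos", "pos", "neg", "neg", "neutral"]
    predicted = ["pos", "neg", "neg", "neg", "pos"]
    print(f"macro F1: {macro_f1(gold, predicted):.3f}")
```

In a real project an established library implementation would usually be the safer choice; the point of the sketch is only that a standard metric like this can be stated in one line of the protocol, whereas a custom metric needs its definition spelled out at this level of detail.
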
00:02:34.020 | Models of course, or architectures.
00:02:35.540 | We want to know about the baselines that you plan, the comparison points, the ablations,
00:02:40.640 | all of that stuff, and we would like to see how it keys into the core hypotheses you listed
00:02:45.760 | in 4A.
00:02:47.700 | And then the general reasoning, how does the project come together?
00:02:50.760 | How do the data and the models and the metrics connect with the hypothesis?
00:02:55.560 | I think that's the most important thing, most powerful thing to convey at that point.
00:03:01.100 | And then a summary of progress so far.
00:03:03.900 | I emphasize you are not required to have any results at this stage.
00:03:07.700 | It could be purely a planning document, but if you do have results, it's great to report
00:03:13.060 | them.
00:03:14.060 | It means that you've got a minimal viable project and we can start to build on whatever
00:03:18.720 | insights those results have offered.
00:03:20.940 | But it's not required.
00:03:21.940 | But we would like to know what you've done so far.
00:03:24.260 | Even if it's just assembling the raw ingredients, let us know.
00:03:28.240 | If you've run some experiments and you got stuck, that's an obstacle that we'll want
00:03:31.460 | to identify.
00:03:32.940 | That is the spirit of this progress report.
00:03:35.000 | It's not evaluative in terms of us giving you a grade based on how far along you are.
00:03:40.740 | Quite the contrary.
00:03:41.740 | It is entirely evaluated based on the extent to which you can give us a clear insight into
00:03:46.860 | how you're doing for this project.
00:03:49.740 | And then finally, as always, a required references section.
00:03:53.540 | In terms of tips, I would say first, there's no particular length we have in mind.
00:03:58.580 | A short or a long report could be bad or good depending on the state of your project and
00:04:04.200 | what you're trying to achieve.
00:04:05.460 | So completely open-minded about that.
00:04:08.820 | As I said before, please try to call out concerns you have, even if they are distant ones.
00:04:13.060 | This is meant to be a last chance to make sure the project will converge in the time
00:04:16.880 | allotted.
00:04:17.880 | So you might as well err on the side of more disclosures.
00:04:21.580 | Yes, you need to be able to state a hypothesis.
00:04:25.220 | It is common for engineers to come to me and say, but I don't have a hypothesis.
00:04:30.500 | All I want to do is see whether this model is a good model for my problem.
00:04:34.500 | And I say to them, that is a hypothesis.
00:04:37.260 | Just state that as a claim about what you think will work.
00:04:40.500 | And that will actually guide us intellectually and also guide us in terms of choosing baselines
00:04:45.140 | and ablation studies and other things that will give us insight into whether you're right
00:04:49.460 | about this thing that you feel about this model that you're evaluating.
00:04:53.740 | No, you do not need to report results, as I said before.
00:04:57.480 | But they are very welcome because they're a sign that you've kind of got all the working
00:05:01.580 | pieces in place and the project machine is functioning.
00:05:06.720 | We want you to have a full working pipeline as soon as possible.
00:05:10.100 | It is, I confess, tempting to insist on initial results so that that pipeline would be in
00:05:13.980 | place.
00:05:14.980 | I think that's premature.
00:05:16.060 | But the spirit of this is that we want you basically to be at a state where any day you
00:05:21.060 | could submit a project.
00:05:23.020 | It would be a working project.
00:05:24.220 | It might not be the one that you envisioned, but you could submit it.
00:05:27.380 | Once you get to that point, it's a really happy state in which you're mainly just adding
00:05:31.540 | new experimental results, improving the reporting, adding analyses and other things that allow
00:05:36.380 | you to build a really fleshed out project.
00:05:39.580 | So get that minimal viable project in the bag soon so that you can do creative exploration
00:05:45.500 | without feeling like you're under a lot of undue pressure.
00:05:49.940 | For other tips and resources, I have a very large markdown document that covers lots of
00:05:54.140 | things.
00:05:55.140 | FAQs that I've seen in the past, discussions of each one of the documents that's associated
00:06:00.740 | with the final project work, examples of past final papers that have gone on after some
00:06:05.900 | work to be publications, and a whole lot else besides.
00:06:09.700 | So if you feel like you just need some guidance on crucial points about how to develop a project
00:06:14.620 | in the space, I highly recommend this document.
00:06:17.260 | I think it's got lots of good stuff in it.
00:06:19.700 | And if your paper goes on to be a publication, I would love to hear about it.
00:06:24.380 | Drop me a note and I will, with your permission, add that to the list of published papers stemming
00:06:29.300 | from work for this course.
00:06:30.300 | I'm very proud of how long and diverse that list is.