
Stanford XCS224U: NLU | Presenting Your Research, Part 2: Writing NLP Papers | Spring 2023


Chapters

0:00
1:36 Additional notes
5:52 Stuart Shieber: the 'rational reconstruction'
8:32 David Goss's hints on mathematical style
9:04 Cormac McCarthy
11:01 A look at two really well-written papers


00:00:00.000 | Welcome back everyone.
00:00:06.400 | This is part 2 in our series on presenting your research.
00:00:09.760 | We're going to be talking about writing
00:00:11.520 | NLP papers and this will hold both for
00:00:13.580 | this course and for papers that you might write in general.
00:00:18.200 | Here's the outline of a typical NLP paper.
00:00:22.300 | This is not required in terms of its structure,
00:00:24.820 | but it is freeing to know that if you follow roughly the structure,
00:00:28.680 | you'll be in line with the norms of the field and your readers and
00:00:32.380 | especially your conference reviewers might
00:00:34.740 | have an easier time navigating the ideas.
00:00:37.540 | We're almost always talking about a four or eight page,
00:00:40.900 | two column paper where
00:00:42.780 | references are not counted toward this page total.
00:00:45.780 | We're thinking about papers that typically have these components.
00:00:49.200 | For an eight pager, you'd have
00:00:50.960 | your opening page with the title and abstract and then the intro.
00:00:54.840 | The intro might dribble onto page 2 and then you're probably going to
00:00:58.060 | have your related work section.
00:01:00.120 | After that, you're going to be starting in on the heart of the paper.
00:01:03.820 | You might introduce your task and the associated data,
00:01:07.100 | then your models, then your methods and results,
00:01:10.880 | which could be a no-nonsense reporting of the results,
00:01:14.220 | close with some analysis and then a very short conclusion.
00:01:17.340 | These aren't very long papers,
00:01:18.660 | so a short conclusion is fine.
00:01:21.180 | For a four pager, which is another common format,
00:01:24.220 | you just compress all of this and you might
00:01:26.680 | hope that you can devote much less space to
00:01:28.740 | related work for the sake of just explaining your own ideas.
00:01:32.300 | But fundamentally, it's the same structure.
00:01:35.100 | In a bit more detail, here are some notes on this.
00:01:37.500 | The intro: in the conventions of NLP papers,
00:01:40.780 | it is expected to tell
00:01:42.500 | the full story of the paper at a high level.
00:01:45.140 | Indeed, the abstract tells an even higher level complete story.
00:01:49.580 | The intro is one layer down,
00:01:51.440 | and by the time the reader is done with the intro,
00:01:53.600 | they essentially know exactly what the paper is going to do,
00:01:57.420 | and it's all over but the details.
00:01:59.820 | The intro is a microcosm.
00:02:02.100 | Related work, this is meant to contextualize your work and
00:02:05.980 | provide insights into major relevant themes
00:02:08.460 | of the literature as a whole.
00:02:10.140 | Use each paper or theme as
00:02:12.380 | a chance to articulate what is special about your paper.
00:02:15.700 | I find it helpful to think about this in a templatic format
00:02:19.100 | where each subsection or paragraph of
00:02:21.820 | my related work section is going to raise
00:02:23.700 | some topical issue or question.
00:02:26.600 | Once I've done that as the first sentence of the paragraph,
00:02:29.760 | then I just more or less list out,
00:02:31.720 | maybe with some color,
00:02:32.980 | the individual papers that fall under that topical heading.
00:02:36.500 | Then I close the paragraph with an explanation of how
00:02:39.620 | those ideas relate to the ideas for the current paper.
00:02:43.280 | How the ideas from my paper
00:02:44.880 | complement what's in the literature,
00:02:46.860 | or conflict with it,
00:02:48.140 | or extend it, or whatever else it is,
00:02:50.580 | I try to articulate that so that the whole effect is
00:02:53.660 | to raise some crucial topics for your area,
00:02:56.460 | help the reader understand what those topics are,
00:02:59.140 | and then always explain how the current ideas relate
00:03:02.000 | to that whole messy context.
00:03:05.100 | The related work is a chance to of course cite
00:03:07.700 | everyone so that no one feels upset about being excluded,
00:03:10.500 | but also to contextualize your ideas in this very useful way.
00:03:15.420 | Data. This will likely be
00:03:17.820 | a very detailed section if the datasets are
00:03:20.180 | new or your task is new,
00:03:21.880 | or the data is unfamiliar,
00:03:23.680 | or you're casting things in an unfamiliar way.
00:03:26.260 | This could be a much shorter section if it's
00:03:28.940 | a familiar dataset and familiar task structure.
00:03:32.660 | Then your model, by which I
00:03:35.060 | really mean the essence of your ideas.
00:03:37.080 | You might not have a model per se,
00:03:39.220 | but presumably you have
00:03:40.660 | some core ideas that you are pursuing and evaluating.
00:03:44.580 | This is your chance to flesh out
00:03:46.680 | exactly what those ideas are like.
00:03:48.980 | You can use your related work section to
00:03:51.940 | contextualize and amplify, and with luck,
00:03:55.100 | the related work has already helped the reader understand why
00:03:58.100 | you're doing the modeling or analysis work that you're doing.
00:04:01.620 | Then the methods, this would be
00:04:03.460 | like your experimental approach,
00:04:05.060 | including metrics, baseline models,
00:04:07.980 | and other things like that.
00:04:09.300 | You're probably going to be short on space,
00:04:11.140 | and so you can start using appendices for things
00:04:13.500 | like hyperparameter optimization choices,
00:04:16.780 | other small details about compute and so forth,
00:04:20.020 | so that you can keep the main paper
00:04:22.200 | devoted to the real essence of the narrative.
00:04:25.260 | Then results, and I think it's best if this is
00:04:27.660 | a no-nonsense description of what is in
00:04:31.120 | the results tables or in the results figures and how
00:04:34.100 | that basically relates to the previous sections.
00:04:38.480 | Then you can open things up
00:04:39.940 | a little bit when you do analysis.
00:04:41.520 | This can be discussion of what the results mean,
00:04:44.080 | what they don't mean,
00:04:45.460 | how they can be improved,
00:04:46.780 | what the limitations are, and so forth.
00:04:49.540 | The nature of this section will depend a lot on
00:04:52.780 | the nature of your modeling effort
00:04:54.220 | and the nature of your results.
00:04:56.500 | For papers that have
00:04:58.660 | multiple experiments with multiple datasets,
00:05:01.060 | maybe even multiple models,
00:05:02.580 | it can help to repeat that methods, results,
00:05:05.540 | analysis thing, or even data, methods,
00:05:08.180 | results, analysis, in separate subsections so that we
00:05:11.700 | get self-contained units of experimental reporting.
00:05:15.660 | Then you might have a final analysis or
00:05:18.440 | discussion section that weaves them all together
00:05:21.060 | and reconnects with the ideas
00:05:22.780 | from your intro and related work.
00:05:25.180 | Then finally, a conclusion.
00:05:27.580 | You again want to quickly
00:05:29.100 | summarize what was in the paper in a way that's
00:05:31.260 | not unlike what you did in the abstract, I'm guessing.
00:05:34.020 | There is one nice opportunity here though,
00:05:35.980 | which is to chart out
00:05:37.180 | possible future directions and
00:05:39.380 | questions that you left open and so forth,
00:05:41.540 | things that people might pursue as next steps.
00:05:44.300 | You at least get to have a forward-looking,
00:05:47.380 | more expansive final few sentences of the paper.
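To make that outline concrete, here is a minimal LaTeX sketch of the skeleton just described. It is only an illustration: the commented-out acl style file, the exact section names, and the references.bib file are assumptions for this example, not requirements. In practice you would start from your venue's official template, which fixes the two-column layout and reference formatting for you.

```latex
% A minimal sketch of the typical NLP paper skeleton described above.
% Illustrative only: the acl package and references.bib are assumptions.
\documentclass[11pt,twocolumn]{article}
% \usepackage[final]{acl}   % uncomment if your venue's style file is available

\title{Paper Title}
\author{Author Name}

\begin{document}
\maketitle

\begin{abstract}
The highest-level complete version of the paper's story.
\end{abstract}

\section{Introduction}   % one layer down from the abstract; a microcosm of the paper
\section{Related Work}   % topic, then papers, then how this paper relates
\section{Data}           % task and datasets; longer if either is new or unfamiliar
\section{Model}          % the essence of your ideas
\section{Methods}        % metrics, baselines; push small details to appendices
\section{Results}        % no-nonsense reporting of the tables and figures
\section{Analysis}       % what the results do and do not mean; limitations
\section{Conclusion}     % brief summary plus forward-looking next steps

\bibliographystyle{plain}   % venues typically supply their own, e.g., acl_natbib
\bibliography{references}   % assumes a references.bib file (not counted toward the page limit)

\appendix
\section{Hyperparameters and Compute Details}

\end{document}
```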
00:05:51.700 | I thought I would just offer a scattering
00:05:54.860 | of interesting advice about
00:05:56.860 | scientific writing, things that you could mull over.
00:05:59.400 | Let's start with this nice deconstruction
00:06:02.140 | from the NLP-er Stuart Shieber.
00:06:04.460 | He's fundamentally arguing for
00:06:06.260 | the rational reconstruction approach.
00:06:08.660 | He contrasts that first with
00:06:10.540 | the continental style in which one states
00:06:13.540 | the solution with as little
00:06:14.860 | introduction or motivation as possible,
00:06:17.100 | sometimes not even saying what the problem was.
00:06:20.020 | I think here he means to be criticizing
00:06:22.020 | continental philosophers like Derrida,
00:06:24.380 | but that's just my guess.
00:06:26.280 | Readers of papers like this will have
00:06:28.820 | no clue as to whether you are right or not,
00:06:31.400 | without incredible efforts
00:06:32.940 | in close reading of the paper,
00:06:34.700 | but at least they'll think you're a genius.
00:06:37.380 | I think he means that somewhat ironically,
00:06:39.620 | we should strive not to write papers in this mode.
00:06:43.580 | At the other extreme is what he calls the historical style.
00:06:47.820 | This is a whole history in the paper of false starts,
00:06:51.100 | wrong attempts, near misses,
00:06:52.780 | redefinitions of the problem.
00:06:54.900 | This is better than the continental style because
00:06:57.820 | a careful reader can probably follow
00:06:59.780 | the line of reasoning that the author went
00:07:01.460 | through and then use this as motivation,
00:07:03.900 | but the reader will probably
00:07:05.020 | think you're a bit addle-headed.
00:07:06.700 | In general, these papers are
00:07:08.900 | also very difficult to read because it's hard to
00:07:11.420 | discern what was important and what wasn't.
00:07:15.060 | Ultimately, what Shieber offers as
00:07:18.020 | a better mode is the rational reconstruction approach.
00:07:21.420 | You don't present the actual history that you went through,
00:07:24.100 | but rather an idealized history that perfectly
00:07:26.540 | motivates each step in the solution.
00:07:29.300 | The goal in pursuing the rational reconstruction style is
00:07:32.060 | not to convince the reader that you're
00:07:33.460 | brilliant or addle-headed for that matter,
00:07:35.760 | but that your solution is trivial.
00:07:38.360 | It takes a certain strength of
00:07:39.900 | character to take that as one's goal.
00:07:42.180 | The better written your paper is,
00:07:44.140 | the more readers will come away thinking,
00:07:47.580 | that was very clear and obvious,
00:07:49.500 | even I could have had those ideas.
00:07:51.420 | It feels paradoxical, but it is the best mode to operate in.
00:07:55.900 | Sometimes people feel a tension here
00:07:58.820 | between the rational reconstruction approach and
00:08:01.460 | my call elsewhere in these lectures to
00:08:03.580 | really disclose as much as you can,
00:08:05.580 | to be open and honest about what you did.
00:08:08.520 | If you start to feel that tension,
00:08:10.320 | I would encourage you to use the appendices to really
00:08:13.240 | enumerate every false start possibly as a list,
00:08:16.280 | so that someone who is really trying to figure out what
00:08:18.440 | happened has all the information they need.
00:08:21.120 | That will allow you to in the paper,
00:08:23.720 | tell a story that will reach the maximum number of
00:08:26.280 | people and feel informative,
00:08:28.360 | feel like progress, and so forth.
00:08:31.440 | I also like this hint on
00:08:34.120 | mathematical style from David Goss.
00:08:36.100 | This is a document that's full of advice,
00:08:38.180 | a lot of it about how to format math equations in LaTeX.
00:08:41.580 | But fundamentally, his advice is,
00:08:43.900 | have mercy on the reader.
00:08:45.400 | One part of that is to just have your reader in mind,
00:08:48.840 | to write your own paper as though you were
00:08:51.200 | someone consuming the ideas for the first time,
00:08:53.940 | and think what information would you need,
00:08:56.460 | what can be left out,
00:08:57.660 | and in general, what would help most in
00:09:00.260 | terms of conveying the ideas to that hypothetical reader.
00:09:04.100 | Cormac McCarthy, the novelist,
00:09:06.820 | also has an outstanding piece linked at the bottom here,
00:09:10.300 | full of advice for scientific writers.
00:09:12.500 | I think Cormac McCarthy actually hangs out with lots of
00:09:14.740 | scientists at the Santa Fe Institute,
00:09:16.860 | and he's probably learned a lot about how they work.
00:09:19.740 | Here's one piece of advice that I'd like to highlight.
00:09:23.040 | Decide on your paper's theme and
00:09:25.220 | two or three points you want every reader to remember.
00:09:28.160 | This would be stuff that you would put in the intro and
00:09:30.500 | maybe structure the intro around these ideas.
00:09:33.660 | This theme and these points form
00:09:36.180 | the single thread that runs through your piece.
00:09:38.760 | The words, sentences, paragraphs,
00:09:40.880 | and sections are the needlework that holds it together.
00:09:43.940 | If something isn't needed to help
00:09:45.660 | the reader to understand the main theme, omit it.
00:09:48.480 | I find this wonderfully clarifying.
00:09:50.760 | Once I have figured out what my two or three main points are,
00:09:54.480 | and I have sketched them at least in the intro,
00:09:56.980 | then as I'm writing and as I'm
00:09:59.020 | deciding on experiments to run or to neglect,
00:10:01.580 | I'm always thinking about whether or
00:10:03.260 | not they serve those main points.
00:10:05.940 | That helps me a lot with decision-making,
00:10:08.500 | and it helps me a lot with actually just writing
00:10:10.660 | these papers in a way that I hope is relatively clear.
00:10:14.860 | This strategy will not only result in a better paper,
00:10:18.620 | but it will be an easier paper to write, as I said,
00:10:21.740 | since the themes you choose will determine what to include and
00:10:24.340 | exclude and resolve a lot of
00:10:26.260 | low-level questions about the narrative.
00:10:28.860 | Then the final piece of advice I wanted to offer here in
00:10:31.900 | general is advice from Patrick Blackburn.
00:10:34.540 | This is actually about giving
00:10:35.740 | talks and I'll mention it later,
00:10:37.220 | but I think it applies to any scientific communication.
00:10:41.080 | The fundamental insight: where do
00:10:43.340 | good talks or papers come from?
00:10:45.320 | It is honesty.
00:10:46.920 | A good talk or a good paper should never stray
00:10:49.860 | far from simple, honest communication.
00:10:53.120 | I have that phrase in mind all the time as I do
00:10:55.860 | my research and I find it wonderfully
00:10:57.800 | clarifying and exciting to see.
00:11:01.300 | To round this out, I thought I would just mention
00:11:04.780 | two papers that I find to be exceptionally well-written.
00:11:08.740 | I can think of lots of papers like this.
00:11:10.620 | I in fact have a longer list at
00:11:12.740 | that papers.md document in the course code repository,
00:11:16.420 | but I thought it would be fun to highlight
00:11:18.060 | two and I've given the links here.
00:11:20.020 | The first is the ELMo paper,
00:11:22.380 | Deep Contextualized Word Representations.
00:11:24.660 | I like pretty much every aspect of this paper.
00:11:27.440 | The intro does a great job of contextualizing
00:11:30.500 | the results and beginning to motivate
00:11:33.660 | the core idea behind contextual representation and pre-training.
00:11:37.780 | The actual model is presented with real clarity.
00:11:41.280 | You get all the notation that
00:11:42.700 | you need about how it's structured.
00:11:44.420 | It's a bit dense, but I think that's a consequence
00:11:46.980 | of a very complicated model,
00:11:48.520 | so I think I'm fine with that.
00:11:50.300 | Then you get a really exhaustive exploration
00:11:53.540 | experimentally and then with
00:11:55.460 | follow-up questions and hypotheses that really give
00:11:58.420 | you an amazingly full picture
00:12:01.220 | of ELMo and its strengths and weaknesses.
00:12:03.720 | It's a short paper, but man,
00:12:05.540 | it feels jam-packed with ideas and you
00:12:07.980 | can learn so much just by reading it.
00:12:10.780 | I also thought I would single out the GloVe paper.
00:12:13.980 | This is another really interesting example.
00:12:16.180 | Now, I think the whole paper is well-written,
00:12:18.720 | but the part that I would like to single
00:12:20.500 | out is the presentation of the model itself.
00:12:23.780 | Because rarely in NLP do you
00:12:25.980 | get a analytic starting point
00:12:28.580 | that the authors build gradually into a model.
00:12:31.840 | They talk about the practical challenges of
00:12:34.220 | implementing that model and why that leads them to
00:12:36.780 | certain implementation choices for it.
00:12:39.780 | Then ultimately, you get a description of
00:12:42.100 | the various hyperparameters that are
00:12:43.700 | involved in the final implementation.
00:12:46.360 | By the end of that, you feel like you've really
00:12:48.420 | learned something conceptual in addition to
00:12:51.740 | now understanding the details of the GloVe model itself.
00:12:55.360 | It doesn't hurt that the paper is well-written
00:12:57.420 | elsewhere and has incredible results, of course,
00:13:00.180 | but I really single out the model reporting as
00:13:03.540 | an exceptional piece of writing that we could all
00:13:05.660 | think about as we think about
00:13:07.540 | the motivations for our own modeling ideas.