Stanford XCS224U: Natural Language Understanding | Lit Review Overview | Spring 2023


Chapters

0:00 Intro
0:20 Rationale
1:02 Requirements [website link]
3:38 Lit search tips
6:21 Plagiarism policy
8:47 The next assignment: The protocol

Transcript

Welcome, everyone. This short screencast is an overview of the Lit Review. We've reached the project phase of this course, and I think the Lit Review is a really crucial element there. It's where you begin to build an intellectual foundation for your project. Fundamentally, for me, the rationale behind the Lit Review is about productive dialogue.

We want you to enter into dialogue with your teammates, with yourself, with your colleagues, and with your mentor from the teaching team about what you're going to try to accomplish in the final project. So you should pose questions.

You should identify obstacles and propose initial workarounds. You should find data sets and models and architectures and everything else, and you should think carefully about your resources. The idea here is to gather intel from the literature about what people are doing in your space and use that information to begin to carve out an original vision for your own final projects.

The specific requirements are listed at the course website. You can follow the link at the top of the slide here. Let me just offer you a brief rundown. First, it's a roughly six-page document, and we say eight pages is the max so you don't go overboard, and that does not include the obligatory references section.

We have a template that you can use. It's a LaTeX template based in the ACL format. You're not required to use it, but you might as well, and you'll begin to get used to dealing with that ACL format, which is required for the final paper. Groups of one should review five papers.

Groups of two should review seven, and groups of three should review nine. I think you can hear in there a small incentive for doing group work, which we think is productive in general. The ideal is to have the same topic for your lit review and final project. Obviously, that's the most efficient, but we do grant that sometimes people finish their lit review, and that leads them to the realization that they don't want to work in that area anymore.

That is perfectly fine, but then you should negotiate that with your mentor and with the members of your team to make sure you can have your actual project converge in the time available, which is always too short. Then we come to the major things to include, and you might as well use these phrases as section headings in the document to help your mentor understand what you're trying to do.

First, the general problem or task definition. This is absolutely crucial. We want to know what kind of questions you're going to be asking, and you might begin to guide us toward things that will be crucial for your project. Then we want concise summaries of the articles. I will say we're not really looking for you to just summarize the content.

The ideal summary is going to raise questions and identify issues that will be useful to you in thinking about the project work itself. Relatedly, you should compare and contrast these articles. How do they differ in terms of models and data and fundamental results? Because that too could help point the way toward space for an original contribution.

Then really importantly, let's think about future work. What are you going to do next? How might this all come together into a project? It's never too early to begin that creative process, and the more you can do under this heading, the more productive your dialogue with your mentor will be, and then I think you'll be able to go farther.

Then finally, so that you get in the habit of this and so that we can identify what literature items you're reviewing, we want an obligatory references section. In terms of actually conducting lit review searches, I do have some tips that I have found really productive over the years. Here's the kind of cycle that I still use regularly.

Search with some keywords in the ACL Anthology, Google Scholar, or Semantic Scholar. Because it's NLP, I recommend starting with the ACL Anthology. A wonderful aspect of the NLP community is that it is very well organized when it comes to its literature. Essentially, all of the work that's published by the ACL is accumulated into this anthology, which has good Google search, good bib entries, abstracts, links to the papers, you name it.

So that's a good first stop, especially if you're working on a core topic in NLP. But don't limit yourself to that. You should branch out also and check Google Scholar and Semantic Scholar. In this course, we really value interdisciplinary work, and so you want to connect with other literatures besides NLP in many cases.

Next step, download relevant and/or highly cited results and check out their abstracts and related work sections. These are heuristics. Relevant will be delivered by the search engine according to your keywords, and then highly cited is just a good heuristic for finding things that have been influential. You shouldn't depend on it, but there's no doubt that it's a useful piece of information.

When you do this, you're seeking out key questions and techniques and also other highly cited papers. You should not read entire papers at this point. That is tremendously inefficient. There are too many papers out there, so you need to use your time wisely. So the idea here is to get a feel for these papers and also get a sense for what else they are citing.

Download the papers that you see prominently in the related work section and kind of add those to the set that you downloaded as part of your core search. And then return to step one with some new keywords that you gathered as part of the searching that you did. And you should keep going on that loop and break out of it when you have a sense for what you're doing and what others have done in the area.

You'll start to iterate around in a few papers that you think are clearly important. Maybe some new directions will be suggested by the other papers that are in your set. Now you've got the basis for thinking about a selection for the lit review. At that stage, you select some core papers from that downloaded set.

And finally, you read those deeply and you cover those in the lit review. Notice you do that only at the final stage so that you can learn as much as you can by kind of surveying widely in a lightweight way. And then you go deep once you have a sense for where to invest.
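The search cycle described above can be sketched as a small loop. This is only an illustration under stated assumptions: the `Paper` class, the `TOY_CORPUS` entries, and the `search`, `survey`, and `select_core` functions are all hypothetical, the corpus stands in for a real search engine, and ranking by citation count is a simplification of the judgment call you would actually make when picking core papers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    title: str
    keywords: frozenset       # terms you might pick up from skimming the abstract
    citations: int            # citation count, a rough heuristic for influence
    related_work: tuple = ()  # titles featured prominently in its related work


# Hypothetical toy corpus standing in for a real search engine's index.
TOY_CORPUS = [
    Paper("Toy paper A", frozenset({"nli"}), 500, ("Toy paper B",)),
    Paper("Toy paper B", frozenset({"nli", "adversarial"}), 300),
    Paper("Toy paper C", frozenset({"adversarial"}), 50, ("Toy paper D",)),
    Paper("Toy paper D", frozenset({"robustness"}), 900),
    Paper("Toy paper E", frozenset({"parsing"}), 1000),
]


def search(corpus, keyword):
    """Stand-in for a keyword search: return papers tagged with the keyword."""
    return [p for p in corpus if keyword in p.keywords]


def survey(corpus, seed_keywords, max_rounds=10):
    """Lightweight survey loop: search, follow related work, gather new keywords."""
    by_title = {p.title: p for p in corpus}
    downloaded = {}
    tried = set()
    keywords = list(seed_keywords)
    for _ in range(max_rounds):
        fresh = [k for k in keywords if k not in tried]
        if not fresh:
            break  # no new keywords left to try
        size_before = len(downloaded)
        for kw in fresh:
            tried.add(kw)
            for paper in search(corpus, kw):
                downloaded[paper.title] = paper
                # Add the papers that appear prominently in related work.
                for title in paper.related_work:
                    if title in by_title:
                        downloaded[title] = by_title[title]
        if len(downloaded) == size_before:
            break  # the set has stopped growing, so leave the loop
        # Return to step one with the keywords gathered so far.
        keywords = sorted({k for p in downloaded.values() for k in p.keywords})
    return downloaded


def select_core(downloaded, n):
    """Pick n papers to read deeply (here simply the most cited; an assumption)."""
    ranked = sorted(downloaded.values(), key=lambda p: p.citations, reverse=True)
    return [p.title for p in ranked[:n]]
```

Calling `survey(TOY_CORPUS, ["nli"])` and then `select_core(found, 2)` mirrors the order in the cycle: the wide, shallow pass comes first, and the deep reading list is chosen only at the end.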

This is sort of amusing, a plagiarism policy. It's especially meta-feeling for us because, after all, we study large language models. And there's a growing concern in academia that these language models will make it harder for us to assess student writing. Let's try to embrace this a little bit. So I did do a search based on how the ELECTRA paper relates to the Transformer paper.

This is with GPT-4, and I will confess to you that what came back looks awfully useful to me. This does look like kind of raw information that you could use to inform a lit review. So that's all to the good. Make sure you know the course policy, though.

It's linked here. And what it essentially says is there's no rule against using an AI assistant like GPT-4 to help you with your lit review. But all output from the model needs to be quoted. That's per the policy. You treat it like any other resource. And, of course, assignments that are just quotations from any resource are not going to do well in terms of evaluation.

Assignments with substantial overlap in prose will be scrutinized for plagiarism. And what that means is that if two groups used a similar prompt and got back similar results and included them in the lit review unquoted, they'd probably get nabbed for plagiarism, not because they used a model, but because the two assignments look too much alike.

And at that point, it's not the language model that we're implicating here, but rather the standard sort of thing that we see when we worry about plagiarism. So I would suggest using these assistants not to produce raw prose for you, but rather to help you figure out what's in the literature.

And you'll want to be skeptical consumers because while I think this is a pretty good description of ELECTRA and Transformers, I haven't thoroughly audited it, and I wouldn't include it anywhere, even paraphrased by me, until I had given it a thorough audit to make sure that it was all factually correct.

Because that's the ultimate thing that we're looking for, never mind where all this prose came from. But the idea here is that there's obviously value to these things. They could supercharge certain aspects of research, so we don't want to ban them. After all, we think they're really interesting artifacts.

That's why we're in this course. But we want to use them with caution, and we want to make sure that they don't end up kind of producing really bad scholarship. That is the fundamental thing that we're watching out for. That's it for the lit review. It's worth thinking ahead to the next document, which is a bit more unusual in the context of this course.

That's the experiment protocol. This is a short, structured report designed to help you establish your core experimental framework. And the required sections are listed here: hypotheses, data, metrics, models, general reasoning, summary of progress, and references. You can see that that's kind of the raw materials for a project in this space.

And we're trying to look to see whether anything is missing, and whether there are any other obstacles that would prevent your project from converging in the time available. The idea is clarity around project goals, identification of obstacles, and project risks. So you should be erring on the side of disclosing too much in the interest of making sure we overcome all the obstacles and fill in all the gaps so that the project succeeds in the end.
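As a rough self-check before submitting, the required protocol sections named above could be compared against the headings in a draft. This is a minimal sketch, assuming the draft is LaTeX source with one `\section{...}` per required heading; the `missing_sections` helper and the exact heading spellings are assumptions for illustration, not part of the course materials.

```python
import re

# Section names as listed in the screencast; exact heading spellings are assumed.
REQUIRED_SECTIONS = [
    "Hypotheses",
    "Data",
    "Metrics",
    "Models",
    "General reasoning",
    "Summary of progress",
    "References",
]

# Matches \section{Title} and \section*{Title} in LaTeX source.
SECTION_RE = re.compile(r"\\section\*?\{([^}]*)\}")


def missing_sections(latex_source: str) -> list[str]:
    """Return the required sections that have no matching \\section heading."""
    found = {title.strip().lower() for title in SECTION_RE.findall(latex_source)}
    return [s for s in REQUIRED_SECTIONS if s.lower() not in found]
```

An empty result only means the headings exist. Whether each section actually discloses the obstacles and risks is still something to work through with your mentor.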

That's for the protocol, but you might as well be thinking along these lines for the lit review. It is never too early to begin brainstorming about exactly what the final project is going to look like. Even at the lit review stage, you can start to get a feel for what hypotheses are interesting, what techniques you want to try, and so forth and so on.

So the earlier the better. That's what all of these preliminary project assignments are about. Thank you.