TDDE09 Natural Language Processing
The main purpose of the project is to give you an opportunity to identify, assess, and make use of NLP research literature (learning outcome 4). You will also have the opportunity to deepen the knowledge that you have acquired in the other parts of the course.
You can either do the standard project or work on a project on a self-proposed topic. The general project structure and requirements are the same for both forms. If you want to propose your own topic but are unsure whether your idea is suitable for the course, you should discuss it with the examiner in good time before deliverable D2 (Project plan).
The standard project is carried out in groups of 4 students and centers around a concrete task: to implement a syntactic parser, train the implemented parser on the data released by the Universal Dependencies project, and evaluate the accuracy of the trained parser either on gold-standard data or in the context of some other task.
The minimal requirements for the standard project are as follows:
- You implement a baseline consisting of a complete tagger–parser pipeline, based on labs 3 and 4.
- You modify and/or apply this baseline system, implementing methods described in the NLP research literature.
- You evaluate your system on the Universal Dependencies treebanks or in the context of some other task.
- You analyse your results and draw conclusions about the effectiveness of the implemented methods.
Simple projects will make limited-scale modifications to the baseline system. Complex projects will be more varied and either implement substantial changes (such as a different parsing algorithm), or apply the parser in the context of some other task. In any case, the focus must be on the implementation of methods described in the NLP research literature.
The project runs during the whole course, but most of the work is concentrated in the two project weeks W9–W10. When you plan your time for the project, you should budget approximately 56 hours per group member, or a total of 224 hours for a group of 4 members. Here is a suggested breakdown of this time into concrete tasks:
- 8 hours for project preparations (2 hours per week)
- 32 hours for the most intensive part of the work during the project weeks
- 2 hours to participate in the final conference
- 14 hours for the post-project paper (D6)
While the choice of the topic of your project is completely up to you, the form of the project is rather rigid. In particular, throughout the project you will have to submit a number of deliverables (D1–D6); these are designed to keep you on track, and to give you feedback on your progress. The rest of this page contains detailed information about these deliverables.
D1: Group contract
Your first task in the project (scheduled for weeks W3–W4) is to form your project group. We encourage you to form groups that include students with different backgrounds, skills, and interests, as this can improve the quality of the project.
After formation, your group is required to make a group contract that will govern your collaboration. The contract should spell out the behaviours that you expect of all group members, as well as procedures for resolving impasses in the group.
Specific questions to think about include the following:
- What is our level of ambition for this project?
- How will we communicate with each other?
- How often and where will we meet?
- How will we make sure that our meetings are productive?
- What will we do if some member contributes significantly less than others?
- What will we do if some member breaks any rule set out in this contract?
Instructions: Make a group contract and have it signed by all members of the group. Include both the name and the LiU-ID of each group member. Submit the signed contract as a PDF document. See the Rules for hand-in assignments.
Due date: 2021-01-29
D2: Project plan
During the first few weeks of the course (W5–W8), your group should meet at least once a week to plan and prepare the project. Half-way into this phase, your group must hand in a project plan with the following structure:
- Background. What is this project about, why is it interesting, and how will we do it?
- Literature review. What methods from the NLP literature do we want to implement and evaluate?
- Task assignment. Who in our group will do what and when?
In addition, your plan must contain a list of references to the research articles that describe the methods that you want to implement and evaluate in your project. The list should be formatted according to academic standards.
Instructions: Write a project plan (approximately 2 pages) according to the specification above and submit it as a PDF document. See the Rules for hand-in assignments.
Due date: 2021-02-12
Feedback: We advise you to book an appointment with the examiner to discuss your project plan.
Standard project: Syntactic parsing
Syntactic parsing is the task of mapping a sentence to a formal representation of its syntactic structure. We will introduce this task in the first week, return to it on several occasions throughout the course, and cover it in detail in Unit 4. To provide you with additional background material, we have compiled a reading list:
Announcing SyntaxNet: The World’s Most Accurate Parser Goes Open Source (Google Research Blog, 2016-05-12). This blog post provides an easy-to-read introduction to syntactic parsing and its applications and introduces Google’s SyntaxNet framework, which can be used to train parsers on suitable data.
Universal Dependencies v1: A Multilingual Treebank Collection (research article, LREC 2016). This research article describes a collection of data sets that can be used to train syntactic parsers, including parsers based on Google’s SyntaxNet and the parser that you will implement in the lab series. See also the homepage of the Universal Dependencies project.
Grounded Compositional Semantics for Finding and Describing Images with Sentences (research article, TACL 2014). This research article presents an interesting use case for syntactic parsers. Note that we do not expect you to understand all technical details in this paper. The purpose is to give you a concrete, non-trivial example of what syntactic parsers can be used for.
The starting point for the standard project is the tagger–parser pipeline that you will implement in labs 3 and 4. There are many different things that you can do to modify and/or apply this baseline system. Here are some ideas, roughly sorted from simple to complex. For each idea we also list one relevant research article. Of course, you can also come up with your own ideas and do your own literature search. Most research articles in the field of natural language processing are available for free via the ACL Anthology.
- Try to improve the accuracy of the baseline system on a specific treebank by adding new features. Research article: Transition-Based Dependency Parsing with Rich Non-Local Features
- Add support for parsing to labelled trees, where each dependency arc is labelled with a grammatical function such as subject. Research article: Algorithms for Deterministic Incremental Dependency Parsing
- Implement the arc-hybrid system and a dynamic oracle for choosing the best possible transition in a given configuration. Research article: Training Deterministic Parsers with Non-Deterministic Oracles
- Add support for parsing to non-projective trees by implementing a transition system with a swapping operation. Research article: Non-Projective Dependency Parsing in Expected Linear Time
- Replace the greedy search in the baseline system with a beam search. Research article: A Tale of Two Parsers
- Replace the transition-based dependency parser with a dynamic programming parser based on the Eisner algorithm. Research article: Non-Projective Dependency Parsing Using Spanning Tree Algorithms
- Replace the feedforward architecture in the baseline system with an architecture based on recurrent neural networks. Research article: Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations
- Apply your parser to an extrinsic task such as information extraction, and evaluate its performance. Research article: Multi-Way Classification of Semantic Relations Between Pairs of Nominals
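To give a feel for what one of these ideas involves, here is a minimal sketch of the three transitions of the arc-hybrid system as it is commonly defined in the literature. It is not the lab code: the function names, the list-based configuration, and the choice to start with the artificial root (position 0) on the stack are assumptions made for illustration. The sketch covers only the transition system; it includes neither a classifier nor a dynamic oracle.

```python
SH, LA, RA = "shift", "left-arc", "right-arc"


def initial_config(n_words):
    """Start with the artificial root (0) on the stack and all words in the buffer."""
    stack = [0]
    buffer = list(range(1, n_words + 1))
    heads = [0] * (n_words + 1)  # heads[i] = head of word i; index 0 is a placeholder
    return stack, buffer, heads


def valid_moves(stack, buffer):
    """Return the transitions whose preconditions hold in this configuration."""
    moves = []
    if buffer:
        moves.append(SH)
        if stack and stack[-1] != 0:  # the root must never receive a head
            moves.append(LA)
    if len(stack) >= 2:
        moves.append(RA)
    return moves


def apply_move(stack, buffer, heads, move):
    """Apply one transition in place."""
    if move == SH:
        # Move the first buffer item onto the stack.
        stack.append(buffer.pop(0))
    elif move == LA:
        # The topmost stack item becomes a dependent of the first buffer item.
        dependent = stack.pop()
        heads[dependent] = buffer[0]
    elif move == RA:
        # The topmost stack item becomes a dependent of the item below it.
        dependent = stack.pop()
        heads[dependent] = stack[-1]
    else:
        raise ValueError(f"unknown move: {move}")


# "I like parsing": "like" is the root; "I" and "parsing" depend on "like".
stack, buffer, heads = initial_config(3)
for move in [SH, LA, SH, SH, RA, RA]:
    assert move in valid_moves(stack, buffer)
    apply_move(stack, buffer, heads, move)
print(heads[1:])  # head of each word, in sentence order: [2, 0, 2]
```

In a full parser, the hard-coded move sequence would be replaced by the predictions of a trained classifier, restricted to the moves returned by `valid_moves`.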
D3: Baseline
During W7–W8, the task for your group is to implement and evaluate the baseline for your project. If you are doing the standard project, this baseline is the tagger–parser pipeline that you will implement in labs 3 and 4. If you are doing a project on a self-proposed topic, the baseline is the implementation of whatever other system you will compare your own work to.
Standard project: Tagger–parser pipeline
The baseline for the standard project is a simple pipeline architecture with the following components:
- a part-of-speech tagger (which you will implement in lab 3)
- a dependency parser (which you will implement in lab 4)
- code to read and output dependency trees in the CoNLL-U format
You will also need to write code to train and evaluate your system on any given Universal Dependencies treebank. Your code should report tagging accuracy and unlabelled attachment score.
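As a rough illustration of the evaluation step, the following sketch reads sentences from CoNLL-U lines and computes tagging accuracy and unlabelled attachment score (UAS). It assumes the standard ten tab-separated CoNLL-U columns (word form in column 2, universal part-of-speech tag in column 4, head in column 7), skips comment lines, multiword tokens, and empty nodes, and assumes that the gold-standard and predicted files contain the same tokens. The function names `read_conllu`, `evaluate`, and `rows` are made up for this example.

```python
def read_conllu(lines):
    """Yield each sentence as a list of (form, upos, head) triples."""
    sentence = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:  # a blank line ends the current sentence
            if sentence:
                yield sentence
                sentence = []
        elif not line.startswith("#"):  # skip comment lines
            cols = line.split("\t")
            if cols[0].isdigit():  # skip multiword tokens (1-2) and empty nodes (1.1)
                sentence.append((cols[1], cols[3], int(cols[6])))
    if sentence:
        yield sentence


def evaluate(gold_sentences, pred_sentences):
    """Return (tagging accuracy, UAS), both computed over all tokens."""
    total = correct_tags = correct_heads = 0
    for gold, pred in zip(gold_sentences, pred_sentences):
        for (_, gold_tag, gold_head), (_, pred_tag, pred_head) in zip(gold, pred):
            total += 1
            correct_tags += gold_tag == pred_tag
            correct_heads += gold_head == pred_head
    return correct_tags / total, correct_heads / total


def rows(*triples):
    """Build minimal CoNLL-U lines for one sentence (demo helper only)."""
    lines = [
        "\t".join([str(i), form, "_", upos, "_", "_", str(head), "_", "_", "_"])
        for i, (form, upos, head) in enumerate(triples, start=1)
    ]
    return lines + [""]


gold_lines = ["# text = I like parsing"] + rows(
    ("I", "PRON", 2), ("like", "VERB", 0), ("parsing", "NOUN", 2)
)
# A made-up system output with one wrong tag and one wrong head.
pred_lines = rows(("I", "PRON", 2), ("like", "VERB", 0), ("parsing", "VERB", 1))

gold = list(read_conllu(gold_lines))
pred = list(read_conllu(pred_lines))
accuracy, uas = evaluate(gold, pred)
print(f"tagging accuracy: {accuracy:.2%}, UAS: {uas:.2%}")
```

With real data, you would pass open file objects for the gold-standard and predicted treebank files to `read_conllu` instead of the hand-built lists.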
Some of the Universal Dependencies treebanks contain so-called non-projective trees. To train on these treebanks, you will first have to projectivize them. For this you can use the following Python script (contains usage instructions): projectivize.py
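To see which trees are affected, the sketch below tests whether a dependency tree is projective, using the usual definition: every word that lies between a head and its dependent must be a descendant of that head. It only detects non-projective trees and is not a substitute for projectivize.py. The function name `is_projective` and the representation (a list `heads` where `heads[i]` is the head of word `i` and position 0 stands for the artificial root) are assumptions for illustration, and the input is assumed to be a well-formed tree without cycles.

```python
def is_projective(heads):
    """Return True if the tree given by heads is projective.

    heads[i] is the head of word i; heads[0] is a placeholder for the root.
    """
    for dependent in range(1, len(heads)):
        head = heads[dependent]
        lo, hi = sorted((head, dependent))
        for k in range(lo + 1, hi):
            # Walk up from k until we reach the head of the arc or the root.
            ancestor = k
            while ancestor != head and ancestor != 0:
                ancestor = heads[ancestor]
            if ancestor != head:
                return False  # k lies under the arc but is not a descendant of head
    return True


# "I like parsing": all arcs are nested, so the tree is projective.
print(is_projective([0, 2, 0, 2]))

# "A hearing is scheduled on the issue today": the arc from "hearing" to "on"
# spans "is" and "scheduled", which are not descendants of "hearing".
print(is_projective([0, 2, 4, 4, 0, 2, 7, 5, 4]))
```

Counting the sentences for which this check fails is a quick way to find out how much of a treebank needs to be projectivized before training.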
Instructions: Submit a plain text file containing the following: (a) the tagging accuracy and unlabelled attachment score for your baseline system when trained on the training sections and evaluated on the development sections of the English Web Treebank (EWT), (b) a link to a GitLab repository containing your code, and (c) instructions for how to replicate your results using your code.
Due date: 2021-02-26
D4: Project work
During the two project weeks (W9–W10), you will extend and/or apply your baseline system according to your project plan. At the end of this period, you must submit a one-paragraph abstract for your project. The abstract should summarize what you have actually done in the project (which may be different from what you planned to do), as well as your main results. The main purpose of the abstract is to announce your presentation ahead of the final conference that will take place in W11.
Instructions: Submit a plain text file containing the following: (a) a one-paragraph abstract of your project (no longer than 200 words), and (b) a link to a GitLab repository containing your code.
Due date: 2021-03-12
D5: Project presentation
In the week following the project weeks (W11), your group will present your project in connection with the course’s final conference. After the conference, you will give feedback on other groups’ projects.
Present your project
The project presentation consists of two parts:
- a 10-minute video about your project that you record ahead of the final conference
- an interactive session during the conference, where you answer questions about your project
You are free to choose the presentation’s content and structure. Bear in mind that the presentation needs to be understandable to everybody in the course (not only the examiner).
In preparing the presentation, you may want to consider the following questions:
- What have you done in this project? What method did you implement and evaluate?
- Why have you chosen this particular project?
- Which sources of scientific information did you use?
- What are your experimental results?
- What are your conclusions regarding the implemented method?
Tip: One of the easiest ways to record your presentation is to use Zoom.
Instructions: Record your group’s presentation as a 10-minute video (mp4 format) and share it with the examiner. Also, email the examiner a link to a Zoom room that your group will use for the interactive session. Make sure to make your group members co-hosts so that some of you can leave the room to visit other rooms.
Due date: 2021-03-15
Feedback and examination: You will receive feedback on your project and your presentation from other students during and after the final conference; this feedback will be useful to you when preparing your post-project paper. After the conference the examiner will assess your presentation according to the Project rubric. This assessment will contribute to your grade for the project module.
Give feedback on other presentations
Each of you will be assigned three other presentations to provide feedback on. Of course, you are welcome to watch more presentations as well – have a look at the project abstracts and see what interests you!
After the final conference, you will submit a feedback form for each of the three presentations that you have been assigned. The form will contain the following questions/prompts:
- What method from the NLP literature did the group implement?
- What was the most interesting result in the project?
- What was the group’s conclusion regarding the implemented method?
- State one thing about the presentation that you really liked.
- State one thing about the presentation that can be improved.
- What new knowledge do you take away from the presentation?
Instructions: Submit your feedback forms, one form for each of the presentations assigned to you.
Due date: 2021-03-18
The final conference consists of two joint sessions and four group sessions.
The purpose of the joint sessions is to briefly introduce the projects that will be presented – and to celebrate the end of the course! 🥳
During the group sessions, the idea is that you ‘walk around’ and join other groups’ rooms to find out more about their projects and ask questions. Have a look at the project abstracts (distributed separately) and try to find projects that interest you. Perhaps another group did something similar to your own project, and you want to compare results and experiences? Of course, at least one member of your group should stay in your own Zoom room to answer questions that other students might have about your project.
- 15:15–15:30 Introductory session (main room)
- 15:30–15:35 Transition time
- 15:35–15:50 Parallel group session 1 (group rooms)
- 15:50–16:05 Parallel group session 2 (group rooms)
- 16:05–16:15 Break
- 16:15–16:30 Parallel group session 3 (group rooms)
- 16:30–16:45 Parallel group session 4 (group rooms)
- 16:45–16:50 Transition time
- 16:50–17:00 Closing session (main room)
D6: Post-project paper
The final project-related assignment is an individual reflection paper. The purpose of this assignment is to give you an opportunity to take stock of what you have learned from the project. We ask you to structure your paper into three parts as follows:
- Describe your work with the project. Focus on things that let you illustrate what you have learned.
- Examine your work and link it to the relevant course content.
- Articulate your learning. What did you learn? How, exactly, did you learn it? Why does this learning matter?
You will encounter the same type of questions in the labs, which should give you a good starting point. For more tips on how to write a good reflection paper, see the Guidelines for the post-project paper.
Instructions: Write a paper according to the given specification. The length of your paper should be around 1,500 words (approximately 3 pages). Submit your report as a PDF document named.
Due date: 2021-03-26
Examination: The examiner will assess your paper according to the criteria spelled out in the Guidelines for the post-project paper. This assessment will contribute to your grade for the project module.
Page responsible: Marco Kuhlmann
Last updated: 2021-01-17