TDDE09 Natural Language Processing
The main purpose of the project module is to give you an opportunity to seek, assess, and use scientific information within the area of NLP (learning outcome 4). You will also have the opportunity to deepen the knowledge that you have acquired in the other modules.
The project should be carried out in groups of approximately 6 students and will centre on a concrete task: You will implement a syntactic parser, train the implemented parser on the data released by the Universal Dependencies project, and evaluate the accuracy of the trained parser either on gold-standard data or in the context of a concrete downstream task.
The minimal project looks as follows:
- Implement a complete tagger–parser pipeline based on labs 3 and 4
- Modify and/or apply this baseline system, implementing a method described in the NLP literature
- Evaluate your system on the Universal Dependencies treebanks or in the context of some downstream task
- Analyse your results and draw conclusions about the usefulness of the chosen method
Simple projects will make minor modifications to the baseline system. More advanced projects will be more varied and either implement substantial changes (such as a different parsing algorithm), or apply the parser to a downstream task (such as information extraction). In any case, the focus should be on the implementation of methods described in the NLP literature.
The project runs W4–W11, but most of the work is concentrated in the project week (W10). When you plan your time for the project, you should budget approximately 53 hours per group member, or a total of 318 hours for a group with 6 members. Here is a suggested breakdown of this time into concrete tasks:
- 12 hours for the project work in W4–W9 (roughly 2 hours per week)
- 8 hours for the optional pre-project paper (D2)
- 16 hours for the most intensive part of the project work in W10
- 4 hours to participate in the project presentations in W11
- 8 hours for the post-project paper (D6)
While the choice of the specific focus of your project is completely up to your group, the form of the project is rather rigid. In particular, throughout the project you will have to submit a number of deliverables (D1–D6); these are designed to keep you on track, and to give you feedback on your progress. The rest of this page contains detailed information about these deliverables.
D1: Group contract
Your first task in the project (scheduled for W4–W5) is to form your project group. We encourage you to form groups that include students with different backgrounds, skills, and interests, as this can improve the quality of the project.
Once your group has been formed, you are required to draw up a group contract that will govern your collaboration. The contract should spell out the behaviours that you expect of all group members, as well as procedures for resolving impasses in the group. Specific questions to think about include the following:
- How will we communicate with each other? At what times?
- How often and where will we meet?
- How will we make sure that our meetings are productive?
- What will we do if somebody does not show up at a meeting?
- What will we do if somebody breaks any rule set out in this contract?
Instructions: Make a group contract and have it signed by all members of the group. Include both the name and the LiU-ID of each group member. Scan the signed contract and submit it as a PDF document. See also the rules for hand-in assignments.
Due date: 2019-02-01
Format of the subject line: TDDE09-2019 D1 marku61
Upon receiving your group contract, we will assign your group a group ID that you should use in future submissions (see below).
D2: Pre-project paper (optional)
To help you get into the topic of the project quickly, we invite you to write an optional pre-project paper on syntactic parsing and what it can be used for. This paper will also provide you with material for your (mandatory) post-project paper (D6).
The material for the pre-project paper consists of the following reading:
Announcing SyntaxNet: The World’s Most Accurate Parser Goes Open Source (Google Research Blog, 2016-05-12). This blog post provides an easy-to-read introduction to syntactic parsers and their applications and introduces Google’s SyntaxNet framework, which can be used to train parsers on suitable data.
Universal Dependencies v1: A Multilingual Treebank Collection (research article, LREC 2016). This research article describes a collection of data sets that can be used to train syntactic parsers, including parsers based on Google’s SyntaxNet and the parser that you will implement in this project. See also the homepage of the Universal Dependencies project.
Grounded Compositional Semantics for Finding and Describing Images with Sentences (research article, TACL 2014). This research article presents an interesting use case for syntactic parsers. Note that we do not expect you to understand all technical details in this paper. The purpose is to give you a concrete, non-trivial example of what syntactic parsers can be used for.
In your paper, you should address the following questions:
- What is syntactic parsing, what can it be used for, and how can it be done?
- Why is parsing so hard for computers to get right, and how can machine learning help with that?
- What do you expect to learn from the project? How, exactly, will you learn it? Why will this learning matter?
We encourage you to discuss these questions in your group in order to get feedback and align your views of your project. However, note that everything that you present for assessment must be your own work.
You can think of the pre-project paper as a forward-looking version of the post-project paper (D6), and follow the same structure and the same suggestions as for that paper.
Instructions: Write a paper addressing the above questions. The length of your paper should be around 1,000 words (approximately 2 pages). Submit your paper as a PDF document named as follows: TDDE09-2019-D2-your LiU-ID.pdf
Due date: 2019-02-08
Format of the subject line: TDDE09-2019 D2 your LiU-ID marku61
Example: TDDE09-2019 D2 marjo123 marku61
Feedback and examination: You will get feedback on your paper from the examiner. This feedback will be useful to you when you prepare your post-project paper (D6).
D3: Baseline system
During W7–W9 the task for your group is to implement and evaluate the baseline system. This system should realise a simple pipeline architecture with the following components:
- a part-of-speech tagger (most of which you will implement in lab 3)
- a transition-based dependency parser (most of which you will implement in lab 4)
- code to read and output dependency trees in the CoNLL-U format
You should also write code to train and evaluate your system on any given Universal Dependencies treebank. Your code should report tagging accuracy and unlabelled attachment score.
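As a rough illustration of the evaluation code described above, the following Python sketch reads sentences in the CoNLL-U format and computes tagging accuracy and unlabelled attachment score. It is a minimal sketch and not the lab code: the names `read_conllu`, `evaluate`, and `make_conllu` and the toy sentences are assumptions made for this example. It assumes that the tagger predicts the universal part-of-speech tags (column 4) and it scores all tokens, including punctuation.

```python
def read_conllu(lines):
    """Yield sentences as lists of (form, upos, head) triples."""
    sentence = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
        elif line.startswith("#"):
            continue  # comment line (sentence metadata)
        else:
            cols = line.split("\t")
            if "-" in cols[0] or "." in cols[0]:
                continue  # skip multiword token ranges and empty nodes
            sentence.append((cols[1], cols[3], int(cols[6])))
    if sentence:
        yield sentence


def evaluate(gold, pred):
    """Return (tagging accuracy, unlabelled attachment score)."""
    total = correct_tags = correct_heads = 0
    for gold_sent, pred_sent in zip(gold, pred):
        for (_, g_tag, g_head), (_, p_tag, p_head) in zip(gold_sent, pred_sent):
            total += 1
            correct_tags += g_tag == p_tag
            correct_heads += g_head == p_head
    return correct_tags / total, correct_heads / total


def make_conllu(rows):
    """Toy helper: build CoNLL-U lines from (id, form, upos, head) rows."""
    lines = ["# toy sentence"]
    for tid, form, upos, head in rows:
        cols = [str(tid), form, "_", upos, "_", "_", str(head), "_", "_", "_"]
        lines.append("\t".join(cols))
    return lines + [""]


# Hypothetical gold-standard and predicted analyses of one sentence.
GOLD = make_conllu([(1, "I", "PRON", 2), (2, "like", "VERB", 0), (3, "parsing", "NOUN", 2)])
PRED = make_conllu([(1, "I", "PRON", 2), (2, "like", "VERB", 0), (3, "parsing", "NOUN", 1)])

gold = list(read_conllu(GOLD))
pred = list(read_conllu(PRED))
accuracy, uas = evaluate(gold, pred)
print(f"tagging accuracy: {accuracy:.2%}, UAS: {uas:.2%}")
```

In a real system, the predicted sentences would come from your tagger–parser pipeline rather than from a second file, and `read_conllu` would be called on an open treebank file.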
Some of the Universal Dependencies treebanks contain so-called non-projective trees. To train on these treebanks, you will first have to projectivize them. For this you can use the following Python script (contains usage instructions): projectivize.py
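To find out whether a treebank is affected, you can test each tree for projectivity. The sketch below uses one common characterisation: a tree is projective if no two of its arcs cross when the artificial root is placed at position 0. The function name `is_projective` and the list-of-heads representation are assumptions made for this example. The sketch only detects non-projective trees; it does not projectivize them and is not a replacement for projectivize.py.

```python
def is_projective(heads):
    """Check a dependency tree for crossing arcs.

    heads[i] is the head of token i + 1; 0 denotes the artificial root.
    """
    # Represent each arc by its left and right end point.
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for a, b in arcs:
        for c, d in arcs:
            # Two arcs cross if exactly one end point of the second arc
            # lies strictly inside the span of the first arc.
            if a < c < b < d:
                return False
    return True


print(is_projective([2, 0, 2]))  # token 2 is the root; no arcs cross
print(is_projective([3, 0, 2]))  # the arc 3 -> 1 crosses the root arc 0 -> 2
```

The nested loop is quadratic in the sentence length, which should be acceptable for a one-off scan of a treebank.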
Instructions: Submit an email containing the following: (a) the tagging accuracy and unlabelled attachment score for your baseline system (both the perceptron version and the neural version) when trained on the training sections and evaluated on the development sections of the English Web Treebank (EWT), (b) a link to a GitLab repository containing your code, and (c) instructions for how to replicate your results using your code.
Due date: 2019-03-01
Format of the subject line: TDDE09-2019 D3 your group ID marku61
Example: TDDE09-2019 D3 G1 marku61
Tips for this phase of the project
- Work in parallel. The different components of the baseline system can largely be developed independently.
- Take notes of any ideas that you come up with for how the baseline system could be improved.
- Prepare a couple of slides that present the baseline system. These slides will come in handy for the project presentation.
D4: Modified system
During the project week (W10), your task is to modify and/or apply your baseline system, implementing a method described in the NLP literature. There are many different things that you could try. Here are some ideas, roughly sorted from simple to complex. For each idea we also list a research article that may make a suitable starting point for your project.
Most research articles in the field of natural language processing are available for free via the ACL Anthology.
Try to improve the accuracy of the baseline system on a specific treebank by adding new features.
Research article: Transition-Based Dependency Parsing with Rich Non-Local Features
Add support for parsing to labelled trees, where each dependency arc is labelled with a grammatical function such as subject.
Research article: Algorithms for Deterministic Incremental Dependency Parsing
Implement the arc-hybrid system and a dynamic oracle for choosing the best possible transition in a given configuration.
Research article: Training Deterministic Parsers with Non-Deterministic Oracles
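For orientation, here is a minimal sketch of the three transitions of the arc-hybrid system, without any oracle or classifier. It assumes a formulation in which the artificial root (0) starts on the stack; some formulations place the root at the end of the buffer instead, so check the article for the variant that you want to implement. All names (`initial_config`, `valid_transitions`, `apply_transition`) are hypothetical.

```python
SHIFT, LEFT_ARC, RIGHT_ARC = "SH", "LA", "RA"


def initial_config(n):
    """Stack holds the artificial root 0; buffer holds tokens 1..n."""
    return [0], list(range(1, n + 1)), {}


def valid_transitions(stack, buffer):
    """List the transitions that are permissible in a configuration."""
    valid = []
    if buffer:
        valid.append(SHIFT)
        if stack and stack[-1] != 0:
            valid.append(LEFT_ARC)  # the root must not get a head
    if len(stack) >= 2:
        valid.append(RIGHT_ARC)
    return valid


def apply_transition(transition, stack, buffer, heads):
    """Apply a transition in place; heads maps dependents to heads."""
    if transition == SHIFT:
        # Move the first buffer item to the top of the stack.
        stack.append(buffer.pop(0))
    elif transition == LEFT_ARC:
        # The first buffer item becomes the head of the topmost stack item.
        heads[stack.pop()] = buffer[0]
    elif transition == RIGHT_ARC:
        # The second-topmost stack item becomes the head of the topmost one.
        dependent = stack.pop()
        heads[dependent] = stack[-1]


# Toy derivation for a three-word sentence such as "I like parsing".
stack, buffer, heads = initial_config(3)
for transition in [SHIFT, LEFT_ARC, SHIFT, SHIFT, RIGHT_ARC, RIGHT_ARC]:
    assert transition in valid_transitions(stack, buffer)
    apply_transition(transition, stack, buffer, heads)
print(heads)
```

In contrast to the arc-standard system, the left-arc transition takes its head from the buffer rather than from the stack. A dynamic oracle would replace the fixed transition sequence above by a function that, for any configuration, returns the transitions that do not rule out any additional gold-standard arcs.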
Add support for parsing to non-projective trees by implementing a transition system with a swapping operation.
Research article: Non-Projective Dependency Parsing in Expected Linear Time
Replace the greedy search in the baseline system with a beam search.
Research article: A Tale of Two Parsers
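The difference between greedy search and beam search can be sketched independently of any particular parser. In the generic sketch below, a state stands in for a parser configuration and the score increments stand in for classifier scores; the toy scoring table and all names (`beam_search`, `toy_successors`, `is_final`) are assumptions made for this example and are not taken from the article.

```python
def beam_search(initial, successors, is_final, beam_size):
    """Return the best-scoring (score, state) pair found by beam search.

    successors(state) yields (score_increment, next_state) pairs.
    """
    beam = [(0.0, initial)]
    while not all(is_final(state) for _, state in beam):
        candidates = []
        for score, state in beam:
            if is_final(state):
                candidates.append((score, state))  # keep finished items
                continue
            for increment, next_state in successors(state):
                candidates.append((score + increment, next_state))
        # Keep only the beam_size highest-scoring candidates.
        candidates.sort(key=lambda item: item[0], reverse=True)
        beam = candidates[:beam_size]
    return beam[0]


# Toy scores: action "a" looks best at first but leads to a dead end.
TOY_SCORES = {
    (): {"a": 1.0, "b": 0.9},
    ("a",): {"a": 0.1, "b": 0.1},
    ("b",): {"a": 1.0, "b": 0.0},
}


def toy_successors(state):
    for action, increment in TOY_SCORES[state].items():
        yield increment, state + (action,)


def is_final(state):
    return len(state) == 2


print(beam_search((), toy_successors, is_final, beam_size=1))  # greedy search
print(beam_search((), toy_successors, is_final, beam_size=2))
```

With a beam size of 1 the search is greedy and commits to the locally best first action; with a beam size of 2 it keeps the alternative alive long enough to find the sequence with the higher total score.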
Replace the transition-based dependency parser with a dynamic programming parser based on the Eisner algorithm.
Research article: Non-Projective Dependency Parsing Using Spanning Tree Algorithms
Replace the feedforward architecture in the baseline system with an architecture based on recurrent neural networks.
Research article: Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations
Apply your parser to an extrinsic task such as information extraction, and evaluate its performance.
Research article: Multi-Way Classification of Semantic Relations Between Pairs of Nominals
At the end of the project week, you should write a short abstract for your project. The abstract should summarise what you have done in the project, as well as your main results. The purpose of the abstract is to announce your presentation ahead of the ‘mini-conference’ that will take place in W11.
Instructions: Submit an email containing the following: (a) a short plain-text abstract of your project (no longer than 200 words), and (b) a link to a GitLab repository containing your code.
Due date: 2019-03-08
Format of the subject line: TDDE09-2019 D4 your group ID marku61
Example: TDDE09-2019 D4 G1 marku61
Feedback and examination: You can get feedback on your project plan from the examiner (book an appointment). This feedback will give you an idea to what degree your project meets the project-related assessment criteria in the Project Rubric.
D5: Project presentation
In the week following the project week (W11), your group will present your project at the course’s ‘mini-conference’. You are allotted a 15-minute time slot for this presentation. You are free to choose the presentation’s content and structure. Bear in mind that the presentation needs to be understandable to everybody in the course (not only the examiner).
In preparing the presentation, you may want to consider the following questions:
- What have you done in this project? What method did you evaluate?
- Why have you chosen this particular project?
- Which sources of scientific information did you use?
- What are your experimental results?
- What are your conclusions regarding the implemented method?
Instructions: Present your project, following the instructions above. The exact schedule for the mini-conference can be found below.
Wednesday 13 March, 14:55–17:00
- 14:55–15:00 Introduction
- 15:00–15:20 Group 1
- 15:20–15:40 Group 2
- 15:40–16:00 Group 3
- 16:00–16:20 Group 4
- 16:20–16:40 Group 5
- 16:40–17:00 Group 6
Feedback and examination: You will receive oral feedback on your project and your presentation during the mini-conference; this feedback will be useful to you when preparing your post-project paper. After the conference the examiner will assess your presentation according to the criteria spelled out in the Project Rubric. This assessment will contribute to your grade for the project module.
D6: Post-project paper
The final project-related assignment is an individual reflection paper. The purpose of this assignment is to give you an opportunity to take stock of what you have learned from the project. We ask you to structure your paper into three parts as follows:
- Describe your work with the project. Focus on things that let you illustrate what you have learned.
- Examine your work and link it to the relevant course content.
- Articulate your learning. What did you learn? How, exactly, did you learn it? Why does this learning matter?
For more tips on how to write a good reflection paper, see the guide on Reflection papers.
Instructions: Write a paper according to the given specification. The length of your paper should be around 1,000 words (approximately 2 pages). Submit your paper as a PDF document named as follows: TDDE09-2019-D6-your LiU-ID.pdf
Due date: 2019-03-23
Format of the subject line: TDDE09-2019 D6 your LiU-ID marku61
Example: TDDE09-2019 D6 marjo123 marku61
Examination: The examiner will assess your paper according to the criteria spelled out in the Project Rubric. This assessment will contribute to your grade for the project module.
Page responsible: Marco Kuhlmann
Last updated: 2019-01-14