729A27 Natural Language Processing
The main purpose of the project is to give you an opportunity to seek, assess, and use scientific information within the area of NLP (learning objective 4). You will also have the opportunity to deepen the knowledge that you have acquired in the course.
The project should be carried out in groups of approximately 4 students and will center around a concrete task: You will implement a syntactic parser, train the implemented parser on the data released by the Universal Dependencies project, and evaluate the accuracy of the trained parser either on gold-standard data or in the context of a concrete downstream task.
The minimal project looks as follows:
- Implement a complete tagger–parser pipeline based on labs 3 and 4
- Modify and/or apply this baseline system, implementing a method described in the NLP literature
- Evaluate your system on the Universal Dependencies treebanks or in the context of some downstream task
- Draw conclusions about the usefulness of the chosen method
Simple projects will make one or a few minor modifications to the baseline system. More advanced projects will be more varied and either implement substantial changes (such as a different parsing algorithm), or apply the parser to a downstream task (such as information extraction). In any case, the focus should be on the implementation of methods described in the NLP literature.
While the choice of the specific focus of your project is completely up to your group, the form of the project is rather rigid. In particular, throughout the project you will have to submit a number of deliverables (D1–D6); these are designed to keep you on track, and to give you feedback on your progress. The rest of this page contains detailed information about these deliverables.
The project runs W3–W10, but most of the work is concentrated during the project week in W9. When you plan your time for the project, you should calculate approximately 53 hrs per group member, or a total of 212 hrs for a group with 4 members. Here is a suggested breakdown of this time into concrete tasks:
- 12 hrs for the project work in W3–W8 (roughly 2 hours per week)
- 8 hrs for the pre-project paper (D2)
- 16 hrs for the most intensive part of the project work in W9
- 4 hrs to participate in the project presentations in W10
- 8 hrs for the post-project paper (D6)
D1: Group contract
Your first task in the project (scheduled for W3–W4) is to form your project group. We encourage you to form groups that include students with different backgrounds, skills, and interests, as this can improve the quality of the project.
After formation, your group is required to make a group contract that will govern your collaboration. The contract should spell out those behaviours that you expect of all group members, as well as procedures for resolving impasses in the group. Specific questions to think about include the following:
- How will we communicate with each other? At what times?
- How often and where will we meet?
- How will we make sure that our meetings are productive?
- What will we do if somebody does not show up at a meeting?
- What will we do if somebody breaks any rule set out in this contract?
Instructions: Make a group contract and have it signed by all members of the group. Submit the signed contract as a PDF document, following the rules for hand-in assignments.
Due date: 2018-01-26
Format of the subject line: 729A27-2018 D1 marku61
Upon receiving your group contract, we will assign your group a group ID that you should use in future submissions (see below).
D2: Pre-project paper
To carry out the project you need a good idea of what a syntactic parser is, and what it can be used for. To achieve this goal, we ask you to write a pre-project paper based on the following reading:
Announcing SyntaxNet: The World’s Most Accurate Parser Goes Open Source (Google Research Blog, 2016-05-12). This blog post provides an easy-to-read introduction to syntactic parsers and their applications and introduces Google’s SyntaxNet framework, which can be used to train parsers on suitable data.
Universal Dependencies v1: A Multilingual Treebank Collection (research article, LREC 2016). This research article describes a collection of data sets that can be used to train syntactic parsers, including parsers based on Google’s SyntaxNet and the parser that you will implement in this project. See also the homepage of the Universal Dependencies project.
Grounded Compositional Semantics for Finding and Describing Images with Sentences (research article, TACL 2014). This research article presents an interesting use case for syntactic parsers. Note that we do not expect you to understand all technical details in this paper. The purpose is to give you a concrete, non-trivial example of what syntactic parsers can be used for.
In your paper, you should address the following questions:
- What is a syntactic parser, and what can it be used for?
- Why is parsing so hard for computers to get right?
- What role does the Universal Dependencies project play in parser development?
- What role do syntactic parsers play in the paper by Socher et al.?
We encourage you to discuss these questions in your group in order to get feedback and align your views of your project. However, please note that your paper should be your own.
Instructions: Write a critical judgement addressing the above questions. The length of your paper should be around 1,000 words (approximately 2 pages). Submit your paper as a PDF document named as follows: 729A27-2018-D2-your LiU-ID.pdf
Due date: 2018-02-02
Format of the subject line: 729A27-2018 D2 your LiU-ID marku61
Example: 729A27-2018 D2 marjo123 marku61
Feedback and examination: You will get feedback on your paper from the examiner, who will assess it according to the criteria spelled out in the Project Rubric. This assessment will contribute to your grade for the project component of the examination.
D3: Baseline system
During W6–W8 the task for your group is to implement and evaluate the baseline system. This system should realise a simple pipeline architecture with the following components:
- a part-of-speech tagger (most of which you will implement in lab 3)
- a transition-based dependency parser (most of which you will implement in lab 4)
- code to read and output dependency trees in the CoNLL-U format
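The third component can be sketched as follows. This is a minimal illustration, not the code from the labs: the function names `read_conllu` and `write_conllu` and the representation of a sentence as a list of (form, tag, head) triples are assumptions made for this example. The sketch relies only on the fact that CoNLL-U stores one token per line in ten tab-separated columns, with comment lines starting with `#` and a blank line after each sentence.

```python
import io


def read_conllu(lines):
    """Yield each sentence as a list of (form, upos, head) triples."""
    sentence = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
        elif line.startswith("#"):
            # Comment lines carry metadata such as the raw text.
            continue
        else:
            cols = line.split("\t")
            # Skip multiword token ranges ("1-2") and empty nodes ("1.1").
            if not cols[0].isdigit():
                continue
            # HEAD is "_" when the file carries no syntactic annotation.
            head = int(cols[6]) if cols[6].isdigit() else None
            sentence.append((cols[1], cols[3], head))
    if sentence:
        yield sentence


def write_conllu(sentences, out):
    """Write sentences in CoNLL-U, filling unused columns with "_"."""
    for sentence in sentences:
        for i, (form, upos, head) in enumerate(sentence, start=1):
            head_str = "_" if head is None else str(head)
            cols = [str(i), form, "_", upos, "_", "_", head_str, "_", "_", "_"]
            out.write("\t".join(cols) + "\n")
        out.write("\n")


# A hand-written two-token example (hypothetical data, not from a treebank).
SAMPLE = (
    "# text = He sleeps\n"
    "1\tHe\the\tPRON\t_\t_\t2\tnsubj\t_\t_\n"
    "2\tsleeps\tsleep\tVERB\t_\t_\t0\troot\t_\t_\n"
    "\n"
)

sentences = list(read_conllu(io.StringIO(SAMPLE)))
print(sentences)
```

Note that this sketch drops lemmas, features, and dependency labels when writing. A system that predicts labelled trees would need to keep the DEPREL column as well.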
You should also write code to train and evaluate your system on any given Universal Dependencies treebank. Your code should report tagging accuracy and unlabelled attachment score.
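Both metrics are simple token-level ratios: tagging accuracy is the share of tokens with the correct part-of-speech tag, and unlabelled attachment score (UAS) is the share of tokens with the correct head. The following sketch shows one way to compute them. The function name `evaluate` and the representation of a sentence as a list of (form, tag, head) triples are assumptions made for this example, and the data is made up.

```python
def evaluate(gold_sentences, pred_sentences):
    """Return (tagging accuracy, unlabelled attachment score) over all tokens."""
    total = correct_tags = correct_heads = 0
    for gold, pred in zip(gold_sentences, pred_sentences):
        if len(gold) != len(pred):
            raise ValueError("gold and predicted sentences differ in length")
        for (_, gold_tag, gold_head), (_, pred_tag, pred_head) in zip(gold, pred):
            total += 1
            correct_tags += gold_tag == pred_tag
            correct_heads += gold_head == pred_head
    if total == 0:
        raise ValueError("no tokens to evaluate")
    return correct_tags / total, correct_heads / total


# Hypothetical gold-standard and predicted analyses of one sentence.
gold = [[("She", "PRON", 2), ("reads", "VERB", 0),
         ("old", "ADJ", 4), ("books", "NOUN", 2)]]
pred = [[("She", "PRON", 2), ("reads", "VERB", 0),
         ("old", "NOUN", 2), ("books", "NOUN", 3)]]

accuracy, uas = evaluate(gold, pred)
print(f"tagging accuracy: {accuracy:.2%}, UAS: {uas:.2%}")
```

In this example three of the four tags and two of the four heads are correct. When you evaluate the full pipeline, keep in mind that the parser runs on predicted tags, so tagging errors can propagate into the attachment score.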
Some of the Universal Dependencies treebanks contain so-called non-projective trees. To train on these treebanks, you will first have to projectivize them. For this you can use the following Python script (contains usage instructions): projectivize.py
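To see how many trees in a treebank are affected, you can test each tree for projectivity. The sketch below is an illustration only and says nothing about how projectivize.py works internally. It assumes that a tree is given as a list of heads, where `heads[i - 1]` is the head of word `i` and 0 stands for the artificial root. It uses the standard characterisation that such a tree is projective exactly when no two arcs cross, counting the arcs from the root. The function name `is_projective` is made up for this example.

```python
def is_projective(heads):
    """Check whether a dependency tree has no crossing arcs.

    heads[i - 1] is the head of word i (words are numbered from 1);
    the value 0 denotes the artificial root placed before the first word.
    """
    # Represent each arc by its left and right end point.
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for a, b in arcs:
        for c, d in arcs:
            # Two arcs cross if exactly one end point of the second arc
            # lies strictly inside the span of the first.
            if a < c < b < d:
                return False
    return True


print(is_projective([2, 0, 4, 2]))  # every arc nests inside or beside the others
print(is_projective([2, 0, 1]))     # the arc 1 -> 3 crosses the root arc 0 -> 2
```

The check is quadratic in the sentence length, which is usually acceptable for counting non-projective trees in a treebank.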
Instructions: Submit an email containing the following: (a) the tagging accuracy and unlabelled attachment score for your baseline system when trained on the training sections and evaluated on the development sections of the English and the Swedish treebank, (b) a link to a GitLab repository containing your code, and (c) instructions for how to replicate your results using your code.
Due date: 2018-02-23
Format of the subject line: 729A27-2018 D3 your group ID marku61
Example: 729A27-2018 D3 G1 marku61
Tips for this phase of the project
- Work in parallel. The different components can largely be developed independently.
- Present your labs to each other and let the students that appear most confident about a certain component implement it.
- Take notes of any ideas that you come up with for how the baseline system could be improved.
- Prepare a couple of slides that present the baseline system. You can later modify them to present your final system.
D4: Modified system
During the project week (W9), your task is to modify and/or apply your baseline system, implementing a method described in the NLP literature. There are many different things that you could try. Here are some ideas, roughly sorted from simple to complex. For each idea we also list a research article that may make a suitable starting point for your project.
Most research articles in the field of natural language processing are available for free via the ACL Anthology.
- Try to improve the accuracy of the baseline system on a specific treebank by adding new features.
Research article: Transition-Based Dependency Parsing with Rich Non-Local Features
- Add support for parsing to labelled trees, where each dependency arc is labelled with a grammatical function such as subject.
Research article: Algorithms for Deterministic Incremental Dependency Parsing
- Implement the arc-hybrid system and a dynamic oracle for choosing the best possible transition in a given configuration.
Research article: Training Deterministic Parsers with Non-Deterministic Oracles
- Add support for parsing to non-projective trees by implementing a transition system with a swapping operation.
Research article: Non-Projective Dependency Parsing in Expected Linear Time
- Replace the greedy search in the baseline system with a beam search.
Research article: A Tale of Two Parsers
- Replace the averaged perceptron in the baseline system with a neural network.
Research article: A Fast and Accurate Dependency Parser Using Neural Networks
- Replace the transition-based dependency parser with a dynamic programming parser based on the Eisner algorithm.
Research article: Non-Projective Dependency Parsing Using Spanning Tree Algorithms
- Apply your parser to an extrinsic task such as information extraction, and evaluate its performance.
Research article: Multi-Way Classification of Semantic Relations Between Pairs of Nominals
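To give a flavour of what one of these ideas involves, here is a minimal sketch of the three transitions of the arc-hybrid system. It covers only the transition system, not the dynamic oracle or the classifier. Several details are assumptions made for this example: the function names, the representation of a configuration as a (stack, buffer, heads) triple, and the placement of the artificial root as word 0 at the front of the buffer (presentations in the literature differ on this point).

```python
SHIFT, LEFT_ARC, RIGHT_ARC = "shift", "left-arc", "right-arc"


def initial_config(n):
    """Start with an empty stack and the root (0) plus words 1..n in the buffer."""
    return [], list(range(n + 1)), [None] * (n + 1)


def valid_moves(config):
    """Return the transitions that are permitted in the given configuration."""
    stack, buffer, _ = config
    moves = []
    if buffer:
        moves.append(SHIFT)
    # Left-arc attaches the stack top to the first buffer word;
    # the root must never receive a head.
    if stack and buffer and stack[-1] != 0:
        moves.append(LEFT_ARC)
    # Right-arc attaches the stack top to the word below it on the stack.
    if len(stack) >= 2:
        moves.append(RIGHT_ARC)
    return moves


def apply_move(config, move):
    """Apply a transition, modifying the configuration in place."""
    if move not in valid_moves(config):
        raise ValueError(f"{move} is not valid in this configuration")
    stack, buffer, heads = config
    if move == SHIFT:
        stack.append(buffer.pop(0))
    elif move == LEFT_ARC:
        dependent = stack.pop()
        heads[dependent] = buffer[0]
    else:  # RIGHT_ARC
        dependent = stack.pop()
        heads[dependent] = stack[-1]


# Parse "He sleeps" (word 1 = He, word 2 = sleeps) with a hand-picked sequence.
config = initial_config(2)
for move in [SHIFT, SHIFT, LEFT_ARC, SHIFT, RIGHT_ARC]:
    apply_move(config, move)

stack, buffer, heads = config
print(heads)
```

After the last transition the buffer is empty and only the root remains on the stack, so no further move is valid. In a full parser, the hand-picked sequence would be replaced by the predictions of a classifier trained with an oracle.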
At the end of the project week, you should write a short abstract for your project. The abstract should summarise what you have done in the project, as well as your main results. The purpose of the abstract is to announce your presentation ahead of the ‘mini-conference’ that will take place in W10.
Instructions: Submit an email containing the following: (a) a short abstract of your project (no longer than 200 words), and (b) a link to a GitLab repository containing your code.
Due date: 2018-03-02
Format of the subject line: 729A27-2018 D4 your group ID marku61
Example: 729A27-2018 D4 G1 marku61
Feedback and examination: You can get feedback on your project plan from the examiner (book an appointment). This feedback will give you an idea to what degree your project meets the project-related assessment criteria in the Project Rubric.
D5: Project presentation
In the week following the project week (W10), your group will present your project at the course’s ‘mini-conference’. You are allotted a 15-minute time slot for this presentation. You are free to choose the presentation’s content and structure, but you should bear in mind that the presentation needs to be understandable to everybody in the course.
In preparing the presentation, you may want to consider the following questions:
- What have you done in this project? What method did you evaluate?
- Why have you chosen this particular project?
- Which sources of scientific information did you use?
- What are your experimental results?
- What are your conclusions regarding the usefulness of the implemented method?
Instructions: Present your project, following the instructions above. The exact schedule for the mini-conference will be announced at the beginning of W10.
Feedback and examination: The examiner will assess your presentation according to the criteria spelled out in the Project Rubric. This assessment will contribute to your grade for the project component of the examination. At the same time, the feedback will be useful to you when preparing your post-project paper.
What if your group cannot present at the mini-conference? The group presentation can be replaced by a written report in which your group presents your project. Please contact the examiner for details.
D6: Post-project paper
The final project-related assignment is an individual reflection paper. The purpose of this assignment is to give you an opportunity to think about what you have learned from the project. The paper should have three components:
- your description of your project work, with a focus on those aspects that you consider most important
- your analysis of your experience based on concepts from the course
- your conclusions about what you take away from this part of the course
For more detailed information, see the guide on Reflection papers.
Instructions: Write a paper according to the above specification. Make sure to take into account both the feedback that you got on your pre-project paper and your group’s presentation. The length of your paper should be around 1,000 words (approximately 2 pages). Submit your paper as a PDF document named as follows: 729A27-2018-D6-your LiU-ID.pdf
Due date: 2018-03-17
Format of the subject line: 729A27-2018 D6 your LiU-ID marku61
Example: 729A27-2018 D6 marjo123 marku61
Examination: The examiner will assess your paper according to the criteria spelled out in the Project Rubric. This assessment will contribute to your grade for the project component of the examination.
Page responsible: Marco Kuhlmann
Last updated: 2018-01-12