TDDE09 Natural Language Processing


This page contains the instructions for the project assignments. For more information about how the project component of the course is examined, see the page on Examination.

Overview

The main purpose of the project is to give you an opportunity to seek, assess, and use scientific information within the area of NLP (learning objective 4). You will also have the opportunity to deepen the knowledge that you have acquired in the course.

General structure

The project should be carried out in groups of approximately 4 students and will centre on a concrete task: you will implement a syntactic parser, train the implemented parser on the data released by the Universal Dependencies project, and evaluate the accuracy of the trained parser either on gold-standard data or in the context of a concrete downstream task.

The minimal project looks as follows:

  • Implement a complete tagger–parser pipeline based on labs 3 and 4
  • Modify and/or apply this baseline system, implementing a method described in the NLP literature
  • Evaluate your system on the Universal Dependencies treebanks or in the context of some downstream task
  • Draw conclusions about the usefulness of the chosen method

Simple projects will make one or a few minor modifications to the baseline system. More advanced projects will be more varied and either implement substantial changes (such as a different parsing algorithm) or apply the parser to a downstream task (such as information extraction). In any case, the focus should be on the implementation of methods described in the NLP literature.

Deliverables

While the choice of the specific focus of your project is completely up to your group, the form of the project is rather rigid. In particular, throughout the project you will have to submit a number of deliverables (D1–D6); these are designed to keep you on track, and to give you feedback on your progress. The rest of this page contains detailed information about these deliverables.

Time requirements

The project runs W3–W10, but most of the work is concentrated during the project week in W9. When you plan your time for the project, you should calculate approximately 53 hrs per group member, or a total of 212 hrs for a group with 4 members. Here is a suggested breakdown of this time into concrete tasks:

  • 12 hrs for the project work in W3–W8 (roughly 2 hours per week)
  • 8 hrs for the pre-project paper (D2)
  • 16 hrs for the most intensive part of the project work in W9
  • 4 hrs to participate in the project presentations in W10
  • 8 hrs for the post-project paper (D6)

D1: Group contract

Your first task in the project (scheduled for W3–W4) is to form your project group. We encourage you to form groups that include students with different backgrounds, skills, and interests, as this can improve the quality of the project.

After formation, your group is required to make a group contract that will govern your collaboration. The contract should spell out those behaviours that you expect of all group members, as well as procedures for resolving impasses in the group. Specific questions to think about include the following:

  • How will we communicate with each other? At what times?
  • How often and where will we meet?
  • How will we make sure that our meetings are productive?
  • What will we do if somebody does not show up at a meeting?
  • What will we do if somebody breaks any rule set out in this contract?

Instructions: Make a group contract and have it signed by all members of the group. Submit the signed contract as a PDF document. See the rules for hand-in assignments.

Due date: 2018-01-26

Format of the subject line: NLP-2018 D1 marku61

Upon receiving your group contract, we will assign your group a group ID that you should use in future submissions (see below).

D2: Pre-project paper

To carry out the project you need a good idea of what a syntactic parser is, and what it can be used for. To achieve this goal, we ask you to write a pre-project paper based on the following reading:

In your paper, you should address the following questions:

  • What is a syntactic parser, and what can it be used for?
  • Why is parsing so hard for computers to get right?
  • What role does the Universal Dependencies project play in parser development?
  • What role do syntactic parsers play in the paper by Socher et al.?

We encourage you to discuss these questions in your group in order to get feedback and align your views of your project. However, please note that your paper should be your own.

Instructions: Write a critical judgement addressing the above questions. The length of your paper should be around 1,000 words (approximately 2 pages). Submit your paper as a PDF document. See the rules for hand-in assignments.

Due date: 2018-02-02

Format of the subject line: NLP-2018 D2 your LiU-ID marku61

Example: NLP-2018 D2 marjo123 marku61

Feedback and examination: You will get feedback on your paper from the examiner, who will assess it according to the criteria spelled out in the Project Rubric. This assessment will contribute to your grade for the project component of the examination.

D3: Baseline system

During W6–W8 the task for your group is to implement and evaluate the baseline system. This system should realise a simple pipeline architecture with the following components:

  • a part-of-speech tagger (most of which you will implement in lab 3)
  • a transition-based dependency parser (most of which you will implement in lab 4)
  • code to read and output dependency trees in the CoNLL-U format
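
The third component can be sketched as follows. This is a minimal illustration, not the lab code: it assumes the standard ten-column, tab-separated CoNLL-U layout, keeps only the columns needed for tagging and unlabelled parsing, and skips comment lines, multiword token ranges, and empty nodes. The function names `read_conllu` and `format_conllu` are hypothetical.

```python
def read_conllu(lines):
    """Yield each sentence as a list of token dicts (hypothetical helper)."""
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
        elif line.startswith("#"):
            # Sentence-level comments are ignored in this sketch.
            continue
        else:
            cols = line.split("\t")
            # Skip multiword token ranges (e.g. 1-2) and empty nodes (e.g. 1.1).
            if "-" in cols[0] or "." in cols[0]:
                continue
            sentence.append({
                "id": int(cols[0]),
                "form": cols[1],
                "upos": cols[3],
                # HEAD may be "_" in unannotated data.
                "head": int(cols[6]) if cols[6] != "_" else None,
                "deprel": cols[7],
            })
    if sentence:
        yield sentence


def format_conllu(sentence):
    """Serialise one sentence; columns not modelled here are written as "_"."""
    rows = []
    for tok in sentence:
        head = "_" if tok["head"] is None else str(tok["head"])
        rows.append("\t".join([
            str(tok["id"]), tok["form"], "_", tok["upos"], "_",
            "_", head, tok["deprel"], "_", "_",
        ]))
    return "\n".join(rows) + "\n\n"
```

To read a treebank file, one could pass an open file handle, for example `list(read_conllu(open(path, encoding="utf-8")))`, where `path` points to a `.conllu` file.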

You should also write code to train and evaluate your system on any given Universal Dependencies treebank. Your code should report tagging accuracy and unlabelled attachment score.
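
Both scores are token-level ratios: tagging accuracy is the share of tokens with the correct tag, and unlabelled attachment score is the share of tokens with the correct head. A minimal sketch, assuming that gold and predicted values are given as parallel lists of per-sentence lists and that every token (including punctuation) is counted; the function name `token_accuracy` is hypothetical.

```python
def token_accuracy(gold_sentences, predicted_sentences):
    """Return the fraction of tokens whose predicted value equals the gold value.

    Pass lists of tags to get tagging accuracy, or lists of head indices
    to get the unlabelled attachment score.
    """
    correct = 0
    total = 0
    for gold, predicted in zip(gold_sentences, predicted_sentences):
        if len(gold) != len(predicted):
            raise ValueError("gold and predicted sentences differ in length")
        correct += sum(g == p for g, p in zip(gold, predicted))
        total += len(gold)
    # Avoid division by zero on empty input.
    return correct / total if total else 0.0
```

Using one function for both scores keeps the evaluation code small; only the values that are compared (tags or heads) change.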

Some of the Universal Dependencies treebanks contain so-called non-projective trees. To train on these treebanks, you will first have to projectivize them. For this you can use the following Python script (contains usage instructions): projectivize.py
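
To find out whether a treebank needs this step, it can help to count its non-projective trees first. The sketch below is not part of projectivize.py; it applies the usual definition directly: a tree is projective if, for every arc, all tokens strictly between the head and the dependent are descendants of the head. It assumes heads are given as a list in which position i-1 holds the head of token i and 0 denotes the root, and that the input is a well-formed tree. The function name `is_projective` is hypothetical.

```python
def is_projective(heads):
    """Check projectivity of a dependency tree.

    heads[i - 1] is the head of token i (tokens are numbered from 1);
    0 denotes the artificial root. Assumes a well-formed tree without cycles.
    """
    def dominated_by(node, ancestor):
        # Walk up from node towards the root, looking for ancestor.
        while node != 0:
            if node == ancestor:
                return True
            node = heads[node - 1]
        # The artificial root dominates every token.
        return ancestor == 0

    for dependent in range(1, len(heads) + 1):
        head = heads[dependent - 1]
        left, right = sorted((head, dependent))
        # Every token strictly inside the arc must descend from the head.
        for between in range(left + 1, right):
            if not dominated_by(between, head):
                return False
    return True
```

For example, `is_projective([2, 0, 2])` holds for a three-token tree rooted at token 2, whereas `[3, 0, 2, 2]` is non-projective because the arc from token 3 to token 1 spans the root token 2.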

Instructions: Submit an email containing the following: (a) the tagging accuracy and unlabelled attachment score for your baseline system when trained on the training sections and evaluated on the development sections of the English and the Swedish treebank, (b) a link to a GitLab repository containing your code, and (c) instructions for how to replicate your results using your code.

Due date: 2018-02-23

Format of the subject line: NLP-2018 D3 your group ID marku61

Example: NLP-2018 D3 G1 marku61

Tips for this phase of the project

  • Work in parallel. The different components can largely be developed independently.
  • Present your labs to each other and let the students who appear most confident about a certain component implement it.
  • Take notes of any ideas that you come up with for how the baseline system could be improved.
  • Prepare a couple of slides that present the baseline system. You can later modify them to present your final system.

D4: Modified system

During the project week (W9), your task is to modify and/or apply your baseline system, implementing a method described in the NLP literature. There are many different things that you could try. Here are some ideas, roughly sorted from simple to complex. For each idea we also list a research article that may make a suitable starting point for your project.

Most research articles in the field of natural language processing are available for free via the ACL Anthology.

At the end of the project week, you should write a short abstract for your project. The abstract should summarise what you have done in the project, as well as your main results. The purpose of the abstract is to announce your presentation ahead of the ‘mini-conference’ that will take place in W10.

Instructions: Submit an email containing the following: (a) a short abstract of your project (no longer than 200 words), and (b) a link to a GitLab repository containing your code.

Due date: 2018-03-02

Format of the subject line: NLP-2018 D4 your group ID marku61

Example: NLP-2018 D4 G1 marku61

Feedback and examination: You can get feedback on your project plan from the examiner (book an appointment). This feedback will give you an idea of the degree to which your project meets the project-related assessment criteria in the Project Rubric.

D5: Project presentation

In the week following the project week (W10), your group will present your project at the course’s ‘mini-conference’. You are allotted a 15-minute time slot for this presentation. You are free to choose the presentation’s content and structure, but you should bear in mind that the presentation needs to be understandable to everybody in the course.

In preparing the presentation, you may want to consider the following questions:

  • What have you done in this project? What method did you evaluate?
  • Why have you chosen this particular project?
  • Which sources of scientific information did you use?
  • What are your experimental results?
  • What are your conclusions regarding the usefulness of the implemented method?

Instructions: Present your project, following the instructions above. The exact schedule for the mini-conference will be announced at the beginning of W10.

Feedback and examination: The examiner will assess your presentation according to the criteria spelled out in the Project Rubric. This assessment will contribute to your grade for the project component of the examination. At the same time, the feedback will be useful to you when preparing your post-project paper.

What if your group cannot present at the mini-conference? The group presentation can be replaced by a written report in which your group presents your project. Please contact the examiner for details.

D6: Post-project paper

The final project-related assignment is an individual reflection paper. The purpose of this assignment is to give you an opportunity to think about what you have learned from the project. The paper should have three components:

  • your description of your project work, with a focus on those aspects that you consider most important
  • your analysis of your experience based on concepts from the course
  • your conclusions about what you take away from this part of the course

For more detailed information, see the guide on Reflection papers.

Instructions: Write a paper according to the above specification. Make sure to take into account the feedback that you got on both your pre-project paper and your group’s presentation. The length of your paper should be around 1,000 words (approximately 2 pages). Submit your paper as a PDF document.

Due date: 2018-03-17

Format of the subject line: NLP-2018 D6 your LiU-ID marku61

Example: NLP-2018 D6 marjo123 marku61

Examination: The examiner will assess your paper according to the criteria spelled out in the Project Rubric. This assessment will contribute to your grade for the project component of the examination.


Page responsible: Marco Kuhlmann
Last updated: 2018-01-12