TDDE09 Natural Language Processing


This page contains the instructions for the project assignments. For more information about how the project module is examined, see the page on Examination.

Overview

The main purpose of the project module is to give you an opportunity to seek, assess, and use scientific information within the area of NLP (learning outcome 4). You will also have the opportunity to deepen the knowledge that you have acquired in the other modules.

General structure

The project should be carried out in groups of approximately 6 students and will centre on a concrete task: you will implement a syntactic parser, train it on the data released by the Universal Dependencies project, and evaluate the accuracy of the trained parser either on gold-standard data or in the context of a concrete downstream task.

The minimal project looks as follows:

  • Implement a complete tagger–parser pipeline based on labs 3 and 4
  • Modify and/or apply this baseline system, implementing a method described in the NLP literature
  • Evaluate your system on the Universal Dependencies treebanks or in the context of some downstream task
  • Analyse your results and draw conclusions about the usefulness of the chosen method

Simple projects will make minor modifications to the baseline system. More advanced projects will be more varied and either implement substantial changes (such as a different parsing algorithm), or apply the parser to a downstream task (such as information extraction). In any case, the focus should be on the implementation of methods described in the NLP literature.

Time requirements

The project runs W4–W11, but most of the work is concentrated during the project week in W10. When you plan your time for the project, you should calculate approximately 53 hours per group member, or a total of 318 hours for a group with 6 members. Here is a suggested breakdown of this time into concrete tasks:

  • 12 hours for the project work in W4–W9 (roughly 2 hours per week)
  • 8 hours for the optional pre-project paper (D2)
  • 16 hours for the most intensive part of the project work in W10
  • 4 hours to participate in the project presentations in W11
  • 8 hours for the post-project paper (D6)

Deliverables

While the choice of the specific focus of your project is completely up to your group, the form of the project is rather rigid. In particular, throughout the project you will have to submit a number of deliverables (D1–D6); these are designed to keep you on track, and to give you feedback on your progress. The rest of this page contains detailed information about these deliverables.

D1: Group contract

Your first task in the project (scheduled for W4–W5) is to form your project group. We encourage you to form groups that include students with different backgrounds, skills, and interests, as this can improve the quality of the project.

After formation, your group is required to make a group contract that will govern your collaboration. The contract should spell out those behaviours that you expect of all group members, as well as procedures for resolving impasses in the group. Specific questions to think about include the following:

  • How will we communicate with each other? At what times?
  • How often and where will we meet?
  • How will we make sure that our meetings are productive?
  • What will we do if somebody does not show up at a meeting?
  • What will we do if somebody breaks any rule set out in this contract?

Instructions: Make a group contract and have it signed by all members of the group. Include both the name and the LiU-ID of each group member. Scan the signed contract and submit it as a PDF document, following the Rules for hand-in assignments.

Due date: 2019-02-01

Format of the subject line: TDDE09-2019 D1 marku61

Upon receiving your group contract, we will assign your group a group ID that you should use in future submissions (see below).

D2: Pre-project paper (optional)

To help you get into the topic of the project quickly, we invite you to write an optional pre-project paper on syntactic parsing and what it can be used for. This paper will also provide you with material for your (mandatory) post-project paper (D6).

The material for the pre-project paper consists of the following reading:

In your paper, you should address the following questions:

  • What is syntactic parsing, what can it be used for, and how can it be done?
  • Why is parsing so hard for computers to get right, and how can machine learning help with that?
  • What do you expect to learn from the project? How, exactly, will you learn it? Why will this learning matter?

We encourage you to discuss these questions in your group in order to get feedback and align your views of your project. However, note that everything that you present for assessment must be your own work.

You can think of the pre-project paper as a forward-looking version of the post-project paper (D6), and follow the same structure and the same suggestions as for that paper.

Instructions: Write a paper addressing the above questions. The length of your paper should be around 1,000 words (approximately 2 pages). Submit your paper as a PDF document named as follows: TDDE09-2019-D2-your LiU-ID.pdf

Due date: 2019-02-08

Format of the subject line: TDDE09-2019 D2 your LiU-ID marku61

Example: TDDE09-2019 D2 marjo123 marku61

Feedback and examination: You will get feedback on your paper from the examiner. This feedback will be useful to you when you prepare your post-project paper (D6).

D3: Baseline system

During W7–W9 the task for your group is to implement and evaluate the baseline system. This system should realise a simple pipeline architecture with the following components:

  • a part-of-speech tagger (most of which you will implement in lab 3)
  • a transition-based dependency parser (most of which you will implement in lab 4)
  • code to read and output dependency trees in the CoNLL-U format
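As a starting point for the third component, here is a minimal sketch of reading and writing the CoNLL-U format. It assumes the standard ten tab-separated columns (ID, FORM, LEMMA, UPOS, XPOS, FEATS, HEAD, DEPREL, DEPS, MISC) and keeps only the word form, the universal part-of-speech tag, and the head. The function names `read_conllu` and `write_conllu` are illustrative and not part of the lab code.

```python
def read_conllu(lines):
    """Yield each sentence as a list of (form, upos, head) triples."""
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
        elif line.startswith("#"):
            # Sentence-level comment lines carry no token information.
            continue
        else:
            cols = line.split("\t")
            # Skip multiword token ranges ("1-2") and empty nodes ("1.1").
            if "-" in cols[0] or "." in cols[0]:
                continue
            sentence.append((cols[1], cols[3], int(cols[6])))
    if sentence:
        yield sentence


def write_conllu(sentence):
    """Format one sentence; columns this sketch does not track are '_'."""
    rows = []
    for i, (form, upos, head) in enumerate(sentence, start=1):
        cols = [str(i), form, "_", upos, "_", "_", str(head), "_", "_", "_"]
        rows.append("\t".join(cols))
    return "\n".join(rows) + "\n\n"
```

To read a treebank file, pass an open file object, for example `list(read_conllu(open(path, encoding="utf-8")))`, where `path` is a placeholder for the treebank file you have downloaded.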

You should also write code to train and evaluate your system on any given Universal Dependencies treebank. Your code should report tagging accuracy and unlabelled attachment score.
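Both metrics are per-token fractions: tagging accuracy is the share of tokens with the correct tag, and unlabelled attachment score is the share of tokens with the correct head. The sketch below computes either one from parallel lists of gold and predicted values. The helper name `token_accuracy` is illustrative, and the sketch counts every token, including punctuation, which is an assumption you may want to revisit.

```python
def token_accuracy(gold_sentences, pred_sentences):
    """Fraction of tokens whose predicted value equals the gold value.

    Each argument is a list of sentences, and each sentence is a list of
    per-token values: tags for tagging accuracy, head indices for UAS.
    """
    correct = 0
    total = 0
    for gold, pred in zip(gold_sentences, pred_sentences):
        for g, p in zip(gold, pred):
            if g == p:
                correct += 1
            total += 1
    return correct / total if total else 0.0


# Usage with toy data: one sentence of two tokens.
gold_tags, pred_tags = [["PRON", "VERB"]], [["PRON", "NOUN"]]
gold_heads, pred_heads = [[2, 0]], [[2, 0]]
tagging_acc = token_accuracy(gold_tags, pred_tags)
uas = token_accuracy(gold_heads, pred_heads)
```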

Some of the Universal Dependencies treebanks contain so-called non-projective trees. To train on these treebanks, you will first have to projectivize them. For this you can use the following Python script (contains usage instructions): projectivize.py
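To see which trees are affected, you can test for crossing arcs. The sketch below is not the projectivization procedure from the script; it only detects non-projective trees, assuming a well-formed tree given as a list of head indices in which 0 stands for the artificial root. The function name `is_projective` is illustrative.

```python
def is_projective(heads):
    """Return True if no two arcs cross.

    heads[i] is the head of token i + 1; 0 denotes the artificial root,
    which is treated as sitting to the left of the first token.
    """
    # Represent each arc by its left and right endpoint.
    arcs = [(min(h, d), max(h, d)) for d, h in enumerate(heads, start=1)]
    for a, b in arcs:
        for c, d in arcs:
            # Two arcs cross if exactly one endpoint of the second arc
            # lies strictly inside the span of the first.
            if a < c < b < d:
                return False
    return True
```

The check is quadratic in the sentence length, which should be acceptable for counting how many trees in a treebank are non-projective.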

Instructions: Submit an email containing the following: (a) the tagging accuracy and unlabelled attachment score for your baseline system (both the perceptron version and the neural version) when trained on the training sections and evaluated on the development sections of the English Web Treebank (EWT), (b) a link to a GitLab repository containing your code, and (c) instructions for how to replicate your results using your code.

Due date: 2019-03-01

Format of the subject line: TDDE09-2019 D3 your group ID marku61

Example: TDDE09-2019 D3 G1 marku61

Tips for this phase of the project

  • Work in parallel. The different components of the baseline system can largely be developed independently.
  • Take notes of any ideas that you come up with for how the baseline system could be improved.
  • Prepare a couple of slides that present the baseline system. These slides will come in handy for the project presentation.

D4: Modified system

During the project week (W10), your task is to modify and/or apply your baseline system, implementing a method described in the NLP literature. There are many different things that you could try. Here are some ideas, roughly sorted from simple to complex. For each idea we also list a research article that may make a suitable starting point for your project.

Most research articles in the field of natural language processing are available for free via the ACL Anthology.

At the end of the project week, you should write a short abstract for your project. The abstract should summarise what you have done in the project, as well as your main results. The purpose of the abstract is to announce your presentation ahead of the ‘mini-conference’ that will take place in W11.

Instructions: Submit an email containing the following: (a) a short plain-text abstract of your project (no longer than 200 words), and (b) a link to a GitLab repository containing your code.

Due date: 2019-03-08

Format of the subject line: TDDE09-2019 D4 your group ID marku61

Example: TDDE09-2019 D4 G1 marku61

Feedback and examination: You can get feedback on your project plan from the examiner (book an appointment). This feedback will give you an idea of the degree to which your project meets the project-related assessment criteria in the Project Rubric.

D5: Project presentation

In the week following the project week (W11), your group will present your project at the course’s ‘mini-conference’. You are allotted a 15-minute time slot for this presentation. You are free to choose the presentation’s content and structure. Bear in mind that the presentation needs to be understandable to everybody in the course (not only the examiner).

In preparing the presentation, you may want to consider the following questions:

  • What have you done in this project? What method did you evaluate?
  • Why have you chosen this particular project?
  • Which sources of scientific information did you use?
  • What are your experimental results?
  • What are your conclusions regarding the implemented method?

Instructions: Present your project, following the instructions above. The exact schedule for the mini-conference can be found below.

Wednesday 13 March, 14:55–17:00

  • 14:55–15:00 Introduction
  • 15:00–15:20 Group 1
  • 15:20–15:40 Group 2
  • 15:40–16:00 Group 3
  • 16:00–16:20 Group 4
  • 16:20–16:40 Group 5
  • 16:40–17:00 Group 6

Feedback and examination: You will receive oral feedback on your project and your presentation during the mini-conference; this feedback will be useful to you when preparing your post-project paper. After the conference the examiner will assess your presentation according to the criteria spelled out in the Project Rubric. This assessment will contribute to your grade for the project module.

D6: Post-project paper

The final project-related assignment is an individual reflection paper. The purpose of this assignment is to give you an opportunity to take stock of what you have learned from the project. We ask you to structure your paper into three parts as follows:

  • Describe your work with the project. Focus on things that let you illustrate what you have learned.
  • Examine your work and link it to the relevant course content.
  • Articulate your learning. What did you learn? How, exactly, did you learn it? Why does this learning matter?

For more tips on how to write a good reflection paper, see the guide on Reflection papers.

Instructions: Write a paper according to the given specification. The length of your paper should be around 1,000 words (approximately 2 pages). Submit your paper as a PDF document named as follows: TDDE09-2019-D6-your LiU-ID.pdf

Due date: 2019-03-23

Format of the subject line: TDDE09-2019 D6 your LiU-ID marku61

Example: TDDE09-2019 D6 marjo123 marku61

Examination: The examiner will assess your paper according to the criteria spelled out in the Project Rubric. This assessment will contribute to your grade for the project module.


Page responsible: Marco Kuhlmann
Last updated: 2019-01-14