732A92 Text Mining


This page contains the instructions for the lab assignments, as well as general information about how to work on and how to submit labs. For more information about the examination of the lab component, see the Examination page.

General information

Lab assignments should be done in pairs. Please contact the examiner in case you want to work on your own. Unfortunately, we do not generally have the resources necessary to tutor and give feedback on one-person labs.

Come prepared. We expect you to have read the lab instructions before you come to the tutored lab sessions. If you come unprepared, you will have to spend time reading the instructions on-site and will have less time to ask questions and get help.

Instructions: Submit your labs according to the instructions below. Please also read the general rules for hand-in assignments. Before you submit your first lab, you and your lab partner need to sign up in Webreg.

Format of the subject line: 732A92-2017 lab code your LiU-ID your partner’s LiU-ID your lab assistant’s LiU-ID

Example: 732A92-2017 L1 marjo123 erika456 fooba99

Lab assistants for this course:

  • Johan Falkenjack: johsj47
  • Huanyu Li: huali50

Feedback: For each lab there are a number of scheduled hours where you can get oral feedback on your work from the lab assistants. If you submit in time for the first due date, you will also get written feedback. In addition, you can always get feedback from the examiner (drop-in office hours during term time, or book an appointment).

Information about notebooks

This course uses Jupyter notebooks for some of the lab assignments. Notebooks let you write and execute Python code in a web browser, and they make it very easy to mix code and text.

Lab environment. To work on a notebook, you need to be logged into one of IDA’s computers, either on-site or via ThinLinc. At the start of each lab session, you have to activate the course’s lab environment by writing the following at the terminal prompt:

source /home/732A92/labs/environment/bin/activate

Download and open the notebook. To start working on a notebook, say L1.ipynb, download the notebook file to your computer and issue the following command at the terminal prompt:

jupyter notebook L1.ipynb

This will show the notebook in your web browser.

Rename the notebook. One of the first things that you should do with a notebook is to rename it so that we can link the file to your LiU-IDs. Click on the notebook name (next to the Jupyter logo at the top of the browser page) and add your LiU-IDs, like so:

L1-marjo123-erika456

How to work with a notebook. Each notebook consists of a number of so-called cells, which may contain code or text. During the lab you write your own code or text into the cells according to the instructions. When you ‘run’ a code cell (by pressing Shift+Enter), you execute the code in that cell. The output of the code will be shown immediately below the cell.

Check the notebook and submit it. When you are done with a notebook, you should click on Kernel > Restart & Run All to run the code in the notebook and verify that everything works as expected and there are no errors. After this check you can save the notebook and submit it according to the instructions below.

L0: Introduction to Python

This lab and the preparatory teaching session help you to get started with Python. The focus is on language features that are particularly relevant for this course: you will work with basic data structures for text data such as strings, dictionaries, and vectors, write loops and comprehensions to iterate over sequential data such as lists of strings, and implement simple functions.
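To give a flavour of these language features, here is a minimal sketch that combines a dictionary, a loop, and two comprehensions. It is not taken from the lab itself; the function name word_counts and the example sentences are made up for illustration.

```python
def word_counts(sentences):
    """Count lower-cased, whitespace-separated tokens across a list of strings."""
    counts = {}
    for sentence in sentences:
        for token in sentence.lower().split():
            # dict.get returns 0 the first time a token is seen.
            counts[token] = counts.get(token, 0) + 1
    return counts


# Hypothetical example data.
sentences = ["the cat sat", "the dog sat down"]

counts = word_counts(sentences)

# List comprehension: number of tokens per sentence.
lengths = [len(s.split()) for s in sentences]

# Set comprehension: all distinct tokens longer than three characters.
long_words = {w for w in counts if len(w) > 3}
```

Splitting on whitespace is the simplest possible tokenisation; it is used here only to keep the example short.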

Lab assistant: Johan Falkenjack (johsj47)

NB: If you have previously taken a course featuring Python, either a dedicated programming course or a course in which Python was used for a significant share of the assignments (such as TDDE09 Natural Language Processing), you can get a ‘free’ Pass on this lab. Please contact the examiner if you want to make use of this possibility.

L1: Information Retrieval

This lab trains your understanding of basic data structures for web information retrieval such as the inverted index, as well as your ability to apply programming techniques such as web crawling, tf-idf weighting, and ranked query answering. The specific task that you will be working on is to build a simple search engine for Android apps.
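The following sketch illustrates how an inverted index and tf-idf weighting fit together for ranked query answering. It is an illustration under stated assumptions, not the lab's reference solution: the function names build_index and rank and the three toy app descriptions are invented, and the weighting shown (raw term frequency times log(N/df)) is only one of several common tf-idf variants.

```python
import math
from collections import Counter, defaultdict


def build_index(docs):
    """Inverted index: map each term to {doc_id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(text.lower().split()).items():
            index[term][doc_id] = tf
    return index


def rank(query, index, n_docs):
    """Score documents by the sum of tf * idf over the query terms."""
    scores = defaultdict(float)
    for term in query.lower().split():
        postings = index.get(term, {})
        if not postings:
            continue  # term does not occur in any document
        idf = math.log(n_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest-scoring documents first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical app descriptions.
docs = {
    "app1": "photo editor with photo filters",
    "app2": "simple text editor",
    "app3": "weather forecast",
}

index = build_index(docs)
results = rank("photo editor", index, len(docs))
```

Because the index stores postings per term, answering a query only touches the documents that contain at least one query term; documents without any match (here app3) never receive a score.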

Lab assistant: Huanyu Li (huali50)

L2: Natural Language Processing

This lab tests your ability to apply natural language processing techniques to extract information from documents, and to evaluate the performance of these techniques. The specific task that you will be working on is to use a standard NLP library to implement and evaluate a simple system for relation extraction.
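One common way to evaluate a relation extraction system is to compare the set of extracted relations against a gold standard using precision, recall, and F1. The sketch below assumes that relations are represented as (subject, relation, object) triples; this representation, the function name evaluate, and the example triples are assumptions made for illustration, and the lab instructions may prescribe a different setup.

```python
def evaluate(predicted, gold):
    """Return (precision, recall, F1) for two sets of extracted relations."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical gold-standard and system output triples.
gold = {
    ("Alice", "works_for", "Acme"),
    ("Bob", "works_for", "Initech"),
    ("Carol", "works_for", "Acme"),
}
predicted = {
    ("Alice", "works_for", "Acme"),
    ("Bob", "works_for", "Acme"),
}

precision, recall, f1 = evaluate(predicted, gold)
```

In this toy example one of the two predicted triples is correct and one of the three gold triples is found, so precision is 1/2 and recall is 1/3.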

Lab assistant: Johan Falkenjack (johsj47)

L3: Statistical Modelling

This lab trains your ability to apply statistical models for analysing textual data, and to correctly interpret the results. Your specific task is to implement and evaluate the standard Gibbs sampling algorithm for Latent Dirichlet Allocation in Python.
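As a rough orientation, the sketch below shows the core loop of a collapsed Gibbs sampler for LDA, which is assumed here to be what "standard" refers to. It is a simplified illustration, not the lab's reference implementation: the function name lda_gibbs, the hyperparameter defaults, and the toy corpus are invented, and documents are assumed to be given as lists of integer word ids.

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, n_iters=50, seed=0):
    """Collapsed Gibbs sampling for LDA over documents given as lists of word ids."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                # n(d, k)
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # n(k, w)
    topic_total = [0] * n_topics                              # n(k)
    assignments = []

    # Randomly initialise the topic assignment of every token.
    for d, doc in enumerate(docs):
        z_doc = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_doc.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_doc)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = assignments[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1

                # p(z = t | rest) is proportional to
                # (n(d,t) + alpha) * (n(t,w) + beta) / (n(t) + V * beta).
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]

                # Record the new assignment and restore the counts.
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return assignments, doc_topic, topic_word


# Hypothetical toy corpus: three documents over a vocabulary of four word ids.
docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 2, 3]]
assignments, doc_topic, topic_word = lda_gibbs(docs, n_topics=2, vocab_size=4)
```

The sampler only maintains count tables; the document-topic and topic-word distributions are integrated out and can be estimated from the counts afterwards. A useful sanity check is that the counts stay consistent: each row of doc_topic must always sum to the length of the corresponding document.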

Lab assistant: Johan Falkenjack (johsj47)


Page responsible: Marco Kuhlmann
Last updated: 2017-10-11