
732A92 Text Mining


Text Mining develops methods for accessing information in, and extracting knowledge from, large volumes of text. The overall aim of this course is to give students practical experience with the main steps of text mining: information retrieval, processing of text data, modelling, and analysis of experimental results. The course ends with an individual project in which students work on a self-defined problem.

Intended learning outcomes

On completion of the course, you should be able to:

  1. use basic methods for information extraction and retrieval of textual data
  2. apply text processing techniques to prepare documents for statistical modelling
  3. apply relevant machine learning models for analyzing textual data and correctly interpret the results
  4. use machine learning models for text prediction
  5. evaluate the performance of machine learning models for textual data

For each learning outcome, there is a set of more specific knowledge requirements that express what you need to demonstrate in order to attain a particular grade. These knowledge requirements are listed on the Examination page.

Course content

The course covers the following content:

  • information retrieval
  • document classification
  • document clustering
  • natural language processing
  • information extraction

Teaching and working methods

The course is taught in the form of lectures, lab sessions, and supervision in connection with an individual project. You are also expected to study independently, both individually and in groups. When planning your time for the course, you should budget approximately:

  • 42 hrs to prepare for, attend, and follow up on the lectures
  • 30 hrs to prepare for, carry out, and follow up on the labs
  • 88 hrs to plan, carry out, and document the project

The course is co-taught with TDDE16 Text Mining at the Faculty of Science and Engineering.

Course literature

The reading for this course consists of excerpts from the following books, as well as research articles.

  • Daniel Jurafsky and James H. Martin. Speech and Language Processing. An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Draft chapters of the 3rd edition, October 2019.

  • Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press, 2008. The complete book is available online.

  • ChengXiang Zhai and Sean Massung. Text Data Management and Analysis. A Practical Introduction to Information Retrieval and Text Mining. Morgan & Claypool, 2016.

Feedback policy

What you can expect from us. We try our best to give you prompt, constructive, and meaningful feedback on how well you meet the knowledge requirements set out for the course. We offer feedback in various forms; you can find detailed information about this on the Examination page. Our focus is on non-examinatory, formative feedback, which you can use to improve your learning (and we can use to improve our teaching!) while the course is ongoing.

What we expect from you. We expect you to familiarize yourself with the knowledge requirements set out for the course, and to actively seek our feedback on how well you meet these requirements. We also expect you to reflect on the feedback that we provide, and to grasp opportunities to put it to good use.

Communication policy

What we expect from you. This website is the primary source of information about the course, and we expect you to keep up to date with what we publish here. We also send out information via the University’s email list for the course, and we expect you to read email from this list regularly while the course is ongoing.

What you can expect from us. When you contact us via email, you can expect an answer during standard working hours, 8–17. (We do not respond to email in the evenings or at weekends.) For more personal contact, you can book an appointment with the examiner (via Doodle). During the 2020 session, the course staff use Microsoft Teams instead of in-person meetings.


Page responsible: Marco Kuhlmann
Last updated: 2020-11-02