
TDDE16 Text Mining

Text mining is about deriving knowledge from text. Among many other things, it can be used to identify trends in social media, explore cultural developments through the quantitative analysis of digitised documents, and discover drug–drug interactions by mining medical text. The goal of this course is to introduce you to the main components of the text mining pipeline – information retrieval, natural language processing, and text data analysis – and to enable you to independently plan, carry out, and evaluate text mining projects.

Intended learning outcomes

On completion of the course, you should be able to:

  1. use basic methods for information extraction and retrieval of textual data
  2. apply text processing techniques to prepare documents for statistical modelling
  3. apply relevant machine learning models for analysing textual data and correctly interpret the results
  4. use machine learning models for text prediction
  5. evaluate the performance of machine learning models for textual data

For each learning outcome, there is a set of more specific knowledge requirements that outline what you need to demonstrate in order to earn a certain grade. These knowledge requirements are listed on the Examination page.

Course content

The course covers the following content:

  • introduction and overview of quantitative text analysis and its applications
  • information extraction
  • web crawling
  • information retrieval (tf-idf, vector space models)
  • text preprocessing (bag-of-words, n-grams, sparsity and smoothing for text)
  • document classification and sentiment analysis
  • topic models
  • model evaluation

Teaching and working methods

The course is taught in the form of lectures, lab sessions, and supervision in connection with an individual project. You are also expected to study independently, both individually and in groups. When you plan your time for the course, you should budget approximately:

  • 53 hrs to prepare for, attend, and follow up on the lectures
  • 27 hrs to prepare for, carry out, and follow up on the labs
  • 80 hrs to plan, carry out, and follow up on the project

The course is co-taught with 732A92 Text Mining on the Master’s programme in statistics and data mining.

Course literature

There is no obligatory textbook for the course. Reading consists of individual chapters from the following books:

Feedback policy

What you can expect from us. We try our best to give you prompt, constructive, and meaningful feedback on how well you meet the knowledge requirements set out for the course. We offer feedback in various forms; you can find detailed information about this on the Examination page. Our focus is on non-examinatory, formative feedback, which you can use to improve your learning (and we can use to improve our teaching!) while the course is ongoing.

What we expect from you. We expect you to familiarise yourself with the knowledge requirements set out for the course, and to actively seek our feedback on how well you meet these requirements. We also expect you to reflect on the feedback that we provide, and to grasp opportunities to put it to good use.

Communication policy

What we expect from you. This webpage is the primary source of information about the course, and we expect you to keep yourself up to date with what we publish here. We also send out information via the University’s email list for the course, and we expect you to subscribe to this list and read your email regularly while the course is ongoing. Check whether you are subscribed

What you can expect from us. When you contact us via email, you can expect an answer during standard working hours, 8–17. (We do not respond to email in the evenings or at weekends.) For more personal contact, you can drop by during the examiner’s office hours (Wednesdays 13–17 in Building E, Room 3G.476) or book an appointment.

Page responsible: Marco Kuhlmann
Last updated: 2017-09-22