
Text Mining


Status Cancelled
School Computer and Information Science (CIS)
Division STIMA
Owner Mattias Villani
Homepage http://www.ida.liu.se/edu/ugrad/course.sv/732A92


Course plan

No of lectures

1 lecture on Python, 2 lectures on information retrieval, 3 lectures on natural language processing, and 3 lectures on statistical methods for text mining.

Recommended for

PhD students in Statistics, Computer Science and the Engineering sciences.

The course was last given

Fall 2016


The overall aim of the course is to provide an introduction to quantitative analysis of text, with special focus on applying machine learning methods to text documents. In particular, the student should learn the main steps of working with text: i) efficient extraction of text, ii) natural language processing to put the text in a form suitable for iii) statistical machine learning methods, which are subsequently used for iv) text prediction.
After completing the course the student should be able to:
• use basic methods for information extraction and retrieval of textual data
• apply text processing techniques to prepare documents for statistical modelling
• apply relevant machine learning models for analyzing textual data and correctly interpret the results
• use machine learning models for text prediction
• evaluate the performance of machine learning models for textual data


Introduction to machine learning or equivalent. At least one course in probability and statistics.


The course consists of lectures, lab exercises and a text mining project. The lectures are devoted to presentations of concepts and methods. The computer exercises are devoted to practical application of text mining tools. In the project work, the student will get hands-on experience in solving a text mining problem.
Language of instruction: English.


Introduction and overview of quantitative text analysis and its applications. Information extraction. Web crawling. Information retrieval. Tf-idf. Vector space models. Text preprocessing. Bag of words. N-grams. Sparsity and smoothing for text. Document classification. Sentiment analysis. Model evaluation. Topic models.
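
As an illustration of two of the topics listed above, bag of words and tf-idf, the following is a minimal sketch in Python, the language the course introduces in its first lecture. It is not taken from the course material: the function names `tokenize` and `tf_idf`, the whitespace tokenizer, the example documents, and the particular weighting (relative term frequency times the log of the inverse document frequency) are all assumptions chosen for brevity. Other tf-idf variants exist.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase and split on whitespace.
    # Real preprocessing would also handle punctuation, stop words, etc.
    return text.lower().split()


def tf_idf(documents):
    """Return one {term: weight} dict per document (illustrative variant)."""
    # Bag of words: term counts per document, ignoring word order.
    bags = [Counter(tokenize(doc)) for doc in documents]
    n_docs = len(bags)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for bag in bags for term in bag)
    weights = []
    for bag in bags:
        total = sum(bag.values())
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in bag.items()
        })
    return weights


# Hypothetical toy corpus.
docs = ["the cat sat", "the dog sat", "the cat ran"]
weights = tf_idf(docs)
```

Under this weighting, a term that occurs in every document (here "the") gets weight zero, while a term that occurs in only one document (here "ran") gets the highest weight within that document.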




Mattias Villani
Marco Kuhlmann
Patrick Lambrix


Mattias Villani


Text mining project report. Written reports on lab assignments.




Page responsible: Director of Graduate Studies
Last updated: 2012-05-03