Text Mining 2017HT
Course plan
Number of lectures
1 lecture on Python, 2 lectures on information retrieval, 3 lectures on natural language processing, and 3 lectures on statistical methods for text mining.
Recommended for
PhD students in Statistics, Computer Science and the Engineering sciences.
The course was last given
Fall 2016
Goals
The overall aim of the course is to provide an introduction to quantitative
analysis of text, with special focus on applying machine learning methods to
text documents. In particular, the student should learn the main steps of
working with text: i) efficient extraction of text, ii) natural language
processing of the text into a form suitable for iii) statistical machine
learning methods, which are subsequently used for iv) text prediction.
After completing the course the student should be able to:
• use basic methods for information extraction and retrieval of textual
data.
• apply text processing techniques to prepare documents for statistical
modelling.
• apply relevant machine learning models for analyzing textual data and
correctly interpret the results.
• use machine learning models for text prediction.
• evaluate the performance of machine learning models for textual data.
Prerequisites
Introduction to machine learning or equivalent. At least one course in probability and statistics.
Organization
The course consists of lectures, lab exercises and a text mining project. The
lectures are devoted to presentations of concepts and methods. The computer
exercises are devoted to practical application of text mining tools. In the
project work, the student will get hands-on experience in solving a text mining
problem.
Language of instruction: English.
Contents
Introduction and overview of quantitative text analysis and its applications. Information extraction. Web crawling. Information retrieval. Tf-idf. Vector space models. Text preprocessing. Bag of words. N-grams. Sparsity and smoothing for text. Document classification. Sentiment analysis. Model evaluation. Topic models.
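As a rough illustration of how some of the topics above fit together (bag of words, tf-idf and vector space models), the following Python sketch weights terms by tf-idf and compares documents by cosine similarity. It is not taken from the course material: the function names, the whitespace tokenizer, the unsmoothed weighting tf * log(N / df) and the three toy documents are all assumptions chosen for brevity.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude preprocessing step)."""
    return text.lower().split()


def tf_idf_vectors(documents):
    """Return one {term: weight} dict per document, with weight = tf * log(N / df)."""
    bags = [Counter(tokenize(doc)) for doc in documents]  # bag-of-words counts
    n_docs = len(bags)
    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter(term for bag in bags for term in bag)
    return [
        {term: count * math.log(n_docs / doc_freq[term]) for term, count in bag.items()}
        for bag in bags
    ]


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy corpus, used only for illustration.
documents = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "topic models describe documents",
]
vectors = tf_idf_vectors(documents)

# Documents 0 and 1 share terms; documents 0 and 2 share none.
print(cosine_similarity(vectors[0], vectors[1]))
print(cosine_similarity(vectors[0], vectors[2]))
```

With this unsmoothed weighting, a term that occurs in every document gets weight zero. Library implementations typically apply smoothing and normalization, so their numbers will differ from this sketch.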
Literature
http://www.ida.liu.se/~732A92/info/courseinfo.en.shtml
Lecturers
Mattias Villani
Marco Kuhlmann
Patrick Lambrix
Examiner
Mattias Villani
Examination
Text mining project report. Written reports on lab assignments.
Credit
6 ECTS
Comments
Page responsible: Director of Graduate Studies