# 729A27 Natural Language Processing

### Lectures

This page contains the study materials for the lectures and specifies the central concepts and procedures that you are expected to master after each lecture. For more information about how this content is examined, see the page on Examination.

### Course introduction

Welcome to the course! This introductory module consists of two lectures that introduce you to natural language processing as an application area, the content and organisation of the course, and some basic concepts in text segmentation and linguistics.

#### Materials

Detailed information about the organisation and examination of this course is available on this webpage.

#### Content

After this lecture you should be able to explain and apply the following concepts:

- ambiguity, contextuality, combinatorial explosion
- tokenisation, word token, word type, normalisation, stop word
- morpheme, lexeme, lemma
- part of speech, dependency tree
- supervised and unsupervised machine learning
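To make the difference between word tokens and word types concrete, here is a minimal sketch of tokenisation, normalisation, and stop-word removal. The regular-expression tokeniser, lowercasing as the only normalisation step, and the tiny stop-word list are illustrative assumptions, not definitions used in the course.

```python
import re

# Hypothetical, deliberately tiny stop-word list (illustration only).
STOP_WORDS = {"the", "a", "of", "and"}

def tokenise(text):
    """Split text into word tokens and punctuation tokens with a simple regex.
    Real tokenisers also handle clitics, abbreviations, URLs, and so on."""
    return re.findall(r"\w+|[^\w\s]", text)

def normalise(tokens):
    """Normalise tokens by lowercasing them (one of many possible choices)."""
    return [t.lower() for t in tokens]

text = "The cat sat on the mat. The mat was red."
tokens = normalise(tokenise(text))
types = set(tokens)
# Drop stop words and punctuation tokens.
content = [t for t in tokens if t not in STOP_WORDS and t.isalnum()]

print(len(tokens))  # 12 word tokens (including the two full stops)
print(len(types))   # 8 word types
print(content)      # ['cat', 'sat', 'on', 'mat', 'mat', 'was', 'red']
```

Without lowercasing, "The" and "the" would count as two different types, which is why normalisation affects type counts but not token counts.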

### Topic 1: Text classification

Text classification is the task of categorising text documents into predefined classes. In this module you will be introduced to text classification and its applications, and learn about two effective classification algorithms: the Naive Bayes classifier and the multi-class perceptron. You will also learn how to evaluate text classifiers using standard validation methods.

#### Materials

- Slides: Text classification
- Videos: Text classification (from the 2017 run of the course)
- Lecture notes by David Chiang (partially covers advanced material)

#### Content

After this lecture you should be able to explain and apply the following concepts:

- Naive Bayes classifier
- maximum likelihood estimation, additive smoothing
- multi-class perceptron classifier
- perceptron learning algorithm, averaging trick
- accuracy, precision, recall
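The Naive Bayes classifier and its training can be sketched as follows, assuming a multinomial model over bag-of-words features: class priors are maximum likelihood estimates, and word likelihoods use additive (add-k) smoothing over the training vocabulary. Working in log space and skipping words not seen during training are implementation assumptions; the toy training data is invented.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, k=1.0):
    """Train a multinomial Naive Bayes classifier.
    docs: list of (tokens, label) pairs.
    P(c) is estimated by MLE; P(w | c) = (count(w, c) + k) / (count(c) + k * |V|)."""
    class_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        word_counts[label].update(tokens)
        vocab.update(tokens)
    log_prior = {c: math.log(n / len(docs)) for c, n in class_counts.items()}
    log_lik = {}
    for c in class_counts:
        denom = sum(word_counts[c].values()) + k * len(vocab)
        log_lik[c] = {w: math.log((word_counts[c][w] + k) / denom) for w in vocab}
    return log_prior, log_lik

def classify_nb(model, tokens):
    """Classification rule: argmax over c of log P(c) + sum of log P(w | c).
    Words outside the training vocabulary are ignored (an assumption)."""
    log_prior, log_lik = model
    def score(c):
        return log_prior[c] + sum(log_lik[c][w] for w in tokens if w in log_lik[c])
    return max(log_prior, key=score)

# Invented toy data for illustration.
train = [(["good", "great", "fun"], "pos"), (["great", "plot"], "pos"),
         (["boring", "bad"], "neg"), (["bad", "plot", "dull"], "neg")]
model = train_nb(train, k=1.0)
print(classify_nb(model, ["great", "fun", "plot"]))  # pos
```

With k = 1 this is add-one (Laplace) smoothing; k = 0 gives the unsmoothed maximum likelihood estimates, which assign zero probability to any class in which a test word never occurred.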

After this lecture you should be able to perform the following procedures:

- evaluate a text classifier based on accuracy, precision, and recall
- apply the classification rule of the Naive Bayes classifier and the multi-class perceptron classifier to a text
- learn the probabilities of a Naive Bayes classifier using maximum likelihood estimation and additive smoothing
- learn the weights of a multi-class perceptron using the perceptron learning algorithm
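Below is a minimal sketch of the multi-class perceptron with the perceptron learning algorithm and averaging. It uses one common formulation of the averaging trick: a second accumulator `u` stores each update scaled by a step counter, so that the averaged weights can be computed at the end as `w - u / step` instead of summing the weight vector after every example. Bag-of-words count features, the number of epochs, and breaking ties by class order are assumptions made for illustration.

```python
from collections import Counter, defaultdict

def features(tokens):
    """Bag-of-words feature vector (an illustrative choice of features)."""
    return Counter(tokens)

def predict(weights, classes, feats):
    """Classification rule: the class whose weight vector scores highest.
    Ties go to the class listed first in `classes`."""
    return max(classes, key=lambda c: sum(weights[c].get(f, 0.0) * v
                                          for f, v in feats.items()))

def train_perceptron(docs, classes, epochs=5):
    """Train an averaged multi-class perceptron on (tokens, label) pairs."""
    w = {c: defaultdict(float) for c in classes}  # current weights
    u = {c: defaultdict(float) for c in classes}  # step-weighted update sums
    step = 1
    for _ in range(epochs):
        for tokens, gold in docs:
            feats = features(tokens)
            pred = predict(w, classes, feats)
            if pred != gold:
                # Reward the gold class, penalise the wrongly predicted class.
                for f, v in feats.items():
                    w[gold][f] += v
                    w[pred][f] -= v
                    u[gold][f] += step * v
                    u[pred][f] -= step * v
            step += 1
    # Averaging trick: recover the averaged weights from w and u.
    return {c: {f: w[c][f] - u[c][f] / step for f in w[c]} for c in classes}

train = [(["good", "great", "fun"], "pos"), (["great", "plot"], "pos"),
         (["boring", "bad"], "neg"), (["bad", "plot", "dull"], "neg")]
classes = ["pos", "neg"]
model = train_perceptron(train, classes)
print(predict(model, classes, features(["bad", "dull"])))  # neg
```

Averaging damps the influence of updates made late in training, which usually makes the final classifier less sensitive to the order of the training examples.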

### Topic 2: Language modelling

Language modelling is the task of assigning probabilities to sentences in a given language. This module focuses on n-gram models, which have a wide range of applications such as language identification, machine translation, and predictive text input. High-quality models require advanced smoothing techniques, which are a central topic of this module. You will also learn how to evaluate language models using perplexity.

#### Materials

- Slides: Language modelling
- Videos: Language modelling (from the 2017 run of the course)
- Lecture notes by David Chiang (excluding section 5.6)
- Language Modeling with N-Grams, chapter 4 from Jurafsky and Martin (2017)

The advanced material for this section is the Wagner–Fischer algorithm for computing the Levenshtein distance between two words.

- Slides: Edit distance advanced
- Videos: Language modelling (from the 2017 run of the course) advanced
- Videos: Language modelling (advanced material) (from the 2017 run of the course) advanced
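As a sketch of the advanced material, the Wagner–Fischer algorithm fills a table in which cell `d[i][j]` holds the Levenshtein distance between the first `i` characters of one word and the first `j` characters of the other. This version assumes unit costs for insertion, deletion, and substitution; some textbook presentations use a substitution cost of 2 instead, which changes the resulting numbers.

```python
def levenshtein(source, target):
    """Levenshtein distance via the Wagner–Fischer dynamic programming algorithm.
    Insertions, deletions, and substitutions each cost 1 (an assumption)."""
    m, n = len(source), len(target)
    # d[i][j] = distance between source[:i] and target[:j]
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = i  # delete all i characters
    for j in range(1, n + 1):
        d[0][j] = j  # insert all j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if source[i - 1] == target[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[m][n]

print(levenshtein("kitten", "sitting"))  # 3
```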

#### Content

After this lecture you should be able to explain and apply the following concepts:

- n-gram model
- add-k smoothing, Witten–Bell smoothing, absolute discounting
- perplexity, entropy
- Levenshtein distance, Wagner–Fischer algorithm advanced
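A minimal sketch of a bigram model with add-k smoothing, evaluated with per-token cross-entropy and perplexity, might look as follows. It assumes sentence-boundary markers `<s>` and `</s>`, a vocabulary consisting of the training words plus `</s>`, and test data that contains no unknown words; the course material may handle boundaries and unknown words differently.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"

def train_bigram_addk(sentences, k=1.0):
    """Bigram model with add-k smoothing:
    P(w | prev) = (c(prev, w) + k) / (c(prev) + k * |V|)."""
    bigrams, contexts, vocab = Counter(), Counter(), {EOS}
    for sent in sentences:
        padded = [BOS] + sent + [EOS]
        vocab.update(sent)
        for prev, w in zip(padded, padded[1:]):
            bigrams[prev, w] += 1
            contexts[prev] += 1
    V = len(vocab)
    def prob(w, prev):
        return (bigrams[prev, w] + k) / (contexts[prev] + k * V)
    return prob

def entropy_and_perplexity(prob, sentences):
    """Per-token cross-entropy H in bits, and perplexity 2 ** H, on held-out data."""
    log_sum, n = 0.0, 0
    for sent in sentences:
        padded = [BOS] + sent + [EOS]
        for prev, w in zip(padded, padded[1:]):
            log_sum += math.log2(prob(w, prev))
            n += 1
    h = -log_sum / n
    return h, 2 ** h

prob = train_bigram_addk([["the", "cat"], ["the", "dog"]], k=1.0)
print(prob("cat", "the"))  # (1 + 1) / (2 + 4), i.e. about 0.333
h, pp = entropy_and_perplexity(prob, [["the", "cat"]])
```

Lower perplexity on held-out text means the model finds that text less surprising; comparing perplexities is only meaningful for models that share the same vocabulary.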

After this lecture you should be able to perform the following procedures:

- learn an n-gram model using additive smoothing and absolute discounting
- evaluate an n-gram model using perplexity and entropy
- compute the Levenshtein distance between two words using the Wagner–Fischer algorithm advanced
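Absolute discounting subtracts a fixed discount d from every observed bigram count and redistributes the freed probability mass via a lower-order model. The sketch below assumes the interpolated variant with a maximum likelihood unigram model as the lower-order distribution; the lecture may instead present a backoff formulation or a different lower-order model, and d = 0.75 is just a commonly quoted default.

```python
from collections import Counter

BOS, EOS = "<s>", "</s>"

def train_bigram_absdisc(sentences, d=0.75):
    """Interpolated absolute discounting for a bigram model:
    P(w | v) = max(c(v, w) - d, 0) / c(v) + lam(v) * P_uni(w),
    with lam(v) = d * N1+(v, .) / c(v), where N1+(v, .) is the number of
    distinct words seen after v, and P_uni is the MLE unigram model."""
    bigrams, contexts, unigrams = Counter(), Counter(), Counter()
    followers = {}  # v -> set of distinct words observed after v
    for sent in sentences:
        padded = [BOS] + sent + [EOS]
        unigrams.update(padded[1:])  # only tokens that can be predicted
        for v, w in zip(padded, padded[1:]):
            bigrams[v, w] += 1
            contexts[v] += 1
            followers.setdefault(v, set()).add(w)
    n_tokens = sum(unigrams.values())

    def p_uni(w):
        return unigrams[w] / n_tokens

    def prob(w, v):
        if contexts[v] == 0:
            return p_uni(w)  # unseen context: fall back to the unigram model
        lam = d * len(followers[v]) / contexts[v]
        return max(bigrams[v, w] - d, 0) / contexts[v] + lam * p_uni(w)

    return prob

prob = train_bigram_absdisc([["the", "cat"], ["the", "dog"]], d=0.5)
print(prob("cat", "the"))  # (1 - 0.5) / 2 + 0.5 * 1/6, i.e. about 0.333
```

The interpolation weight lam(v) is exactly the mass removed by discounting, so each conditional distribution still sums to one as long as 0 < d ≤ 1.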

### Topic 3: Part-of-speech tagging

A part-of-speech tagger is a computer program that tags each word in a sentence with its part of speech, such as noun, adjective, or verb. In this section you will learn how to evaluate part-of-speech taggers, and be introduced to two methods for part-of-speech tagging: exhaustive search in hidden Markov models (with the Viterbi algorithm), and greedy search with multi-class perceptrons.

#### Materials

- Slides: Part-of-Speech Tagging
- Videos: Part-of-Speech Tagging
- Part-of-Speech Tagging, chapter 10 in Jurafsky and Martin (2017) (Sections 10.1–10.4)

#### Content

After this lecture you should be able to explain and apply the following concepts:

- part of speech, part-of-speech tagger
- accuracy, precision, recall
- hidden Markov model, Viterbi algorithm
- multi-class perceptron, feature window
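Accuracy for a tagger is the fraction of tokens that receive the correct tag, while precision and recall are naturally defined per tag. The following sketch assumes per-tag precision and recall computed from aligned gold and predicted tag sequences; the slides may also aggregate these per-tag scores, for example by averaging. The same computation applies to text classifiers, with documents in place of tokens.

```python
def evaluate_tagger(gold, predicted, tag):
    """Accuracy over all tokens, plus precision and recall for one tag.
    gold, predicted: equal-length lists of tags for the same tokens."""
    assert len(gold) == len(predicted)
    correct = sum(g == p for g, p in zip(gold, predicted))
    true_pos = sum(g == p == tag for g, p in zip(gold, predicted))
    n_predicted = sum(p == tag for p in predicted)  # tokens tagged `tag` by the system
    n_gold = sum(g == tag for g in gold)            # tokens tagged `tag` in the gold standard
    accuracy = correct / len(gold)
    precision = true_pos / n_predicted if n_predicted else 0.0
    recall = true_pos / n_gold if n_gold else 0.0
    return accuracy, precision, recall

gold = ["DET", "NOUN", "VERB", "DET", "NOUN"]
pred = ["DET", "NOUN", "NOUN", "DET", "VERB"]
print(evaluate_tagger(gold, pred, "NOUN"))  # (0.6, 0.5, 0.5)
```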

After this lecture you should be able to perform the following procedures:

- evaluate a part-of-speech tagger based on accuracy, precision, and recall
- compute the probability of a tagged sentence in a hidden Markov model
- simulate the Viterbi algorithm
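The two HMM procedures can be sketched together: the joint probability of a sentence and a tag sequence is a product of transition and emission probabilities, and the Viterbi algorithm finds the tag sequence that maximises this product without enumerating all sequences. This sketch assumes a bigram HMM stored as nested dictionaries, explicit start (`<s>`) and end (`</s>`) transitions, and plain probabilities; a practical implementation would use log probabilities to avoid underflow. The toy probabilities are invented.

```python
def joint_prob(trans, emit, words, tags):
    """P(words, tags) in a bigram HMM, including start and end transitions."""
    p, prev = 1.0, "<s>"
    for w, t in zip(words, tags):
        p *= trans[prev].get(t, 0.0) * emit[t].get(w, 0.0)
        prev = t
    return p * trans[prev].get("</s>", 0.0)

def viterbi(trans, emit, tagset, words):
    """Most probable tag sequence for `words` under the HMM."""
    # best[i][t]: highest probability of a tag sequence for words[:i+1] ending in t
    best = [{t: trans["<s>"].get(t, 0.0) * emit[t].get(words[0], 0.0) for t in tagset}]
    back = [{}]  # back[i][t]: best predecessor tag of t at position i
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tagset:
            prev = max(tagset, key=lambda s: best[i - 1][s] * trans[s].get(t, 0.0))
            best[i][t] = best[i - 1][prev] * trans[prev].get(t, 0.0) * emit[t].get(words[i], 0.0)
            back[i][t] = prev
    last = max(tagset, key=lambda t: best[-1][t] * trans[t].get("</s>", 0.0))
    tags = [last]
    for i in range(len(words) - 1, 0, -1):  # follow back-pointers
        tags.append(back[i][tags[-1]])
    return tags[::-1]

tagset = ["DET", "NOUN", "VERB"]
trans = {"<s>": {"DET": 0.8, "NOUN": 0.2},
         "DET": {"NOUN": 1.0},
         "NOUN": {"VERB": 0.5, "NOUN": 0.1, "</s>": 0.4},
         "VERB": {"DET": 0.4, "NOUN": 0.2, "</s>": 0.4}}
emit = {"DET": {"the": 1.0},
        "NOUN": {"dog": 0.4, "barks": 0.1, "cat": 0.5},
        "VERB": {"barks": 1.0}}
words = ["the", "dog", "barks"]
print(viterbi(trans, emit, tagset, words))  # ['DET', 'NOUN', 'VERB']
print(joint_prob(trans, emit, words, ["DET", "NOUN", "VERB"]))  # about 0.064
```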

### Topic 4: Syntactic analysis

Syntactic analysis, also called syntactic parsing, is the task of mapping a sentence to a formal representation of its syntactic structure. In this lecture you will learn about two approaches to dependency parsing, where the target representations take the form of dependency trees: the Eisner algorithm, which casts dependency parsing as combinatorial optimisation over graphs, and transition-based dependency parsing, which is the approach also used by Google.

#### Materials

- Slides: Syntactic analysis
- Dependency Parsing, chapter 14 in Jurafsky and Martin (2017)
- Material on the Eisner algorithm (sent out via e-mail) advanced

#### Content

After this lecture you should be able to explain and apply the following concepts:

- dependency tree, projectivity
- Collins’ algorithm, Eisner algorithm advanced
- structured perceptron training
- transition-based dependency parser
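A dependency tree is projective if, for every arc from a head h to a dependent d, the head h dominates every word between h and d; equivalently, the arcs can be drawn above the sentence without crossing. The sketch below assumes that a tree is encoded as a list of head indices, with words numbered from 1 and 0 standing for the artificial root, and that the input is a well-formed tree.

```python
def is_projective(heads):
    """Check projectivity of a dependency tree.
    heads[i] is the head of word i + 1 (words are numbered from 1; 0 is the root).
    An arc h -> d is projective if h dominates every word strictly between h and d."""
    n = len(heads)
    head = [None] + list(heads)  # head[k] for k = 1..n

    def dominates(h, k):
        # Follow head pointers upwards from k until we reach h or the root.
        while k != 0:
            if k == h:
                return True
            k = head[k]
        return h == 0  # the root dominates every word

    for d in range(1, n + 1):
        h = head[d]
        lo, hi = min(h, d), max(h, d)
        if not all(dominates(h, k) for k in range(lo + 1, hi)):
            return False
    return True

print(is_projective([2, 0, 2]))     # True
print(is_projective([0, 4, 1, 1]))  # False: arcs 1 -> 3 and 4 -> 2 cross
```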

After this lecture you should be able to perform the following procedures:

- simulate the Eisner algorithm advanced
- simulate a transition-based dependency parser
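Simulating a transition-based parser means applying a given sequence of transitions to a configuration consisting of a stack, a buffer, and a set of arcs. The sketch below assumes the arc-standard system with the transitions SH (shift), LA (left-arc), and RA (right-arc), and a stack that initially holds the artificial root 0; the lecture may use a different transition system, such as arc-eager, with other preconditions.

```python
def simulate_arc_standard(n_words, transitions):
    """Simulate an arc-standard transition-based parser.
    Words are numbered 1..n_words; 0 is the artificial root.
    Returns heads, where heads[i] is the head of word i + 1."""
    stack, buffer = [0], list(range(1, n_words + 1))
    heads = [None] * n_words
    for t in transitions:
        if t == "SH":
            if not buffer:
                raise ValueError("SH requires a non-empty buffer")
            stack.append(buffer.pop(0))  # move the next word onto the stack
        elif t in ("LA", "RA"):
            if len(stack) < 2:
                raise ValueError(f"{t} requires at least two items on the stack")
            if t == "LA":
                if stack[-2] == 0:
                    raise ValueError("LA cannot make the root a dependent")
                dep = stack.pop(-2)  # topmost item becomes head of the second-topmost
            else:
                dep = stack.pop()    # second-topmost item becomes head of the topmost
            heads[dep - 1] = stack[-1]
        else:
            raise ValueError(f"unknown transition: {t}")
    return heads

# "she reads books": she <- reads, root -> reads, reads -> books
print(simulate_arc_standard(3, ["SH", "SH", "LA", "SH", "RA", "RA"]))  # [2, 0, 2]
```

In a trained parser, a classifier such as a multi-class perceptron chooses each transition from features of the current configuration; the simulation above simply replays a fixed sequence.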

### Topic 5: Semantic analysis

This lecture presents techniques for the semantic analysis of words using word embeddings. The basic idea behind word embeddings is to represent words as points in a high-dimensional space in such a way that words with similar meanings are close to each other. This idea has proven very fruitful in a wide range of applications. The lecture covers some standard word embedding models, in particular the models implemented in Google’s word2vec software.

#### Materials

- Slides: Semantic analysis
- Vector Semantics, chapter 15 in Jurafsky and Martin (2017)
- Semantics with Dense Vectors, chapter 16 in Jurafsky and Martin (2017) (excluding 16.4)

#### Content

After this lecture you should be able to explain and apply the following concepts:

- distributional hypothesis, word embedding, cosine distance
- co-occurrence matrix, positive pointwise mutual information
- truncated singular value decomposition
- continuous bag-of-words model, skip-gram model
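Closeness between word embeddings is usually measured with the cosine of the angle between the vectors, which ignores vector length. A minimal sketch follows; defining cosine distance as 1 minus cosine similarity is one common convention, and the toy vectors are invented.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between vectors u and v (assumed non-zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def cosine_distance(u, v):
    """Cosine distance, here defined as 1 - cosine similarity."""
    return 1.0 - cosine_similarity(u, v)

print(cosine_similarity([1, 0], [0, 1]))  # 0.0: orthogonal vectors
print(cosine_similarity([1, 0], [1, 1]))  # about 0.707
```

Because only the angle matters, the vectors [1, 2] and [2, 4] have cosine distance 0 even though one is twice as long, which is useful when raw co-occurrence counts differ mainly by word frequency.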

After this lecture you should be able to perform the following procedures:

- derive a PPMI matrix from a document collection
- derive word embeddings using truncated singular value decomposition
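The first procedure can be sketched as counting word–context co-occurrences and converting the counts to positive pointwise mutual information. This sketch assumes that contexts are the words within a symmetric window inside each document and that probabilities are maximum likelihood estimates from the counts; the window size and the sparse dictionary representation are illustrative choices. Pairs that never co-occur are not stored and have PPMI 0.

```python
import math
from collections import Counter

def cooccurrence_counts(documents, window=2):
    """Count (word, context) pairs within a symmetric window in each document.
    documents: list of token lists."""
    counts = Counter()
    for doc in documents:
        for i, w in enumerate(doc):
            for j in range(max(0, i - window), min(len(doc), i + window + 1)):
                if j != i:
                    counts[w, doc[j]] += 1
    return counts

def ppmi(counts):
    """PPMI(w, c) = max(0, log2(P(w, c) / (P(w) * P(c)))), with MLE probabilities.
    Since P(w, c) / (P(w) * P(c)) = n(w, c) * N / (n(w) * n(c)), totals suffice."""
    total = sum(counts.values())
    w_tot, c_tot = Counter(), Counter()
    for (w, c), n in counts.items():
        w_tot[w] += n
        c_tot[c] += n
    return {(w, c): max(0.0, math.log2(n * total / (w_tot[w] * c_tot[c])))
            for (w, c), n in counts.items()}

counts = cooccurrence_counts([["x", "y", "z"]], window=1)
print(ppmi(counts)[("x", "y")])  # 1.0
```

For the second procedure, the PPMI values can be arranged into a dense matrix and decomposed, for example with `numpy.linalg.svd`; keeping only the columns of U for the k largest singular values (possibly scaled by those values) gives k-dimensional word embeddings.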

Page responsible: Marco Kuhlmann

Last updated: 2018-01-12