# 729G17 Language Technology

### Lectures

This page contains the study materials for the lectures and specifies the central concepts and procedures that you are supposed to master after each lecture. For more information about how these contents are examined, see the page on Examination.

### Course introduction

Welcome to the course! This introductory module presents language technology as an application area, describes the content and organisation of the course, and reviews some basic concepts from the area of text segmentation.

#### Materials

Detailed information about the course organisation and examination is available on this website.

#### Content

After this module you should be able to explain and apply the following concepts:

- ambiguity, contextuality, combinatorial explosion
- tokenisation, word token, word type, normalisation, stop word
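The segmentation concepts above can be illustrated with a short sketch. This is a toy example with made-up text and an illustrative stop-word list, not one prescribed by the course materials; the regex-based tokeniser is just one simple choice.

```python
import re
from collections import Counter

# Toy document (hypothetical example text).
text = "The cat sat on the mat. The mat was new."

# Tokenisation with normalisation: lowercase the text and extract
# alphabetic strings as word tokens.
tokens = re.findall(r"[a-z]+", text.lower())

# Word tokens are occurrences; word types are distinct words.
counts = Counter(tokens)
print(len(tokens))   # number of word tokens: 10
print(len(counts))   # number of word types: 7

# Stop-word removal: filter out frequent function words
# (the stop-word list here is an illustrative choice).
stop_words = {"the", "on", "was"}
content_tokens = [t for t in tokens if t not in stop_words]
print(content_tokens)
```

Running this on the toy sentence shows the token/type distinction directly: the word "the" contributes three tokens but only one type.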

### Topic 1: Text classification

Text classification is the task of categorising text documents into predefined classes. In this module you will be introduced to text classification and its applications, learn how to evaluate text classifiers using standard validation methods, and get to know the Naive Bayes classifier, a simple but effective probabilistic model for text classification.

#### Materials

- Slides: Text classification
- Videos: Text classification (2017)
- Naive Bayes and Sentiment Classification, chapter 6 in Jurafsky and Martin (2017), Sections 6.1–6.4, 6.6–6.7

#### Content

After this module you should be able to explain and apply the following concepts:

- accuracy, precision, recall
- Naive Bayes classifier
- log probabilities
- maximum likelihood estimation, additive smoothing

After this module you should be able to perform the following procedures:

- evaluate a text classifier based on accuracy, precision, and recall
- apply the classification rule of the Naive Bayes classifier to a text
- learn the probabilities of a Naive Bayes classifier using maximum likelihood estimation and additive smoothing
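The procedures above can be sketched in a few lines of Python. The training data below is a hypothetical toy corpus, and `alpha = 1` (Laplace smoothing) is one common choice of additive smoothing constant; the classification rule is the standard argmax over summed log probabilities.

```python
import math
from collections import Counter

# Toy training data (hypothetical): (document tokens, class label).
train = [
    (["good", "great", "fun"], "pos"),
    (["good", "good", "boring"], "pos"),
    (["bad", "boring", "dull"], "neg"),
    (["bad", "awful"], "neg"),
]

classes = {"pos", "neg"}
vocab = {w for doc, _ in train for w in doc}

# Maximum likelihood estimation with additive (Laplace) smoothing.
alpha = 1.0
class_docs = Counter(c for _, c in train)
word_counts = {c: Counter() for c in classes}
for doc, c in train:
    word_counts[c].update(doc)

def log_prior(c):
    return math.log(class_docs[c] / len(train))

def log_likelihood(w, c):
    total = sum(word_counts[c].values())
    return math.log((word_counts[c][w] + alpha) / (total + alpha * len(vocab)))

def classify(doc):
    # Classification rule: argmax over classes of the sum of log probabilities.
    # Log probabilities avoid numerical underflow from multiplying many
    # small probabilities.
    return max(classes,
               key=lambda c: log_prior(c) + sum(log_likelihood(w, c) for w in doc))

print(classify(["good", "fun"]))   # expected: pos
```

Note how smoothing keeps the likelihood of an unseen word non-zero: without it, a single unknown word would drive the whole class score to minus infinity.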

### Topic 2: Language modelling

Language modelling is the task of assigning probabilities to sentences in a given language. This unit focuses on n-gram models, which have a wide range of applications such as language identification, machine translation, and predictive text input. High-quality models require advanced smoothing techniques, which are a central topic of this module. You will also learn how to evaluate language models using perplexity. The final part of the section introduces edit distance, which in connection with language models can be used for automatic spelling correction.

#### Materials

- Slides: Language modelling
- Videos: Language modelling (from the 2017 run of the course)
- Language Modeling with N-grams, chapter 4 in Jurafsky and Martin (2017), Sections 4.1–4.2, 4.4.1–4.4.2

#### Materials (advanced)

The advanced material for this section is the Wagner–Fischer algorithm for computing the Levenshtein distance between two words.

- Slides: Edit distance
- Videos: Language modelling (advanced material) (from the 2017 run of the course)
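As a reference point for simulating the algorithm by hand, here is a minimal sketch of the Wagner–Fischer dynamic programme. It fills an (|a|+1) × (|b|+1) table where cell (i, j) holds the Levenshtein distance between the prefixes a[:i] and b[:j].

```python
def levenshtein(a: str, b: str) -> int:
    """Wagner-Fischer dynamic programme for the Levenshtein distance."""
    # d[i][j] = distance between the prefixes a[:i] and b[:j]
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i          # delete all of a[:i]
    for j in range(len(b) + 1):
        d[0][j] = j          # insert all of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            sub = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[len(a)][len(b)]

print(levenshtein("kitten", "sitting"))  # expected: 3
```

The classic example: "kitten" → "sitting" takes three edits (substitute k→s, substitute e→i, insert g).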

#### Content

After this module you should be able to explain and apply the following concepts:

- n-gram model
- additive smoothing
- perplexity, entropy
- Levenshtein distance
- Wagner–Fischer algorithm (advanced)

After this module you should be able to perform the following procedures:

- learn an n-gram model using additive smoothing
- evaluate an n-gram model using entropy
- compute the Levenshtein distance between two words
- simulate the Wagner–Fischer algorithm (advanced)
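The first two procedures can be sketched for the bigram case. The corpus below is a hypothetical toy example with `<s>`/`</s>` as sentence-boundary markers; the perplexity is computed as 2 raised to the per-token cross-entropy, which is the standard relationship between the two measures.

```python
import math
from collections import Counter

# Toy training corpus (hypothetical); <s> and </s> mark sentence boundaries.
corpus = [["<s>", "the", "cat", "sat", "</s>"],
          ["<s>", "the", "dog", "sat", "</s>"]]

vocab = {w for sent in corpus for w in sent}
alpha = 1.0  # additive smoothing constant (a free parameter)

# Collect bigram and unigram counts (maximum likelihood counts).
bigrams = Counter()
unigrams = Counter()
for sent in corpus:
    for u, v in zip(sent, sent[1:]):
        bigrams[u, v] += 1
        unigrams[u] += 1

def prob(u, v):
    # Additively smoothed bigram probability P(v | u).
    return (bigrams[u, v] + alpha) / (unigrams[u] + alpha * len(vocab))

def perplexity(sent):
    # Perplexity = 2 ** (per-token cross-entropy), in bits.
    logp = sum(math.log2(prob(u, v)) for u, v in zip(sent, sent[1:]))
    n = len(sent) - 1  # number of predicted tokens
    return 2 ** (-logp / n)

print(perplexity(["<s>", "the", "cat", "sat", "</s>"]))
```

A useful sanity check when simulating this by hand: for any history u, the smoothed probabilities P(v | u) must sum to 1 over the vocabulary.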

### Topic 3: Part-of-speech tagging

A part-of-speech tagger is a computer program that tags each word in a sentence with its part of speech, such as noun, adjective, or verb. In this section you will learn how to evaluate part-of-speech taggers, and be introduced to two methods for part-of-speech tagging: hidden Markov models (which generalise the Markov models that you encountered in the section on language modelling) and the multi-class perceptron.

#### Materials

- Slides: Part-of-speech tagging
- Videos: Part-of-speech tagging
- Part-of-Speech Tagging, chapter 10 in Jurafsky and Martin (2017), Sections 10.1–10.4

#### Materials (advanced)

The advanced material for this section consists of two parts: the Viterbi algorithm for computing the most probable tag sequence for a sentence under a hidden Markov model, and the generalisation of part-of-speech tagging to named entity recognition (NER).

- Slides: Part-of-speech tagging (material on the Viterbi algorithm)
- Videos: Part-of-Speech Tagging (advanced material)
- Information extraction, chapter 21 of Jurafsky and Martin (2017), section 21.1

#### Content

After this module you should be able to explain and apply the following concepts:

- part of speech, part-of-speech tagger
- accuracy, precision, recall
- hidden Markov model
- multi-class perceptron, feature window
- Viterbi algorithm (advanced)
- named entity recognition as tagging (advanced)

After this module you should be able to perform the following procedures:

- evaluate a part-of-speech tagger based on accuracy, precision, and recall
- compute the probability of a tagged sentence in a hidden Markov model
- simulate a part-of-speech tagger based on the multi-class perceptron
- simulate the Viterbi algorithm (advanced)
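The HMM-related procedures can be sketched on a toy model. The transition and emission probabilities below are hypothetical, chosen only to make the example small; the Viterbi recursion itself is standard dynamic programming over log probabilities.

```python
import math

# Toy HMM (hypothetical probabilities, not from the course materials).
tags = ["N", "V"]
# Transition probabilities P(tag_i | tag_{i-1}); <s> is the start state.
trans = {("<s>", "N"): 0.8, ("<s>", "V"): 0.2,
         ("N", "N"): 0.3, ("N", "V"): 0.7,
         ("V", "N"): 0.8, ("V", "V"): 0.2}
# Emission probabilities P(word | tag).
emit = {("N", "fish"): 0.6, ("N", "swim"): 0.4,
        ("V", "fish"): 0.3, ("V", "swim"): 0.7}

def sentence_logprob(words, tag_seq):
    """Log probability of a tagged sentence under the HMM."""
    lp, prev = 0.0, "<s>"
    for w, t in zip(words, tag_seq):
        lp += math.log(trans[prev, t]) + math.log(emit[t, w])
        prev = t
    return lp

def viterbi(words):
    """Most probable tag sequence, by the Viterbi dynamic programme."""
    # best[t] = (log prob of the best path ending in tag t, that path)
    best = {t: (math.log(trans["<s>", t]) + math.log(emit[t, words[0]]), [t])
            for t in tags}
    for w in words[1:]:
        best = {t: max(((lp + math.log(trans[p, t]) + math.log(emit[t, w]),
                         path + [t])
                        for p, (lp, path) in best.items()),
                       key=lambda x: x[0])
                for t in tags}
    return max(best.values(), key=lambda x: x[0])[1]

print(viterbi(["fish", "swim"]))  # expected: ['N', 'V']
```

Note that the Viterbi score of the returned sequence equals its probability under `sentence_logprob`; the algorithm just avoids enumerating all |tags|^n candidate sequences.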

### Topic 4: Syntactic analysis

Syntactic analysis, also called syntactic parsing, is the task of mapping a sentence to a formal representation of its syntactic structure. In this lecture you will learn about parsing to two target representations: phrase structure trees and dependency trees. The central model for parsing to phrase structure trees is the probabilistic context-free grammar. For parsing to dependency trees, you will learn about the transition-based dependency parsing algorithm, which is also used in Google's SyntaxNet parser.

#### Materials

- Slides: Syntactic analysis
- Formal Grammars of English, chapter 11 of Jurafsky and Martin (2017), sections 11.1–11.2, 11.4
- Statistical Parsing, chapter 13 of Jurafsky and Martin (2017), sections 13.1–13.1.1
- Dependency Parsing, chapter 14 of Jurafsky and Martin (2017), sections 14.1–14.4.1

#### Materials (advanced)

The advanced material for this section is the CKY algorithm for computing the most probable parse tree for a sentence under a probabilistic context-free grammar.

- Syntactic Parsing, chapter 12 of Jurafsky and Martin (2017), section 12.2
- Statistical Parsing, chapter 13 of Jurafsky and Martin (2017), section 13.2

#### Content

After this module you should be able to explain and apply the following concepts:

- phrase structure tree, dependency tree
- probabilistic context-free grammar
- transition-based dependency parser
- CKY algorithm for probabilistic context-free grammars (advanced)
- extraction of semantic relations from dependency trees (advanced)

After this module you should be able to perform the following procedures:

- learn a probabilistic context-free grammar from a treebank
- simulate a transition-based dependency parser
- simulate the CKY algorithm for probabilistic context-free grammars (advanced)
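As a reference for simulating the CKY algorithm, here is a minimal sketch for a probabilistic context-free grammar in Chomsky normal form. The grammar and its rule probabilities are hypothetical toy data; the chart cell for span (i, j) stores, for each non-terminal, the log probability of the best subtree over words[i:j].

```python
import math

# Toy PCFG in Chomsky normal form (hypothetical rules and probabilities).
# Binary rules: (lhs, rhs1, rhs2) -> probability.
binary = {("S", "NP", "VP"): 1.0,
          ("NP", "Det", "N"): 1.0,
          ("VP", "V", "NP"): 1.0}
# Lexical rules: (lhs, word) -> probability.
lexical = {("Det", "the"): 1.0,
           ("N", "dog"): 0.5, ("N", "cat"): 0.5,
           ("V", "sees"): 1.0}

def cky(words):
    """Probabilistic CKY: log probability of the best parse with root S."""
    n = len(words)
    # chart[i][j] maps a non-terminal to its best log prob over words[i:j].
    chart = [[dict() for _ in range(n + 1)] for _ in range(n + 1)]
    # Fill length-1 spans from the lexical rules.
    for i, w in enumerate(words):
        for (a, word), p in lexical.items():
            if word == w:
                chart[i][i + 1][a] = math.log(p)
    # Combine shorter spans bottom-up via the binary rules.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for (a, b, c), p in binary.items():
                    if b in chart[i][k] and c in chart[k][j]:
                        lp = math.log(p) + chart[i][k][b] + chart[k][j][c]
                        if lp > chart[i][j].get(a, float("-inf")):
                            chart[i][j][a] = lp
    return chart[0][n].get("S")

print(cky(["the", "dog", "sees", "the", "cat"]))
```

For the toy sentence the best parse has probability 0.5 × 0.5 = 0.25, the product of the two lexical choices for N; all other rules in this grammar have probability 1.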

### Topic 5: Semantic analysis

In this lecture you will learn about word senses and the problems they pose for language technology, as well as about two important tasks in semantic analysis: word sense disambiguation and word similarity. For each task you will learn about both knowledge-based and data-driven methods, including the popular continuous bag-of-words model used in Google’s word2vec software.

#### Materials

- Slides: Semantic analysis
- Computing with Word Senses, chapter 17 in Jurafsky and Martin (2017) (excluding 17.7–17.9)

#### Content

After this module you should be able to explain and apply the following concepts:

- word sense, homonymy, polysemy
- synonymy, antonymy, hyponymy, hypernymy, WordNet
- Simplified Lesk algorithm
- word similarity, distributional hypothesis, co-occurrence matrix

After this module you should be able to perform the following procedures:

- simulate the Simplified Lesk algorithm
- compute the path length-based similarity of two words
- derive a co-occurrence matrix from a document collection
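The Simplified Lesk algorithm can be sketched compactly. The sense inventory below consists of hypothetical WordNet-style glosses invented for this example (real applications would draw glosses from WordNet); the algorithm itself is just the overlap-counting rule: pick the sense whose gloss shares the most words with the context.

```python
# Toy sense inventory (hypothetical glosses, not real WordNet data).
senses = {
    "bank#1": "sloping land beside a body of water river",
    "bank#2": "financial institution that accepts deposits money",
}

def simplified_lesk(word_senses, context):
    """Simplified Lesk: pick the sense whose gloss overlaps most with the context."""
    context_words = set(context)
    def overlap(sense):
        # Number of distinct words shared by the gloss and the context.
        return len(set(word_senses[sense].split()) & context_words)
    return max(word_senses, key=overlap)

print(simplified_lesk(senses, ["deposit", "money", "into", "the", "bank"]))
# expected: bank#2
```

Because the method only counts surface overlaps, normalisation of the glosses and context (lowercasing, lemmatisation, stop-word removal) strongly affects its accuracy in practice.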

Page responsible: Marco Kuhlmann

Last updated: 2018-01-15