This page links to the video lectures and to the study materials for the interactive sessions (notebooks and additional reading), and lists the central concepts, models, and algorithms that you are expected to master after each unit.
Course introduction
Welcome to the course! This unit introduces you to natural language processing and to written language as a type of data, presents the course logistics, and reviews basic concepts from linguistics and machine learning. You will also learn how to implement a simple sentiment classifier based on the bag-of-words representation and softmax regression.
Teaching session
Video lectures (review)
Reading
- Eisenstein (2019), chapter 1 and sections 2.1, 2.5–2.7, 4.3
- Detailed information about the course logistics is available on this website.
Concepts, models, and algorithms
- search and learning, Zipf’s law, Heaps’ law
- lexeme, lemma, part-of-speech, dependency tree, synonym
- tokenization, vocabulary, word token, word type, normalization
- sentiment analysis, bag-of-words
- softmax regression, cross-entropy loss, gradient-based optimization
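The sentiment classifier mentioned in the unit description can be sketched in a few lines: a bag-of-words representation fed into softmax regression, trained with gradient descent on the cross-entropy loss. The toy corpus, learning rate, and epoch count below are invented for illustration; this is a minimal sketch, not the course's reference implementation.

```python
import math
from collections import Counter

# Toy corpus invented for this sketch: (document, label); 0 = negative, 1 = positive.
train_data = [
    ("good great fun", 1),
    ("great acting good plot", 1),
    ("boring bad plot", 0),
    ("bad awful boring", 0),
]

# Vocabulary = the set of word types in the training data.
vocab = sorted({w for doc, _ in train_data for w in doc.split()})

def bow(doc):
    """Map a document to its bag-of-words count vector."""
    counts = Counter(doc.split())
    return [counts.get(w, 0) for w in vocab]

def softmax(zs):
    m = max(zs)                      # shift for numerical stability
    exps = [math.exp(z - m) for z in zs]
    s = sum(exps)
    return [e / s for e in exps]

K, V = 2, len(vocab)                 # number of classes, vocabulary size
W = [[0.0] * V for _ in range(K)]    # one weight row per class
b = [0.0] * K

def predict_proba(x):
    zs = [sum(W[k][i] * x[i] for i in range(V)) + b[k] for k in range(K)]
    return softmax(zs)

# Gradient descent on the cross-entropy loss; for softmax regression,
# the gradient of the loss w.r.t. the logit z_k is simply p_k - [k == y].
lr = 0.1
for epoch in range(100):
    for doc, y in train_data:
        x = bow(doc)
        p = predict_proba(x)
        for k in range(K):
            err = p[k] - (1.0 if k == y else 0.0)
            for i in range(V):
                W[k][i] -= lr * err * x[i]
            b[k] -= lr * err

def predict(doc):
    p = predict_proba(bow(doc))
    return max(range(K), key=lambda k: p[k])
```

Note that unseen words are simply ignored at prediction time, since the bag-of-words vector is indexed by the training vocabulary.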
Unit 1: Word representations
To process words using neural networks, we need to represent them as vectors of numerical values. In this unit you will learn different methods for learning these representations from data, including the widely-used skip-gram model. The unit also introduces the idea of subword representations, and in particular character-level representations, which can be learned using convolutional neural networks.
Video lectures and quizzes
- 1.1 Introduction to word representations (slides, video, quiz)
- 1.2 Learning word embeddings via singular value decomposition (slides, video, quiz)
- 1.3 Learning word embeddings with neural networks (slides, video, quiz, notebook, live notebook)
- 1.4 The skip-gram model (slides, video, quiz)
- 1.5 Subword models (slides, video, quiz)
- 1.6 Convolutional neural networks (slides, video, quiz)
Reading
Concepts, models, and algorithms
- one-hot vectors, word embeddings, distributional hypothesis, co-occurrence matrix
- truncated singular value decomposition, positive pointwise mutual information
- embedding layers, continuous bag-of-words classifier, representation learning, transfer learning
- skip-gram model, negative sampling
- word piece tokenization, byte pair encoding algorithm, character-level word representations, word dropout
- convolutional neural network, CNN architecture for text classification
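The count-based pipeline from this unit (co-occurrence matrix → positive PMI → truncated SVD) fits in a short sketch. The toy corpus, window size, and embedding dimensionality below are invented for illustration, assuming NumPy is available:

```python
import numpy as np

# Toy corpus invented for this sketch; co-occurrence is counted
# within a symmetric window of size 1.
corpus = [
    "cat sat on mat".split(),
    "dog sat on rug".split(),
    "cat chased dog".split(),
]
vocab = sorted({w for sent in corpus for w in sent})
idx = {w: i for i, w in enumerate(vocab)}
V = len(vocab)

# Co-occurrence matrix: C[i, j] = how often word j appears next to word i.
C = np.zeros((V, V))
for sent in corpus:
    for pos, w in enumerate(sent):
        for ctx in sent[max(0, pos - 1):pos] + sent[pos + 1:pos + 2]:
            C[idx[w], idx[ctx]] += 1

# Positive pointwise mutual information:
# PPMI(i, j) = max(0, log P(i, j) / (P(i) P(j))).
total = C.sum()
Pij = C / total
Pi = C.sum(axis=1, keepdims=True) / total
Pj = C.sum(axis=0, keepdims=True) / total
with np.errstate(divide="ignore", invalid="ignore"):
    pmi = np.log(Pij / (Pi * Pj))
ppmi = np.where(np.isfinite(pmi), np.maximum(pmi, 0.0), 0.0)

# Truncated SVD: keep the top-d dimensions as word embeddings.
d = 2
U, S, Vt = np.linalg.svd(ppmi)
embeddings = U[:, :d] * S[:d]    # one d-dimensional vector per word type
```

In practice the window size, the truncation rank d, and variants of the PPMI weighting are all hyperparameters; the values here are only meant to keep the example small.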
Unit 2: Language modelling
Language modelling is the task of predicting which word comes next in a sequence of words. This unit presents two types of language models: n-gram models and neural models, with a focus on models based on recurrent neural networks. You will also learn how these language models can be used to learn more powerful, contextualized word representations.
Lectures
- 2.1 Introduction to language modelling (slides, video, quiz)
- 2.2 N-gram language models (slides, video, quiz)
- 2.3 Recurrent neural networks (slides, video, quiz, live notebook)
- 2.4 Long Short-Term Memory (LSTM) networks (slides, video, quiz)
- 2.5 Recurrent neural network language models (slides, video, quiz, live notebook)
- 2.6 Contextualized word embeddings (slides, video, quiz)
Reading
Concepts, models, and algorithms
- language modelling as a prediction task and as a probability model, perplexity
- n-gram language models, maximum likelihood estimation, smoothing, interpolation
- recurrent neural networks, backpropagation through time, encoder/transducer/decoder architectures
- Long Short-Term Memory (LSTM) architecture, gating mechanism
- fixed-window language model, recurrent language model
- polysemy, contextualized word embeddings, bidirectional LSTM, ELMo architecture
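Several of the n-gram concepts above (maximum likelihood estimation, interpolation, perplexity) can be made concrete in a minimal bigram model. The toy corpus and the interpolation weight `lam` are invented for this sketch:

```python
import math
from collections import Counter

# Toy training corpus invented for this sketch,
# with sentence-boundary markers <s> and </s>.
sentences = [
    "<s> the cat sat </s>",
    "<s> the dog sat </s>",
    "<s> the cat ran </s>",
]

unigrams = Counter()
bigrams = Counter()
for s in sentences:
    ws = s.split()
    unigrams.update(ws)                 # unigram counts
    bigrams.update(zip(ws, ws[1:]))     # bigram counts within the sentence
N = sum(unigrams.values())

def p_interp(w, prev, lam=0.8):
    """Interpolated bigram probability:
    lam * P_MLE(w | prev) + (1 - lam) * P_MLE(w)."""
    p_bi = bigrams[(prev, w)] / unigrams[prev] if unigrams[prev] else 0.0
    p_uni = unigrams[w] / N
    return lam * p_bi + (1 - lam) * p_uni

def perplexity(sentence):
    """Perplexity = exp of the average negative log-probability per predicted token."""
    ws = sentence.split()
    logp = sum(math.log(p_interp(w, prev)) for prev, w in zip(ws, ws[1:]))
    return math.exp(-logp / (len(ws) - 1))
```

Interpolating with the unigram distribution is one simple smoothing scheme; it keeps seen-context probabilities high while reserving some mass for bigrams that never occurred in training.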
Unit 3: Large language models
Machine translation is one of the classical problems in artificial intelligence. In this unit you will learn about neural machine translation and one of its standard models, the encoder–decoder architecture. A crucial ingredient in this architecture is the attention mechanism. Attention is also the key to some of the most influential recent developments in NLP, in particular the Transformer architecture, which we cover in the last lectures of this unit.
Lectures
- 3.1 Introduction to machine translation (slides, video, quiz)
- 3.2 Neural machine translation (slides, video, quiz)
- 3.3 Attention (slides, video, quiz)
- 3.4 The Transformer architecture (slides, video, quiz)
- 3.5 Decoder-based language models (GPT) (slides, video, quiz)
- 3.6 Encoder-based language models (BERT) (slides, video, quiz)
Reading
- Eisenstein (2019), chapter 18
Concepts, models, and algorithms
- interlingual machine translation, noisy channel model, word alignments, BLEU score
- encoder–decoder architecture
- recency bias, attention, Bahdanau attention, scaled dot-product attention, multi-head attention
- Transformer architecture, self-attention
- GPT, pre-training and fine-tuning, zero-shot behaviour
- BERT, masked language modelling task, next sentence prediction task
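Scaled dot-product attention, the building block of the Transformer's multi-head self-attention, can be sketched directly from its formula, Attention(Q, K, V) = softmax(QKᵀ/√d_k)V. The shapes and random toy inputs below are invented for illustration, assuming NumPy is available:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (n_queries, n_keys) similarity scores
    # Softmax over the key dimension, shifted for numerical stability.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights          # weighted sum of values, plus the weights

# Self-attention: queries, keys, and values are all projections
# of the same input sequence.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))              # a toy "sentence": 4 positions, dim 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out, weights = scaled_dot_product_attention(X @ Wq, X @ Wk, X @ Wv)
```

Each output position is a convex combination of the value vectors, with the attention weights for that position summing to one. Multi-head attention runs several such maps in parallel with different projections and concatenates the results.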
Unit 4: Sequence labelling
Sequence labelling is the task of assigning a class label to each item in an input sequence. Many tasks in natural language processing can be cast as sequence labelling problems over different sets of output labels, including part-of-speech tagging, word segmentation, and named entity recognition. This unit introduces several models for sequence labelling, with both local and global search.
Lectures
- 4.1 Introduction to sequence labelling (slides, video, quiz)
- 4.2 Sequence labelling with local search (slides, video, quiz)
- 4.3 Part-of-speech tagging with the perceptron (slides, video, quiz)
- 4.4 The perceptron learning algorithm (slides, video, quiz)
- 4.5 Sequence labelling with global search (slides, video, quiz)
- 4.6 The Viterbi algorithm (slides, video, quiz)
Reading
- Eisenstein (2019), chapters 7–8, sections 2.3.1–2.3.2
- Daumé, A Course in Machine Learning, section 4.6 (link)
Concepts, models, and algorithms
- different types of sequence labelling tasks: tagging, segmentation, bracketing; accuracy, precision, recall
- fixed-window model and bidirectional RNN model for sequence labelling, autoregressive models, teacher forcing
- perceptron, features in part-of-speech tagging, feature templates
- perceptron learning algorithm, averaged perceptron
- Maximum Entropy Markov Model (MEMM), label bias problem, Conditional Random Field (CRF)
- Viterbi algorithm, backpointers
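The Viterbi algorithm with backpointers can be sketched compactly. The two-tag model and all probabilities below are made up for this example; the point is the dynamic program, which works the same way over the scores of an HMM, an MEMM, or a CRF:

```python
import math

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the highest-probability state sequence for `obs`,
    computed in log-space with backpointers."""
    # Initialise the chart with the first observation.
    V = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = []
    for o in obs[1:]:
        scores, ptrs = {}, {}
        for s in states:
            # Best predecessor state for s at this position.
            best = max(states, key=lambda p: V[-1][p] + math.log(trans_p[p][s]))
            scores[s] = (V[-1][best] + math.log(trans_p[best][s])
                         + math.log(emit_p[s][o]))
            ptrs[s] = best               # backpointer: where the best path came from
        V.append(scores)
        back.append(ptrs)
    # Follow the backpointers from the best final state.
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for ptrs in reversed(back):
        path.append(ptrs[path[-1]])
    return list(reversed(path))

# Hypothetical toy tagger: two tags, invented probabilities.
states = ["N", "V"]
start_p = {"N": 0.7, "V": 0.3}
trans_p = {"N": {"N": 0.3, "V": 0.7}, "V": {"N": 0.6, "V": 0.4}}
emit_p = {"N": {"fish": 0.6, "sleep": 0.1, "cats": 0.3},
          "V": {"fish": 0.3, "sleep": 0.6, "cats": 0.1}}
```

The chart has one cell per (position, state) pair, so the runtime is linear in the sequence length and quadratic in the number of states.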
Unit 5: Syntactic analysis
Syntactic analysis, also called syntactic parsing, is the task of mapping a sentence to a formal representation of its syntactic structure. In this unit you will learn about two approaches to dependency parsing, where the target representations take the form of dependency trees: the Eisner algorithm, which casts dependency parsing as combinatorial optimisation over graphs, and transition-based dependency parsing, which builds the tree incrementally through a sequence of classifier-guided transitions and is the approach behind parsers such as Google's SyntaxNet.
Lectures
- 5.1 Introduction to dependency parsing (slides, video, quiz)
- 5.2 The arc-standard algorithm (slides, video, quiz)
- 5.3 The Eisner algorithm (slides, video, quiz)
- 5.4 Neural architectures for dependency parsing (slides, video, quiz)
- 5.5 Dynamic oracles (slides, video, quiz)
(Note that there is no Lecture 5.6.)
Reading
- Eisenstein (2019), chapter 11
Concepts, models, and algorithms
- dependency tree, head, dependent, graph-based parsing, transition-based parsing
- arc-standard algorithm, projective/non-projective dependency trees, static oracle
- Eisner algorithm, backpointers
- parsing architectures of Chen and Manning (2014), Kiperwasser and Goldberg (2016), Dozat and Manning (2017)
- dynamic oracle, transition cost, arc reachability, arc-hybrid algorithm
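The arc-standard transition system can be sketched as a small state machine over a stack and a buffer, with three transitions: SHIFT, LEFT-ARC, and RIGHT-ARC. The example sentence and its gold transition sequence below are invented for this sketch; in a real parser the transitions are predicted by a classifier rather than given:

```python
def arc_standard(words, transitions):
    """Run a sequence of arc-standard transitions over `words`
    (word 0 is an artificial root) and return the set of (head, dependent) arcs."""
    stack, buffer, arcs = [0], list(range(1, len(words))), set()
    for t in transitions:
        if t == "SH":                  # SHIFT: move the next buffer word to the stack
            stack.append(buffer.pop(0))
        elif t == "LA":                # LEFT-ARC: topmost becomes head of second-topmost
            dep = stack.pop(-2)
            arcs.add((stack[-1], dep))
        elif t == "RA":                # RIGHT-ARC: second-topmost becomes head of topmost
            dep = stack.pop()
            arcs.add((stack[-1], dep))
    return arcs

# Invented example: "the" depends on "cat", "cat" on "sleeps",
# and "sleeps" on the artificial root.
words = ["<root>", "the", "cat", "sleeps"]
arcs = arc_standard(words, ["SH", "SH", "LA", "SH", "LA", "RA"])
```

A static oracle derives one such gold transition sequence from a (projective) gold tree; a dynamic oracle instead scores every transition in every reachable configuration, which is what makes training on the parser's own mistakes possible.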