NLP: Natural Language Processing


This page links to the video lectures and study materials for the interactive sessions (notebooks and additional reading) and lists the central concepts, models, and algorithms that you are expected to master after each unit.

Course introduction

Welcome to the course! This unit introduces you to natural language processing and to written language as a type of data, presents the course logistics, and reviews basic concepts from linguistics and machine learning. You will also learn how to implement a simple sentiment classifier based on the bag-of-words representation and softmax regression.
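
As a first concrete illustration, here is a minimal sketch of such a classifier in plain NumPy; the toy documents, labels, learning rate, and number of epochs are invented for illustration and are not part of the course material.

    # A minimal sketch of a bag-of-words sentiment classifier trained with
    # softmax regression and the cross-entropy loss.
    import numpy as np

    # Hypothetical toy data: documents with binary sentiment labels.
    docs = [("good great fun", 1), ("boring bad", 0),
            ("great plot good acting", 1), ("bad boring waste", 0)]

    # Build the vocabulary (word type -> index).
    vocab = {w: i for i, w in enumerate(sorted({w for d, _ in docs for w in d.split()}))}

    def bag_of_words(doc):
        """Map a document to a vector of word counts over the vocabulary."""
        x = np.zeros(len(vocab))
        for w in doc.split():
            x[vocab[w]] += 1
        return x

    X = np.stack([bag_of_words(d) for d, _ in docs])     # shape: (N, V)
    Y = np.array([y for _, y in docs])                   # shape: (N,)

    num_classes = 2
    W = np.zeros((len(vocab), num_classes))              # weights
    b = np.zeros(num_classes)                            # biases

    def softmax(z):
        z = z - z.max(axis=-1, keepdims=True)            # numerical stability
        e = np.exp(z)
        return e / e.sum(axis=-1, keepdims=True)

    # Gradient-based minimisation of the cross-entropy loss.
    lr = 0.1
    for epoch in range(100):
        P = softmax(X @ W + b)                           # predicted class probabilities
        onehot = np.eye(num_classes)[Y]
        grad = P - onehot                                # dL/dlogits for cross-entropy
        W -= lr * X.T @ grad / len(docs)
        b -= lr * grad.mean(axis=0)

    print(softmax(bag_of_words("good fun") @ W + b))     # probabilities for a new document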

Lectures

Reading

  • Eisenstein (2019), chapter 1 and sections 2.1, 2.5–2.7, 4.3
  • Detailed information about the course logistics is available on this website.

Concepts, models, and algorithms

  • search and learning, Zipf’s law, Heaps’ law
  • lexeme, lemma, part-of-speech, dependency tree, synonym
  • tokenization, vocabulary, word token, word type, normalization
  • sentiment analysis, bag-of-words
  • softmax regression, cross-entropy loss, gradient-based optimization

Unit 1: Word representations

To process words using neural networks, we need to represent them as vectors of numerical values. In this unit you will learn about different methods for learning these representations from data, including the widely used skip-gram model. The unit also introduces the idea of subword representations, and in particular character-level representations, which can be learned using convolutional neural networks.
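
For orientation, the sketch below shows one way to implement the skip-gram training objective with negative sampling in PyTorch; the vocabulary size, embedding dimension, negative-sampling distribution (uniform here, rather than the unigram-based distribution usually used in practice), and the random batch are all illustrative assumptions, not the course's reference implementation.

    # A minimal sketch of the skip-gram objective with negative sampling.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    vocab_size, emb_dim = 1000, 50
    target_emb = nn.Embedding(vocab_size, emb_dim)    # "input" word vectors
    context_emb = nn.Embedding(vocab_size, emb_dim)   # "output" word vectors

    def sgns_loss(targets, contexts, k=5):
        """Negative-sampling loss for a batch of (target, context) index pairs."""
        t = target_emb(targets)                                   # (B, D)
        c = context_emb(contexts)                                 # (B, D)
        pos = F.logsigmoid((t * c).sum(-1))                       # observed pairs
        # Sample k negative context words per pair (uniformly, for simplicity).
        neg_ids = torch.randint(0, vocab_size, (targets.size(0), k))
        n = context_emb(neg_ids)                                  # (B, k, D)
        neg = F.logsigmoid(-(n @ t.unsqueeze(-1)).squeeze(-1)).sum(-1)
        return -(pos + neg).mean()

    # One gradient step on a fake batch of (target, context) pairs.
    opt = torch.optim.SGD(list(target_emb.parameters()) + list(context_emb.parameters()), lr=0.05)
    targets = torch.randint(0, vocab_size, (32,))
    contexts = torch.randint(0, vocab_size, (32,))
    opt.zero_grad()
    loss = sgns_loss(targets, contexts)
    loss.backward()
    opt.step()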

Lectures

Reading

Concepts, models, and algorithms

  • one-hot vectors, word embeddings, distributional hypothesis, co-occurrence matrix
  • truncated singular value decomposition, positive pointwise mutual information
  • embedding layers, continuous bag-of-words classifier, representation learning, transfer learning
  • skip-gram model, negative sampling
  • word piece tokenization, byte pair encoding algorithm (see the sketch after this list), character-level word representations, word dropout
  • convolutional neural network, CNN architecture for text classification
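
The byte pair encoding algorithm in the list above can be summarised as: repeatedly merge the most frequent pair of adjacent symbols. The following is a toy sketch over a handful of made-up word frequencies, not a production tokenizer.

    # Toy byte pair encoding: learn merge operations from word frequencies.
    from collections import Counter

    # Hypothetical corpus: word -> frequency, with words split into characters
    # plus an end-of-word marker.
    vocab = {("l", "o", "w", "</w>"): 5,
             ("l", "o", "w", "e", "r", "</w>"): 2,
             ("n", "e", "w", "e", "s", "t", "</w>"): 6,
             ("w", "i", "d", "e", "s", "t", "</w>"): 3}

    def most_frequent_pair(vocab):
        """Count all adjacent symbol pairs, weighted by word frequency."""
        pairs = Counter()
        for word, freq in vocab.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        return pairs.most_common(1)[0][0]

    def merge_pair(vocab, pair):
        """Replace every occurrence of the pair with a single merged symbol."""
        merged = {}
        for word, freq in vocab.items():
            out, i = [], 0
            while i < len(word):
                if i < len(word) - 1 and (word[i], word[i + 1]) == pair:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            merged[tuple(out)] = freq
        return merged

    merges = []
    for _ in range(10):                      # learn 10 merge operations
        pair = most_frequent_pair(vocab)
        merges.append(pair)
        vocab = merge_pair(vocab, pair)

    print(merges[:5])                        # e.g. ('e', 's'), ('es', 't'), ...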

Unit 2: Language modelling

Language modelling is the task of predicting which word comes next in a sequence of words. This unit presents two types of language models: n-gram models and neural models, with a focus on models based on recurrent neural networks. You will also learn how these language models can be used to learn more powerful, contextualized word representations.
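
To make the n-gram side concrete, here is a minimal sketch of a bigram language model with add-one-smoothed relative-frequency estimates, evaluated by perplexity; the toy corpus is invented, and add-one smoothing is just one of the smoothing options covered in the reading.

    # Toy bigram language model with add-one (Laplace) smoothing and perplexity.
    import math
    from collections import Counter

    train = ["the cat sat", "the dog sat", "the cat ran"]
    test = ["the dog ran"]

    def bigrams(sentence):
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        return list(zip(tokens, tokens[1:]))

    unigram_counts = Counter()
    bigram_counts = Counter()
    for sentence in train:
        for u, v in bigrams(sentence):
            unigram_counts[u] += 1
            bigram_counts[(u, v)] += 1

    vocab = set(unigram_counts) | {"</s>"}
    V = len(vocab)

    def prob(u, v):
        """Add-one smoothed estimate of P(v | u)."""
        return (bigram_counts[(u, v)] + 1) / (unigram_counts[u] + V)

    def perplexity(sentences):
        log_prob, n_tokens = 0.0, 0
        for sentence in sentences:
            for u, v in bigrams(sentence):
                log_prob += math.log(prob(u, v))
                n_tokens += 1
        return math.exp(-log_prob / n_tokens)

    print(perplexity(test))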

Lectures

Reading

Concepts, models, and algorithms

  • language modelling as a prediction task and as a probability model, perplexity
  • n-gram language models, maximum likelihood estimation, smoothing, interpolation
  • recurrent neural networks, backpropagation through time, encoder/transducer/decoder architectures
  • Long Short-Term Memory (LSTM) architecture, gating mechanism
  • fixed-window language model, recurrent language model (see the sketch after this list)
  • polysemy, contextualized word embeddings, bidirectional LSTM, ELMo architecture
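
As a complement to the list above, the following is a minimal sketch of a recurrent (LSTM-based) language model in PyTorch; the vocabulary size, dimensions, and the random batch of token indices are placeholders, and a real model would be trained over a corpus rather than a single fake batch.

    # A minimal recurrent (LSTM) language model: predict the next token at each position.
    import torch
    import torch.nn as nn

    class RNNLanguageModel(nn.Module):
        def __init__(self, vocab_size, emb_dim=64, hidden_dim=128):
            super().__init__()
            self.embedding = nn.Embedding(vocab_size, emb_dim)
            self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
            self.output = nn.Linear(hidden_dim, vocab_size)

        def forward(self, token_ids):                  # token_ids: (batch, seq_len)
            hidden_states, _ = self.lstm(self.embedding(token_ids))
            return self.output(hidden_states)          # logits: (batch, seq_len, vocab)

    vocab_size = 1000
    model = RNNLanguageModel(vocab_size)
    batch = torch.randint(0, vocab_size, (8, 20))      # fake batch of token indices

    # Train to predict token t+1 from tokens up to t.
    logits = model(batch[:, :-1])
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, vocab_size), batch[:, 1:].reshape(-1))
    loss.backward()
    print(float(loss))                                 # perplexity would be exp(loss)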

Unit 3: Sequence labelling

Sequence labelling is the task of assigning a class label to each item in an input sequence. Many tasks in natural language processing can be cast as sequence labelling problems over different sets of output labels, including part-of-speech tagging, word segmentation, and named entity recognition. This unit introduces several models for sequence labelling, using both local and global search.
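
To illustrate what "different sets of output labels" means in practice, the sketch below encodes named entity recognition as one label per token using a BIO scheme; the sentence and tag inventory are only an example.

    # Casting named entity recognition as sequence labelling with a BIO tag set:
    # each token gets exactly one label, and entity spans are read off the labels.
    tokens = ["United", "Nations", "official", "Ekeus", "heads", "for", "Baghdad"]
    labels = ["B-ORG",  "I-ORG",   "O",        "B-PER", "O",     "O",   "B-LOC"]

    def decode_spans(tokens, labels):
        """Recover (entity_type, text) spans from a BIO-labelled sequence."""
        spans, current = [], None
        for token, label in zip(tokens, labels):
            if label.startswith("B-"):
                if current:
                    spans.append(current)
                current = (label[2:], [token])
            elif label.startswith("I-") and current and current[0] == label[2:]:
                current[1].append(token)
            else:
                if current:
                    spans.append(current)
                current = None
        if current:
            spans.append(current)
        return [(etype, " ".join(words)) for etype, words in spans]

    print(decode_spans(tokens, labels))
    # [('ORG', 'United Nations'), ('PER', 'Ekeus'), ('LOC', 'Baghdad')]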

Lectures

  • 3.1 Introduction to sequence labelling (slides, video)
  • 3.2 Sequence labelling with local search (slides, video)
  • 3.3 Part-of-speech tagging with the perceptron (slides, video)
  • 3.4 The perceptron learning algorithm (slides, video)
  • 3.5 Sequence labelling with global search (slides, video)
  • 3.6 The Viterbi algorithm (slides, video)

Reading

  • Eisenstein (2019), chapters 7–8, sections 2.3.1–2.3.2
  • Daumé, A Course in Machine Learning, section 4.6 (link)

Concepts, models, and algorithms

  • different types of sequence labelling tasks: tagging, segmentation, bracketing; accuracy, precision, recall
  • fixed-window model and bidirectional RNN model for sequence labelling, autoregressive models, teacher forcing
  • perceptron, features in part-of-speech tagging, feature templates
  • perceptron learning algorithm, averaged perceptron
  • Maximum Entropy Markov Model (MEMM), label bias problem, Conditional Random Field (CRF)
  • Viterbi algorithm, backpointers
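
Since the Viterbi algorithm with backpointers is central to this unit, here is a minimal sketch over made-up additive scores (for instance log-probabilities or linear model scores); the score matrices, tag set size, and sentence length are placeholders.

    # Viterbi decoding with backpointers: find the highest-scoring tag sequence
    # under additive per-position scores.
    import numpy as np

    def viterbi(emission, transition, initial):
        """emission: (T, K) scores, transition: (K, K), initial: (K,)."""
        T, K = emission.shape
        score = np.full((T, K), -np.inf)
        backpointer = np.zeros((T, K), dtype=int)
        score[0] = initial + emission[0]
        for t in range(1, T):
            # candidates[i, j]: score of being in tag i at t-1 and moving to tag j.
            candidates = score[t - 1][:, None] + transition + emission[t][None, :]
            backpointer[t] = candidates.argmax(axis=0)
            score[t] = candidates.max(axis=0)
        # Follow the backpointers from the best final tag.
        best = [int(score[-1].argmax())]
        for t in range(T - 1, 0, -1):
            best.append(int(backpointer[t, best[-1]]))
        return best[::-1], float(score[-1].max())

    # Made-up scores for a 4-token sentence over 3 tags.
    rng = np.random.default_rng(0)
    tags, best_score = viterbi(rng.normal(size=(4, 3)),
                               rng.normal(size=(3, 3)),
                               rng.normal(size=(3,)))
    print(tags, best_score)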

Unit 4: Syntactic analysis

Syntactic analysis, also called syntactic parsing, is the task of mapping a sentence to a formal representation of its syntactic structure. In this unit you will learn about two approaches to dependency parsing, where the target representations take the form of dependency trees: the Eisner algorithm, which casts dependency parsing as combinatorial optimisation over graphs, and transition-based dependency parsing, which is also the approach used in Google's parsers.
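
For reference, a dependency tree is commonly represented as one head index (and relation label) per word; the sentence and labels below are an invented example in the style of Universal Dependencies.

    # A dependency tree as one (head, relation) pair per token; index 0 is the root.
    tokens = ["<root>", "She", "reads", "long", "books"]
    heads  = [0,        2,     0,       4,      2]        # head index of each token
    labels = ["_",      "nsubj", "root", "amod", "obj"]

    # Print each dependency arc: head -> dependent (relation).
    for i in range(1, len(tokens)):
        print(f"{tokens[heads[i]]} -> {tokens[i]} ({labels[i]})")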

Lectures

Reading

  • Eisenstein (2019), chapter 11

Concepts, models, and algorithms

  • dependency tree, head, dependent, graph-based parsing, transition-based parsing
  • arc-standard algorithm (see the sketch after this list), projective/non-projective dependency trees, static oracle
  • Eisner algorithm, backpointers
  • parsing architectures of Chen and Manning (2014), Kiperwasser and Goldberg (2016), Dozat and Manning (2017)
  • dynamic oracle, transition cost, arc reachability, arc-hybrid algorithm
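
As a complement to the list above, here is a minimal sketch of the three arc-standard transitions for unlabelled dependency parsing, with the parser configuration given by a stack, a buffer, and the set of collected arcs; the example is driven by a hard-coded transition sequence rather than a trained classifier or oracle.

    # Arc-standard transition system (unlabelled): a configuration is a stack,
    # a buffer, and the set of arcs collected so far.
    def shift(stack, buffer, arcs):
        stack.append(buffer.pop(0))

    def left_arc(stack, buffer, arcs):
        # Second-topmost stack item becomes a dependent of the topmost.
        dependent = stack.pop(-2)
        arcs.append((stack[-1], dependent))          # (head, dependent)

    def right_arc(stack, buffer, arcs):
        # Topmost stack item becomes a dependent of the second-topmost.
        dependent = stack.pop(-1)
        arcs.append((stack[-1], dependent))

    # Parse "She reads books" (token indices 1..3, 0 = artificial root) with a
    # hard-coded transition sequence; a real parser would predict each transition.
    stack, buffer, arcs = [0], [1, 2, 3], []
    for transition in [shift, shift, left_arc, shift, right_arc, right_arc]:
        transition(stack, buffer, arcs)

    print(arcs)   # [(2, 1), (2, 3), (0, 2)]: She <- reads -> books, root -> reads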

Unit 5: Machine translation & current research

Machine translation is one of the classical problems in artificial intelligence. In this unit you will learn about neural machine translation and one of its standard models, the encoder–decoder architecture. A crucial ingredient in this architecture is the mechanism of attention. Attention is also the key to one of the most important recent developments in NLP, the Transformer architecture, which we will cover in the last lectures of this unit.
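
To fix the terminology before the lectures, the sketch below shows a bare-bones encoder–decoder model without attention in PyTorch; the vocabulary sizes, dimensions, and random batches are placeholders, and the systems discussed in the lectures are considerably more elaborate.

    # Bare-bones encoder-decoder (sequence-to-sequence) model without attention:
    # the encoder compresses the source sentence into its final hidden state,
    # which initialises the decoder; the decoder is trained with teacher forcing.
    import torch
    import torch.nn as nn

    class Seq2Seq(nn.Module):
        def __init__(self, src_vocab, tgt_vocab, emb_dim=64, hidden_dim=128):
            super().__init__()
            self.src_embedding = nn.Embedding(src_vocab, emb_dim)
            self.tgt_embedding = nn.Embedding(tgt_vocab, emb_dim)
            self.encoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
            self.decoder = nn.GRU(emb_dim, hidden_dim, batch_first=True)
            self.output = nn.Linear(hidden_dim, tgt_vocab)

        def forward(self, src_ids, tgt_ids):
            _, final_state = self.encoder(self.src_embedding(src_ids))
            decoder_states, _ = self.decoder(self.tgt_embedding(tgt_ids), final_state)
            return self.output(decoder_states)       # (batch, tgt_len, tgt_vocab)

    model = Seq2Seq(src_vocab=1000, tgt_vocab=1200)
    src = torch.randint(0, 1000, (8, 12))            # fake source batch
    tgt = torch.randint(0, 1200, (8, 10))            # fake target batch

    logits = model(src, tgt[:, :-1])                 # teacher forcing: shifted targets
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, 1200), tgt[:, 1:].reshape(-1))
    print(float(loss))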

Lectures

Reading

  • Eisenstein (2019), chapter 18

Concepts, models, and algorithms

  • interlingual machine translation, noisy channel model, word alignments, BLEU score
  • encoder–decoder architecture
  • recency bias, attention, context vector
  • Transformer architecture, self-attention, scaled dot-product attention (see the sketch after this list)
  • BERT, masked language modelling task, next sentence prediction task
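
Scaled dot-product attention, the core operation of the Transformer, can be written in a few lines as softmax(QK^T / sqrt(d))V; the sketch below uses random tensors as stand-ins for the hidden states and projection matrices.

    # Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.
    import math
    import torch

    def scaled_dot_product_attention(queries, keys, values):
        """queries: (..., m, d), keys: (..., n, d), values: (..., n, d_v)."""
        d = queries.size(-1)
        scores = queries @ keys.transpose(-2, -1) / math.sqrt(d)   # (..., m, n)
        weights = torch.softmax(scores, dim=-1)                    # attention weights
        return weights @ values                                    # (..., m, d_v)

    # Self-attention over a fake batch: queries, keys, and values are all
    # computed from the same sequence of hidden states.
    hidden = torch.randn(8, 20, 64)            # (batch, seq_len, model_dim)
    W_q, W_k, W_v = (torch.randn(64, 64) for _ in range(3))
    out = scaled_dot_product_attention(hidden @ W_q, hidden @ W_k, hidden @ W_v)
    print(out.shape)                           # torch.Size([8, 20, 64])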

Page responsible: Marco Kuhlmann
Last updated: 2021-01-17