{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# L3: Part-of-speech tagging" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Introduction" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Part-of-speech (POS) tagging is the task of labelling words (tokens) with parts of speech such as noun, adjective, and verb. In this lab you will experiment with POS taggers trained on the [Stockholm Umeå Corpus (SUC)](http://spraakbanken.gu.se/eng/resources/suc), a Swedish text corpus of more than 74,000 sentences (1.1 million tokens) that have been manually annotated with parts of speech, among other annotations. The corpus is divided into two files:\n", "\n", "
| File | Sentences | Tokens |
|---|---|---|
| suc-train.txt | 72,594 | 1,142,802 |
| suc-test.txt | 1,569 | 23,319 |
Pick one of the problems that you worked on in this lab and write a short reflection piece about your experience. Use the following structure:
\n", "