{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# L4: Syntactic Analysis"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Introduction"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In this lab you will experiment with [MaltParser](http://www.maltparser.org/), a standard tool for syntactic analysis.\n",
"\n",
"Syntactic analysis, also called syntactic parsing, is the task of mapping a sentence to a representation of its syntactic structure. In this lab you will work with syntactic structures in the form of **dependency trees**. A dependency tree consists of directed arcs between the individual words (tokens) of a sentence. A realistic example of what such a tree can look like is shown below: the first dependency tree in the test set of the [Swedish Treebank](https://stp.lingfil.uu.se/~nivre/swedish_treebank/)."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"![Figure 1](https://www.ida.liu.se/~729G17/commons/dep_tree.png)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Every arc in a dependency tree links two words: the word at the source of the arc is called the **head**, and the word the arc points to is called the **dependent**. In the example, *pensionen* is a dependent of *är* and, at the same time, the head of *den* and *allmänna*. Every word except the special root word has exactly one head. The root word has no head (but it can have dependents). In the example tree, the word *är* is the root, marked with a special arc.\n",
"\n",
"The arcs in a dependency tree are often labelled with **grammatical relations**. In the tree above, the arc from *är* to *pensionen* is labelled `SS`, expressing that *pensionen* functions as the grammatical subject of its head word *är*. More information on the grammatical relations used in the Swedish Treebank can be found [here](https://stp.lingfil.uu.se/~nivre/swedish_treebank/GF.html)."
]
},
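{
"cell_type": "markdown",
"metadata": {},
"source": [
"One common way to store a dependency tree in code is to record, for each word, the position of its head and the label of the arc leading to it. Position 0 stands for the artificial root. Because each word has exactly one head entry, the single-head constraint holds by construction. The following sketch encodes only the fragment *den allmänna pensionen är* of the example tree. The arcs follow the description above. The labels `DT`, `AT` and `ROOT` are assumptions about the Swedish Treebank label set, and the helper `dependents` is a hypothetical name that is not part of `lt4`."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Fragment of the example tree: den allmänna pensionen är\n",
"# Positions are 1-based; head 0 stands for the artificial root.\n",
"words = ['den', 'allmänna', 'pensionen', 'är']\n",
"heads = [3, 3, 4, 0]\n",
"labels = ['DT', 'AT', 'SS', 'ROOT']  # DT, AT and ROOT are assumed labels\n",
"\n",
"\n",
"def dependents(i, heads):\n",
"    '''Return the 1-based positions of all words whose head is word i.'''\n",
"    return [j + 1 for j, h in enumerate(heads) if h == i]\n",
"\n",
"\n",
"# Print every arc as: head -label-> dependent\n",
"for i, word in enumerate(words, start=1):\n",
"    h = heads[i - 1]\n",
"    head = 'ROOT' if h == 0 else words[h - 1]\n",
"    print(f'{head} -{labels[i - 1]}-> {word}')\n",
"\n",
"print([words[j - 1] for j in dependents(3, heads)])  # ['den', 'allmänna']"
]
},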
{
"cell_type": "markdown",
"metadata": {},
"source": [
"As usual, we start by loading the Python module needed for this lab:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import lt4"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Evaluation"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"In the first part of the lab, your task is to evaluate dependency parsers. This is relevant if, for example, you want to compare different parsers. Four standard measures are used for the evaluation:\n",
"\n",
"- **unlabelled attachment score (UAS)**: the percentage of words in the test set that were attached to their correct heads (as specified in the gold standard)\n",
"- **labelled attachment score (LAS)**: the percentage of words that were attached to their correct heads *and* assigned the correct dependency relation\n",
"- **unlabelled exact match (UEM)**: the percentage of sentences in which every word was attached to its correct head\n",
"- **labelled exact match (LEM)**: the percentage of sentences in which every word was attached to its correct head *and* assigned the correct dependency relation\n",
"\n",
"Thus every word or sentence that the parser gets credit for under a labelled measure also earns credit under the corresponding unlabelled measure, but not vice versa. Consequently, LAS is never higher than UAS, and LEM is never higher than UEM."
]
},
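{
"cell_type": "markdown",
"metadata": {},
"source": [
"The four measures can be computed directly from their definitions. The following sketch assumes that each sentence is given as a list of `(head, label)` pairs, one per word, for both the gold standard and the parser output. The function name `attachment_scores` and this input format are illustrative choices, not the interface of `lt4` or MaltParser. UAS and LAS are averaged over all words in the test set, whereas UEM and LEM are averaged over sentences. The sketch counts every token, including punctuation. Some evaluation scripts can be configured to skip punctuation, so their numbers may differ."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"def attachment_scores(gold, pred):\n",
"    '''Compute UAS, LAS, UEM and LEM as percentages.\n",
"\n",
"    gold and pred are lists of sentences; each sentence is a list of\n",
"    (head, label) pairs, one per word, in the same order for both.\n",
"    '''\n",
"    assert len(gold) == len(pred), 'need one predicted tree per gold tree'\n",
"    words = correct_heads = correct_both = 0\n",
"    exact_heads = exact_both = 0\n",
"    for g_sent, p_sent in zip(gold, pred):\n",
"        assert len(g_sent) == len(p_sent), 'sentences must align word by word'\n",
"        sent_heads_ok = sent_both_ok = True\n",
"        for (g_head, g_label), (p_head, p_label) in zip(g_sent, p_sent):\n",
"            words += 1\n",
"            if g_head == p_head:\n",
"                correct_heads += 1\n",
"                if g_label == p_label:\n",
"                    correct_both += 1\n",
"                else:\n",
"                    sent_both_ok = False\n",
"            else:\n",
"                sent_heads_ok = sent_both_ok = False\n",
"        exact_heads += sent_heads_ok  # True counts as 1\n",
"        exact_both += sent_both_ok\n",
"    n = len(gold)\n",
"    return {\n",
"        'UAS': 100 * correct_heads / words,\n",
"        'LAS': 100 * correct_both / words,\n",
"        'UEM': 100 * exact_heads / n,\n",
"        'LEM': 100 * exact_both / n,\n",
"    }\n",
"\n",
"\n",
"# Toy example with made-up heads and labels (not Swedish Treebank data).\n",
"gold = [[(2, 'SS'), (0, 'ROOT')], [(2, 'DT'), (3, 'SS'), (0, 'ROOT')]]\n",
"pred = [[(2, 'SS'), (0, 'ROOT')], [(2, 'AT'), (1, 'SS'), (0, 'ROOT')]]\n",
"print(attachment_scores(gold, pred))\n",
"# {'UAS': 80.0, 'LAS': 60.0, 'UEM': 50.0, 'LEM': 50.0}"
]
},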
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### First steps with MaltParser\n",
"\n",
"Before you do anything else, read the [MaltParser User Guide](http://maltparser.org/userguide.html), starting with *Start Using MaltParser*, which consists of *Train a parsing model* and *Parse data with your parsing model*. This section explains how to train a parsing model on gold-standard data and how to use that model to parse new text. Your first task is to carry out these two steps with the data from the Swedish Treebank. The data is divided into two files in the directory `/courses/729G17/labs/l4/data/`:\n",
"\n",
"| data type | filename | sentences |\n",
"|---------------|------------------------------|-----------|\n",
"| training data | `talbanken-dep-train.conll` | 4941 |\n",
"| test data | `talbanken-dep-test.conll` | 1219 |"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Set up an experiment to find out which type of parser performs better when parsing automatically POS-tagged data:\n",
"\n",
"Write a short reflection piece about your experience. Use the following prompts:\n",
"