# DF22400

Machine Learning (3p)

Introduction and Application for Automated Performance Tuning

## EXERCISES FOR SELF-ASSESSMENT - HT/2012

### Supervised Learning

- The complexity of most learning algorithms depends on the size of the training set.
Suggest a filtering algorithm that eliminates redundant instances.
- The sum of squared differences is a popular error function, but it is
not robust against outliers; other error functions are possible.
Suggest an error function that is robust against outliers.
- Show that the VC dimension of a line (used as a classifier for points in the plane) is 3.
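
As a starting point for the exercise on outlier robustness, the following minimal sketch compares the sum of squared differences with the Huber error, one well-known robust alternative. It is only an illustration, not the expected solution: the function names, the threshold `delta`, and the sample residuals are assumptions made for this example.

```python
def squared_error(residuals):
    """Sum of squared differences: grows quadratically with each residual."""
    return sum(r * r for r in residuals)


def huber_error(residuals, delta=1.0):
    """Huber error: quadratic for small residuals, linear beyond delta.

    delta is a hypothetical tuning parameter chosen for this sketch.
    """
    total = 0.0
    for r in residuals:
        a = abs(r)
        if a <= delta:
            total += 0.5 * a * a                 # quadratic region
        else:
            total += delta * (a - 0.5 * delta)   # linear region
    return total


# Two small residuals and one outlier (made-up values).
residuals = [0.5, -0.5, 10.0]
print(squared_error(residuals))  # dominated by the outlier
print(huber_error(residuals))    # the outlier contributes only linearly
```

Because the outlier enters the Huber error linearly rather than quadratically, a single extreme instance has much less influence on the fitted model.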

### Decision Trees

- In a univariate decision tree, numeric properties are often tested
by a binary split (comparison of a single variable to a threshold value).
Discuss the advantages and disadvantages of using ternary, quaternary,
or higher-arity splits instead.
- Discuss advantages and disadvantages of using nonlinear split functions
in multivariate decision trees, compared to linear ones (as presented).
- Extend the greedy decision tree creation algorithm from the lecture
with backtracking so that it computes an optimal decision tree for the given training set.
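
For reference when working on these exercises, the binary threshold split on a single numeric variable can be sketched as follows. This is a minimal sketch under stated assumptions: it uses entropy as the impurity measure and midpoints between adjacent sorted values as candidate thresholds, which may differ from the variant presented in the lecture. The names `entropy` and `best_threshold` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Entropy (in bits) of a list of class labels; 0 for an empty list."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, impurity) of the best binary split on one variable.

    Candidate thresholds are the midpoints between adjacent distinct
    sorted values (an assumption of this sketch). Impurity is the
    size-weighted entropy of the two resulting subsets.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_impurity, best_t = float("inf"), None
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold separates equal values
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        impurity = (len(left) * entropy(left) + len(right) * entropy(right)) / n
        if impurity < best_impurity:
            best_impurity, best_t = impurity, (lo + hi) / 2
    return best_t, best_impurity


# Made-up training data: the classes separate cleanly between 2.0 and 3.0.
threshold, impurity = best_threshold([1.0, 2.0, 3.0, 4.0], ["a", "a", "b", "b"])
print(threshold, impurity)
```

A greedy tree builder would apply such a search to every variable at a node, pick the split with the lowest impurity, and recurse; a k-ary split would need to search over k-1 thresholds at once, which is one of the trade-offs the first exercise asks about.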

To be extended.

This page is maintained by Christoph Kessler.