DF22400
Machine Learning (3p)
Introduction and Application for Automated Performance Tuning

EXERCISES FOR SELF-ASSESSMENT - HT/2012

The complexity of most learning algorithms depends on the size of the training set. Suggest a filtering algorithm that eliminates redundant instances.
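One possible answer is Hart's condensed nearest neighbour (CNN) rule: keep only the instances that the currently stored subset would misclassify. The sketch below is in Python and assumes numeric feature vectors, Euclidean distance, and a 1-nearest-neighbour classifier. The function names are illustrative and not taken from the lecture.

```python
import math

def nearest_label(store, x):
    """Label of the stored instance closest to x (1-NN, Euclidean distance)."""
    _, label = min(store, key=lambda item: math.dist(item[0], x))
    return label

def condense(examples):
    """Hart's condensed nearest neighbour rule.

    Returns a subset of `examples` (pairs of (feature_tuple, label)) that still
    classifies every example correctly with 1-NN, assuming that no identical
    feature vectors carry different labels.
    """
    store = [examples[0]]
    changed = True
    while changed:                      # repeat passes until nothing is added
        changed = False
        for x, y in examples:
            # Keep an instance only if the current subset gets it wrong.
            if nearest_label(store, x) != y and (x, y) not in store:
                store.append((x, y))
                changed = True
    return store

examples = [((0, 0), "a"), ((0, 1), "a"), ((1, 0), "a"),
            ((10, 10), "b"), ((10, 11), "b"), ((11, 10), "b")]
store = condense(examples)   # two well-separated clusters: one instance per class survives
```

The result depends on the order in which the instances are visited, and CNN does not guarantee a minimal subset. It only guarantees that, after the last pass, the stored subset classifies every training instance correctly.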
The sum of squared differences is a popular error function, but it is not robust against outliers, and other error functions are possible. Suggest an error function that is robust against outliers.
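One standard choice is the Huber loss. Another is the absolute error |r|. The Python sketch below compares the Huber loss with the squared error on data containing one outlier. The threshold delta = 1.0 is an arbitrary illustrative value, and the helper names are hypothetical.

```python
def squared_error(residual):
    """Contribution of one residual to the sum of squared differences."""
    return residual ** 2

def huber(residual, delta=1.0):
    """Huber loss: quadratic for |residual| <= delta, linear beyond it,
    so a single large outlier cannot dominate the total error."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)

def total_error(targets, predictions, loss):
    """Sum a per-residual loss over all training examples."""
    return sum(loss(t - p) for t, p in zip(targets, predictions))

targets = [1.0, 2.0, 3.0, 40.0]        # the last target is an outlier
predictions = [1.0, 2.0, 3.0, 4.0]
sq = total_error(targets, predictions, squared_error)
hub = total_error(targets, predictions, huber)
```

With a residual of 36, the squared error contributes 36² = 1296. The Huber loss contributes only 35.5, because it grows linearly rather than quadratically once the residual exceeds delta.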
Show that the VC dimension of lines in the plane (linear classifiers on 2-D inputs) is 3.
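The following is a proof sketch. It assumes that "line" means the class of linear classifiers in the plane, where a point is labeled positive if it lies on the closed positive side of the line. The lower bound uses three non-collinear points, and the upper bound uses Radon's theorem.

```latex
% Hypothesis class: h_{\mathbf{w},w_0}(\mathbf{x}) = [\,\mathbf{w}^\top\mathbf{x} + w_0 \ge 0\,], \quad \mathbf{x} \in \mathbb{R}^2
\begin{align*}
&\textbf{Lower bound } (\mathrm{VC} \ge 3):\ \text{take three non-collinear points } \mathbf{x}_1,\mathbf{x}_2,\mathbf{x}_3.\\
&\quad\text{In any labeling, one class contains at most one point; if it is empty, put all points on one side.}\\
&\quad\text{Otherwise that point } \mathbf{x}_i \notin \operatorname{conv}\{\mathbf{x}_j,\mathbf{x}_k\}, \text{ so some line strictly separates } \mathbf{x}_i \text{ from } \mathbf{x}_j,\mathbf{x}_k.\\
&\textbf{Upper bound } (\mathrm{VC} < 4):\ \text{by Radon's theorem, any four points in } \mathbb{R}^2 \text{ split into disjoint sets } A, B\\
&\quad\text{with } \operatorname{conv}(A) \cap \operatorname{conv}(B) \neq \emptyset.\ \text{Half-planes are convex, so labeling } A \text{ positive and } B \text{ negative}\\
&\quad\text{would force } \operatorname{conv}(A) \subseteq \{\mathbf{w}^\top\mathbf{x} + w_0 \ge 0\} \text{ and } \operatorname{conv}(B) \subseteq \{\mathbf{w}^\top\mathbf{x} + w_0 < 0\},\ \text{a contradiction.}\\
&\therefore\ \mathrm{VC}(\text{lines in } \mathbb{R}^2) = 3.
\end{align*}
```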

In a univariate decision tree, numeric attributes are often tested by a binary split (comparison of a single variable to a threshold value). Discuss the advantages and disadvantages of using ternary, quaternary, etc. splits instead.
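The following observation may help the discussion. A (k+1)-way split of one numeric attribute at sorted thresholds t_1 < ... < t_k produces the same partition as a chain of k binary threshold tests on that attribute. The trade-off therefore lies mainly in tree shape, data fragmentation, and the cost of searching for thresholds. With N distinct values, a binary split has N-1 candidate thresholds, while a ternary split has O(N^2) candidate threshold pairs. The Python sketch below checks this equivalence. The function names are hypothetical.

```python
from bisect import bisect_right

def multiway_branch(x, thresholds):
    """Branch index 0..k of a (k+1)-way split on sorted thresholds,
    i.e. the number of thresholds t with t <= x."""
    return bisect_right(thresholds, x)

def binary_chain_branch(x, thresholds):
    """Same partition reached by successive binary tests "x < t"."""
    for i, t in enumerate(thresholds):
        if x < t:
            return i
    return len(thresholds)

ts = [2.0, 5.0]   # a ternary split: x < 2, 2 <= x < 5, x >= 5
```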
Discuss the advantages and disadvantages of using nonlinear split functions in multivariate decision trees, compared to linear ones (as presented in the lecture).
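To make the comparison concrete, the Python sketch below contrasts a linear split w·x + w_0 > 0 with one hypothetical nonlinear (quadratic) split, a circle test. On the toy data, class 1 sits at the origin inside a ring of class-0 points. The origin lies in the convex hull of the ring, so any single linear node misclassifies at least one point, whereas one circle node classifies all of them correctly. The price is more parameters per node and a harder optimisation problem when fitting them. All names and data are illustrative.

```python
def linear_split(x, w, w0):
    """Linear multivariate test: w . x + w0 > 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + w0 > 0

def circle_split(x, center, radius):
    """A nonlinear (quadratic) test: is x strictly inside the circle?"""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, center)) < radius ** 2

# Toy data: class 1 at the origin, class 0 on a surrounding ring.
data = [((0.0, 0.0), 1), ((2.0, 0.0), 0), ((-2.0, 0.0), 0),
        ((0.0, 2.0), 0), ((0.0, -2.0), 0)]
```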
Extend the greedy decision tree construction algorithm from the lecture with backtracking so that it computes an optimal decision tree for the given training set.
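Below is a possible Python sketch. It relies on several assumptions that the exercise does not fix: features are binary (0/1), "optimal" means the smallest number of nodes among trees that classify every training example correctly, and the training data contain no contradictory examples. Instead of committing to the greedily best attribute, the search backtracks over every attribute at each node. It also prunes any branch that cannot beat the best tree found so far (branch and bound). All names are hypothetical.

```python
import math

def build_optimal(examples, features, bound=math.inf):
    """Return (size, tree) for a smallest consistent tree with size < bound,
    or (math.inf, None) if none exists. Trees are ("leaf", label) or
    ("node", feature, subtree_for_0, subtree_for_1)."""
    if bound <= 1:
        return math.inf, None             # not even a single leaf fits
    labels = {y for _, y in examples}
    if len(labels) == 1:
        return 1, ("leaf", labels.pop())  # pure node: one leaf suffices
    best_size, best_tree = math.inf, None
    for f in features:                    # try every test, not just the greedy choice
        left = [(x, y) for x, y in examples if x[f] == 0]
        right = [(x, y) for x, y in examples if x[f] == 1]
        if not left or not right:
            continue                      # this test separates nothing
        rest = [g for g in features if g != f]
        limit = min(bound, best_size)     # must beat the caller's bound and the best so far
        # 1 + left_size + right_size < limit and right_size >= 1, so left_size < limit - 2.
        left_size, left_tree = build_optimal(left, rest, limit - 2)
        if left_size == math.inf:
            continue                      # prune: left subtree too large or infeasible
        right_size, right_tree = build_optimal(right, rest, limit - 1 - left_size)
        if right_size == math.inf:
            continue                      # prune: right subtree does not fit the budget
        best_size = 1 + left_size + right_size
        best_tree = ("node", f, left_tree, right_tree)
    return best_size, best_tree

def predict(tree, x):
    """Follow the tests from the root down to a leaf."""
    while tree[0] == "node":
        _, f, t0, t1 = tree
        tree = t1 if x[f] == 1 else t0
    return tree[1]

xor = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
size, tree = build_optimal(xor, [0, 1])
```

The worst-case running time is exponential in the number of features, and finding a smallest consistent decision tree is NP-hard in general, which is why greedy construction is the usual choice in practice. A tree from the greedy algorithm can serve as the initial bound to speed up pruning.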

To be extended.

This page is maintained by Christoph Kessler.