# TDDD41 Data Mining - Clustering and Association Analysis and 732A75 Advanced Data Mining

## Example exams

No solutions available, but you are welcome to solve the questions and send the solutions to your teachers for checking.

## Collection of example exam question types (not necessarily complete)

### Data mining

• What is the purpose of data mining?
• When are patterns interesting?
• Data in the real world can be dirty. Give reasons and examples.
• Describe a typical knowledge discovery process.
• Describe a typical architecture for a data mining system.

### Clustering

• Give examples of attributes of a specific type (interval-based, binary symmetric, binary asymmetric, categorical, ordinal, ...).
• Define distance measures for the different types of attributes.
• Compute the distance between two given data objects. The objects have the same attributes which may be of different types.
• Describe the principles and ideas regarding clustering algorithm X. Explain the different steps of the algorithm/Give the algorithm.
• Run (an iteration of) clustering algorithm X on a given data set and give partial results for each step.
• What are the main strengths and weaknesses of clustering algorithm X?
• Describe the graph representation of the clustering problem when using partitioning approaches and medoids. In general or given a specific data set. Define/exemplify swapping cost.
• PAM/CLARA/CLARANS: Show how PAM, CLARA, CLARANS work on the graph representation of the clustering problem. Discuss the differences between PAM, CLARA, CLARANS using the graph representation.
• BIRCH: Define/give examples of CF, CF tree.
• ROCK: Define/give examples of neighbor, common neighbor, link, Link, goodness measure.
• Chameleon: Define/give examples of k-nearest neighbor graph, edge cut, interconnectivity, closeness.
• DBSCAN/OPTICS: Define/give examples of directly density reachable, density reachable, density connected, core point, core distance, reachability distance.
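
As practice material for the distance questions above, here is a minimal sketch of one common way to combine attribute types into a single distance: average the per-attribute distances, skipping 0/0 matches on asymmetric binary attributes. The function name `mixed_distance`, the `schema` layout, and the example objects are assumptions made for illustration. The course material may define the measures differently, so check your answers against the lecture definitions.

```python
def mixed_distance(a, b, schema):
    """Distance in [0, 1] between two objects with the same attributes.

    schema holds one (kind, info) pair per attribute (hypothetical layout):
      ("interval", (lo, hi))   numeric value; lo/hi is the range in the data set
      ("symmetric", None)      binary, both outcomes equally informative
      ("asymmetric", None)     binary, a 0/0 match carries no information
      ("categorical", None)    unordered labels
      ("ordinal", levels)      ordered labels, lowest first
    """
    total, used = 0.0, 0
    for x, y, (kind, info) in zip(a, b, schema):
        if kind == "interval":
            lo, hi = info
            d = abs(x - y) / (hi - lo)          # range-normalised difference
        elif kind == "ordinal":
            # Map ranks onto [0, 1], then treat them as interval-scaled.
            zx = info.index(x) / (len(info) - 1)
            zy = info.index(y) / (len(info) - 1)
            d = abs(zx - zy)
        elif kind == "asymmetric":
            if x == 0 and y == 0:
                continue                        # joint absence is ignored
            d = 0.0 if x == y else 1.0
        else:                                   # symmetric binary or categorical
            d = 0.0 if x == y else 1.0          # simple mismatch
        total += d
        used += 1
    return total / used if used else 0.0


# Hypothetical example objects, not taken from any exam.
schema = [
    ("interval", (0, 100)),
    ("asymmetric", None),
    ("categorical", None),
    ("ordinal", ["low", "medium", "high"]),
]
p = (30, 1, "red", "low")
q = (50, 0, "red", "high")
d_pq = mixed_distance(p, q, schema)   # (0.2 + 1 + 0 + 1) / 4
```

Working the same example by hand and comparing it with `d_pq` is a quick way to check that every attribute type was handled as intended.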

### Association analysis

• Given a transaction database, run the Apriori algorithm. Explain the execution step by step.
• Prove the correctness of the Apriori algorithm.
• Show what the Apriori property is and how you use it.
• Given a transaction database, run the Apriori algorithm with given constraints. Explain the execution step by step.
• Given a transaction database, run the FP Growth algorithm.
• Given a transaction database, run the FP Growth algorithm with given constraints.
• Give examples of different kinds of constraints. Give an example of a convertible monotone constraint that is not monotone. Give an example of a convertible antimonotone constraint that is not antimonotone.
• Discuss how to incorporate different kinds of constraints into the Apriori algorithm.
• Discuss how to incorporate different kinds of constraints into the FP Growth algorithm.
• Discuss advantages and disadvantages of the FP Growth algorithm w.r.t. the Apriori algorithm.
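
For checking hand-run Apriori executions, the level-wise procedure can be sketched as below. This is a minimal sketch without constraints, assuming support is given as an absolute count. The function name `apriori`, the parameter `min_count`, and the tiny transaction database are made up for illustration. It is not tuned for efficiency and is not the course's reference implementation.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset (frozenset): support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item that occurs in the database.
    level = {frozenset([i]) for t in transactions for i in t}
    frequent = {}
    k = 1
    while level:
        # Scan the database once to count each candidate of size k.
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        current = {c: n for c, n in counts.items() if n >= min_count}
        frequent.update(current)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into size-k candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step (Apriori property): every (k-1)-subset must be frequent.
        level = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical transaction database with minimum support count 3.
db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
result = apriori(db, min_count=3)
```

In this example every single item and every pair is frequent, while `{a, b, c}` survives the prune step as a candidate but fails the support count, which makes it a useful case to trace step by step.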

Page responsible: Patrick Lambrix
Last updated: 2020-01-13