732A31 Data Mining - Clustering and Association Analysis
The first task in the second period is to prepare a presentation on a topic related to the course. You can do this alone or in a group of two. The presentation should be about 25 minutes per student.
The following are possible topics.
- outlier analysis
- other clustering and association analysis algorithms
- mining data streams, time series and sequence data
- social network analysis
- text mining
- web mining
For most of these topics you will find a chapter (or part of one) in the textbook. You can also find articles by using Google or by looking at the journals/conferences that are mentioned at the end of the slides of the first lecture.
You can start by looking for a topic that interests you and finding literature on it. Come and show/discuss the text/articles you will use for the presentation by April 12 at the latest. The actual presentations will be on May 20. A draft of your presentation slides must be handed in to the examiner at least one week before your presentation so that we can book a discussion session if needed.
The total expected work time for this part is about 80 hours per student (excluding presentations). Note that attendance at all presentations is mandatory.
For the project we expect you to run association analysis/clustering algorithms on data sets and to analyze and discuss the results. You can implement the algorithms yourself or use Weka, SAS or another tool. You will be assigned a supervisor for the project.
There are a number of ways to define a project.
- Find a topic and data set that is interesting to you. Use association analysis/clustering to analyze the data in different ways and derive conclusions.
- Find an article that uses association analysis or clustering. Use the same strategies on different data sets and discuss differences and similarities.
- Find an article that uses association analysis or clustering. Use other algorithms on the same data sets and discuss differences and similarities.
- Ask a teacher if they have a clustering/association analysis problem connected to their research.
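As a rough illustration of what implementing an association analysis algorithm yourself could look like, here is a minimal Apriori-style sketch in Python. The function name `frequent_itemsets`, the toy `baskets` data and the support threshold are made up for this example and are not part of the course material. A real project would use a proper data set and would also derive and evaluate rules, not just count itemsets.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for every itemset that occurs in at
    least `min_support` transactions (absolute count, not a fraction)."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count the single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Join frequent (k-1)-itemsets into candidates of size k and keep
        # only those whose (k-1)-subsets are all frequent (Apriori pruning).
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in current for sub in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Count the support of each surviving candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        result.update(current)
        k += 1

    return result


# Hypothetical toy data: five market baskets.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

result = frequent_itemsets(baskets, min_support=3)
for itemset, support in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

The confidence of a rule A -> B can then be estimated as the support of A and B together divided by the support of A, using the counts in `result`.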
Send a proposal to the examiner by April 29 at the latest; the examiner will then assign a supervisor to you. Once you have been assigned a supervisor, get the project approved by your supervisor (after discussion and possible changes to the proposal) by May 13 at the latest.
The examination of this part consists of writing a report on the project. We may ask you to come to the office to answer questions about the content of your report. DEADLINE: June 14.
The report should be between 5 and 10 pages and contain at least the following:
- introduction: describe the area/problem, explain why it is interesting to look at, and summarize what kinds of solutions have been used before
- background: any background knowledge that is needed to read the report (e.g. the domain of the application)
- algorithms: describe the algorithms that you have used and explain why you chose them and why you think they should give good results (do not describe the basic algorithms that we have seen in the course, but any extensions or other algorithms should be described)
- test: describe the test data and the test set-up
- test results: describe the results
- discussion: analyze the results and compare them with what was done before. Did you get good or bad results? Were these results expected?
- future: ideas for how to improve the results and other things that could be done
Make sure to properly reference other people's work.
The total expected work time for this part is about 130 hours per student.
Page responsible: Patrick Lambrix
Last updated: 2013-04-05