732A32 Data mining project

Course information


The aim of this course is that, after its completion, the student is able to

  • apply previously obtained knowledge in the field of data mining in a real setting,
  • plan, perform and report on an individual task, and
  • demonstrate insight into research and development work.
The course is worth 6 ECTS credits, so the workload corresponds to 8 weeks at half-speed. The course consists of project work. The project should be chosen in cooperation with a supervisor and will, in general, be related to the supervisor's research. The work is performed individually, with the support and guidance of the supervisor.

Projects available:
  • Statistical analysis of neuroimaging data, supervised by Mattias Villani.

    fMRI is a non-invasive technique for measuring activity in the brain. The data from an fMRI scanning session are a sequence of 3D images of the brain taken in fairly rapid succession. It is typical to analyze the brain activity in one region of the brain (a voxel) at a time. The measurements in a voxel form a time series. An alternative view of fMRI data is therefore that they are a very large collection of spatially dependent time series. In theory, the fMRI measurements follow a so-called Rice distribution, but in practice they are usually modeled as normally distributed variables. The aim of this project is to explore the distribution of fMRI data, to learn whether the normal distribution is a reasonable model or whether the more complicated Rice distribution is needed to fit fMRI data accurately.
  • Evaluation of support vector machines for analysis of genome-wide DNA data, supervised by Patrik Waldmann.
  • Analysis of predictive power of data mining algorithms with embedded monotonicity constraints, supervised by Oleg Sysoev.
  • Implementation and evaluation of an algorithm for learning chain graphs, supervised by Jose M. Peña.

    Chain graphs are graphs with (possibly) both directed and undirected edges. They model the independence structure of a probability distribution, i.e. a missing edge in the graph represents an independence between the random variables corresponding to the nodes in the graph. There are three different interpretations of chain graphs as independence models, i.e. different researchers assign different independencies to the missing edges in a chain graph. We would like to learn the chain graph that best represents the independencies in a probability distribution, given a sample from the distribution. The project I propose consists of implementing and evaluating the algorithm in this or this paper. A word of warning may be in order: the papers mentioned are rather theoretical, but you don't have to understand everything in them before you can start implementing the algorithms. As you move forward in the project, you will be able to understand more and more.
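
As a starting point for the chain graph project, the sketch below shows one possible way to store a graph with both directed and undirected edges and to check that it is a chain graph. It assumes the usual structural requirement, which this page does not spell out, that no cycle may contain a directed edge when all directed edges are followed in their own direction. The function names and the edge-list representation are hypothetical, and this is not the learning algorithm from the papers.

```python
from collections import defaultdict


def chain_components(nodes, undirected):
    """Label each node with the connected component it belongs to
    when only the undirected edges are considered."""
    neighbours = defaultdict(set)
    for a, b in undirected:
        neighbours[a].add(b)
        neighbours[b].add(a)
    component = {}
    for start in nodes:
        if start in component:
            continue
        component[start] = start
        stack = [start]
        while stack:
            node = stack.pop()
            for other in neighbours[node]:
                if other not in component:
                    component[other] = start
                    stack.append(other)
    return component


def is_chain_graph(nodes, directed, undirected):
    """Return True if no cycle contains a directed edge (assumed definition)."""
    component = chain_components(nodes, undirected)
    successors = defaultdict(set)
    for tail, head in directed:
        if component[tail] == component[head]:
            # A directed edge inside a component closes a forbidden cycle.
            return False
        successors[component[tail]].add(component[head])

    # Depth-first search for a directed cycle among the components.
    WHITE, GREY, BLACK = 0, 1, 2
    colour = defaultdict(int)

    def has_cycle(c):
        colour[c] = GREY
        for nxt in successors[c]:
            if colour[nxt] == GREY or (colour[nxt] == WHITE and has_cycle(nxt)):
                return True
        colour[c] = BLACK
        return False

    return not any(
        colour[c] == WHITE and has_cycle(c) for c in set(component.values())
    )


# Illustrative graph: A -> B, A -> C, B - C, C - D.
nodes = ["A", "B", "C", "D"]
directed = [("A", "B"), ("A", "C")]
undirected = [("B", "C"), ("C", "D")]
valid = is_chain_graph(nodes, directed, undirected)
# Adding D -> A creates a cycle through a directed edge.
invalid = is_chain_graph(nodes, directed + [("D", "A")], undirected)
print(valid, invalid)
```

Contracting the undirected components first keeps the check simple: the graph passes exactly when the directed edges between components form an acyclic graph.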

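For the neuroimaging project, the question of whether a normal model is adequate can be previewed on simulated data. The sketch below draws Rice-distributed values as the magnitude of a complex Gaussian signal and uses the sample skewness as a crude indicator of departure from normality. The parameter values `nu` and `sigma`, the sample size, and the choice of skewness as a diagnostic are illustrative assumptions; no real fMRI data are involved, and the project itself may call for different tools.

```python
import math
import random
import statistics


def simulate_rice(nu, sigma, n, rng):
    """Draw n Rice(nu, sigma) samples as the magnitude of a complex
    Gaussian with mean (nu, 0) and standard deviation sigma per axis."""
    return [
        math.hypot(nu + sigma * rng.gauss(0, 1), sigma * rng.gauss(0, 1))
        for _ in range(n)
    ]


def sample_skewness(xs):
    """Sample skewness; close to 0 for normally distributed data."""
    mean = statistics.fmean(xs)
    sd = statistics.pstdev(xs)
    return sum(((x - mean) / sd) ** 3 for x in xs) / len(xs)


rng = random.Random(0)  # fixed seed so the run is reproducible

# Hypothetical settings: weak signal versus strong signal relative to noise.
low_snr = simulate_rice(nu=0.5, sigma=1.0, n=20000, rng=rng)
high_snr = simulate_rice(nu=10.0, sigma=1.0, n=20000, rng=rng)

skew_low = sample_skewness(low_snr)
skew_high = sample_skewness(high_snr)
print(f"skewness at low signal-to-noise ratio:  {skew_low:.3f}")
print(f"skewness at high signal-to-noise ratio: {skew_high:.3f}")
```

In this simulation the skewness is clearly positive when the signal is weak and close to zero when it is strong, which suggests that the adequacy of the normal approximation depends on the signal-to-noise ratio of the voxel.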

Page responsible: José M Pena
Last updated: 2012-11-21