
Big Data Analytics, spring term 2017
(PhD student course, 6 credits)


The exponential increase in computational power and storage capacity over the last decades, combined with progress in data science, has driven a major leap in the digital revolution. Today, data-driven and Big Data methods have far-reaching applications throughout society, including identifying important new materials, predicting and understanding environmental effects, solving crimes, and keeping our society safe. In this course we introduce the notion of Big Data and study how to store, manage, query, and analyze this kind of data.


After completing the course, you should be able to:
  • collect and store Big Data in a distributed computer environment
  • perform basic queries against a database operating on a distributed file system
  • account for basic principles of parallel computations
  • use the MapReduce concept to parallelize common data processing algorithms
  • account for how standard machine learning models should be modified in order to process Big Data
  • use tools for machine learning for Big Data
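
To illustrate the MapReduce concept named in the learning outcomes, here is a minimal single-machine sketch in Python of a word count split into map, shuffle, and reduce phases. It is not taken from the course material: the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical, and the phases run sequentially, whereas a real framework would distribute them across the nodes of a cluster.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(mapped_pairs):
    """Shuffle: group all emitted values by their key (the word)."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: sum the counts of each word independently of the others."""
    return {
        key: reduce(lambda a, b: a + b, values)
        for key, values in groups.items()
    }


def word_count(documents):
    # Assumption: documents are mapped one after another here; in a cluster,
    # each document (or block of one) could be mapped by a different worker.
    mapped = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    docs = ["big data is big", "data analytics"]
    print(word_count(docs))
```

Because each map call only sees its own document and each reduce call only sees one key, both phases can in principle run in parallel without shared state, which is the property that makes the pattern attractive for large data sets.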


The course introduces the main concepts and tools for storing, processing, and analyzing Big Data, which are necessary for professional work and research in data analytics.
  • Introduction to Big Data: concepts and tools
  • Basic principles of parallel computing
  • Introduction to databases
  • File systems and databases for Big Data
  • Querying Big Data
  • Resource management in a cluster environment
  • Parallelizing computations for Big Data
  • Basic Machine Learning algorithms
  • Machine Learning for Big Data


Week 12 (March 20-24), 2017.

Page responsible: BDA
Last updated: 2017-01-30