732A94 Advanced Programming in R
Course information
Autumn 2024
This is the course page for the course Advanced Programming in R.
The first session will be a lecture in A32 on Thursday 2024-09-05 13:15-17!
Course Content
The course introduces general programming techniques and their practical implementation in the R language. More specifically, the course includes:
- reading data from file, from the internet, and printing to output,
- data structures, functions and objects,
- iteration and conditional statements,
- numerical linear algebra in R,
- debugging,
- object-oriented programming,
- performance enhancement,
- parallel programming,
- literate programming,
- test-driven development,
- reactive programming,
- development of R packages.
Course structure
The course contains 7 lectures, 3 computer labs, 1 exercise session and 4 seminars.
The first two weeks will focus on basic R programming and syntax, and students will work individually.
The last five weeks will focus on more advanced concepts, and students will then work in groups of two.
Lectures, computer labs, seminars and the exercise session will be on campus.
Students are encouraged to work on their own computers. See below for how to install R, git and RStudio.
The course contains four teaching activities:
- Lecture (Fö): Introduction of new concepts.
- Computer lab (DATALAB): Individual computer lab with individual help.
- Exercise session (Lektion): Presentation of solutions to computational complexity exercises.
- Seminar (SE): Each student group will present their current work and we will discuss any problems that have come up.
The following content will be presented each week:
Lecture | Material |
---|---|
1 | Data structures, subsetting and intro to functions |
2 | Program control, functions and R packages |
3 | Performant code: Writing R packages, roxygen, testthat, git/github |
4 | Linear algebra, vignettes & rmarkdown, graphics and object-orientation in R |
5 | Advanced Input/Output (I/O) |
6 | Performant code: Computational complexity, writing fast code, parallelism and handling big data |
Course literature
The Art of R Programming by Norman Matloff (NM1). The book can be found online here.
Introduction to ggplot2 by Norman Matloff (NM2). The pdf can be found here.
Advanced R by Hadley Wickham (HW1). An online version can be found here.
R Packages by Hadley Wickham (HW2). An online version can be found here.
Efficient R Programming by Colin Gillespie and Robin Lovelace (CGRL). An online version can be found here.
Mastering Shiny by Hadley Wickham (HW10). An online version can be found here.
Statistics w R by Karol Flisikowski (KF1). An online version can be found here.
R for Data Science by Garrett Grolemund and Hadley Wickham (GGHW). An online version can be found here.
R Programming and Development From Basics to Advanced Topics by Emmanuel Paradis (EP). An online version can be found here.
Pro Git by Scott Chacon and Ben Straub (SCBS). An online version can be found here.
Best practices for scientific computing by Greg Wilson et al. (GW). The article can be found here.
Semantic versioning (SV) rules found here
Testthat: Get started with testthat (HW3) by Hadley Wickham that can be found here.
httr2: (HW4) by Hadley Wickham that can be found here.
Databases (HW5) by Hadley Wickham that can be found here.
Persistent data storage (DA) by Dean Attali that can be found here.
HTTP: the protocol (PP) by Pavan Podila that can be found here.
An introduction to APIs (BC) by Brian Cooksley here
Best practices for writing an API package by Hadley Wickham (HW6) here. Download package version 1.4.6 and open the vignettes directory. Check compatibility with httr2.
rvest: Easy web scraping with R (RS1) by RStudio that can be found here.
Shiny tutorial (RS2) by RStudio that can be found here.
memoise (HW7) by Hadley Wickham that can be found here.
Introduction to dplyr (HW8) by Hadley Wickham that can be found here.
Big Oh notation (BO) can be found here
Tidy data (HW9) by Hadley Wickham can be found here
Data wrangling cheat sheet (DWCS) here
A short introduction to the caret package (MK) by Max Kuhn can be found here
The elements of statistical learning (HTF) by Trevor Hastie, Robert Tibshirani and Jerome Friedman here
Required reading
Below is the required reading for each week.
- Data structures, subsetting and intro to functions
  - NM1: Chap: 1, 2.2, 2.3, 3, 4.1-4.4, 4.6, 4.7, 4.9, 5.1-5.6, 6.1, 6.2, 6.4-6.6, 7, 9.1, 10.1, 10.3, 10.7
  - HW1: Chap: Introduction, Data structures, Subsetting
- Program control, functions, basic I/O and R packages
  - NM1: Chap: 8, 9.1-9.6, 11.3
  - HW1: Chap: Functions, Environment, Functional programming, Functionals
- Performant code I
  - GW: Whole article
  - SCBS: Ch. 1.1-1.3
  - HW1: Exceptions and Debugging
  - HW2: Introduction, Package structure, Package components, Code, Data, Package metadata, Object documentation, Testing, Namespaces, Checking, Git and GitHub
  - HW3: Whole page
  - SV: Whole page
- Linear algebra, dynamic reporting, graphics and object orientation
  - NM1: Chap: 10.4, 12 (not 12.1)
  - NM2: Full article
  - HW1: Chap: OO field guide
  - HW2: Chap: Vignettes
- Input and output
  - NM1: Chap: 11
  - HW4: Whole page
  - HW5: Read through, but the dplyr package will be discussed in week 6.
  - DA: Whole page
  - PP: HTTP Basics, URLs, Verbs, Status codes, Summary
  - BC: Chapter 1-3 (Ch. 4-5 can be relevant for some projects)
  - HW6: Up to Authenticating, the rest can be skipped.
  - RS1: Whole page
  - RS2: Lesson 1-7
- Performant code II
  - NM1: Chap: 4.10, 15, 17
  - HW1: Chap: Performance, Profiling, Memory, Rcpp, Functionals, Function operators
  - HW7: Front page
  - BO: All pages
Extra reading material
- A beginner's R tutorial from RStudio can be found here.
- More on functionals from Data Camp can be found here.
- More on environments can be found here
- More on how to install git and make it work with RStudio here and here.
- More on HTTP and web crawling in this book.
- More on building a Shiny application here.
- The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!) here
Extra video material
- Google developers R videos (GD) here
- Introduction to R videos by Roger Peng (RP) here
- R markdown (Rmd) here
- RStudio and GitHub here and here.
- Debugging in R here
- Extra material on program control in R here
- Short video on Dijkstra's algorithm here
- Introduction to object-oriented programming here and a more extensive one here.
- A series of 5 videos on good coding practices here
Reference cards
Examples
- ggplot2 examples here
Software
Students are encouraged to use their own computers. The following software is needed for this course. Everything is open source and free.
Information on how to install R and RStudio: Windows, Mac, Linux/Ubuntu
Assignments
The course contains 6 assignments, all of which are mandatory.
Assignments 1 and 2 will be turned in as R script files. All assignments need to be fully correct to pass. The R package markmyassignment can be used to test your assignment.
Assignments 3 to 6 must be fully implemented R packages that can be installed directly from github.com. To turn in a package, just send in the GitHub address of the package. Read the particular instructions in each lab for what is needed to pass.
All labs and seminar assignments have a deadline. Please note that you will present your assignment at a seminar before the deadline has passed! This means that sometimes you will not be completely finished with the assignment at the seminar, but that is OK.
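As a rough illustration of what "possible to install directly from github.com" means in practice, the sketch below converts a GitHub address into the `owner/repo` form and passes it to `remotes::install_github()`. The repository address and the helper name `github_spec` are hypothetical placeholders, not part of the course material, and the sketch assumes the `remotes` package is installed.

```r
# Hypothetical repository address; replace it with your group's own.
repo_url <- "https://github.com/some-user/some-package"

# Helper (illustrative name): reduce a GitHub address to "owner/repo".
github_spec <- function(url) {
  spec <- sub("^https?://github\\.com/", "", url)  # drop the host part
  sub("\\.git$", "", spec)                         # drop a trailing ".git"
}

spec <- github_spec(repo_url)

# Only attempt the installation in an interactive session and when the
# remotes package is available, so that sourcing this file has no side effects.
if (interactive() && requireNamespace("remotes", quietly = TRUE)) {
  remotes::install_github(spec)
}
```

Trying this on a computer other than the one used for development is a quick way to check that the package installs cleanly before the address is handed in.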
In order to pass the labs each group has to present at least once during the seminars. In order to pass, each group has to also present at the seminars when requested to. Seminar attendance is obligatory.
Late submissions will result in penalty assignments!
NO late submissions, corrections NOR resubmissions will be allowed for the Bonus lab!
Submission is done through LISAM or via email to the lab assistants.
ALL submissions will be CHECKED through URKUND for plagiarism (also with respect to past labs)!
Assignments 3 to 6 and the bonus one are done in groups. All group members have to contribute to, understand and be able to explain all aspects of the work. In case some member(s) of a group do not contribute equally this has to be reported and in this situation a formal group work contract will be signed, stipulating the consequences for further unequal contributions.
For all labs there is an additional deadline of 24 November 2024 by which corrections to labs should be submitted. If further corrections are required after this, there is a FINAL deadline of 22 December 2024. After this date NO corrections NOR hand-ins will be accepted.
Assignment no. | Instructions | Material presented | Seminar presentation | Deadline |
---|---|---|---|---|
1 | | 2024-09-05 | 2024-09-06 (Lab, no seminar) | 2024-09-15 |
2 | | 2024-09-05 | 2024-09-11 (Lab, no seminar) | 2024-09-22 |
3 | | 2024-09-13 | 2024-09-25 | 2024-09-29 |
Tips for Lab 3 | Lab session 2024-09-20 | | | |
4 | | 2024-09-27 | 2024-10-02 | 2024-10-06 |
QR material is here | LiU's graphical profile (in Swedish) | | | |
5 | | 2024-10-03 | 2024-10-09 | 2024-10-13 |
API examples | | | | |
6 | | 2024-10-10,11 | 2024-10-16 | 2024-10-20 |
Bonus | Materials | | | 2024-11-05 (NO resubmission, NO late submission) |
Lectures
Lecture slides for 2024 can be found below:
Other additional materials for lectures and labs can also be found here.
Lecture slides from 2023 can be found below:
Lecture slides from 2022 can be found below:
Lecture slides for 2021 can be found below:
Lecture slides for 2020 can be found below:
Lecture slides from 2017 (which 2021 slides are based on) can be found below:
Lecture | Slides | |
---|---|---|
L1 | Lecture 1 | |
L2 | Lecture 2 | R code for slide 31 |
L3 | Lecture 3 | |
L4 | Lecture 4 | |
L5 | Lecture 5 | |
L6 | Lecture 6 | |
L7 | Lecture 7 |
Lecture slides from 2016 (which 2017 slides are based on) can be found below:
Lecture | Slides |
---|---|
L1 | Lecture 1 |
L2 | Lecture 2 |
L3 | Lecture 3 |
L4 | Lecture 4 |
L5 | Lecture 5 |
L6 | Lecture 6 |
L7 | Lecture 7 |
Lecture slides from 2015 (which 2016 slides are based on) can be found below:
Lecture | Slides |
---|---|
L1 | Slides 2015 |
L2 | Slides 2015 |
L3 | Slides 2015 |
L4 | Slides 2015 |
L5 | Slides 2015 |
L6 | Slides 2015 |
L7 | Slides 2015 |
Exercise sessions
Active participation in the exercise session gives a maximum of 2 bonus points towards the exam. Active participation means that a student comes to the session prepared with all of the day's exercises (please hand in your solutions to me at the beginning of the session or, if you did not solve them on paper, e-mail them to me before the session), correctly solves an exercise on the board, can answer questions about the presented solution, and can help with and comment on classmates' presented solutions. For each exercise, a student will be selected to present a solution (how depends on the number of students).
Physical presence at the exercise session is a necessary condition for obtaining the bonus points.
This is the same system as for the Statistical Methods bonus exercise sessions. The exercises are taken from:
[KG] Giaro, K., 2011, Exercises in Computational Complexity of Algorithms (Złożoność obliczeniowa algorytmów w zadaniach, in Polish). Published by The Prof. Tadeusz Kotarbiński University of Informatics and Management, Olsztyn, Gdańsk.
[MK] Kubale, M., 1999, A Gentle Introduction to the Analysis of Algorithms (Łagodne wprowadzenie do analizy algorytmów, in Polish). Published by the Gdańsk University of Technology, Gdańsk.
Thursday October 17 13:15-15 in A31
2 possible bonus points.
Computational complexity exercises for the session can be found under this link. They are:
KG: Problems 1.1, 1.2, 4.1, 4.3;
MK: Problems 1.8, 1.10, 1.15, 1.17, 2.6, 2.7, 2.10, 2.13, 2.11, 2.12;
Exercise 12
Consider the following matrix
Propose an algorithm that calculates the sum of the lower triangular (including diagonal) elements of the matrix (i.e., those in bold) using
(a) Θ(n²) summations
(b) Θ(n) summations
(c) Θ(1) summations
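As a baseline for part (a), here is a minimal R sketch that sums the lower triangular part of a general square matrix with explicit loops and counts the additions it performs. The matrix from the exercise is not reproduced here, so the example matrix is an arbitrary assumption and the function name `lower_tri_sum` is illustrative only. Parts (b) and (c) rely on the structure of the specific matrix and are not covered by this sketch.

```r
# Sum the lower triangular part (diagonal included) of a square matrix
# with explicit loops, counting the additions performed along the way.
lower_tri_sum <- function(A) {
  stopifnot(is.matrix(A), nrow(A) == ncol(A))
  n <- nrow(A)
  total <- 0
  additions <- 0
  for (i in seq_len(n)) {
    for (j in seq_len(i)) {        # columns 1..i of row i
      total <- total + A[i, j]
      additions <- additions + 1
    }
  }
  list(sum = total, additions = additions)
}

# Arbitrary example matrix (an assumption, not the one from the exercise).
A <- matrix(1:9, nrow = 3)
res <- lower_tri_sum(A)

# Vectorised equivalent using base R's lower.tri().
builtin <- sum(A[lower.tri(A, diag = TRUE)])
```

The loop performs n(n+1)/2 additions for an n × n matrix, which grows quadratically in n.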
Exercise 13
What is the computational complexity of the code snippets below and what do they return?
(a)
procedure FX1(n)
  if n==1 then return 1
  else return FX1(n-1)+FX1(n-1)
  end if
end procedure
(b)
procedure FX2(n)
  if n==1 then return 1
  else return 2×FX2(n-1)
  end if
end procedure
(c)
procedure FX3(n)
  if n==1 then return 1
  else return FX3(n-1)
  end if
end procedure
(d)
procedure FX4(n)
  if n==1 then return 1
  else return FX4(n-1)+1
  end if
end procedure
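To experiment with these procedures, one option is to transcribe them into R and count how many times each function is called. The sketch below is such a transcription, assuming n is a positive integer. The call counter and the helper `trace_calls` are illustrative additions and not part of the exercise.

```r
# Environment holding a call counter shared by the functions below.
counter <- new.env()
counter$calls <- 0

FX1 <- function(n) {
  counter$calls <- counter$calls + 1
  if (n == 1) 1 else FX1(n - 1) + FX1(n - 1)
}

FX2 <- function(n) {
  counter$calls <- counter$calls + 1
  if (n == 1) 1 else 2 * FX2(n - 1)
}

FX3 <- function(n) {
  counter$calls <- counter$calls + 1
  if (n == 1) 1 else FX3(n - 1)
}

FX4 <- function(n) {
  counter$calls <- counter$calls + 1
  if (n == 1) 1 else FX4(n - 1) + 1
}

# Helper (illustrative): reset the counter, run f(n), and report the
# returned value together with the number of function calls made.
trace_calls <- function(f, n) {
  counter$calls <- 0
  value <- f(n)
  c(value = value, calls = counter$calls)
}

# One column per procedure, rows "value" and "calls".
results <- sapply(list(FX1 = FX1, FX2 = FX2, FX3 = FX3, FX4 = FX4),
                  trace_calls, n = 5)
print(results)
```

Running `trace_calls` for several values of n and watching how the call count grows gives an empirical check of a complexity argument made on paper.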
Computer exam
The exam will be a computer exam on 2024-10-29
The material that will be included with the exam can be downloaded here.
Previous exams can be found here.
Exams from the basic course in R programming can be found here.
Four bonus points can be obtained in total from the Computational Complexity exercise session (2 points) and Bonus Lab (2 points).
Staff
- Krzysztof Bartoszek, lecturer
- Woodrow Hao Chi Kiang, lecturer
- Krzysztof Bartoszek, examiner
- Bayu Brahmantio, teaching assistant
Page responsible: Krzysztof Bartoszek
Last updated: 2024-07-17