Continuous variables are involved in the following:
- Parallel coordinate plots
- Heatmaps
- Star charts
Construction:
Analysis:
- Clusters
- Outliers
- Correlated variables
Example: Iris dataset - How many clusters do you see?
Problem formulation: Given a data set \(\chi=\left( x_{ij}| i=1, \ldots, n, j=1, \ldots, p \right)\)
Note: \(p!\) possible orderings exist…
\(\min_{\Psi} {\sum_{j=1}^{p-1} d'_{j,j+1}}\)
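A minimal sketch of what this objective means in practice, assuming the dissimilarities between variables are already available as a symmetric matrix `d` (the function name `best_variable_order` and the toy numbers are hypothetical, not from the notes). It enumerates all \(p!\) orderings, which is only feasible for small \(p\):

```python
from itertools import permutations


def best_variable_order(d):
    """Return (order, cost) minimizing the sum of distances between adjacent axes.

    d is a symmetric p x p matrix (list of lists) of dissimilarities between
    variables. How d is computed is an assumption left to the caller.
    """
    p = len(d)
    best_order, best_cost = None, float("inf")
    # Exhaustive search over all p! orderings: fine for small p only.
    for order in permutations(range(p)):
        cost = sum(d[order[j]][order[j + 1]] for j in range(p - 1))
        if cost < best_cost:
            best_order, best_cost = order, cost
    return best_order, best_cost


# Toy dissimilarities between four variables (made-up numbers).
d = [
    [0, 1, 4, 5],
    [1, 0, 2, 6],
    [4, 2, 0, 3],
    [5, 6, 3, 0],
]
order, cost = best_variable_order(d)
print(order, cost)
```

Because every ordering and its reverse have the same cost, the search always finds at least two optimal orderings; the sketch simply keeps the first one it meets.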
Solution: modern approaches
Objective functions:
They are based on \(\min_{\Psi} {L(\Psi(D))}\)
Optimization algorithms:
Aim: distances should increase when moving away from the diagonal
\[ d_{ik}\leq d_{ij} \mbox{ for } 1\leq i<k<j\leq n \]
\[ d_{kj}\leq d_{ij} \mbox{ for } 1\leq i<k<j\leq n \]
Objective function:
\[ L(D)= \sum_{i<k<j}\left[ f(d_{ik}, d_{ij})+f(d_{kj}, d_{ij}) \right] \]
where
\[ f(z,y)=\operatorname{sign}(z-y) \mbox{ or } f(z,y)=z-y \]
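The objective above can be evaluated directly with a triple loop. The following is a sketch under the assumption that `d` is a symmetric distance matrix given as a list of lists; the names `anti_robinson_loss` and `line_distances` are hypothetical helpers introduced only for illustration. Lower values mean fewer (or smaller) violations of the two inequalities:

```python
def sign(v):
    """Return -1, 0 or 1 depending on the sign of v."""
    return (v > 0) - (v < 0)


def anti_robinson_loss(d, f=lambda z, y: sign(z - y)):
    """Sum f(d_ik, d_ij) + f(d_kj, d_ij) over all triples i < k < j."""
    n = len(d)
    total = 0
    for i in range(n):
        for k in range(i + 1, n):
            for j in range(k + 1, n):
                total += f(d[i][k], d[i][j]) + f(d[k][j], d[i][j])
    return total


def line_distances(positions):
    """Distance matrix of points on a line (toy data for the example)."""
    return [[abs(a - b) for b in positions] for a in positions]


# Objects listed in their natural order versus a shuffled order.
ordered = anti_robinson_loss(line_distances([0, 1, 2, 3]))
shuffled = anti_robinson_loss(line_distances([0, 2, 1, 3]))
print(ordered, shuffled)
```

Passing `f=lambda z, y: z - y` switches to the second variant of \(f\), which weights each violation by its size instead of only counting it.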
Hamiltonian path length:
\[ L(D)=\sum_{i=1}^{n-1} d_{i,i+1} \]
Least squares criterion (PCA)
\[ L(D)= \sum_i \sum_j (d_{ij}-|i-j|)^2 \]
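As a rough illustration, the least squares criterion compares each distance with the rank difference \(|i-j|\) of the two objects in the current order. The sketch below assumes `d` is a symmetric distance matrix already arranged in the order being evaluated; `least_squares_loss` and `line_distances` are hypothetical names:

```python
def least_squares_loss(d):
    """Sum of (d_ij - |i - j|)^2 over all index pairs (i, j)."""
    n = len(d)
    return sum((d[i][j] - abs(i - j)) ** 2 for i in range(n) for j in range(n))


def line_distances(positions):
    """Distance matrix of points on a line (toy data for the example)."""
    return [[abs(a - b) for b in positions] for a in positions]


# Equally spaced points in their natural order fit the target |i - j| exactly.
perfect = least_squares_loss(line_distances([0, 1, 2, 3]))
# Swapping two neighbours moves some distances away from |i - j|.
swapped = least_squares_loss(line_distances([0, 2, 1, 3]))
print(perfect, swapped)
```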
Partial enumeration methods
TSP solver
Hierarchical clustering
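To give a feel for how an ordering can be found without enumerating all permutations, here is a sketch of a greedy nearest-neighbour heuristic for the Hamiltonian path length. This is a simple stand-in chosen for illustration, not one of the specific methods listed above and not the algorithm of any particular package; `nearest_neighbour_order` and `path_length` are hypothetical names, and the result is generally not optimal:

```python
def path_length(d, order):
    """Sum of distances between consecutive objects in the given order."""
    return sum(d[a][b] for a, b in zip(order, order[1:]))


def nearest_neighbour_order(d, start=0):
    """Greedy ordering: always move to the closest object not yet visited."""
    order = [start]
    unvisited = set(range(len(d))) - {start}
    while unvisited:
        last = order[-1]
        # Ties are broken by the smaller index so the result is deterministic.
        nxt = min(unvisited, key=lambda j: (d[last][j], j))
        order.append(nxt)
        unvisited.remove(nxt)
    return order


# Toy data: five points on a line, listed in scrambled order.
points = [0, 5, 1, 4, 2]
d = [[abs(a - b) for b in points] for a in points]

original = list(range(len(points)))
greedy = nearest_neighbour_order(d, start=0)
print(path_length(d, original), greedy, path_length(d, greedy))
```

The outcome depends on the starting object, so in practice one might try several starting points and keep the shortest path.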
A heat map visualizes an \(n \times m\) matrix
Analysis:
If juxtaposed, analyse:
If superimposed,
Ordering:
Other positioning possible - PCA/MDS
Idea:
Analogy: cutting a sausage
Faceting = one more aesthetic
-> Useful tool for modeling!
facet_grid()
facet_wrap()
Example: Aids data (Age, Time of Death, Time of Diag)
Chapter 5
Paper "Hahsler, M., Hornik, K., & Buchta, C. (2008). Getting things in order: an introduction to the R package seriation. Journal of Statistical Software, 25(3), 1-34".
(Browse through) paper "Ankerst, M., Berchtold, S., & Keim, D. A. (1998, October). Similarity clustering of dimensions for an enhanced visualization of multidimensional data. In Information Visualization, 1998. Proceedings. IEEE Symposium on (pp. 52-60). IEEE."
Becker, R. A., Cleveland, W. S., & Shyu, M. J. (1996). The visual design and control of trellis display. Journal of Computational and Graphical Statistics, 5(2), 123-155.