# The LiU Seminar Series in Statistics and Mathematical Statistics

Spring 2014

### Tuesday, January 28, 3.15 pm, 2014. Seminar in Statistics

**Regularized multiple regression: application in genome-wide association studies**

Patrik Waldmann, Statistics, LiU.

*Abstract*: One common characteristic of the new genomic techniques is that the data sets are very large, often with many more variables than observations (p >> n). The goal of genome-wide association studies (GWAS) is to identify the best subset of single-nucleotide polymorphisms (SNPs) that strongly influence a certain trait, for example a disease or production trait. State-of-the-art GWAS comprise several thousand or even millions of SNPs, scored on a few thousand individuals.

Most of the GWAS carried out previously are single-SNP studies in which each SNP is tested individually for its association with the phenotype. This presentation will show that regularized multiple regression provides an attractive alternative. Both classical and Bayesian methods will be presented and compared. Moreover, the effect of intricate correlation patterns between SNPs on the methods will be evaluated.

Location: Alan Turing.

### Tuesday, February 11, 3.15 pm, 2014. Seminar in Mathematical Statistics

**Large deviations for Markov bridges**

Xiangfeng Yang, Mathematical Statistics, LiU.

*Abstract*: Markov bridges have numerous applications, such as the use of Brownian bridges in the Kolmogorov-Smirnov test in the area of statistical inference. In this talk I will propose a method to study large deviations for suitable Markov bridges, including Brownian bridges, Lévy bridges, Bernstein bridges, etc. The main ingredient of the method is to employ an equivalent form of large deviations and consider the associated compact level sets instead of closed sets.

Location: Kompakta rummet.

### Tuesday, March 11, 3.15 pm, 2014. Seminar in Mathematical Statistics

**Efficient estimation of the number of false positives in high-throughput screening**

Holger Rootzén, Mathematical Statistics, Chalmers.

*Abstract*: This talk is about tail estimation methods to handle false positives in very highly multiple testing problems where testing is done at extreme significance levels and with low degrees of freedom, and where the true null distribution may differ from the theoretical one. We show that the number of false positives, conditional on the total number of positives, approximately has a binomial distribution, and find estimators of its parameter. We also develop methods for estimation of the true null distribution, and techniques to compare it with the theoretical one. Analysis is based on a simple polynomial model for the tail of the distribution of p-values. Asymptotics which motivate the model, properties of the parameter estimators, and model checking tools are provided. The methods are applied to two large genomic studies and an fMRI brain scan experiment.

Location: Kompakta rummet.

### Tuesday, April 8, 3.15 pm, 2014. Seminar in Mathematical Statistics

**Comparison of asymptotic variances of inhomogeneous Markov chains with applications to Markov Chain Monte Carlo methods**

Jimmy Olsson, Mathematical Statistics, KTH.

*Abstract*: In this talk we will discuss the asymptotic variance of sample path averages for inhomogeneous Markov chains that evolve alternately according to two different pi-reversible Markov transition kernels. More specifically, we define a partial ordering over the pairs of pi-reversible Markov kernels that allows us to compare directly the asymptotic variances of the inhomogeneous Markov chains associated with each pair. As an important application, we use our result to compare different data-augmentation-type Metropolis-Hastings algorithms. In particular, we compare some pseudo-marginal algorithms and propose a novel exact algorithm, referred to as the random refreshment algorithm, which is more efficient, in terms of asymptotic variance, than the Grouped Independence Metropolis-Hastings algorithm and has a computational complexity that does not exceed that of the Monte Carlo Within Metropolis algorithm.

Location: Kompakta rummet.

### Tuesday, April 22, 3.15 pm, 2014. Seminar in Statistics

**Merging longitudinal datasets from studies in cognition and brain imaging**

Anders Lundquist, Statistics, Umeå University.

*Abstract*: Human cognitive abilities are interesting to investigate from many different perspectives, e.g. developmental trajectories during childhood as well as decline during aging, possibly with pathological components such as dementia. Currently, no single longitudinal dataset covering the human life span exists. However, by merging two available longitudinal datasets, from the Brainchild and Betula studies respectively, we are able to cover an age span of 6-85 years. The primary objective in this talk will be to model the association between episodic memory performance and age, which is non-linear across the lifespan. We therefore use Generalized Additive Mixed Models (GAMMs), which permit memory performance to be a smooth function of age. As this is an ongoing project, results are preliminary, and time will be devoted to discussing alternative modelling strategies as well as other issues that arise from joining data sets such as these.

Location: Alan Turing.

### Tuesday, May 6, 3.15 pm, 2014. Seminar in Mathematical Statistics

**Using Stein Couplings for the Study of Fringe Trees**

Cecilia Holmgren, Mathematical Statistics, SU

*Abstract*: The binary search tree (in computational science known as Quicksort, the most used of all sorting algorithms) and the random recursive tree are important examples of random trees. We have examined fringe trees ("small" subtrees) in these two types of random trees. The use of certain couplings based on Stein's method allows simple proofs showing that in both of these trees, the number of fringe trees of size k, where k tends to infinity, converges to a Poisson distribution. Furthermore, combining these results with another version of Stein's method, we can also show that for k = o(√n) (where n is the size of the whole tree) the number of fringe trees in both types of random trees converges to a normal distribution. We can then use these general results on fringe trees to obtain simple solutions to a broad range of problems relating to random trees; as an example, we obtain a simple proof showing that the number of protected nodes in the binary search tree has a normal distribution. (Joint work with Svante Janson, Uppsala University.)

Location: Hopningspunkten.

### Tuesday, May 20, 3.15 pm, 2014. Seminar in Statistics

**Variational Inference in Factorized Latent Variable Models**

Carl Henrik Ek, Computer Science, KTH.

*Abstract*: In this talk I will discuss variational inference and its application to latent variable models for multi-view learning. We will start with a brief introduction to the topic and then proceed to recent developments in the field. With this background, we will introduce factorized latent variable models for multi-view data and explain how variational methods are essential to make inference feasible in these models. We will show experiments on models based on both Dirichlet and Gaussian Process priors.

Location: Alan Turing.

### Tuesday, May 27, 3.15 pm, 2014. Seminar in Mathematical Statistics

**Testing of multivariate data with block compound symmetry covariance structure**

Daniel Klein, P.J. Safarik University, Slovakia

*Abstract*: It is well known that Hotelling's T² test is the conventional method for testing the equality of mean vectors in two populations. However, Hotelling's T² statistic is based on the unbiased estimate of the unstructured variance-covariance matrix. When the variance-covariance matrix has some structure, one should instead use an unbiased estimate of that structure to test the equality of mean vectors. A natural extension of Hotelling's T² statistic, called the Block T² statistic, is obtained for doubly multivariate data for q response variables at p time points in the block compound symmetric covariance matrix setting. The minimum sample size needed for this test is only q + 1, unlike pq + 1 in Hotelling's T² test.

Location: Hopningspunkten.

Page responsible: Mattias Villani

Last updated: 2014-05-25