
The LiU Seminar Series in Statistics and Mathematical Statistics



Tuesday, September 5, 3.15 pm, 2023. Seminar in Statistics.

Zero-sum intra-cluster correlations—why it arises and some ways of handling it
Rojan Karakaya, Institute for Futures Studies, Stockholm
Abstract: This seminar will be divided into two parts. In the first, I will give some advice to new students of the Statistics and Machine Learning MS program, as an alumnus of the program myself. In the second, I will discuss aspects of my work as a research engineer at the Institute for Futures Studies related to methodological issues that arise in zero-sum competitions.
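Many social situations involve an element of competition for limited resources. If the total amount of resources is fixed, the outcomes within a cluster of size n must add up to a constant, so the variance of their sum is zero. If every outcome has the same variance v and every pair has the same covariance c, this variance equals nv + n(n-1)c, and setting it to zero gives c = -v/(n-1). The autocovariance matrix of the outcome therefore has diagonal entries v and off-diagonal entries -v/(n-1). Each of its rows sums to zero, so the matrix is not invertible. This in turn prevents us from using otherwise well-established methods of handling autocorrelation, such as GLS. For data with fixed-sum intra-cluster covariance, one also cannot use random effects to handle the intra-cluster covariance, as these can only model positive correlations.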
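The singularity claim above can be checked directly. The sketch below is standard-library Python. It builds the matrix with v on the diagonal and -v/(n-1) elsewhere, for the illustrative values n = 4 and v = 3 chosen here. Exact rational arithmetic then confirms that every row sums to zero and that the rank is n - 1. It only illustrates the stated covariance structure and is not part of the speaker's method.

```python
from fractions import Fraction

def zero_sum_cov(n, v):
    """n x n matrix with v on the diagonal and -v/(n-1) off the diagonal."""
    v = Fraction(v)
    off = -v / (n - 1)
    return [[v if i == j else off for j in range(n)] for i in range(n)]

def rank(mat):
    """Rank via exact Gaussian elimination over the rationals."""
    m = [list(row) for row in mat]
    r = 0
    for c in range(len(m[0])):
        # find a row at or below r with a non-zero entry in column c
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c] / m[r][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
    return r

n, v = 4, 3
sigma = zero_sum_cov(n, v)
row_sums = [sum(row) for row in sigma]
print(row_sums)     # all zero: sigma times the vector of ones is the zero vector
print(rank(sigma))  # n - 1, so sigma is singular and cannot be inverted for GLS
```

The vector of ones is an eigenvector with eigenvalue zero. This is exactly the fixed-total constraint seen from the matrix side.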
In this seminar, I will present two methods for handling such zero-sum situations. The first, for continuous outcomes, is an OLS regression performed on a particular type of row-reduction transformation of the design matrix. The second, for discrete outcomes, is based on the multinomial logit model. Their use will be illustrated with two examples: one from labour market sociology (specifically labour market discrimination) and one from animal behaviour (specifically parental allocation of resources).
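A multinomial logit respects the zero-sum constraint because the probabilities within a cluster are a softmax and therefore always sum to one. The sketch below assumes a conditional-logit form in which exactly one member of each cluster receives an indivisible resource. The covariates, coefficients and data are hypothetical, and this is not necessarily the specification used in the talk.

```python
import math

def choice_probs(xs, beta):
    """P(member i of a cluster gets the resource) = exp(x_i . beta) / sum_j exp(x_j . beta)."""
    scores = [sum(b * x for b, x in zip(beta, xi)) for xi in xs]
    m = max(scores)  # subtract the max before exponentiating, for numerical stability
    w = [math.exp(s - m) for s in scores]
    total = sum(w)
    return [wi / total for wi in w]

def cluster_loglik(clusters, beta):
    """Log-likelihood: sum over clusters of log P(observed recipient)."""
    return sum(math.log(choice_probs(xs, beta)[k]) for xs, k in clusters)

# Hypothetical data: each cluster is (covariate rows, index of the recipient).
clusters = [
    ([[1.0, 0.0], [0.5, 1.0], [0.2, 0.0]], 0),
    ([[0.3, 1.0], [0.9, 0.0]], 1),
]
beta = [1.2, -0.4]
p = choice_probs(clusters[0][0], beta)
print(p, sum(p))
print(cluster_loglik(clusters, beta))
```

The probabilities depend only on differences in x_i . beta within a cluster. A covariate that is constant within every cluster, such as an intercept, therefore drops out and is not identified. In practice beta would be found by maximising `cluster_loglik` with a numerical optimiser.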
Seminar slides
Location: Ada Lovelace (CHANGE OF ROOM!).

Tuesday, September 12, 3.15 pm, 2023. Seminar in Statistics.

The evolutionary process balancing immune selection during tumour growth
Eszter Lakatos, Applied Mathematics and Statistics, University of Gothenburg
Abstract: A growing tumour is an inherently stochastic evolving system: during consecutive divisions, cancer cells randomly acquire mutations that may give them a beneficial phenotype. However, these genetic alterations can also be a disadvantage. They can give rise to neoantigens, cancer-specific peptides presented on the cell surface that help the immune system recognise cancer cells. This effect is particularly pronounced in hyper-mutated tumours, which acquire mutations and neoantigens at a much higher rate.
In this work, we construct a stochastic branching-process model of tumour evolution that accounts for both neoantigen acquisition and advantageous mutations. Selective pressures acting on cells influence the number of offspring that share their genotype. Using this principle, we establish the characteristic variant allele frequency (VAF) distribution expected under immune selection. We explore the limitations of sequencing technologies and unveil the mechanism through which hyper-mutated tumours and pre-cancerous tissue balance the accumulation of advantageous and immunogenic mutations.
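To make the ingredients concrete, here is a deliberately simple branching-process simulation with immune selection and the resulting VAFs. Every modelling choice here is a hypothetical simplification, not the model from the talk:

- discrete generations;
- each cell either dies or divides into two;
- each daughter gains a Poisson number of new mutations;
- each new mutation is immunogenic with a fixed probability;
- the death probability grows with the number of immunogenic mutations a cell carries;
- VAF assumes heterozygous mutations in a diploid genome.

```python
import math
import random
from collections import Counter

def poisson(lam, rng):
    """Poisson draw by Knuth's multiplication method (fine for small lam)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

def simulate(n_max=2000, mu=2.0, p_imm=0.1, d0=0.2, s=0.15, max_gen=200, seed=1):
    """Discrete-generation branching process with immune selection.

    Each generation every cell dies with probability min(0.95, d0 + s * k),
    where k is its number of immunogenic mutations; otherwise it divides, and
    each daughter gains Poisson(mu) new mutations, each immunogenic with
    probability p_imm. Starts from 10 founder cells with no mutations.
    """
    rng = random.Random(seed)
    cells = [(frozenset(), 0)] * 10  # (mutation ids, immunogenic count)
    next_id = 0
    for _ in range(max_gen):
        if not cells or len(cells) >= n_max:
            break
        new_cells = []
        for muts, k in cells:
            if rng.random() < min(0.95, d0 + s * k):
                continue  # cell removed, e.g. by the immune response
            for _ in range(2):
                new_ids = range(next_id, next_id + poisson(mu, rng))
                next_id = new_ids.stop
                k_new = sum(rng.random() < p_imm for _ in new_ids)
                new_cells.append((muts | frozenset(new_ids), k + k_new))
        cells = new_cells
    return cells

def vafs(cells):
    """VAF of every mutation still present, assuming heterozygous sites in a diploid genome."""
    counts = Counter(m for muts, _ in cells for m in muts)
    return [c / (2 * len(cells)) for c in counts.values()]

cells = simulate()
freqs = vafs(cells)
print(len(cells), len(freqs))
```

Comparing the VAF spectra from runs with `s = 0` and `s > 0` is one way to see how selection against immunogenic mutations reshapes the distribution. The parameter values above are arbitrary.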
Location: Alan Turing.

Tuesday, October 10, 3.15 pm, 2023. Seminar in Statistics.

Optimal subsampling designs
Henrik Imberg, Applied Mathematics and Statistics, University of Gothenburg
Abstract: We consider the problem of optimal subsample selection in an experimental setting where observing, or utilising, the full dataset for statistical analysis is practically infeasible, e.g. because of computational or economic cost constraints. Statistical analyses must then be restricted to a subset of the data, and choosing this subset so that it captures as much information as possible is essential. Existing subsampling methods are often limited in scope. They also rely on optimality criteria (e.g., A-optimality) with well-known deficiencies, such as lack of invariance to the measurement scale of the data and to the parameterisation of the model.
We present a theory of optimal design for general data subsampling problems, including finite population inference, parametric density estimation, and regression modelling using generalised linear models or quasi-likelihood methods. Our theory encompasses and generalises most existing methods in the field of optimal subdata selection based on unequal probability sampling and inverse probability weighting. We derive optimality conditions and optimal sampling schemes for a general class of optimality criteria under Poisson and multinomial sampling designs. We also study optimal design from an expected-distance-minimising perspective. This naturally leads to a novel class of linear optimality criteria with good theoretical and practical properties. These include computational tractability and invariance under non-singular affine transformations of the data and under reparameterisation of the model. We discuss the use of sequential optimal design for implementing optimal subsampling methods in practice. We propose an active sampling strategy that iterates between estimation and data collection with optimal subsamples, guided by machine learning predictions on not-yet-observed data. The methodology is illustrated on an application in the vehicle safety domain.
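The starting point named in this abstract, unequal probability sampling with inverse probability weighting, can be illustrated with the classic Horvitz-Thompson estimator under a Poisson sampling design. In this sketch the inclusion probabilities are simply proportional to a hypothetical size measure, and the population is simulated. Choosing the probabilities optimally, which is the subject of the talk, is not attempted here.

```python
import random

def inclusion_probs(sizes, n):
    """Inclusion probabilities proportional to a size measure, expected sample size about n, capped at 1."""
    total = sum(sizes)
    return [min(1.0, n * a / total) for a in sizes]

def poisson_sample(pi, rng):
    """Poisson sampling: unit i enters the sample independently with probability pi[i]."""
    return [i for i, p in enumerate(pi) if rng.random() < p]

def horvitz_thompson_total(y, pi, sample):
    """Inverse-probability-weighted (Horvitz-Thompson) estimate of sum(y)."""
    return sum(y[i] / pi[i] for i in sample)

# Simulated population: y is roughly proportional to a known size measure.
rng = random.Random(42)
sizes = [rng.uniform(1.0, 5.0) for _ in range(1000)]
y = [2.0 * a + rng.gauss(0.0, 1.0) for a in sizes]
pi = inclusion_probs(sizes, n=100)

# Average over repeated subsamples to see the estimator centre on the true total.
estimates = [horvitz_thompson_total(y, pi, poisson_sample(pi, rng)) for _ in range(500)]
print(sum(y), sum(estimates) / len(estimates))
```

Under Poisson sampling the sample size is random, with expectation sum(pi). Capping probabilities at 1 makes that expectation slightly smaller than n when some units are very large.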
Seminar slides
Location: Alan Turing.

Tuesday, November 14, 3.15 pm, 2023. Seminar in Statistics.

Deep learning-based estimation of time-dependent parameters in Markov models with application to SDEs: theoretical foundations
Martyna Wiącek, AGH University of Kraków
Abstract: Estimating parameters in SDE-based models is a complex problem with important practical applications in many fields, for example in finance, energy price modelling, and consumption forecasting. Estimating time-dependent parameters is particularly challenging because of the large number of values to be estimated. The most common approach is to simplify the problem by approximating the parameters with piecewise-constant functions.
We propose a novel method for estimating time-dependent parameters that is based on neural networks and extends an approach known from the literature for heteroscedastic regression. The main idea of the algorithm is to define a suitable loss function based on the maximum likelihood approach. This translates the approximation task into an optimization problem, which lets us use deep learning techniques and software.
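The idea of turning parameter estimation into minimisation of a likelihood-based loss can be sketched with the Gaussian transition density of an Euler-Maruyama discretisation. The sketch assumes a hypothetical Ornstein-Uhlenbeck-type SDE, dX = kappa (theta(t) - X) dt + sigma dW, with a time-dependent mean level theta(t). The model, discretisation and training set-up used in the talk may differ. Here `theta` is any Python callable. In the approach described above, it would be a neural network of t trained by minimising this loss with automatic differentiation.

```python
import math
import random

def simulate_ou(theta, kappa=2.0, sigma=0.5, x0=0.0, dt=0.01, n=2000, seed=0):
    """Euler-Maruyama path of dX = kappa * (theta(t) - X) dt + sigma dW."""
    rng = random.Random(seed)
    xs = [x0]
    for k in range(n):
        x = xs[-1]
        drift = kappa * (theta(k * dt) - x)
        xs.append(x + drift * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0))
    return xs

def euler_nll(xs, theta, kappa, sigma, dt):
    """Negative log-likelihood of a path under the Gaussian Euler transition density."""
    var = sigma ** 2 * dt  # variance of one Euler step
    nll = 0.0
    for k in range(len(xs) - 1):
        mean = xs[k] + kappa * (theta(k * dt) - xs[k]) * dt
        r = xs[k + 1] - mean
        nll += 0.5 * (math.log(2 * math.pi * var) + r * r / var)
    return nll

def true_theta(t):
    """Hypothetical time-varying mean level used to generate the data."""
    return 2.0 + math.sin(t)

path = simulate_ou(true_theta)
print(euler_nll(path, true_theta, 2.0, 0.5, 0.01))
print(euler_nll(path, lambda t: 2.0, 2.0, 0.5, 0.01))  # constant approximation
```

The loss is just a sum of per-step Gaussian negative log-likelihoods. Any differentiable parameterisation of theta(t), piecewise-constant or neural, can be plugged in and compared on the same path.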
In this seminar, we present theoretical results for the SDE case. We prove that, under certain conditions, the solution process of the underlying SDE with the true parameter function is close to the solution of the SDE with the parameter function estimated by the trained neural network.
Seminar slides
Location: Alan Turing.

Tuesday, November 21, 3.15 pm, 2023. Seminar in Statistics.

Topological data analysis—methods and perspectives
Paweł Dłotko, Dioscuri Centre in Topological Data Analysis, Institute of Mathematics of the Polish Academy of Sciences
Abstract: In this presentation, I will provide an introduction to topological data analysis, emphasizing its practical applications. I will cover fundamental concepts such as persistent homology and mapper, showcasing their standard applications. Additionally, I will offer a brief overview of new tools, with a specific focus on their relevance to biology and phylogenetics.
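Persistent homology in its simplest case, degree 0 (connected components), can be computed for a point cloud with a union-find pass over pairwise distances in increasing order. This is the same computation as single-linkage clustering. The sketch below uses a Vietoris-Rips filtration on a small made-up point set. Higher-degree homology and the mapper construction are beyond this sketch.

```python
import itertools
import math

def h0_persistence(points):
    """Degree-0 persistence pairs (birth, death) of the Vietoris-Rips filtration.

    Every point is born at 0; when an edge of length r first joins two
    components, one of them dies at r. One component never dies (death = inf).
    """
    if not points:
        return []
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(p, q), i, j)
        for (i, p), (j, q) in itertools.combinations(enumerate(points), 2)
    )
    pairs = []
    for r, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:  # edge merges two components: one bar ends at r
            parent[ri] = rj
            pairs.append((0.0, r))
    pairs.append((0.0, math.inf))
    return pairs

print(h0_persistence([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]))
```

For these points, three short bars die at distance 1 and one bar survives until about 13.45, the gap between the two groups. The infinite bar is the component that never dies. The single long finite bar is the persistent signal of two well-separated clusters.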
Seminar slides
Location: Alan Turing.

Tuesday, December 5, 3.15 pm, 2023. Seminar in Statistics.

Towards Deep Learning-based Fetal Birth Weight Estimation from Imaging and Tabular Data
Michał Grzeszczyk, Sanoscience, Cracow
Abstract: Many clinical procedures involve collecting data samples in the form of both imaging and tabular data. To enhance predictive capabilities, novel deep learning architectures are emerging that aim to fuse information from both sources. In this seminar, we will focus on the integration of imaging and tabular data in convolutional neural networks (CNNs). As an example application, we will use fetal birth weight prediction. This is a challenging task that requires clinicians to collect ultrasound videos of fetal body parts and fetal biometry measurements. The predicted weight is an indicator of perinatal health prognosis and of complications in pregnancy, and it influences the choice of delivery method. First, we will explore the feasibility of predicting fetal birth weight from imaging data alone, using hybrid architectures that combine CNNs with Transformer-based models. We will then refine these predictions by incorporating an attention mechanism computed with the assistance of tabular data.
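One generic way tabular data can steer attention over imaging features is to build the attention query from the tabular vector and the keys from per-frame image embeddings. The frames are then pooled with the resulting softmax weights. The sketch below is a plain-Python, single-head version of scaled dot-product attention under that assumption. The weight matrices, dimensions and feature values are hypothetical, and it does not reproduce the CNN/Transformer architectures from the talk.

```python
import math

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def tabular_attention(frames, tab, W_q, W_k):
    """Attention-pool per-frame image features, with the query built from tabular data.

    frames: list of feature vectors (one per ultrasound frame), used directly as values
    tab:    tabular feature vector (e.g. biometry measurements)
    W_q, W_k: projections of tab and of each frame into a shared space of dimension d
    """
    q = matvec(W_q, tab)
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, matvec(W_k, f))) / math.sqrt(d) for f in frames]
    m = max(scores)  # subtract the max before exponentiating, for numerical stability
    w = [math.exp(s - m) for s in scores]
    z = sum(w)
    weights = [wi / z for wi in w]
    pooled = [sum(a * f[c] for a, f in zip(weights, frames)) for c in range(len(frames[0]))]
    return weights, pooled

frames = [[0.2, 1.0, 0.0], [0.9, 0.1, 0.3], [0.4, 0.4, 0.8]]  # 3 frames, 3 features each
tab = [1.0, 0.5]                                   # 2 tabular features
W_q = [[1.0, 0.0], [0.0, 1.0]]                     # 2 x 2, so d = 2
W_k = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]           # 2 x 3
weights, pooled = tabular_attention(frames, tab, W_q, W_k)
print(weights, pooled)
```

In a trained model, W_q and W_k would be learned jointly with the image encoder. Frames that match what the tabular measurements "ask for" would then receive larger weights.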
In the latter part of the seminar, the Sano Centre for Computational Personalised Medicine will be presented. Based in Cracow, Poland, Sano is an international research foundation operating as a non-profit research institute. It is dedicated to advancing computational medicine by developing sophisticated computer methods for the prevention, diagnosis and treatment of disease. Its aim is to meet the worldwide need for efficient, effective and streamlined healthcare.
Location: Alan Turing.


Page responsible: Krzysztof Bartoszek
Last updated: 2023-12-30