Oleg Sysoev

My current research is devoted to the development of statistical machine learning and data mining models for different applications, primarily in medicine, public health and telecommunications. The methods proposed in my publications are mostly based on decision trees, deep learning and statistical uncertainty evaluation methods. I have also been doing research on large-scale data modelling, in particular monotonic regression models for large and multivariate data.

My journal publications

(2025) Agholme A, Ahtola K, Toll E, Carlhäll CJ, Henriksson P, Kechagias S, Lundberg P, Nasr P, Sysoev O, Wijkman M, Ekstedt M, Ulander M, Iredahl F. Clinically available predictors of obstructive sleep apnoea requiring treatment in type 2 diabetes patients in primary care. Scientific Reports volume 15

We use classical statistical methods, such as hypothesis testing and logistic regression, to identify risk factors for obstructive sleep apnoea in patients with type 2 diabetes.

(2025) Mahmud F, Mansour Aly D, Zhao Y, Benson M, Smelik M, Sysoev O, Wang H, Li X. Proteogenomic analysis reveals Arp 2/3 complex as a common molecular mechanism in high risk pancreatic cysts and pancreatic cancer. Scientific Reports volume 15

We investigate genetic variants associated with pancreatic cysts and find that Arp2/3 complex-associated genes can serve as potential biomarkers for predicting the malignant transformation of pancreatic cysts.

(2024) Smelik M, Zhao Y, Mansour Aly D, Mahmud F, Sysoev O, Li X and Benson M. Multiomics biomarkers were not superior to clinical variables for pan-cancer screening. Communications Medicine volume

Here we build a machine learning model based on multi-omics and clinical variables from the UK Biobank to predict disease state, and we find that the omics data were not superior to the clinical variables.

(2024) Smelik M, Zhao Y, Li X, Loscalzo J, Sysoev O, Mahmud F, Mansour Aly D, Benson M. An interactive atlas of genomic, proteomic, and metabolomic biomarkers promotes the potential of proteins to predict complex diseases. Scientific Reports volume 14

Here we build a machine learning model based on multi-omics data from the UK Biobank to predict disease state and determine which type of omics data is most informative for prediction.

(2024) Schäfer S, Smelik M, Sysoev O, Zhao Y, Eklund D, Lilja S, Gustafsson M, Heyn H, Julia A, Kovács IA, Loscalzo J, Marsal S, Zhang H, Li X, Gawel D, Wang H, Benson M. scDrugPrio: A framework for the analysis of single-cell transcriptomics to address multiple problems in precision medicine in immune-mediated inflammatory diseases. Genome Med 2024 (in press)

We present a novel method that ranks a set of user-specified drugs based on user-specified single-cell data from one or more patients, together with knowledge about the regulatory networks and the targets of the drugs in these networks. The most efficient drug is ranked first and the least efficient drug is ranked last. The efficacy of the method is demonstrated by in vitro (lab) and in vivo (mouse) experiments.

(2023) Lilja S., Li X., Lee EJ., Loscalzo J., Marthanda PB., Hu L., Magnusson M., Sysoev O., Zhang H., Zhao Y., Sjövall C., Gawel D., Wang H., Benson M. Multi-organ single-cell analysis reveals an on/off switch system with potential for personalized treatment of immunological diseases. Cell Reports Medicine DOI:10.1016/j.xcrm.2023.100956

Multicellular disease models of multiple tissues reveal that inflammation is switched on or off by an altered balance between pro- and anti-inflammatory upstream regulators and downstream pathways.

(2022) Li X., Lee EJ, Lilja S., Loscalzo J., Schäfer S., Smelik M., Strobl M.R., Sysoev O., Wang H., Zhang H., Zhao Y., Gawel D.R., Bohle B., Benson M. A dynamic single cell-based framework for digital twins to prioritize disease genes and drug targets. Genome Medicine 14:48

Here we propose a new approach based on multicellular disease models to detect early biomarkers of disease. The approach uses longitudinal single-cell data from patients and controls.

(2022) Svahn C., Sysoev O. Selective Imputation of Covariates in High Dimensional Censored Data. Journal of Computational and Graphical Statistics, 31:4, 1397-1405. DOI:10.1080/10618600.2022.2035233

Do you have data with many features, some or all of which are observed subject to detection limits (for example, values below some threshold are not observed)? This publication may help you build your prediction model more efficiently! The method combines an 'improper' multiple imputation approach with subspace k-nearest neighbors.

(2021) Pérez W., Selling K.E., Blandón E.Z., Peña R., Contreras M, Persson LA, Sysoev O., Källestål C. Trends and factors related to adolescent pregnancies: an incidence trend and conditional inference trees analysis of northern Nicaragua demographic surveillance data. BMC Pregnancy Childbirth 21, 749

This paper uses the conditional inference tree framework to build an interpretable machine learning model of adolescent pregnancy based on data from northern Nicaragua.

(2020) Lee EJ, Gawel D, Lilja S, Li X, Schäfer S, Sysoev O, Zhang H and Benson M. Analysis of expression profiling data suggests explanation for difficulties in finding biomarkers for nasal polyps. Rhinology 58(4) pp. 360-367

In this paper, we search for biomarkers of nasal polyps by studying regulatory mechanisms. We conclude that the disease involves multiple components, so combinations of biomarkers are needed for successful diagnostics.

(2020) Björnsson B, Borrebaeck C, Elander N, Gasslander T, Gawel DR, Gustafsson M, Jörnsten R, Lee EJ, Li X, Lilja S, Martínez-Enguita D, Matussek A, Sandström P, Schäfer S, Stenmarker M, Sun XF, Sysoev O, Zhang H, Benson M. Digital twins to personalize medicine. Genome Med 12, 4, doi: 10.1186/s13073-019-0701-3

This paper describes a generic framework called Digital Twins that can be used to create "digital copies" of any individual patient, treat the person computationally with thousands of different drugs, and then select a treatment that would be optimal for that particular patient. Possible ways to implement this generic framework are explained.

(2020) Källestål C., Blandón E.Z., Peña R., Pérez W., Contreras M., Persson L.Å., Sysoev O., Selling K.E.: Assessing the Multiple Dimensions of Poverty. Data Mining Approaches to the 2004–14 Health and Demographic Surveillance System in Cuatro Santos, Nicaragua. Frontiers in Public Health vol 7, pp 409. doi: 10.3389/fpubh.2019.00409

This applied study uses the k-means algorithm to discover clusters of variables associated with poverty. Descriptive analyses are then used to characterize the resulting clusters.

(2019) Källestål C., Blandón E.Z., Peña R., Pérez W., Contreras M., Persson L.Å., Sysoev O., Selling K.E.: Predicting poverty. Data mining approaches to the health and demographic surveillance system in Cuatro Santos, Nicaragua. International Journal for Equity in Health vol 18, no 165. doi: 10.1186/s12939-019-1054-7

In this applied study, conditional inference trees were used to build a decision tree model that predicts poverty status from multiple predictors, such as education, emigration and food consumption.

(2019) Svefors P., Sysoev O., Ekström E.C., Persson L.Å., El Arifeen S., Naved R., Rahman A., Islam Khan A., Ekholm Selling K.: Relative importance of prenatal and postnatal determinants of stunting: data mining approaches to the MINIMat cohort, Bangladesh. BMJ Open vol 9:e025154. doi: 10.1136/bmjopen-2018-025154

This applied study uses conditional inference random forests to identify the most important risk factors for child stunting. In addition, conditional inference trees are used to build interpretable predictive models.

(2019) Sysoev O., Bartoszek K., Ekström EC and Ekholm Selling K. PSICA: decision trees for probabilistic subgroup identification with categorical treatments. Statistics in Medicine, pp. 1–17. doi: 10.1002/sim.8308

Do you have data in which multiple treatments or interventions were applied to a group of individuals, and do you want to discover subgroups that share similar characteristics and also benefit from the same kinds of treatments? This paper proposes a novel interpretable machine learning (decision tree) framework that discovers such subgroups and estimates the probability that each treatment is the best one within each subgroup.
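The notion of "the probability that a treatment is the best one" can be illustrated with a small sketch. This is not the PSICA estimator: it assumes binary outcomes, independent uniform Beta priors for each treatment's success rate, and a single subgroup that has already been identified (for example, one leaf of a tree). It estimates by Monte Carlo how often each treatment has the highest sampled success rate. The function name `prob_best` and the counts are hypothetical.

```python
import random

def prob_best(successes, totals, n_draws=20000, seed=0):
    """Monte Carlo estimate of P(treatment k has the highest success rate)
    within one subgroup. Each treatment's success rate gets an independent
    Beta(1 + s, 1 + n - s) posterior (uniform prior, binary outcomes).
    Hypothetical helper for illustration, not the PSICA method itself."""
    rng = random.Random(seed)
    wins = [0] * len(successes)
    for _ in range(n_draws):
        draws = [rng.betavariate(1 + s, 1 + n - s) for s, n in zip(successes, totals)]
        wins[draws.index(max(draws))] += 1  # credit the treatment with the top draw
    return [w / n_draws for w in wins]

# Hypothetical subgroup with three treatments and 60 individuals per treatment.
probs = prob_best(successes=[30, 45, 20], totals=[60, 60, 60])
print([round(p, 3) for p in probs])
```

Reporting a probability rather than a single "winner" lets a reader see how clear-cut the choice of treatment is within a subgroup.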

(2018) Sysoev, O., Burdakov, O. A smoothed monotonic regression via l2 regularization. Knowledge and Information Systems, 1-22.
(2017) Burdakov, O., Sysoev, O. A Dual Active-Set Algorithm for Regularized Monotonic Regression. Journal of Optimization Theory and Applications 172.3: 929-949.

These two publications introduce a new regression method that estimates the target variable as a smooth, monotonically increasing function of the predictor. The probabilistic model and the hyperparameter selection strategy are presented in the 2018 paper, while additional mathematical proofs regarding complexity estimates are given in the 2017 paper.
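One way to write such a smoothed monotonic regression problem, as we read the 2018 paper and with notation introduced here (observations y_i sorted by the predictor, fitted values β_i, weights w_i > 0 and smoothing penalties μ_i ≥ 0), is as penalized least squares under ordering constraints:

```latex
\min_{\beta_1,\dots,\beta_n} \; \sum_{i=1}^{n} w_i\,(y_i - \beta_i)^2 \;+\; \sum_{i=1}^{n-1} \mu_i\,(\beta_{i+1} - \beta_i)^2
\quad \text{subject to} \quad \beta_1 \le \beta_2 \le \dots \le \beta_n
```

Setting all μ_i = 0 gives ordinary monotonic (isotonic) regression, which tends to produce step-like fits; larger μ_i penalize jumps between neighbouring fitted values and make the fit smoother.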

(2016) Kalish M.L., Dunn J.C., Burdakov O. and Sysoev O.: A statistical test of the equality of latent orders. Journal of Mathematical Psychology, vol 70, pp 1-11.

Do you have two observed features that correspond to two latent (unobserved) features and you want to find out whether the latent features are monotonically related to each other? The hypothesis testing framework developed in this paper can help you! The method is based on a modification of the monotonic regression approach and bootstrap methods.

(2015) Sysoev, O., Grimvall, A., and Burdakov, O.: Bootstrap confidence intervals for large-scale multivariate monotonic regression problems. Statistics-Simulation and Computation, pp 1-16.

The paper introduces new methods for computing confidence intervals for monotonic regression in a multivariate setting. The methods are based on the bootstrap, can handle large data sets, and focus on inference about the expected value of the target.

(2013) Sysoev, O., Grimvall, A., and Burdakov, O.: Bootstrap estimation of the variance of the error term in monotonic regression models. Journal of Statistical Computation and Simulation 83.4: pp 627-640.

If the AIC criterion is used for model selection, it is important to be able to estimate the variance of the error term in the regression model. This work introduces a variance estimator for monotonic regression models that can be used, for example, for feature selection. The estimator has been demonstrated to have good finite-sample properties.
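To see why the error variance matters, assume Gaussian errors with variance σ² that is treated as fixed across the models being compared, and let k be the effective number of parameters of a fitted model f̂ (notation introduced here). This is the standard Gaussian form of AIC, not the estimator proposed in the paper. Up to an additive constant shared by all compared models:

```latex
\mathrm{AIC} \;=\; \frac{1}{\hat{\sigma}^2} \sum_{i=1}^{n} \bigl(y_i - \hat{f}(x_i)\bigr)^2 \;+\; 2k \;+\; \text{const}
```

Because the residual sum of squares is divided by the plug-in estimate σ̂², an over- or under-estimated variance shifts the balance between goodness of fit and the 2k penalty, and can therefore change which features are selected.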

(2011) Sysoev, O., Burdakov, O., Grimvall, A.: A Segmentation-Based Algorithm for Large-Scale Monotonic Regression Problems. Computational Statistics and Data Analysis 55, pp. 2463-2476

A new method for solving large-scale monotonic regression problems is proposed here. The approach splits the data into smaller segments, solves the regression problem in each segment, and then adjusts and combines these local solutions into a global monotonic prediction.
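A much simplified, one-dimensional sketch of the split, solve and merge idea is shown below. It is not the algorithm from the paper, which targets large multivariate (partially ordered) problems. Here the data are totally ordered, each segment is solved with the classical pool-adjacent-violators (PAV) algorithm, and the merge step runs PAV once more on the local blocks, treated as weighted points. In this 1-D case the result coincides with PAV on the full data, because PAV reaches the same solution whatever the order in which adjacent violating blocks are pooled. The helper names `pav`, `expand` and `segmented_fit` are hypothetical.

```python
import random

def pav(points):
    """Weighted pool-adjacent-violators for totally ordered data.

    points: list of (value, weight, count) tuples in order.
    Returns pooled blocks in the same (mean, total_weight, n_obs) form."""
    blocks = []
    for v, w, c in points:
        blocks.append((v, w, c))
        # Pool the last two blocks while they violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            blocks.append(((m1 * w1 + m2 * w2) / (w1 + w2), w1 + w2, c1 + c2))
    return blocks

def expand(blocks):
    """One fitted value per original observation."""
    return [m for m, _, c in blocks for _ in range(c)]

def segmented_fit(y, n_segments):
    """Solve each segment separately, then merge the local blocks with PAV."""
    size = -(-len(y) // n_segments)  # ceiling division
    local_blocks = []
    for start in range(0, len(y), size):
        segment = y[start:start + size]
        local_blocks.extend(pav([(v, 1.0, 1) for v in segment]))
    # Merge step: each local block acts as one weighted point.
    return expand(pav(local_blocks))

rng = random.Random(1)
y = [0.01 * i + rng.gauss(0, 1) for i in range(1000)]  # noisy increasing trend
direct = expand(pav([(v, 1.0, 1) for v in y]))
split = segmented_fit(y, n_segments=10)
print(max(abs(a - b) for a, b in zip(direct, split)))  # floating-point rounding level
```

In the multivariate setting, merging local solutions is no longer this simple, which is what makes the segmentation approach in the paper non-trivial.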

(2006) O. Burdakov, A. Grimvall and O. Sysoev. Data preordering in generalized PAV algorithm for monotonic regression. Journal of Computational Mathematics 24, No. 6, pp. 771-790.
(2006) Burdakov O., Sysoev O., Grimvall A. and Hussian M. An O(n²) algorithm for isotonic regression. In: G. Di Pillo and M. Roma (Eds) Large-Scale Nonlinear Optimization. Series: Nonconvex Optimization and Its Applications, Springer-Verlag, 83, pp. 25-33.
(2005) Hussian M., Grimvall A., Burdakov O. and Sysoev O. Monotonic regression for the detection of temporal trends in environmental quality data. MATCH Commun. Math. Comput. Chem. 54, pp. 535-550.

Do you have data in which the target variable is expected to be an increasing or decreasing function of multiple features, but the observed data are not monotonic because of noise? Our 2006 chapter "An O(n²) algorithm for isotonic regression" proposes a new method with quadratic complexity that finds an approximate solution to such monotonic regression problems. The choice of its hyperparameter (the data preordering) is investigated in our 2006 Journal of Computational Mathematics paper. An empirical study using the general methodology was published earlier, in 2005.
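With several features, "increasing" is typically understood in the componentwise sense: if x_i ≤ x_j in every feature, the fitted value for observation i should not exceed the fitted value for observation j, while incomparable pairs impose no constraint. The sketch below only builds these pairwise constraints and lists the pairs that noisy observed data violate; it does not implement the GPAV algorithm from the papers. The function names and the toy data are hypothetical.

```python
from itertools import combinations

def precedes(a, b):
    """Componentwise (product) order: a <= b in every feature."""
    return all(ai <= bi for ai, bi in zip(a, b))

def monotonicity_violations(X, y):
    """Return ordered pairs (i, j) with X[i] <= X[j] componentwise but
    y[i] > y[j], i.e. pairs that break an increasing relationship.
    Checks all O(n^2) pairs; incomparable pairs are skipped."""
    bad = []
    for i, j in combinations(range(len(X)), 2):
        if precedes(X[i], X[j]) and y[i] > y[j]:
            bad.append((i, j))
        elif precedes(X[j], X[i]) and y[j] > y[i]:
            bad.append((j, i))
    return bad

# Toy data: two features per observation; observations 1 and 2 are incomparable.
X = [(1, 1), (2, 1), (1, 3), (3, 3)]
y = [1.0, 0.5, 2.0, 1.5]
print(monotonicity_violations(X, y))  # [(0, 1), (2, 3)]
```

A monotonic regression method then replaces y with fitted values that satisfy every such constraint while staying as close as possible to the data in the least-squares sense.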

My conference publications

(2025) Qummar S, Ernstsson A, Kessler C, Sysoev O. SkePU-DNN: Algorithmic Skeleton Programming for Deep Learning on Heterogeneous Systems. Accepted to PDSEC'25.

Here we extend the algorithmic skeleton model of the SkePU framework to accommodate deep neural networks, focusing on convolutional neural networks. Using image datasets and various backends, we show that the model accuracy of SkePU-DNN is on par with that of Keras.

(2022) Svahn C., Sysoev O.: CCVAE: A variational autoencoder for handling censored covariates. IEEE ICMLA 2022.

Do you have large-scale data with thousands of observations and hundreds of features, in which some or all features are censored by detection limits (for example, values below some threshold are not observed)? We propose a new variational autoencoder framework that is able to handle that scenario! It can find a latent representation of the features, denoise the features and make probabilistic predictions of the target variable, including uncertainty estimates.

(2021) Sysoev O., Gawel D., Lilja S., Schäfer S., Benson M. Cell type identification for single cell RNA data by bulk data reference projection. 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM), pp. 742-746, doi: 10.1109/BIBM52615.2021.9669148.

Single-cell data consist of cells that belong to different cell types, but the cell types themselves are not measured. To assign a cell type to each single cell, we propose to use another kind of data (bulk data) in which the observations are labeled with their cell types. In this paper, a novel machine learning method is proposed that "projects" bulk data onto clusters of single cells and thereby predicts the cell types of the single cells in these clusters.

(2019) Svahn, C., Sysoev, O., Cirkic M., Gunnarsson F. and Berglund J.: Inter-frequency radio signal quality prediction for handover, evaluated in 3GPP LTE. 2019 IEEE 89th Vehicular Technology Conference (VTC2019-Spring), Kuala Lumpur, Malaysia, pp. 1-5. doi: 10.1109/VTCSpring.2019.8746369

In this paper, we aim to predict whether the signal on the current frequency is stronger than the signal on an alternative frequency, based on the signal characteristics of the current frequency. This is formulated as a classification problem and modeled with various machine learning methods, such as random forests and neural networks. The prediction quality turns out to be poor, so we introduce a duo-threshold approach based on the rejection option principle. This increases prediction accuracy dramatically while increasing costs only marginally.
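The rejection option idea can be sketched as follows, assuming a classifier that outputs a probability p for the positive class: predict 1 when p ≥ high, predict 0 when p ≤ low, and otherwise reject, that is, defer the decision to a costlier fallback (for instance an actual measurement; this fallback is our assumption). The thresholds, probabilities and labels below are made up for illustration and are not results from the paper.

```python
def duo_threshold_decisions(probs, low, high):
    """Map predicted probabilities to 1 (positive), 0 (negative) or
    None (rejected: the decision is deferred to a costlier fallback)."""
    out = []
    for p in probs:
        if p >= high:
            out.append(1)
        elif p <= low:
            out.append(0)
        else:
            out.append(None)
    return out

def evaluate(decisions, labels):
    """Return (accuracy on accepted cases, fraction of rejected cases)."""
    accepted = [(d, t) for d, t in zip(decisions, labels) if d is not None]
    reject_rate = 1 - len(accepted) / len(labels)
    acc = sum(d == t for d, t in accepted) / len(accepted) if accepted else float("nan")
    return acc, reject_rate

# Made-up classifier outputs and true labels.
probs = [0.95, 0.9, 0.6, 0.55, 0.45, 0.4, 0.1, 0.05]
labels = [1, 1, 0, 1, 1, 0, 0, 0]
single = evaluate(duo_threshold_decisions(probs, 0.5, 0.5), labels)  # one threshold
duo = evaluate(duo_threshold_decisions(probs, 0.3, 0.7), labels)     # two thresholds
print(single, duo)  # (0.75, 0.0) (1.0, 0.5)
```

With low = high the rule reduces to an ordinary single-threshold classifier; widening the gap typically trades a higher rejection rate (and hence more fallback cost) for higher accuracy on the accepted cases.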

Older conference publications

(2013) Sysoev, O.: Estimating binary monotonic regression models and their uncertainty by incorporating kernel smoothers. Complex Data Modelling and Computationally Intensive Statistical Methods for Estimation and Prediction conference.
(2009) O. Burdakov, A. Grimvall and O. Sysoev. Generalized PAV algorithm with block refinement for partially ordered monotonic regression. In: A. Feelders and R. Potharst (Eds.) Proceedings of the Workshop on Learning Monotone Models from Data at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, pp. 23-37.
(2004) O. Burdakov, O. Sysoev, A. Grimvall and M. Hussian. An algorithm for isotonic regression problems. In: The Proceedings of the 4th European Congress of Computational Methods in Applied Science and Engineering 'ECCOMAS 2004'.
(2004) M. Hussian, A. Grimvall, O. Burdakov and O. Sysoev. Monotonic regression for trend assessment of environmental quality data. In: The Proceedings of the 4th European Congress of Computational Methods in Applied Science and Engineering 'ECCOMAS 2004'.

Page responsible: Oleg Sysoev
Last updated: 2025-08-25

IDA - Department of Computer and Information Science
