**My current research** focuses on developing novel statistical machine learning models for different applications, primarily in personalized medicine, public health and telecommunications. The methods proposed in my publications are based on decision trees, kernel models, neural networks and statistical uncertainty evaluation methods such as the bootstrap and the jackknife. I have also been working on models for large-scale data, in particular monotonic regression models for large and multivariate data.

### My journal publications

(2024) Schäfer S, Smelik M, Sysoev O, Zhao Y, Eklund D, Lilja S,
Gustafsson M, Heyn H, Julia A, Kovács IA, Loscalzo J, Marsal S,
Zhang H, Li X, Gawel D, Wang H, Benson M. scDrugPrio: A
framework for the analysis of single-cell transcriptomics to
address multiple problems in precision medicine in
immune-mediated inflammatory diseases. Genome Med 2024 (in
press).
We present a novel method that ranks a set of user-specified drugs based on user-specified single-cell data from one or more sick individuals and on knowledge about the regulatory networks and the targets of the drugs in these networks. The most effective drug is ranked first and the least effective drug is ranked last. The effectiveness of the method is demonstrated by in vitro (laboratory) and in vivo (mouse) experiments.

(2023) Lilja S., Li X., Lee EJ., Loscalzo J., Marthanda PB., Hu
L., Magnusson M., Sysoev O., Zhang H., Zhao Y., Sjövall C., Gawel
D., Wang H., Benson M. Multi-organ single-cell analysis
reveals an on/off switch system with potential for personalized
treatment of immunological diseases. Cell Reports Medicine
DOI:10.1016/j.xcrm.2023.100956
A multicellular disease model of multiple tissues reveals that inflammation switches on or off depending on an altered balance between pro- and anti-inflammatory upstream regulators and downstream pathways.

(2022) Li X., Lee EJ, Lilja S., Loscalzo J., Schäfer S., Smelik
M., Strobl M.R., Sysoev O., Wang H., Zhang H., Zhao Y., Gawel
D.R., Bohle B., Benson, M. A dynamic single cell-based
framework for digital twins to prioritize disease genes and drug
targets. Genome Medicine 14:48
Here we propose a new approach based on Multicellular Disease Models to detect early biomarkers of the disease. The approach employs longitudinal single-cell data for patients and controls.

(2022) Svahn C., Sysoev O. Selective Imputation of
Covariates in High Dimensional Censored Data. Journal
of Computational and Graphical Statistics, 31:4, 1397-1405.
DOI:10.1080/10618600.2022.2035233
Do you have data with many features where some or all features are observed subject to detection limits, for example when some features are not observed below some threshold? This publication may help you to build your prediction model more efficiently! The method is based on a combination of an 'improper' multiple imputation approach and subspace k-Nearest Neighbors.

(2021) Pérez W., Selling K.E, Blandón E.Z, Peña R., Contreras M,
Persson LA, Sysoev O., Källestål C. Trends and factors
related to adolescent pregnancies: an incidence trend and
conditional inference trees analysis of northern Nicaragua
demographic surveillance data. BMC Pregnancy
Childbirth 21, 749
This paper uses the conditional inference tree framework to build an interpretable machine learning model of adolescent pregnancy based on data from northern Nicaragua.

(2020) Lee EJ, Gawel D, Lilja S, Li X, Schäfer S, Sysoev O,
Zhang H and Benson M. Analysis of expression profiling data
suggests explanation for difficulties in finding biomarkers for
nasal polyps. Rhinology 58(4) pp. 360-367
In this paper we search for biomarkers of nasal polyps by studying regulatory mechanisms. We conclude that there are multiple components and that combinations of biomarkers are needed for successful diagnostics.

(2020) Björnsson B, Borrebaeck C, Elander N, Gasslander T, Gawel
DR, Gustafsson M, Jörnsten R, Lee EJ, Li X, Lilja S,
Martínez-Enguita D, Matussek A, Sandström P, Schäfer S, Stenmarker
M, Sun XF, Sysoev O, Zhang H, Benson M. Digital twins to
personalize medicine. Genome Med 12, 4, doi:
10.1186/s13073-019-0701-3
This paper describes a generic framework called Digital Twins that can be used to create "digital copies" of any individual patient, treat the person computationally with thousands of different drugs, and then select a treatment that would be optimal for that particular patient. Possible ways to implement this generic framework are explained.

(2020) Källestål C., Blandón E.Z., Peña R., Pérez W., Contreras
M., Persson L.Å., Sysoev O., Selling K.E.: Assessing the
Multiple Dimensions of Poverty. Data Mining Approaches to the
2004–14 Health and Demographic Surveillance System in Cuatro
Santos, Nicaragua. Frontiers in Public Health vol 7, pp
409. doi: 10.3389/fpubh.2019.00409
This is an applied study in which the k-means algorithm is used to discover clusters of various variables associated with poverty. Descriptive analyses are then used to characterize the resulting clusters.

(2019) Källestål C., Blandón E.Z., Peña R., Pérez W., Contreras
M., Persson L.Å., Sysoev O., Selling K.E.: Predicting
poverty. Data mining approaches to the health and demographic
surveillance system in Cuatro Santos, Nicaragua.
International Journal for Equity in Health vol 18, no 165. doi:
10.1186/s12939-019-1054-7
In this applied study, conditional inference trees were used to build a decision tree model that predicts poverty status from multiple predictors, such as education, emigration and food consumption.

(2019) Svefors P., Sysoev O., Ekström E.C., Persson L.Å., El
Arifeen S., Naved R., Rahman A., Islam Khan A., Ekholm Selling K.:
Relative importance of prenatal and postnatal determinants of
stunting: data mining approaches to the MINIMat cohort,
Bangladesh. BMJ Open vol 9:e025154. doi:
10.1136/bmjopen-2018-025154
This is an applied study where conditional inference random forests are applied to identify the most important risk factors of child stunting. In addition, conditional inference trees are used to build interpretable predictive models.

(2019) Sysoev O., Bartoszek K., Ekström E.C. and Ekholm Selling K.
PSICA: decision trees for probabilistic subgroup
identification with categorical treatments. Statistics in
Medicine, pp. 1–17. doi: 10.1002/sim.8308
Do you have data in which multiple treatments or interventions were applied to a group of individuals and you want to discover subgroups that share similar characteristics and that also benefit from the same kinds of treatments? A novel interpretable machine learning (decision tree) framework is proposed in this paper to discover such subgroups and to estimate, for each subgroup, the probability that a given treatment is the best one.

(2018) Sysoev, O., Burdakov, O. A smoothed monotonic
regression via l2 regularization. Knowledge and Information
Systems, 1-22.

(2017) Burdakov, O., Sysoev, O. A Dual Active-Set Algorithm
for Regularized Monotonic Regression. Journal of
Optimization Theory and Applications 172.3: 929-949.
These two publications introduce a new regression method that estimates the target variable as a smooth and monotonically increasing function of the predictor. The probabilistic model and the hyperparameter selection strategy are presented in the 2018 paper, while additional mathematical proofs regarding complexity estimates are given in the 2017 paper.

(2016) Kalish, M.L, Dunn J.C., Burdakov O. and Sysoev O.: A
statistical test of the equality of latent orders. Journal
of Mathematical Psychology, vol 70, pp 1-11.
Do you have two observed features that correspond to two latent (unobserved) features and you want to find out whether the latent features are monotonically related to each other? The hypothesis testing framework developed in this paper can help you! The method is based on a modification of the monotonic regression approach and bootstrap methods.

(2015) Sysoev, O., Grimvall, A., and Burdakov, O.: Bootstrap
confidence intervals for large-scale multivariate monotonic
regression problems. Statistics-Simulation and Computation,
pp 1-16.
The paper introduces new methods to compute confidence intervals for monotonic regression in a multivariate setting. The methods are based on the bootstrap, can handle large data sets, and focus on inference about the expected value of the target.

(2013) Sysoev, O., Grimvall, A., and Burdakov, O.: Bootstrap
estimation of the variance of the error term in monotonic
regression models. Journal of Statistical Computation and
Simulation 83.4: pp 627-640.
If the AIC criterion is used for model selection, it is important to be able to estimate the variance of the error term in the regression model. This work introduces a variance estimator for monotonic regression models, which can be used, for example, for feature selection in such models. The estimator has been demonstrated to have good finite-sample properties.

(2011) Sysoev, O., Burdakov, O., Grimvall, A.: A
Segmentation-Based Algorithm for Large-Scale Monotonic
Regression Problems. Computational Statistics and Data
Analysis 55, pp. 2463-2476
A new method for solving large-scale monotonic regression problems is proposed here. The approach splits the data into smaller segments, solves the regression problem in each segment, and then modifies these local solutions in a special manner to obtain a global monotonic prediction.

- (2006) O. Burdakov, A. Grimvall and O. Sysoev. *Data preordering in generalized PAV algorithm for monotonic regression*. Journal of Computational Mathematics. 24, No. 6, pp. 771-790.
- (2006) Burdakov O., Sysoev O., Grimvall A. and Hussian M. *An O(n²) algorithm for isotonic regression.* In: G. Di Pillo and M. Roma (Eds) Large-Scale Nonlinear Optimization. Series: Nonconvex Optimization and Its Applications, Springer-Verlag, 83, pp. 25-33.
- (2005) Hussian M., Grimvall A., Burdakov O. and Sysoev O. *Monotonic regression for the detection of temporal trends in environmental quality data.* MATCH Commun. Math. Comput. Chem. 54, pp. 535-550.

Do you have data in which the target variable is expected to be an increasing or decreasing function of multiple features, but the observed data are not monotonic due to noise? Our publication from 2006 proposes a new method which has quadratic complexity and is able to find an approximate solution to the given monotonic regression problem. The choice of hyperparameter (data preordering) is investigated in our second 2006 publication. An empirical study using the general methodology was already published in 2005.
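
For readers new to the problem, the simplest special case can be sketched in a few lines. The code below is the classical textbook pool-adjacent-violators (PAV) procedure for a single, totally ordered predictor. It is not the generalized PAV algorithm from the publications above, which handles multiple features and partial orders. The function name `pav_isotonic` and its arguments are illustrative assumptions.

```python
def pav_isotonic(y, w=None):
    """Least-squares non-decreasing fit to y (classical 1-D PAV).

    y is assumed to be already sorted by the predictor; w holds optional
    positive observation weights. Illustrative sketch only.
    """
    w = [1.0] * len(y) if w is None else list(w)
    # Each block stores [weighted mean, total weight, number of points].
    blocks = []
    for yi, wi in zip(y, w):
        blocks.append([float(yi), float(wi), 1])
        # Pool adjacent blocks while they violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, w2, c2 = blocks.pop()
            m1, w1, c1 = blocks.pop()
            total = w1 + w2
            blocks.append([(m1 * w1 + m2 * w2) / total, total, c1 + c2])
    # Expand the blocks back to one fitted value per observation.
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


if __name__ == "__main__":
    print(pav_isotonic([1, 3, 2, 4]))
```

In this one-dimensional setting the noisy values 3 and 2 are pooled into their mean, which gives a non-decreasing fit. The multivariate case treated in the publications requires ordering the observations with respect to a partial order, which this sketch does not attempt.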

### My conference publications

A variational autoencoder
for handling censored covariates. IEEE ICMLA 2022.
Do you have large-scale data with thousands of observations and hundreds of features in which some or all features are censored subject to detection limits, for example because they are not observed below some threshold? We propose a new variational autoencoder framework that is able to handle that scenario! It can find a latent representation of the features, denoise the features and make probabilistic predictions of the target variable, including uncertainty estimates.

(2021) Sysoev O., Gawel D., Lilja S., Schäfer S., Benson M. Cell
type identification for single cell RNA data by bulk data
reference projection. 2021 IEEE International Conference on
Bioinformatics and Biomedicine (BIBM), pp. 742-746, doi:
10.1109/BIBM52615.2021.9669148.
Single-cell data are composed of cells that belong to different cell types, which are not measured. To define the cell type of each single cell, we propose to use another kind of data (bulk data) in which the observations are labeled with their cell types. In this paper, a novel machine learning method is proposed to "project" bulk data into clusters of single cells and thereby predict the cell types of the single cells in these clusters.

(2019) Svahn, C., Sysoev, O., Cirkic M., Gunnarsson F. and Berglund J.: Inter-frequency radio signal quality prediction
for handover, evaluated in 3GPP LTE. 2019 IEEE 89th
Vehicular Technology Conference (VTC2019-Spring), Kuala Lumpur,
Malaysia, pp. 1-5. doi: 10.1109/VTCSpring.2019.8746369
In this paper, we aim to predict whether the signal on the current frequency is stronger than the signal on an alternative frequency, based on the signal characteristics of the current frequency. This is formulated as a classification problem, which is modeled by various machine learning methods such as random forests and neural networks. The quality of the plain predictions turns out to be poor, so we introduce a duo-threshold approach based on the rejection option principle. This allows us to increase the accuracy of the predictions dramatically while the costs increase only marginally.
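
The rejection option idea can be illustrated with a small sketch: a prediction is issued only when the classifier's probability is clearly high or clearly low, and the observation is rejected otherwise. The function names, the threshold values and the toy numbers below are assumptions made for illustration. The exact decision rule and cost model of the paper may differ.

```python
def duo_threshold_decision(prob, lower=0.2, upper=0.8):
    """Return 1, 0, or None (reject) for a predicted class-1 probability.

    lower and upper are hypothetical thresholds, not values from the paper.
    """
    if not 0.0 <= lower <= upper <= 1.0:
        raise ValueError("thresholds must satisfy 0 <= lower <= upper <= 1")
    if prob >= upper:
        return 1      # confident that class 1 holds
    if prob <= lower:
        return 0      # confident that class 0 holds
    return None       # too uncertain: reject, i.e. make no prediction


def accuracy_and_coverage(probs, labels, lower=0.2, upper=0.8):
    """Accuracy on accepted cases and the fraction of cases accepted."""
    decisions = [duo_threshold_decision(p, lower, upper) for p in probs]
    accepted = [(d, y) for d, y in zip(decisions, labels) if d is not None]
    coverage = len(accepted) / len(labels)
    if not accepted:
        return float("nan"), coverage
    accuracy = sum(d == y for d, y in accepted) / len(accepted)
    return accuracy, coverage


if __name__ == "__main__":
    toy_probs = [0.9, 0.6, 0.1, 0.4, 0.85]   # made-up classifier outputs
    toy_labels = [1, 0, 0, 1, 0]             # made-up true classes
    print(accuracy_and_coverage(toy_probs, toy_labels))
```

Widening the gap between the two thresholds typically raises accuracy on the accepted cases at the price of lower coverage. Each rejected case then has to be handled in some other way, which is where the additional cost comes from.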

### Older conference publications

- (2013) Sysoev, O.: *Estimating binary monotonic regression models and their uncertainty by incorporating kernel smoothers.* Complex Data Modelling and Computationally Intensive Statistical Methods for Estimation and Prediction conference.
- (2009) O. Burdakov, A. Grimvall and O. Sysoev. *Generalized PAV algorithm with block refinement for partially ordered monotonic regression.* In: A. Feelders and R. Potharst (Eds.) Proceedings of the Workshop on Learning Monotone Models from Data at the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, pp. 23-37.
- (2004) O. Burdakov, O. Sysoev, A. Grimvall and M. Hussian. *An algorithm for isotonic regression problems.* In: The Proceedings of the 4th European Congress of Computational Methods in Applied Science and Engineering 'ECCOMAS 2004'.
- (2004) M. Hussian, A. Grimvall, O. Burdakov and O. Sysoev. *Monotonic regression for trend assessment of environmental quality data.* In: The Proceedings of the 4th European Congress of Computational Methods in Applied Science and Engineering 'ECCOMAS 2004'.

Page responsible: Oleg Sysoev

Last updated: 2024-03-16