Clustering large-scale biomedical data to model dynamic accumulation processes in disease progression and anti-microbial resistance evolution

This paper is a preprint and has not been certified by peer review.

Authors

Dauda, K. A.; Aga, O. N.; Johnston, I.

Abstract

Accumulation modelling uses machine learning to discover the dynamics by which systems acquire discrete features over time. Many systems of biomedical interest show such dynamics: from bacteria acquiring resistances to sets of drugs, to patients acquiring symptoms during the course of progressive disease. Existing approaches for accumulation modelling are typically limited either in the number of features they consider or in their ability to characterise interactions between these features, a limitation for the large-scale genetic and/or phenotypic datasets often found in modern biomedical applications. Here, we demonstrate how clustering can make such large-scale datasets tractable for powerful accumulation modelling approaches. Clustering resolves issues of sparsity and high dimensionality in datasets but complicates the interpretation of the inferred dynamics, especially if observations are not independent. Focussing on hypercubic hidden Markov models (HyperHMM), we introduce several approaches for interpreting, estimating, and bounding the inferred dynamics in these cases, and show how biomedical insight can be gained from them. We demonstrate this "Cluster-based HyperHMM" (CHyperHMM) pipeline on synthetic data, clinical data on disease progression in severe malaria, and genomic data on anti-microbial resistance evolution in Klebsiella pneumoniae, reflecting two global health threats.
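
To illustrate the general idea of clustering features before accumulation modelling, the following is a minimal sketch, not the authors' pipeline. The abstract does not state which clustering method or aggregation rule CHyperHMM uses, so everything here is an assumption for illustration: the function names (`jaccard`, `cluster_features`, `collapse`), the greedy Jaccard-similarity grouping, the threshold value, and the rule that a cluster counts as "acquired" when any of its member features is present.

```python
def jaccard(a, b):
    """Jaccard similarity between two binary feature columns."""
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    return both / either if either else 0.0


def cluster_features(data, threshold):
    """Greedily group feature columns (hypothetical rule, for illustration).

    A feature joins the first existing cluster whose first member has
    Jaccard similarity >= threshold with it; otherwise it starts a new one.
    """
    n_features = len(data[0])
    columns = [[row[j] for row in data] for j in range(n_features)]
    clusters = []
    for j in range(n_features):
        for members in clusters:
            if jaccard(columns[j], columns[members[0]]) >= threshold:
                members.append(j)
                break
        else:
            clusters.append([j])
    return clusters


def collapse(data, clusters):
    """Reduce each observation to one binary value per cluster.

    Assumed rule: a cluster is marked present if ANY member feature is present.
    """
    return [[int(any(row[j] for j in members)) for members in clusters]
            for row in data]


# Toy binary dataset: rows are observations, columns are features.
data = [
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [1, 1, 1, 1, 1],
]
clusters = cluster_features(data, threshold=0.6)
reduced = collapse(data, clusters)
print(clusters)  # [[0, 1], [2, 3], [4]]
print(reduced)   # [[1, 0, 0], [1, 1, 0], [0, 1, 0], [1, 1, 1]]
```

In this toy case five features reduce to three cluster-level features, so the state space an accumulation model must cover shrinks from 2^5 to 2^3 states. The cost is the one the abstract points to: a transition in the reduced data only says that some member of a cluster was acquired, so the inferred dynamics need careful interpretation.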
