GOTFlow: Learning Directed Population Transitions from Cross-Sectional Biomedical Data with Optimal Transport
Wright, G.; Alzaid, E.; Muter, J.; Brosens, J.; Minhas, F.
Abstract
Motivation: Many biological and clinical processes are dynamic, yet most datasets are cross-sectional, capturing populations at discrete states rather than tracking individuals over time. This makes it difficult to quantify how populations change across developmental, physiological, or disease-associated conditions. Existing trajectory and transport-based methods often rely on fixed feature spaces, assumptions tailored to transcriptomic time-course data, or approximately linear progression, limiting their ability to model heterogeneous and unbalanced transitions across diverse biomedical modalities. Flexible methods are needed that can infer directed population-level change from cross-sectional data while retaining biological interpretability.
Results: We present GOTFlow, a framework for learning directed population transitions from cross-sectional biomedical data using graph-constrained optimal transport in a learned latent space. GOTFlow integrates representation learning with unbalanced optimal transport to jointly estimate embeddings and transport couplings between biological states. This enables hypothesis-driven modelling of progression structures while accommodating non-linear geometry, branching relationships, and changes in population mass. From the inferred transport plans, GOTFlow derives interpretable summaries of dynamics, including drift vectors that quantify transitions and feature-level transported changes that highlight molecular drivers of progression. In synthetic data, GOTFlow recovered known transitions with strong agreement between inferred and ground-truth drifts. Across three biological applications (endometrial remodelling, breast cancer risk progression, and prion disease), GOTFlow identified state-to-state transitions and biologically meaningful feature shifts reflecting impaired decidualisation, increasing cancer risk, and neurodegenerative progression. These results establish GOTFlow as a general and interpretable framework for analysing directed population dynamics from cross-sectional data.
Availability: Code is available at https://github.com/wgrgwrght/GOTFlow
Supplementary information: Available online.
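Illustrative sketch (not the GOTFlow implementation): the two ingredients named in the Results, unbalanced optimal transport between two states and drift vectors read off the transport plan, can be illustrated on toy data as below. Everything here is an assumption made for illustration: the function names `unbalanced_sinkhorn` and `drift_vectors`, the entropic KL-relaxed Sinkhorn solver, the parameters `eps` and `rho`, and the use of fixed 2-D coordinates in place of a learned latent space. The graph constraint is reduced to a single directed edge between one source state and one target state.

```python
import numpy as np


def unbalanced_sinkhorn(a, b, cost, eps=0.5, rho=10.0, n_iter=300):
    """Entropic unbalanced OT with KL-relaxed marginals (illustrative solver).

    a, b : non-negative mass vectors for the source and target states.
    cost : pairwise cost matrix of shape (len(a), len(b)).
    eps  : entropic regularisation strength (assumed value).
    rho  : marginal relaxation strength; larger means closer to balanced OT.
    """
    K = np.exp(-cost / eps)           # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    power = rho / (rho + eps)         # damping exponent from the KL relaxation
    for _ in range(n_iter):
        u = (a / (K @ v)) ** power
        v = (b / (K.T @ u)) ** power
    return u[:, None] * K * v[None, :]  # transport plan


def drift_vectors(plan, source, target):
    """Per-sample drift: plan-weighted mean target position minus source position."""
    row_mass = plan.sum(axis=1, keepdims=True)
    expected_target = (plan @ target) / row_mass
    return expected_target - source, row_mass.ravel()


# Toy data: one directed edge between two states, target shifted by (2, 0).
rng = np.random.default_rng(0)
source = rng.normal(0.0, 0.3, size=(50, 2))
target = rng.normal(0.0, 0.3, size=(50, 2)) + np.array([2.0, 0.0])

a = np.full(50, 1.0 / 50)
b = np.full(50, 1.0 / 50)
cost = ((source[:, None, :] - target[None, :, :]) ** 2).sum(axis=-1)

plan = unbalanced_sinkhorn(a, b, cost)
drifts, row_mass = drift_vectors(plan, source, target)

# Population-level drift for this edge, weighted by transported mass.
mean_drift = np.average(drifts, axis=0, weights=row_mass)
print("transported mass:", plan.sum())
print("mean drift:", mean_drift)
```

In this toy setting the mean drift should point roughly along the imposed shift between the two states. Because the marginals are relaxed rather than enforced, the total transported mass need not equal the input mass, which is the property that lets unbalanced transport accommodate changes in population size.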