A modular Bayesian framework for inferring transmission networks from polyclonal infections, with application to Plasmodium falciparum
Murphy, M. R.; Nielsen, R.; Perkins, A.; Greenhouse, B.
Abstract
Motivation: Molecular surveillance and infectious disease transmission network reconstruction can provide compelling evidence for estimating public-health quantities that are difficult to observe directly, including importation, source-sink structure, and differences in onward transmission across locations or intervention strata. These quantities can be expressed as functions of the underlying transmission network, but individual transmission events are rarely observed and many networks may be consistent with the same data. Existing methods that reconstruct transmission networks from genetic data are often built for settings in which each infection has one dominant source, one representative haplotype, and mutation-driven genetic divergence along transmission chains. These assumptions are poorly matched to polyclonal infections, in which hosts carry multiple genetically distinct clones and recipient infections may reflect contributions from multiple sources. Such infections are common in malaria, tuberculosis, HIV, and many parasitic infections. Methods are needed that can accommodate these data. Results: We present a modular Bayesian framework for estimating directed transmission among sampled cases, in which an infection may have no sampled parent, one parent, or several parents, including sources outside the observed panel. Pathogen-specific modules supply likelihoods over candidate parent sets and connect to shared inference that yields marginal directed edge probabilities, posterior mean out-degree, and inclusion probabilities for unobserved parents. We demonstrate our framework with Plasmotrack, a transmission network model for Plasmodium falciparum that uses targeted amplicon sequencing data. We implemented these components with a per-locus allele-mixture transmission likelihood, an amplicon genotyping error model, and data augmentation allowing for unobserved parents.
Simulations from a biologically informed generative model, under which the inferential per-locus allele-mixture likelihood is misspecified, showed recovery of aggregate network summaries, including mean out-degree and mean unobserved-source inclusion, alongside high precision and recall for detecting directed transmission. Other pathogens can reuse the same modular composition after substituting transmission and observation likelihoods. Availability: The Plasmotrack software and documentation are available at https://github.com/eppicenter/plasmotrack. Source code and example datasets are provided under an open-source license.
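As an illustration only, the network summaries named in the abstract (marginal directed edge probabilities, posterior mean out-degree, and inclusion probabilities for unobserved parents) could be computed from posterior draws of parent sets roughly as follows. This is a minimal sketch and not the Plasmotrack implementation: the draw format, the `UNOBSERVED` marker, the function name `summarize_parent_sets`, and the toy draws are all assumptions made for this example.

```python
from collections import Counter

# Hypothetical marker for a source outside the sampled panel (an assumption).
UNOBSERVED = "__unobserved__"


def summarize_parent_sets(samples):
    """Summarize posterior draws, each a dict mapping child -> set of parents.

    Returns (edge_prob, mean_out_degree, unobserved_prob).
    """
    n = len(samples)
    edge_counts = Counter()
    unobserved_counts = Counter()
    for draw in samples:
        for child, parents in draw.items():
            for parent in parents:
                if parent == UNOBSERVED:
                    unobserved_counts[child] += 1
                else:
                    edge_counts[(parent, child)] += 1

    # Marginal probability of each directed edge parent -> child.
    edge_prob = {edge: count / n for edge, count in edge_counts.items()}
    # Probability that each case has an unobserved parent.
    unobserved_prob = {c: count / n for c, count in unobserved_counts.items()}

    # By linearity of expectation, posterior mean out-degree of a case is the
    # sum of the marginal probabilities of its outgoing edges.
    mean_out_degree = Counter()
    for (parent, _child), prob in edge_prob.items():
        mean_out_degree[parent] += prob
    return edge_prob, dict(mean_out_degree), unobserved_prob


# Toy posterior draws over three sampled cases (made up for illustration).
samples = [
    {"A": {UNOBSERVED}, "B": {"A"}, "C": {"A", "B"}},
    {"A": {UNOBSERVED}, "B": {"A"}, "C": {"B"}},
    {"A": set(), "B": {"A", UNOBSERVED}, "C": {"B"}},
    {"A": {UNOBSERVED}, "B": set(), "C": {"A"}},
]
edge_prob, mean_out_degree, unobserved_prob = summarize_parent_sets(samples)
```

Because each summary is an average over draws, uncertainty about individual transmission events carries through to the aggregate quantities rather than being fixed by a single best network.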