A Cophylogenetic Approach for Virus-Host Interaction Prediction

This paper is a preprint and has not been certified by peer review

Authors

Chowdhury, M. Z. U. S.; Murali, T. M.; Sashittal, P.

Abstract

Advances in metagenomics have rapidly expanded viral discovery, revealing vast diversity across Earth's virosphere. Yet most virus-host interactions, i.e., which viruses infect which hosts, remain unrecorded. Identifying these interactions is essential for anticipating zoonotic spillover events and advancing biomedical applications such as bacteriophage therapy. However, the sheer diversity of viruses and hosts makes comprehensive experimental mapping infeasible, motivating the need for computational approaches. Most existing prediction methods rely on supervised learning strategies that use sequence-derived features, such as codon usage bias or k-mer frequencies, and do not model the coevolutionary processes that shape virus-host interactions. This limits both their ability to generalize and the evolutionary interpretability of their predictions. We introduce CoEvoLink, a framework for predicting virus-host interactions that integrates sequence-based evidence with phylogenetic signal by explicitly modeling the coevolutionary histories of viruses and hosts. CoEvoLink infers likely but unobserved interactions by minimizing the number of evolutionary events required to explain them, yielding the most parsimonious interaction under a coevolutionary model. This formulation generalizes classical maximum parsimony, typically defined on a single phylogeny, by jointly optimizing parsimony across both virus and host phylogenies. Sequence-based information is incorporated by assigning a cost to each potential interaction that reflects its likelihood based on genomic features. By drawing a connection between computing parsimony on interaction matrices and maximum parsimony on phylogenetic networks, we derive a polynomial-time algorithm that balances parsimony with sequence-derived prediction cost. We demonstrate the effectiveness of CoEvoLink on simulated data under diverse coevolutionary models. Applying CoEvoLink, we identified putative bat hosts of betacoronaviruses that have not yet been cataloged in the VIRION database. On a benchmark derived from metagenomic sequencing data, we demonstrate that CoEvoLink improves the performance of existing phage-host prediction tools by incorporating cophylogenetic information.
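
To make the idea of balancing parsimony against a sequence-derived cost more concrete, the following is a minimal sketch of the classical single-phylogeny case that the abstract says CoEvoLink generalizes. It is not the authors' algorithm: it runs standard Sankoff-style dynamic programming for one binary character (whether a given virus infects a host) on a host tree only. The function name `sankoff`, the toy tree, the unit change cost, and the leaf cost of 0.3 are all hypothetical choices made for illustration.

```python
import math

def sankoff(tree, root, leaf_cost, change_cost=1.0):
    """Small parsimony for a binary character on a rooted tree.

    tree: dict mapping each internal node to a list of its children.
    leaf_cost: dict mapping each leaf to (cost of state 0, cost of state 1).
    Returns (minimum total cost, {node: chosen state}).
    """
    cost = {}  # cost[node][s]: best cost of the subtree if node has state s

    def up(node):
        children = tree.get(node, [])
        if not children:
            cost[node] = list(leaf_cost[node])
            return
        cost[node] = [0.0, 0.0]
        for child in children:
            up(child)
            for s in (0, 1):
                cost[node][s] += min(
                    cost[child][t] + (change_cost if s != t else 0.0)
                    for t in (0, 1)
                )

    def down(node, state, out):
        out[node] = state
        for child in tree.get(node, []):
            best = min(
                (0, 1),
                key=lambda t: cost[child][t] + (change_cost if t != state else 0.0),
            )
            down(child, best, out)

    up(root)
    root_state = min((0, 1), key=lambda s: cost[root][s])
    states = {}
    down(root, root_state, states)
    return cost[root][root_state], states

# Hypothetical toy host tree: ((h1, h2), h3).
tree = {"root": ["A", "h3"], "A": ["h1", "h2"]}
leaf_cost = {
    "h1": (math.inf, 0.0),  # observed interaction: state 1 is forced
    "h3": (math.inf, 0.0),  # observed interaction: state 1 is forced
    "h2": (0.0, 0.3),       # unobserved: assumed sequence-derived cost of 0.3
}
total, states = sankoff(tree, "root", leaf_cost)
print(total, states["h2"])
```

In this toy input, predicting an interaction for `h2` costs 0.3, while leaving it uninfected would require one state change along the tree at cost 1.0, so the minimum-cost labeling assigns `h2` state 1. The framework described in the abstract optimizes jointly over virus and host phylogenies, which this single-tree sketch does not attempt.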