The Viral AlphaFold Database of monomers and homodimers reveals conserved protein folds in viruses of bacteria, archaea, and eukaryotes
Odai, R.; Leemann, M.; Al-Murad, T.; Abdullah, M.; Shyrokova, L.; Tenson, T.; Hauryliuk, V.; Durairaj, J.; Pereira, J.; Atkinson, G. C.
Abstract
Viruses are among the most abundant and genetically diverse entities on Earth, yet the functions and evolutionary origins of most viral proteins remain poorly understood. Their rapid evolution often obscures evolutionary relationships, making it difficult to assign functions using sequence-based methods alone. Although conservation of protein fold can reveal deep homologies undetectable by sequence comparison, viral proteins remain vastly underrepresented in structural databases, limiting our ability to explore them at the structural level. Here, we address this gap by clustering all unique viral sequences from the NCBI RefSeq database and predicting the structures of ~27,000 representative proteins using AlphaFold2, creating a large-scale viral structural resource, the Viral AlphaFold Database (VAD). We uncover ~10,000 proteins belonging to clusters whose folds are shared across viruses infecting bacteria, archaea, and eukaryotes. We also predict oligomeric states using AlphaFold2-based homodimer modelling, alongside structural comparisons to the Protein Data Bank, providing new data on the potential of viral proteins to oligomerise. We further reveal that large regions of the viral protein universe remain functionally dark and report the discovery and experimental validation of a previously uncharacterised antiviral toxin-antitoxin (TA) system. VAD provides a foundation for exploring viral structure-function relationships, including ancient folds that shape viral interactions across all life. Predicted structures used in this study are available at data-sharing.atkinson-lab.com/vad/.