ProtHGT: Heterogeneous Graph Transformers for Automated Protein Function Prediction Using Biological Knowledge Graphs and Language Models

This paper is a preprint and has not been certified by peer review.

Authors

Ulusoy, E.; Dogan, T.

Abstract

Motivation: The rapid accumulation of protein sequence data, coupled with the slow pace of experimental annotation, creates a critical need for computational methods to predict protein functions. Existing models often rely on limited data types, such as sequence-based features or protein-protein interactions (PPIs), and fail to capture the complex molecular relationships in biological systems. To address this, we developed ProtHGT, a heterogeneous graph transformer-based model that integrates diverse biological datasets into a unified knowledge-graph framework for accurate and interpretable protein function prediction.

Results: ProtHGT achieves state-of-the-art performance on benchmark datasets, outperforming current graph-based and sequence-based approaches. By leveraging diverse biological entity types and highly representative protein language model embeddings at the input level, the model effectively learns complex biological relationships, enabling accurate predictions across all Gene Ontology (GO) sub-ontologies. Ablation analyses highlight the critical role of heterogeneous data integration in achieving robust predictions. Finally, our use-case study indicated that ProtHGT's predictions can be interpreted by exploring the related parts of the input biological knowledge graph, offering plausible explanations for building or testing new hypotheses.

Availability and Implementation: ProtHGT is available as a programmatic tool at https://github.com/HUBioDataLab/ProtHGT and as a web service at https://huggingface.co/spaces/HUBioDataLab/ProtHGT.

Contact: To whom correspondence should be addressed: Tunca Doğan ([email protected])
