Protein Function Prediction with Pretrained ProtT5 Embeddings and Gradient Boosting
Appel, J.; Butcher, N.
Abstract

Protein function prediction remains a central challenge in computational biology due to the extreme sparsity and long-tail distribution of Gene Ontology (GO) [1] annotations. Advances in protein language models enable the extraction of dense, fixed-length representations from amino acid sequences, offering a scalable alternative to hand-crafted features such as physicochemical properties. In this work, we evaluate a transformer-based embedding approach using ProtT5-XL combined with classical and modern multi-label classifiers for Gene Ontology prediction in the CAFA-6 setting. Fixed-length embeddings were generated via mean pooling of transformer hidden states and used as input to one-vs-rest logistic regression, gradient-boosted decision trees, and a neural network. Models were evaluated on held-out validation data with a focus on threshold selection, prediction sparsity, and behavior across frequent and rare GO terms. Gradient boosting consistently provided the best balance between predictive performance and stable prediction behavior, motivating its use for ontology-specific predictors across molecular function, biological process, and cellular component annotations. This study highlights practical modeling choices for large-scale protein function prediction using pretrained sequence embeddings and provides an interpretable baseline for future CAFA evaluations.
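The fixed-length embedding step described in the abstract can be sketched as masked mean pooling over per-residue hidden states. This is an illustrative sketch, not the authors' code: the sequence length, mask handling, and the use of NumPy arrays in place of actual ProtT5-XL outputs are all assumptions (ProtT5-XL's hidden size of 1024 is the one concrete value carried over).

```python
import numpy as np

def mean_pool(hidden_states: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Masked mean pooling over residue positions: (L, d) -> (d,).

    `hidden_states` stands in for per-residue transformer outputs;
    `mask` flags real residues (True) vs. padding (False).
    """
    m = mask.astype(hidden_states.dtype)[:, None]  # (L, 1)
    return (hidden_states * m).sum(axis=0) / m.sum()

# Simulated per-residue embeddings for a protein of length 120
# (1024 matches ProtT5-XL's hidden size; the values are random).
rng = np.random.default_rng(0)
h = rng.standard_normal((120, 1024))
valid = np.ones(120, dtype=bool)

embedding = mean_pool(h, valid)
assert embedding.shape == (1024,)  # one fixed-length vector per protein
```

The resulting per-protein vectors are what the downstream one-vs-rest and gradient-boosted classifiers consume; padding masking matters in practice because batched transformer outputs include positions that should not contribute to the mean.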