Investigation of Protein Melting Temperature Prediction with Cross-Method Validation on Biophysical Data

This paper is a preprint and has not been certified by peer review.

Authors

Pailozian, K.; Kohout, P.; Damborsky, J.; Mazurenko, S.

Abstract

Motivation: Protein melting temperature (Tm) prediction accelerates the discovery of thermostable enzymes, which are crucial for industrial biotechnology processes that often require harsh reaction conditions. Experimental determination of Tm remains labour-intensive and its results vary across techniques, motivating the development of in silico predictors. Mass-spectrometry datasets such as the Meltome Atlas now enable large-scale Tm prediction with deep learning models, but how well these models generalise across diverse experimental datasets has not been systematically tested.

Results: We evaluated the generalisability of state-of-the-art deep learning approaches and explored ESM-based embeddings for Tm prediction. To this end, we assembled the ProMelt training dataset (45 441 proteins) and five independent biophysics-based validation datasets. Our analysis revealed substantial differences between proteomics- and biophysics-based Tm measurements, highlighting the challenge of cross-domain generalisation. Existing state-of-the-art predictors trained on large-scale proteomics datasets showed reduced performance on the biophysics-based validation sets. Our fine-tuned embedding-based models, particularly LoRA-adapted ESM-2 (TmProt 1.0), outperformed state-of-the-art predictors in identifying thermostable proteins (Tm ≥ 60 °C) across heterogeneous datasets, achieving AUC scores of 0.75–0.77. We also demonstrated that the available models can be used efficiently for sequence prioritization.

Availability: The TmProt web server is available at https://loschmidt.chemi.muni.cz/tmprot/.
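The thermostability classification reported above (AUC for separating proteins with Tm ≥ 60 °C from the rest) can be illustrated with a short sketch. It assumes that measured Tm values are binarized at the 60 °C cutoff and that a predictor's raw Tm output is used directly as the ranking score; the paper may score proteins differently. The function names and the six data points are hypothetical and purely illustrative, not taken from ProMelt, TmProt or the validation datasets. AUC is computed through its Mann-Whitney interpretation: the probability that a randomly chosen thermostable protein receives a higher score than a randomly chosen non-thermostable one, with ties counted as one half.

```python
from typing import Sequence

# Cutoff for "thermostable" taken from the abstract (Tm >= 60 degrees Celsius).
THERMOSTABLE_CUTOFF_C = 60.0


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as P(score of random positive > score of random negative); ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def thermostability_auc(
    measured_tm: Sequence[float],
    predicted_tm: Sequence[float],
    cutoff: float = THERMOSTABLE_CUTOFF_C,
) -> float:
    """Hypothetical helper: label proteins by measured Tm >= cutoff, rank them by predicted Tm."""
    labels = [1 if tm >= cutoff else 0 for tm in measured_tm]
    return roc_auc(labels, predicted_tm)


if __name__ == "__main__":
    # Made-up measured and predicted Tm values (degrees Celsius) for six proteins.
    measured = [45.2, 52.0, 61.5, 58.3, 72.1, 66.0]
    predicted = [50.1, 55.3, 59.0, 60.2, 68.4, 57.5]
    print(f"AUC = {thermostability_auc(measured, predicted):.3f}")  # prints "AUC = 0.778"
```

Here 7 of the 9 thermostable/non-thermostable pairs are ranked correctly, giving 7/9 ≈ 0.778. The pairwise loop is O(n·m) and chosen for readability; for large validation sets a rank-based implementation such as scikit-learn's `roc_auc_score` would be the practical choice.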
