MSACLR: Contrastive Learning of Protein Conformations from MSAs

This paper is a preprint and has not been certified by peer review.

Authors

ZHANG, J.; Xing, E.; Cheng, X.

Abstract

We propose MSACLR (Multiple Sequence Alignment Contrastive Learning Representation), a two-stage contrastive learning framework that maps MSA space to conformational space. In Stage 1, embeddings are trained to discriminate structural folds across diverse proteins using only MSA information. In Stage 2, the embeddings are fine-tuned on subMSAs labeled by their associated predicted structural clusters, which enables discrimination of alternative conformations within the same protein. To enrich the training data, we introduce BLOSUM62-guided [1] augmentation, which expands the pool of subMSAs associated with each structural cluster label by introducing sequence-level diversity. Our experiments show that MSACLR embeddings achieve clearer fold-level separation than single-sequence baselines, while the fine-tuned embeddings capture conformational variation across scales, from local loop motions to domain motions and fold switching. MSACLR provides a foundation for efficient exploration of MSA space and enables sampling of conformational ensembles, bridging the gap between static structure prediction and dynamic protein behavior.
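The abstract does not state the exact training objective. As a hedged illustration only, a supervised contrastive loss (in the style of Khosla et al., 2020) is one standard way to train embeddings that separate labeled groups: in Stage 1 the labels would be fold classes, and in Stage 2 the predicted structural cluster assigned to each subMSA. This sketch assumes one embedding vector per MSA or subMSA, produced by some encoder that is not shown; the function name `supcon_loss` and the `temperature` default are hypothetical and are not MSACLR's published implementation.

```python
import math


def _normalize(vec):
    """Scale a vector to unit L2 norm (assumes it is not all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Supervised contrastive loss over one batch (assumed objective, not MSACLR's).

    embeddings: list of equal-length vectors, one per MSA/subMSA (hypothetical input)
    labels: fold class (Stage 1) or predicted structural cluster (Stage 2) per item
    """
    z = [_normalize(e) for e in embeddings]
    n = len(z)
    # Cosine similarities scaled by the temperature.
    sims = [[sum(a * b for a, b in zip(z[i], z[j])) / temperature for j in range(n)]
            for i in range(n)]
    total, anchors = 0.0, 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        positives = [j for j in others if labels[j] == labels[i]]
        if not positives:
            continue  # an anchor with no same-label partner contributes nothing
        # Numerically stable log-sum-exp over every non-anchor item in the batch.
        m = max(sims[i][j] for j in others)
        log_denom = m + math.log(sum(math.exp(sims[i][j] - m) for j in others))
        # Average negative log-probability of picking each positive.
        total += -sum(sims[i][p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


if __name__ == "__main__":
    # Toy batch: two tight groups of 2-D embeddings.
    emb = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(supcon_loss(emb, [0, 0, 1, 1]))  # labels agree with geometry: small loss
    print(supcon_loss(emb, [0, 1, 0, 1]))  # labels cut across groups: large loss
```

Items that share a label are pulled together and all other items in the batch are pushed apart, so the loss falls when same-cluster subMSAs sit close in embedding space. Pure Python keeps the example dependency-free; a real training loop would compute the same quantity on batched tensors in an autodiff framework.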
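The abstract names BLOSUM62-guided augmentation but does not give the sampling rule. One plausible reading, sketched here purely as an assumption, is to mutate a random fraction of residues in each non-query row of a subMSA. Each replacement b for residue a would be drawn with probability proportional to exp(S(a, b) / T), where S is the BLOSUM62 score and T is a temperature. This favors conservative substitutions (for example I to V, score 3), so the augmented subMSA can plausibly keep its structural cluster label. The function names and the parameters `rate`, `temperature`, and `keep_query` are hypothetical.

```python
import math
import random

# Standard BLOSUM62 scores; rows and columns follow this residue order.
AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"
_BLOSUM62_ROWS = """
 4 -1 -2 -2  0 -1 -1  0 -2 -1 -1 -1 -1 -2 -1  1  0 -3 -2  0
-1  5  0 -2 -3  1  0 -2  0 -3 -2  2 -1 -3 -2 -1 -1 -3 -2 -3
-2  0  6  1 -3  0  0  0  1 -3 -3  0 -2 -3 -2  1  0 -4 -2 -3
-2 -2  1  6 -3  0  2 -1 -1 -3 -4 -1 -3 -3 -1  0 -1 -4 -3 -3
 0 -3 -3 -3  9 -3 -4 -3 -3 -1 -1 -3 -1 -2 -3 -1 -1 -2 -2 -1
-1  1  0  0 -3  5  2 -2  0 -3 -2  1  0 -3 -1  0 -1 -2 -1 -2
-1  0  0  2 -4  2  5 -2  0 -3 -3  1 -2 -3 -1  0 -1 -3 -2 -2
 0 -2  0 -1 -3 -2 -2  6 -2 -4 -4 -2 -3 -3 -2  0 -2 -2 -3 -3
-2  0  1 -1 -3  0  0 -2  8 -3 -3 -1 -2 -1 -2 -1 -2 -2  2 -3
-1 -3 -3 -3 -1 -3 -3 -4 -3  4  2 -3  1  0 -3 -2 -1 -3 -1  3
-1 -2 -3 -4 -1 -2 -3 -4 -3  2  4 -2  2  0 -3 -2 -1 -2 -1  1
-1  2  0 -1 -3  1  1 -2 -1 -3 -2  5 -1 -3 -1  0 -1 -3 -2 -2
-1 -1 -2 -3 -1  0 -2 -3 -2  1  2 -1  5  0 -2 -1 -1 -1 -1  1
-2 -3 -3 -3 -2 -3 -3 -3 -1  0  0 -3  0  6 -4 -2 -2  1  3 -1
-1 -2 -2 -1 -3 -1 -1 -2 -2 -3 -3 -1 -2 -4  7 -1 -1 -4 -3 -2
 1 -1  1  0 -1  0  0  0 -1 -2 -2  0 -1 -2 -1  4  1 -3 -2 -2
 0 -1  0 -1 -1 -1 -1 -2 -2 -1 -1 -1 -1 -2 -1  1  5 -2 -2  0
-3 -3 -4 -4 -2 -2 -3 -2 -2 -3 -2 -3 -1  1 -4 -3 -2 11  2 -3
-2 -2 -2 -3 -2 -1 -2 -3  2 -1 -1 -2 -1  3 -3 -2 -2  2  7 -1
 0 -3 -3 -3 -1 -2 -2 -3 -3  3  1 -2  1 -1 -2 -2  0 -3 -1  4
"""
BLOSUM62 = {
    (a, b): int(score)
    for a, line in zip(AMINO_ACIDS, _BLOSUM62_ROWS.strip().splitlines())
    for b, score in zip(AMINO_ACIDS, line.split())
}
STANDARD = set(AMINO_ACIDS)


def substitution_distribution(aa, temperature=1.0):
    """Replacements for `aa` (itself excluded), weighted by exp(score / temperature)."""
    candidates = [b for b in AMINO_ACIDS if b != aa]
    weights = [math.exp(BLOSUM62[(aa, b)] / temperature) for b in candidates]
    total = sum(weights)
    return candidates, [w / total for w in weights]


def blosum62_augment(msa, rate=0.1, temperature=1.0, keep_query=True, seed=None):
    """Hypothetical BLOSUM62-guided augmentation of an aligned (sub)MSA.

    Each standard residue is replaced with probability `rate`. Gaps and any
    non-standard or lowercase characters are copied unchanged, so every row
    keeps its alignment length. Row 0 (the query) is left intact if keep_query.
    """
    rng = random.Random(seed)
    augmented = []
    for row_idx, seq in enumerate(msa):
        if keep_query and row_idx == 0:
            augmented.append(seq)
            continue
        chars = list(seq)
        for pos, aa in enumerate(chars):
            if aa in STANDARD and rng.random() < rate:
                candidates, probs = substitution_distribution(aa, temperature)
                chars[pos] = rng.choices(candidates, weights=probs, k=1)[0]
        augmented.append("".join(chars))
    return augmented


if __name__ == "__main__":
    sub_msa = ["MKVLIT-AG", "MKILVS-AG", "MRVLLTEAG"]  # toy aligned subMSA
    print(blosum62_augment(sub_msa, rate=0.2, seed=0))
```

Lowering `temperature` concentrates the draws on the highest-scoring substitutions, and `rate` controls how far each augmented copy drifts from the original rows. Gap characters and anything outside the 20 standard uppercase residues (such as lowercase insertion states in A3M files) pass through untouched.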
