DualLoc: Full-parameter fine-tuning of cascaded dual transformers for protein subcellular localization prediction
Chen, Y. G.; Chung, W.-Y.; Chang, K. Y.
Abstract

Accurate protein subcellular localization is essential for biological function, and mislocalization is linked to numerous diseases. While current methods like DeepLoc 2.0 employ lightweight fine-tuning of protein language models (PLMs), their ability to predict multi-compartment localization remains limited. To address this, we introduce DualLoc, a multi-label localization predictor for ten compartments. DualLoc leverages full-parameter fine-tuning of a cascaded dual-transformer architecture, built upon foundational PLMs and augmented with attention and dropout layers. We evaluated this framework using three foundational PLMs (ProtBERT, ESM-2, and ProtT5) as backbones. Cross-validation on Swiss-Prot and independent validation on the Human Protein Atlas demonstrate consistent superiority over state-of-the-art baselines. The best-performing variant, DualLoc-ProtT5, achieves 0.5872 accuracy, 0.8271 micro-F1, and 0.7811 macro-F1, with substantial gains in the Matthews correlation coefficient for the nucleus (+0.13), cell membrane (+0.13), and extracellular space (+0.07). Pointwise mutual information (PMI) analysis of model outputs reveals biologically relevant compartment couplings, notably between the Golgi apparatus and endoplasmic reticulum (PMI = 0.25, P < 10⁻⁶), accurately reflecting secretory pathway coordination. DualLoc provides both a highly accurate predictive tool and a robust framework for investigating protein multi-localization mechanisms.
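
To make the PMI analysis concrete, the following is a minimal sketch of how pairwise pointwise mutual information between compartments could be computed from a binary multi-label prediction matrix. It is an illustration under stated assumptions, not the authors' implementation: the function name `pairwise_pmi`, the use of base-2 logarithms, and the toy input are all hypothetical, and the abstract does not specify the logarithm base, how model outputs were binarized, or which significance test produced the reported P value.

```python
import numpy as np


def pairwise_pmi(labels):
    """Pairwise PMI between label columns of a binary matrix.

    labels: array of shape (n_proteins, n_compartments) with 0/1 entries.
    Returns an (n_compartments, n_compartments) matrix where entry (a, b)
    is log2( p(a, b) / (p(a) * p(b)) ). Base 2 is an assumption here.
    """
    labels = np.asarray(labels, dtype=float)
    n = labels.shape[0]
    p_single = labels.mean(axis=0)             # p(a): marginal frequency
    p_joint = (labels.T @ labels) / n          # p(a, b): co-occurrence frequency
    expected = np.outer(p_single, p_single)    # p(a) * p(b) under independence
    # Pairs that never co-occur give -inf; unused labels give nan.
    with np.errstate(divide="ignore", invalid="ignore"):
        return np.log2(p_joint / expected)


# Toy example (hypothetical data): two compartments that always co-occur.
coupled = np.array([[1, 1], [1, 1], [0, 0], [0, 0]])
pmi_coupled = pairwise_pmi(coupled)

# Toy example: two compartments that are statistically independent.
independent = np.array([[1, 1], [1, 0], [0, 1], [0, 0]])
pmi_independent = pairwise_pmi(independent)
```

A positive off-diagonal value means two compartments are predicted together more often than independence would imply, a value near zero means no coupling, and a negative value means they tend to exclude each other. Assessing significance would need an additional step, such as a permutation test, which is not shown here.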