Unveiling the Hidden Rules: Enhancing NMD Prediction for Protein-Truncating Variants
Egab, I.; Schmidt, J.; Cortazar, M.; Xu, J.; Orchard, P.; Bozkurt-Yozgatli, T.; Dawood, M.; Koh, J.; Mestroni, L.; Taylor, M.; Yi, S. S.; Calame, D.; Posey, J.; Gibbs, R. A.; Boerwinkle, E.; Reiner, A. P.; de Vries, P. S.; Morrison, A.; Shaw, C. A.; Lupski, J. R.; Carvalho, C. M. B.; Montgomery, S. B.; Jagannathan, S.; Coban Akdemir, Z.
Abstract

Nonsense-mediated decay (NMD) is a conserved RNA quality-control pathway that degrades transcripts containing premature termination codons. Because roughly a third of pathogenic variants in ClinVar can lead to truncated protein synthesis, predicting whether such transcripts undergo NMD is central to interpreting variant effects. Yet the canonical 50-55 nucleotide rule explains only about half of the observed variability in outcomes. Using paired whole-genome and RNA sequencing from 10,306 individual samples in the Trans-Omics for Precision Medicine (TOPMed) program, we quantified NMD efficiency for 5,749 germline truncating variants via allele-specific expression and trained a gradient-boosting classifier, TrunCat, that distinguished NMD-sensitive from NMD-escape transcripts with an area under the receiver operating characteristic curve (ROC-AUC) of ~0.78. A reduced model restricted to the ten features with the highest mean SHAP (SHapley Additive exPlanations) values, a measure of each feature's average contribution to predictions, nearly matched this performance. Applied across large variant databases and a rare-disease cohort, the model produced NMD outcome predictions in which variants of uncertain significance showed higher predicted escape than pathogenic variants. This framework confirms the canonical rule, identifies non-canonical determinants, and offers a scalable resource for interpreting protein-truncating variants.
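
For readers unfamiliar with the canonical 50-55 nucleotide rule referred to in the abstract, the following is a minimal Python sketch of that baseline heuristic only. It is not TrunCat and not the authors' code. The rule predicts NMD when a premature termination codon (PTC) lies more than roughly 50-55 nucleotides upstream of the last exon-exon junction. The function name `predicts_nmd`, its parameters, the spliced-transcript coordinate convention, and the default threshold of 50 are all assumptions made for illustration.

```python
from typing import Sequence


def predicts_nmd(exon_lengths: Sequence[int], ptc_pos: int, threshold: int = 50) -> bool:
    """Apply the canonical distance rule to one transcript (illustrative sketch).

    exon_lengths: exon lengths in 5'->3' order (hypothetical input format).
    ptc_pos:      0-based position of the first PTC nucleotide in spliced
                  transcript coordinates (an assumed convention).
    threshold:    distance cutoff in nucleotides; the literature quotes 50-55.
    Returns True if the rule predicts NMD, False if it predicts escape.
    """
    transcript_length = sum(exon_lengths)
    if not 0 <= ptc_pos < transcript_length:
        raise ValueError("ptc_pos lies outside the transcript")

    # A single-exon transcript has no exon-exon junction, so the rule
    # predicts escape.
    if len(exon_lengths) < 2:
        return False

    # Transcript coordinate at which the last exon begins, i.e. the
    # position of the last exon-exon junction.
    last_junction = sum(exon_lengths[:-1])

    # Distance from the PTC to the last junction; zero or negative when
    # the PTC sits in the last exon.
    distance = last_junction - ptc_pos
    return distance > threshold


if __name__ == "__main__":
    exons = [200, 150, 300]  # made-up exon lengths, last junction at 350
    for pos in (100, 320, 400):
        label = "NMD-sensitive" if predicts_nmd(exons, pos) else "NMD-escape"
        print(pos, label)
```

As the abstract notes, this rule alone accounts for only about half of the observed outcome variability, so a function like this is best read as a baseline against which a learned model is compared, not as a substitute for one.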