In vivo validation of predicted fitness effects at single-base resolution in a Brachypodium distachyon mutant population
Moslemi, C.; Folgoas, M.; Yu, X.; Jensen, J. D.; Hentrup, S.; Li, T.; Wang, H.; Boelt, B.; Asp, T.; Sibout, R.; Ramstein, G. P.
Abstract

Computational tools, including biological language models (LMs), show substantial promise in predicting the impact of genetic variants on plant fitness. However, validating variant effect predictions (VEP) requires experimental populations where genetic variation consists of discrete point mutations rather than segregating recombination blocks. In this study, we generated a novel population of Brachypodium distachyon mutant lines to evaluate the accuracy of VEP at single-base resolution. These lines were advanced through single-seed descent for five generations (M1 to M5), with whole-genome sequencing performed at M2 and M5 and phenotypic measurements recorded at M3 and M4. Using state-of-the-art VEP models, we predicted the functional impact of missense protein-coding variants and gene-proximal non-coding variants. We validated these predictions by estimating the effect of mutations on whole-plant measurements (burden tests) and their probability of fixation from M2 to M5 (purging tests). Among missense variants, the protein LM ESM showed superior predictive accuracy compared to the bioinformatic standard SIFT and the genomic LM PlantCAD. Notably, the association between VEP scores and allele fixation suggested a log-linear relationship between these scores and variant fitness. Among gene-proximal variants, PlantCAD appeared more accurate than supervised models of regulatory activity, such as chromatin accessibility (a2z) and RNA abundance (PhytoExpr). Collectively, our findings highlight the utility of state-of-the-art VEP tools as predictors of fitness and demonstrate the potential of mutant populations to evaluate computational tools for precision breeding applications.
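To make the idea of a purging test concrete, the sketch below regresses a binary fixation outcome (variant observed at M2 and fixed at M5, or not) on a VEP score using logistic regression. It is an illustration under stated assumptions and not the authors' analysis: the data are simulated, the names `fit_logistic`, `score`, and `fixed` are hypothetical, the sign convention (lower score means more deleterious) is assumed, and the true model may include covariates or a different link function.

```python
import numpy as np

def fit_logistic(x, y, n_iter=25):
    """Fit P(y = 1) = sigmoid(b0 + b1 * x) by Newton-Raphson updates."""
    X = np.column_stack([np.ones_like(x), x])  # intercept column plus score
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))  # current fitted probabilities
        w = p * (1.0 - p)                      # Bernoulli variance weights
        grad = X.T @ (y - p)                   # gradient of the log-likelihood
        hess = X.T @ (X * w[:, None])          # negative Hessian (information)
        beta = beta + np.linalg.solve(hess, grad)
    return beta

# Simulated data (assumption): one row per variant detected at M2.
rng = np.random.default_rng(0)
n_variants = 5000
score = rng.normal(loc=-5.0, scale=3.0, size=n_variants)  # hypothetical VEP scores

# Assumed generating model: log-odds of fixation are linear in the score.
true_b0, true_b1 = 1.0, 0.3
p_fix = 1.0 / (1.0 + np.exp(-(true_b0 + true_b1 * score)))
fixed = (rng.random(n_variants) < p_fix).astype(float)

b0_hat, b1_hat = fit_logistic(score, fixed)
print(f"estimated slope on VEP score: {b1_hat:.3f}")
```

A positive slope in this toy setting means that variants with lower (more deleterious) scores are fixed less often, which is the kind of signal a purging test looks for.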