Evo 2's Perception of Single Nucleotide Substitutions in the Genes of Two Plant Model Organisms

This paper is a preprint and has not been certified by peer review.

Authors

Mantegazza, O.; Bertolini, L.; Leoni, G.; Colaiacovo, M.; Petrillo, M.; Bonfini, L.; Savini, C.; Ceresa, M.; Zaoui, X.

Abstract

Although DNA Large Language Models (DNA-LLMs) offer a path to decoding genetic complexity, our ability to evaluate these models is constrained by our incomplete understanding of the very genetic syntax and functional logic that they are trained to learn. In this study, we use single nucleotide substitutions that either have or have not been observed in living organisms to evaluate how the DNA-LLM Evo 2 interprets gene sequences from two plant model organisms, Arabidopsis thaliana and Oryza sativa japonica. Using perplexity as a measure of the model's confidence, we observe that alleles containing simulated substitutions are, on average, perceived as less likely (that is, they receive higher perplexity) than alleles observed in vivo. Although the effect size is modest, it is statistically significant and robust, suggesting that Evo 2 is aligned with our current understanding of evolutionary selective constraints. The approach is designed to be both model-agnostic and species-agnostic and could serve as a generic framework for evaluating the performance of DNA-LLMs.
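The comparison rests on two ingredients: perplexity, defined as the exponential of the mean per-nucleotide negative log-likelihood, and the set of simulated alleles obtained by substituting single nucleotides. The following is a minimal sketch of both, assuming per-nucleotide natural-log probabilities have already been obtained from a DNA-LLM; how to extract them from Evo 2 is not covered here. The helper names `perplexity`, `substitute` and `all_substitutions` are hypothetical illustrations, not the authors' pipeline or any Evo 2 API.

```python
import math
from typing import Iterator, Sequence, Tuple

NUCLEOTIDES = "ACGT"


def perplexity(token_log_probs: Sequence[float]) -> float:
    """Perplexity from per-nucleotide natural-log probabilities.

    PPL = exp(-(1/N) * sum_i log p(x_i | x_<i)).
    Lower perplexity means the model considers the sequence more likely.
    The log-probabilities are assumed to come from a DNA-LLM (hypothetical input).
    """
    if not token_log_probs:
        raise ValueError("need at least one log-probability")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


def substitute(seq: str, pos: int, alt: str) -> str:
    """Return a copy of seq with one nucleotide substitution at 0-based pos."""
    ref = seq[pos]
    if alt not in NUCLEOTIDES or alt == ref:
        raise ValueError(f"alt must be a nucleotide different from {ref!r}")
    return seq[:pos] + alt + seq[pos + 1:]


def all_substitutions(seq: str) -> Iterator[Tuple[int, str, str]]:
    """Yield (position, alt, mutated_sequence) for every possible single
    nucleotide substitution: three alternatives per position."""
    for pos, ref in enumerate(seq):
        for alt in NUCLEOTIDES:
            if alt != ref:
                yield pos, alt, substitute(seq, pos, alt)
```

As a sanity check, a model that assigns probability 0.25 to every nucleotide yields a perplexity of 4 for any sequence, and a sequence of length L has 3L possible single nucleotide substitutions. In an evaluation like the one described above, each simulated allele would be scored and its perplexity compared against that of alleles observed in vivo; the statistical test used for that comparison is not specified in this abstract.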
