Science Cast

Improving genomic language model reliability under distribution shift

Gavin Hearne, March 21, 2026 4:58am




Connected to paper: This paper is a preprint and has not been certified by peer review.

Improving genomic language model reliability under distribution shift

bioRxiv (PDF), March 20, 2026 12:00am

Authors

Hearne, G.; Refahi, M. S.; Polikar, R.; Rosen, G. L.

Abstract

Transformer-based Genomic Language Models (GLMs) have achieved strong performance across diverse genomic prediction tasks. However, their tendency toward overconfident predictions, particularly on noisy or unfamiliar data, limits their reliability. In genomics, where unknown species and novel variants are common, developing models that are robust to distribution shift is crucial for dependable predictions. Here, we analyze the impact of several common and novel uncertainty quantification (UQ) methods in the context of GLMs, evaluating their performance across diverse downstream genomic and metagenomic prediction tasks. Comparing model behavior on both in-distribution (ID) and out-of-distribution (OOD) data, we show that temperature scaling and epistemic neural networks can improve classification reliability across multiple GLM architectures and domains. The software is available at: https://github.com/EESI/glm-epinet-pyt
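Temperature scaling, one of the two methods the abstract highlights, divides a trained classifier's logits by a single scalar T > 0 fitted on held-out data, so that softmax confidences better match observed accuracy. The following is a minimal sketch under stated assumptions, not the authors' implementation: it assumes NumPy, fits T by grid search over validation negative log-likelihood (gradient-based minimization is a common alternative), measures calibration with a 15-bin expected calibration error (ECE), and uses synthetic logits as a stand-in for GLM outputs. All function and variable names are hypothetical and are not taken from the glm-epinet-pyt repository.

```python
import numpy as np


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits / temperature over the last axis."""
    z = np.asarray(logits, dtype=float) / temperature
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def nll(logits, labels, temperature):
    """Mean negative log-likelihood of the true labels at a given temperature."""
    probs = softmax(logits, temperature)
    picked = probs[np.arange(len(labels)), labels]
    return float(-np.mean(np.log(picked + 1e-12)))


def fit_temperature(logits, labels, grid=None):
    """Return the grid value of T that minimizes validation NLL (hypothetical helper)."""
    if grid is None:
        grid = np.linspace(0.05, 10.0, 400)
    losses = [nll(logits, labels, t) for t in grid]
    return float(grid[int(np.argmin(losses))])


def expected_calibration_error(probs, labels, n_bins=15):
    """Weighted average |accuracy - confidence| over equal-width confidence bins."""
    conf = probs.max(axis=1)
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        in_bin = (conf > lo) & (conf <= hi)
        if in_bin.any():
            ece += in_bin.mean() * abs(correct[in_bin].mean() - conf[in_bin].mean())
    return float(ece)


# --- Demo on synthetic data (stand-in for GLM validation logits) ---
rng = np.random.default_rng(0)
n, k = 5000, 3
# By construction, softmax(true_logits) is a calibrated model.
true_logits = rng.normal(size=(n, k))
p_true = softmax(true_logits)
# Draw each label from its row's probabilities via inverse-CDF sampling.
u = rng.random(n)[:, None]
labels = np.minimum((p_true.cumsum(axis=1) < u).sum(axis=1), k - 1)
# Sharpening the logits by 4x makes the model overconfident; the ideal T is about 4.
overconfident_logits = 4.0 * true_logits

T = fit_temperature(overconfident_logits, labels)
ece_before = expected_calibration_error(softmax(overconfident_logits), labels)
ece_after = expected_calibration_error(softmax(overconfident_logits, T), labels)
print(f"T = {T:.2f}, ECE before = {ece_before:.3f}, after = {ece_after:.3f}")
```

Because dividing by a positive T preserves the ordering of the logits, temperature scaling leaves predicted classes and accuracy unchanged and only adjusts confidence. Epistemic neural networks (epinets), the other method named in the abstract, instead add a small network driven by a random epistemic index and are not sketched here.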

