CANDI: self-supervised, confidence-aware denoising imputation of genomic data

This paper is a preprint and has not been certified by peer review.

Authors

Foroozandeh Shahraki, M.; Diab, A. R.; Libbrecht, M. W.

Abstract

Large-scale epigenomic datasets, such as measurements of histone modifications and DNA accessibility, have greatly advanced our understanding of genomic function. However, these measurements often suffer from noise, batch effects, and irreproducibility. Epigenome imputation has emerged as a promising solution to these challenges. Imputation methods integrate patterns across experiments, cell types, and genomic loci to predict the results of experiments, yielding predictions that often surpass the observed data in quality. Thus, researchers increasingly use imputation to denoise data prior to downstream analysis. However, existing methods for imputation-based denoising have significant limitations. Here, we propose CANDI (Confidence-Aware Neural Denoising Imputer), a method for epigenome imputation that (1) predicts raw counts and handles experiment-specific covariates such as sequencing depth, (2) can optionally incorporate information from a low-quality existing experiment when predicting a target, without retraining, and (3) outputs a calibrated measure of uncertainty. This approach is enabled by a Transformer model trained with self-supervised learning (SSL).
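
To make points (1) and (3) concrete, here is a minimal sketch of how a count-predicting imputer could express depth-aware predictions and how one might check whether its uncertainty is calibrated. The abstract does not state which count distribution CANDI uses. The negative binomial below is an assumption, chosen because it is a common model for overdispersed sequencing counts. The linear depth scaling, the reference depth `ref_depth`, and every function name (`nb_log_pmf`, `depth_scaled_mean`, `nb_nll`, `nb_interval`, `interval_coverage`, `sample_nb`) are hypothetical illustrations, not the authors' implementation.

```python
import math
import random


def nb_log_pmf(k: int, mu: float, r: float) -> float:
    """Log-probability of count k under a negative binomial with mean mu > 0 and dispersion (size) r > 0."""
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(r / (r + mu)) + k * math.log(mu / (r + mu)))


def depth_scaled_mean(rate: float, depth: float, ref_depth: float = 30e6) -> float:
    """Hypothetical covariate handling: rescale a depth-independent rate to the target
    experiment's sequencing depth (assumes counts grow linearly with depth)."""
    return rate * depth / ref_depth


def nb_nll(counts, mus, r):
    """Mean negative log-likelihood of observed raw counts given predicted means."""
    return -sum(nb_log_pmf(k, m, r) for k, m in zip(counts, mus)) / len(counts)


def nb_interval(mu, r, alpha=0.1, max_k=10_000):
    """Central (1 - alpha) interval [lo, hi] of the NB, found by accumulating the PMF."""
    cdf, lo = 0.0, None
    for k in range(max_k + 1):
        cdf += math.exp(nb_log_pmf(k, mu, r))
        if lo is None and cdf >= alpha / 2:
            lo = k
        if cdf >= 1 - alpha / 2:
            return lo, k
    return lo, max_k


def interval_coverage(counts, mus, r, alpha=0.1):
    """Fraction of observed counts inside their predicted (1 - alpha) intervals.
    For a calibrated model this should be close to (for discrete counts, at least) 1 - alpha."""
    hits = 0
    for k, m in zip(counts, mus):
        lo, hi = nb_interval(m, r, alpha)
        hits += lo <= k <= hi
    return hits / len(counts)


def sample_nb(mu, r, rng):
    """Draw one NB count as a gamma-Poisson mixture (Poisson via Knuth's method; fine for small means)."""
    lam = rng.gammavariate(r, mu / r)  # gamma with shape r and scale mu/r has mean mu
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1


if __name__ == "__main__":
    rng = random.Random(0)
    r = 2.0
    # Hypothetical per-bin rates from a model, rescaled to a 15M-read experiment.
    mus = [depth_scaled_mean(rate, depth=15e6) for rate in (0.5, 2.0, 8.0, 20.0) * 500]
    counts = [sample_nb(m, r, rng) for m in mus]
    print("mean NLL:", round(nb_nll(counts, mus, r), 3))
    print("90% interval coverage:", interval_coverage(counts, mus, r))
```

Here the synthetic counts are drawn from the same distribution the "model" predicts, so coverage should land near or slightly above 90%. With real data, coverage well below the nominal level would indicate overconfident uncertainty estimates, and coverage well above it would indicate overly wide ones.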
