Controlling for DNA dosage with Whole Genome Sequencing improves ATAC-seq peak calling
Vroland, C.; Salma, M.; Zhigulev, A.; Sahlen, P.; Soler, E.; Brehelin, L.; Lecellier, C.-H.
Abstract
The Assay for Transposase-Accessible Chromatin using sequencing (ATAC-seq) is a scalable and sensitive method for profiling chromatin accessibility, enabling the identification of cis-regulatory elements (CREs) that govern gene expression in diverse cellular contexts. Although ATAC-seq is routinely applied to both bulk and single-cell samples, we show that its peak calling is compromised by local biases in DNA dosage, which arise not only from copy number variations (CNVs) but also from DNA replication timing (RT). These biases distort read coverage and reduce the accuracy of peak detection. As part of the FANTOM consortium's efforts to elucidate genomic regulation, we propose enhancing the MACS pipeline by integrating whole-genome sequencing (WGS) data to account for local DNA dosage effects, analogous to the use of input controls in ChIP-seq analyses. Incorporating WGS data increases both the number and the width of ATAC peaks and brings them closer to transcription start sites (TSSs). WGS-controlled ATAC peaks exhibit canonical CRE epigenetic marks and are enriched for trait- and disease-associated genetic variants. Furthermore, the number of WGS-controlled peaks correlates more strongly with gene expression levels than does the number of peaks called without WGS control. Collectively, these results demonstrate that integrating WGS as a control significantly enhances the accuracy of ATAC-seq peak calling. Critically, we show that even low-depth WGS data is sufficient to improve peak calling performance, making this approach both cost-effective and readily adoptable for routine analyses. To ensure accessibility and reproducibility, we implemented this method as an open-source Nextflow pipeline. By challenging the assumption of uniform genomic visibility, our approach also has broad implications for other DNA sequencing-based technologies.
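To make the idea of a dosage control concrete, the toy sketch below scores a genomic window against a background rate derived from WGS coverage. It is an illustration under stated assumptions, not the authors' pipeline and not the MACS implementation: the function names (`poisson_sf`, `expected_background`), the read counts, and the rule of taking the larger of a genome-wide rate and a depth-scaled control count are all hypothetical simplifications, loosely modeled on how control-aware peak callers use an input track.

```python
import math


def poisson_sf(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam) by summing the lower tail."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i    # P(X = i) from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def expected_background(control_count, treat_total, control_total, genome_lambda):
    """Hypothetical background rate for one window.

    The WGS (control) count is rescaled to the ATAC library size, and the
    larger of that value and the genome-wide rate is used as the expected
    number of ATAC reads under the null (assumed rule, for illustration).
    """
    scaled_control = control_count * treat_total / control_total
    return max(genome_lambda, scaled_control)


# Invented numbers: 1M ATAC reads, 10M WGS reads, and a genome-wide
# expectation of 10 ATAC reads per window.
TREAT_TOTAL = 1_000_000
CONTROL_TOTAL = 10_000_000
GENOME_LAMBDA = 10.0

# Both windows hold 30 ATAC reads, but window B lies in a region with
# three times the DNA dosage (300 WGS reads instead of 100).
lam_a = expected_background(100, TREAT_TOTAL, CONTROL_TOTAL, GENOME_LAMBDA)
lam_b = expected_background(300, TREAT_TOTAL, CONTROL_TOTAL, GENOME_LAMBDA)

p_a = poisson_sf(30, lam_a)  # enrichment beyond what dosage explains
p_b = poisson_sf(30, lam_b)  # coverage is consistent with dosage alone

print(f"window A: lambda={lam_a:.1f}, p={p_a:.2e}")
print(f"window B: lambda={lam_b:.1f}, p={p_b:.2f}")
```

In this toy setting the two windows carry the same number of ATAC reads, yet only window A stands out once the local DNA dosage is taken into account. A test against a single genome-wide rate would score both windows identically.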