Long-read MitoScope reveals tissue-resolved somatic mitochondrial variation and landscape of nuclear-embedded mitochondrial sequences
Zakarian, C.; Smith, J. D.; Wong, C. H.; Frazar, C. D.; Ryke, E.; McGee, S. R.; Richardson, M.; Weiss, J. M.; Munson, K. M.; Hoekzema, K.; Mack, T.; Kwon, Y.; Ou, J.; Neph, S. J.; Sohn, M.-H.; Minkina, A.; Bennett, J. T.; Stergachis, A. B.; Eichler, E. E.; Wei, C.-L.
Abstract
The mitochondrial genome (mtDNA), rich in repeats and prone to transfer into the nuclear genome as nuclear mitochondrial DNA segments (NUMTs), drives somatic mosaicism implicated in cancer, metabolic syndromes, and neurodegeneration. Yet short-read sequencing yields incomplete catalogs, mapping artifacts, and false heteroplasmies. Here, we introduce MitoScope, a scalable long-read workflow to assemble mtDNA, perform high-fidelity variant calling, resolve heteroplasmy, and characterize NUMTs in benchmarking tissues from the Somatic Mosaicism Across Human Tissues (SMaHT) Network. MitoScope shows high sensitivity and precision, determines copy number, and uncovers low-frequency variants. We define an age- and tissue-dependent landscape of mtDNA mosaicism, including low-frequency pathogenic heteroplasmies, a bimodal heteroplasmy spectrum shaped by purifying selection, and age-accumulating deletions enriched for microhomology. Parallel profiling of NUMTs identifies high-confidence events, yielding >2-fold more NUMTs than short-read surveys, with evidence of nonrandom trinucleotide contexts at breakpoints. These findings expose pervasive, tissue-resolved somatic mtDNA and NUMT instability with direct relevance for variant interpretation, aging, and human disease.