ALPINE: A Scalable Pipeline for Comprehensive Classification of Gene-Editing Outcomes from Long-Read Amplicon Sequencing
Chen, Y.; Gao, X.-H.; Vichas, A.; Wang, J.; Golhar, R.; Neuhaus, I.
Abstract

CRISPR genome editing has enabled precise genetic modification for gene and cell therapies, but edits often produce heterogeneous on-target outcomes, including homology-directed repair (HDR) knock-ins, DNA repair template integrations, and structural variants. Existing tools are frequently limited to short reads or lack the viral vector-specific integration categories needed for therapeutic development. Here, we present ALPINE (Amplicon Long-read Pipeline for INtegration Evaluation), a scalable and reproducible pipeline for classifying and quantifying gene-editing outcomes from long-read amplicon sequencing. ALPINE classifies reads into more than 10 categories, including DNA repair vector integration subtypes, and performs variant calling near the gene-edited site with batch, multi-sample reporting. Uniquely, ALPINE can distinguish between cells treated with multiple DNA repair vectors and identify distinct molecular features, such as inverted terminal repeats (ITRs), enabling comprehensive characterization of complex gene-editing outcomes. Benchmarking on simulated datasets showed high accuracy, and application to edited T cell samples demonstrated comprehensive gene-editing outcome profiling. Supplementary data are available online.

Availability: ALPINE is implemented in Python and distributed as Docker containers with Common Workflow Language (CWL) support for cloud deployment. The pipeline is available under the MIT license at https://github.com/Maggi-Chen/ALPINE.