ALPINE: A Scalable Pipeline for Comprehensive Classification of Gene-Editing Outcomes from Long-Read Amplicon Sequencing

This paper is a preprint and has not been certified by peer review.

Authors

Chen, Y.; Gao, X.-H.; Vichas, A.; Wang, J.; Golhar, R.; Neuhaus, I.

Abstract

CRISPR genome editing has enabled precise genetic modification for gene and cell therapies, but edits often produce heterogeneous on-target outcomes, including homology-directed repair (HDR) knock-ins, DNA repair template integrations, and structural variants. Existing tools are frequently limited to short reads or lack the viral vector-specific integration categories needed for therapeutic development. Here, we present ALPINE (Amplicon Long-read Pipeline for INtegration Evaluation), a scalable and reproducible pipeline for classifying and quantifying gene-editing outcomes from long-read amplicon sequencing. ALPINE classifies reads into 10+ categories, including DNA repair vector integration subtypes, and performs variant calling near the gene-edited site with batch, multi-sample reporting. Uniquely, ALPINE can distinguish between cells treated with multiple DNA repair vectors and identify distinct molecular features, such as inverted terminal repeats (ITRs), enabling comprehensive characterization of complex gene-editing outcomes. Benchmarking on simulated datasets showed high accuracy, and application to edited T cell samples demonstrated comprehensive gene-editing outcome profiling. Supplementary data are available online.

Availability: ALPINE is implemented in Python and distributed as Docker containers with Common Workflow Language (CWL) support for cloud deployment. The pipeline is available under the MIT license at https://github.com/Maggi-Chen/ALPINE.
