Rapid-PFP: Accelerating Prefix-Free Parsing with GPU Parallelism
Rapid-PFP: Accelerating Prefix-Free Parsing with GPU Parallelism
Ferro, E. A.; Pencinger, T.; Green, O.; Lotfollahi, M.; Boucher, C.
AbstractPrefix-Free Parsing (PFP) is widely used in genomic data processing to construct compressed indexes on massive, highly repetitive datasets. However, existing CPU implementations are constrained by sequential bottlenecks, limiting their ability to scale to large-scale modern pangenomic collections. We introduce RAPID-PFP, a redesigned implementation of the PFP algorithm that takes advantage of the massive parallelism and high memory bandwidth of modern GPUs. RAPID-PFP parallelizes trigger-string detection, phrase parsing, dictionary construction, and parse generation through custom CUDA kernels and GPU-resident data structures built using cuDF, CuPy, and Numba-CUDA. The algorithm operates entirely within GPU memory, minimizes host interaction, and dynamically adapts to available VRAM, enabling efficient processing in a range of hardware configurations. Across E. coli and Human Pangenome (HPRC) datasets, RAPID-PFP produces identical output to established CPU pipelines while delivering an order-of-magnitude acceleration. On 3,682 E. coli assemblies, RAPID-PFP reduces runtime from 552 seconds to 17 seconds compared to PFP-FL (32.1 times) and from 1,078 seconds to 17 seconds compared to PFP-ITL (62.6 times). On the complete 46-sample HPRC dataset, RAPID-PFP achieves a 33.4 time speedup and successfully processes scales that PFP-ITL cannot handle. Performance improves with dataset size, reflecting that PFP maps naturally onto thousands of CUDA cores, yielding sublinear scaling relative to CPU implementations. RAPID-PFP demonstrates that foundational compressed-indexing algorithms can be re-engineered for accelerators, enabling scalable and practical preprocessing for large-scale genomic indexing workflows.