QuickEd: High-performance exact sequence alignment based on bound-and-align
QuickEd: High-performance exact sequence alignment based on bound-and-align
Doblas, M.; Lostes-Cazorla, O.; Aguado-Puig, Q.; Iniguez, C.; Moreto, M.; Marco-Sola, S.
AbstractPairwise sequence alignment is a core component of multiple sequencing-data analysis tools. Recent advancements in sequencing technologies have enabled the generation of longer sequences at a much lower price. Thus, long-read sequencing technologies have become increasingly popular in sequencing-based studies. However, classical sequence analysis algorithms face significant scalability challenges when aligning long sequences. As a result, several heuristic methods have been developed to improve performance at the expense of accuracy, as they often fail to produce the optimal alignment. This paper introduces QuickEd, a sequence alignment algorithm based on a bound-and-align strategy. First, QuickEd effectively bounds the maximum alignment-score using efficient heuristic strategies. Then, QuickEd utilizes this bound to reduce the computations required to produce the optimal alignment. Using QuickEd\'s bound-and-align strategy, we reduce O(n^2) complexity of traditional dynamic programming algorithms to O(n[s]), where n is the sequence length and [s] is an estimated upper bound of the alignment-score between the sequences. As a result, QuickEd is consistently faster than other state-of-the-art implementations, such as Edlib and BiWFA, achieving performance speedups of 1.6-7.3x and 2.1-2.5x, respectively, aligning long and noisy datasets. In addition, QuickEd maintains a stable memory footprint below 50 MB while aligning sequences up to 1 Mbp