Ali-U-Net: A Convolutional Transformer Neural Net for Multiple Sequence Alignment of DNA Sequences. A proof of concept

This paper is a preprint and has not been certified by peer review.

Authors

Arsic, P.; Mayer, C.

Abstract

We report a convolutional transformer neural network that is capable of aligning multiple nucleotide sequences. The network is based on the U-Net architecture commonly used in image segmentation, which we employ to transform unaligned sequences into aligned sequences. For the alignment scenarios on which it has been trained, Ali-U-Net is in most cases more accurate than programs such as MAFFT, T-Coffee, MUSCLE, and Clustal Omega, while being considerably faster than similarly accurate programs on a single CPU core. Its limitations are that the network is still trained specifically for certain alignment problems and can perform poorly on gap distributions it has not seen before. Furthermore, the algorithm currently works only with fixed-size alignment windows of 48x48 or 96x96 nucleotides. At this stage, we view our study as a proof of concept, and we are confident that the present findings can be extended to larger alignments and more complex alignment scenarios in the near future.
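To make the fixed-size window concrete, the following minimal sketch shows one way a set of unaligned sequences could be packed into a 48x48 input for an image-style network. The abstract does not describe the actual input encoding, so everything here is an assumption made for illustration: the one-hot channel order `ACGT-`, the padding of short and missing rows with gap symbols, and the hypothetical helper names `encode_window` and `decode_window`.

```python
import numpy as np

ALPHABET = "ACGT-"  # assumed channel order; '-' is the gap symbol
WINDOW = 48         # window size named in the abstract (48x48)


def encode_window(seqs, window=WINDOW):
    """One-hot encode up to `window` sequences into a (window, window, 5) array.

    Hypothetical helper: short sequences and unused rows are padded with gaps.
    """
    if len(seqs) > window:
        raise ValueError("too many sequences for one window")
    index = {ch: i for i, ch in enumerate(ALPHABET)}
    # Start with every cell set to the gap code so padding needs no extra step.
    codes = np.full((window, window), index["-"], dtype=np.int64)
    for row, seq in enumerate(seqs):
        if len(seq) > window:
            raise ValueError("sequence longer than window")
        # Raises KeyError for symbols outside ALPHABET (e.g. 'N').
        codes[row, :len(seq)] = [index[ch] for ch in seq.upper()]
    # Indexing an identity matrix by integer codes yields the one-hot tensor.
    return np.eye(len(ALPHABET), dtype=np.float32)[codes]


def decode_window(onehot):
    """Map a (window, window, 5) array back to a list of strings via argmax."""
    codes = onehot.argmax(axis=-1)
    return ["".join(ALPHABET[c] for c in row) for row in codes]
```

Under these assumptions, a network output of the same shape could be passed to `decode_window` to read off the predicted alignment row by row; sequences longer than the window would need to be split or rejected, which reflects the fixed-size limitation noted above.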
