Ali-U-Net: A Convolutional Transformer Neural Net for Multiple Sequence Alignment of DNA Sequences. A proof of concept
Arsic, P.; Mayer, C.
Abstract
We report a convolutional transformer neural network that is capable of aligning multiple nucleotide sequences. The network is based on the U-Net, an architecture commonly used in image segmentation, which we employ to transform unaligned sequences into aligned sequences. For the alignment scenarios on which it was trained, Ali-U-Net is in most cases more accurate than programs such as MAFFT, T-Coffee, MUSCLE, and Clustal Omega, while being considerably faster than similarly accurate programs on a single CPU core. Its limitations are that the network is still trained specifically for certain alignment problems and can perform poorly on gap distributions it has not seen before. Furthermore, the algorithm currently works only with fixed-size alignment windows of 48x48 or 96x96 nucleotides. At this stage, we view our study as a proof of concept and are confident that the present findings can be extended to larger alignments and more complex alignment scenarios in the near future.
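To illustrate what a fixed-size alignment window might look like as input to an image-style network, the following is a minimal sketch, not the authors' implementation. It assumes that a 48x48 window means up to 48 sequences of up to 48 nucleotides each, that short sequences are right-padded with a gap symbol, and that each position is one-hot encoded over the channels A, C, G, T, and gap. The function name `encode_window`, the channel order, and the padding rule are all hypothetical.

```python
# Hypothetical input encoding for a fixed-size alignment window.
# Channel order and the use of "-" as padding are assumptions.
ALPHABET = "ACGT-"


def encode_window(seqs, size=48):
    """Pad unaligned sequences to a size x size grid and one-hot encode it.

    Returns a nested list of shape (size, size, len(ALPHABET)).
    Symbols outside ALPHABET (for example "N") become all-zero vectors.
    """
    if len(seqs) > size or any(len(s) > size for s in seqs):
        raise ValueError("input does not fit into the window")
    # Right-pad every sequence with gaps up to the window width.
    rows = [s.upper().ljust(size, "-") for s in seqs]
    # Fill missing rows with all-gap sequences up to the window height.
    rows += ["-" * size] * (size - len(rows))
    return [
        [[1 if ch == sym else 0 for sym in ALPHABET] for ch in row]
        for row in rows
    ]


window = encode_window(["ACGT", "ACT"])
```

Under these assumptions, a network of the U-Net type would map such a grid of unaligned rows to a grid of the same shape in which gaps have been moved into aligning positions; inputs larger than the window are rejected here rather than tiled.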