ActSeekN: A Structural-Motif-Based Pipeline for Interpretable Enzyme Function Annotation

This paper is a preprint and has not been certified by peer review.

Authors

Castillo, S.; Gu, C.; Jouhten, P.; Peddinti, G.; Ollila, S. O. H.

Abstract

Accurate enzyme function annotation remains a major bottleneck in genome analysis despite the rapid expansion of available protein sequence and structure data. Most existing methods rely on sequence similarity or machine-learning representations, which often perform poorly for proteins with low sequence identity or convergent evolutionary histories. Because enzymatic activity is determined by the three-dimensional arrangement of catalytic and binding-site residues, structure-based approaches offer a mechanistically grounded alternative. However, their broader application has been constrained by the limited size and coverage of curated active-site reference databases. To address this challenge, we developed ActSeekN, a structural-motif-based functional annotation pipeline that combines the ActSeek active-site search algorithm with a newly constructed large-scale reference database derived from AlphaFold-predicted structures, UniProt annotations, and curated catalytic residue information. This framework enables rapid and scalable identification of conserved catalytic motifs across structurally related proteins, allowing function to be transferred on the basis of local three-dimensional catalytic geometry rather than global sequence similarity. In this way, ActSeekN overcomes a central limitation of previous structure-based methods by expanding the searchable space of catalytic motifs while retaining mechanistic interpretability. Benchmarking against state-of-the-art machine-learning approaches demonstrates competitive or superior performance. Applications to yeast, human, and Trichoderma reesei proteomes refine existing annotations, complete partial EC assignments, and identify previously unrecognized enzymatic functions, highlighting ActSeekN as a powerful tool for genome annotation and biotechnology.
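
To make the idea of matching on local catalytic geometry rather than global sequence similarity concrete, the following is a minimal sketch and not the ActSeek algorithm itself, which the abstract does not describe in detail. It assumes a motif is an ordered list of 3D coordinates for the catalytic residues (for example, one C-alpha atom per residue) and compares two motifs by the root-mean-square difference of their internal pairwise distances. The function names, the example coordinates, and the 1.0 angstrom tolerance are all hypothetical choices made for illustration.

```python
import math
from itertools import combinations


def pairwise_distances(coords):
    """Return distances between all residue pairs, in a fixed pair order."""
    return [math.dist(a, b) for a, b in combinations(coords, 2)]


def distance_rmsd(motif, candidate):
    """Root-mean-square difference between two motifs' pairwise distances.

    Assumes both lists have the same length and list residues in the same
    order (i.e. the residue correspondence is already known).
    """
    if len(motif) != len(candidate):
        raise ValueError("motifs must contain the same number of residues")
    if len(motif) < 2:
        raise ValueError("need at least two residues to compare geometry")
    d_ref = pairwise_distances(motif)
    d_cand = pairwise_distances(candidate)
    squared = sum((x - y) ** 2 for x, y in zip(d_ref, d_cand))
    return math.sqrt(squared / len(d_ref))


def geometry_matches(motif, candidate, tolerance=1.0):
    """Hypothetical acceptance rule: distance RMSD within a tolerance (angstroms)."""
    return distance_rmsd(motif, candidate) <= tolerance


# Hypothetical three-residue motif (coordinates in angstroms).
reference = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 7.0, 0.0)]

# The same motif rotated 90 degrees about z and translated by (10, 10, 10):
# internal distances are unchanged, so it should still match.
moved = [(10.0, 10.0, 10.0), (10.0, 15.0, 10.0), (3.0, 10.0, 10.0)]

# A motif with one residue displaced by 4 angstroms: geometry no longer matches.
distorted = [(0.0, 0.0, 0.0), (9.0, 0.0, 0.0), (0.0, 7.0, 0.0)]

print(geometry_matches(reference, moved))
print(geometry_matches(reference, distorted))
```

Internal distances are unchanged by rotation and translation, so the comparison needs no structural superposition and ignores the rest of the protein. The trade-off is that a distance-only score cannot distinguish a motif from its mirror image, and it presupposes a known residue correspondence; a real motif search would also have to find that correspondence.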
