MycoCirc: A Pan-Fungal Multi-Modal Pretrained Model for Fungal circRNA Prediction from Genome Sequence

This paper is a preprint and has not been certified by peer review.

Authors

Hu, X.; Jin, Y.; Wang, J.; Yang, E.

Abstract

Motivation: Exploring the fungal circular RNA (circRNA) landscape is bottlenecked by both experimental and computational limits. Standard mRNA-seq systematically discards circRNAs because they lack poly(A) tails, and total RNA-seq is too costly for large-scale screening. Consequently, discovery relies heavily on computational prediction. However, existing models trained exclusively on human or plant sequences fail in fungi because of the extreme genomic divergence across fungal lineages, which range from intron-poor Candida to intron-rich filamentous fungi. As a result, no computational framework currently exists for de novo fungal circRNA prediction, leaving the vast majority of non-model fungi inaccessible to circRNA discovery.

Results: We present mycoCirc, the first end-to-end pan-fungal multi-modal pretrained model for fungal circRNA prediction, which integrates three mandatory modalities and uses bidirectional cross-attention to model donor-acceptor site interaction. The model was pre-trained on 22 strains with 16,483 positive gene-circRNA associations spanning the Ascomycete yeast, Basidiomycete yeast, and filamentous fungi groups, then fine-tuned per group using 5-fold cross-validation. On held-out test species under Mode A (Genome+GTF, no RNA-seq), mycoCirc achieved AUROC 0.69-0.70, substantially outperforming JEDI (0.51-0.57) and CircPCBL (0.49-0.53). Cross-species evaluation on four independent fungal datasets demonstrated robust generalization across all fine-tuned variants (AUROC 0.63-0.72). Beyond gene-level classification, the JunctionEncoder module enabled backsplicing junction identification for detailed circRNA validation. We further built mycoCircAtlas, a companion database providing 319,860 high-confidence gene-circRNA predictions across 768 fungal species from Ensembl Fungi Release 113, enabling researchers to query precomputed predictions and design validation primers without local model deployment.
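As a reading aid for the AUROC figures above, the following is a minimal sketch of how AUROC can be computed for a gene-level classifier. It is not the authors' evaluation code: the function name `auroc` and the toy labels and scores are hypothetical and chosen only for illustration. AUROC is the probability that a randomly chosen positive gene is scored above a randomly chosen negative one, so 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking.

```python
def auroc(labels, scores):
    """Pairwise (Mann-Whitney) AUROC: fraction of positive/negative pairs
    in which the positive has the higher score; ties count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = gene with a known circRNA, 0 = gene without.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]

toy_auroc = auroc(labels, scores)
print(round(toy_auroc, 3))  # 7 of 9 pairs ranked correctly
```

On this scale, the baseline ranges reported above sit close to the 0.5 chance level.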
