StrataBionn: a neural network supervised classification method for microbial communities
Symons, A. E.; Huynh, A. V.; Cornejo, O. E.
Abstract
The classification of microbial communities into discrete states, or community state types (CSTs), is fundamental to understanding host-microbiome interactions and their clinical implications. Traditional methods, such as nearest-neighbor approaches, often struggle with the inherent noise, high dimensionality, and non-linear signatures of taxonomic profiles. We present a novel supervised framework for microbial community classification that leverages an Artificial Neural Network (ANN) architecture, implemented in a new tool named StrataBionn. We rigorously evaluated our approach using large-scale vaginal microbiome datasets, directly benchmarking its performance against VALENCIA and a Random Forest (RF) classifier. To demonstrate the versatility of our models, we further extended the framework to oral microbiome classification, assessing its stability across diverse anatomical sites. Our supervised models consistently outperformed the nearest-neighbor approach across all evaluated datasets. In the vaginal microbiome, our method achieved an 11.6% to 13.3% increase in performance across all primary metrics: precision, recall, accuracy, and F1-score. This performance advantage is maintained in the oral microbiome, highlighting the generalizability of our neural network and ensemble strategies to various microbial ecosystems without the need for niche-specific algorithmic adjustments. By capturing complex feature dependencies that distance-based methods overlook, our approach provides a more robust and accurate census of microbial community structures. StrataBionn learns classification schemes for any microbiome with high accuracy and explainability, and it provides utilities to visualize feature-space classification boundaries and to perform perturbation analysis on trained classifiers, which makes it well suited to broad application in micro-ecology research.
This framework offers a scalable, high-performance alternative for microbiome researchers, facilitating more precise clinical stratification and biological insights across host body sites.
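
For orientation, the kind of distance-based baseline that the abstract contrasts with supervised models can be sketched as a nearest-centroid rule. This is a minimal illustration under stated assumptions, not code from VALENCIA or StrataBionn: it assumes Yue-Clayton similarity as the comparison measure, and the function names (`yue_clayton`, `nearest_centroid`), the three state labels, and the toy centroid values are all hypothetical.

```python
from typing import Dict, List, Tuple


def yue_clayton(a: List[float], b: List[float]) -> float:
    """Yue-Clayton theta similarity between two relative-abundance vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    denom = sum(x * x for x in a) + sum(y * y for y in b) - dot
    # Guard against two all-zero profiles.
    return dot / denom if denom > 0 else 0.0


def nearest_centroid(
    sample: List[float], centroids: Dict[str, List[float]]
) -> Tuple[str, float]:
    """Return the label of the most similar centroid and that similarity."""
    scores = {label: yue_clayton(sample, c) for label, c in centroids.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy centroids over three hypothetical taxa (illustrative values only,
# not taken from any real reference set).
TOY_CENTROIDS: Dict[str, List[float]] = {
    "state_A": [0.90, 0.05, 0.05],
    "state_B": [0.05, 0.90, 0.05],
    "state_C": [0.30, 0.30, 0.40],
}

if __name__ == "__main__":
    label, score = nearest_centroid([0.80, 0.10, 0.10], TOY_CENTROIDS)
    print(label, round(score, 3))
```

Each sample is assigned by a single similarity comparison per centroid, so a rule of this form cannot represent interactions between taxa. That limitation is what the abstract refers to when it says that distance-based methods overlook complex feature dependencies.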