A Systematic Approach Toward Implementing Machine Learning Techniques to Analyze Gut Microbiome Data

This paper is a preprint and has not been certified by peer review

Authors

Jahanikia, S.; Taada, A.; George, A.; Biruduraju, D.; Lu, E.; Singh, I.; Chhajer, K.; Wang, M.; Pentela, T.

Abstract

This study investigates the relationship between the gut microbiota and specific diseases. Data were collected from the Human Gut Microbiome Atlas, which examines regional variation across 20 countries on five continents and categorizes microbes by taxonomy, from genus to species. The Atlas provides color-coded phylum classifications, numerical counts of species within the same genus, an analysis of dysbiosis-related associations with 23 diseases, and region-enriched species. Samples were stratified into distinct categories: westernized, non-westernized, cancerous, and non-cancerous. Tree-based ensemble methods, namely bagging and boosting, achieved the highest accuracies across all categories, owing to their robustness in handling complex, high-dimensional data. The XGBoost model yielded the strongest predictive performance, achieving 91% accuracy for westernized cancer-associated samples, 84% for non-westernized cancer-associated samples, 92% for westernized samples, and 78% for non-westernized samples. Additionally, topological data analysis was used to assess the global structure and underlying patterns within the dataset.
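
To make the bagging idea mentioned in the abstract concrete, the following is a minimal sketch, not the authors' pipeline. It assumes synthetic data in place of the Atlas samples (one informative feature plus nine noise features, all hypothetical) and uses depth-1 decision trees (stumps) as the base learners. The function names `fit_stump`, `fit_bagging`, and `predict_bagging` are illustrative only; in practice a library such as scikit-learn or XGBoost would be used instead.

```python
import random

def fit_stump(X, y):
    """Depth-1 tree: pick the feature/threshold rule with the best training accuracy."""
    best = (-1.0, 0, 0.0, 1)  # (accuracy, feature index, threshold, label above threshold)
    n = len(y)
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2  # candidate split halfway between neighbouring values
            correct = sum((row[j] > t) == bool(label) for row, label in zip(X, y))
            # Try both orientations of the rule (predict 1 above, or 0 above).
            for above, acc in ((1, correct / n), (0, 1 - correct / n)):
                if acc > best[0]:
                    best = (acc, j, t, above)
    return best[1:]

def predict_stump(stump, row):
    j, t, above = stump
    return above if row[j] > t else 1 - above

def fit_bagging(X, y, n_estimators, rng):
    """Train each stump on a bootstrap resample (sampling rows with replacement)."""
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(y)) for _ in range(len(y))]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps

def predict_bagging(stumps, row):
    """Majority vote over the ensemble."""
    votes = sum(predict_stump(s, row) for s in stumps)
    return 1 if 2 * votes > len(stumps) else 0

# Synthetic stand-in for abundance data (an assumption, not Atlas data):
# feature 0 tracks the label closely, features 1-9 are pure noise.
rng = random.Random(0)
y = [i % 2 for i in range(120)]
X = [[label + rng.gauss(0, 0.1)] + [rng.gauss(0, 1) for _ in range(9)] for label in y]

stumps = fit_bagging(X[:80], y[:80], n_estimators=11, rng=rng)
preds = [predict_bagging(stumps, row) for row in X[80:]]
accuracy = sum(p == t for p, t in zip(preds, y[80:])) / len(preds)
print(f"held-out accuracy: {accuracy:.2f}")
```

Boosting differs in that the base learners are trained sequentially, each one concentrating on the errors of the ensemble so far, rather than independently on bootstrap resamples.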
