High-dimensional Biomarker Identification for Scalable and Interpretable Disease Prediction via Machine Learning Models

This paper is a preprint and has not been certified by peer review.

Authors

Dai, Y.; Zou, F.; Zou, B.

Abstract

Omics data generated by high-throughput technologies and clinical features jointly influence many complex human diseases. Identifying key biomarkers and clinical risk factors is essential for understanding disease mechanisms and for advancing early diagnosis and precision medicine. However, the high dimensionality of omics profiles and their intricate associations with disease outcomes present significant analytical challenges. To address these challenges, we propose Hybrid Feature Screening (HFS), an ensemble data-driven biomarker identification tool that constructs a candidate feature set for downstream advanced machine learning models. The candidate features pre-screened by HFS are further refined using a computationally efficient permutation-based feature importance test; together, the two steps form the comprehensive High-dimensional Feature Importance Test (HiFIT) framework. Through extensive numerical simulations and real-world applications, we demonstrate HiFIT's superior performance in both outcome prediction and feature importance identification. An R package implementing HiFIT is available on GitHub.
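
The screen-then-test idea described in the abstract can be illustrated with a small, generic sketch. This is not the authors' HFS or HiFIT implementation, whose details the abstract does not give. As stated assumptions, marginal Pearson-correlation screening stands in for the ensemble screening step, an ordinary least squares fit stands in for the downstream machine learning model, and plain permutation importance (the increase in held-out mean squared error after shuffling one feature) stands in for the permutation-based test, without any significance calculation. The function names `screen`, `fit_ols`, and `permutation_importance`, and the simulated data, are hypothetical.

```python
import random
import statistics


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input is constant."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return 0.0 if sxx == 0 or syy == 0 else sxy / (sxx * syy) ** 0.5


def screen(X, y, k):
    """Assumed stand-in for screening: keep the k features with largest |corr|."""
    p = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(p)]
    return sorted(range(p), key=lambda j: scores[j], reverse=True)[:k]


def fit_ols(X, y):
    """Least squares with intercept, solved by Gauss-Jordan elimination."""
    A = [[1.0] + list(row) for row in X]
    d = len(A[0])
    # Augmented normal equations [A^T A | A^T y].
    M = [[sum(r[i] * r[j] for r in A) for j in range(d)]
         + [sum(r[i] * t for r, t in zip(A, y))] for i in range(d)]
    for c in range(d):
        piv = max(range(c, d), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(d):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [M[i][d] / M[i][i] for i in range(d)]


def mse(beta, X, y):
    """Mean squared error of the fitted linear model on (X, y)."""
    preds = [beta[0] + sum(b * v for b, v in zip(beta[1:], row)) for row in X]
    return statistics.fmean((q - t) ** 2 for q, t in zip(preds, y))


def permutation_importance(beta, X, y, n_repeats=20, seed=0):
    """Mean increase in MSE when one column is shuffled, per feature."""
    rng = random.Random(seed)
    base = mse(beta, X, y)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)
            Xp = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            increases.append(mse(beta, Xp, y) - base)
        importances.append(statistics.fmean(increases))
    return importances


# Simulated data (hypothetical): 50 features, only features 0 and 1 matter.
rng = random.Random(42)
n, p, n_train = 200, 50, 150
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [3 * row[0] - 2 * row[1] + rng.gauss(0, 0.5) for row in X]

kept = screen(X[:n_train], y[:n_train], k=5)         # stage 1: screening
Xs = [[row[j] for j in kept] for row in X]           # reduced feature matrix
beta = fit_ols(Xs[:n_train], y[:n_train])            # fit on training rows
imp = permutation_importance(beta, Xs[n_train:], y[n_train:])  # stage 2
ranked = sorted(zip(kept, imp), key=lambda t: t[1], reverse=True)
for j, v in ranked:
    print(f"feature {j}: importance {v:.3f}")
```

Screening first keeps the second stage cheap: permutation importance is computed for only `k` candidate features rather than all `p`, which is the scalability argument the abstract makes for pairing a pre-screen with a permutation-based test.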
