Facility-Scale Workflows for Data Acquisition, Standardization, Machine Learning Analysis, and Reproducible Science

This paper is a preprint and has not been certified by peer review.

Authors

Madugula, S. S.; Brown, S. R.; Bible, A. N.; Solsona, R. M.; Checa, M.; Massenburg, L.; Williams, A. N.; Collins, L.; Harris, S. B.; Morrell-Falvey, J.; Retterer, S. T.; Vasudevan, R. K.

Abstract

Scientific user facilities routinely generate large-scale microscopy datasets across diverse instruments and vendors, differing substantially in file formats, dimensionality, and resolution. Beyond these inconsistencies, datasets are frequently fragmented across isolated instruments and constrained by security policies and uneven metadata practices. Consequently, tracking, standardizing, processing, and visualizing these datasets in a manner compatible with modern machine learning (ML) and autonomous experimentation workflows remains a major challenge. While existing initiatives address data archiving, standardization, or analysis individually, few provide integrated solutions that bridge instrument-level acquisition and scalable ML workflows within heterogeneous, security-constrained user facilities. Here, we establish a deployable, facility-scale infrastructure that bridges instrument-level data generation with cloud-based ML analytics while remaining compliant with institutional network constraints. Our framework integrates on-premises cloud computing, the in-house Pycroscopy ecosystem, and an open-source metadata management platform to transform heterogeneous microscopy datasets into standardized, ML-ready representations. We demonstrate this approach across distinct microscopy modalities through end-to-end workflows encompassing metadata capture, format harmonization, automated database ingestion, segmentation-based ML inference, and interactive visualization. By structurally separating acquisition from cloud-based analysis services, the framework enables scalable model deployment and iterative refinement without direct connectivity to instrument computers. This work provides a reproducible blueprint for facility-scale data and AI infrastructure, enabling ML-ready analytics, metadata traceability, and future autonomous experimentation workflows in microscopy-driven research.
