Predicting Discrete Structural Transformations in Small Molecules from Tandem Mass Spectrometry
Wang, X.; Kiler, G.; Herrera-Rosero, D.; Shahneh, M. R.; Strobel, M.; Geibel, C.; El Abiead, Y.; Phelan, V. V.; Petras, D.; Wang, M.
Abstract
Tandem mass spectrometry (MS/MS) fragments molecules into smaller pieces, generating spectra composed of m/z values and intensities that encode structural information for molecular annotation. As mass spectrometry data acquisition speeds increase, manual annotation of MS/MS spectra lags far behind data generation and remains a bottleneck in metabolite annotation. Current computational methods, such as molecular networking, address this challenge by organizing similar structures into families of related compounds. However, they generally provide only similarity scores, which offer little actionable insight for structural annotation. To address this limitation, we present the Molecular Transformation Graph Edit Measure (MT-GEM), a distance metric that quantifies discrete structural transformations between molecules through graph edge removals that approximate structural modifications. Building on this metric, we developed an ensemble machine learning architecture, the Spectrum Transformation Edit Predictor (STEP), that builds upon TransExION and DREAMS to predict MT-GEM distances from MS/MS spectra. STEP achieves an average precision of 48.4% for identifying single structural transformations between MS/MS pairs, representing more than a tenfold improvement over state-of-the-art similarity metrics, including spectral entropy similarity (3.8%) and modified cosine (2.5%). On experimental human gut microbial community data, STEP identifies three times more single-transformation metabolite pairs than feature-based molecular networking at equivalent precision. In a discovery application, STEP highlights one drug metabolite and two new natural product analogs missed by modified cosine in feature-based molecular networking.
By providing discrete transformation predictions rather than continuous similarity scores, MT-GEM and STEP enable hypothesis-driven metabolite annotation with testable structural modifications, which we envision will accelerate discovery of new molecules from MS/MS metabolomics datasets.
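For readers unfamiliar with the headline metric, the following is a minimal sketch of how average precision can be computed when ranking spectrum pairs by a predicted score. It is not the authors' evaluation code. The function name `average_precision`, the toy `labels` (1 if a pair differs by a single structural transformation, 0 otherwise), and the toy `scores` are all hypothetical and chosen only for illustration.

```python
def average_precision(labels, scores):
    """Mean of precision@k over the ranks k at which a positive pair appears.

    labels: 1 for a true single-transformation pair, 0 otherwise (assumed encoding).
    scores: higher means the model considers the pair more likely positive.
    Ties keep their input order because Python's sort is stable.
    """
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precisions.append(hits / rank)  # precision at this positive's rank
    # With no positives the metric is undefined; return 0.0 by convention here.
    return sum(precisions) / len(precisions) if precisions else 0.0


# Hypothetical toy data: five spectrum pairs, two of them true positives.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.2]
ap = average_precision(labels, scores)
print(f"average precision = {ap:.3f}")
```

In this toy ranking the positives sit at ranks 1 and 3, so the value is the mean of 1/1 and 2/3. Because average precision rewards placing true pairs near the top of the ranking, it is sensitive to class imbalance, which is one reason a general similarity score can rank few single-transformation pairs highly.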