DeSpotX: Identifiability-Based Decontamination for Spatial Transcriptomics
Wang, R. H.; Gentles, A. J.
Abstract
Spatial transcriptomics (ST) at single-cell resolution profiles gene expression in its native spatial context, but a substantial fraction of transcripts contaminate neighboring cells, compromising downstream biological analyses. Existing decontamination methods rely on heuristic priors and either ignore the spatial structure of contamination or aggregate over neighbors without separating contamination from native expression, leaving the decomposition ambiguous. To resolve this ambiguity, we introduce DeSpotX, a deep generative model that uses anchor genes, defined as genes not natively expressed in a given cell cluster, to constrain the contamination decomposition and make it identifiable. DeSpotX further uses spatial information to estimate contamination locally through a cluster-masked, distance-weighted average over neighboring cells, and prevents over-correction of low-expression signal through a learned diffusion prior. On spike-in simulations across five datasets and four ST platforms, DeSpotX achieves AUROC >0.94 on every dataset, with gains of 0.02 to 0.12 over the best baseline, and remains robust to inaccuracies in the cell-cluster annotation and in anchor gene construction. On real tissues, we show that the decontaminated counts produce improved marker-gene specificity, more spatially coherent expression, and cell-cell communication networks consistent with known biology. We further show that iterating decontamination and cell-cluster annotation refines these outcomes, reassigning ligand-receptor signaling to the expected source cells in mouse brain and breast cancer tissues.
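To make the idea of a cluster-masked, distance-weighted neighbor average concrete, the following is a minimal sketch and not the DeSpotX implementation. Several choices here are assumptions that the abstract does not specify: the function name `local_contamination_profile`, the Gaussian distance kernel and its `bandwidth` parameter, the use of all cells as candidate neighbors, and the reading of "cluster-masked" as excluding neighbors that share the target cell's cluster.

```python
import numpy as np

def local_contamination_profile(counts, coords, clusters, bandwidth=1.0):
    """Illustrative local contamination estimate (hypothetical helper).

    counts   : (n_cells, n_genes) expression matrix
    coords   : (n_cells, 2) spatial coordinates
    clusters : (n_cells,) cluster label per cell
    bandwidth: Gaussian kernel width (assumed, not from the paper)

    Returns an (n_cells, n_genes) matrix whose row i is the
    distance-weighted average expression of cells that belong to a
    different cluster than cell i.
    """
    counts = np.asarray(counts, dtype=float)
    coords = np.asarray(coords, dtype=float)
    clusters = np.asarray(clusters)

    # Pairwise squared Euclidean distances between cell centroids.
    diff = coords[:, None, :] - coords[None, :, :]
    sq_dist = np.sum(diff ** 2, axis=-1)

    # Assumed Gaussian kernel: closer neighbors contribute more.
    weights = np.exp(-sq_dist / (2.0 * bandwidth ** 2))

    # Assumed cluster mask: drop same-cluster neighbors. A cell shares a
    # cluster with itself, so the diagonal is zeroed as well.
    same_cluster = clusters[:, None] == clusters[None, :]
    weights[same_cluster] = 0.0

    # Normalize each row; cells with no other-cluster neighbor get zeros.
    row_sums = weights.sum(axis=1, keepdims=True)
    normalized = np.divide(
        weights, row_sums, out=np.zeros_like(weights), where=row_sums > 0
    )
    return normalized @ counts
```

Under the abstract's definition, an anchor gene is not natively expressed in a given cluster, so any counts observed for it in cells of that cluster can be attributed to contamination. Comparing those counts with the corresponding entries of a neighbor profile like the one above is one way to see how anchor genes constrain the decomposition. The dense pairwise distance matrix is only practical for small examples; a k-nearest-neighbor graph would be the usual substitute at tissue scale.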