scConcept enables concept-level exploration of single-cell transcriptomic data
Chen, H.; Li, Y.
Abstract
Interpreting high-dimensional single-cell transcriptomic data remains challenging, as existing methods rely on latent representations or prior knowledge that require extensive post hoc analysis to derive biologically meaningful insights. Topic models provide interpretable gene-level signals but often produce redundant and coarse-grained programs that are difficult to translate into coherent biological concepts. While recent foundation models and large language models (LLMs) show promise, they are not readily applicable to large-scale single-cell data or fail to provide structured, cell-level interpretations. Here we present scConcept, a framework that introduces concept-level representation by transforming gene-level topic representations into structured, human-interpretable biological concepts. By integrating neural topic modeling with LLMs, scConcept distills fragmented gene programs into semantically coherent concepts defined by a biological label, description, and gene set, and quantitatively maps them back to individual cells. Across 16 single-cell datasets, scConcept improves clustering performance by 27.1% and interpretability by 50.7% over state-of-the-art methods. These concept-level representations enable interpretable cell-state annotation and capture gene programs that generalize across datasets. In cancer applications, scConcept identifies clinically relevant programs associated with tumor progression and patient survival, and links them to candidate therapeutic targets. Together, scConcept establishes concept-level representation as a general and scalable abstraction for interpretable single-cell analysis.
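To make the notion of a concept more concrete, the following is a minimal sketch of what a concept record (label, description, gene set) and a per-cell concept score might look like. It is not scConcept's implementation: the names `Concept` and `score_cells`, the toy expression values, and the scoring rule (mean expression over the concept's genes) are all assumptions chosen for illustration, and the abstract does not specify how the actual mapping to cells is computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Concept:
    """Hypothetical container mirroring the three parts named in the abstract."""
    label: str            # short biological label
    description: str      # free-text description (e.g. produced by an LLM)
    genes: frozenset      # gene set that defines the concept


def score_cells(expression, gene_names, concept):
    """Return one score per cell.

    Assumption: the score is the mean expression of the concept's genes.
    This stands in for whatever quantitative mapping scConcept actually uses.
    """
    # Column indices of the genes that belong to the concept.
    idx = [i for i, g in enumerate(gene_names) if g in concept.genes]
    if not idx:
        # None of the concept's genes were measured: score every cell as 0.
        return [0.0 for _ in expression]
    return [sum(row[i] for i in idx) / len(idx) for row in expression]


# Toy data: 2 cells x 4 genes (values are illustrative, not real measurements).
gene_names = ["CD3E", "CD8A", "MS4A1", "CD79A"]
expression = [
    [2.0, 4.0, 0.0, 0.0],  # cell expressing the T cell markers
    [0.0, 0.0, 3.0, 1.0],  # cell expressing the B cell markers
]

t_cell = Concept(
    label="Cytotoxic T cell identity",
    description="Markers associated with CD8+ T cells.",
    genes=frozenset({"CD3E", "CD8A"}),
)

scores = score_cells(expression, gene_names, t_cell)
print(scores)
```

In this sketch each cell receives one score per concept, so a collection of concepts would yield a cell-by-concept matrix that could serve as an interpretable representation for clustering or annotation.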