Spurious correlation inflates performance in single-cell perturbation prediction
Nicol, P. B.; Shivakumar, S.; Irizarry, R.
Abstract
The increasing number of computational methods designed to predict the effects of genetic perturbations on cellular gene expression profiles has led to a need for rigorous evaluation metrics. Recent benchmarking studies rely on correlation or cosine similarity of differential expression relative to a shared population of control cells. We show that these metrics are systematically inflated by statistical bias induced by reusing the same control population to define both quantities being compared. As a result, even non-informative methods can appear to perform well, particularly in datasets with limited numbers of control cells. Reanalysis of published datasets using a simple control-splitting procedure that removes this bias leads to a substantial reduction in performance previously attributed to biological signal.
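The mechanism described in the abstract can be illustrated with a small null simulation. The sketch below is not the authors' code or their exact procedure. It assumes independent standard-normal expression values, no true perturbation effect, and a deliberately non-informative "prediction" taken from the mean of an unrelated perturbation. The function name `simulate_null_correlation` and all parameter values are hypothetical choices made for illustration, and the half-and-half split is only one plausible reading of "control-splitting".

```python
import numpy as np


def pearson(a, b):
    """Pearson correlation between two 1-D arrays."""
    return float(np.corrcoef(a, b)[0, 1])


def simulate_null_correlation(n_genes=2000, n_ctrl=50, n_pert=200, seed=0):
    """Compare shared-control and split-control correlation under a null model.

    Assumption: every cell is drawn from the same distribution, so the true
    differential expression is zero for every gene.
    """
    rng = np.random.default_rng(seed)

    ctrl = rng.normal(size=(n_ctrl, n_genes))        # control cells
    pert_test = rng.normal(size=(n_pert, n_genes))   # held-out perturbation (no real effect)
    pert_other = rng.normal(size=(n_pert, n_genes))  # unrelated perturbation used as the "prediction"

    # Shared control: the same control mean is subtracted from both quantities,
    # so its sampling noise appears in both and induces positive correlation.
    ctrl_mean = ctrl.mean(axis=0)
    observed_shared = pert_test.mean(axis=0) - ctrl_mean
    predicted_shared = pert_other.mean(axis=0) - ctrl_mean
    r_shared = pearson(observed_shared, predicted_shared)

    # Split control (assumed variant): disjoint halves of the control cells
    # define the observed and predicted differential expression.
    perm = rng.permutation(n_ctrl)
    half_a, half_b = perm[: n_ctrl // 2], perm[n_ctrl // 2:]
    observed_split = pert_test.mean(axis=0) - ctrl[half_a].mean(axis=0)
    predicted_split = pert_other.mean(axis=0) - ctrl[half_b].mean(axis=0)
    r_split = pearson(observed_split, predicted_split)

    return r_shared, r_split


if __name__ == "__main__":
    r_shared, r_split = simulate_null_correlation()
    print(f"shared control: r = {r_shared:.3f}")
    print(f"split control:  r = {r_split:.3f}")
```

Under these assumptions, the expected shared-control correlation is roughly (1/n_ctrl) / (1/n_ctrl + 1/n_pert), which is about 0.8 for the default values even though the prediction carries no information. It grows as the number of control cells shrinks, consistent with the abstract's remark about datasets with few controls. The split-control correlation should be close to zero.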