PocketBagger: Generalizable pocket druggability prediction via positive-unlabeled learning
Gingrich, P. W.; Biswas, A.; Mica, I. L.; Brammer, K. M.; Shu, Z.; Maxwell, D. S.; Russell, K. P.; Al-Lazikani, B.
Abstract
Reliable structure-based prediction of small-molecule druggability is hindered by a fundamental labeling problem: experimentally confirmed liganded sites (positives) are observable, but credible "undruggable" pockets (negatives) are almost impossible to define. Standard supervised machine learning consequently relies on arbitrary definitions of "undruggable", leading to bias and false negatives. Here we introduce PocketBagger, a positive-unlabeled (PU) learning framework for pocket druggability prediction trained exclusively on experimentally determined Protein Data Bank (PDB) structures. PocketBagger uses PU bagging to learn key features associated with reliable "druggable" pockets and treats all remaining pockets in the structurally characterized proteome as unlabeled. We demonstrate PocketBagger by training a simple Random Forest classifier and show that it achieves high recall (0.804), even when challenged with increasingly difficult generalizability assessments and entire protein-family holdouts. We benchmark PocketBagger against a leading deep-learning predictor to demonstrate the added value of PU learning; PocketBagger itself, however, is intended as a framework applicable to any model architecture. Along with the code, the data generated by PocketBagger are deployed in canSAR.ai, providing scalable, generalizable pocket druggability predictions to the drug discovery community.
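To make the PU bagging idea concrete, the following is a minimal sketch of the general technique and not the PocketBagger implementation. It assumes each pocket is already described by a numeric feature vector, and it swaps in a toy nearest-centroid base learner in place of the Random Forest named in the abstract so that the example needs only the Python standard library. The function names (`centroid`, `sq_dist`, `pu_bagging_scores`) and the parameters `n_rounds` and `seed` are hypothetical. In each round, a bootstrap sample of unlabeled pockets is treated as provisional negatives, a base learner is fit against the known positives, and only the unlabeled pockets left out of that sample are scored; the scores are then averaged over rounds.

```python
import random


def centroid(rows):
    """Column-wise mean of a non-empty list of feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pu_bagging_scores(positives, unlabeled, n_rounds=50, seed=0):
    """Average out-of-bag 'positive' votes for every unlabeled example.

    Assumption: a nearest-centroid rule stands in for the base classifier;
    any model with fit/predict behaviour could take its place.
    Returns None for an example that was never out-of-bag.
    """
    rng = random.Random(seed)
    pos_c = centroid(positives)          # known positives are used in every round
    votes = [0] * len(unlabeled)
    seen = [0] * len(unlabeled)
    for _ in range(n_rounds):
        # Bootstrap as many unlabeled examples as there are positives
        # and treat them as provisional negatives for this round.
        bag = [rng.randrange(len(unlabeled)) for _ in range(len(positives))]
        in_bag = set(bag)
        neg_c = centroid([unlabeled[i] for i in bag])
        for i, x in enumerate(unlabeled):
            if i in in_bag:
                continue                 # score only out-of-bag examples
            seen[i] += 1
            if sq_dist(x, pos_c) < sq_dist(x, neg_c):
                votes[i] += 1            # closer to the positives than to the bag
    return [v / s if s else None for v, s in zip(votes, seen)]
```

A score near 1 means the unlabeled example consistently resembles the positives more than the sampled unlabeled background; a score near 0 means the opposite. Because no example is ever declared a true negative, the approach avoids committing to an arbitrary definition of "undruggable".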