Efficient Search of Ultra-Large Synthesis On-Demand Libraries with Chemical Language Models

This paper is a preprint and has not been certified by peer review.

Authors

Heyer, K.; Yang, D.; Diaz, D. J.

Abstract

Ultra-large building block catalogs provide inexpensive access to billions of synthesis-on-demand molecules, but the combinatorial scale renders conventional virtual screening impractical. We present Vector Virtual Screen (VVS), a score-function-agnostic machine learning framework for efficient navigation of combinatorial libraries and rapid identification of promising molecules for experimental validation. VVS comprises four key innovations: (i) the Embedding Decomposer, which factors molecules into building blocks in latent space; (ii) ChemRank, a correlation-based loss that improves retrieval precision; (iii) BBKNN, an algorithm for nearest-neighbor search directly in building block space; and (iv) a multi-scale hill-climbing algorithm for gradient-based navigation of molecular embedding vector databases. Across diverse scoring functions, VVS consistently outperforms existing methods in retrieving high-scoring molecules while evaluating only a fraction of the library, achieving orders-of-magnitude runtime improvements. By turning ultra-large libraries into tractable search spaces, VVS enables virtual screening to keep pace with the rapid expansion of chemical space and adapt seamlessly to future advances in scoring functions.
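To give a feel for how hill climbing can find high-scoring entries in an embedding database while scoring only a fraction of it, here is a minimal, generic sketch in plain Python. It is not the VVS algorithm: the abstract does not specify the multi-scale or gradient-based details, so this uses ordinary greedy nearest-neighbor hill climbing with a brute-force neighbor lookup. The names `nearest_neighbors`, `hill_climb`, `score_fn`, and the toy 2-D "library" are all hypothetical and chosen only for illustration.

```python
import math
import random


def nearest_neighbors(vectors, query_idx, k):
    """Return indices of the k vectors closest (Euclidean) to vectors[query_idx].

    Brute force for clarity; a real vector database would use an index instead.
    """
    q = vectors[query_idx]
    others = (i for i in range(len(vectors)) if i != query_idx)
    return sorted(others, key=lambda i: math.dist(q, vectors[i]))[:k]


def hill_climb(vectors, score_fn, start_idx, k=8, max_steps=100):
    """Greedy hill climbing over a fixed set of embedding vectors.

    score_fn is treated as a black box (assumption: higher is better).
    Returns (best_idx, best_score, n_evaluated), where n_evaluated counts
    how many distinct library entries were actually scored.
    """
    cache = {}

    def score(i):
        # Score each entry at most once; scoring is assumed to be the expensive step.
        if i not in cache:
            cache[i] = score_fn(vectors[i])
        return cache[i]

    current = start_idx
    for _ in range(max_steps):
        neighbors = nearest_neighbors(vectors, current, k)
        best = max(neighbors, key=score)
        if score(best) <= score(current):
            break  # no neighbor improves on the current entry: local optimum
        current = best
    return current, score(current), len(cache)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-in for an embedded library: 2000 random points in 2-D.
    library = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(2000)]
    # Toy score: closeness to an arbitrary target point.
    toy_score = lambda v: -math.dist(v, (0.5, 0.5))
    idx, best, n_eval = hill_climb(library, toy_score, start_idx=0)
    print(f"best index={idx}, score={best:.4f}, scored {n_eval} of {len(library)}")
```

Because the score function is passed in as an opaque callable, the same loop works unchanged with any scoring function, which mirrors the score-function-agnostic framing in the abstract. The cache makes the number of scored entries explicit, since that is the quantity a search method of this kind tries to keep small.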
