

Accurate and robust prediction of patient-specific responses to a new compound is critical to personalized drug discovery and development. However, patient data are often too scarce to train a generalized machine learning model. Although many methods have been developed to utilize cell-line screens for predicting clinical responses, their performance is unreliable owing to data heterogeneity and distribution shift. Here we have developed a novel context-aware deconfounding autoencoder (CODE-AE) that can extract intrinsic biological signals masked by context-specific patterns and confounding factors. Extensive comparative studies demonstrated that CODE-AE effectively alleviated the out-of-distribution problem in model generalization and significantly improved accuracy and robustness over state-of-the-art methods in predicting patient-specific clinical drug responses purely from cell-line compound screens. Using CODE-AE, we screened 59 drugs for 9,808 patients with cancer. Our results are consistent with existing clinical observations, suggesting the potential of CODE-AE in developing personalized therapies and drug response biomarkers.

Omics profiling, particularly transcriptomics, is a powerful technique to characterize cellular activity under various conditions, allowing the development of machine learning models for personalized phenotype compound screening 1, 2, 3. However, the success of such predictive models largely relies on the availability of sufficient amounts of high-quality labelled data. In the early stage of drug discovery, cell-line and other in vitro models have been extensively applied to screen drug candidates. Unfortunately, the activity of a compound in vitro is poorly correlated with its efficacy in humans. This discrepancy is responsible for the high cost and low success rate of drug discovery. Even for drugs that have been tested in clinical trials, patient responses can vary remarkably.
