The ABC transporter P-glycoprotein (P-gp) actively transports a wide range of drugs and toxins out of cells and is therefore linked to multidrug resistance and to the ADME profile of therapeutics. The development of predictive in silico models for identifying P-gp inhibitors is thus of great interest in drug discovery and development. So far, in silico prediction of P-gp inhibitors has been dominated by ligand-based approaches because high-quality structural information about P-gp has been lacking. The present study compares the P-gp inhibitor/non-inhibitor classification performance obtained by docking into a homology model of P-gp with that of supervised machine learning methods, such as k-nearest neighbor, support vector machine (SVM), random forest, and binary QSAR, using a large, structurally diverse data set. In addition, the applicability domain of the models was assessed using an algorithm based on Euclidean distance. The results show that random forest and SVM performed best for classifying P-gp inhibitors and non-inhibitors, correctly predicting 73 % and 75 % of the external test set compounds, respectively. Classification based on the docking experiments using the ChemScore scoring function correctly predicted 61 % of the external test set. This demonstrates that ligand-based models currently remain the methods of choice for accurately predicting P-gp inhibitors. However, structure-based classification provides information about possible drug/protein interactions, which helps in understanding the molecular basis of ligand-transporter interaction and could therefore also support lead optimization.
Journal of Chemical Information and Modeling, 2014, Vol 54, Issue 1
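
The abstract mentions that the applicability domain was assessed with an algorithm based on Euclidean distance but does not give its details. The following is a minimal sketch of one common distance-based variant, not necessarily the algorithm used in the study: a compound counts as inside the domain if its mean Euclidean distance to its k nearest training compounds does not exceed a threshold derived from the training set itself. The function names and the parameters `k` and `z` are hypothetical, and the descriptor matrices are assumed to be numeric and already scaled.

```python
import numpy as np


def knn_mean_distance(queries, reference, k, exclude_self=False):
    """Mean Euclidean distance from each query row to its k nearest reference rows."""
    # Pairwise distance matrix of shape (n_queries, n_reference).
    diff = queries[:, None, :] - reference[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    dist.sort(axis=1)
    # When queries and reference are the same set, column 0 holds the
    # zero distance of each compound to itself and is skipped.
    start = 1 if exclude_self else 0
    return dist[:, start:start + k].mean(axis=1)


def fit_domain_threshold(train, k=3, z=0.5):
    """Threshold = mean + z * std of the training set's own k-NN distances.

    Both k and z are illustrative assumptions, not values from the study.
    """
    d = knn_mean_distance(train, train, k, exclude_self=True)
    return d.mean() + z * d.std()


def in_domain(test, train, threshold, k=3):
    """Boolean mask: True where a test compound lies inside the domain."""
    return knn_mean_distance(test, train, k) <= threshold
```

In such a scheme, predictions for compounds flagged as outside the domain would be reported separately or treated as less reliable, since the model has seen no sufficiently similar training compounds.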