ADMET evaluation in drug discovery. 20. Prediction of breast cancer resistance protein inhibition through machine learning

Jiang, Dejun; Lei, Tailong; Wang, Zhe; Shen, Chao; Cao, Dongsheng; Hou, Tingjun

doi:10.1186/s13321-020-00421-y

Table 5 The reported classification models for BCRP inhibitors and non-inhibitors

Year	Data size	Training set size	Test set size	Method	Descriptors	Model validation	Statistical results	Refs.
2007	123	80	43	OPLS-DA	Descriptors from SELMA software package	Y-rand	GA_TE = 0.79	Matsson et al. [16]
2009	122	83	39	PLS-DA	Descriptors from DragonX version 3.0	Y-rand	^aNA	Matsson et al. [115]
2013	109	30	79	Pharmacophore modeling	NA	NA	MCC_TE = 0.29, GA_TE = 0.66	Pan et al. [11]
2013	203	124	79	NB	ECFP_6, FCFP_6 fingerprints	LOO CV	AUC_{TR(LOO CV)} = 0.795, MCC_TE = 0.69	Pan et al. [11]
2013	382	382	NA	SVM, k-NN, RF, and consensus modeling	Dragon, MOE descriptors	Fivefold CV, Y-rand	BA_{TR(fivefold cv)} = 0.83 ± 0.04 (Consensus)	Sedykh et al. [121]
2014	275	96	Test: 32, external set: 147	ensembles of ANN, ensembles of SVM	Descriptors from ADMET Modeler	NA	GA_TE = 0.87, GA_External = 0.67 (ensembles of ANN)	Eric et al. [122]
2014	780	780	NA	NB	ECFP_6 fingerprints	Tenfold CV	GA_{TR(tenfold CV)} = 0.919, AUC_{TR(tenfold cv)} = 0.854	Montanari et al. [20]
2015	394	197	Test: 99, external set: 98	SVM, k-NN, ANN, and Consensus Modeling	Dragon descriptors	NA	GA_TE = 0.878, MCC_TE = 0.73; GA_External = 0.745, MCC_External = 0.46 (ANN)	Belekar et al. [21]
2016	^aNA	NA	NA	GTM-kNNd, GTM-Bayes, RF, SVM, and k-NN	MOE descriptors	Fivefold CV with five repetitions	NA	Gimadiev et al. [123]
2017	978	978	NA	NB, LR, SVM, and RF	MACCS, Morgan, ECFP8 fingerprints, VolSurf descriptors	Tenfold CV, leave-sources-out validation	MCC_{TR(tenfold CV)} = 0.65, AUC_{TR(tenfold CV)} = 0.90 (LR)	Montanari et al. [22]
2019	2799	2240	559	NB, LR, SVM, k-NN, XGBoost, SGB, DNN and consensus modeling	MOE descriptors and Pubchem fingerprints	Fivefold CV	MCC_TE = 0.812, AUC_TE = 0.958, GA_TE = 0.911, BA_TE = 0.905 (SVM)	This study

Values reported with ± are the mean ± standard deviation across fivefold CV
TR: training set; TE: test set; OPLS-DA: orthogonal partial least-squares projection to latent structures discriminant analysis; NA: not available; GA: global accuracy; Y-rand: Y-randomization test; PLS-DA: partial least-squares projection to latent structures discriminant analysis; NB: Naive Bayes; LOO CV: leave-one-out cross-validation; AUC: area under the receiver operating characteristic curve; MCC: Matthews correlation coefficient; SVM: support vector machine; k-NN: k-nearest neighbors; RF: random forest; CV: cross-validation; BA: balanced accuracy; ANN: artificial neural networks; GTM: generative topographic mapping; LR: logistic regression
Many of the cited studies developed several models based on different methods or descriptors; only the best statistical results for the test set or cross-validation are listed here
^aThe exact values are not available in the publication
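
For readers comparing the statistics in the table, the following is a minimal sketch of how GA, BA, and MCC are conventionally computed from a binary confusion matrix. It uses the standard textbook definitions; the function name `classification_metrics`, its parameter names, and the example counts are illustrative assumptions and are not taken from any of the cited studies, whose exact evaluation code is not shown here.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Return GA, BA and MCC for a binary inhibitor/non-inhibitor classifier.

    tp, tn, fp, fn are confusion-matrix counts (hypothetical inputs).
    Assumes both classes are present, i.e. tp + fn > 0 and tn + fp > 0.
    """
    total = tp + tn + fp + fn

    # Global accuracy: fraction of all compounds classified correctly.
    ga = (tp + tn) / total

    # Balanced accuracy: mean of sensitivity and specificity,
    # which is less affected by class imbalance than GA.
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ba = (sensitivity + specificity) / 2

    # Matthews correlation coefficient; defined as 0 here when the
    # denominator vanishes (a common convention, stated as an assumption).
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0

    return {"GA": ga, "BA": ba, "MCC": mcc}


if __name__ == "__main__":
    # Illustrative counts only; not data from any study in the table.
    print(classification_metrics(tp=40, tn=45, fp=5, fn=10))
```

Because GA can look favorable on imbalanced data sets while MCC and BA do not, results reported with different metrics in the table are not directly comparable without the underlying confusion matrices.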

ISSN: 1758-2946

Journal of Cheminformatics