
Table 5 The reported classification models for BCRP inhibitors and non-inhibitors

From: ADMET evaluation in drug discovery. 20. Prediction of breast cancer resistance protein inhibition through machine learning

| Year | Data size | Training set | Test set | Method | Descriptors | Model validation | Statistical results | Refs. |
|---|---|---|---|---|---|---|---|---|
| 2007 | 123 | 80 | 43 | OPLS-DA | Descriptors from SELMA software package | Y-rand | GA_TE = 0.79 | Matsson et al. [16] |
| 2009 | 122 | 83 | 39 | PLS-DA | Descriptors from DragonX version 3.0 | Y-rand | NA^a | Matsson et al. [115] |
| 2013 | 109 | 30 | 79 | Pharmacophore modeling | NA | NA | MCC_TE = 0.29, GA_TE = 0.66 | Pan et al. [11] |
| 2013 | 203 | 124 | 79 | NB | ECFP_6, FCFP_6 fingerprints | LOO CV | AUC_TR (LOO CV) = 0.795, MCC_TE = 0.69 | Pan et al. [11] |
| 2013 | 382 | 382 | NA | SVM, k-NN, RF, and consensus modeling | Dragon, MOE descriptors | Fivefold CV, Y-rand | BA_TR (fivefold CV) = 0.83 ± 0.04 (consensus) | Sedykh et al. [121] |
| 2014 | 275 | 96 | Test: 32; external set: 147 | Ensembles of ANN, ensembles of SVM | Descriptors from ADMET Modeler | NA | GA_TE = 0.87, GA_External = 0.67 (ensembles of ANN) | Eric et al. [122] |
| 2014 | 780 | 780 | NA | NB | ECFP_6 fingerprints | Tenfold CV | GA_TR (tenfold CV) = 0.919, AUC_TR (tenfold CV) = 0.854 | Montanari et al. [20] |
| 2015 | 394 | 197 | Test: 99; external set: 98 | SVM, k-NN, ANN, and consensus modeling | Dragon descriptors | NA | GA_TE = 0.878, MCC_TE = 0.73; GA_External = 0.745, MCC_External = 0.46 (ANN) | Belekar et al. [21] |
| 2016 | NA^a | NA | NA | GTM-kNNd, GTM-Bayes, RF, SVM, and k-NN | MOE descriptors | Fivefold CV with five repetitions | NA | Gimadiev et al. [123] |
| 2017 | 978 | 978 | NA | NB, LR, SVM, and RF | MACCS, Morgan, ECFP8 fingerprints, VolSurf descriptors | Tenfold CV, leave-sources-out validation | MCC_TR (tenfold CV) = 0.65, AUC_TR (tenfold CV) = 0.90 (LR) | Montanari et al. [22] |
| 2019 | 2799 | 2240 | 559 | NB, LR, SVM, k-NN, XGBoost, SGB, DNN, and consensus modeling | MOE descriptors and PubChem fingerprints | Fivefold CV | MCC_TE = 0.812, AUC_TE = 0.958, GA_TE = 0.911, BA_TE = 0.905 (SVM) | This study |

  1. Mean ± standard deviation across fivefold CV
  2. TR training set, TE test set, OPLS-DA orthogonal partial least-squares projection to latent structures discriminant analysis, NA not available, GA global accuracy, Y-rand Y-randomization test, PLS-DA partial least-squares projection to latent structures discriminant analysis, NB naive Bayes, LOO CV leave-one-out cross-validation, AUC area under the receiver operating characteristic curve, MCC Matthews correlation coefficient, SVM support vector machine, k-NN k-nearest neighbors, RF random forest, CV cross-validation, BA balanced accuracy, ANN artificial neural networks, GTM generative topographic mapping, LR logistic regression
  3. Where several models were developed with different methods or descriptors, only the best statistical results for the test set or cross-validation are listed
  4. ^a The exact values are not available in the publication
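
The statistical results in the table mix four classification metrics (GA, BA, MCC, and AUC). The sketch below computes each of them from binary labels using their standard definitions, to make the differences between them concrete. It is a minimal illustration only: the cited studies do not state how they computed these values, the function names are hypothetical, the 0.5 decision threshold is an assumption, and the eight labels and scores are made-up toy data, not BCRP results.

```python
import math

# Convention assumed here: 1 = inhibitor, 0 = non-inhibitor.

def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def global_accuracy(tp, tn, fp, fn):
    """GA: fraction of all compounds classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)

def balanced_accuracy(tp, tn, fp, fn):
    """BA: mean of sensitivity (inhibitors) and specificity (non-inhibitors)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2

def matthews_cc(tp, tn, fp, fn):
    """MCC in [-1, 1]; returns 0.0 when any marginal count is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def roc_auc(y_true, scores):
    """AUC as P(random inhibitor scores above random non-inhibitor); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical toy data (not from any cited study).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

counts = confusion_counts(y_true, y_pred)
print("GA  =", global_accuracy(*counts))    # 0.75
print("BA  =", balanced_accuracy(*counts))  # 0.75
print("MCC =", matthews_cc(*counts))        # 0.5
print("AUC =", roc_auc(y_true, scores))     # 0.9375
```

On this toy set GA and BA coincide because both classes contain four compounds. On imbalanced inhibitor/non-inhibitor data, BA and MCC penalize a model that simply favors the majority class, which is one reason to report them alongside GA. AUC, unlike the other three, is computed from the continuous scores and does not depend on the chosen threshold.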