
Table 1 Comparison between different current models that predict water solubility

From: Prediction of organic compound aqueous solubility using machine learning: a comparison study of descriptor-based and fingerprints-based models

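| Developer | Data Preparation Method | Total Size | ML Method | R² Test Value⁷ | MAE⁸ | RMSE⁹ | SEP¹⁰ | Refs |
|---|---|---|---|---|---|---|---|---|
| Huuskonen | Descriptor-Based | 1297 | MLR¹ | 0.88 | – | 0.71 | – | [10] |
| Huuskonen | Descriptor-Based | 1297 | ANN³ | 0.92 | – | 0.60 | – | [10] |
| Yan | Descriptor-Based | 1293 | MLR | 0.82 | 0.68 | 0.79 | – | [11] |
| Yan | Descriptor-Based | 1293 | ANN | 0.96 | 0.49 | 0.59 | – | [11] |
| Delaney | Descriptor-Based | 2874 | MLR | 0.71 | 0.68 | 0.87 | – | [12] |
| Hou | Group Contribution | 1294 | MLR | 0.9 | 0.52 | 0.63 | – | [2] |
| Ali | Descriptor-Based | 1290 | MLR | 0.73 | 0.72 | 0.94 | – | [13] |
| Sorkun | Descriptor-Based | 1290 | Ensemble of ANN, RF², and XGB⁴ | 0.93 | 0.397 | 0.53 | – | [14] |
| Le | Descriptor-Based | 4376 | MLR | 0.89 | – | – | 0.75 | [15] |
| Le | Descriptor-Based | 4376 | MLREM⁵ | 0.88 | – | – | 0.76 | [15] |
| Le | Descriptor-Based | 4376 | BRANNLP⁶ | 0.90 | – | – | 0.66 | [15] |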
  1. Total size is the number of compounds (data points) in the dataset used to train each algorithm
  2. ¹MLR: multiple linear regression; ²RF: random forest; ³ANN: artificial neural network; ⁴XGB: gradient boosted trees; ⁵MLREM: multiple linear regression with expectation maximization; ⁶BRANNLP: Bayesian regularized artificial neural network with a Laplacian prior; ⁷R²: coefficient of determination; ⁸MAE: mean absolute error; ⁹RMSE: root-mean-square error; ¹⁰SEP: standard error of prediction
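
The four error metrics named in the footnotes can be computed from paired observed and predicted values. The sketch below is illustrative only and is not taken from any of the cited studies: the function name `regression_metrics` and the sample values are hypothetical, R² is assumed to be the coefficient of determination (1 minus the ratio of residual to total sum of squares), and SEP is assumed to be the bias-corrected standard deviation of the residuals with n − 1 in the denominator. Individual papers may use other conventions, such as the squared Pearson correlation for R² or a different degrees-of-freedom correction for SEP.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return R2, MAE, RMSE and SEP for paired observed/predicted values."""
    n = len(y_true)
    if n != len(y_pred) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("observed values must not all be identical")
    bias = sum(residuals) / n                           # mean residual

    return {
        "r2": 1.0 - ss_res / ss_tot,
        "mae": sum(abs(r) for r in residuals) / n,
        "rmse": math.sqrt(ss_res / n),
        # Assumed SEP convention: bias-corrected, n - 1 in the denominator.
        "sep": math.sqrt(sum((r - bias) ** 2 for r in residuals) / (n - 1)),
    }


# Made-up log-solubility values, for illustration only.
observed = [-1.0, -2.0, -3.0, -4.0]
predicted = [-1.5, -2.0, -2.5, -4.0]
metrics = regression_metrics(observed, predicted)
```

Under these assumptions RMSE and SEP differ whenever the predictions carry a systematic bias or the sample is small, which is one reason values reported under different metrics in the table are not directly interchangeable.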