Table 1 Summary of the distribution changes for the descriptors chosen

From: SIMPD: an algorithm for generating simulated time splits for validating machine learning approaches

| Property | Sign | Frac | Median (frac change) | Median (train) | Number of projects |
|---|---|---|---|---|---|
| SA score | 1 | 0.88 | 0.09 | 2.8 | 109 |
| HeavyAtomCount | 1 | 0.75 | 0.09 | 31.0 | 114 |
| TPSA | 1 | 0.76 | 0.14 | 88.6 | 109 |
| fr_benzene/1000 HeavyAtoms | -1 | 0.81 | 0.19 | 43.5 | 118 |

  1. The meanings of the columns are: Sign = sign of the difference between training and test; Frac = fraction of the data sets showing a change with that sign; Median (frac change) = median fractional change of the value across the data sets; Median (train) = median value of the property in the training set; Number of projects = number of projects where the difference between the training and test distributions was statistically significant (see text).
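
As an illustration of how the Sign, Frac, Median (frac change) and Median (train) columns could be derived for one descriptor, here is a minimal Python sketch. It is not the authors' code. The function name `summarize_shift` and the input numbers are hypothetical, and three conventions are assumptions not stated in the table: the difference is taken as test minus train, the fractional change is the absolute difference divided by the training value, and the median is taken over all data sets rather than only those with the dominant sign. The significance test behind "Number of projects" is not reproduced.

```python
from statistics import median


def summarize_shift(train_medians, test_medians):
    """Summarize train-to-test shifts of one descriptor across data sets.

    Each list holds one value per data set (e.g. the per-project median of
    the descriptor). Training values are assumed to be non-zero.
    """
    # Assumed convention: difference = test - train.
    diffs = [te - tr for tr, te in zip(train_medians, test_medians)]

    n_up = sum(d > 0 for d in diffs)
    n_down = sum(d < 0 for d in diffs)

    # Dominant direction of change and the fraction of data sets showing it.
    sign = 1 if n_up >= n_down else -1
    frac = max(n_up, n_down) / len(diffs)

    # Assumed definition: fractional change = |test - train| / |train|,
    # with the median taken over all data sets.
    frac_changes = [abs(d) / abs(tr) for d, tr in zip(diffs, train_medians)]

    return {
        "sign": sign,
        "frac": frac,
        "median_frac_change": median(frac_changes),
        "median_train": median(train_medians),
    }


# Hypothetical values for four data sets, for illustration only.
train = [2.0, 3.0, 4.0, 5.0]
test = [2.2, 3.3, 3.6, 5.5]
summary = summarize_shift(train, test)
```

With these made-up inputs, three of the four data sets shift upward, so the sketch reports a sign of 1 and a Frac of 0.75.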