Table 2 Label and data imbalance of different folding methods, averaged over all tasks of four partners and the ChEMBL subset. Fraction below 05: fraction of tasks with fewer than five compounds in one or more folds. Fraction label imbalance: fraction of tasks where the standard deviation across folds of the fraction of actives was greater than 0.05.

From: Splitting chemical structure data sets for federated privacy-preserving machine learning

Fold method        Task size bin lower limit   Fraction below 05   Fraction label imbalance
LSH                10                          0.90                0.35
LSH                100                         0.29                0.37
LSH                1000                        0.08                0.11
LSH                10000                       0.03                0.00
LSH                100000                      0.00                0.00
Sphere exclusion   10                          0.95                0.46
Sphere exclusion   100                         0.37                0.49
Sphere exclusion   1000                        0.11                0.24
Sphere exclusion   10000                       0.04                0.00
Sphere exclusion   100000                      0.05                0.00
Scaffold network   10                          0.96                0.58
Scaffold network   100                         0.46                0.64
Scaffold network   1000                        0.10                0.29
Scaffold network   10000                       0.04                0.08
Scaffold network   100000                      0.08                0.12
Random             10                          0.67                0.07
Random             100                         0.18                0.05
Random             1000                        0.05                0.00
Random             10000                       0.00                0.00
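
The two quantities defined in the caption can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `fold_imbalance`, the input layout, and the example data are hypothetical, "below five" is read as strictly fewer than five compounds, the standard deviation is assumed to be the population standard deviation, and empty folds are assumed to be skipped when computing the fraction of actives.

```python
from statistics import pstdev


def fold_imbalance(tasks, n_folds, min_count=5, sd_threshold=0.05):
    """Return (fraction below min_count, fraction label imbalance).

    tasks: dict mapping a task id to a list of (fold_index, is_active) pairs,
    one pair per compound measured in that task (hypothetical layout).
    """
    if not tasks:
        raise ValueError("at least one task is required")

    n_small = 0
    n_label_imbalanced = 0
    for records in tasks.values():
        counts = [0] * n_folds   # compounds per fold
        actives = [0] * n_folds  # active compounds per fold
        for fold, is_active in records:
            counts[fold] += 1
            actives[fold] += int(is_active)

        # Task counts as "below" if any fold holds fewer than min_count compounds.
        if min(counts) < min_count:
            n_small += 1

        # Fraction of actives per fold; empty folds are skipped (assumption).
        fractions = [a / c for a, c in zip(actives, counts) if c > 0]
        # Population standard deviation across folds (assumption).
        if len(fractions) > 1 and pstdev(fractions) > sd_threshold:
            n_label_imbalanced += 1

    n_tasks = len(tasks)
    return n_small / n_tasks, n_label_imbalanced / n_tasks


# Made-up example: one evenly split task and one badly split task.
example_tasks = {
    "balanced": [(0, True)] * 3 + [(0, False)] * 3
    + [(1, True)] * 3 + [(1, False)] * 3,
    "skewed": [(0, True)] * 6 + [(1, False)] * 2,
}
print(fold_imbalance(example_tasks, n_folds=2))  # (0.5, 0.5)
```

In the table, these fractions are reported separately per task size bin, so a sketch like this would be applied to the tasks of each bin in turn.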