
Table 2. Label and data imbalance of different folding methods, averaged over all tasks of four partners and the ChEMBL subset. Fraction below 05: fraction of tasks with fewer than five compounds in one or more folds. Fraction label imbalance: fraction of tasks where the standard deviation across folds of the fraction of actives was greater than 0.05.

From: Splitting chemical structure data sets for federated privacy-preserving machine learning
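
As an illustration of the two metrics defined in the caption, the following minimal sketch computes them for a list of tasks. It is not the authors' code. The function names, the input layout (one list of 0/1 activity labels and one list of fold indices per task), the use of the population standard deviation, and the choice to skip empty folds when computing active fractions are all assumptions made for this example.

```python
from statistics import pstdev

MIN_COMPOUNDS = 5      # threshold behind "Fraction below 05"
IMBALANCE_STD = 0.05   # threshold behind "Fraction label imbalance"


def fold_stats(labels, folds, n_folds):
    """Return per-fold compound counts and per-fold active fractions for one task.

    labels: 0/1 activity label per compound (hypothetical input layout)
    folds:  fold index (0..n_folds-1) per compound
    """
    counts = [0] * n_folds
    actives = [0] * n_folds
    for label, fold in zip(labels, folds):
        counts[fold] += 1
        actives[fold] += label
    # Assumption: empty folds contribute no active fraction.
    fractions = [a / c for a, c in zip(actives, counts) if c > 0]
    return counts, fractions


def summarize(tasks, n_folds):
    """Return (fraction below 05, fraction label imbalance) over all tasks."""
    below = 0
    imbalanced = 0
    for labels, folds in tasks:
        counts, fractions = fold_stats(labels, folds, n_folds)
        # Task counts as "below 05" if any fold holds fewer than five compounds.
        if min(counts) < MIN_COMPOUNDS:
            below += 1
        # Assumption: population standard deviation across folds.
        if len(fractions) > 1 and pstdev(fractions) > IMBALANCE_STD:
            imbalanced += 1
    n_tasks = len(tasks)
    return below / n_tasks, imbalanced / n_tasks
```

In the table, these two fractions are reported separately for each fold method and task size bin, so a function like `summarize` would be called once per group of tasks.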

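| Fold method | Task size bin lower limit | Fraction below 05 | Fraction label imbalance |
|---|---|---|---|
| LSH | 10 | 0.90 | 0.35 |
| LSH | 100 | 0.29 | 0.37 |
| LSH | 1000 | 0.08 | 0.11 |
| LSH | 10000 | 0.03 | 0.00 |
| LSH | 100000 | 0.00 | 0.00 |
| Sphere exclusion | 10 | 0.95 | 0.46 |
| Sphere exclusion | 100 | 0.37 | 0.49 |
| Sphere exclusion | 1000 | 0.11 | 0.24 |
| Sphere exclusion | 10000 | 0.04 | 0.00 |
| Sphere exclusion | 100000 | 0.05 | 0.00 |
| Scaffold network | 10 | 0.96 | 0.58 |
| Scaffold network | 100 | 0.46 | 0.64 |
| Scaffold network | 1000 | 0.10 | 0.29 |
| Scaffold network | 10000 | 0.04 | 0.08 |
| Scaffold network | 100000 | 0.08 | 0.12 |
| Random | 10 | 0.67 | 0.07 |
| Random | 100 | 0.18 | 0.05 |
| Random | 1000 | 0.05 | 0.00 |
| Random | 10000 | 0.00 | 0.00 |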