Table 1 Summary of the datasets employed in this study

From: Tuning gradient boosting for imbalanced bioassay modelling with custom loss functions

| Name | Source | Tasks | Compounds per task | Imbalance ratio |
| --- | --- | --- | --- | --- |
| Tox21 | MoleculeNet | 12 | 6400 | 1:16 |
| HIV | MoleculeNet | 1 | 40748 | 1:27 |
| MUV | MoleculeNet | 17 | 14000 | 1:511 |
| Phosphatase | MolData | 5 | 330000 | 1:325 |
| NTPase | MolData | 6 | 330000 | 1:2963 |
| HTS | Merck KGaA | 1 | > 330000 | 1:140 |

  1. For a given dataset, the number of compounds per task and the imbalance ratio are reported as averages across all tasks.