Table 5 The 76 datasets used for our model building experiments

From: Filtered circular fingerprints improve either prediction or runtime performance while retaining interpretability

Type               Dataset/group  No. of datasets  Compounds  Active  Inactive  Source
Balanced           AMES           1                4337       2401    1936      [47]
Balanced           CPDBAS         5                1102.6     545.8   556.8     [48]
Balanced           NCTRER         1                217        126     91        [33]
Virtual-screening  ChEMBL         50               10,100     100     10,000    [6, 49]
Virtual-screening  DUD            3                1822.3     42      1780.3    [6, 50]
Virtual-screening  MUV            16               15,026.8   30      14,996.8  [6, 51]
  1. Compounds that occur multiple times are included only once; e.g., some of the original 15,000 decoys in each MUV dataset were removed. If multiple occurrences of a compound have differing endpoint values, the compound is omitted. Only 5 of the 7 endpoints of the CPDBAS dataset could be used in this study, because two endpoints (Hamster and Dog/Primates) are too small and yield fewer than 1024 ECFP4 fragments. A more detailed list of datasets is provided in Additional file 2.
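
The duplicate handling described in the footnote can be illustrated with a minimal Python sketch. This is not the authors' code: it assumes each record is a pair of a compound identifier (here a SMILES string used as a hypothetical stand-in for whatever canonical form the study used) and an endpoint value, and the function name `deduplicate` is invented for this illustration.

```python
from collections import defaultdict


def deduplicate(records):
    """Keep each compound once; omit compounds with conflicting endpoint values.

    `records` is an iterable of (compound, endpoint_value) pairs. The compound
    key is assumed to be already canonicalized (an assumption of this sketch).
    """
    values_by_compound = defaultdict(set)
    for compound, value in records:
        values_by_compound[compound].add(value)

    # A compound survives only if all of its occurrences agree on the endpoint.
    return {
        compound: next(iter(values))
        for compound, values in values_by_compound.items()
        if len(values) == 1
    }


# Hypothetical toy data: one consistent duplicate, one conflicting duplicate.
records = [
    ("CCO", "active"),
    ("CCO", "active"),
    ("c1ccccc1", "active"),
    ("c1ccccc1", "inactive"),
    ("CC(=O)O", "inactive"),
]
cleaned = deduplicate(records)
```

In this toy input, `CCO` is kept once, `c1ccccc1` is dropped because its two occurrences disagree, and `CC(=O)O` is kept unchanged.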