From: MolData, a molecular benchmark for disease and target based machine learning
PubChem Source | AID count | Active data points | Total data points | % Active data points | Unique active molecules | Total unique molecules | % Unique active molecules |
---|---|---|---|---|---|---|---|
Broad Institute | 67 | 125,627 | 22.2 m | 0.56% | 85,579 | 472,858 | 18.1% |
Burnham Center for Chemical Genomics | 67 | 139,021 | 21.9 m | 0.63% | 77,159 | 381,794 | 20.21% |
Emory University Molecular Libraries Screening Center | 12 | 24,195 | 2.47 m | 0.98% | 20,964 | 348,231 | 6.02% |
ICCB-Longwood Screening Facility, Harvard Medical School | 11 | 8,358 | 2.1 m | 0.39% | 6,656 | 564,021 | 1.18% |
Johns Hopkins Ion Channel Center | 22 | 48,545 | 6.8 m | 0.71% | 35,487 | 344,497 | 10.30% |
NMMLSC | 42 | 48,186 | 11.5 m | 0.42% | 37,949 | 369,431 | 10.27% |
National Center for Advancing Translational Sciences (NCATS) | 174 | 720,319 | 53.4 m | 1.35% | 240,096 | 592,616 | 40.51% |
The Scripps Research Institute Molecular Screening Center | 148 | 275,224 | 47.6 m | 0.58% | 142,055 | 920,418 | 15.43% |
Tox21 | 57 | 21,475 | 0.47 m | 5.67% | 4,183 | 8,743 | 47.84% |
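
The "% Unique active molecules" column appears to be the ratio of unique active molecules to total unique molecules. The following minimal Python sketch recomputes it for three rows under that assumption. The names `unique_counts`, `percent_active`, and `unique_active_pct` are illustrative and do not come from the source. The "% Active data points" column is not recomputed here because the totals in the table are rounded to millions.

```python
# Counts copied from the table above:
# (unique active molecules, total unique molecules) per PubChem source.
unique_counts = {
    "Broad Institute": (85_579, 472_858),
    "National Center for Advancing Translational Sciences (NCATS)": (240_096, 592_616),
    "Tox21": (4_183, 8_743),
}


def percent_active(active: int, total: int) -> float:
    """Return active / total as a percentage, rounded to two decimals."""
    return round(100.0 * active / total, 2)


# Assumed definition: share of a source's unique molecules that are active
# in at least one of its assays.
unique_active_pct = {
    source: percent_active(active, total)
    for source, (active, total) in unique_counts.items()
}

for source, pct in unique_active_pct.items():
    print(f"{source}: {pct}%")
```

Under this assumed definition, the recomputed values agree with the tabulated percentages for these three rows.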