
Table 1 Data

From: MolData, a molecular benchmark for disease and target based machine learning

| PubChem source | AID count | Active data points | Total data points (millions) | % Active data points | Unique active molecules | Total unique molecules | % Unique active molecules |
|---|---:|---:|---:|---:|---:|---:|---:|
| Broad Institute | 67 | 125,627 | 22.2 | 0.56% | 85,579 | 472,858 | 18.1% |
| Burnham Center for Chemical Genomics | 67 | 139,021 | 21.9 | 0.63% | 77,159 | 381,794 | 20.21% |
| Emory University Molecular Libraries Screening Center | 12 | 24,195 | 2.47 | 0.98% | 20,964 | 348,231 | 6.02% |
| ICCB-Longwood Screening Facility, Harvard Medical School | 11 | 8358 | 2.1 | 0.39% | 6656 | 564,021 | 1.18% |
| Johns Hopkins Ion Channel Center | 22 | 48,545 | 6.8 | 0.71% | 35,487 | 344,497 | 10.30% |
| NMMLSC | 42 | 48,186 | 11.5 | 0.42% | 37,949 | 369,431 | 10.27% |
| National Center for Advancing Translational Sciences (NCATS) | 174 | 720,319 | 53.4 | 1.35% | 240,096 | 592,616 | 40.51% |
| The Scripps Research Institute Molecular Screening Center | 148 | 275,224 | 47.6 | 0.58% | 142,055 | 920,418 | 15.43% |
| Tox21 | 57 | 21,475 | 0.47 | 5.67% | 4183 | 8743 | 47.84% |
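The "% Unique active molecules" column is a ratio of the two count columns beside it: unique active molecules divided by total unique molecules, times 100. The minimal Python sketch below recomputes that column from the counts copied out of the table. The names `ROWS` and `percent` are hypothetical and are introduced here only for illustration.

```python
# Hypothetical helper: recompute "% Unique active molecules" from Table 1's
# count columns as (unique active molecules / total unique molecules) * 100.
# Counts are copied from the table; ROWS and percent are illustrative names.
ROWS: dict[str, tuple[int, int]] = {
    "Broad Institute": (85_579, 472_858),
    "Burnham Center for Chemical Genomics": (77_159, 381_794),
    "Emory University Molecular Libraries Screening Center": (20_964, 348_231),
    "ICCB-Longwood Screening Facility, Harvard Medical School": (6_656, 564_021),
    "Johns Hopkins Ion Channel Center": (35_487, 344_497),
    "NMMLSC": (37_949, 369_431),
    "National Center for Advancing Translational Sciences (NCATS)": (240_096, 592_616),
    "The Scripps Research Institute Molecular Screening Center": (142_055, 920_418),
    "Tox21": (4_183, 8_743),
}


def percent(part: int, whole: int) -> float:
    """Return part / whole as a percentage, rounded to two decimal places."""
    return round(100 * part / whole, 2)


if __name__ == "__main__":
    # Print one recomputed percentage per PubChem source.
    for source, (actives, total) in ROWS.items():
        print(f"{source}: {percent(actives, total):.2f}%")
```

Each printed value should match the last column of the table to two decimal places. The Broad Institute row, for example, gives 18.10%, which the table reports as 18.1%.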