Table 1 The information of the datasets utilized in this study

From: The impact of cross-docked poses on performance of machine learning classifier for protein–ligand binding pose prediction

| Dataset | Re-docked: Complexes | Re-docked: Poses | Re-docked: Positives | Re-docked: Negatives | Cross-docked: Complexes | Cross-docked: Poses | Cross-docked: Positives | Cross-docked: Negatives |
|---|---|---|---|---|---|---|---|---|
| PDBbind-ReDocked | 4057 | 83,876 | 39,978 | 43,898 | | | | |
| PDBbind-ReDocked-Refined | 3767 | 77,922 | 37,114 | 40,808 | | | | |
| PDBbind-ReDocked-Core | 290 | 5954 (5664) [a] | 2864 (2574) | 3090 | | | | |
| CASF-Docking | 285 [b] | 22,777 (22,492) | 5494 (5209) | 17,283 | | | | |
| PDBbind-CrossDocked-Core-s | 285 | 5551 | 2565 | 2986 | 1058 | 20,859 | 5872 | 14,987 |
| PDBbind-CrossDocked-Core-g | 282 | 4795 | 1596 | 3199 | 1030 | 17,814 | 3768 | 14,046 |
| PDBbind-CrossDocked-Core-v | 285 | 5693 | 301 | 5392 | 1058 | 21,145 | 740 | 20,405 |
| PDBbind-CrossDocked-Refined | 3767 | 77,839 (74,072) | 37,028 (33,261) | 40,811 | 90,002 | 1,874,433 | 1,499,702 | 374,731 |
| PDBbind-CrossDocked-Refined* [c] | 3767 | 77,839 (74,072) | 37,028 (33,261) | 40,811 | 90,002 | 1,731,351 | 1,428,161 | 303,190 |

  1. [a] The number in parentheses is the count after removing the crystal poses.
  2. [b] The core set of the original PDBbind 2016 has 290 complexes belonging to 58 clusters, but only 285 remained when CASF was constructed because of a duplicated cluster.
  3. [c] This set excludes the cross-native poses.
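
The counts in the re-docked columns can be checked for internal consistency. The following is a minimal sketch in Python, not part of the original study: the names `REDOCKED` and `check_row` are hypothetical, and the rule that each complex contributes exactly one crystal pose is an assumption inferred from the numbers (for example, 5954 - 5664 = 290), not something the table states.

```python
# Counts transcribed from the re-docked columns of Table 1.
# Each entry: (complexes, poses, positives, negatives, poses_without_crystal or None)
REDOCKED = {
    "PDBbind-ReDocked": (4057, 83876, 39978, 43898, None),
    "PDBbind-ReDocked-Refined": (3767, 77922, 37114, 40808, None),
    "PDBbind-ReDocked-Core": (290, 5954, 2864, 3090, 5664),
    "CASF-Docking": (285, 22777, 5494, 17283, 22492),
    "PDBbind-CrossDocked-Refined": (3767, 77839, 37028, 40811, 74072),
}


def check_row(complexes, poses, positives, negatives, poses_no_crystal):
    """Return (labels_sum_ok, crystal_ok).

    crystal_ok is None when the table gives no parenthesized value for the row.
    """
    labels_sum_ok = positives + negatives == poses
    crystal_ok = None
    if poses_no_crystal is not None:
        # Assumption (inferred, not stated in the table):
        # exactly one crystal pose per complex was removed.
        crystal_ok = poses - poses_no_crystal == complexes
    return labels_sum_ok, crystal_ok


results = {name: check_row(*row) for name, row in REDOCKED.items()}
for name, (sum_ok, crystal_ok) in results.items():
    print(f"{name}: positives + negatives == poses: {sum_ok}; crystal check: {crystal_ok}")
```

Under that assumption, every row listed passes both checks, which suggests the parenthesized pose counts differ from the totals by one crystal pose per complex.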