
Table 2 Average (\(\bar{x}\)) and standard deviation (\(s\)) of the similarity scores between 10,000 randomly selected, biologically tested compounds (from Refs. [22, 23]), and the dissimilarity threshold (\(d_{\text{thresh}}\)) used in the present study to generate the structure–activity relationship (SAR) clusters

From: PubChem structure–activity relationship (SAR) clusters

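| Similarity measure | \(\bar{x}\) | \(s\) | \(\bar{x} + 2s\) | \(d_{\text{thresh}}\) |
|---|---|---|---|---|
| 2-D | 0.4229 | 0.1326 | 0.6881 | 0.3119 |
| 3-D (\(N_{\text{max}} = 1\)) | | | | |
| \(\mathrm{ST}^{\text{ST-opt}}\) | 0.5438 | 0.0986 | 0.7410 | |
| \(\mathrm{ComboT}^{\text{ST-opt}}\) | 0.6161 | 0.1276 | 0.8713 | |
| \(\mathrm{CT}^{\text{CT-opt}}\) | 0.1807 | 0.0609 | 0.3024 | |
| \(\mathrm{ComboT}^{\text{CT-opt}}\) | 0.5859 | 0.1440 | 0.8738 | |
| 3-D (\(N_{\text{max}} = 10\)) | | | | |
| \(\mathrm{ST}^{\text{ST-opt}}\) | 0.6464 | 0.1017 | 0.8498 | 0.1502 |
| \(\mathrm{ComboT}^{\text{ST-opt}}\) | 0.7682 | 0.1337 | 1.0356 | 0.4822 |
| \(\mathrm{CT}^{\text{CT-opt}}\) | 0.2485 | 0.0706 | 0.3898 | 0.6102 |
| \(\mathrm{ComboT}^{\text{CT-opt}}\) | 0.7733 | 0.1386 | 1.0505 | 0.4748 |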

  1. \(N_{\text{max}}\) is the maximum number of diverse conformers considered per compound for the 3-D similarity computation. The \(d_{\text{thresh}}\) value for each of the five similarity measures was determined by subtracting its (\(\bar{x} + 2s\)) value from unity (after normalization to one for \(\mathrm{ComboT}^{\text{ST-opt}}\) and \(\mathrm{ComboT}^{\text{CT-opt}}\)). The statistical parameters for \(N_{\text{max}} = 10\) were used to determine the \(d_{\text{thresh}}\) values for the 3-D similarity measures.
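
The threshold arithmetic described in the footnote can be checked with a short sketch. It is not the authors' code: the function name `d_thresh` and the `max_score` parameter are hypothetical, and it assumes that "normalization to one" means dividing the ComboT values by their maximum possible score of 2 (a shape score plus a feature score, each at most 1). The means and standard deviations are the 2-D row and the \(N_{\text{max}} = 10\) rows of the table.

```python
# Minimal sketch of the footnote's rule: d_thresh = 1 - (mean + 2*sd),
# with the sum first rescaled to a 0..1 range where needed.
# `d_thresh` and `max_score` are illustrative names, not from the source.

def d_thresh(mean, sd, max_score=1.0):
    """Return the dissimilarity threshold 1 - (mean + 2*sd) / max_score."""
    return 1.0 - (mean + 2.0 * sd) / max_score

# (mean, standard deviation, assumed maximum possible score)
rows = {
    "2-D":           (0.4229, 0.1326, 1.0),
    "ST^ST-opt":     (0.6464, 0.1017, 1.0),
    "ComboT^ST-opt": (0.7682, 0.1337, 2.0),  # assumed 0..2 scale
    "CT^CT-opt":     (0.2485, 0.0706, 1.0),
    "ComboT^CT-opt": (0.7733, 0.1386, 2.0),  # assumed 0..2 scale
}

thresholds = {name: d_thresh(*values) for name, values in rows.items()}

for name, value in thresholds.items():
    print(f"{name}: {value:.4f}")
```

Under these assumptions the computed values agree with the tabulated \(d_{\text{thresh}}\) column to within one unit in the last decimal place; the small differences presumably arise because the published means and standard deviations are themselves rounded.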