Table 2 Data sets used for the bioactivity benchmarks

From: Benchmarking of protein descriptor sets in proteochemometric modeling (part 2): modeling performance of 13 amino acid descriptor sets

 

| | ACE inhibitors | GPCRs | NNRTIs | PIs |
|---|---|---|---|---|
| Total size (data points) | 58 | 6,046 | 4,024 | 6,995 |
| Total compounds | n/a | 3,230 | 451 | 9 |
| Average compound Tanimoto distance (ECFP_6) | n/a | 0.92 | 0.54 | 0.73 |
| Average Euclidean distance, compounds (physicochemical) | n/a | 1.28 | n/a | 0.90 |
| Total targets (peptides / proteins) | 58 | 32 | 14 | 1,060 |
| Average target Tanimoto distance (ProtFP (Feature)) | 0.83 | 0.22 | 0.14 | 0.03 |
| Average Euclidean distance, targets (ProtFP (PCA3)) | 1.35 | 0.93 | 0.44 | 0.26 |
| Completeness (fraction of total compound-target pairs) | n/a | 0.06 | 0.64 | 0.73 |

  1. The datasets were selected to obtain a diverse collection of sets amenable to PCM modeling. To this end, both GPCRs and enzymes were included. Moreover, sets representing initial hit discovery, lead optimization, and well-established structure-activity space modeling were included.
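
The completeness values in Table 2 appear to be the number of data points divided by the number of possible compound-target pairs, expressed as a fraction. The sketch below recomputes them from the table's own counts as a consistency check. The function name `completeness` and the `DATASETS` dictionary are illustrative assumptions, not names from the paper, and the formula is inferred from the reported numbers rather than stated by the authors. ACE inhibitors are omitted because no compound count is reported for that set.

```python
# Counts copied from Table 2: (data points, compounds, targets).
# The dictionary name and layout are illustrative, not from the paper.
DATASETS = {
    "GPCRs": (6046, 3230, 32),
    "NNRTIs": (4024, 451, 14),
    "PIs": (6995, 9, 1060),
}


def completeness(data_points: int, compounds: int, targets: int) -> float:
    """Fraction of the compound-target matrix that has a measured value.

    Assumed definition: data points / (compounds * targets).
    """
    return data_points / (compounds * targets)


for name, (n_points, n_compounds, n_targets) in DATASETS.items():
    value = completeness(n_points, n_compounds, n_targets)
    print(f"{name}: {value:.2f}")
```

Under this assumed definition the results round to 0.06, 0.64, and 0.73, matching the table. This suggests the GPCR set is a sparse matrix, while the NNRTI and PI sets are comparatively dense.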