Table 2 Data sets used for the bioactivity benchmarks

From: Benchmarking of protein descriptor sets in proteochemometric modeling (part 2): modeling performance of 13 amino acid descriptor sets

| | ACE inhibitors | GPCRs | NNRTIs | PIs |
|---|---|---|---|---|
| Total size (data points) | 58 | 6,046 | 4,024 | 6,995 |
| Total compounds | n/a | 3,230 | 451 | 9 |
| Average compound Tanimoto distance (ECFP_6) | n/a | 0.92 | 0.54 | 0.73 |
| Average Euclidean distance, compounds (physicochemical) | n/a | 1.28 | n/a | 0.90 |
| Total targets (peptides / proteins) | 58 | 32 | 14 | 1,060 |
| Average target Tanimoto distance (ProtFP (Feature)) | 0.83 | 0.22 | 0.14 | 0.03 |
| Average Euclidean distance, targets (ProtFP (PCA3)) | 1.35 | 0.93 | 0.44 | 0.26 |
| Completeness (fraction of total compound-target pairs) | n/a | 0.06 | 0.64 | 0.73 |
  1. The data sets were selected to obtain a diverse collection of sets amenable to PCM modeling. To this end, both GPCRs and enzymes were included. Moreover, sets representing initial hit discovery, lead optimization, and well-established structure-activity space modeling were included.
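
The completeness row can be checked against the other rows of the table. The sketch below assumes completeness is the number of data points divided by the number of possible compound-target pairs (compounds × targets), and that each data point is a distinct pair. The source does not state this definition, but it reproduces the tabulated values when they are read as fractions. The function name `completeness` and the `datasets` dictionary are illustrative only; ACE inhibitors are omitted because no compound count is given.

```python
# Counts copied from Table 2: (data points, compounds, targets).
datasets = {
    "GPCRs": (6046, 3230, 32),
    "NNRTIs": (4024, 451, 14),
    "PIs": (6995, 9, 1060),
}


def completeness(data_points, compounds, targets):
    """Fraction of the compound x target matrix with a measured value.

    Assumed definition: data points / (compounds * targets).
    """
    return data_points / (compounds * targets)


for name, (n_points, n_compounds, n_targets) in datasets.items():
    value = completeness(n_points, n_compounds, n_targets)
    print(f"{name}: {value:.2f}")
```

Under this assumption the GPCR set is very sparse (about 6% of the matrix is filled), whereas the NNRTI and PI sets are comparatively dense.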