
Table 2 Synthetic data sets to benchmark interpretation of QSAR models

From: Benchmarks for interpretation of QSAR models

| Dataset | Property type | End-point | Train/test set size | Expected atom contribution |
|---|---|---|---|---|
| N | Regression | sum(N) | 6995/2999 | Nitrogen atoms: 1; others: 0 |
| N − O | Regression | sum(N) − sum(O) | 6893/2969 | Nitrogen atoms: 1; oxygen atoms: −1; others: 0 |
| N + O | Regression | (sum(N) + sum(O))/2, where sum(N) = sum(O) | 7000/3000 | Nitrogen and oxygen atoms: 0.5; others: 0 |
| Amide_reg | Regression | sum(NC=O) | 7000/3001 | Any atom of an amide group: 1; others: 0 |
| Amide_class | Classification | Active if sum(NC=O) > 0; inactive if sum(NC=O) = 0 | 6998/3000 | Any atom of an amide group: 1; others: 0 |
| Pharmacophore | Classification | Active: at least one conformer with exactly one pharmacophore match (the same two atoms in all conformers); inactive: no pharmacophore match in any conformer. Pharmacophore match: a hydrogen bond donor (HBD) and a hydrogen bond acceptor (HBA) 9–10 Å apart | 7000/3000 | Atoms that are the HBA or HBD of the pharmacophore: 1; others: 0 |
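Every end-point and expected contribution in the table is a simple function of the molecular graph (plus 3D coordinates for the pharmacophore set). The following is a minimal, dependency-free Python sketch of those definitions, not the authors' code. The `Mol` class, all function names, the implicit-hydrogen graph encoding, the way `NC=O` is matched (a single N–C bond plus a double C=O bond, with aromaticity ignored), the inclusive 9–10 Å bounds, and the reading of the "same two atoms" rule are assumptions made for illustration. A real pipeline would more likely use a cheminformatics toolkit for substructure matching and conformer generation.

```python
import math
from dataclasses import dataclass


@dataclass
class Mol:
    """Hypothetical minimal molecule: element symbols plus (i, j, bond_order) bonds.

    Hydrogens are implicit and aromaticity is not modelled.
    """
    atoms: list[str]
    bonds: list[tuple[int, int, int]]

    def neighbors(self, i: int) -> list[tuple[int, int]]:
        """Return (neighbor index, bond order) pairs for atom i."""
        return [(b if a == i else a, order)
                for a, b, order in self.bonds if i in (a, b)]


def count(mol: Mol, element: str) -> int:
    """Number of atoms of one element, i.e. sum(element) in the table."""
    return mol.atoms.count(element)


def endpoint_n(mol: Mol) -> int:
    return count(mol, "N")


def endpoint_n_minus_o(mol: Mol) -> int:
    return count(mol, "N") - count(mol, "O")


def endpoint_n_plus_o(mol: Mol) -> float:
    n, o = count(mol, "N"), count(mol, "O")
    if n != o:
        raise ValueError("the N + O set only contains molecules with sum(N) == sum(O)")
    return (n + o) / 2


def amide_matches(mol: Mol) -> list[tuple[int, int, int]]:
    """(N, C, O) index triples where N-C is a single bond and C=O a double bond."""
    matches = []
    for c, symbol in enumerate(mol.atoms):
        if symbol != "C":
            continue
        nbrs = mol.neighbors(c)
        oxygens = [j for j, order in nbrs if mol.atoms[j] == "O" and order == 2]
        nitrogens = [j for j, order in nbrs if mol.atoms[j] == "N" and order == 1]
        matches += [(n, c, o) for n in nitrogens for o in oxygens]
    return matches


def endpoint_amide_reg(mol: Mol) -> int:
    return len(amide_matches(mol))


def endpoint_amide_class(mol: Mol) -> int:
    """1 = active (at least one amide group), 0 = inactive."""
    return int(endpoint_amide_reg(mol) > 0)


def expected_contributions(mol: Mol, dataset: str) -> list[float]:
    """Per-atom ground-truth contributions from the last column of the table."""
    if dataset == "N":
        return [1.0 if s == "N" else 0.0 for s in mol.atoms]
    if dataset == "N-O":
        return [1.0 if s == "N" else -1.0 if s == "O" else 0.0 for s in mol.atoms]
    if dataset == "N+O":
        return [0.5 if s in ("N", "O") else 0.0 for s in mol.atoms]
    if dataset in ("Amide_reg", "Amide_class"):
        amide_atoms = {i for match in amide_matches(mol) for i in match}
        return [1.0 if i in amide_atoms else 0.0 for i in range(len(mol.atoms))]
    raise ValueError(f"unknown dataset: {dataset}")


def pharmacophore_matches(coords, hbd, hba, lo=9.0, hi=10.0):
    """(donor, acceptor) index pairs between lo and hi angstrom apart (inclusive, assumed)."""
    return {(d, a) for d in hbd for a in hba
            if lo <= math.dist(coords[d], coords[a]) <= hi}


def pharmacophore_label(per_conformer_matches):
    """1 = active, 0 = inactive, None = molecule not used (assumed handling)."""
    if all(not m for m in per_conformer_matches):
        return 0
    distinct_pairs = set().union(*per_conformer_matches)
    # Assumed reading: one and the same HBD/HBA pair in every matching conformer.
    return 1 if len(distinct_pairs) == 1 else None


# Usage: acetamide, CH3-C(=O)-NH2, with implicit hydrogens.
acetamide = Mol(["C", "C", "O", "N"], [(0, 1, 1), (1, 2, 2), (1, 3, 1)])
print(endpoint_amide_reg(acetamide))                   # 1
print(expected_contributions(acetamide, "Amide_reg"))  # [0.0, 1.0, 1.0, 1.0]
```

For the N, N − O and N + O sets the expected contributions of a molecule add up exactly to its end-point (for acetamide the N − O vector is [0, 0, −1, 1], which sums to sum(N) − sum(O) = 0). That is what makes these sets convenient for checking atom-level attributions against a known answer. The amide sets do not have this property, since each amide group contributes three atoms with value 1.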

1. An alternative assignment of atom contributions is also reasonable for interpretation methods that satisfy the “summation to delta” property (see “Design of synthetic datasets”).
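The "summation to delta" property requires per-atom contributions to add up to the difference between the model's prediction for a molecule and its prediction for some reference input. In the N + O set, sum(N) = sum(O), so the end-point (sum(N) + sum(O))/2 equals sum(N) and also sum(O). As a result, more than one contribution pattern sums to the same value. The sketch below checks this under stated assumptions. This excerpt does not say which alternative the footnote means, so the "nitrogen only" and "oxygen only" assignments are assumed examples. The reference prediction of 0 (an ideal model applied to a molecule without N or O atoms) and all function and variable names are also assumptions.

```python
def endpoint_n_plus_o(atoms: list[str]) -> float:
    """N + O end-point; the data set only contains molecules with sum(N) == sum(O)."""
    n, o = atoms.count("N"), atoms.count("O")
    if n != o:
        raise ValueError("expected sum(N) == sum(O)")
    return (n + o) / 2


def satisfies_summation_to_delta(contributions, prediction,
                                 reference_prediction=0.0, tol=1e-9):
    """True if per-atom contributions add up to prediction - reference_prediction."""
    return abs(sum(contributions) - (prediction - reference_prediction)) <= tol


# Hypothetical molecule (implicit hydrogens): one N, two C, one O.
atoms = ["N", "C", "C", "O"]
y = endpoint_n_plus_o(atoms)  # (1 + 1) / 2 = 1.0

table_assignment = [0.5 if a in ("N", "O") else 0.0 for a in atoms]
nitrogen_only = [1.0 if a == "N" else 0.0 for a in atoms]  # assumed alternative
oxygen_only = [1.0 if a == "O" else 0.0 for a in atoms]    # assumed alternative
nitrogen_half_only = [0.5 if a == "N" else 0.0 for a in atoms]

for name, contrib in [("table", table_assignment), ("N only", nitrogen_only),
                      ("O only", oxygen_only), ("N half only", nitrogen_half_only)]:
    print(name, satisfies_summation_to_delta(contrib, y))
# table True, N only True, O only True, N half only False
```

Because all three complete assignments satisfy the property equally, an attribution method that enforces summation to delta has no reason, from the end-point alone, to prefer the 0.5/0.5 split over the alternatives.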