
Table 4 AUC calculated for classification sets (higher values are better)

From: Transformer-CNN: Swiss knife for QSAR modeling and interpretation

| Dataset | Descriptor-based methods^a | SMILES-based (augm = 10)^2 | Transformer-CNN, no augm | Transformer-CNN, augm = 10 | CDDD descriptors^b |
|---|---|---|---|---|---|
| HIV | 0.82 | 0.78 | 0.81 | 0.83 | 0.74 |
| AMES | 0.86 | 0.88 | 0.86 | 0.89 | 0.86 |
| BACE | 0.88 | 0.89 | 0.89 | 0.91 | 0.90 |
| Clintox | 0.77 ± 0.03 | 0.76 ± 0.03 | 0.71 ± 0.02 | 0.77 ± 0.02 | 0.73 ± 0.02 |
| Tox21 | 0.79 | 0.83 | 0.81 | 0.82 | 0.82 |
| BBBP | 0.90 | 0.91 | 0.90 | 0.92 | 0.89 |
| JAK3 | 0.79 ± 0.02 | 0.80 ± 0.02 | 0.70 ± 0.02 | 0.78 ± 0.02 | 0.76 ± 0.02 |
| BioDeg | 0.92 | 0.93 | 0.91 | 0.93 | 0.92 |
| RP AR | 0.85 | 0.87 | 0.83 | 0.87 | 0.86 |

  1. We omitted the standard errors of the mean, which are 0.01 or less, for the values reported without ±.
  2. ^a Results from our previous study [22]. ^b Best performance obtained with CDDD descriptors computed using the Sml2canSml autoencoder from [27].
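The table reports ROC AUC, where higher is better. For readers who want to compute comparable numbers on their own predictions, the sketch below uses the standard rank (Mann-Whitney) formulation: AUC is the probability that a randomly chosen positive compound scores higher than a randomly chosen negative one, with ties counted as one half. This is not necessarily the code used in the study. The helper names `roc_auc` and `mean_and_sem` are hypothetical, and the standard-error helper assumes the error is the sample standard deviation over repeated runs divided by the square root of the number of runs, which the table itself does not specify.

```python
from math import sqrt
from statistics import mean, stdev


def roc_auc(labels, scores):
    """ROC AUC via pairwise ranking (Mann-Whitney U / (n_pos * n_neg)).

    labels: iterable of 0/1 class labels; scores: predicted scores, higher = more likely positive.
    Ties between a positive and a negative score contribute 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def mean_and_sem(values):
    """Mean and standard error of the mean (assumption: sample std / sqrt(n) over repeated runs)."""
    if len(values) < 2:
        raise ValueError("need at least two values to estimate a standard error")
    return mean(values), stdev(values) / sqrt(len(values))


if __name__ == "__main__":
    # Toy labels and scores for illustration only; these are not data from the study.
    print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # prints 0.75
```

The quadratic pairwise loop keeps the definition explicit; for large test sets a sort-based rank computation (or scikit-learn's `roc_auc_score`) gives the same value more efficiently.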