
Table 8 Performance of models on subsets of the compiled all-substructure set.

From: An investigation into pharmaceutically relevant mutagenicity data and the influence on Ames predictive potential

| Molecule set | Random Forest^a | TopKat | Local^b,c | Local (remaining data) |
|---|---|---|---|---|
| Set C | 0.79 | 0.65 | 0.80 | 0.65 |
| Not polyaromatic, ArNH2, ArNO2 | 0.78 | 0.62 | - | |
| Set D | 0.88 | 0.90 | 0.89 | 0.69 |
| Not polyaromatic, ArNH2, ArNO2 | 0.86 | 0.86 | - | |
| Set E | 0.78 | 0.77 | 0.66 | 0.67 |
| Kazius et al. | 0.91 | 0.95 | - | |
| All | 0.90^b | 0.89 | - | |
| Nitroaromatics | 0.85 | 0.90 | - | |
| Aryl-amines (not nitroaromatic or polyaromatic) | 0.87 | 0.87 | - | |
| Polyaromatic (not nitroaromatic) | 0.75 | 0.86 | - | |

  1. The four columns denote a random forest model built on the full set of data, the default TopKat [76, 77] model, a local random forest model built only on the indicated set, and the performance of this local model on the rest of the compiled all-substructure set.
  2. ^a Global model, trained on all data. ^b OOB (out-of-bag) performance. ^c Trained only on the particular set.
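
The last two columns can be read together: one gives the local model's performance on its own set, the other its performance on the rest of the compiled all-substructure set. As a minimal sketch, the difference between them can be computed from the three rows that report both values. The numbers are copied from the table; the names `local_scores` and `transfer_gap` are illustrative and do not come from the paper.

```python
# Values copied from the "Local" and "Local (remaining data)" columns of Table 8.
# Only Sets C, D and E report both numbers.
# The names below are illustrative assumptions, not taken from the paper.
local_scores = {
    "Set C": (0.80, 0.65),
    "Set D": (0.89, 0.69),
    "Set E": (0.66, 0.67),
}


def transfer_gap(on_own_set: float, on_remaining: float) -> float:
    """Performance on the model's own set minus performance on the remaining data."""
    return round(on_own_set - on_remaining, 2)


gaps = {name: transfer_gap(*scores) for name, scores in local_scores.items()}

for name, gap in gaps.items():
    # A positive gap means the local model scores lower on the remaining data.
    print(f"{name}: {gap:+.2f}")
```

On these values the local models for Set C and Set D score lower on the remaining data than on their own sets, while the Set E model is essentially unchanged.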