An investigation into pharmaceutically relevant mutagenicity data and the influence on Ames predictive potential

Table 8 Performance of models on subsets of the compiled all-substructure set.

Molecule set	Random Forest^a	TopKat	Local^{b, c}	Local (remaining data)
Set C	0.79	0.65	0.80	0.65
Not polyaromatic, ArNH₂, ArNO₂	0.78	0.62	-	-
Set D	0.88	0.90	0.89	0.69
Not polyaromatic, ArNH₂, ArNO₂	0.86	0.86	-	-
Set E	0.78	0.77	0.66	0.67
Kazius et al.	0.91	0.95	-	-
All	0.90^b	0.89	-	-
Nitroaromatics	0.85	0.90	-	-
Aryl-amines (not nitroaromatic or polyaromatic)	0.87	0.87	-	-
Polyaromatic (not nitroaromatic)	0.75	0.86	-	-

The four columns denote a random forest model built on the full set of data, the default TopKat [76, 77] model, a local random forest model built only on the indicated set, and the performance of this local model on the rest of the compiled all-substructure set.
^a Global model, trained on all data. ^b OOB (out-of-bag) performance. ^c Trained only on the particular set.
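
The difference between the "Local" and "Local (remaining data)" columns can be illustrated with a minimal sketch. It is not the authors' pipeline: it swaps in a bagged ensemble of decision stumps for the random forest, uses a synthetic two-descriptor grid instead of the mutagenicity data, and reports accuracy as a stand-in because the metric behind the table values is not stated in this section. All function and variable names are hypothetical.

```python
import random
from collections import Counter


def fit_stump(X, y):
    """Fit a one-feature threshold rule by exhaustive search over the data."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            for sign in (1, -1):
                hits = sum((1 if sign * (row[j] - t) > 0 else 0) == label
                           for row, label in zip(X, y))
                if best is None or hits > best[0]:
                    best = (hits, j, t, sign)
    _, j, t, sign = best
    return lambda row: 1 if sign * (row[j] - t) > 0 else 0


def fit_bagged(X, y, n_models=25, seed=0):
    """Train each stump on a bootstrap resample; remember which rows it saw."""
    rng = random.Random(seed)
    n = len(X)
    ensemble = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        stump = fit_stump([X[i] for i in idx], [y[i] for i in idx])
        ensemble.append((stump, set(idx)))
    return ensemble


def majority(votes):
    """Return the most common vote."""
    return Counter(votes).most_common(1)[0][0]


def oob_accuracy(ensemble, X, y):
    """Score each training row using only the stumps that never saw it."""
    correct = scored = 0
    for i, (row, label) in enumerate(zip(X, y)):
        votes = [stump(row) for stump, seen in ensemble if i not in seen]
        if votes:
            scored += 1
            correct += majority(votes) == label
    return correct / scored if scored else float("nan")


def accuracy(ensemble, X, y):
    """Score the full ensemble on a separate labelled set."""
    preds = [majority([stump(row) for stump, _ in ensemble]) for row in X]
    return sum(p == label for p, label in zip(preds, y)) / len(y)


# Synthetic 10 x 10 grid of two descriptors (illustrative only).
grid = [((i % 10) / 10 + 0.05, (i // 10) / 10 + 0.05) for i in range(100)]
local_X, local_y = grid, [int(x0 > 0.5) for x0, _ in grid]  # "indicated set"
rest_X, rest_y = grid, [int(x1 > 0.5) for _, x1 in grid]    # "remaining data"

local_model = fit_bagged(local_X, local_y)
oob_local = oob_accuracy(local_model, local_X, local_y)  # "Local" column
transfer = accuracy(local_model, rest_X, rest_y)         # "Local (remaining data)"
print(f"local OOB: {oob_local:.2f}, remaining data: {transfer:.2f}")
```

In this toy setup the activity in the remaining data depends on a different descriptor than in the local set, so the local model scores well out-of-bag on its own set but transfers poorly, which is the kind of gap the last two columns are meant to expose.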

ISSN: 1758-2946