
Table 2 Overview of results

From: Filtered circular fingerprints improve either prediction or runtime performance while retaining interpretability

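| Selection of fragments | Interpretable fragments | Fast processing (low number of features) | Best performance: RF | Best performance: SVM | Best performance: NB |
| --- | --- | --- | --- | --- | --- |
| Unprocessed | Yes | | Yes | Yes | |
| Folded | | Yes | | | |
| Filtered | Yes | Yes | Yes | | Yes |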

  1. Unprocessed fragments yield random forest (RF) and support vector machine (SVM) models with good performance and retain interpretability, but incur a high computational cost. Folded fragments allow fast processing, but yield inferior models and are not interpretable because of bit collisions. Filtered fragments yield the best naive Bayes (NB) models and can be used to build RF models that are as good as those built with unprocessed fragments. Filtered fragments also retain interpretability and allow fast processing.
  2. In summary, unprocessed (all) fragments are a good option if enough computational resources are available to optimize SVMs and the large number of (often redundant) features does not hinder the interpretation of predictions. Otherwise, filtered fragments should be preferred.
  3. In general, RF models yield good results without parameter tuning; however, SVM models are usually better once their parameters have been optimized (see section on parameter optimization).
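
The bit collisions mentioned in the first footnote can be illustrated with a minimal sketch. It assumes fragments are represented by integer identifiers and folded into a fixed-length bit vector by a simple modulo operation. This is one common folding scheme and not necessarily the one used in the article. The function name `fold_fragments`, the identifiers, and the vector length are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def fold_fragments(fragment_ids, n_bits):
    """Assign each fragment identifier to a bit position via modulo folding.

    Returns a dict mapping bit position -> list of fragment identifiers
    that share this position (assumed scheme, for illustration only).
    """
    bit_to_fragments = defaultdict(list)
    for frag_id in fragment_ids:
        bit_to_fragments[frag_id % n_bits].append(frag_id)
    return dict(bit_to_fragments)


# Hypothetical integer identifiers standing in for hashed circular fragments.
fragment_ids = [3, 1027, 58, 2106, 4099]
folded = fold_fragments(fragment_ids, n_bits=1024)

# A bit position shared by more than one fragment is a collision:
# a model feature on that bit can no longer be traced to a single fragment.
collisions = {bit: frags for bit, frags in folded.items() if len(frags) > 1}
print(collisions)
```

In this toy case five distinct fragments end up on only two bit positions, so the folded representation is compact but each bit is ambiguous. Unprocessed and filtered fragments avoid this ambiguity because every feature corresponds to exactly one fragment.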