
Table 2 Overview of results

From: Filtered circular fingerprints improve either prediction or runtime performance while retaining interpretability

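|             | Selection of fragments | Interpretable fragments | Fast processing (low num. features) | Best performance |
|-------------|------------------------|-------------------------|-------------------------------------|------------------|
| Unprocessed | Yes                    | Yes                     |                                     | Yes              |
| Folded      |                        |                         | Yes                                 |                  |
| Filtered    | Yes                    | Yes                     | Yes                                 | Yes              |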
  1. Unprocessed fragments yield random forest (RF) and support vector machine (SVM) models with good performance and retain interpretability, but incur a high computational cost. Folded fragments allow fast processing, but generate inferior models and are non-interpretable due to bit collisions. Filtered fragments yield the best naive Bayes (NB) models and can be used to build RF models that are as good as those built with unprocessed fragments. Filtered fragments also retain interpretability and allow fast processing.
  2. In summary, unprocessed (all) fragments are a good option if there are enough computational resources to optimize SVMs and the large number of (often redundant) features does not hinder the interpretation of predictions. Otherwise, filtered fragments should be preferred.
  3. In general, RF models yield good results without parameter tuning; however, SVM models are usually better once their parameters have been optimized (see section on parameter optimization).
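
The footnotes attribute the loss of interpretability in folded fingerprints to bit collisions. The sketch below illustrates that effect on toy data. It is not the article's implementation: the fragment strings, the use of `zlib.crc32` as the hash, the bit-vector length of 4, and the minimum-support filter are all assumptions chosen for illustration. In particular, the support filter is only a simple stand-in for fragment selection, not the filtering procedure evaluated in the article.

```python
import zlib
from collections import Counter, defaultdict

# Hypothetical toy data: each compound is described by a set of fragments.
compounds = {
    "mol_a": {"C-O", "C=O", "c1ccccc1"},
    "mol_b": {"C-N", "C=O"},
    "mol_c": {"C-O", "C-N", "N-H"},
}

# All distinct fragments in the data set (the "unprocessed" feature set).
all_fragments = sorted(set().union(*compounds.values()))


def fold(fragments, n_bits):
    """Fold fragments into n_bits positions via hash modulo n_bits.

    Returns a mapping from bit position to the fragments that set it.
    """
    bit_to_fragments = defaultdict(list)
    for frag in fragments:
        bit = zlib.crc32(frag.encode("utf-8")) % n_bits
        bit_to_fragments[bit].append(frag)
    return dict(bit_to_fragments)


def filter_by_support(compounds, min_count):
    """Keep fragments that occur in at least min_count compounds.

    Illustrative stand-in for fragment selection: every kept feature
    still corresponds to exactly one fragment.
    """
    support = Counter(frag for frags in compounds.values() for frag in frags)
    return sorted(frag for frag, count in support.items() if count >= min_count)


# Five fragments folded into four bits: at least one bit must be shared.
folded = fold(all_fragments, n_bits=4)
collisions = {bit: frags for bit, frags in folded.items() if len(frags) > 1}

# Selection reduces the feature count without merging fragments.
kept = filter_by_support(compounds, min_count=2)

print("bits with collisions:", collisions)
print("fragments kept by filter:", kept)
```

A set bit in `collisions` cannot be traced back to a single fragment, which is why a model built on folded features is hard to interpret. Each entry in `kept` remains a one-to-one feature, so the feature count drops while the mapping from feature to fragment is preserved.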