
Table 1 Performance of OWSum and mlKNN (optimized k = 1) regarding the prediction of the descriptors ‘floral’, ‘medicinal’, ‘woody, resinous’, ‘sickening’, ‘fruity, other than citrus’ and ‘perfumery’ using five-fold cross-validation. One-versus-rest ROC AUC values and MCC values are the averaged results over all classes. See Supplementary Material for ROC AUC and MCC values per class as well as ROC curves per odor for the best-performing variant

From: OWSum: algorithmic odor prediction and insight into structure-odor relationships

| Feature selection | Weighting factor a_{i,j} for OWSum, or mlKNN | Overall accuracy (%)ᵃ | Predicted accuracy (%)ᵃ | Non-predictable molecules (%) | Mean ROC AUC (underestimated)ᵃ | Mean ROC AUC (overestimated)ᵃ | Mean MCC (underestimated)ᵃ | Mean MCC (overestimated)ᵃ |
|---|---|---|---|---|---|---|---|---|
| Five-fold cross-validation | | | | | | | | |
| - | Same-weighted | 46.8 | 46.8 | 0 | 0.62 | 0.67 | 0.24 | 0.40 |
| idf | Same-weighted | 56.3 | 58.1 | 3.2 | 0.66 | 0.77 | 0.32 | 0.47 |
| idf | Tf-idf-weighted | 75.0 | 77.6 | 3.2 | 0.75 | 0.81 | 0.47 | 0.63 |
| idf | Tf-idf-weighted ∙ 1/Pr(F\|C)ᵇ | 68.8 | 71.4 | 3.2 | 0.71 | 0.76 | 0.41 | 0.57 |
| idf | Tf-idf-weighted ∙ 1/Pr(F\|C) Pr(C\|F)ᵇ | 64.1 | 66.6 | 3.2 | 0.70 | 0.75 | 0.38 | 0.54 |
| - | mlKNN | 69.0 | 69.0 | 0 | 0.77 | 0.82 | 0.53 | 0.62 |
| idf | mlKNN | 61.2 | 61.2 | 0 | 0.74 | 0.79 | 0.48 | 0.55 |

  1. ᵃ Defined in the Methods section.
  2. ᵇ We divide by the weight Pr(F_j|C_i) to assess the importance of this weight and to compare the improvement obtained by using Pr(F_j|C_i) instead of Pr(C_i|F_j), which AWSum uses [47].
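
The caption reports MCC values averaged over all classes. As a rough illustration of that averaging step only, the following minimal sketch computes a plain one-versus-rest MCC per descriptor and takes the mean. It assumes exactly one true and one predicted descriptor per molecule, which is a simplification, and it does not reproduce the "underestimated" and "overestimated" variants, whose definitions are in the Methods section. The function names `binary_mcc` and `mean_one_vs_rest_mcc` are hypothetical and not taken from the paper.

```python
import math


def binary_mcc(y_true, y_pred):
    """Matthews correlation coefficient for two equal-length 0/1 sequences."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention (an assumption here): report 0.0 when the denominator is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0


def mean_one_vs_rest_mcc(true_labels, predicted_labels, classes):
    """Binarize each class against the rest, compute its MCC, and average."""
    scores = []
    for c in classes:
        y_true = [1 if t == c else 0 for t in true_labels]
        y_pred = [1 if p == c else 0 for p in predicted_labels]
        scores.append(binary_mcc(y_true, y_pred))
    return sum(scores) / len(scores)


# Toy data for illustration only; these are not values from the study.
true_labels = ["floral", "floral", "woody", "woody"]
predicted_labels = ["floral", "woody", "woody", "woody"]
mean_mcc = mean_one_vs_rest_mcc(true_labels, predicted_labels, ["floral", "woody"])
```

Averaging per-class scores with equal weight treats rare and frequent descriptors alike; whether the paper weights classes this way is not stated in the table itself.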