Table 2 Distribution of optimal parameters

From: Cross-validation pitfalls when selecting and assessing regression and classification models

PLS on aquaticTox
  Number of components  10  11  12  13  14  15
  Frequency              1   9   9  23   6   2

Ridge regression on AquaticTox
  Lambda     ≤0.027   0.035   0.040   0.046   0.053   0.061   0.070   0.081  ≥0.093
  Frequency       6       5       7       8       4       6      10       6       2

Ridge logistic regression on bbb2
  Lambda     ≤0.09   0.10   0.12   0.14   0.16   0.18   0.21   0.24  ≥0.28
  Frequency      7      3      4      5     10      6      5      2      8

Ridge logistic regression on caco-PipelinePilotFP
  Lambda     <0.0046   0.0046   0.0053   0.0061   0.0070   0.0081   0.0093   0.0107  >0.0107
  Frequency        6        2        2        4        7       12        6        6        5

Ridge logistic regression on caco-QuickProp
  Lambda     ≤0.018   0.021   0.024   0.028   0.032   0.037   0.042   0.049  ≥0.056
  Frequency       7       2       8       7       7       7       4       4       4

PLS on MeltingPoint
  Number of components  34-35     36  37-40     41  42-46     47  48-51     57     60
  Frequency                 7      7      6      8      7      8      5      1      1

Ridge regression on MeltingPoint
  Lambda     ≤0.031   0.036   0.042   0.048   0.055   0.063   0.073   0.084  ≥0.096
  Frequency       5       1       4       6       5       5       7      10       5

Ridge logistic regression on Mutagen
  Lambda     <0.0016   0.0016   0.0018   0.0021   0.0024   0.0031   0.0036   0.0042  >0.0042
  Frequency        7        2        1        6        5        8        4        6        7

Ridge logistic regression on PLD
  Lambda     ≤0.34   0.34   0.39   0.44   0.67   0.77   0.89   1.02  ≥1.17
  Frequency     10      2      3      2      1      5      5      5     19
Distribution of the optimal parameter (number of PLS components or ridge lambda value) selected in each of 50 single cross-validations, for each method/dataset pair.
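To illustrate how a frequency distribution like the one above can arise, here is a minimal Python sketch. It rests on several stated assumptions and is not the authors' code. The data are synthetic, the lambda grid is hypothetical, it uses 10 folds, and it fits a plain ridge regression (no intercept) through its normal equations. Its lambda scale is therefore not comparable with the values in the table. Each repetition draws a new random fold split, selects the grid value with the lowest cross-validated squared error, and tallies the winners.

```python
import random
from collections import Counter


def ridge_fit(X, y, lam):
    """Ridge coefficients (no intercept): solve (X^T X + lam*I) beta = X^T y."""
    p = len(X[0])
    # Augmented normal-equation matrix [X^T X + lam*I | X^T y].
    A = [[sum(r[i] * r[j] for r in X) + (lam if i == j else 0.0) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]


def select_lambda(X, y, grid, k, rng):
    """One single k-fold cross-validation: return the grid value with lowest CV error."""
    idx = list(range(len(y)))
    rng.shuffle(idx)  # a fresh random fold split, shared by every lambda in the grid
    folds = [idx[f::k] for f in range(k)]
    best_lam, best_sse = None, float("inf")
    for lam in grid:
        sse = 0.0
        for test in folds:
            held_out = set(test)
            train = [i for i in idx if i not in held_out]
            beta = ridge_fit([X[i] for i in train], [y[i] for i in train], lam)
            sse += sum((y[i] - sum(b * x for b, x in zip(beta, X[i]))) ** 2 for i in test)
        if sse < best_sse:
            best_lam, best_sse = lam, sse
    return best_lam


# Hypothetical synthetic data: 40 samples, 5 features, Gaussian noise.
rng = random.Random(0)
n, p = 40, 5
true_beta = [rng.gauss(0, 1) for _ in range(p)]
X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [sum(b * x for b, x in zip(true_beta, row)) + rng.gauss(0, 2) for row in X]

grid = [round(0.5 * 1.5 ** i, 3) for i in range(9)]  # hypothetical lambda grid
# Repeat the single cross-validation 50 times, changing only the fold split.
counts = Counter(select_lambda(X, y, grid, 10, rng) for _ in range(50))
for lam in grid:
    print(f"lambda={lam:<7} frequency={counts[lam]}")
```

The dataset stays fixed and only the fold split changes between repetitions. Any spread in the printed tally therefore reflects the variability of a single cross-validation, which is the kind of variation the table reports for each method/dataset pair.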