
Table 2 Distribution of optimal parameters

From: Cross-validation pitfalls when selecting and assessing regression and classification models

PLS on AquaticTox

Number of components   10   11   12   13   14   15
Frequency               1    9    9   23    6    2

Ridge regression on AquaticTox

Lambda      ≤0.027    0.035    0.040    0.046    0.053    0.061    0.070    0.081   ≥0.093
Frequency        6        5        7        8        4        6       10        6        2

Ridge logistic regression on bbb2

Lambda       ≤0.09     0.10     0.12     0.14     0.16     0.18     0.21     0.24    ≥0.28
Frequency        7        3        4        5       10        6        5        2        8

Ridge logistic regression on caco-PipelinePilotFP

Lambda     <0.0046   0.0046   0.0053   0.0061   0.0070   0.0081   0.0093   0.0107  >0.0107
Frequency        6        2        2        4        7       12        6        6        5

Ridge logistic regression on caco-QuickProp

Lambda      ≤0.018    0.021    0.024    0.028    0.032    0.037    0.042    0.049   ≥0.056
Frequency        7        2        8        7        7        7        4        4        4

PLS on MeltingPoint

Number of components  34-35     36  37-40     41  42-46     47  48-51     57     60
Frequency                 7      7      6      8      7      8      5      1      1

Ridge regression on MeltingPoint

Lambda      ≤0.031    0.036    0.042    0.048    0.055    0.063    0.073    0.084   ≥0.096
Frequency        5        1        4        6        5        5        7       10        5

Ridge logistic regression on Mutagen

Lambda     <0.0016   0.0016   0.0018   0.0021   0.0024   0.0031   0.0036   0.0042  >0.0042
Frequency        7        2        1        6        5        8        4        6        7

Ridge logistic regression on PLD

Lambda       ≤0.34     0.34     0.39     0.44     0.67     0.77     0.89     1.02    ≥1.17
Frequency       10        2        3        2        1        5        5        5       19

  1. Distribution of optimal parameters (number of components or lambda values) based on 50 single cross-validations for each pair of method/dataset.
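
As an illustration of how a frequency row like those above can arise, the following is a minimal sketch only: it repeats a single 10-fold cross-validation 50 times with different random splits, selects the lambda with the lowest cross-validated error each time, and tallies the winners. The synthetic data, the one-feature ridge model without intercept, the lambda grid, and all function names are assumptions made for this example; they are not the datasets, models, or grids behind Table 2.

```python
import random
from collections import Counter


def ridge_fit_1d(xs, ys, lam):
    """Closed-form ridge slope for a single feature, no intercept."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)


def cv_error(xs, ys, lam, folds):
    """Mean squared prediction error over the held-out folds."""
    sq_err = 0.0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in range(len(xs)) if i not in held_out]
        beta = ridge_fit_1d([xs[i] for i in train], [ys[i] for i in train], lam)
        sq_err += sum((ys[i] - beta * xs[i]) ** 2 for i in fold)
    return sq_err / len(xs)


def optimal_lambda(xs, ys, grid, n_folds, rng):
    """One single cross-validation: random split, then grid search."""
    idx = list(range(len(xs)))
    rng.shuffle(idx)
    folds = [idx[k::n_folds] for k in range(n_folds)]
    return min(grid, key=lambda lam: cv_error(xs, ys, lam, folds))


def parameter_distribution(xs, ys, grid, n_repeats=50, n_folds=10, seed=0):
    """Tally the optimal lambda over repeated cross-validations."""
    rng = random.Random(seed)
    return Counter(
        optimal_lambda(xs, ys, grid, n_folds, rng) for _ in range(n_repeats)
    )


# Synthetic data (assumption): noisy linear relation with 40 samples.
data_rng = random.Random(1)
xs = [data_rng.gauss(0, 1) for _ in range(40)]
ys = [0.5 * x + data_rng.gauss(0, 1) for x in xs]

# Hypothetical geometric lambda grid.
grid = [0.1 * 2 ** k for k in range(9)]

dist = parameter_distribution(xs, ys, grid)
for lam in grid:
    print(f"{lam:8.3f}  {dist[lam]}")
```

Each printed row pairs a grid value with the number of splits, out of 50, for which it was selected. The selected value can change from split to split only because the fold assignment changes, since the data and the grid are held fixed.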