
Table 2 Algorithms used in this work and their respective hyperparameter optimization spaces

From: The effect of noise on the predictive limit of QSAR models

| Algorithm | Hyperparameters searched in optimization\(^{a,b}\) |
| --- | --- |
| Ridge regression (Ridge) | PCA n components \(\in (1, 3, \ldots, 59)\); α \(\in (1, 2, 3, 4, 5, 10)\) |
| k-nearest neighbors (kNN) | PCA n components \(\in (1, 3, \ldots, 59)\); k \(\in (1, 2, \ldots, 20)\) |
| Support vector regressor (SVR) | PCA n components \(\in (1, 3, \ldots, 59)\); C \(\in (0.01, 0.1, 1, 10)\); kernel: radial basis function (RBF) |
| Random forest (RF) | PCA n components \(\in (1, 3, \ldots, 59)\); n estimators \(\in (1, 10, \ldots, 200)\); max depth \(\in (1, 3, \ldots, 99)\); max leaf nodes \(\in (2, 12, \ldots, 92)\) |
| Gaussian process (GP) | PCA n components \(\in (1, 3, \ldots, 59)\); kernel\(^{c}\): RBF, WhiteKernel, Matern, DotProduct, ExpSineSquared, ConstantKernel or RationalQuadratic; normalize y: true |

  a. Ridge, kNN, SVR, and GP algorithms were optimized using fivefold GridSearchCV, whereas RF was optimized using fivefold RandomizedSearchCV with 500 iterations
  b. All algorithm hyperparameters not listed in this table were set to the defaults provided in the scikit-learn library
  c. For most datasets, only a single kernel converged, so the kernel was not optimized in GridSearchCV; it was chosen beforehand and used for the entire dataset
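
The grid-searched spaces for Ridge, kNN, and SVR can be written down as scikit-learn parameter grids. The following is a minimal sketch under stated assumptions, not the authors' code: it assumes that PCA and the regressor are chained in a `Pipeline`, and the step names `pca` and `model`, the dictionary `PARAM_SPACES`, and the helpers `n_candidates` and `make_search` are hypothetical names introduced here for illustration. RF is left out because it was tuned with a randomized search, and GP because its kernel was chosen beforehand for each dataset.

```python
from math import prod

# Search spaces transcribed from Table 2. Keys use scikit-learn's
# "<step>__<parameter>" convention; the step names are assumptions.
PCA_COMPONENTS = list(range(1, 60, 2))  # 1, 3, ..., 59

PARAM_SPACES = {
    "ridge": {
        "pca__n_components": PCA_COMPONENTS,
        "model__alpha": [1, 2, 3, 4, 5, 10],
    },
    "knn": {
        "pca__n_components": PCA_COMPONENTS,
        "model__n_neighbors": list(range(1, 21)),  # k = 1, 2, ..., 20
    },
    "svr": {
        "pca__n_components": PCA_COMPONENTS,
        "model__C": [0.01, 0.1, 1, 10],
    },
}


def n_candidates(name):
    """Number of hyperparameter combinations a full grid search evaluates."""
    return prod(len(values) for values in PARAM_SPACES[name].values())


def make_search(name, cv=5):
    """Build a fivefold GridSearchCV over a PCA + regressor pipeline.

    scikit-learn is imported here so that the grids above can be
    inspected without the library installed.
    """
    from sklearn.decomposition import PCA
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import GridSearchCV
    from sklearn.neighbors import KNeighborsRegressor
    from sklearn.pipeline import Pipeline
    from sklearn.svm import SVR

    regressors = {
        "ridge": Ridge(),
        "knn": KNeighborsRegressor(),
        "svr": SVR(kernel="rbf"),  # RBF kernel, as listed in the table
    }
    # Assumption: PCA runs as the first pipeline step, ahead of the regressor.
    pipeline = Pipeline([("pca", PCA()), ("model", regressors[name])])
    return GridSearchCV(pipeline, PARAM_SPACES[name], cv=cv)
```

Calling `make_search("ridge").fit(X, y)` on a descriptor matrix `X` and target vector `y` (both hypothetical here) would run the search with the default scoring of each estimator, which the table does not specify. Note that PCA requires `n_components` to be no larger than the number of features or the number of samples in each training fold.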