The effect of noise on the predictive limit of QSAR models

Table 2 Algorithms used in this work and their respective hyperparameter optimization spaces

Algorithm	Hyperparameters searched in optimization^a,b
Ridge regression (Ridge)	PCA n components \(\in \{1, 3, \ldots, 59\}\); α \(\in \{1, 2, 3, 4, 5, 10\}\)
k-nearest neighbors (kNN)	PCA n components \(\in \{1, 3, \ldots, 59\}\); k \(\in \{1, 2, \ldots, 20\}\)
Support vector regressor (SVR)	PCA n components \(\in \{1, 3, \ldots, 59\}\); C \(\in \{0.01, 0.1, 1, 10\}\); kernel: radial basis function (RBF)
Random forest (RF)	PCA n components \(\in \{1, 3, \ldots, 59\}\); n estimators \(\in \{1, 10, \ldots, 200\}\); max depth \(\in \{1, 3, \ldots, 99\}\); max leaf nodes \(\in \{2, 12, \ldots, 92\}\)
Gaussian process (GP)	PCA n components \(\in \{1, 3, \ldots, 59\}\); kernel:^c RBF, WhiteKernel, Matern, DotProduct, ExpSineSquared, ConstantKernel or RationalQuadratic; normalize y: true

^aRidge, kNN, SVR, and GP were optimized using fivefold GridSearchCV; RF was optimized using fivefold RandomizedSearchCV with 500 iterations
^bAll hyperparameters not listed in this table were left at their default values in the scikit-learn library
^cFor most datasets, only a single kernel converged, so the kernel was not optimized in GridSearchCV; instead, it was chosen beforehand and used for the entire dataset
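The search setup in Table 2 and its footnotes can be sketched in scikit-learn as below. This is a minimal illustration, not the authors' code: the helper names (`pca_pipeline`, `build_searches`, `PCA_GRID`), the pipeline step names `pca` and `model`, the synthetic `make_regression` data, and the fixed `RBF()` kernel for GP (footnote c says one kernel was picked per dataset, but not which) are all assumptions. The table's "n estimators \(\in \{1, 10, \ldots, 200\}\)" does not state its step, so reading it as 1 followed by 10 to 200 in steps of 10 is also an assumption.

```python
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import Pipeline
from sklearn.svm import SVR

# PCA n components in {1, 3, ..., 59}, shared by every algorithm in Table 2
PCA_GRID = list(range(1, 60, 2))


def pca_pipeline(model):
    # Hypothetical helper: PCA feeds the regressor; step names prefix the parameter keys
    return Pipeline([("pca", PCA()), ("model", model)])


def build_searches(n_iter=500, random_state=0):
    """Return one fivefold search object per algorithm in Table 2."""
    grid_specs = {
        "Ridge": (Ridge(), {"model__alpha": [1, 2, 3, 4, 5, 10]}),
        "kNN": (KNeighborsRegressor(), {"model__n_neighbors": list(range(1, 21))}),
        "SVR": (SVR(kernel="rbf"), {"model__C": [0.01, 0.1, 1, 10]}),
        # ASSUMPTION: kernel fixed beforehand (footnote c); RBF() is only a placeholder
        "GP": (GaussianProcessRegressor(kernel=RBF(), normalize_y=True), {}),
    }
    searches = {}
    for name, (model, grid) in grid_specs.items():
        param_grid = {"pca__n_components": PCA_GRID, **grid}
        searches[name] = GridSearchCV(pca_pipeline(model), param_grid, cv=5)

    rf_space = {
        "pca__n_components": PCA_GRID,
        # ASSUMPTION: "(1, 10, ..., 200)" read as 1, then 10..200 in steps of 10
        "model__n_estimators": [1] + list(range(10, 201, 10)),
        "model__max_depth": list(range(1, 100, 2)),
        "model__max_leaf_nodes": list(range(2, 93, 10)),
    }
    # RF samples n_iter random combinations instead of the full grid (footnote a)
    searches["RF"] = RandomizedSearchCV(
        pca_pipeline(RandomForestRegressor(random_state=random_state)),
        rf_space, n_iter=n_iter, cv=5, random_state=random_state,
    )
    return searches


if __name__ == "__main__":
    # Synthetic stand-in for a descriptor matrix: 100 compounds, 60 features
    X, y = make_regression(n_samples=100, n_features=60, noise=10.0, random_state=0)
    ridge_search = build_searches()["Ridge"]
    ridge_search.fit(X, y)
    print("Ridge best:", ridge_search.best_params_)
```

Wrapping PCA and the regressor in one `Pipeline` means PCA is refit inside every cross-validation fold, so the number of components is tuned jointly with the model hyperparameters without leaking held-out data into the projection. Fitting `build_searches()["RF"]` works the same way, but 500 iterations of fivefold random forest fits is slow on a laptop; a smaller `n_iter` is handy for trying the code out.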

ISSN: 1758-2946