
Table 2 Hyper-parameter values explored for Bernoulli Naïve Bayes, k-nearest neighbor, random forest, support vector machines and deep neural networks

From: Deep-learning: investigating deep neural networks hyper-parameters and comparison of performance to shallow methods for modeling bioactivity data

Bernoulli Naïve Bayes

| Hyper-parameter | Values explored | Description |
|---|---|---|
| Alpha | 1, 0.5, 0.1 | Laplace/Lidstone smoothing parameter |
| Fit_prior | True, False | Whether to learn class prior probabilities; if False, a uniform prior is used |
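The grid above maps directly onto scikit-learn's BernoulliNB parameters. A minimal sketch follows; the feature matrix X, labels y, and the 5-fold cross-validation setting are placeholders/assumptions, not taken from the paper.

```python
# Sketch of the Bernoulli Naive Bayes grid from Table 2 via GridSearchCV.
# X (binary fingerprints) and y (active/inactive labels) are placeholders.
from sklearn.naive_bayes import BernoulliNB
from sklearn.model_selection import GridSearchCV

param_grid = {
    "alpha": [1, 0.5, 0.1],      # Laplace/Lidstone smoothing
    "fit_prior": [True, False],  # False -> uniform class prior
}
search = GridSearchCV(BernoulliNB(), param_grid, cv=5)  # cv=5 is an assumption
# search.fit(X, y)
```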
k-Nearest neighbor

| Hyper-parameter | Values explored | Description |
|---|---|---|
| Nn | 1, 3, 5, 7, 9, 11 | Number of nearest neighbors |
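The table's Nn corresponds to n_neighbors in scikit-learn's KNeighborsClassifier; a sketch under the same placeholder assumptions as above:

```python
# Sketch of the k-nearest-neighbor grid (Nn -> n_neighbors).
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {"n_neighbors": [1, 3, 5, 7, 9, 11]}
search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
# search.fit(X, y)
```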
Random forest

| Hyper-parameter | Values explored | Description |
|---|---|---|
| Ntrees | 10, 50, 100, 300, 700, 1000 | Number of trees |
| Criterion | Gini, entropy | Function used to measure the quality of each split |
| Max_features | sqrt(n_features), log2(n_features) | Number of features considered at each split |
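In scikit-learn terms, Ntrees is n_estimators, and the two Max_features options are the built-in "sqrt" and "log2" settings. A sketch, with X and y again placeholders:

```python
# Sketch of the random-forest grid from Table 2.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

param_grid = {
    "n_estimators": [10, 50, 100, 300, 700, 1000],  # Ntrees
    "criterion": ["gini", "entropy"],               # split-quality measure
    "max_features": ["sqrt", "log2"],               # features tried per split
}
search = GridSearchCV(RandomForestClassifier(), param_grid, cv=5)
# search.fit(X, y)
```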
Support vector machines

| Hyper-parameter | Values explored | Description |
|---|---|---|
| Kernel | RBF | Radial basis function kernel |
| C | 10^3, 10^2, 10, 1 | Cost |
| γ | 10^−5, 10^−4, 10^−3, 10^−2, 10^−1 | Kernel width (gamma) |
| Kernel | Linear | Linear kernel |
| C | 10^3, 10^2, 10, 1, 10^−1, 10^−2, 10^−3, 10^−4 | Cost |
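Because the RBF and linear kernels are searched over different parameter sets, the two grids can be expressed as a list of dicts in GridSearchCV, as sketched below (placeholders as before):

```python
# Sketch of the two SVM grids: RBF searched over C and gamma,
# linear searched over C alone.
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV

param_grid = [
    {"kernel": ["rbf"],
     "C": [1e3, 1e2, 10, 1],
     "gamma": [1e-5, 1e-4, 1e-3, 1e-2, 1e-1]},
    {"kernel": ["linear"],
     "C": [1e3, 1e2, 10, 1, 1e-1, 1e-2, 1e-3, 1e-4]},
]
search = GridSearchCV(SVC(), param_grid, cv=5)
# search.fit(X, y)
```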
Deep neural networks

| Hyper-parameter | Values explored | Description |
|---|---|---|
| η | 1, 10^−1, 10^−2, 10^−3, 10^−4 | Learning rate for stochastic gradient descent (SGD) |
| Momentum (μ) | 0.9 | Weight given to the previous update |
| Weight decay | 0.0005 | Weight-decay regularization coefficient |
| Epochs | 300 | Number of training epochs |
| Batch size | 256 | Mini-batch size used during training |
| Hidden layers | 1, 2, 3, 4 | Number of hidden layers |
| Number of neurons | 5, 10, 50, 100, 200, 500, 700, 1000, 1500, 2000, 2500, 3000, 3500 | Number of neurons per hidden layer |
| Activation function | ReLU, Sigmoid, Tanh | Neuron activation function |
| Regularization | None, Dropout | Regularization technique |
| Dropout | 0%, 20%, or 50% (input layer); 50% (hidden layers) | Fraction of neurons "dropped" by the dropout technique |
| Weight and bias initialization | Gaussian (SD = 0.01) | Function used to initialize weights and biases |
| Loss function | SoftmaxWithLoss | Loss function minimized during training |
| Output function | Softmax | Function used to compute prediction probabilities |
| Number of classes | 2 | Binary classification |
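The SoftmaxWithLoss layer named above is Caffe's combined softmax/log-loss layer. Purely as an illustration of how one point of this grid fits together, the PyTorch sketch below builds a network with 2 hidden layers of 500 ReLU neurons, 20% input and 50% hidden-layer dropout, Gaussian initialization (SD = 0.01), and SGD with the table's momentum and weight decay; the input size of 1024 and the learning rate 10^−2 are arbitrary choices, and none of this is the authors' implementation.

```python
# Illustrative PyTorch sketch of one DNN configuration from Table 2.
import torch
import torch.nn as nn

n_features, n_hidden, n_classes = 1024, 500, 2  # 1024 is a placeholder input size

model = nn.Sequential(
    nn.Dropout(0.2),                              # 20% dropout on the input layer
    nn.Linear(n_features, n_hidden), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(n_hidden, n_hidden), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(n_hidden, n_classes),               # logits; softmax applied in the loss
)

# Gaussian weight and bias initialization with SD 0.01, as in the table
for m in model.modules():
    if isinstance(m, nn.Linear):
        nn.init.normal_(m.weight, std=0.01)
        nn.init.normal_(m.bias, std=0.01)

criterion = nn.CrossEntropyLoss()                 # softmax + log-loss, cf. SoftmaxWithLoss
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=0.0005)
```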