Hyper-parameter | Values explored | Description
---|---|---
**Bernoulli Naïve Bayes** | |
alpha | 1, 0.5, 0.1 | Laplace/Lidstone smoothing parameter
fit_prior | True, False | Whether to learn class prior probabilities; if False, a uniform prior is used
**k-Nearest neighbor** | |
nn | 1, 3, 5, 7, 9, 11 | Number of nearest neighbors
**Random forest** | |
ntrees | 10, 50, 100, 300, 700, 1000 | Number of trees
criterion | gini, entropy | Function used to measure the quality of each split
max_features | sqrt(n_features), log2(n_features) | Number of features considered at each split
**Support vector machines** | |
Kernel | rbf | Radial basis function kernel
C | 10³, 10², 10, 1 | Cost (regularization) parameter
γ | 10⁻⁵, 10⁻⁴, 10⁻³, 10⁻², 10⁻¹ | Kernel coefficient (gamma) for the RBF kernel
Kernel | linear | Linear kernel
C | 10³, 10², 10, 1, 10⁻¹, 10⁻², 10⁻³, 10⁻⁴ | Cost (regularization) parameter
**Deep neural networks** | |
η | 1, 10⁻¹, 10⁻², 10⁻³, 10⁻⁴ | Learning rate for stochastic gradient descent (SGD)
Momentum (μ) | 0.9 | Weight of the previous update |
Weight decay | 0.0005 | L2 penalty applied to the weights during SGD
Epochs | 300 | Number of training epochs |
Batch size | 256 | Mini-batch size used during training
Hidden layers | 1, 2, 3, 4 | Number of hidden layers |
Number of neurons | 5, 10, 50, 100, 200, 500, 700, 1000, 1500, 2000, 2500, 3000, 3500 | Number of neurons per hidden layer
Activation function | ReLU, Sigmoid, Tanh | Neuron activation functions |
Regularization | None, Dropout | Regularization techniques explored
Dropout | 0%, 20%, 50% (input layer); 50% (hidden layers) | Percentage of neurons dropped using the dropout technique
Weight and bias initialization | Gaussian (SD = 0.01) | Function used to initialize weights and biases
Loss function | SoftmaxWithLoss | Softmax cross-entropy loss minimized during training
Output function | Softmax | Function used to convert network outputs into class probabilities
Number of classes | 2 | Binary classification |
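
For reference, the grids for the four classical models map directly onto scikit-learn estimators. The sketch below is a minimal illustration rather than the study's actual pipeline: the parameter names (`alpha`, `fit_prior`, `n_neighbors`, `n_estimators`, `criterion`, `max_features`, `C`, `gamma`) follow scikit-learn's API, the 5-fold cross-validation is an assumption, and the synthetic data stands in for the real features and labels.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import BernoulliNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

# Placeholder data standing in for the study's feature matrix and binary labels.
X, y = make_classification(n_samples=500, n_features=20, random_state=0)

grids = [
    (BernoulliNB(), {"alpha": [1, 0.5, 0.1], "fit_prior": [True, False]}),
    (KNeighborsClassifier(), {"n_neighbors": [1, 3, 5, 7, 9, 11]}),
    (RandomForestClassifier(), {
        "n_estimators": [10, 50, 100, 300, 700, 1000],
        "criterion": ["gini", "entropy"],
        "max_features": ["sqrt", "log2"],
    }),
    # Two sub-grids so gamma is only searched for the RBF kernel.
    (SVC(), [
        {"kernel": ["rbf"], "C": [1e3, 1e2, 10, 1],
         "gamma": [1e-5, 1e-4, 1e-3, 1e-2, 1e-1]},
        {"kernel": ["linear"],
         "C": [1e3, 1e2, 10, 1, 1e-1, 1e-2, 1e-3, 1e-4]},
    ]),
]

for estimator, param_grid in grids:
    search = GridSearchCV(estimator, param_grid, cv=5)  # 5-fold CV is an assumption
    search.fit(X, y)
    print(type(estimator).__name__, search.best_params_)
```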
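The deep-network rows follow Caffe's naming (`SoftmaxWithLoss` is Caffe's fused softmax cross-entropy layer). As a hedged sketch, one grid point can be reproduced in PyTorch as follows, assuming two hidden layers of 500 ReLU neurons, 50% dropout on the hidden layers, and η = 10⁻²; the input dimensionality and the equivalence of `SoftmaxWithLoss` with `nn.CrossEntropyLoss` are illustrative assumptions, not details given in the table.

```python
import torch
import torch.nn as nn

n_features = 100  # placeholder: the real input dimensionality is not given here

def init_gaussian(module):
    # Gaussian initialization (SD = 0.01) for weights and biases, as listed above.
    if isinstance(module, nn.Linear):
        nn.init.normal_(module.weight, mean=0.0, std=0.01)
        nn.init.normal_(module.bias, mean=0.0, std=0.01)

# One grid point: 2 hidden layers of 500 ReLU neurons, 50% dropout on hidden layers.
model = nn.Sequential(
    nn.Linear(n_features, 500), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(500, 500), nn.ReLU(), nn.Dropout(0.5),
    nn.Linear(500, 2),  # 2 output classes; the softmax is folded into the loss below
)
model.apply(init_gaussian)

# SGD with the fixed momentum and weight decay from the table; lr = 10⁻² is one of
# the explored learning rates. Training would run for 300 epochs with batches of 256.
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2,
                            momentum=0.9, weight_decay=0.0005)
criterion = nn.CrossEntropyLoss()  # softmax + negative log-likelihood
```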