Table 3 Optimal parameters for optimized ML models

From: POSEIDON: Peptidic Objects SEquence-based Interaction with cellular DOmaiNs: a new database and predictor

| Model | Parameters | Package |
| --- | --- | --- |
| Support vector machine | Kernel: "rbf"; C: 1.5; Gamma: "scale" | scikit-learn |
| Stochastic gradient descent | Loss: "squared_error"; Penalty: "l2"; Alpha: 0.00001; Learning rate: "adaptive" | scikit-learn |
| k-nearest neighbors | N Neighbors: 2; P: 2; Algorithm: "brute" | scikit-learn |
| Decision tree | Splitter: "best"; Criterion: "friedman_mse"; Maximum depth: 10; Minimum samples split: 3; Minimum samples leaf: 7; Minimum weight fraction leaf: 0.0; Maximum features: "auto" | scikit-learn |
| Random forest | Number of estimators: 50; Criterion: "squared_error"; Maximum depth: 50; Minimum samples split: 3; Minimum samples leaf: 3; Minimum weight fraction leaf: 0.0 | scikit-learn |
| Extreme randomized trees | Number of estimators: 10; Criterion: "friedman_mse"; Maximum depth: 100; Minimum samples split: 10; Minimum samples leaf: 7; Minimum weight fraction leaf: 0.0 | scikit-learn |
| Extreme gradient boosting | Number of estimators: 50; Maximum depth: 10; Maximum leaves: 10; Learning rate: None; Booster: "dart"; Alpha: 1; Lambda: 3; Gamma: 0 | xgboost |
| Deep neural network | Depth: 1; Layer size: 500; Use dropout: True; Dropout rate: 0.3; Epochs: 230; Learning rate: 0.0005 | tensorflow |
| Forked neural network | Depth: 7; Dropout: 0.9; Use dropout: False; Learning rate: 0.0001; Experimental layer size: 39; Cargo layer size: 239; Sequence anomalies layer size: 79; Whole-peptide features layer size: 155; Sequence encoding layer size: 850; Genomics layer size: 687; Anomalous position layer size: 45; Epochs: 170 | tensorflow |