From: QSAR-Co-X: an open source toolkit for multitarget QSAR modelling
No | Utility | QSAR-Co | QSAR-Co-X | Remarks |
---|---|---|---|---|
1 | Feature selection | One (GA) | Two (FS and SFS) | – |
2 | Reproducibility of linear modelling | Low | High | Given the same sample size and number of descriptors, GA produces different LDA models on different runs, whereas FS and SFS always yield the same model |
3 | Diagnosis of intercollinearity among variables | Not available | Available and automatically performed | Very helpful for ascertaining the robustness of the derived linear models |
4 | Dataset division options | Random, Kennard-Stone, Euclidean-based | Random, pre-defined, k-MCA | Since only the random division option is fast, the other QSAR-Co options were replaced to reduce computational time |
5 | Automatic generation of the validation set | Not available | Available | Unlike QSAR-Co, QSAR-Co-X allows generating both the screening and validation sets |
6 | Statistical parameters for the validation set | Manual calculations are required | Automatic calculation | Automatic calculation allows fast selection of the models |
7 | Number of Box-Jenkins operators available | One (pre-defined) | Four (three pre-defined and one user-specific) | Additional and more flexible operators were added to QSAR-Co-X |
8 | Yc randomisation | Not available | Available | A modified form of the Y-randomisation technique that incorporates the influence of experimental elements |
9 | Machine-learning tools | One (RF only) | Six (kNN, SVM, RF, NB, GB, and MLP) | QSAR-Co-X affords several non-linear modelling tools |
10 | Number of parameters that may be altered in RF modelling | 5 | 8 | QSAR-Co-X offers more flexibility for setting up RF models |
11 | Comparative analysis of multiple ML methods | Not possible | Possible | Useful to decide which ML method performs best |
12 | Hyperparameter tuning options for ML methods | Not available | Available | Extremely useful to find optimised non-linear models |
13 | User-specific parameter settings for building non-linear models | For RF only | For kNN, SVM, RF, NB, GB, and MLP | – |
14 | Display of ROC plots (linear modelling) | For sub-training and test sets | For sub-training, test and validation sets | – |
15 | Condition-wise prediction | Not available | Available | Useful to understand how the developed model performs against individual experimental conditions, particularly for large datasets |
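
The condition-wise prediction listed in row 15 can be illustrated with a minimal sketch that groups a model's predictions by experimental condition and reports the accuracy for each group. This is not QSAR-Co-X's own code: the function name `condition_wise_accuracy`, the condition labels, and the toy data are all hypothetical, and plain accuracy is assumed as the only statistic.

```python
from collections import defaultdict


def condition_wise_accuracy(conditions, y_true, y_pred):
    """Return {condition: (n_cases, accuracy)} for a set of predictions.

    Hypothetical helper for illustration; not part of QSAR-Co-X.
    """
    if not (len(conditions) == len(y_true) == len(y_pred)):
        raise ValueError("inputs must have the same length")
    hits = defaultdict(int)
    totals = defaultdict(int)
    for cond, truth, pred in zip(conditions, y_true, y_pred):
        totals[cond] += 1
        hits[cond] += int(truth == pred)
    # Accuracy per condition = correct predictions / cases under that condition
    return {cond: (totals[cond], hits[cond] / totals[cond]) for cond in totals}


# Hypothetical toy data: one experimental-condition label per case
conditions = ["assay_A", "assay_A", "assay_B", "assay_B", "assay_B", "assay_A"]
y_true = [1, 0, 1, 1, 0, 1]  # observed activity classes
y_pred = [1, 0, 1, 1, 0, 0]  # classes predicted by some model

report = condition_wise_accuracy(conditions, y_true, y_pred)
for cond, (n, acc) in sorted(report.items()):
    print(f"{cond}: n={n}, accuracy={acc:.2f}")
```

A per-condition breakdown of this kind shows where a model that looks good overall is weak, which matters most for large datasets that span many experimental conditions.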