Visual analytics in cheminformatics: user-supervised descriptor selection for QSAR methods

Table 2 Prediction accuracy and cardinality for the best ten models obtained by Soto’s method [5]

Model	Predictive accuracy	Cardinality
M1 (Mn/MW, Sp, RHyDp, ETA_EtaP_F_L)	R² = 0.26, MAE = 4.62, RMSE = 8.14	4
M2 (Mn/MW, MDEO-11, D/Dr09, SMTIV)	R² = 0.32, MAE = 5.94, RMSE = 8.31	4
M3 (Mn/MW, nHBint4, nHBint10, ETA_dEpsilon_B)	R² = 0.56, MAE = 4.03, RMSE = 6.22	4
M4 (Mn/MW, nsCH3, nF6Ring, ALOGP2, RDCHI)	R² = 0.41, MAE = 3.94, RMSE = 6.75	5
M5 (Mn/MW, nROH, n6Ring, nHCsatu, ALOGP2)	R² = 0.68, MAE = 3.28, RMSE = 5.78	5
M6 (Mn/MW, nP, minHBa, T(O..P), ETA_Epsilon_3)	R² = 0.25, MAE = 4.48, RMSE = 7.20	5
M7 (Mn/MW, ETA_dEpsilon_B, C-005, SHaaCH, nHBint9, nCt)	R² = 0.31, MAE = 4.19, RMSE = 7.20	6
M8 (Mn/MW, ndssC, minHBint9, MSD, C-004, Mw/Mn (PDI), crosshead speed (CHS))	R² = 0.39, MAE = 3.92, RMSE = 6.86	7
M9 (Mn/MW, Pol, Wap, maxHAvin, nHAvin, MWC04)	R² = 0.15, MAE = 4.92, RMSE = 7.88	6
M10 (Mn/MW, maxHBint6, ETA_dEpsilon_A, TIC2, ndO, nHdCH2)	R² = 0.48, MAE = 4.02, RMSE = 7.09	6

The second column shows the predictive accuracy of the “best” model for each descriptor subset, obtained by applying 4-fold cross-validation to three different methods (linear regression, decision trees, and neural networks). The parameter setup and predictive accuracy for all methods are available in Additional file 1: Table S2.
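
To make the three reported metrics concrete, the following is a minimal sketch of how R², MAE, and RMSE could be computed under 4-fold cross-validation. It is not the authors' pipeline: the single-descriptor least-squares model, the synthetic data, the pooling of out-of-fold predictions, the use of the coefficient of determination for R², and all function names (`fit_line`, `k_fold_indices`, `cross_validate`) are assumptions made for illustration only.

```python
import math
import random


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination (assumed definition of R² here)."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def fit_line(xs, ys):
    """Ordinary least squares for one descriptor: y ~ a * x + b."""
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx


def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_validate(xs, ys, k=4, seed=0):
    """Fit on k-1 folds, predict the held-out fold, and score the pooled
    out-of-fold predictions (pooling is an assumption of this sketch)."""
    y_true, y_pred = [], []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            y_true.append(ys[i])
            y_pred.append(a * xs[i] + b)
    return {
        "R2": r2(y_true, y_pred),
        "MAE": mae(y_true, y_pred),
        "RMSE": rmse(y_true, y_pred),
    }


# Synthetic data standing in for one descriptor and the target property.
rng = random.Random(42)
xs = [float(i) for i in range(40)]
ys = [2.0 * x + 1.0 + rng.gauss(0.0, 3.0) for x in xs]

scores = cross_validate(xs, ys, k=4)
print(scores)
```

In a setting like the one described above, the same scoring would be repeated for each candidate method on a given descriptor subset, and the row of the table would report the method with the best scores. RMSE is never smaller than MAE on the same predictions, which is consistent with every row of Table 2.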

ISSN: 1758-2946