Fig. 5 | Journal of Cheminformatics

Fig. 5

From: Advancing material property prediction: using physics-informed machine learning models for viscosity

Feature importance of descriptor-based LGBM models. Top 5 important features, measured as the average magnitude of SHapley Additive exPlanations (SHAP) values (i.e. Mean |SHAP|), for LGBM models trained with A 2D descriptors only, B 2D and MD descriptors, and C MD descriptors only. A positive Mean |SHAP| indicates that the descriptor contributes positively to viscosity, whereas a negative Mean |SHAP| indicates the converse. Descriptors with the prefixes “RD” and “MD” refer to RDKit and MD descriptors, respectively. The average Mean |SHAP| of twenty LGBM estimators is reported, and the uncertainty is estimated by computing the standard deviation of the Mean |SHAP| values. The number of features correlated with each top feature, based on a Pearson’s r correlation coefficient cutoff greater than or equal to 0.90, is shown in brackets, and the correlated features are summarized here (values in parentheses are Pearson’s r correlations with the top feature): \(^a\)RD_HeavyAtomMolWt (0.99), RD_ExactMolWt (1.00), RD_Chi0v (0.93), RD_LabuteASA (0.93); \(^b\)MD_SP (0.93); \(^c\)MD_RMSD (0.95)
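The aggregation described in the caption can be illustrated with a minimal NumPy sketch. It is not the authors' code: the function names, the array layout (estimators × samples × features), the use of the population standard deviation, and the toy feature names are all assumptions made for illustration. The sketch computes only the unsigned magnitude and does not attempt to reproduce the sign convention mentioned in the caption.

```python
import numpy as np


def mean_abs_shap_summary(shap_values):
    """Summarize SHAP values from an ensemble of estimators.

    shap_values: array of shape (n_estimators, n_samples, n_features).
    This layout is an assumption, not taken from the source.
    Returns the mean and standard deviation, across estimators,
    of the per-estimator Mean |SHAP| for each feature.
    """
    # Mean |SHAP| per estimator and feature: average over samples.
    per_estimator = np.abs(shap_values).mean(axis=1)
    # Average and spread over the estimators (ddof=0 is an assumption).
    return per_estimator.mean(axis=0), per_estimator.std(axis=0)


def correlated_features(X, names, top, cutoff=0.90):
    """Return features whose Pearson's r with `top` is >= cutoff.

    X: array of shape (n_samples, n_features); names: list of column names.
    The signed r is compared with the cutoff; using |r| would be an
    equally plausible reading of the caption.
    """
    r = np.corrcoef(X, rowvar=False)
    i = names.index(top)
    return {
        names[j]: float(r[i, j])
        for j in range(len(names))
        if j != i and r[i, j] >= cutoff
    }


# Toy data: two estimators, one sample, two features.
toy_shap = np.array([[[1.0, -2.0]], [[3.0, -4.0]]])
toy_mean, toy_std = mean_abs_shap_summary(toy_shap)

# Toy descriptors (hypothetical names): "b" is a linear function of "a",
# while "c" alternates in sign and is nearly uncorrelated with "a".
a = np.arange(50.0)
X_toy = np.column_stack([a, 2.0 * a + 1.0, np.tile([1.0, -1.0], 25)])
toy_correlated = correlated_features(X_toy, ["a", "b", "c"], top="a")
```

In this toy setup, `toy_correlated` contains only `"b"`, which mirrors how the bracketed counts in the figure would be obtained for each top feature under the stated assumptions.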
