Harnessing Shannon entropy-based descriptors in machine learning models to enhance the prediction accuracy of molecular properties

Table 2 Datasets used and comparison of descriptors in MLP-based deep neural network models

Dataset (Target ID)	Target variable	Sample size	kNN (MAE)	Morgan (MAE)	SEF (MAE)	SHED (MAE)	Source/reference
CHEMBL 3713062^a	BEI	3382	6.47	5.00 ± 0.13	3.70 ± 0.15	10.74 ± 0.48	EMBL-EBI
CHEMBL 204	BEI	1777	4.20	5.03 ± 0.20	4.23 ± 0.05	10.41 ± 0.30	EMBL-EBI
CHEMBL 2842	BEI	4164	4.90	4.50 ± 0.14	4.07 ± 0.08	9.76 ± 0.24	EMBL-EBI
CHEMBL 274	BEI	1950	2.64	3.54 ± 0.24	2.90 ± 0.05	4.95 ± 0.02	EMBL-EBI
CHEMBL 3974	BEI	725	3.80	5.00 ± 0.35	3.52 ± 0.11	9.23 ± 0.03	EMBL-EBI
CHEMBL 2820	BEI	663	2.70	3.33 ± 0.25	2.92 ± 0.11	4.58 ± 0.13	EMBL-EBI
CHEMBL 2815	BEI	3182	3.90	4.21 ± 0.21	3.84 ± 0.07	7.25 ± 0.02	EMBL-EBI
CHEMBL 4691	pChEMBL	859	2.25	2.14 ± 0.08	1.94 ± 0.03	2.70 ± 0.02	EMBL-EBI

^aThe MAE scaling factor was 10⁵ for this Target ID and 10³ for all other Target IDs
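The comparison in Table 2 can be read off row by row. As a minimal sketch, the Python snippet below copies the mean MAE values from the table and reports which column has the lowest value for each Target ID. The names `TABLE_2`, `best_method`, `winners`, and `win_counts` are hypothetical and introduced only for illustration. The sketch ignores the ± spreads, so it does not say whether any difference is statistically meaningful. It compares values only within a row, on the assumption that the footnote's scaling factor applies equally to every column of that row.

```python
from collections import Counter

# Mean MAE values copied from Table 2 (the ± spreads are deliberately omitted).
TABLE_2 = {
    "CHEMBL 3713062": {"kNN": 6.47, "Morgan": 5.00, "SEF": 3.70, "SHED": 10.74},
    "CHEMBL 204": {"kNN": 4.20, "Morgan": 5.03, "SEF": 4.23, "SHED": 10.41},
    "CHEMBL 2842": {"kNN": 4.90, "Morgan": 4.50, "SEF": 4.07, "SHED": 9.76},
    "CHEMBL 274": {"kNN": 2.64, "Morgan": 3.54, "SEF": 2.90, "SHED": 4.95},
    "CHEMBL 3974": {"kNN": 3.80, "Morgan": 5.00, "SEF": 3.52, "SHED": 9.23},
    "CHEMBL 2820": {"kNN": 2.70, "Morgan": 3.33, "SEF": 2.92, "SHED": 4.58},
    "CHEMBL 2815": {"kNN": 3.90, "Morgan": 4.21, "SEF": 3.84, "SHED": 7.25},
    "CHEMBL 4691": {"kNN": 2.25, "Morgan": 2.14, "SEF": 1.94, "SHED": 2.70},
}


def best_method(row):
    """Return the column name with the lowest mean MAE in one table row."""
    return min(row, key=row.get)


# Lowest-MAE column per Target ID (within-row comparison only).
winners = {target: best_method(row) for target, row in TABLE_2.items()}

# How often each column attains the lowest mean MAE.
win_counts = Counter(winners.values())

for target, method in winners.items():
    print(f"{target}: {method} (MAE {TABLE_2[target][method]:.2f})")
print(dict(win_counts))
```

On these mean values, SEF has the lowest MAE for five of the eight Target IDs and kNN for the remaining three. Morgan and SHED are never lowest.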

ISSN: 1758-2946