Can human experts predict solubility better than computers?

Table 3 Performance of median-based consensus classifiers. Errors are absolute (unsigned) and are measured in log S units

Compound	ML error (log S)	Human error (log S)	Difference (ML − human)
4-Aminobenzoic acid	0.07	0.13	−0.06
4-Aminosalicylic acid	0.23	0.76	−0.53
Antipyrine	3.73	2.98	0.75
Chloramphenicol	0.35	0.39	−0.04
Corticosterone	0.11	0.06	0.05
Dapsone	0.54	0.29	0.25
Primidone	0.06	0.14	−0.08
Estrone	0.87	0.82	0.05
Alclofenac	0.30	0.12	0.18
5-Fluorouracil	0.46	0.62	−0.16
Griseofulvin	0.44	0.25	0.19
Fluometuron	0.53	0.04	0.49
Fluconazole	1.09	0.70	0.39
Khellin	0.17	0.98	−0.81
Clozapine	1.37	0.71	0.66
Norethisterone	0.63	0.63	0.00
Nicotinic acid	0.58	0.35	0.23
Perphenazine	0.16	0.16	0.00
Pteridine	2.22	3.02	−0.80
Salicylamide	0.23	0.49	−0.26
Sulfanilamide	0.54	0.14	0.40
Gliclazide	1.03	0.80	0.23
Trihexyphenidyl	1.98	1.45	0.53
Triphenylene	0.15	0.27	−0.12
Mifepristone	1.57	2.00	−0.43
Average	0.778	0.732	0.046

The difference is signed and is calculated as the ML error minus the human error: a positive value indicates that the human median-based classifier performed better on that compound, and a negative value indicates that the machine learning median-based classifier performed better.
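The sign convention can be checked directly against the tabulated errors. The following is a minimal sketch, not the authors' analysis code: the names `ROWS`, `signed_difference` and `tally` are hypothetical, and the error values are copied from Table 3 as printed (two decimal places).

```python
# (compound, ML error, human error); absolute errors in log S units,
# copied from Table 3 as printed.
ROWS = [
    ("4-Aminobenzoic acid", 0.07, 0.13),
    ("4-Aminosalicylic acid", 0.23, 0.76),
    ("Antipyrine", 3.73, 2.98),
    ("Chloramphenicol", 0.35, 0.39),
    ("Corticosterone", 0.11, 0.06),
    ("Dapsone", 0.54, 0.29),
    ("Primidone", 0.06, 0.14),
    ("Estrone", 0.87, 0.82),
    ("Alclofenac", 0.30, 0.12),
    ("5-Fluorouracil", 0.46, 0.62),
    ("Griseofulvin", 0.44, 0.25),
    ("Fluometuron", 0.53, 0.04),
    ("Fluconazole", 1.09, 0.70),
    ("Khellin", 0.17, 0.98),
    ("Clozapine", 1.37, 0.71),
    ("Norethisterone", 0.63, 0.63),
    ("Nicotinic acid", 0.58, 0.35),
    ("Perphenazine", 0.16, 0.16),
    ("Pteridine", 2.22, 3.02),
    ("Salicylamide", 0.23, 0.49),
    ("Sulfanilamide", 0.54, 0.14),
    ("Gliclazide", 1.03, 0.80),
    ("Trihexyphenidyl", 1.98, 1.45),
    ("Triphenylene", 0.15, 0.27),
    ("Mifepristone", 1.57, 2.00),
]


def signed_difference(ml_error, human_error):
    """ML error minus human error, rounded to the table's two decimals."""
    return round(ml_error - human_error, 2)


def tally(rows):
    """Count compounds by which consensus classifier had the smaller error."""
    counts = {"human_better": 0, "ml_better": 0, "tie": 0}
    for _compound, ml_error, human_error in rows:
        diff = signed_difference(ml_error, human_error)
        if diff > 0:
            counts["human_better"] += 1  # positive: human error is smaller
        elif diff < 0:
            counts["ml_better"] += 1  # negative: ML error is smaller
        else:
            counts["tie"] += 1
    return counts


if __name__ == "__main__":
    print(tally(ROWS))
```

Applied to the 25 rows of the table, this tally gives 13 compounds with a positive difference (human consensus better), 10 with a negative difference (machine learning consensus better), and 2 ties.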

ISSN: 1758-2946