Evaluating uncertainty-based active learning for accelerating the generalization of molecular property prediction

Table 6 Active learning results

Target	Model	Sampling	Standard (Whole)	Standard (OOD Data Bin)	Generalization (Whole)	Generalization (ID Bins)
Solubility	MDM	Data density (EB)	0.14%	0.25%	0.08%	0.03%
		Data density (EB)	\(\varvec{p=0.015}\)	\(\varvec{p=0.013}\)	\(p=0.146\)	\(p=0.326\)
		Diversity (EB)	0.00%	0.13%	−0.03%	−0.07%
		Diversity (EB)	\(p=0.483\)	\(p=0.098\)	\(p=0.674\)	\(p=0.793\)
		MCDO	0.13%	0.08%	0.10%	0.11%
		MCDO	\(\varvec{p=0.032}\)	\(p=0.204\)	\(\varvec{p=0.036}\)	\(\varvec{p=0.024}\)
		OOD only	–	1.93%	0.18%	−0.28%
		OOD only	–	\(\varvec{p<0.001}\)	\(\varvec{p=0.007}\)	\(p=0.992\)
Solubility	GBM	Data density (EB)	−0.01%	0.24%	0.04%	−0.03%
		Data density (EB)	\(p=0.714\)	\(\varvec{p<0.001}\)	\(p=0.084\)	\(p=0.868\)
		GBM	0.06%	0.11%	0.05%	0.03%
		GBM	\(\varvec{p=0.002}\)	\(\varvec{p=0.012}\)	\(\varvec{p=0.045}\)	\(p=0.203\)
		OOD only	–	1.82%	0.14%	−0.31%
		OOD only	–	\(\varvec{p<0.001}\)	\(\varvec{p=0.009}\)	\(p=0.996\)
Redox	MDM	Data density (EB)	−0.09%	0.42%	0.06%	−0.05%
		Data density (EB)	\(p=0.986\)	\(\varvec{p<0.001}\)	\(p=0.055\)	\(p=0.900\)
		MCDO	0.01%	0.18%	0.05%	0.01%
		MCDO	\(p=0.377\)	\(\varvec{p=0.001}\)	\(p=0.102\)	\(p=0.487\)
		OOD only	–	2.09%	0.41%	−0.10%
		OOD only	–	\(\varvec{p<0.001}\)	\(\varvec{p<0.001}\)	\(p=0.977\)
Redox	GBM	Data density (EB)	0.02%	0.21%	0.06%	0.01%
		Data density (EB)	\(p=0.168\)	\(\varvec{p<0.001}\)	\(\varvec{p=0.003}\)	\(p=0.265\)
		GBM	0.02%	0.12%	0.05%	0.02%
		GBM	\(p=0.123\)	\(\varvec{p=0.007}\)	\(\varvec{p=0.013}\)	\(p=0.076\)
		OOD only	–	1.60%	0.11%	−0.29%
		OOD only	–	\(\varvec{p<0.001}\)	\(\varvec{p=0.025}\)	\(p=0.994\)

For the density method, EB refers to embedding-based similarity. For each sampling method, the first row gives the percentage decrease in RMSE compared to random sampling, and the second row gives the p-value of the corresponding paired t-test, with the alternative hypothesis that active learning (AL) performs better than random sampling. Significant test results are shown in bold.
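The two quantities reported in each cell can be illustrated with a short sketch. This is not the authors' code: the function names, the choice to take the percentage over mean RMSE across runs, and the example RMSE values are all assumptions made for illustration. The upper-tail probability of the t distribution is obtained here by simple numerical integration so that the example needs only the Python standard library; a statistics package would normally be used instead.

```python
import math
from statistics import mean, stdev


def pct_rmse_decrease(rmse_random, rmse_al):
    """Percentage decrease in mean RMSE of AL relative to random sampling.

    Assumption: the percentage is taken over the mean RMSE across runs.
    """
    base = mean(rmse_random)
    return 100.0 * (base - mean(rmse_al)) / base


def _t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def _t_upper_tail(t, df, steps=2000):
    """P(T >= t), via Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    h = a / steps
    total = _t_pdf(0.0, df) + _t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * _t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 - area if t >= 0 else 0.5 + area


def paired_t_one_sided(rmse_random, rmse_al):
    """Paired t-test with alternative: AL has lower RMSE than random.

    Returns (t statistic, one-sided p-value). Runs are paired by index.
    Raises ZeroDivisionError if all paired differences are identical.
    """
    diffs = [r - a for r, a in zip(rmse_random, rmse_al)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, _t_upper_tail(t, n - 1)


# Hypothetical RMSE values from five paired runs (not taken from the paper).
rmse_random = [0.512, 0.498, 0.505, 0.520, 0.509]
rmse_al = [0.510, 0.497, 0.502, 0.515, 0.508]

t_stat, p_value = paired_t_one_sided(rmse_random, rmse_al)
print(f"decrease: {pct_rmse_decrease(rmse_random, rmse_al):.2f}%")
print(f"t = {t_stat:.3f}, one-sided p = {p_value:.3f}")
```

A positive percentage with a small p-value corresponds to a bold entry in the table, while a negative percentage (AL worse than random sampling) pushes the one-sided p-value toward 1, as in the ID Bins column for the OOD only rows.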

ISSN: 1758-2946