Uncertain of uncertainties? A comparison of uncertainty quantification metrics for chemical data sets

Table 1 RMSE and UQ evaluation metrics for the 9 RF models trained on Crippen’s logP from [15]. The simulated values, NLL\(^{sim}\) and \(\rho _{rank}^{sim}\), are the averages over 1000 simulated sets of test errors based on the predicted uncertainties. The numbers in parentheses are the standard deviations of the 1000 values.

\(N_{train}\)	RMSE	\(R^2\)	a	b	\(\rho _{rank}\)	\(\rho _{rank}^{sim}\)	\(A_{mis}\)	NLL	NLL\(^{sim}\)
100	1.29	0.84	0.62	0.62	0.11	0.19 (0.01)	0.05	1.73	1.46 (0.01)
500	1.09	0.85	0.64	0.45	0.11	0.19 (0.01)	0.03	1.51	1.39 (0.01)
1000	1.01	0.85	0.55	0.45	0.10	0.19 (0.01)	0.00	1.42	1.40 (0.01)
5000	0.93	0.81	0.57	0.42	0.10	0.18 (0.01)	0.01	1.35	1.29 (0.01)
10,000	0.90	0.82	0.58	0.40	0.11	0.19 (0.01)	0.01	1.32	1.24 (0.01)
20,000	0.86	0.86	0.58	0.37	0.11	0.18 (0.01)	0.01	1.26	1.21 (0.01)
50,000	0.81	0.88	0.61	0.31	0.11	0.19 (0.01)	0.00	1.19	1.16 (0.01)
100,000	0.77	0.85	0.67	0.26	0.13	0.20 (0.01)	0.00	1.15	1.12 (0.01)
150,000	0.75	0.91	0.69	0.23	0.14	0.21 (0.01)	0.01	1.11	1.09 (0.01)
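
The simulated reference values in Table 1 can be illustrated with a minimal sketch. It rests on several assumptions that are not stated in the caption: each simulated error is drawn from a zero-mean Gaussian whose standard deviation is the predicted uncertainty, NLL is the average Gaussian negative log-likelihood, and \(\rho _{rank}\) is the Spearman correlation between absolute errors and predicted uncertainties. The function names and the toy uncertainties are hypothetical and do not reproduce the values in the table.

```python
import math
import random
import statistics


def gaussian_nll(errors, sigmas):
    """Average Gaussian NLL of errors given predicted standard deviations."""
    return statistics.fmean(
        0.5 * math.log(2 * math.pi * s ** 2) + e ** 2 / (2 * s ** 2)
        for e, s in zip(errors, sigmas)
    )


def ranks(values):
    """Ordinal ranks 0..n-1 (ties broken by position; adequate for continuous data)."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0] * len(values)
    for rank, idx in enumerate(order):
        result[idx] = rank
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def simulate_reference(sigmas, n_sim=1000, seed=0):
    """Mean and standard deviation of NLL and rho_rank over simulated error sets.

    Assumption: errors are sampled as N(0, sigma_i^2), i.e. the predicted
    uncertainties are treated as perfectly calibrated.
    """
    rng = random.Random(seed)
    nlls, rhos = [], []
    for _ in range(n_sim):
        errors = [rng.gauss(0.0, s) for s in sigmas]
        nlls.append(gaussian_nll(errors, sigmas))
        rhos.append(spearman([abs(e) for e in errors], sigmas))
    return (
        statistics.fmean(nlls), statistics.stdev(nlls),
        statistics.fmean(rhos), statistics.stdev(rhos),
    )


# Toy predicted uncertainties (hypothetical, not the paper's data).
toy_rng = random.Random(1)
toy_sigmas = [toy_rng.uniform(0.5, 1.5) for _ in range(200)]
nll_sim, nll_sd, rho_sim, rho_sd = simulate_reference(toy_sigmas, n_sim=200)
print(f"NLL_sim = {nll_sim:.2f} ({nll_sd:.2f})")
print(f"rho_rank_sim = {rho_sim:.2f} ({rho_sd:.2f})")
```

Under these assumptions, an observed NLL or \(\rho _{rank}\) that lies far from its simulated counterpart, relative to the standard deviation in parentheses, indicates that the test errors are not distributed as the predicted uncertainties imply.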

ISSN: 1758-2946