Estimating the uncertainty profile for the logP2 data set. The model shown (logP2-1) has six hidden neurons and uses 40 structural descriptors as input. The voting threshold (indicated by the vertical black dotted lines) was 16.5. The horizontal dotted lines crossing the threshold lines indicate where an error rate of 0.5 would fall. Error counts include a continuity correction of 0.5 (see text for details). (A) Distribution of predictions (blue) and errors (red) for the training pool, with fitted beta-binomial distributions shown as dashed lines. (B) Observed error rates (red symbols) for the training pool and the uncertainty calculated from the fitted prediction and error beta-binomial distributions (dashed black line). (C) Distribution of predictions (blue) and errors (red) for the external validation set. Dashed lines represent the beta-binomial distributions fitted to the corresponding training pool results, scaled to account for the larger size (10x) of the validation set. (D) Observed error rate profile (red symbols) for the validation set and the uncertainty profile (dashed black line) estimated using the beta-binomial distributions fitted to the training pool.
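One way to picture how an uncertainty profile could be derived from two fitted beta-binomial distributions is sketched below. This is a minimal illustration and not necessarily the procedure used for the figure. It assumes the uncertainty at each vote count is the expected number of errors divided by the expected number of predictions. The function names, the ensemble size of 33, the pool sizes, and all shape parameters are hypothetical, and the continuity correction mentioned in the caption is not applied because its details are given elsewhere in the text.

```python
from math import exp, lgamma


def log_beta(a, b):
    """Natural log of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def betabinom_pmf(k, n, a, b):
    """Beta-binomial probability of k positive votes out of n."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose + log_beta(k + a, n - k + b) - log_beta(a, b))


def uncertainty_profile(n_votes, n_pred, pred_ab, n_err, err_ab):
    """Estimated error rate at each vote count from 0 to n_votes.

    Assumption: the rate is (expected errors) / (expected predictions),
    where each expectation is a pool size times a beta-binomial pmf.
    """
    profile = []
    for k in range(n_votes + 1):
        expected_pred = n_pred * betabinom_pmf(k, n_votes, *pred_ab)
        expected_err = n_err * betabinom_pmf(k, n_votes, *err_ab)
        profile.append(expected_err / expected_pred)
    return profile


# Hypothetical inputs: predictions pile up at the extremes of the vote
# range, while errors concentrate near the middle of it.
profile = uncertainty_profile(
    n_votes=33,
    n_pred=1000, pred_ab=(0.5, 0.5),
    n_err=80, err_ab=(3.0, 3.0),
)
for votes, rate in enumerate(profile):
    print(f"{votes:2d} votes: estimated error rate {rate:.3f}")
```

With inputs of this shape, the estimated error rate is highest near the middle of the vote range and falls toward the extremes. Scaling the fitted curves to a larger data set, as in panel C, would amount to multiplying both pool sizes by the same factor, which leaves the ratio unchanged.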