Method | Description | Advantage | Disadvantage |
---|---|---|---|
Applicability Domain (AD) estimation | Estimates whether the assumptions of a model are fulfilled for a given input [42,43,44,45]; e.g., distance-to-model AD methods assign a reliability based on how close a query compound is to the model's training data | Provides estimates of uncertainty when making predictions for new compounds | Does not commonly take into account the uncertainty related to the underlying data |
Conformal Prediction | Produces error bands around the predictions, under the assumption that inputs less similar to the model's training data should lead to less certain estimates. This is captured by a nonconformity measure: a nonconformity score is calculated for each new query compound and compared with the scores of previously seen (training or calibration) compounds [46,47,48] | Provides estimates of uncertainty when making predictions for new compounds | Does not commonly take into account the uncertainty related to the underlying data |
Probability Calibration | Addresses the question of obtaining accurate likelihoods for predictions based on the distribution of reference observations for a given dataset [36] | Advantages depend on the specific calibration methodology; e.g., isotonic regression makes no assumptions about the form of the calibration curve. Inductive methods must split the data to create ‘proper’ training and calibration sets | Performance depends on the reference observations used. Limitations also depend on the specific calibration methodology; e.g., isotonic regression requires a large number of calibration points and tends to overfit |
Gaussian processes (GP, Bayesian methodology) | Probability distributions over possible functions are used to evaluate confidence intervals, which can then guide the decision of whether to refit the prediction in a region of interest [7] | Allow the incorporation of prior knowledge about the data; the uncertainty of a fitted GP increases away from the training data | Can be computationally expensive: as non-parametric models, GPs take all of the training data into account for every prediction (exact inference scales cubically with the number of training points) |
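The distance-to-model idea in the Applicability Domain row can be illustrated with a minimal sketch. It assumes that fingerprints are represented as Python sets of "on" bit indices and that reliability is measured as the mean Tanimoto similarity of the query to its k most similar training compounds. The k-nearest-neighbour rule, `k=3` and `threshold=0.5` are illustrative assumptions, not values taken from the cited works.

```python
from typing import List, Set


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity between two sets of 'on' fingerprint bits."""
    if not a and not b:
        return 1.0  # convention: two empty fingerprints are treated as identical
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def knn_similarity(query: Set[int], training: List[Set[int]], k: int = 3) -> float:
    """Mean Tanimoto similarity of the query to its k most similar training compounds."""
    if not training or k < 1:
        raise ValueError("need at least one training compound and k >= 1")
    sims = sorted((tanimoto(query, t) for t in training), reverse=True)
    top = sims[:k]
    return sum(top) / len(top)


def in_domain(query: Set[int], training: List[Set[int]], k: int = 3,
              threshold: float = 0.5) -> bool:
    """Hypothetical AD rule: inside the domain if the k-NN similarity reaches the threshold."""
    return knn_similarity(query, training, k) >= threshold


training = [{1, 2, 3, 4}, {1, 2, 3, 5}, {2, 3, 4, 6}, {10, 11, 12}]
print(in_domain({1, 2, 3, 4}, training))   # True: close to the training data
print(in_domain({20, 21, 22}, training))   # False: shares no bits with any training compound
```

As the table notes, a score like this reflects only how far the query lies from the training data. It says nothing about noise in the measured activities themselves.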
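The nonconformity mechanism in the Conformal Prediction row can be sketched as normalized inductive (split) conformal regression. The sketch assumes three things:

- a calibration set held out from model training;
- a nonconformity score of |y − ŷ| divided by a per-compound `difficulty`;
- a positive `difficulty` value supplied by the user, for example one derived from dissimilarity to the training data.

The names `difficulty` and `epsilon` are hypothetical illustrations. With exchangeable data the interval has nominal coverage of at least 1 − `epsilon`.

```python
import math
from typing import List, Tuple


def conformal_quantile(scores: List[float], epsilon: float) -> float:
    """Return the ceil((n + 1)(1 - epsilon))-th smallest calibration score."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - epsilon))
    if rank > n:
        return math.inf  # too few calibration compounds for this confidence level
    return sorted(scores)[rank - 1]


def conformal_interval(y_pred: float, difficulty: float,
                       cal_true: List[float], cal_pred: List[float],
                       cal_difficulty: List[float],
                       epsilon: float = 0.2) -> Tuple[float, float]:
    """Normalized inductive conformal interval with nominal coverage 1 - epsilon.

    Nonconformity = |y - y_hat| / difficulty (difficulty must be > 0), so a
    compound with a larger assumed difficulty, e.g. one less similar to the
    training data, receives a proportionally wider error band.
    """
    scores = [abs(t - p) / d for t, p, d in zip(cal_true, cal_pred, cal_difficulty)]
    half_width = conformal_quantile(scores, epsilon) * difficulty
    return y_pred - half_width, y_pred + half_width


# Calibration compounds held out from model training (illustrative numbers only)
cal_true, cal_pred, cal_diff = [1.0, 2.0, 3.0], [1.5, 2.25, 3.0], [1.0, 1.0, 1.0]
print(conformal_interval(10.0, 1.0, cal_true, cal_pred, cal_diff, epsilon=0.25))  # (9.5, 10.5)
print(conformal_interval(10.0, 2.0, cal_true, cal_pred, cal_diff, epsilon=0.25))  # (9.0, 11.0)
```

Doubling the assumed difficulty doubles the interval width. This is how the "less similar, less certain" assumption enters the error bands.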
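Isotonic regression, mentioned in the Probability Calibration row, can be sketched with the pool-adjacent-violators algorithm. It maps raw classifier scores to calibrated probabilities through a non-decreasing step function, which is why it makes no assumption about the curve's form. This is a simplified illustration: prediction uses a plain step lookup rather than interpolation, and the scores and labels below are made up. In practice a library routine such as scikit-learn's `IsotonicRegression` would typically be used.

```python
import bisect
from itertools import groupby
from typing import List, Tuple

Block = Tuple[float, float, float]  # (lowest raw score, highest raw score, calibrated probability)


def isotonic_fit(scores: List[float], labels: List[int]) -> List[Block]:
    """Pool-adjacent-violators: fit a non-decreasing map from raw score to P(label = 1)."""
    pairs = sorted(zip(scores, labels))
    blocks = []  # each entry: [sum of labels, count, lowest score, highest score]
    for s, group in groupby(pairs, key=lambda p: p[0]):
        ys = [y for _, y in group]  # identical raw scores share one calibrated value
        blocks.append([float(sum(ys)), len(ys), s, s])
        # Merge backwards while the previous block's mean exceeds the newest block's mean
        while len(blocks) > 1 and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]:
            total, count, _, hi = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][3] = hi
    return [(lo, hi, total / count) for total, count, lo, hi in blocks]


def isotonic_predict(model: List[Block], score: float) -> float:
    """Step-function lookup; scores outside the fitted range take the nearest end value."""
    highs = [hi for _, hi, _ in model]
    i = bisect.bisect_left(highs, score)
    return model[min(i, len(model) - 1)][2]


raw = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]   # uncalibrated classifier scores (illustrative)
active = [0, 1, 0, 1, 1, 1]            # observed labels for the same compounds
model = isotonic_fit(raw, active)
print(isotonic_predict(model, 0.25))   # 0.5
```

With only six calibration points, each step rests on one or two observations. This illustrates the table's point that isotonic regression needs many calibration points and can overfit.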
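The Gaussian process row's claims can be illustrated with a minimal one-dimensional sketch of exact GP regression. These are that the predictive uncertainty grows away from the training data and that every prediction involves all training points. The sketch assumes a squared-exponential (RBF) kernel with fixed, hand-picked hyperparameters (`length_scale`, `variance` and `noise` are illustrative; in practice they are usually fitted to the data). It follows the standard Cholesky-based predictive equations, and the Cholesky factorization is the step that costs O(n³) in the number of training points. For compounds, the scalar inputs would be replaced by descriptor vectors.

```python
import math
from typing import List, Tuple


def rbf(x1: float, x2: float, length_scale: float = 1.0, variance: float = 1.0) -> float:
    """Squared-exponential covariance between two scalar inputs."""
    return variance * math.exp(-0.5 * ((x1 - x2) / length_scale) ** 2)


def cholesky(A: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L with A = L L^T (A must be symmetric positive definite); O(n^3)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(A[i][i] - s) if i == j else (A[i][j] - s) / L[j][j]
    return L


def solve_lower(L: List[List[float]], b: List[float]) -> List[float]:
    """Forward substitution for L x = b."""
    x: List[float] = []
    for i in range(len(b)):
        x.append((b[i] - sum(L[i][k] * x[k] for k in range(i))) / L[i][i])
    return x


def solve_upper_t(L: List[List[float]], b: List[float]) -> List[float]:
    """Back substitution for L^T x = b."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (b[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x


def gp_predict(x_train: List[float], y_train: List[float], x_query: float,
               noise: float = 1e-2, length_scale: float = 1.0,
               variance: float = 1.0) -> Tuple[float, float]:
    """Posterior mean and variance of the latent function at x_query."""
    K = [[rbf(a, b, length_scale, variance) + (noise if i == j else 0.0)
          for j, b in enumerate(x_train)] for i, a in enumerate(x_train)]
    L = cholesky(K)                                     # uses every training point
    alpha = solve_upper_t(L, solve_lower(L, y_train))   # alpha = K^{-1} y
    k_star = [rbf(x_query, b, length_scale, variance) for b in x_train]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = rbf(x_query, x_query, length_scale, variance) - sum(vi * vi for vi in v)
    return mean, max(var, 0.0)


x_train, y_train = [0.0, 1.0, 2.0], [0.0, 1.0, 0.0]
for x in (1.0, 5.0):  # one query at a training point, one far from the data
    mean, var = gp_predict(x_train, y_train, x)
    print(f"x={x}: mean={mean:.3f}, std={math.sqrt(var):.3f}")
```

Far from the training inputs, the predicted mean reverts towards the prior mean of zero. The standard deviation approaches the prior value `sqrt(variance)`.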