Learning Regional Uncertainty to Calibrate Predictive Models in Clinical Settings
Poster, Health Data Science Poster Session, Boston, MA, USA
The use of machine learning (ML) in clinical settings has grown rapidly in recent years. These models have shown strong performance in predicting disease risk and treatment outcomes, detecting tumors in medical images, and more. Despite this, many clinicians remain hesitant to rely on ML in practice. A key reason is the models' black-box nature: it is often unclear why a model makes a particular prediction, and that lack of transparency can be dangerous in clinical applications. The stakes are especially high for false positives and false negatives. In treatment decisions, false negatives can cause patients to miss necessary care, potentially with serious health consequences, while false positives may expose patients to unnecessary treatments and harmful side effects, ultimately reducing their quality of life. This challenge is compounded by a common issue in ML classification: overconfidence. Many classifiers achieve high accuracy (e.g., 80% or higher) while still assigning high probabilities to incorrect predictions. If a model is uncertain about a particular prediction, that uncertainty should be reflected in its output. However, reporting separate uncertainty measures alongside predictions often only complicates the decision-making process. We propose a local, similarity-aware, post-hoc smoothing algorithm for predictive probabilities. Our method identifies regions of uncertainty in the feature space and directly adjusts predicted probabilities using a shrinkage estimate to reduce overconfidence. On simulated data, this approach reduced conditional misclassification rates by as much as 16%. We also show that, in terms of conditional misclassification, our smoothed probabilities consistently outperform the original model outputs, even for neural networks.
This work underscores the importance of moving beyond accuracy as the sole measure of model success and toward safe models that represent both what they know and what they don't.
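The abstract does not specify how neighborhoods are defined or which shrinkage estimator is used, so the following is only a minimal illustrative sketch of local, similarity-based shrinkage of predicted probabilities, not the authors' algorithm. It assumes a labeled calibration set, Euclidean k-nearest neighbors as the notion of similarity, and a Beta-Binomial posterior mean as the shrinkage estimate: the model's probability acts as `tau` pseudo-observations, and the labels of the `k` nearest calibration points act as observed data. The function name `smooth_probabilities` and the parameters `k` and `tau` are hypothetical.

```python
import numpy as np


def smooth_probabilities(X_query, p_query, X_cal, y_cal, k=10, tau=10.0):
    """Shrink model probabilities toward the local label rate of nearby points.

    Hypothetical sketch (not the poster's method): the smoothed probability is
    the posterior mean under a Beta(tau * p, tau * (1 - p)) prior centered on
    the model output, updated with the labels of the k nearest calibration
    points (Euclidean distance).
    """
    X_query = np.asarray(X_query, dtype=float)
    X_cal = np.asarray(X_cal, dtype=float)
    p_query = np.asarray(p_query, dtype=float)
    y_cal = np.asarray(y_cal, dtype=float)

    # Cannot use more neighbors than there are calibration points.
    k = min(k, len(X_cal))

    # Pairwise squared Euclidean distances, shape (n_query, n_cal).
    d2 = ((X_query[:, None, :] - X_cal[None, :, :]) ** 2).sum(axis=2)

    # Indices of the k nearest calibration points for each query point.
    nbrs = np.argsort(d2, axis=1)[:, :k]

    # Count of positive labels among each query point's neighbors.
    pos = y_cal[nbrs].sum(axis=1)

    # Beta-Binomial posterior mean: model output weighted as tau pseudo-counts.
    return (tau * p_query + pos) / (tau + k)
```

For example, if the model outputs 0.95 for a point whose four nearest calibration neighbors are split two positive and two negative, then with `k=4` and `tau=4.0` the smoothed probability is (4 × 0.95 + 2) / 8 = 0.725, pulling an overconfident prediction back in a region where the labels are mixed. Where neighboring labels agree with the model, the adjustment is small. Larger `tau` trusts the original model more; larger `k` widens the neighborhood.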
