Extended Data Table 5 Future and cross-site generalizability experiments

From: A clinically applicable approach to continuous prediction of future acute kidney injury

a, Model performance when trained before the time point tP and tested after tP, on both the entirety of the future patient population and on subgroups of patients for whom the model has or has not seen historical information during training. The model maintains a comparable level of performance on unseen future data, with a higher level of sensitivity of 59% for a time window of 48 h ahead of the AKI episode and a precision of 2 false positives per step for each true positive. The ranges correspond to bootstrap pivotal 95% confidence intervals with n = 200 bootstrap samples. Note that this experiment is not a replacement for a prospective evaluation of the model.

b, Cohort statistics for a, shown for before and after the temporal split tP that was used to simulate model performance on future data.

c, Comparison of model performance when applied to data from previously unseen hospital sites. Data were split across sites such that 80% of the data were in group A and 20% of the data were in group B. No site from group B was present in group A, and vice versa. The data were split into training, validation, calibration and test sets in the same way as in the other experiments. The table reports model performance when trained on site group A when evaluating on the test set within site group A versus the test set within site group B for predicting all AKI severities up to 48 h ahead of time. A comparable performance is seen across all key metrics. Bootstrap pivotal 95% confidence intervals are calculated using n = 200 bootstrap samples. Note that the model needs to be retrained to generalize from the population represented by the US Department of Veterans Affairs dataset to different demographics and sets of clinical pathways and hospital processes.
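The two procedures named in this caption, a pivotal bootstrap interval and a split in which no hospital site appears in both groups, can be illustrated with a minimal Python sketch. This is not the authors' code. The function names, the linear-interpolation quantile, the greedy site assignment and the synthetic inputs are all assumptions made for illustration; the caption states only the interval type, the number of bootstrap samples (n = 200) and the approximate 80%/20% proportions.

```python
import random
from typing import Callable, Dict, Sequence, Set, Tuple


def quantile(sorted_values: Sequence[float], q: float) -> float:
    """Linear-interpolation quantile of an already sorted sequence."""
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] * (1 - fraction) + sorted_values[upper] * fraction


def pivotal_bootstrap_ci(
    samples: Sequence[float],
    statistic: Callable[[Sequence[float]], float],
    n_bootstrap: int = 200,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Pivotal (basic) bootstrap interval: reflect the bootstrap quantiles
    around the point estimate computed on the original sample."""
    rng = random.Random(seed)
    point_estimate = statistic(samples)
    # Resample with replacement and recompute the statistic each time.
    replicates = sorted(
        statistic(rng.choices(samples, k=len(samples))) for _ in range(n_bootstrap)
    )
    q_low = quantile(replicates, alpha / 2)
    q_high = quantile(replicates, 1 - alpha / 2)
    return 2 * point_estimate - q_high, 2 * point_estimate - q_low


def split_sites(
    site_sizes: Dict[str, int], fraction_a: float = 0.8, seed: int = 0
) -> Tuple[Set[str], Set[str]]:
    """Assign whole sites to group A or B so that no site is in both.
    Assumption: sites are shuffled, then added to A until A holds roughly
    fraction_a of all records; the remaining sites go to B."""
    rng = random.Random(seed)
    sites = list(site_sizes)
    rng.shuffle(sites)
    target = fraction_a * sum(site_sizes.values())
    group_a: Set[str] = set()
    group_b: Set[str] = set()
    records_in_a = 0
    for site in sites:
        if records_in_a < target:
            group_a.add(site)
            records_in_a += site_sizes[site]
        else:
            group_b.add(site)
    return group_a, group_b


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


# Synthetic example: 1 = episode flagged in time, 0 = missed.
detected = [1] * 59 + [0] * 41
ci_low, ci_high = pivotal_bootstrap_ci(detected, mean)

# Hypothetical site names and record counts.
group_a, group_b = split_sites({"s1": 500, "s2": 300, "s3": 100, "s4": 100})
```

Because sites are assigned whole, the realized proportions depend on site sizes and only approximate the requested fraction. The pivotal interval differs from the percentile interval in that the bootstrap quantiles are reflected around the point estimate, so it need not be symmetric about that estimate.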