Key Summary Points

Why carry out this study?

A limited number of studies on economic evaluation of deep learning (DL) for diabetic retinopathy (DR) screening in high-income countries have demonstrated the potential for cost-saving with DL from a healthcare provider perspective. However, very few studies evaluated DL for DR screening in a middle-income country (MIC).

This study was conducted to analyze the cost-utility of DL compared to trained non-physician human graders (HG) to screen DR over a lifetime horizon of patients from both societal and healthcare provider perspectives.

What was learned from this study?

Our cost-utility analysis showed that in the context of MICs, such as Thailand, screening for DR using DL, compared to HG, may yield a high incremental cost-effectiveness ratio (ICER) over a lifetime horizon of patients from a healthcare provider perspective. This was because the much higher sensitivity of DL compared to that of HG led to a higher treatment cost. However, screening using DL resulted in less bilateral blindness, which in turn made DL cost-saving relative to HG from a societal perspective.

Our data provide an economic rationale for decision makers to expand DL-based DR screening in MICs with similar prevalence of diabetes, shortages of skilled human resources for primary screening, and low compliance to referrals for treatment. In general, DR screening using DL or HG could achieve greater cost-savings with higher compliance to referrals for treatment.

Policy makers should be aware of the budget impact of treating more patients with sight-threatening DR (STDR) following clinical deployment of DL.

Introduction

Deep learning (DL), a type of artificial intelligence (AI), has been retrospectively and prospectively validated as an effective tool for screening of diabetic retinopathy (DR), one of the leading causes of blindness worldwide, with levels of performance on par with expert ophthalmologists [1,2,3].

To build further evidence toward real-world deployment of DL for DR, rigorous health economic evaluations are essential to ensure appropriate allocation of healthcare budgets. While a few economic evaluations conducted in high-income countries found a potential for DL to be cost-effective for DR screening, few studies have been conducted for resource-limited settings, such as middle-income countries [4].

Thailand, a middle-income country (MIC) with a population of 70 million people and a gross domestic product of US$ 7189.04 per capita in 2020, is one of the few countries where a nationwide screening program for DR has been well established. The program, supported by the Ministry of Public Health, has been conducted across all provinces and administered by provincial health officers. Most of the approximately 700 screening units throughout the country are at the primary care level and use grading of color fundus photographs (CFP) for identification of referrals to ophthalmologists. Given the shortage of ophthalmologists, the grading is performed by trained, non-physician healthcare personnel (HG).

To reduce demands on limited human resources and expand access to screening in the country, DL approaches have been evaluated [5]. We previously conducted a retrospective validation study of a DL model for detecting referable DR in Thai patients with diabetes. Using a dataset of > 10,000 CFPs from patients in the Thai diabetes registry, we found the model to have a high sensitivity (0.97) and specificity (0.96) for the detection of referable DR [6]. Following this, a prospective interventional study in nine primary health centers validated the DL model’s feasibility for real-time DR screening and confirmed the high sensitivity (0.91) and specificity (0.95) [7]. Human-centered evaluative research was also conducted, reporting that patients and health care professionals were receptive to deploying DL in the real world [5].

The cost burden of patients with DR without screening to detect referrals for timely intervention was demonstrated in a recent study on cost-effectiveness analysis of DR screening using DL in low-income patients at primary care centers in the US. The estimated cost incurred to patients who received routine care by ophthalmologists at referral centers without screening was 2082.91 USD per patient compared to 1596.99 USD when they received screening by DL [8].

Thus, to support further implementation of DL for DR screening, we conducted this health economic evaluation to inform decision makers regarding the deployment of DL compared with HG in the context of an MIC. The results of this study can be applied not only to Thailand but also to other MICs with similar prevalence of diabetes and limitations in resources.

Methods

We constructed a model for cost-utility analysis (CUA) using data derived from previously published validation studies of the DL system for DR screening in Thailand [6]. All the studies were approved by the Ethics Committee of Rajavithi Hospital; all patients provided written informed consent. This economic evaluation was conducted and reported according to Consolidated Health Economic Evaluation Reporting Standards (CHEERS) 2022 Checklist provided in Supplementary Material. This study was approved by the Institutional Review Board of Rajavithi Hospital, Thailand (COA: 087/2562) and conducted according to the Declaration of Helsinki.

Target Population

A hypothetical cohort of patients with type 2 diabetes, aged 40 years, not previously screened for DR, was modeled for this CUA. This cohort was selected because the national program predominantly screens patients with type 2 diabetes older than 40 years and the model accounts for yearly screenings for non-referred patients. The patients were scheduled for annual DR screening and were referred for treatment if diagnosed with sight-threatening DR (STDR). STDR was defined as either eye having diabetic macular edema (DME), severe non-proliferative DR (NPDR), or proliferative DR (PDR). These definitions are based on the International Clinical Classification of DR (ICCDR) [9, 10].

Model Structure

The CUA, using a Markov model with an embedded decision tree, was undertaken from both societal and provider perspectives. Costs and health outcomes were evaluated over a lifetime horizon of patients with a discount rate of 3% a year for future costs and outcomes. The decision tree (Fig. 1A) illustrates the screening processes for patients. They received annual screening at primary health centers by grading of CFP by either HG or DL until they exited. An exit occurred when a patient was graded as a referral, was graded as ungradable, or died. A patient was classified as ungradable when the CFPs of both eyes were ungradable, or when the CFP of one eye was ungradable and that of the fellow eye was graded as non-STDR. The proportion of ungradable patients was found to be similar in both DL and HG arms in our previous study [11].

Fig. 1

Structure of the cost utility analysis model. A Decision tree representing diabetic retinopathy screening options for patients with diabetes; B Markov model structure representing the clinical progression of diabetic retinopathy (adapted from the Markov model of Ben et al. [12]). BB bilateral blindness, DME diabetic macular edema, DR diabetic retinopathy, STDR sight-threatening diabetic retinopathy. Non-STDR includes mild and moderate non-proliferative diabetic retinopathy (NPDR) without DME. Severe NPDR and proliferative diabetic retinopathy (PDR) including DME were defined as the STDR health state

All patients with STDR in both arms received additional confirmatory grading by retinal specialists. Patients confirmed to be true positives were referred for treatment. Those confirmed as false positives and those with negative screening results were re-screened in the following year.

A Markov model was developed using Microsoft Excel (Microsoft Corp., Redmond, WA) comprising five health states based on the natural history of the disease and a 1-year cycle length (Fig. 1B). Transitional probabilities in this model are shown in Table 1. According to the progression rates of DR, patients could remain stable or progress towards bilateral blindness or death. A half-cycle correction was applied to account for events and transitions that might occur at any point during the cycle.
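The cohort calculation described above can be sketched in a few lines. This is an illustrative stand-in for the Excel model, not the authors' implementation: the state labels, the transition matrix, and the utility weights below are hypothetical placeholders (the actual inputs are in Table 1), and discounting each cycle t by 1/(1 + r)^t is one common convention that is assumed here.

```python
# Assumed labels for the five health states; values below are hypothetical.
STATES = ["no_DR", "non_STDR", "STDR", "bilateral_blindness", "dead"]

# Hypothetical annual transition probabilities; each row sums to 1.
P = [
    [0.90, 0.07, 0.00, 0.00, 0.03],
    [0.00, 0.85, 0.10, 0.01, 0.04],
    [0.00, 0.00, 0.85, 0.09, 0.06],
    [0.00, 0.00, 0.00, 0.92, 0.08],
    [0.00, 0.00, 0.00, 0.00, 1.00],
]
UTILITY = [0.85, 0.80, 0.70, 0.50, 0.00]  # hypothetical utility weights


def step(dist, matrix):
    """Advance the cohort distribution by one 1-year cycle."""
    n = len(dist)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def discounted_qalys(matrix, utility, start, n_cycles, rate=0.03):
    """Discounted QALYs per patient with a half-cycle correction."""
    total = 0.0
    current = list(start)
    for t in range(n_cycles):
        nxt = step(current, matrix)
        # Half-cycle correction: average the state membership at the
        # start and at the end of the cycle.
        mid = [0.5 * (a + b) for a, b in zip(current, nxt)]
        cycle_qaly = sum(m * u for m, u in zip(mid, utility))
        total += cycle_qaly / (1.0 + rate) ** t
        current = nxt
    return total


# Everyone starts without DR; a 60-cycle horizon is an arbitrary choice here.
qalys = discounted_qalys(P, UTILITY, [1, 0, 0, 0, 0], n_cycles=60)
```

Lifetime costs can be accumulated the same way by replacing the utility vector with per-state annual costs.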

Table 1 Input parameters used in the base case analysis

Screening Strategies

This model reflected differences in the diagnostic accuracy of DL as compared to HG. DL had higher sensitivity (0.950 for DL and 0.737 for HG) but slightly lower specificity than HG (0.980 for DL and 0.986 for HG) [11]. Another difference, not captured in this model, was that DL enabled immediate results, whereas HG results could be delayed by 1–2 weeks if primary gradings could not be completed at the point of care [5]. The timing of results can lead to differences in the proportion of patients who adhere to follow-up at the referral centers for treatment [27, 28]. For example, a prospective study of DR screening using DL to provide immediate results showed that the adherence rate to referrals was 55.4%, compared with the historical adherence rate of 18.7% at 1 year [27]. A scenario analysis was therefore performed to understand the potential impact of this differential compliance.
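To make the accuracy difference concrete, the sketch below applies the reported sensitivity and specificity of each strategy to a single screening round. The cohort size of 1000 and the STDR prevalence of 5% are hypothetical values chosen for illustration only; they are not taken from the study.

```python
def screen(n_patients, prevalence, sensitivity, specificity):
    """Expected outcomes of one screening round for a cohort."""
    diseased = n_patients * prevalence
    healthy = n_patients - diseased
    return {
        "true_pos": diseased * sensitivity,         # STDR correctly referred
        "false_neg": diseased * (1 - sensitivity),  # STDR missed this round
        "false_pos": healthy * (1 - specificity),   # referred without STDR
    }


# Accuracy values as reported in the text [11].
ACCURACY = {"DL": (0.950, 0.980), "HG": (0.737, 0.986)}

# Hypothetical cohort: 1000 screened patients, 5% with STDR.
results = {
    name: screen(1000, 0.05, sens, spec)
    for name, (sens, spec) in ACCURACY.items()
}
```

Under these assumed inputs, DL refers more true cases per round and misses fewer, at the price of slightly more false positives sent for confirmatory grading, which is the mechanism behind the higher treatment cost in the DL arm.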

Model Input Parameters

Transitional Probabilities and Screening Performance

The baseline prevalence of the disease severity was obtained from our previous studies [11]. The transition probabilities were derived from the literature [11, 12]. The proportion of patients with bilateral DR was estimated at 75% in the model [29]. Because the relative risk of mortality of patients with diabetes and blindness had not yet been reported and the main causes of blindness in patients with DR were from STDR [30], we applied the relative risk of mortality in patients with STDR. The compliance to follow-up treatment after true positive screening results had not yet been systematically quantified in the Thai health system. We therefore used 60% for both screening modalities at baseline based on random sampling from a few screening sites.

We used 50% as the screening uptake according to data from the Ministry of Public Health in the base case analysis. The age-specific all-cause mortality, derived from the Thai population’s life table data, was used and integrated with relative risks of mortality among patients in the different health states.
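One common way to combine a life-table death probability with a relative risk is to convert the probability to a rate, scale the rate, and convert back, which keeps the result below 1. The text does not state which method was used in the model, so the sketch below is an assumption, and the input values in the example are hypothetical.

```python
def adjusted_death_prob(baseline_prob, relative_risk):
    """Scale an annual death probability by a relative risk.

    Equivalent to p -> rate = -ln(1 - p), rate * RR, then back to a
    probability: 1 - exp(-RR * rate) = 1 - (1 - p) ** RR.
    """
    if not 0.0 <= baseline_prob <= 1.0:
        raise ValueError("baseline_prob must lie in [0, 1]")
    if relative_risk < 0:
        raise ValueError("relative_risk must be non-negative")
    return 1.0 - (1.0 - baseline_prob) ** relative_risk


# Hypothetical example: 1% baseline annual mortality, relative risk of 2.
p_state = adjusted_death_prob(0.01, 2.0)
```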

Costs

Direct medical costs of the screenings included the costs of fundus cameras, CFP capture, and image interpretation. The first two costs were the same in both arms. The average direct medical cost for image interpretation by HG was estimated at 56 Thai baht (THB) (~ US$ 1.7) per patient. Additional costs included setting up certified training courses. The cost for image interpretation and other related practice expenses using DL was set at the price of 32 THB (~ US$ 1) per patient, a pricing estimate comparable with HG interpretation. We probed this price per patient further in threshold analyses.

Total annual cost related to screening at primary health centers using DL, HG, and confirmation of referrals at tertiary health centers by retinal specialists were 375 (~ US$ 11.7), 399 (~ US$ 12.5), and 535 (~ US$ 16.7) THB, respectively. The details of these costs, including the cost of outpatient service, fundus photography, and visual acuity measurement, are given in Fig. 2.

Fig. 2

Summary of details of screening and treatment costs. BB bilateral blindness, DL deep learning, DR diabetic retinopathy, DME diabetic macular edema, HG trained human graders, STDR sight-threatening diabetic retinopathy, THB Thai baht. Unit cost of treatment of DME includes cost of bevacizumab and intravitreal administration listed in Table 1. Unit cost of treatment of STDR includes cost of laser photocoagulation and vitrectomy (not shown in this figure but shown in Table 1). Treatment for DME includes cost of outpatient service, bevacizumab, intravitreal injection, and macular imaging by optical coherence tomography. Treatment for STDR in the first year covers the cost of outpatient service and cost of laser photocoagulation, or cost of vitrectomy and inpatient service

Cost of treatment for STDR without DME included the cost of pan-retinal photocoagulation (PRP), at the average of two sessions, at 7000 THB per patient. For PDR, the cost was the weighted average between the proportion of patients (two-thirds) treated with PRP and the proportion (one-third) treated with surgery [31], which was vitrectomy, at an average of 30,000 THB per patient. Patients with DME were treated with 17 doses of bevacizumab, the number of intravitreal injections during the first 5 years of treatment derived from clinical trials [22]. Total cost of treatment for DME was 59,503 THB in the first year, 11,388 THB in the second year, and 619 THB for subsequent years. The cost of bevacizumab was used since this medication is listed on Thailand’s National Essential List of Medicines and used as the first-line treatment for DME in the country.
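The treatment cost inputs can be written out as a small worked calculation. The sketch below assumes one reading of the paragraph above: that 7000 THB is the PRP cost, that 30,000 THB is the vitrectomy cost, and that 619 THB applies to each year from the third year onward. If the authors intended 30,000 THB to be the weighted average itself, the first function does not apply.

```python
PRP_COST_THB = 7_000          # pan-retinal photocoagulation, two sessions
VITRECTOMY_COST_THB = 30_000  # assumed to be the surgery cost per patient


def pdr_treatment_cost(share_prp=2 / 3):
    """Weighted average PDR cost: share treated with PRP, rest with surgery."""
    return share_prp * PRP_COST_THB + (1 - share_prp) * VITRECTOMY_COST_THB


def dme_treatment_cost(year):
    """Annual DME treatment cost (THB) by year since treatment start."""
    if year < 1:
        raise ValueError("year counts from 1")
    if year == 1:
        return 59_503
    if year == 2:
        return 11_388
    return 619  # assumed to apply to every subsequent year


# Undiscounted 5-year DME cost under the assumptions above.
dme_five_year = sum(dme_treatment_cost(y) for y in range(1, 6))
```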

Direct non-medical costs comprised costs of food, transportation, accommodation, equipment and facilities for patients, and productivity loss of caregivers, which was calculated based on gross national income per capita, assuming caregiving time at 4 h per day. These costs per patient are the same for both arms.

Indirect costs were omitted to avoid double counting in the CUA according to the recommendation from the Thai Health Technology Assessment guideline. All incurred costs were converted to 2020 values using the consumer price index for Thailand and were converted to US$ using the exchange rate as of 1 July 2021 of 32.02 THB per US$. The cost of blindness consists primarily of visual rehabilitation aids and services, including residential and community care [32]. In Thailand, visual rehabilitation clinics are operated only in a few training centers with low patient volumes [33]. Therefore, this cost is minimal compared to the treatment costs and is not considered in the health provider perspective. Disaggregated costs from both societal and provider perspectives are shown in Fig. 3.
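The two cost adjustments described above reduce to simple ratios. In the sketch below the exchange rate is the one stated in the text, while the consumer price index values in the example are hypothetical placeholders, since the actual index series is not given here.

```python
THB_PER_USD = 32.02  # exchange rate as of 1 July 2021, from the text


def to_2020_values(cost_thb, cpi_source_year, cpi_2020):
    """Inflate a cost from its source year to 2020 using a CPI ratio."""
    return cost_thb * cpi_2020 / cpi_source_year


def thb_to_usd(cost_thb):
    """Convert THB to US$ at the stated exchange rate."""
    return cost_thb / THB_PER_USD


# The WTP threshold of 160,000 THB converts to roughly US$ 4997.
wtp_usd = thb_to_usd(160_000)

# Hypothetical CPI values (98.0 in the source year, 100.0 in 2020).
cost_2020 = to_2020_values(1_000, cpi_source_year=98.0, cpi_2020=100.0)
```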

Fig. 3

Total and disaggregated costs for the two screening strategies (HG and DL). The costs are for the cost-utility analysis at base case from both societal and provider perspectives. BB bilateral blindness, DL deep learning, DME diabetic macular edema, HG trained human graders, STDR sight-threatening diabetic retinopathy. Costs of treatment of DME and STDR without DME are presented separately. We assumed no direct medical costs for bilateral blindness; all the values are Thai baht in 2020

Health Utilities

Since the health utility weights for patients with DR in Thailand or Asia are not available, we applied those from the next closest match, i.e., those associated with patients with DR at a primary care service in Brazil, another MIC [34]. In that study, the utility values were estimated using the Brazilian EuroQol five dimension (EQ-5D) tariffs from patients without DR, with non-STDR, STDR, and bilateral blindness. Since DME was not considered in that model, we assumed that those who progressed into STDR with and without DME had the same utility values.

All these input parameters are shown in detail in Table 1 and Fig. 2.

Result Presentation

Total cost, life-years (LY), quality-adjusted life-years (QALY), and disaggregated costs for each screening strategy were reported. The incremental cost-effectiveness ratio (ICER) per QALY gained in THB for DL versus HG was calculated by dividing the incremental cost by the incremental QALY. The Thai societal willingness-to-pay (WTP) threshold was 160,000 THB (~ US$ 4997) per QALY gained [24].
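The ICER calculation, including the cases where a ratio is not meaningful, can be sketched as follows. The string labels and the handling of a zero QALY difference are choices made for this illustration, not conventions taken from the study.

```python
def icer(delta_cost, delta_qaly):
    """Incremental cost-effectiveness ratio of a new strategy vs a comparator.

    Returns a label instead of a ratio when one strategy dominates.
    """
    if delta_qaly == 0:
        raise ValueError("ICER is undefined when incremental QALY is zero")
    if delta_qaly > 0 and delta_cost <= 0:
        return "dominant"    # more effective and not more costly
    if delta_qaly < 0 and delta_cost >= 0:
        return "dominated"   # less effective and not less costly
    return delta_cost / delta_qaly


# Illustrative inputs only: 100 THB extra cost for 0.5 extra QALYs.
example = icer(100.0, 0.5)
```

A strategy with a numeric ICER is usually judged cost-effective when that ratio falls below the WTP threshold, here 160,000 THB per QALY gained.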

Uncertainty Analyses

Parameter Uncertainty

To characterize the uncertainties of each input parameter, a one-way sensitivity analysis was performed and the percentage changes in the incremental net monetary benefit (iNMB) from the base case analysis were presented using a tornado diagram. The iNMB of DL versus HG was calculated by multiplying the incremental QALY by the Thai societal WTP threshold of 160,000 THB/QALY and then subtracting the incremental cost. Parameter values were varied over the 95% confidence interval (CI) for the general parameters. The discount rates for costs and outcomes were varied from 0 to 6%.
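A one-way sensitivity analysis of this kind can be sketched as a loop that moves one parameter at a time to its lower and upper bound and records the percentage change in iNMB. In the sketch below the "model" is simply the iNMB formula, and the base values and ranges are hypothetical; in the study each evaluation would be a full run of the Markov model.

```python
WTP_THB = 160_000  # Thai societal WTP threshold per QALY gained


def inmb(delta_qaly, delta_cost, wtp=WTP_THB):
    """Incremental net monetary benefit of DL versus HG."""
    return delta_qaly * wtp - delta_cost


def one_way_sensitivity(model, base, ranges):
    """Percentage change in the model output at each parameter's bounds.

    Rows are sorted widest first, which is the order used to draw a
    tornado diagram. Assumes the base-case output is non-zero.
    """
    base_value = model(**base)
    rows = []
    for name, (low, high) in ranges.items():
        at_low = model(**{**base, name: low})
        at_high = model(**{**base, name: high})
        rows.append((
            name,
            100 * (at_low - base_value) / abs(base_value),
            100 * (at_high - base_value) / abs(base_value),
        ))
    rows.sort(key=lambda row: abs(row[2] - row[1]), reverse=True)
    return rows


# Hypothetical base case and ranges, for illustration only.
base_case = {"delta_qaly": 0.004, "delta_cost": -87.0}
bounds = {"delta_qaly": (0.002, 0.006), "delta_cost": (-200.0, 50.0)}
tornado_rows = one_way_sensitivity(inmb, base_case, bounds)
```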

The robustness of the results was evaluated by probabilistic sensitivity analyses (PSAs). All potential influencing parameters were varied simultaneously. The Monte Carlo simulation was run for 1000 iterations, and the results of PSA were shown as cost-effectiveness acceptability curves (CEACs) and a cost-effectiveness plane (CEP).
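Once the PSA has produced a set of simulated incremental costs and QALYs, a CEAC is the share of iterations in which a strategy has positive iNMB at each WTP value. The sketch below shows only that final step. The draws are generated from hypothetical normal distributions as a stand-in for the 1000 model runs; they are not the study's PSA outputs.

```python
import random


def ceac(draws, wtp_values):
    """Probability that the new strategy is cost-effective at each WTP.

    draws: iterable of (incremental_cost, incremental_qaly) pairs,
    one per PSA iteration.
    """
    draws = list(draws)
    curve = []
    for wtp in wtp_values:
        wins = sum(1 for d_cost, d_qaly in draws if d_qaly * wtp - d_cost > 0)
        curve.append(wins / len(draws))
    return curve


# Stand-in for 1000 Monte Carlo iterations (hypothetical distributions).
rng = random.Random(0)
psa_draws = [(rng.gauss(-87, 500), rng.gauss(0.004, 0.003)) for _ in range(1000)]

wtp_grid = [0, 80_000, 160_000, 240_000]
prob_dl = ceac(psa_draws, wtp_grid)
prob_hg = [1 - p for p in prob_dl]  # with two strategies the curves sum to 1
```

Plotting the same draws as points, with incremental QALY on one axis and incremental cost on the other, gives the cost-effectiveness plane.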

Scenario Analysis

We conducted a threshold analysis varying the unit costs of DR screenings by both strategies at the Thai societal WTP. We also analyzed the effect of varying compliance to referrals for treatment in terms of iNMB between both strategies when the unit costs were fixed at base case. All uncertainty analyses were performed from a societal perspective over a lifetime horizon of patients.
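A threshold analysis amounts to finding the parameter value at which iNMB crosses zero. A minimal sketch using bisection is shown below; the linear iNMB function in the example is hypothetical, and in practice each evaluation would rerun the full model with the candidate unit cost or compliance rate.

```python
def find_threshold(inmb_of, low, high, tol=1e-6):
    """Bisection search for the value where iNMB changes sign.

    inmb_of: function mapping a parameter value to iNMB.
    Requires iNMB to have opposite signs at low and high.
    """
    f_low, f_high = inmb_of(low), inmb_of(high)
    if f_low * f_high > 0:
        raise ValueError("iNMB does not change sign on [low, high]")
    while high - low > tol:
        mid = 0.5 * (low + high)
        f_mid = inmb_of(mid)
        if f_low * f_mid <= 0:
            high = mid
        else:
            low, f_low = mid, f_mid
    return 0.5 * (low + high)


# Hypothetical example: iNMB falls by 2 THB for every 1 THB of unit cost.
threshold_price = find_threshold(lambda price: 100.0 - 2.0 * price, 0.0, 1000.0)
```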

Results

From both perspectives, we found equal effectiveness in LY for HG and DL at 18.53, whereas QALYs were 12.857 and 12.862, respectively. From a societal perspective, DL cost slightly less than HG, by 87 THB (~ US$ 2.6), with an iNMB of 771.5 THB (~ US$ 24.1), making it the dominant option compared to HG. From a provider perspective, DL had a higher incremental cost at 2195 THB (~ US$ 66.7) and the ICER was 512,955 THB (~ US$ 16,020) per QALY gained (Table 2). The CEP is shown in Fig. 4.
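The reported base case figures can be cross-checked against each other. The sketch below backs out the unrounded incremental QALY from the provider-perspective ICER and uses it to recompute the societal iNMB. It assumes that the societal cost difference is -87 THB (DL cheaper) and that the QALY gain is the same from both perspectives; the small gap to the reported 771.5 THB is presumably rounding.

```python
WTP_THB = 160_000              # THB per QALY gained

provider_delta_cost = 2195.0   # THB, DL minus HG (provider perspective)
provider_icer = 512_955.0      # THB per QALY gained
societal_delta_cost = -87.0    # THB, assumed sign: DL costs less than HG

# Incremental QALY implied by the provider-perspective ICER.
delta_qaly = provider_delta_cost / provider_icer

# iNMB = incremental QALY x WTP - incremental cost.
societal_inmb = delta_qaly * WTP_THB - societal_delta_cost

print(f"implied incremental QALY: {delta_qaly:.5f}")
print(f"recomputed societal iNMB: {societal_inmb:.1f} THB")
```

The implied QALY gain of about 0.0043 per patient is consistent with the rounded totals of 12.857 and 12.862, and the provider ICER exceeds the WTP threshold because the same small gain is divided into a much larger incremental cost.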

Table 2 Total costs, quality-adjusted life-years, and incremental cost-effectiveness ratios between the screening modalities from the base case analysis

Fig. 4

Cost-effectiveness plane (CEP). DL deep learning, HG human graders, ICER incremental cost-effectiveness ratio, QALY quality-adjusted life-year, THB Thai baht

The reason for the difference between ICERs from a societal and provider perspective is presented in Fig. 3. The provider perspective did not incorporate the non-medical cost of bilateral blindness, which represents a substantial portion of lifetime costs (68.25% and 69.82% for DL and HG respectively from a societal perspective).

For parameter uncertainty, the tornado diagram (Fig. 5) demonstrates that the relative risk of DR progression in patients who received treatments was the most influential factor related to the change in iNMB. The PSA conducted from a societal perspective reinforces the robustness of the base case results. The CEAC (Fig. 6) illustrates that DL had the higher probability of being cost-effective at 84.6%, compared to HG at 15.4%, at the Thai WTP threshold.

Fig. 5

One-way sensitivity analysis. Tornado diagram from one-way sensitivity analysis showing the percentage changes in incremental net monetary benefit (iNMB) of DL screening versus HG screening from the base case attributable to the change of each parameter. ATB antibiotics, BB bilateral blindness, CVD cardiovascular diseases, DL deep learning, DM diabetes mellitus, DME diabetic macular edema, DR diabetic retinopathy, HG human grader, IVB intravitreal bevacizumab, STDR sight-threatening diabetic retinopathy, THB Thai baht. Labels on the chart (next to bars) indicate input values (minimum and maximum) of each parameter; all the values of costs are Thai baht in 2020

Fig. 6

Cost-effectiveness acceptability curves. The curves compare the probabilities of being cost-effective at different willingness to pay of screening using HG and DL in the base case scenario; all the values are Thai baht in 2020. DL deep learning, HG human graders, QALY quality-adjusted life-year, THB Thai baht

Concerning the unit costs, in the base case, when the unit cost of image interpretation by HG was fixed at 56.11 THB (~ US$ 1.7) per person, DL remained cost-effective from the societal perspective for unit costs of image interpretation up to 150.5 THB (~ US$ 4) per person at the Thai societal WTP threshold. We found that screening by DL could remain cost-effective even if compliance with referrals for treatment were lower than that of HG. The threshold of this compliance for screening by DL was 43.57%. This means screening by DL remained cost-effective when at least 44% of referrals detected by DL received treatment, compared to 60% compliance at base case for screening by HG. Screening using either strategy could achieve greater cost-savings with higher compliance to referrals for treatment, as shown in Fig. 7.

Fig. 7

Additional sensitivity analyses. For figure (A), the net monetary benefit (NMB) of each screening modality was calculated from total QALY of the intervention multiplied by the Thai willingness-to-pay threshold (160,000 THB per QALY) and then subtracting the total cost of the intervention. Incremental NMB (iNMB) was the difference between NMB of screening using DL and HG. Values in the table are iNMB of screening using DL and HG. Green cells represent the scenarios when the NMB of DL is greater than the NMB of HG (DL is preferred to HG for the screening). The uptake and compliance rates of HG screening were fixed as base case at 50% and 60%, respectively. Figure (B) shows iNMB between DL and HG screenings increases from zero when the compliance rate of DL screening is more than 44%

Discussion

In this DR screening program, we found DL had lower incremental costs compared to HG from a societal perspective. From a provider perspective, DL could detect more referral cases and therefore result in greater costs incurred from treatments. However, when the societal cost incurred from bilateral blindness, which imposes both humanistic and economic burden [32], was included, this model showed DL as a cost-saving screening tool. The saving of this societal cost, which mainly covers the costs of visual rehabilitation service by caregivers and community care, is hidden from the provider perspective.

In settings like Thailand, where a well-established screening program and allocation of hardware equipment already exist, we anticipate that the costs of adding DL software into the screening workflow are minimal. Since the labor cost is low and patients with STDR usually comprise the minority of patients screened, the additional labor of retinal specialists to confirm the larger number of positive cases of STDR detected by DL should incur minimal cost.

While the cost of AI for DR screening has not yet been standardized, Scotland et al. proposed that the cost should include costs of initial implementation and maintenance with a lifespan of 10 years. The authors estimated that the server-based AI software used in Scotland under universal healthcare should be charged at 0.14 pound (6.44 THB or ~ US$ 0.2) per patient annually in 2006 [35]. In our study, we set the price charge of DL, also server-based and used under universal health coverage, at US$ 1 per patient annually. In a recently published CUA conducted in rural China, which found that AI-based DR screening was more cost-effective compared with ophthalmologist screening or no screening, the AI was priced similarly at US$ 1.447 per patient, based on market quotation of the supplier [36].

Our CUA supports the lower incremental costs of DL and extends the body of evidence from a cost-minimization analysis [37] and cost-effectiveness analyses [35, 38], which evaluated cost-savings over a year of screening. Our finding that greater patient compliance to referrals can potentially augment cost-effectiveness for DL is a motivation to explore additional strategies for boosting this compliance, such as digital notifications sent to patients. Ideally, a system for monitoring compliance to referrals should be built into a screening program to measure its effectiveness. This compliance may be increased by the capability of DL to provide screening results immediately at the point of care [28]. Liu et al. found that DL could increase this compliance from 18.7% in the historical cohort to 55.4% [27]. Fuller et al. also found in a 5-year CUA that, with 54.9% compliance to referrals from pre-screening by DL compared to 11.5% compliance without pre-screening, DL could reduce costs by 23.3% [8].

A strength of our study is that our model evaluated cost-savings over a lifetime horizon of patients. Moreover, it was based on a nationwide screening program with data representing all health regions. Another strength is the inclusion of the cost of treatment of DME, which has rarely been considered in previously published economic evaluations of DL for DR screening. This inclusion is pivotal because the treatment of DME was the primary cost burden of treatment of STDR in our model. In a US study, the cost of treatment of DME using ranibizumab, which is ten times more expensive than bevacizumab, was included in a 5-year CUA comparing pre-screening of DR using DL versus no pre-screening (refer all patients) in low-income patients. Even with this tremendous cost increase, DL for the pre-screening of DR was still more cost-saving than no pre-screening [8]. A further strength is that our economic evaluation followed the Consolidated Health Economic Evaluation Reporting Standards (CHEERS).

There were some limitations to this study. First, certain data were not available for inclusion in the analysis, such as national data on referral compliance and the health utility weights of patients with DR in Thailand. Second, while the use of the Markov model for capturing changes in clinical health states and the use of health state utility values are traditionally applied for economic evaluations, these applications may have some limitations. Different countries might utilize different classifications of the clinical states of DR. The utility values of each state might also vary among countries [39]. Therefore, the utility values derived from the Brazil study may not exactly reflect the utilities of DR in Thailand.

Another limitation is that this model did not consider the possibility that DL may flag more ungradable patients than HG in real-world scenarios [40]. The patients with negative screening results (non-referrals) were not confirmed by experts; therefore, false negatives were not fully accounted for. DR screening programs in some countries implement a system for random sampling and review of 10% of negative screening results [2, 38]. According to our previous analysis [11], the proportion of false negatives by either DL or HG was a small fraction, and they would be identified in the next screening.

Conclusion

In conclusion, this CUA demonstrates that adoption of DL for a DR screening program in an MIC may result in lower societal costs over a lifetime of patients, providing economic rationale to expand DL-based DR screening in other MICs with similar prevalence of diabetes, clinical specialist shortages, and low compliance to referrals for treatment. However, policy makers should be aware of the budget impact of treating more patients with STDR following clinical deployment of DL. DR screening using DL or HG could achieve greater cost-savings with higher compliance to referrals for treatment. This compliance rate should be monitored as part of DR screening programs.