Comparing Areas under Receiver Operating Characteristic Curves: Potential Impact of the “Last” Experimentally Measured Operating Point

A specific issue in the choice of analytic tool for comparing the estimated performance of systems within the receiver operating characteristic (ROC) paradigm is reviewed: the possible effect of the last experimentally ascertained ROC data point, namely the one with the highest true-positive and false-positive fractions. An example is presented in which the choice of analysis approach determines whether the study conclusion is statistically significant, with the parametric analysis yielding a nonsignificant result and the nonparametric analysis a significant one. Recommendations that should help prevent misinterpretation of results follow.
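The sensitivity of a nonparametric (trapezoidal) area estimate to the last measured operating point can be illustrated with a short sketch. The operating points and the `trapezoidal_auc` helper below are hypothetical, not taken from the article; the sketch only shows the general mechanism, namely that the empirical curve is completed by a straight-line segment from the last measured point to (1, 1), so where that last point falls can materially change the estimated area.

```python
# Illustrative sketch (hypothetical data, not from the article): how the
# "last" experimentally measured operating point affects a nonparametric
# (trapezoidal) estimate of the area under an ROC curve.

def trapezoidal_auc(points):
    """Trapezoidal area under empirical ROC operating points.

    `points` is an iterable of (FPF, TPF) pairs. The anchors (0, 0) and
    (1, 1) are appended, so the curve is implicitly completed by a
    straight line from the last measured point to (1, 1).
    """
    pts = sorted([(0.0, 0.0)] + list(points) + [(1.0, 1.0)])
    return sum((x2 - x1) * (y1 + y2) / 2.0
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

# Two hypothetical readers whose interior operating points agree but
# whose last (highest FPF/TPF) measured points differ:
reader_a = [(0.1, 0.6), (0.3, 0.8), (0.5, 0.9)]
reader_b = [(0.1, 0.6), (0.3, 0.8), (0.9, 0.95)]

auc_a = trapezoidal_auc(reader_a)  # straight line from (0.5, 0.9) to (1, 1)
auc_b = trapezoidal_auc(reader_b)  # straight line from (0.9, 0.95) to (1, 1)
```

A parametric (e.g., binormal) fit would instead extrapolate a smooth curve through the measured points, so the two approaches can rank such readers differently; this is the kind of discrepancy the article examines.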

© RSNA, 2008

Article History

Published in print: 2008