Table 1 Dataset summary for slides and cases used in model development and evaluation.

From: Determining breast cancer biomarker status and associated morphological features using deep learning

 

| | Tertiary teaching hospital | Medical laboratory | Tertiary teaching hospital | Tertiary teaching hospital | TCGA (36 sites) |
| --- | --- | --- | --- | --- | --- |
| DLS Stage 1 (patch-level): uses paired H&E and IHC slides from custom sectioning protocol | | | | | |
| No. of cases (train / tune / test) | 70 / 30 / 64 | 70 / 30 / 0 | NA | NA | NA |
| No. of H&E slides (train / tune / test) | 205 / 80 / 181 | 206 / 85 / 0 | NA | NA | NA |
| No. of patches | See Table 2 | See Table 2 | NA | NA | NA |
| DLS Stage 2 (slide-level): uses biomarker status from the original report | | | | | |
| | Train | Train | Tune | Test | Test |
| No. of cases | 164 | 100 | 164** | 340 | 909 |
| No. of H&E slides | 466 | 291 | 1,377 | 2,313 | 961 |
| ER status* (pos / neg) | 103 / 47 | 91 / 7 | 103 / 47 | 280 / 58 | 679 / 191 |
| PR status* (pos / neg) | 93 / 56 | 85 / 13 | 93 / 56 | 251 / 87 | 573 / 283 |
| HER2 status* (pos / neg) | 16 / 34 | 4 / 81 | 16 / 34 | 11 / 78 | 131 / 739 |
| Nottingham grade (1 / 2 / 3) | 46 / 69 / 49 | 44 / 36 / 20 | 46 / 69 / 49 | 135 / 126 / 79 | 387 / 295 / 227 |

  1. *Total case counts differ between biomarkers because biomarker status was not available in every original pathology report.
  2. **The Stage 2 tune set includes the same cases as the Stage 2 train set from the tertiary teaching hospital dataset, but uses different tumor-containing slides from those cases.
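
The counts in Table 1 can be cross-checked with a little arithmetic. The sketch below transcribes the table into plain Python dictionaries and tests two properties: that the Stage 1 train / tune / test splits add up to the Stage 2 train totals for the same data source, and that no biomarker or grade breakdown exceeds the number of cases in its column (consistent with the first footnote). The dictionary layout and the names `STAGE1`, `STAGE2`, `reported`, and `check_table` are illustrative assumptions, not part of the original study.

```python
# Counts transcribed from Table 1. The dictionary layout and all names here
# are illustrative assumptions, not part of the original study.

# Stage 1: (train, tune, test) splits per data source.
STAGE1 = {
    "tertiary_teaching_hospital": {"cases": (70, 30, 64), "slides": (205, 80, 181)},
    "medical_laboratory": {"cases": (70, 30, 0), "slides": (206, 85, 0)},
}

# Stage 2: totals plus (pos, neg) biomarker counts and (1, 2, 3) grade counts.
STAGE2 = {
    "train_tertiary": {"cases": 164, "slides": 466, "ER": (103, 47),
                       "PR": (93, 56), "HER2": (16, 34), "grade": (46, 69, 49)},
    "train_medlab": {"cases": 100, "slides": 291, "ER": (91, 7),
                     "PR": (85, 13), "HER2": (4, 81), "grade": (44, 36, 20)},
    "tune_tertiary": {"cases": 164, "slides": 1377, "ER": (103, 47),
                      "PR": (93, 56), "HER2": (16, 34), "grade": (46, 69, 49)},
    "test_tertiary": {"cases": 340, "slides": 2313, "ER": (280, 58),
                      "PR": (251, 87), "HER2": (11, 78), "grade": (135, 126, 79)},
    "test_tcga": {"cases": 909, "slides": 961, "ER": (679, 191),
                  "PR": (573, 283), "HER2": (131, 739), "grade": (387, 295, 227)},
}

BREAKDOWNS = ("ER", "PR", "HER2", "grade")


def reported(column):
    """Return the number of cases with a reported value for each breakdown."""
    return {name: sum(column[name]) for name in BREAKDOWNS}


def check_table():
    """Raise AssertionError if the transcribed counts are inconsistent."""
    # Stage 1 splits should add up to the Stage 2 train totals per source.
    pairs = {
        "tertiary_teaching_hospital": "train_tertiary",
        "medical_laboratory": "train_medlab",
    }
    for source, column in pairs.items():
        assert sum(STAGE1[source]["cases"]) == STAGE2[column]["cases"]
        assert sum(STAGE1[source]["slides"]) == STAGE2[column]["slides"]
    # A breakdown can never cover more cases than the column contains.
    for column in STAGE2.values():
        for total in reported(column).values():
            assert total <= column["cases"]
    return True


if __name__ == "__main__":
    check_table()
    for label, column in STAGE2.items():
        print(label, reported(column))
```

Under this transcription, ER status is available for 150 of the 164 tertiary teaching hospital train cases, while HER2 status is available for only 50, which illustrates why the per-biomarker totals in the table differ.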