Table 1 Summary of data used in large-scale quantitative study

From: Similar image search for histopathology: SMILY

| Dataset | Organ site(s) | Categories assessed | Number of slides in the database | Number of patches in the database | Number of slides in the query set | Number of patches in the query set |
| --- | --- | --- | --- | --- | --- | --- |
| Organ-specific | Prostate | 9 histologic features | 15 | 45,000 (5,000 per feature) | 5 | 9,000 (1,000 per feature) |
| Multi-organ | Prostate, breast, colon | 10 histologic features | 45 | 87,000 (3,000 per feature/organ [a]) | 15 | 14,500 (500 per feature/organ [a]) |
| Gleason grading [b] | Prostate | Non-tumor and Gleason patterns 3, 4, 5 (NT, GP3, GP4, GP5) | 20 | 40,000 (10,000 in each category) | 5 | 8,000 (2,000 in each category) |

  1. To avoid biases in the evaluation, we randomly subsampled the original annotated regions, resulting in 5,000 patches per histologic feature per organ
  2. [a] In our study, no lymphocytes were found upon non-exhaustive review of the prostate specimens, so the number of patches excludes lymphocytes for prostate
  3. [b] Not every patch in this dataset was concurrently labeled with histologic features, so 4,000 database patches and 1,600 query patches with both types of annotations were used for assessing the simultaneous match of both Gleason pattern and histologic feature
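
As a quick consistency check, the patch totals in the table can be reproduced from the per-category counts. The sketch below is illustrative only: the function and variable names are hypothetical, and reading the multi-organ total as 29 feature/organ combinations (10 features across 3 organs, minus lymphocytes in prostate) is an inference from the lymphocyte footnote rather than something the table states outright.

```python
# Reproduce the patch totals in Table 1 from the per-category counts.
# All names here are illustrative; only the numbers come from the table.

ORGANS = 3               # prostate, breast, colon
MULTI_FEATURES = 10      # histologic features in the multi-organ dataset
EXCLUDED_COMBOS = 1      # assumption: lymphocytes in prostate (footnote a)


def total_patches(categories: int, per_category: int) -> int:
    """Total patches when every category contributes the same count."""
    return categories * per_category


# 10 features x 3 organs, minus the one excluded feature/organ combination.
multi_combos = ORGANS * MULTI_FEATURES - EXCLUDED_COMBOS

totals = {
    "organ_specific_db": total_patches(9, 5_000),
    "organ_specific_query": total_patches(9, 1_000),
    "multi_organ_db": total_patches(multi_combos, 3_000),
    "multi_organ_query": total_patches(multi_combos, 500),
    "gleason_db": total_patches(4, 10_000),
    "gleason_query": total_patches(4, 2_000),
}

for name, count in totals.items():
    print(f"{name}: {count:,}")
```

Under that reading, each computed total matches the corresponding cell in the table.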