Google Open Source Blog: healthcare

Overview

Digital pathology is changing the way pathology is practiced by making it easier to share images, collaborate with colleagues, and develop new AI algorithms that can improve the quality and cost of medical care. One of the biggest challenges of digital pathology is storing and managing the large volume of data generated. The Google Cloud Healthcare API provides a solution for this with a managed DICOM store, which is a secure, scalable, and performant way to store digital pathology images in a manner that is both standardized and interoperable.

However, performing image retrieval of specific patches (i.e. regions of interest) of a whole slide image (WSI) from the managed DICOM store using DICOMweb can be complex and requires DICOM format expertise. To address this, we are open sourcing EZ WSI (Whole Slide Image) DICOMWeb, a Python library that makes fetching these patches both efficient and easy-to-use.

How EZ WSI DICOMWeb works

EZ WSI DICOMweb facilitates the retrieval of arbitrary and sequential patches of a DICOM WSI from a DICOMWeb compliant Google Cloud Healthcare API DICOM store. Unlike downloading the entire DICOM series WSI and extracting patches locally from that file, which can increase network traffic, latency and storage space usage, EZ WSI DICOMweb retrieves only the necessary tiles for the desired patch directly through the DICOMweb APIs. This is simpler to use and abstracts away the following:

The need to fetch many tiles, which requires an understanding of DICOM data structure (e.g. offset & data hierarchy).
The need for a detailed understanding of the DICOMWeb APIs, REST payloads, and authentication, as well as addressing the possibility of redundant requests if several patches are fetched and there are overlapping tiles.
The need to decode images on the server if client side decoding is not supported, which increases the time it takes to transfer data and the size of the data being transferred.

EZ WSI DICOMWeb allows researchers and developers to focus on their ML tasks rather than the intricacies of DICOM. Developers do not need to have an in-depth understanding of DICOM data structuring or the DICOM API. The library provides a simple and intuitive functionality that allows developers to efficiently fetch DICOM images using only the Google Cloud Platform (GCP) Resource Name and DICOM Series path without any pixel recompression.

Case Study: Generating Patches for AI Workflows

A typical pathology WSI could be on the order of 40,000 pixels in length or width. However, an AI model that is trained to assess that WSI may only analyze a patch that is 512 x 512 pixels at a time. The way the model can operate over the entire WSI is by using a sliding windows approach. We demonstrate how that can be done using EZ WSI DICOMWeb.

First, we create a DicomSlide object using the DICOMweb client and interface. This can be done with just a few lines of code.

dicom_web_client = dicom_web.DicomWebClientImpl()
dwi = dicom_web_interface.DicomWebInterface(dicom_web_client)
ds = dicom_slide.DicomSlide(
dwi=dwi,
path=gcp_resource_name+dicom_series_path,
enable_client_slide_frame_decompression = True
)
ds.get_image(desired_magnification) # e.g. '0.625X'

This DicomSlide represents the entire WSI, as illustrated below.

Image of a WSI at the magnitude of 0.625X rendered by matplotlib

The above image leverages EZ WSI’s DicomSlide module to fetch an entire WSI at the requested magnification of 0.625X and uses matplotlib to render it, see the sample code for more details.

By providing coordinates, DicomSlide’s get_patch() method allows us to manually extract just the two sections of tissue at supported magnification with coordinates as pictured below.

tissue_patch = ds.get_patch(
desired_magnification, x=x_origin, y=y_origin, width=patch_width, height=patch_ height
)

Left tissue sample and right tissue sample at 0.625X magnitude, rendered by matplotlib

We can effectively zoom in on patches programmatically by reducing the window size and increasing the magnification using the same get patch method from above.

image of three panels showing the same interesting patch at 0.625, 2.5X, and 40X magnitude, rendered by matplotlib

Our ultimate goal is to generate a set of patches that can be used in a downstream AI application from this WSI.

image showing patch generation at 10X with 0.625X mask, rendered by matplotlib

To do this, we call PatchGenerator. It works by sliding a window of a specified size with a specified stride size across the image, heuristically ignoring tissue-less regions at a specified magnification level.

patch_gen = patch_generator.PatchGenerator(
slide=ds,
stride_size=stride_size, # the number of pixels between patches 
patch_size=patch_size, # the length and width of the patch in pixels
magnification=patch_magnification, # magnification to generate patches at
max_luminance=0.8, # defaults to .8, heuristic to evaluate where tissue is.
tissue_mask_magnification=mask_magnification,
)

The result is a list of patches that can be used as input into a machine learning algorithm.

image showing patch generation at 40X with 0.625X mask, rendered by matplotlib

Conclusion

We have built this library to make it easy to directly interact with DICOM WSIs that are stored in Google's DICOMWeb compliant Healthcare API DICOM store and extract image patches for AI workflows. Our hope is that by making this available, we can help accelerate the development of cutting edge AI for digital pathology in Google Cloud and beyond.

Links: Github, GCP-DICOMWeb

By Google HealthAI and Google Cloud Healthcare teams

The Healthcare Text Annotation Guidelines are blueprints for capturing a structured representation of the medical knowledge stored in digital text. In order to automatically map the textual insights to structured knowledge, the annotations generated using these guidelines are fed into a machine learning algorithm that learns to systematically extract the medical knowledge in the text. We’re pleased to release to the public the Healthcare Text Annotation Guidelines as a standard.

Google Cloud recently launched AutoML Entity Extraction for Healthcare, a low-code tool used to build information extraction models for healthcare applications. There remains a significant execution roadblock on AutoML DIY initiatives caused by the complexity of translating the human cognitive process into machine-readable instructions. Today, this translation occurs thanks to human annotators who annotate text for relevant insights. Yet, training human annotators is a complex endeavor which requires knowledge across fields like linguistics and neuroscience, as well as a good understanding of the business domain. With AutoML, Google wanted to democratize who can build AI. The Healthcare Text Annotation Guidelines are a starting point for annotation projects deployed for healthcare applications.

The guidelines provide a reference for training annotators in addition to explicit blueprints for several healthcare annotation tasks. The annotation guidelines cover the following:

The task of medical entity extraction with examples from medical entity types like medications, procedures, and body vitals.
Additional tasks with defined examples, such as entity relation annotation and entity attribute annotation. For instance, the guidelines specify how to relate a medical procedure entity to the source medical condition entity, or how to capture the attributes of a medication entity like dosage, frequency, and route of administration.
Guidance for annotating an entity’s contextual information like temporal assessment (e.g., current, family history, clinical history), certainty assessment (e.g., unlikely, somewhat likely, likely), and subject (e.g., patient, family member, other).

Google consulted with industry experts and academic institutions in the process of assembling the Healthcare Text Annotation Guidelines. We took inspiration from other open source and research projects like i2b2 and added context to the guidelines to support information extraction needs for industry-applications like Healthcare Effectiveness Data and Information Set (HEDIS) quality reporting. The data types contained in the Healthcare Text Annotation Guidelines are a common denominator across information extraction applications. Each industry application can have additional information extraction needs that are not captured in the current version of the guidelines. We chose to open source this asset so the community can tailor this project to their needs.

We’re thrilled to open source this project. We hope the community will contribute to the refinement and expansion of the Healthcare Text Annotation Guidelines, so they mirror the ever-evolving nature of healthcare.

By Andreea Bodnari, Product Manager and Mikhail Begun, Program Manager—Google Cloud AI

opensource.google.com