Leveraging existing corpora for de-identification of psychiatric notes using domain adaptation

Hee-Jin Lee; Yaoyun Zhang; Kirk Roberts; Hua Xu

AMIA Annu Symp Proc. 2018 Apr 16;2017:1070-1079. eCollection 2017.

Authors

Hee-Jin Lee¹, Yaoyun Zhang¹, Kirk Roberts¹, Hua Xu¹

Affiliation

¹ University of Texas Health Science Center at Houston, Houston, TX.

PMID: 29854175
PMCID: PMC5977650

Abstract

De-identification of clinical notes is a special case of named entity recognition. Supervised machine-learning (ML) algorithms have achieved promising results for this task. However, ML-based de-identification systems often require annotating a large number of clinical notes of interest, which is costly. Domain adaptation (DA) is a technique that enables learning from annotated datasets from different sources, thereby reducing the annotation cost required for ML training in the target domain. In this study, we investigate the use of DA methods for de-identification of psychiatric notes. Three state-of-the-art DA methods (instance pruning, instance weighting, and feature augmentation) are applied to three source corpora: annotated hospital discharge summaries, outpatient notes, and a mixture of different note types written for diabetic patients. Our results show that DA can increase de-identification performance over the baselines, indicating that it can effectively reduce annotation cost for the target psychiatric notes. Feature augmentation is shown to increase performance the most among the three DA methods. Performance also varies with the type of source notes: the mixture of different note types brings the biggest increase in performance.
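
As an illustration of the feature augmentation idea named in the abstract, the following is a minimal sketch only. The abstract does not say which variant the authors used; the sketch assumes the common scheme in which every feature is duplicated into a shared copy and a domain-specific copy, so that a learner can weight evidence that transfers across note types separately from evidence specific to one corpus. The function name `augment_features`, the `general:` prefix, the domain labels, and the token features are all hypothetical and do not come from the paper.

```python
def augment_features(features, domain):
    """Duplicate each feature into a shared copy and a domain-specific copy.

    Assumed scheme (not confirmed by the abstract): a feature f observed in
    domain d becomes two features, "general:f" and "d:f", with the same value.
    """
    augmented = {}
    for name, value in features.items():
        augmented[f"general:{name}"] = value   # shared across all corpora
        augmented[f"{domain}:{name}"] = value  # visible only within this domain
    return augmented


# Hypothetical token-level features for the same word seen in two corpora.
source_token = {"word.lower": "houston", "is_capitalized": True}
target_token = {"word.lower": "houston", "is_capitalized": True}

source_aug = augment_features(source_token, "discharge")
target_aug = augment_features(target_token, "psychiatric")

# Only the "general:" copies are shared between the two domains.
shared_keys = set(source_aug) & set(target_aug)
print(sorted(shared_keys))
```

Under this assumption, a sequence labeler trained on the augmented source and target examples together can use the shared copies to transfer what the corpora have in common, while the domain-specific copies absorb conventions peculiar to psychiatric notes.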

Publication types

Research Support, N.I.H., Extramural

MeSH terms

Algorithms
Data Anonymization*
Datasets as Topic
Diabetes Mellitus
Electronic Health Records*
Humans
Machine Learning*
Natural Language Processing
Outpatients
Patient Discharge Summaries
Psychiatry*
