New methods to protect privacy when using patient health data to compare treatments

Bibliographic Details
Main Authors: Xiong, Li; Post, Andrew (Author); Jiang, Xiaoqian (Author); Ohno-Machado, Lucila (Author)
Corporate Author: Patient-Centered Outcomes Research Institute (U.S.)
Format: eBook
Language: English
Published: Washington, DC: Patient-Centered Outcomes Research Institute, 2021
Series: Final research report
Collection: National Center for Biotechnology Information - Collection details see MPG.ReNa
Description
Summary: BACKGROUND: Sharing and reusing clinical data is key to enabling patient-centered outcomes research (PCOR). Data registries established for conducting PCOR must ensure appropriate privacy and confidentiality protections as stated by the PCORI Methodology Committee. There is rising concern that current deidentification or "anonymization" practices insufficiently protect against reidentification and disclosure of private patient data. OBJECTIVES: The objective of this project was to develop a framework, which we named patient-centered Statistical Health informAtion RElease (pSHARE), for building patient-centered and privacy-preserving statistical data registries for PCOR using the rigorous differential privacy (DP) framework, which gives a provable guarantee on the privacy of patients who provide data. The main goal was to optimize the trade-off between data utility (ie, minimal noise) and data privacy (ie, DP constraints satisfied) in the data registry.
The project had 3 specific aims: (1) develop methods for establishing registries of private data, (2) develop methods for establishing registries that contain both private and consented data, and (3) develop methods for evaluating and tracking patient privacy risks and establishing data registries that take into account fine-grained patient privacy preferences. METHODS: The main challenge in designing DP methods is how to minimize the amount of noise added to the data so that data utility is preserved but a given DP constraint is not compromised. Our approach was patient centered, data driven, and research driven. To preserve data utility, we addressed the high dimensionality and high correlation of the data used in typical PCOR studies by explicitly modeling the cross-dimensional and temporal correlations.
In addition, we focused on developing methods for building data registries that respect the fine-grained personalized privacy preferences of patients rather than simple binary opt-in/opt-out preferences. We included patient and stakeholder engagement panels to ensure that the resulting pSHARE methodology was driven by patient perspectives. We also constructed data registries at Emory University and the University of California, San Diego (UCSD) using data extracted from clinical data warehouses to study the inherent trade-offs between privacy protection and utility of the data in PCOR studies. RESULTS: Project outcomes included (1) a suite of novel algorithms and techniques and a software tool kit for building data registries that rigorously protect patient privacy preferences; and (2) an evaluation of pSHARE using both publicly available data and data extracted from Emory and UCSD clinical data warehouses with insights on the trade-offs between data utility and patient privacy.
CONCLUSIONS: Our developed algorithms and software tool kit provide a new and more rigorous methodology that complements current deidentification and policy-based data registry practices. pSHARE can empower patients by providing rigorous and transparent privacy controls while contributing their data to PCOR. LIMITATIONS: The developed methodologies present inherent trade-offs between data privacy and data utility. Our customized pSHARE approach can be used to develop data registries optimized for specific PCOR studies that simultaneously guarantee patient privacy and empirical data utility (eg, by preserving longitudinal patterns), but this approach may not be versatile enough to support arbitrary types of PCOR studies (eg, those that require cross-sectional patterns).
The project described here includes a patient engagement plan that involved a series of stakeholder panels from which we gained a preliminary understanding of patient attitudes toward sharing data with researchers as well as patient privacy preferences. Large-scale studies, such as patient surveys, are needed to provide a broader and deeper understanding of patient privacy preferences and attitudes toward adoption of the developed methodology.
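Illustrative note: the trade-off between noise and privacy that the summary describes can be sketched with the textbook Laplace mechanism applied to a single counting query. This is not the pSHARE method, which the summary says models cross-dimensional and temporal correlations. The function names, the example count of 120, and the epsilon values below are all assumptions chosen for illustration.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponential draws with mean `scale`
    # follows a Laplace(0, scale) distribution.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def dp_count(true_count, epsilon, rng):
    # A counting query has sensitivity 1: adding or removing one patient
    # changes the count by at most 1, so the noise scale is 1 / epsilon.
    sensitivity = 1.0
    return true_count + laplace_noise(sensitivity / epsilon, rng)


def mean_abs_error(true_count, epsilon, trials=2000, seed=0):
    # Average distortion of the released count over repeated releases.
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += abs(dp_count(true_count, epsilon, rng) - true_count)
    return total / trials


if __name__ == "__main__":
    # Hypothetical registry count; smaller epsilon means stronger privacy.
    for eps in (0.1, 1.0, 10.0):
        err = mean_abs_error(120, eps)
        print(f"epsilon={eps}: mean absolute error = {err:.2f}")
```

In this sketch the expected absolute error equals 1/epsilon, so a stricter privacy budget (smaller epsilon) directly costs accuracy in the released count. That is the tension the project set out to optimize.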
Physical Description: 1 PDF file (63 pages) illustrations